Message ID | pull.1648.v3.git.git.1705615794307.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v3] merge-ll: expose revision names to custom drivers |
Hi Junio,

After more testing (combining custom merge drivers with rerere) I
realized that my patch can lead to a segmentation error. Many
apologies for not having caught that earlier!

On 18/01/2024 23:09, Antonin Delpeuch via GitGitGadget wrote:
> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>              strbuf_addf(&cmd, "%d", marker_size);
>          else if (skip_prefix(format, "P", &format))
>              sq_quote_buf(&cmd, path);
> +        else if (skip_prefix(format, "S", &format))
> +            sq_quote_buf(&cmd, orig_name);
> +        else if (skip_prefix(format, "X", &format))
> +            sq_quote_buf(&cmd, name1);
> +        else if (skip_prefix(format, "Y", &format))
> +            sq_quote_buf(&cmd, name2);

The "orig_name", "name1" and "name2" pointers can be NULL at this
stage. This can happen when the merge is invoked from rerere, to
resolve a conflict using a previous resolution.

I wonder what the appropriate fallback would be in such a case. I am
tempted to use the temporary filenames of the files to merge instead,
so that the merge driver can rely on those names being non-empty and
being the best string to use to identify the files. Passing an empty
string seems dangerous to me, as it is likely to change the index of
arguments passed to the merge driver. Passing fixed strings such as
"base", "ours" and "theirs" could perhaps work too.

Let me know if you have any preference about this.

Best,
Antonin
Hi Antonin

On 18/01/2024 22:09, Antonin Delpeuch via GitGitGadget wrote:
> From: Antonin Delpeuch <antonin@delpeuch.eu>
>
> Custom merge drivers need access to the names of the revisions they
> are working on, so that the merge conflict markers they introduce
> can refer to those revisions. The placeholders '%S', '%X' and '%Y'
> are introduced to this end.

Thanks for working on this, I think it is a useful improvement. I
guess '%X' and '%Y' are no worse than the existing '%A' and '%B' but I
do wonder if we want to take the opportunity to switch to more
descriptive names for the various parameters passed to the custom
merge strategy. We do do this by supporting %(label:ours) modeled
after the format specifiers used by other commands such as "git log"
and "git for-each-ref".

> [...]
> +will be stored via placeholder `%P`. Additionally, the names of the
> +common ancestor revision (`%S`), of the current revision (`%X`) and
> +of the other branch (`%Y`) can also be supplied. Those are short
> +revision names, optionally joined with the paths of the file in each
> +revision. Those paths are only present if they differ and are separated
> +from the revision by a colon.

It might be simpler to just call these the "conflict marker labels"
without tying ourselves to a particular format. Something like

    The conflict labels to be used for the common ancestor, local head
    and other head can be passed by using '%(label:base)',
    '%(label:ours)' and '%(label:theirs)' respectively.

> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,

Not part of this patch but I noticed that we're passing the filenames
for '%A' etc. unquoted which is a bit scary.

>              strbuf_addf(&cmd, "%d", marker_size);
>          else if (skip_prefix(format, "P", &format))
>              sq_quote_buf(&cmd, path);
> +        else if (skip_prefix(format, "S", &format))
> +            sq_quote_buf(&cmd, orig_name);

I think you can avoid the SIGSEGV problem you mentioned in your other
email by changing this to

    sq_quote_buf(&cmd, orig_name ? orig_name : "");

That would make sure the labels we pass match the ones used by the
internal merge.

Best Wishes

Phillip
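A note on the empty-string fallback. Antonin's concern was that an empty label would shift the driver's later arguments. That only happens if the empty string is spliced into the command line unquoted. sq_quote_buf() wraps even an empty string in single quotes, so the shell sees '' and passes one empty argument. As a result, $6, $7 and $8 in a driver such as the custom-merge script in t6406 keep their positions.

The sketch below checks this by asking a POSIX sh how many positional parameters it received. count_shell_args() is a hypothetical helper written only for this illustration, and it assumes an sh on PATH.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/*
 * Hypothetical helper: run  sh -c 'echo $#' sh ARGS  and return how many
 * positional parameters the inner shell received, or -1 on error.
 * ARGS is pasted into the command line verbatim, so it must already be
 * quoted the way sq_quote_buf() would quote it.
 */
int count_shell_args(const char *args)
{
    char cmd[512];
    FILE *fp;
    int count = -1;

    if (snprintf(cmd, sizeof(cmd), "sh -c 'echo $#' sh %s", args)
        >= (int)sizeof(cmd))
        return -1;
    fp = popen(cmd, "r");
    if (!fp)
        return -1;
    if (fscanf(fp, "%d", &count) != 1)
        count = -1;
    pclose(fp);
    return count;
}
```

Consider count_shell_args("'' 'HEAD' 'main'") versus count_shell_args(" 'HEAD' 'main'"). In the second call HEAD arrives as $1 instead of $2, which is exactly the shift Antonin was worried about. The quoted '' form avoids it.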
Antonin Delpeuch <antonin@delpeuch.eu> writes:

> After more testing (combining custom merge drivers with rerere) I
> realized that my patch can lead to a segmentation error. Many
> apologies for not having caught that earlier!

Ah, understandable. The 3-way merge machinery may not even have to
work on commit objects (it can merge two trees, using another tree as
the "common ancestor" tree, just fine). And in such a case, it is
perfectly possible there is no "human readable name"; all there is
may be a tree object name.

> On 18/01/2024 23:09, Antonin Delpeuch via GitGitGadget wrote:
>> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>>              strbuf_addf(&cmd, "%d", marker_size);
>>          else if (skip_prefix(format, "P", &format))
>>              sq_quote_buf(&cmd, path);
>> +        else if (skip_prefix(format, "S", &format))
>> +            sq_quote_buf(&cmd, orig_name);
>> +        else if (skip_prefix(format, "X", &format))
>> +            sq_quote_buf(&cmd, name1);
>> +        else if (skip_prefix(format, "Y", &format))
>> +            sq_quote_buf(&cmd, name2);
>
> The "orig_name", "name1" and "name2" pointers can be NULL at this
> stage. This can happen when the merge is invoked from rerere, to
> resolve a conflict using a previous resolution.

    sq_quote_buf(&cmd, name1 ? name1 : "(ours)");

or something like that, perhaps.
Phillip Wood <phillip.wood123@gmail.com> writes:

> Not part of this patch but I noticed that we're passing the filenames
> for '%A' etc. unquoted which is a bit scary.

May be scary but safe, as long as create_temp() gives a reasonable
temporary filename. We pass ".merge_file_XXXXXX" to xmkstemp(),
which calls into mkstemp(), which should give us a shell safe name?

It also should be a safe conversion to change strbuf_addstr() used
for these three to sq_quote_buf(), as the string with these %[OAB]
placeholders are passed to the shell that eats the quoting before
invoking the end-user supplied external merge driver, which means
the merge driver would not notice any difference.

Thanks for being careful ;-)
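Junio's reasoning can be checked directly. mkstemp() replaces the trailing XXXXXX of the template in place. glibc, for example, fills those positions with letters and digits, so a name built from ".merge_file_XXXXXX" contains nothing the shell would interpret.

The sketch below does three things:
- creates a file from that template,
- tests the resulting name against the POSIX portable filename character set (letters, digits, '.', '_' and '-'),
- removes the file again.

is_shell_safe() and check_merge_temp_name() are hypothetical helpers. Git itself goes through create_temp() and xmkstemp(), which are not reproduced here. The check only inspects the name that one particular libc produced on one run.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* 1 if name is non-empty and uses only the POSIX portable filename set. */
int is_shell_safe(const char *name)
{
    static const char safe[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";

    return name[0] != '\0' && strspn(name, safe) == strlen(name);
}

/*
 * Create a file from the template quoted in the thread, report whether
 * the generated name is shell-safe, then delete the file again.
 * buf receives the generated name. Returns 1 (safe), 0 (unsafe) or -1.
 */
int check_merge_temp_name(char *buf, size_t n)
{
    static const char tmpl[] = ".merge_file_XXXXXX";
    int fd, safe;

    if (n < sizeof(tmpl))
        return -1;
    memcpy(buf, tmpl, sizeof(tmpl));   /* includes the trailing NUL */
    fd = mkstemp(buf);                 /* rewrites the XXXXXX in place */
    if (fd < 0)
        return -1;
    safe = is_shell_safe(buf);
    close(fd);
    unlink(buf);
    return safe;
}
```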
Hi Junio

On 20/01/2024 17:37, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> Not part of this patch but I noticed that we're passing the filenames
>> for '%A' etc. unquoted which is a bit scary.
>
> May be scary but safe, as long as create_temp() gives a reasonable
> temporary filename. We pass ".merge_file_XXXXXX" to xmkstemp(),
> which calls into mkstemp(), which should give us a shell safe name?

Yes. I'd mis-read create_temp() and thought we were appending
".merge_file_XXXXX" to the path being merged, but looking at it again
it is safe.

> It also should be a safe conversion to change strbuf_addstr() used
> for these three to sq_quote_buf(), as the string with these %[OAB]
> placeholders are passed to the shell that eats the quoting before
> invoking the end-user supplied external merge driver, which means
> the merge driver would not notice any difference.

I agree that would be a safe conversion, but I'm not sure it is worth
it.

Best Wishes

Phillip
Phillip Wood <phillip.wood123@gmail.com> writes:

> Thanks for working on this, I think it is a useful improvement. I
> guess '%X' and '%Y' are no worse than the existing '%A' and '%B' but I
> do wonder if we want to take the opportunity to switch to more
> descriptive names for the various parameters passed to the custom
> merge strategy. We do do this by supporting %(label:ours) modeled
> after the format specifiers used by other commands such as "git log"
> and "git for-each-ref".

Perhaps. Unlike the --format option these commands take, the
placeholders are never typed from the command line (they always are
taken from the configuration file), so the mnemonic value the longer
version gives over the current single-letter ones is not as valuable,
while making the total line length longer. So I dunno.

>> [...]
>> +will be stored via placeholder `%P`. Additionally, the names of the
>> +common ancestor revision (`%S`), of the current revision (`%X`) and
>> +of the other branch (`%Y`) can also be supplied. Those are short
>> +revision names, optionally joined with the paths of the file in each
>> +revision. Those paths are only present if they differ and are separated
>> +from the revision by a colon.
>
> It might be simpler to just call these the "conflict marker labels"
> without tying ourselves to a particular format. Something like
>
>     The conflict labels to be used for the common ancestor, local head
>     and other head can be passed by using '%(label:base)',
>     '%(label:ours)' and '%(label:theirs)' respectively.

Yeah, that sounds like a good improvement, even if we did not use the
longhand placeholders and replaced %(label:{base,ours,theirs}) with
%S, %X, and %Y.

>> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>
> Not part of this patch but I noticed that we're passing the filenames
> for '%A' etc. unquoted which is a bit scary.
>
>>              strbuf_addf(&cmd, "%d", marker_size);
>>          else if (skip_prefix(format, "P", &format))
>>              sq_quote_buf(&cmd, path);
>> +        else if (skip_prefix(format, "S", &format))
>> +            sq_quote_buf(&cmd, orig_name);
>
> I think you can avoid the SIGSEGV problem you mentioned in your other
> email by changing this to
>
>     sq_quote_buf(&cmd, orig_name ? orig_name : "");
>
> That would make sure the labels we pass match the ones used by the
> internal merge.

Makes sense. That would be much better than using hardcoded strings
"ours", "theirs", etc.
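For concreteness, the proposed documentation describes the label format as follows. A label is the short revision name on its own. When the file is not named identically in all versions, the revision and the path are joined by a colon instead.

format_label() is a hypothetical helper that only mirrors that sentence. It is not taken from merge-ll.c; in git, the labels are built by the callers of ll_merge() before the driver runs. Drivers that start parsing this shape are the reason Phillip's wording describes the values only as conflict labels.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the proposed documentation: the short
 * revision name alone, or "rev:path" when the file is not named
 * identically in all versions. Returns snprintf()'s result, so a value
 * >= n means the label was truncated.
 */
int format_label(char *out, size_t n, const char *rev, const char *path,
                 int paths_differ)
{
    if (paths_differ && path)
        return snprintf(out, n, "%s:%s", rev, path);
    return snprintf(out, n, "%s", rev);
}
```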
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 201bdf5edbd..86a0946bb9e 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1137,11 +1137,11 @@ The `merge.*.name` variable gives the driver a human-readable
 name.
 
 The `merge.*.driver` variable's value is used to construct a
-command to run to merge ancestor's version (`%O`), current
+command to run to common ancestor's version (`%O`), current
 version (`%A`) and the other branches' version (`%B`). These
 three tokens are replaced with the names of temporary files that
 hold the contents of these versions when the command line is
-built. Additionally, %L will be replaced with the conflict marker
+built. Additionally, `%L` will be replaced with the conflict marker
 size (see below).
 
 The merge driver is expected to leave the result of the merge in
@@ -1159,8 +1159,12 @@ When left unspecified, the driver itself is used for both
 internal merge and the final merge.
 
 The merge driver can learn the pathname in which the merged result
-will be stored via placeholder `%P`.
-
+will be stored via placeholder `%P`. Additionally, the names of the
+common ancestor revision (`%S`), of the current revision (`%X`) and
+of the other branch (`%Y`) can also be supplied. Those are short
+revision names, optionally joined with the paths of the file in each
+revision. Those paths are only present if they differ and are separated
+from the revision by a colon.
 
 `conflict-marker-size`
 ^^^^^^^^^^^^^^^^^^^^^^
diff --git a/merge-ll.c b/merge-ll.c
index 1df58ebaac0..13e0713fe82 100644
--- a/merge-ll.c
+++ b/merge-ll.c
@@ -185,9 +185,9 @@ static void create_temp(mmfile_t *src, char *path, size_t len)
 static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
             mmbuffer_t *result,
             const char *path,
-            mmfile_t *orig, const char *orig_name UNUSED,
-            mmfile_t *src1, const char *name1 UNUSED,
-            mmfile_t *src2, const char *name2 UNUSED,
+            mmfile_t *orig, const char *orig_name,
+            mmfile_t *src1, const char *name1,
+            mmfile_t *src2, const char *name2,
             const struct ll_merge_options *opts,
             int marker_size)
 {
@@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
             strbuf_addf(&cmd, "%d", marker_size);
         else if (skip_prefix(format, "P", &format))
             sq_quote_buf(&cmd, path);
+        else if (skip_prefix(format, "S", &format))
+            sq_quote_buf(&cmd, orig_name);
+        else if (skip_prefix(format, "X", &format))
+            sq_quote_buf(&cmd, name1);
+        else if (skip_prefix(format, "Y", &format))
+            sq_quote_buf(&cmd, name2);
         else
             strbuf_addch(&cmd, '%');
     }
@@ -315,7 +321,12 @@ static int read_merge_config(const char *var, const char *value,
      *    %B - temporary file name for the other branches' version.
      *    %L - conflict marker length
      *    %P - the original path (safely quoted for the shell)
+     *    %S - the revision for the merge base
+     *    %X - the revision for our version
+     *    %Y - the revision for their version
      *
+     * If the file is not named indentically in all versions, then each
+     * revision is joined with the corresponding path, separated by a colon.
      * The external merge driver should write the results in the
      * file named by %A, and signal that it has done with zero exit
      * status.
diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
index 72f8c1722ff..156a1efacfe 100755
--- a/t/t6406-merge-attr.sh
+++ b/t/t6406-merge-attr.sh
@@ -42,11 +42,15 @@ test_expect_success setup '
     #!/bin/sh
 
     orig="$1" ours="$2" theirs="$3" exit="$4" path=$5
+    orig_name="$6" our_name="$7" their_name="$8"
     (
         echo "orig is $orig"
         echo "ours is $ours"
         echo "theirs is $theirs"
         echo "path is $path"
+        echo "orig_name is $orig_name"
+        echo "our_name is $our_name"
+        echo "their_name is $their_name"
         echo "=== orig ==="
         cat "$orig"
         echo "=== ours ==="
@@ -121,7 +125,7 @@ test_expect_success 'custom merge backend' '
 
     git reset --hard anchor &&
     git config --replace-all \
-    merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+    merge.custom.driver "./custom-merge %O %A %B 0 %P %S %X %Y" &&
     git config --replace-all \
     merge.custom.name "custom merge driver for testing" &&
 
@@ -132,7 +136,8 @@ test_expect_success 'custom merge backend' '
     o=$(git unpack-file main^:text) &&
     a=$(git unpack-file side^:text) &&
     b=$(git unpack-file main:text) &&
-    sh -c "./custom-merge $o $a $b 0 text" &&
+    base_revid=$(git rev-parse --short main^) &&
+    sh -c "./custom-merge $o $a $b 0 text $base_revid HEAD main" &&
     sed -e 1,3d $a >check-2 &&
     cmp check-1 check-2 &&
     rm -f $o $a $b
@@ -142,7 +147,7 @@ test_expect_success 'custom merge backend' '
 
     git reset --hard anchor &&
     git config --replace-all \
-    merge.custom.driver "./custom-merge %O %A %B 1 %P" &&
+    merge.custom.driver "./custom-merge %O %A %B 1 %P %S %X %Y" &&
     git config --replace-all \
     merge.custom.name "custom merge driver for testing" &&
 
@@ -159,7 +164,8 @@ test_expect_success 'custom merge backend' '
     o=$(git unpack-file main^:text) &&
     a=$(git unpack-file anchor:text) &&
     b=$(git unpack-file main:text) &&
-    sh -c "./custom-merge $o $a $b 0 text" &&
+    base_revid=$(git rev-parse --short main^) &&
+    sh -c "./custom-merge $o $a $b 0 text $base_revid HEAD main" &&
     sed -e 1,3d $a >check-2 &&
     cmp check-1 check-2 &&
     sed -e 1,3d -e 4q $a >check-3 &&
@@ -173,7 +179,7 @@ test_expect_success !WINDOWS 'custom merge driver that is killed with a signal'
 
     git reset --hard anchor &&
     git config --replace-all \
-    merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+    merge.custom.driver "./custom-merge %O %A %B 0 %P %S %X %Y" &&
     git config --replace-all \
     merge.custom.name "custom merge driver for testing" &&
 
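The patch only adds three cases to the placeholder loop in ll_ext_merge(), so the whole expansion can be sketched in a few lines. This standalone version expands each placeholder as follows:
- %O, %A and %B are copied unquoted, since they are mkstemp() names.
- %L becomes the marker size.
- %P and the three labels are single-quoted the way sq_quote_buf() does it.
- An unknown %c is copied through unchanged, matching the final else branch in the diff.
- %% becomes a literal %. This is a choice made for the sketch.

It is not git's code, which works on a strbuf with strbuf_expand_step() and skip_prefix(). struct driver_args and expand_driver_cmdline() are hypothetical names. Treating a NULL label as "" follows the fallback suggested in the thread, not the v3 patch shown here.

```c
#include <stdio.h>

/* Inputs for one external-driver invocation (hypothetical struct). */
struct driver_args {
    const char *temp[3];   /* %O, %A, %B: temporary file names */
    int marker_size;       /* %L */
    const char *path;      /* %P */
    const char *label[3];  /* %S, %X, %Y: any of these may be NULL */
};

/* Append one character, keeping out NUL-terminated; -1 if out is full. */
static int put_ch(char *out, size_t n, size_t *len, char c)
{
    if (*len + 1 >= n)
        return -1;
    out[(*len)++] = c;
    out[*len] = '\0';
    return 0;
}

static int put_str(char *out, size_t n, size_t *len, const char *s)
{
    for (; *s; s++)
        if (put_ch(out, n, len, *s))
            return -1;
    return 0;
}

/* Single-quote s as sq_quote_buf() does; NULL is treated as "". */
static int put_sq(char *out, size_t n, size_t *len, const char *s)
{
    if (!s)
        s = "";
    if (put_ch(out, n, len, '\''))
        return -1;
    for (; *s; s++) {
        if (*s == '\'' || *s == '!') {
            /* close the quote, emit the backslash-escaped char, reopen */
            char esc[5] = { '\'', '\\', *s, '\'', '\0' };
            if (put_str(out, n, len, esc))
                return -1;
        } else if (put_ch(out, n, len, *s)) {
            return -1;
        }
    }
    return put_ch(out, n, len, '\'');
}

/* Expand a merge.<driver>.driver template into out; 0 on success. */
int expand_driver_cmdline(char *out, size_t n, const char *fmt,
                          const struct driver_args *a)
{
    size_t len = 0;
    char num[16];
    int rc;

    if (n == 0)
        return -1;
    out[0] = '\0';
    for (; *fmt; fmt++) {
        if (*fmt != '%' || fmt[1] == '\0') {
            rc = put_ch(out, n, &len, *fmt);        /* literal text */
        } else {
            switch (*++fmt) {                       /* consume the letter */
            case '%': rc = put_ch(out, n, &len, '%'); break;
            case 'O': rc = put_str(out, n, &len, a->temp[0]); break;
            case 'A': rc = put_str(out, n, &len, a->temp[1]); break;
            case 'B': rc = put_str(out, n, &len, a->temp[2]); break;
            case 'L':
                snprintf(num, sizeof(num), "%d", a->marker_size);
                rc = put_str(out, n, &len, num);
                break;
            case 'P': rc = put_sq(out, n, &len, a->path); break;
            case 'S': rc = put_sq(out, n, &len, a->label[0]); break;
            case 'X': rc = put_sq(out, n, &len, a->label[1]); break;
            case 'Y': rc = put_sq(out, n, &len, a->label[2]); break;
            default:                                /* unknown: keep "%c" */
                rc = put_ch(out, n, &len, '%');
                if (!rc)
                    rc = put_ch(out, n, &len, *fmt);
                break;
            }
        }
        if (rc)
            return -1;
    }
    return 0;
}
```

Feeding it the driver string from the updated test, "./custom-merge %O %A %B 0 %P %S %X %Y", has a clear effect. The labels land in the sixth to eighth arguments, which the test script reads as orig_name, our_name and their_name. The shell removes the single quotes before the driver sees them.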