
[v3] merge-ll: expose revision names to custom drivers

Message ID pull.1648.v3.git.git.1705615794307.gitgitgadget@gmail.com
State Superseded
Series [v3] merge-ll: expose revision names to custom drivers

Commit Message

Antonin Delpeuch Jan. 18, 2024, 10:09 p.m. UTC
From: Antonin Delpeuch <antonin@delpeuch.eu>

Custom merge drivers need access to the names of the revisions they
are working on, so that the merge conflict markers they introduce
can refer to those revisions. The placeholders '%S', '%X' and '%Y'
are introduced to this end.

Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
---
    merge-ll: expose revision names to custom drivers
    
    Changes since v2:
    
     * change the documentation to use "common ancestor" rather than "merge
       ancestor"
     * fix indentation issue

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1648%2Fwetneb%2Fmerge_driver_pathnames-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1648/wetneb/merge_driver_pathnames-v3
Pull-Request: https://github.com/git/git/pull/1648

Range-diff vs v2:

 1:  6dec70529c0 ! 1:  aebd26711fe merge-ll: expose revision names to custom drivers
     @@ Commit message
          Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
      
       ## Documentation/gitattributes.txt ##
     -@@ Documentation/gitattributes.txt: command to run to merge ancestor's version (`%O`), current
     +@@ Documentation/gitattributes.txt: The `merge.*.name` variable gives the driver a human-readable
     + name.
     + 
     + The `merge.*.driver` variable's value is used to construct a
     +-command to run to merge ancestor's version (`%O`), current
     ++command to run to merge the common ancestor's version (`%O`), current
       version (`%A`) and the other branches' version (`%B`).  These
       three tokens are replaced with the names of temporary files that
       hold the contents of these versions when the command line is
     @@ Documentation/gitattributes.txt: When left unspecified, the driver itself is use
      -will be stored via placeholder `%P`.
      -
      +will be stored via placeholder `%P`. Additionally, the names of the
     -+merge ancestor revision (`%S`), of the current revision (`%X`) and
     ++common ancestor revision (`%S`), of the current revision (`%X`) and
      +of the other branch (`%Y`) can also be supplied. Those are short
      +revision names, optionally joined with the paths of the file in each
      +revision. Those paths are only present if they differ and are separated
     @@ merge-ll.c: static enum ll_merge_result ll_ext_merge(const struct ll_merge_drive
       		else if (skip_prefix(format, "P", &format))
       			sq_quote_buf(&cmd, path);
      +		else if (skip_prefix(format, "S", &format))
     -+		    sq_quote_buf(&cmd, orig_name);
     ++			sq_quote_buf(&cmd, orig_name);
      +		else if (skip_prefix(format, "X", &format))
      +			sq_quote_buf(&cmd, name1);
      +		else if (skip_prefix(format, "Y", &format))


 Documentation/gitattributes.txt | 12 ++++++++----
 merge-ll.c                      | 17 ++++++++++++++---
 t/t6406-merge-attr.sh           | 16 +++++++++++-----
 3 files changed, 33 insertions(+), 12 deletions(-)


base-commit: 186b115d3062e6230ee296d1ddaa0c4b72a464b5
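The documentation hunk in this patch says the new `%S`, `%X` and `%Y` values are short revision names, with `:<path>` appended only when the file's path differs between versions. The following is a minimal sketch of that formatting rule, assuming the caller already knows the revision name, the path, and whether the paths differ. `format_label()` is a hypothetical helper for illustration only; the real labels are built by git's merge machinery, not by this code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build one of the labels passed as %S, %X or %Y.
 * If the file has the same path in all three versions, the label is just
 * the revision name; otherwise ":<path>" is appended. Returns 0 on
 * success, -1 if the label would not fit in out[size].
 */
static int format_label(char *out, size_t size, const char *rev,
			const char *path, int paths_differ)
{
	int n;

	assert(out && size && rev);
	if (paths_differ) {
		assert(path);
		n = snprintf(out, size, "%s:%s", rev, path);
	} else {
		n = snprintf(out, size, "%s", rev);
	}
	/* snprintf() reports the untruncated length; treat truncation as failure */
	return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For example, a file renamed on the other branch would get a label such as `main:renamed.txt`, while an unrenamed file would simply get `HEAD`.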

Comments

Antonin Delpeuch Jan. 19, 2024, 8:02 p.m. UTC | #1
Hi Junio,

After more testing (combining custom merge drivers with rerere) I 
realized that my patch can lead to a segmentation fault. Many apologies 
for not having caught that earlier!

On 18/01/2024 23:09, Antonin Delpeuch via GitGitGadget wrote:
> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>   			strbuf_addf(&cmd, "%d", marker_size);
>   		else if (skip_prefix(format, "P", &format))
>   			sq_quote_buf(&cmd, path);
> +		else if (skip_prefix(format, "S", &format))
> +			sq_quote_buf(&cmd, orig_name);
> +		else if (skip_prefix(format, "X", &format))
> +			sq_quote_buf(&cmd, name1);
> +		else if (skip_prefix(format, "Y", &format))
> +			sq_quote_buf(&cmd, name2);

The "orig_name", "name1" and "name2" pointers can be NULL at this stage. 
This can happen when the merge is invoked from rerere, to resolve a 
conflict using a previous resolution.

I wonder what the appropriate fallback would be in such a case. I am 
tempted to use the temporary filenames of the files to merge instead, so 
that the merge driver can rely on those names being non-empty and being 
the best string to use to identify the files. Passing an empty string 
seems dangerous to me, as it is likely to change the index of arguments 
passed to the merge driver. Passing fixed strings such as "base", "ours" 
and "theirs" could perhaps work too.

Let me know if you have any preference about this.

Best,

Antonin
Phillip Wood Jan. 20, 2024, 2:13 p.m. UTC | #2
Hi Antonin

On 18/01/2024 22:09, Antonin Delpeuch via GitGitGadget wrote:
> From: Antonin Delpeuch <antonin@delpeuch.eu>
> 
> Custom merge drivers need access to the names of the revisions they
> are working on, so that the merge conflict markers they introduce
> can refer to those revisions. The placeholders '%S', '%X' and '%Y'
> are introduced to this end.

Thanks for working on this, I think it is a useful improvement. I guess 
'%X' and '%Y' are no worse than the existing '%A' and '%B' but I do 
wonder if we want to take the opportunity to switch to more descriptive 
names for the various parameters passed to the custom merge strategy. We 
could do this by supporting %(label:ours) modeled after the format 
specifiers used by other commands such as "git log" and "git for-each-ref".

> [...]
> +will be stored via placeholder `%P`. Additionally, the names of the
> +common ancestor revision (`%S`), of the current revision (`%X`) and
> +of the other branch (`%Y`) can also be supplied. Those are short
> +revision names, optionally joined with the paths of the file in each
> +revision. Those paths are only present if they differ and are separated
> +from the revision by a colon.

It might be simpler to just call these the "conflict marker labels" 
without tying ourselves to a particular format. Something like

     The conflict labels to be used for the common ancestor, local head
     and other head can be passed by using '%(label:base)',
     '%(label:ours)' and '%(label:theirs)' respectively.


> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,

Not part of this patch but I noticed that we're passing the filenames 
for '%A' etc. unquoted which is a bit scary.

>   			strbuf_addf(&cmd, "%d", marker_size);
>   		else if (skip_prefix(format, "P", &format))
>   			sq_quote_buf(&cmd, path);
> +		else if (skip_prefix(format, "S", &format))
> +			sq_quote_buf(&cmd, orig_name);

I think you can avoid the SIGSEGV problem you mentioned in your other 
email by changing this to

	sq_quote_buf(&cmd, orig_name ? orig_name : "");

That would make sure the labels we pass match the ones used by the 
internal merge.

Best Wishes

Phillip
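To make the NULL handling concrete, here is a minimal sketch of the placeholder loop with the `x ? x : ""` fallback applied to all three labels. It assumes a simplified stand-in for git's strbuf and sq_quote_buf(); `struct buf`, `buf_sq_quote()` and `expand_labels()` are hypothetical names, not git API. The quoting helper follows the same scheme as sq_quote_buf() (wrap in single quotes, emit `'` and `!` as `'\''` and `'\!'`). It also speaks to the concern about argument positions: once single-quoted, an empty label expands to `''`, which the shell still passes to the driver as an empty positional argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable string; a hypothetical stand-in for git's strbuf. */
struct buf {
	char *s;
	size_t len, cap;
};

static void buf_addch(struct buf *b, char c)
{
	if (b->len + 2 > b->cap) {
		b->cap = b->cap ? 2 * b->cap : 64;
		b->s = realloc(b->s, b->cap);
		if (!b->s)
			abort();
	}
	b->s[b->len++] = c;
	b->s[b->len] = '\0';
}

/* Single-quote src for a POSIX shell, in the style of sq_quote_buf(). */
static void buf_sq_quote(struct buf *b, const char *src)
{
	buf_addch(b, '\'');
	for (; *src; src++) {
		if (*src == '\'' || *src == '!') {
			/* close the quote, backslash-escape the char, reopen */
			buf_addch(b, '\'');
			buf_addch(b, '\\');
			buf_addch(b, *src);
			buf_addch(b, '\'');
		} else {
			buf_addch(b, *src);
		}
	}
	buf_addch(b, '\'');
}

/*
 * Hypothetical: expand %S, %X and %Y in fmt into b. A NULL label (as
 * can happen when invoked from rerere) falls back to "", which still
 * yields one (empty) shell word.
 */
static void expand_labels(struct buf *b, const char *fmt, const char *base,
			  const char *ours, const char *theirs)
{
	assert(b && fmt);
	while (*fmt) {
		if (*fmt != '%') {
			buf_addch(b, *fmt++);
			continue;
		}
		fmt++; /* skip the '%' */
		switch (*fmt) {
		case '%': buf_addch(b, '%'); fmt++; break;
		case 'S': buf_sq_quote(b, base ? base : ""); fmt++; break;
		case 'X': buf_sq_quote(b, ours ? ours : ""); fmt++; break;
		case 'Y': buf_sq_quote(b, theirs ? theirs : ""); fmt++; break;
		default:
			/* unknown placeholder: emit '%', copy the next char literally */
			buf_addch(b, '%');
		}
	}
}
```

With a driver configured as `./custom-merge %O %A %B 0 %P %S %X %Y`, a NULL `orig_name` would therefore still leave the `%X` and `%Y` labels in positions 7 and 8.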
Junio C Hamano Jan. 20, 2024, 5:25 p.m. UTC | #3
Antonin Delpeuch <antonin@delpeuch.eu> writes:

> After more testing (combining custom merge drivers with rerere) I
> realized that my patch can lead to a segmentation fault. Many
> apologies for not having caught that earlier!

Ah, understandable.  The 3-way merge machinery may not even have to
work on commit objects (it can merge two trees, using another tree
as the "common ancestor" tree, just fine).

And in such a case, it is perfectly possible there is no "human
readable name"; all there is may be a tree object name.

> On 18/01/2024 23:09, Antonin Delpeuch via GitGitGadget wrote:
>> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>>   			strbuf_addf(&cmd, "%d", marker_size);
>>   		else if (skip_prefix(format, "P", &format))
>>   			sq_quote_buf(&cmd, path);
>> +		else if (skip_prefix(format, "S", &format))
>> +			sq_quote_buf(&cmd, orig_name);
>> +		else if (skip_prefix(format, "X", &format))
>> +			sq_quote_buf(&cmd, name1);
>> +		else if (skip_prefix(format, "Y", &format))
>> +			sq_quote_buf(&cmd, name2);
>
> The "orig_name", "name1" and "name2" pointers can be NULL at this
> stage. This can happen when the merge is invoked from rerere, to
> resolve a conflict using a previous resolution.

	sq_quote_buf(&cmd, name1 ? name1 : "(ours)");

or something like that, perhaps.
Junio C Hamano Jan. 20, 2024, 5:37 p.m. UTC | #4
Phillip Wood <phillip.wood123@gmail.com> writes:

> Not part of this patch but I noticed that we're passing the filenames
> for '%A' etc. unquoted which is a bit scary.

May be scary but safe, as long as create_temp() gives a reasonable
temporary filename.  We pass ".merge_file_XXXXXX" to xmkstemp(),
which calls into mkstemp(), which should give us a shell safe name?

It also should be a safe conversion to change strbuf_addstr() used
for these three to sq_quote_buf(), as the string with these %[OAB]
placeholders are passed to the shell that eats the quoting before
invoking the end-user supplied external merge driver, which means
the merge driver would not notice any difference.

Thanks for being careful ;-)
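The argument above rests on temporary names built from the `.merge_file_XXXXXX` template containing nothing the shell would interpret. Below is a minimal sketch of a conservative check for that property. `is_shell_safe()` is hypothetical, and the accepted character set (ASCII letters, digits and `._-/`) is a deliberately narrow assumption rather than the full set a POSIX shell leaves alone. How the `X`s are replaced is up to the C library; glibc, for example, uses ASCII letters and digits, so names from that template would pass the check there.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: true if name can be pasted into a POSIX shell
 * command line without quoting. Only ASCII letters, digits and "._-/"
 * are accepted.
 */
static bool is_shell_safe(const char *name)
{
	static const char extra[] = "._-/";

	assert(name);
	if (!*name)
		return false; /* an unquoted empty word vanishes from argv */
	for (const char *p = name; *p; p++) {
		char c = *p;
		bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
			     (c >= '0' && c <= '9');

		if (!alnum && !strchr(extra, c))
			return false;
	}
	return true;
}
```

Anything that fails such a check is exactly what sq_quote_buf() would be needed for, which is why quoting the `%O`/`%A`/`%B` names would be harmless even if it is not strictly required.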
Phillip Wood Jan. 20, 2024, 6:23 p.m. UTC | #5
Hi Junio

On 20/01/2024 17:37, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> Not part of this patch but I noticed that we're passing the filenames
>> for '%A' etc. unquoted which is a bit scary.
> 
> May be scary but safe, as long as create_temp() gives a reasonable
> temporary filename.  We pass ".merge_file_XXXXXX" to xmkstemp(),
> which calls into mkstemp(), which should give us a shell safe name?

Yes. I'd mis-read create_temp() and thought we were appending 
".merge_file_XXXXXX" to the path being merged but looking at it again it 
is safe.

> It also should be a safe conversion to change strbuf_addstr() used
> for these three to sq_quote_buf(), as the string with these %[OAB]
> placeholders are passed to the shell that eats the quoting before
> invoking the end-user supplied external merge driver, which means
> the merge driver would not notice any difference.

I agree that would be a safe conversion, but I'm not sure it is worth it.

Best Wishes

Phillip
Junio C Hamano Jan. 20, 2024, 10:49 p.m. UTC | #6
Phillip Wood <phillip.wood123@gmail.com> writes:

> Thanks for working on this, I think it is a useful improvement. I
> guess '%X' and '%Y' are no worse than the existing '%A' and '%B' but I
> do wonder if we want to take the opportunity to switch to more
> descriptive names for the various parameters passed to the custom
> merge strategy. We could do this by supporting %(label:ours) modeled
> after the format specifiers used by other commands such as "git log"
> and "git for-each-ref".

Perhaps.  Unlike the --format option these commands take, the
placeholders are never typed from the command line (they always are
taken from the configuration file), so the mnemonic value the longer
version gives over the current single-letter ones is not as valuable, while
making the total line length longer.  So I dunno.

>> [...]
>> +will be stored via placeholder `%P`. Additionally, the names of the
>> +common ancestor revision (`%S`), of the current revision (`%X`) and
>> +of the other branch (`%Y`) can also be supplied. Those are short
>> +revision names, optionally joined with the paths of the file in each
>> +revision. Those paths are only present if they differ and are separated
>> +from the revision by a colon.
>
> It might be simpler to just call these the "conflict marker labels"
> without tying ourselves to a particular format. Something like
>
>     The conflict labels to be used for the common ancestor, local head
>     and other head can be passed by using '%(label:base)',
>     '%(label:ours)' and '%(label:theirs)' respectively.

Yeah, that sounds like a good improvement, even if we did not use
the longhand placeholders and replaced %(label:{base,ours,theirs})
with %S, %X, and %Y.
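If the longhand spelling were ever accepted alongside the short one, recognising both need not complicate the expansion loop much. A minimal sketch follows, assuming the same `skip_prefix()` contract git uses (re-implemented here so the block compiles on its own) and a hypothetical `parse_label()` that is handed the text following a `%`. The `%(label:...)` forms are only the proposal from this thread; the patch itself implements `%S`, `%X` and `%Y` only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same contract as git's skip_prefix(): if str starts with prefix,
 * point *out just past it and return 1; otherwise return 0. */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

enum label_kind { LABEL_NONE = -1, LABEL_BASE, LABEL_OURS, LABEL_THEIRS };

/*
 * Hypothetical: classify the placeholder that follows a '%'. Accepts
 * the short forms from the patch (S, X, Y) and the longhand forms
 * proposed in review. On success *rest points past the placeholder.
 */
static enum label_kind parse_label(const char *fmt, const char **rest)
{
	static const struct {
		const char *spelling;
		enum label_kind kind;
	} table[] = {
		{ "S", LABEL_BASE },
		{ "X", LABEL_OURS },
		{ "Y", LABEL_THEIRS },
		{ "(label:base)", LABEL_BASE },
		{ "(label:ours)", LABEL_OURS },
		{ "(label:theirs)", LABEL_THEIRS },
	};

	assert(fmt && rest);
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (skip_prefix(fmt, table[i].spelling, rest))
			return table[i].kind;
	return LABEL_NONE;
}
```

Since no spelling is a prefix of another, the order of the table entries does not matter, and a configured driver command could mix both forms.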

>> @@ -222,6 +222,12 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>
> Not part of this patch but I noticed that we're passing the filenames
> for '%A' etc. unquoted which is a bit scary.
>
>>   			strbuf_addf(&cmd, "%d", marker_size);
>>   		else if (skip_prefix(format, "P", &format))
>>   			sq_quote_buf(&cmd, path);
>> +		else if (skip_prefix(format, "S", &format))
>> +			sq_quote_buf(&cmd, orig_name);
>
> I think you can avoid the SIGSEGV problem you mentioned in your other
> email by changing this to
>
> 	sq_quote_buf(&cmd, orig_name ? orig_name : "");
>
> That would make sure the labels we pass match the ones used by the
> internal merge.

Makes sense.  That would be much better than using hardcoded string
"ours", "theirs", etc.

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 201bdf5edbd..86a0946bb9e 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1137,11 +1137,11 @@  The `merge.*.name` variable gives the driver a human-readable
 name.
 
 The `merge.*.driver` variable's value is used to construct a
-command to run to merge ancestor's version (`%O`), current
+command to run to merge the common ancestor's version (`%O`), current
 version (`%A`) and the other branches' version (`%B`).  These
 three tokens are replaced with the names of temporary files that
 hold the contents of these versions when the command line is
-built. Additionally, %L will be replaced with the conflict marker
+built. Additionally, `%L` will be replaced with the conflict marker
 size (see below).
 
 The merge driver is expected to leave the result of the merge in
@@ -1159,8 +1159,12 @@  When left unspecified, the driver itself is used for both
 internal merge and the final merge.
 
 The merge driver can learn the pathname in which the merged result
-will be stored via placeholder `%P`.
-
+will be stored via placeholder `%P`. Additionally, the names of the
+common ancestor revision (`%S`), of the current revision (`%X`) and
+of the other branch (`%Y`) can also be supplied. Those are short
+revision names, optionally joined with the paths of the file in each
+revision. Those paths are only present if they differ and are separated
+from the revision by a colon.
 
 `conflict-marker-size`
 ^^^^^^^^^^^^^^^^^^^^^^
diff --git a/merge-ll.c b/merge-ll.c
index 1df58ebaac0..13e0713fe82 100644
--- a/merge-ll.c
+++ b/merge-ll.c
@@ -185,9 +185,9 @@  static void create_temp(mmfile_t *src, char *path, size_t len)
 static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 			mmbuffer_t *result,
 			const char *path,
-			mmfile_t *orig, const char *orig_name UNUSED,
-			mmfile_t *src1, const char *name1 UNUSED,
-			mmfile_t *src2, const char *name2 UNUSED,
+			mmfile_t *orig, const char *orig_name,
+			mmfile_t *src1, const char *name1,
+			mmfile_t *src2, const char *name2,
 			const struct ll_merge_options *opts,
 			int marker_size)
 {
@@ -222,6 +222,12 @@  static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 			strbuf_addf(&cmd, "%d", marker_size);
 		else if (skip_prefix(format, "P", &format))
 			sq_quote_buf(&cmd, path);
+		else if (skip_prefix(format, "S", &format))
+			sq_quote_buf(&cmd, orig_name);
+		else if (skip_prefix(format, "X", &format))
+			sq_quote_buf(&cmd, name1);
+		else if (skip_prefix(format, "Y", &format))
+			sq_quote_buf(&cmd, name2);
 		else
 			strbuf_addch(&cmd, '%');
 	}
@@ -315,7 +321,12 @@  static int read_merge_config(const char *var, const char *value,
 		 *    %B - temporary file name for the other branches' version.
 		 *    %L - conflict marker length
 		 *    %P - the original path (safely quoted for the shell)
+		 *    %S - the revision for the merge base
+		 *    %X - the revision for our version
+		 *    %Y - the revision for their version
 		 *
		 * If the file is not named identically in all versions, then each
+		 * revision is joined with the corresponding path, separated by a colon.
 		 * The external merge driver should write the results in the
 		 * file named by %A, and signal that it has done with zero exit
 		 * status.
diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
index 72f8c1722ff..156a1efacfe 100755
--- a/t/t6406-merge-attr.sh
+++ b/t/t6406-merge-attr.sh
@@ -42,11 +42,15 @@  test_expect_success setup '
 	#!/bin/sh
 
 	orig="$1" ours="$2" theirs="$3" exit="$4" path=$5
+	orig_name="$6" our_name="$7" their_name="$8"
 	(
 		echo "orig is $orig"
 		echo "ours is $ours"
 		echo "theirs is $theirs"
 		echo "path is $path"
+		echo "orig_name is $orig_name"
+		echo "our_name is $our_name"
+		echo "their_name is $their_name"
 		echo "=== orig ==="
 		cat "$orig"
 		echo "=== ours ==="
@@ -121,7 +125,7 @@  test_expect_success 'custom merge backend' '
 
 	git reset --hard anchor &&
 	git config --replace-all \
-	merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+	merge.custom.driver "./custom-merge %O %A %B 0 %P %S %X %Y" &&
 	git config --replace-all \
 	merge.custom.name "custom merge driver for testing" &&
 
@@ -132,7 +136,8 @@  test_expect_success 'custom merge backend' '
 	o=$(git unpack-file main^:text) &&
 	a=$(git unpack-file side^:text) &&
 	b=$(git unpack-file main:text) &&
-	sh -c "./custom-merge $o $a $b 0 text" &&
+	base_revid=$(git rev-parse --short main^) &&
+	sh -c "./custom-merge $o $a $b 0 text $base_revid HEAD main" &&
 	sed -e 1,3d $a >check-2 &&
 	cmp check-1 check-2 &&
 	rm -f $o $a $b
@@ -142,7 +147,7 @@  test_expect_success 'custom merge backend' '
 
 	git reset --hard anchor &&
 	git config --replace-all \
-	merge.custom.driver "./custom-merge %O %A %B 1 %P" &&
+	merge.custom.driver "./custom-merge %O %A %B 1 %P %S %X %Y" &&
 	git config --replace-all \
 	merge.custom.name "custom merge driver for testing" &&
 
@@ -159,7 +164,8 @@  test_expect_success 'custom merge backend' '
 	o=$(git unpack-file main^:text) &&
 	a=$(git unpack-file anchor:text) &&
 	b=$(git unpack-file main:text) &&
-	sh -c "./custom-merge $o $a $b 0 text" &&
+	base_revid=$(git rev-parse --short main^) &&
+	sh -c "./custom-merge $o $a $b 0 text $base_revid HEAD main" &&
 	sed -e 1,3d $a >check-2 &&
 	cmp check-1 check-2 &&
 	sed -e 1,3d -e 4q $a >check-3 &&
@@ -173,7 +179,7 @@  test_expect_success !WINDOWS 'custom merge driver that is killed with a signal'
 
 	git reset --hard anchor &&
 	git config --replace-all \
-	merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+	merge.custom.driver "./custom-merge %O %A %B 0 %P %S %X %Y" &&
 	git config --replace-all \
 	merge.custom.name "custom merge driver for testing" &&