
merge-file: add --diff-algorithm option

Message ID pull.1606.git.git.1699480494355.gitgitgadget@gmail.com (mailing list archive)
State Superseded

Commit Message

Antonin Delpeuch Nov. 8, 2023, 9:54 p.m. UTC
From: Antonin Delpeuch <antonin@delpeuch.eu>

This makes it possible to use other diff algorithms than the 'myers'
default algorithm, when using the 'git merge-file' command.

Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
---
    merge-file: add --diff-algorithm option

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1606%2Fwetneb%2Fmerge_file_configurable_diff_algorithm-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1606/wetneb/merge_file_configurable_diff_algorithm-v1
Pull-Request: https://github.com/git/git/pull/1606

 Documentation/git-merge-file.txt |  5 +++++
 builtin/merge-file.c             | 28 ++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)


base-commit: 98009afd24e2304bf923a64750340423473809ff

Comments

Antonin Delpeuch Nov. 17, 2023, 9:42 p.m. UTC | #1
Hi all,

Here are a few more thoughts about this patch, to explain what led me to
need it. If this need is misguided, perhaps you could redirect me
to a better solution.

I am writing a custom merge driver for Java files. This merge driver
internally calls git-merge-file and then resolves the merge conflicts
that consist only of import statements (there might be cases where it
gets it wrong, but I can then use other tools to clean up those import
statements). When testing this, I noticed that the merge driver
performed worse on other sorts of conflicts than the standard "ort"
merge strategy. This is because "ort" uses the "histogram" diff
algorithm, which gives better results than the "myers" diff algorithm
that merge-file uses.

Intuitively, if "histogram" is the default diff algorithm used by "git 
merge", then it would also make sense to have the same default for "git 
merge-file", but I assume that changing this default could be considered 
a bad breaking change. So I thought that making this diff algorithm 
configurable would be an acceptable move, hence my patch.

Of course, the diffing could be configured in other ways, for instance 
with its handling of whitespace or EOL (similarly to what the "git-diff" 
command offers). I think those options would definitely be worth 
exposing in merge-file as well. If you think this makes sense, then I 
would be happy to work on a new version of this patch which would 
attempt to include all the relevant options. I could also try to add the 
corresponding tests.

But perhaps my need is misguided? Could it be that I should not be 
writing a custom merge driver, but instead use another extension point 
to only process the conflicting hunks after execution of the existing 
merge driver? I couldn't find such an extension point, but it may well
be that I missed it.

Thank you,

Antonin

Phillip Wood Nov. 19, 2023, 4:42 p.m. UTC | #2
Hi Antonin

On 08/11/2023 21:54, Antonin Delpeuch via GitGitGadget wrote:
> From: Antonin Delpeuch <antonin@delpeuch.eu>
> 
> This makes it possible to use other diff algorithms than the 'myers'
> default algorithm, when using the 'git merge-file' command.

I think being able to select the diff algorithm is reasonable. It might
be nice to mention the use of "git merge-file" in custom merge drivers
as a motivation in the commit message.

> Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
> ---
>      merge-file: add --diff-algorithm option
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1606%2Fwetneb%2Fmerge_file_configurable_diff_algorithm-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1606/wetneb/merge_file_configurable_diff_algorithm-v1
> Pull-Request: https://github.com/git/git/pull/1606
> 
>   Documentation/git-merge-file.txt |  5 +++++
>   builtin/merge-file.c             | 28 ++++++++++++++++++++++++++++
>   2 files changed, 33 insertions(+)
> 
> diff --git a/Documentation/git-merge-file.txt b/Documentation/git-merge-file.txt
> index 6a081eacb72..917535217c1 100644
> --- a/Documentation/git-merge-file.txt
> +++ b/Documentation/git-merge-file.txt
> @@ -92,6 +92,11 @@ object store and the object ID of its blob is written to standard output.
>   	Instead of leaving conflicts in the file, resolve conflicts
>   	favouring our (or their or both) side of the lines.
>   
> +--diff-algorithm <algorithm>::
> +	Use a different diff algorithm while merging, which can help
> +	avoid mismerges that occur due to unimportant matching lines
> +	(such as braces from distinct functions).  See also
> +	linkgit:git-diff[1] `--diff-algorithm`.

Perhaps we could list the available algorithms here so the user does not 
have to go searching for them in another man page.

>   EXAMPLES
>   --------
> diff --git a/builtin/merge-file.c b/builtin/merge-file.c
> index 832c93d8d54..1f987334a31 100644
> --- a/builtin/merge-file.c
> +++ b/builtin/merge-file.c
> @@ -1,5 +1,6 @@
>   #include "builtin.h"
>   #include "abspath.h"
> +#include "diff.h"
>   #include "hex.h"
>   #include "object-name.h"
>   #include "object-store.h"
> @@ -28,6 +29,30 @@ static int label_cb(const struct option *opt, const char *arg, int unset)
>   	return 0;
>   }
>   
> +static int set_diff_algorithm(xpparam_t *xpp,
> +			      const char *alg)
> +{
> +	long diff_algorithm = parse_algorithm_value(alg);
> +	if (diff_algorithm < 0)
> +		return -1;
> +	xpp->flags = (xpp->flags & ~XDF_DIFF_ALGORITHM_MASK) | diff_algorithm;
> +	return 0;
> +}
> +
> +static int diff_algorithm_cb(const struct option *opt,
> +				const char *arg, int unset)
> +{
> +	xpparam_t *xpp = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +
> +	if (set_diff_algorithm(xpp, arg))
> +		return error(_("option diff-algorithm accepts \"myers\", "
> +			       "\"minimal\", \"patience\" and \"histogram\""));
> +
> +	return 0;
> +}
> +
>   int cmd_merge_file(int argc, const char **argv, const char *prefix)
>   {
>   	const char *names[3] = { 0 };
> @@ -48,6 +73,9 @@ int cmd_merge_file(int argc, const char **argv, const char *prefix)
>   			    XDL_MERGE_FAVOR_THEIRS),
>   		OPT_SET_INT(0, "union", &xmp.favor, N_("for conflicts, use a union version"),
>   			    XDL_MERGE_FAVOR_UNION),
> +		OPT_CALLBACK_F(0, "diff-algorithm", &xmp.xpp, N_("<algorithm>"),
> +			     N_("choose a diff algorithm"),
> +			     PARSE_OPT_NONEG, diff_algorithm_cb),
>   		OPT_INTEGER(0, "marker-size", &xmp.marker_size,
>   			    N_("for conflicts, use this marker size")),
>   		OPT__QUIET(&quiet, N_("do not warn about conflicts")),

This patch looks sensible to me; it would be nice to have some tests, though.

Best Wishes

Phillip

> base-commit: 98009afd24e2304bf923a64750340423473809ff

Phillip Wood Nov. 19, 2023, 4:43 p.m. UTC | #3
Hi Antonin

On 17/11/2023 21:42, Antonin Delpeuch wrote:
> Hi all,
> 
> Here a few more thoughts about this patch, to explain what brought me to 
> needing that. If this need is misguided, perhaps you could redirect me 
> to a better solution.
> 
> I am writing a custom merge driver for Java files. This merge driver 
> internally calls git-merge-file and then solves the merge conflicts 
> which only consist of import statements (there might be cases where it 
> gets it wrong, but I can then use other tools to cleanup those import 
> statements). When testing this, I noticed that the merge driver 
> performed more poorly on other sorts of conflicts, compared to the 
> standard "ort" merge strategy. This is because "ort" uses the 
> "histogram" diff algorithm, which gives better results than the "myers" 
> diff algorithm that merge-file uses.

I cannot comment on this particular use but I think in general calling 
"git merge-file" from a custom merge driver is perfectly sensible. Have 
you tested your driver with this patch to see if you get better results 
with the histogram diff algorithm?

> Intuitively, if "histogram" is the default diff algorithm used by "git 
> merge", then it would also make sense to have the same default for "git 
> merge-file", but I assume that changing this default could be considered 
> a bad breaking change. So I thought that making this diff algorithm 
> configurable would be an acceptable move, hence my patch.

I can see there's an argument for changing the default algorithm of "git
merge-file" to match what "ort" uses. I know Elijah found the histogram
algorithm gave better results in his testing when he was developing
"ort". While it would be a breaking change, if on average the new
default gives better conflicts it might be worth it. This patch would
mean that someone wanting to use the "myers" algorithm could still do so.

> Of course, the diffing could be configured in other ways, for instance 
> with its handling of whitespace or EOL (similarly to what the "git-diff" 
> command offers). I think those options would definitely be worth 
> exposing in merge-file as well. If you think this makes sense, then I 
> would be happy to work on a new version of this patch which would 
> attempt to include all the relevant options. I could also try to add the 
> corresponding tests.

It would be nice to see some tests for this patch, ideally using a test 
case that gives different conflicts for "myers" and "histogram". We 
could add the other options later if there is a demand.

Best Wishes

Phillip

> But perhaps my need is misguided? Could it be that I should not be 
> writing a custom merge driver, but instead use another extension point 
> to only process the conflicting hunks after execution of the existing 
> merge driver? I couldn't find such an extension point, but it can well 
> be that I missed it.
> 
> Thank you,
> 
> Antonin
> 
>

Antonin Delpeuch Nov. 19, 2023, 7:29 p.m. UTC | #4
Hi Phillip,

Thank you so much for taking the time to review this!

On 19/11/2023 17:43, Phillip Wood wrote:
> I cannot comment on this particular use but I think in general calling 
> "git merge-file" from a custom merge driver is perfectly sensible. 
> Have you tested your driver with this patch to see if you get better 
> results with the histogram diff algorithm?

Yes, I can confirm that the results are indeed better in my use case.

> I can see there's an argument for changing the default algorithm of 
> "git merge-file" to match what "ort" uses. I know Elijah found the 
> histogram algorithm gave better results in his testing when he was 
> developing "ort". While it would be a breaking change if on the 
> average the new default gives better conflicts it might be worth it. 
> This patch would mean that someone wanting to use the "myers" 
> algorithm could still do so.

Agreed. I would be happy to submit a follow-up patch to change the 
default. Or would you prefer to have it in the same patch (as a separate 
commit)? I was worried this would make my patch less likely to get merged.

> It would be nice to see some tests for this patch, ideally using a 
> test case that gives different conflicts for "myers" and "histogram". 
> We could add the other options later if there is a demand.

Will do.

> Perhaps we could list the available algorithms here so the user does 
> not have to go searching for them in another man page.

This part is copied from "Documentation/merge-strategies.txt", which 
redirects to the manual for git-diff in the same way. I assume it was 
done so that whenever a new diff algorithm is introduced, it only needs 
documenting in one place. But I agree it is definitely more 
user-friendly to list the algorithms directly. Should I change the 
documentation of merge strategies in the same way?

Best wishes,

Antonin

Junio C Hamano Nov. 19, 2023, 11:30 p.m. UTC | #5
Phillip Wood <phillip.wood123@gmail.com> writes:

> I can see there's an argument for changing the default algorithm of
> "git merge-file" to match what "ort" uses. I know Elijah found the
> histogram algorithm gave better results in his testing when he was
> developing "ort". While it would be a breaking change if on the
> average the new default gives better conflicts it might be worth
> it. This patch would mean that someone wanting to use the "myers"
> algorithm could still do so.

Sounds like a sensible thing to do.  First allow the algorithm to be
chosen with a command line option (and optionally via a configuration
variable) and ship that in a release; then start giving a warning when
the calling script specifies neither the configuration nor the command
line option and so falls back on the current default, and ship that in
the next release; wait for a few releases and then finally flip the
default, or something like that.

Thanks.
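
The staged transition described above could be sketched roughly as
follows. This is only an illustration, not code from git: the names
sketch_phase, sketch_choice and sketch_pick_algorithm are hypothetical,
and the precedence (command line over configuration over built-in
default) is an assumption. No configuration variable is named in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical release phases of the default-algorithm transition. */
enum sketch_phase {
	PHASE_OPTION_ONLY, /* option exists, default is still "myers" */
	PHASE_WARN,        /* warn when the old default is used implicitly */
	PHASE_FLIPPED      /* default has become "histogram" */
};

struct sketch_choice {
	const char *algorithm; /* algorithm that will be used */
	int warn;              /* nonzero if a transition warning is due */
};

/*
 * Pick the algorithm for one invocation. "cli" and "config" are the
 * values given on the command line and in the configuration, or NULL
 * when absent. Assumed precedence: cli, then config, then the default.
 */
static struct sketch_choice sketch_pick_algorithm(enum sketch_phase phase,
						  const char *cli,
						  const char *config)
{
	struct sketch_choice c = { NULL, 0 };

	if (cli) {
		c.algorithm = cli;
	} else if (config) {
		c.algorithm = config;
	} else if (phase == PHASE_FLIPPED) {
		c.algorithm = "histogram";
	} else {
		/* Implicit use of the old default: warn only in PHASE_WARN. */
		c.algorithm = "myers";
		c.warn = (phase == PHASE_WARN);
	}

	assert(c.algorithm != NULL);
	return c;
}
```

With this shape, a script that passes "myers" explicitly never sees the
warning and keeps its behaviour after the default is flipped.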
Patch

diff --git a/Documentation/git-merge-file.txt b/Documentation/git-merge-file.txt
index 6a081eacb72..917535217c1 100644
--- a/Documentation/git-merge-file.txt
+++ b/Documentation/git-merge-file.txt
@@ -92,6 +92,11 @@  object store and the object ID of its blob is written to standard output.
 	Instead of leaving conflicts in the file, resolve conflicts
 	favouring our (or their or both) side of the lines.
 
+--diff-algorithm <algorithm>::
+	Use a different diff algorithm while merging, which can help
+	avoid mismerges that occur due to unimportant matching lines
+	(such as braces from distinct functions).  See also
+	linkgit:git-diff[1] `--diff-algorithm`.
 
 EXAMPLES
 --------
diff --git a/builtin/merge-file.c b/builtin/merge-file.c
index 832c93d8d54..1f987334a31 100644
--- a/builtin/merge-file.c
+++ b/builtin/merge-file.c
@@ -1,5 +1,6 @@ 
 #include "builtin.h"
 #include "abspath.h"
+#include "diff.h"
 #include "hex.h"
 #include "object-name.h"
 #include "object-store.h"
@@ -28,6 +29,30 @@  static int label_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int set_diff_algorithm(xpparam_t *xpp,
+			      const char *alg)
+{
+	long diff_algorithm = parse_algorithm_value(alg);
+	if (diff_algorithm < 0)
+		return -1;
+	xpp->flags = (xpp->flags & ~XDF_DIFF_ALGORITHM_MASK) | diff_algorithm;
+	return 0;
+}
+
+static int diff_algorithm_cb(const struct option *opt,
+				const char *arg, int unset)
+{
+	xpparam_t *xpp = opt->value;
+
+	BUG_ON_OPT_NEG(unset);
+
+	if (set_diff_algorithm(xpp, arg))
+		return error(_("option diff-algorithm accepts \"myers\", "
+			       "\"minimal\", \"patience\" and \"histogram\""));
+
+	return 0;
+}
+
 int cmd_merge_file(int argc, const char **argv, const char *prefix)
 {
 	const char *names[3] = { 0 };
@@ -48,6 +73,9 @@  int cmd_merge_file(int argc, const char **argv, const char *prefix)
 			    XDL_MERGE_FAVOR_THEIRS),
 		OPT_SET_INT(0, "union", &xmp.favor, N_("for conflicts, use a union version"),
 			    XDL_MERGE_FAVOR_UNION),
+		OPT_CALLBACK_F(0, "diff-algorithm", &xmp.xpp, N_("<algorithm>"),
+			     N_("choose a diff algorithm"),
+			     PARSE_OPT_NONEG, diff_algorithm_cb),
 		OPT_INTEGER(0, "marker-size", &xmp.marker_size,
 			    N_("for conflicts, use this marker size")),
 		OPT__QUIET(&quiet, N_("do not warn about conflicts")),
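
The flag update performed by set_diff_algorithm() in the patch can be
tried in isolation with the sketch below. It is a stand-alone
illustration under stated assumptions: the SKETCH_* values, the
sketch_xpparam struct and the sketch_* functions are hypothetical
stand-ins, not the definitions from git's xdiff headers or diff.c, and
the stand-in mask deliberately covers every algorithm bit, which is a
choice of this sketch rather than a statement about
XDF_DIFF_ALGORITHM_MASK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in flag values for illustration only; git's real ones may differ. */
#define SKETCH_MINIMAL        (1L << 0)
#define SKETCH_IGNORE_WS      (1L << 1) /* an unrelated, non-algorithm flag */
#define SKETCH_PATIENCE       (1L << 14)
#define SKETCH_HISTOGRAM      (1L << 15)
#define SKETCH_ALGORITHM_MASK \
	(SKETCH_MINIMAL | SKETCH_PATIENCE | SKETCH_HISTOGRAM)

/* Stand-in for xpparam_t: only the flags field matters here. */
struct sketch_xpparam {
	unsigned long flags;
};

/* Map an algorithm name to its flag bits, or return -1 if unknown. */
static long sketch_parse_algorithm(const char *name)
{
	if (!name)
		return -1;
	if (!strcmp(name, "myers"))
		return 0; /* the default algorithm sets no algorithm bit */
	if (!strcmp(name, "minimal"))
		return SKETCH_MINIMAL;
	if (!strcmp(name, "patience"))
		return SKETCH_PATIENCE;
	if (!strcmp(name, "histogram"))
		return SKETCH_HISTOGRAM;
	return -1;
}

/*
 * Clear whichever algorithm bits were set before, then set the new
 * ones, leaving unrelated flags untouched. Returns -1 and leaves the
 * flags unchanged when the name is not recognised.
 */
static int sketch_set_diff_algorithm(struct sketch_xpparam *xpp,
				     const char *name)
{
	long alg = sketch_parse_algorithm(name);

	assert(xpp != NULL);
	if (alg < 0)
		return -1;
	xpp->flags = (xpp->flags & ~SKETCH_ALGORITHM_MASK) | alg;
	return 0;
}
```

Clearing the mask before OR-ing in the new value is what lets a later
option override an earlier one without leaving two algorithm bits set
at once.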