
[2/3] merge-ort: allow rename detection to be disabled

Message ID 4292b22723f759c3e0f84ac1000992187a9c7f7c.1741362522.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series Small new merge-ort features, prepping for deletion of merge-recursive.[ch]

Commit Message

Elijah Newren March 7, 2025, 3:48 p.m. UTC
From: Elijah Newren <newren@gmail.com>

When merge-ort was written, I did not at first allow rename detection to
be disabled, because I suspected that most folks disabling rename
detection were doing so solely for performance reasons.  Since I put a
lot of work into providing dramatic speedups for rename detection
performance as used by the merge machinery, I wanted to know if there
were still real world repositories where rename detection was
problematic from a performance perspective.  We have had years now to
collect such information, and while we never received one, waiting
longer with the option disabled seems unlikely to help surface such
issues at this point.  Also, there has been at least one request to
allow rename detection to be disabled for behavioral rather than
performance reasons, so let's start heeding the config and command line
settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/merge-strategies.adoc | 12 ++++++------
 merge-ort.c                         |  5 +++++
 2 files changed, 11 insertions(+), 6 deletions(-)
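
The commit message says merge-ort will now "start heeding the config and command line settings." A minimal C sketch of the precedence this implies follows. It assumes, as the documentation hunk states, that `-X no-renames` (or `-X find-renames`) overrides `merge.renames`, and, per git-config(1), that `merge.renames` falls back to `diff.renames`, which in turn defaults to true. The function `effective_detect_renames()` and the `RENAMES_*` constants are hypothetical illustrations, not Git's actual option-handling code.

```c
/* Hypothetical constants. 0 meaning "disabled" mirrors Git's convention
 * for detect_renames; RENAMES_UNSET marks a level that was not set. */
#define RENAMES_UNSET (-1)
#define RENAMES_OFF   0
#define RENAMES_ON    1   /* cf. DIFF_DETECT_RENAME in diff.h */

/*
 * Resolve the effective setting from the most specific level down:
 * command line (-X no-renames / -X find-renames), then merge.renames,
 * then diff.renames, then the built-in default (rename detection on).
 */
static int effective_detect_renames(int cmdline, int merge_renames,
				    int diff_renames)
{
	if (cmdline != RENAMES_UNSET)
		return cmdline;
	if (merge_renames != RENAMES_UNSET)
		return merge_renames;
	if (diff_renames != RENAMES_UNSET)
		return diff_renames;
	return RENAMES_ON;
}
```

For example, `git merge -s ort -X no-renames topic` with `merge.renames=true` in the config would correspond to `effective_detect_renames(RENAMES_OFF, RENAMES_ON, RENAMES_UNSET)`, which yields `RENAMES_OFF`.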

Comments

Patrick Steinhardt March 12, 2025, 8:06 a.m. UTC | #1
On Fri, Mar 07, 2025 at 03:48:41PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> When merge-ort was written, I did not at first allow rename detection to
> be disabled, because I suspected that most folks disabling rename
> detection were doing so solely for performance reasons.  Since I put a
> lot of work into providing dramatic speedups for rename detection
> performance as used by the merge machinery, I wanted to know if there
> were still real world repositories where rename detection was
> problematic from a performance perspective.  We have had years now to
> collect such information, and while we never received one, waiting
> longer with the option disabled seems unlikely to help surface such
> issues at this point.  Also, there has been at least one request to
> allow rename detection to be disabled for behavioral rather than
> performance reasons, so let's start heeding the config and command line
> settings.

It might be nice to provide a link to that request for more context.

> diff --git a/Documentation/merge-strategies.adoc b/Documentation/merge-strategies.adoc
> index 93822ebc4e8..59f5ae36ccb 100644
> --- a/Documentation/merge-strategies.adoc
> +++ b/Documentation/merge-strategies.adoc
> @@ -82,6 +82,11 @@ find-renames[=<n>];;
>  rename-threshold=<n>;;
>  	Deprecated synonym for `find-renames=<n>`.
>  
> +no-renames;;
> +	Turn off rename detection. This overrides the `merge.renames`
> +	configuration variable.
> +	See also linkgit:git-diff[1] `--no-renames`.
> +
>  subtree[=<path>];;
>  	This option is a more advanced form of 'subtree' strategy, where
>  	the strategy makes a guess on how two trees must be shifted to
> @@ -107,7 +112,7 @@ For a path that is a submodule, the same caution as 'ort' applies to this
>  strategy.
>  +
>  The 'recursive' strategy takes the same options as 'ort'.  However,
> -there are three additional options that 'ort' ignores (not documented
> +there are two additional options that 'ort' ignores (not documented
>  above) that are potentially useful with the 'recursive' strategy:
>  
>  patience;;
> @@ -121,11 +126,6 @@ diff-algorithm=[patience|minimal|histogram|myers];;
>  	specifically uses `diff-algorithm=histogram`, while `recursive`
>  	defaults to the `diff.algorithm` config setting.
>  
> -no-renames;;
> -	Turn off rename detection. This overrides the `merge.renames`
> -	configuration variable.
> -	See also linkgit:git-diff[1] `--no-renames`.
> -
>  resolve::
>  	This can only resolve two heads (i.e. the current branch
>  	and another branch you pulled from) using a 3-way merge

Makes sense.

> diff --git a/merge-ort.c b/merge-ort.c
> index b4ff24403a1..a6960b6a1b4 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3448,6 +3448,11 @@ static int detect_and_process_renames(struct merge_options *opt)
>  
>  	if (!possible_renames(renames))
>  		goto cleanup;
> +	if (opt->detect_renames == 0) {
> +		renames->redo_after_renames = 0;
> +		renames->cached_pairs_valid_side = 0;
> +		goto cleanup;
> +	}
>  
>  	trace2_region_enter("merge", "regular renames", opt->repo);
>  	detection_run |= detect_regular_renames(opt, MERGE_SIDE1);

Do we want to add a test that demonstrates that the option works as
expected?

Patrick
Taylor Blau March 12, 2025, 8:02 p.m. UTC | #2
On Wed, Mar 12, 2025 at 09:06:35AM +0100, Patrick Steinhardt wrote:
> On Fri, Mar 07, 2025 at 03:48:41PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > When merge-ort was written, I did not at first allow rename detection to
> > be disabled, because I suspected that most folks disabling rename
> > detection were doing so solely for performance reasons.  Since I put a
> > lot of work into providing dramatic speedups for rename detection
> > performance as used by the merge machinery, I wanted to know if there
> > were still real world repositories where rename detection was
> > problematic from a performance perspective.  We have had years now to
> > collect such information, and while we never received one, waiting
> > longer with the option disabled seems unlikely to help surface such
> > issues at this point.  Also, there has been at least one request to
> > allow rename detection to be disabled for behavioral rather than
> > performance reasons, so let's start heeding the config and command line
> > settings.
>
> It might be nice to provide a link to that request for more context.

I don't know if a link exists; I suspect the request referred to here is
an email that Johannes Schindelin wrote to Elijah privately.

But I am almost certain that the behavior requested here is to disable
rename detection to match the behavior of GitHub's prior use of libgit2
to perform merges, where we also had rename detection disabled (for
reasons that are unclear to me, but Peff might know).

> > diff --git a/merge-ort.c b/merge-ort.c
> > index b4ff24403a1..a6960b6a1b4 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3448,6 +3448,11 @@ static int detect_and_process_renames(struct merge_options *opt)
> >
> >  	if (!possible_renames(renames))
> >  		goto cleanup;
> > +	if (opt->detect_renames == 0) {

    if (!opt->detect_renames)

?

> > +		renames->redo_after_renames = 0;
> > +		renames->cached_pairs_valid_side = 0;
> > +		goto cleanup;
> > +	}
> >
> >  	trace2_region_enter("merge", "regular renames", opt->repo);
> >  	detection_run |= detect_regular_renames(opt, MERGE_SIDE1);
>
> Do we want to add a test that demonstrates that the option works as
> expected?

Yeah, having a test here would be nice.

Thanks,
Taylor
Elijah Newren March 12, 2025, 9:40 p.m. UTC | #3
On Wed, Mar 12, 2025 at 1:02 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Mar 12, 2025 at 09:06:35AM +0100, Patrick Steinhardt wrote:
> > On Fri, Mar 07, 2025 at 03:48:41PM +0000, Elijah Newren via GitGitGadget wrote:
> > > From: Elijah Newren <newren@gmail.com>
> > >
> > > [...] Also, there has been at least one request to
> > > allow rename detection to be disabled for behavioral rather than
> > > performance reasons, so let's start heeding the config and command line
> > > settings.
> >
> > It might be nice to provide a link to that request for more context.

Will add.

> I don't know if a link exists; I suspect the request referred to here is
> an email that Johannes Schindelin wrote to Elijah privately.

It exists: https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/

...which wasn't Johannes' request.

> But I am almost certain that the behavior requested here is to disable
> rename detection to match the behavior of GitHub's prior use of libgit2
> to perform merges, where we also had rename detection disabled (for
> reasons that are unclear to me, but Peff might know).

No, if that were the sole reason, I'd say it probably only belongs in
our internal fork.  Disabling of rename detection within GitHub was a
temporary internal migration measure, not a desired end state -- at
least that's the way Johannes portrayed it to me.  I know that
"temporary" sometimes lasts longer than we want, but now that I've
become internal to GitHub, one of the things I want to do is add some
weight to that "temporary" modifier.

> > > diff --git a/merge-ort.c b/merge-ort.c
> > > index b4ff24403a1..a6960b6a1b4 100644
> > > --- a/merge-ort.c
> > > +++ b/merge-ort.c
> > > @@ -3448,6 +3448,11 @@ static int detect_and_process_renames(struct merge_options *opt)
> > >
> > >     if (!possible_renames(renames))
> > >             goto cleanup;
> > > +   if (opt->detect_renames == 0) {
>
>     if (!opt->detect_renames)
>
> ?

Yeah, I wanted an opt->detect_renames == DIFF_DETECT_NONE, but we
never defined that and only defined DIFF_DETECT_RENAME and
DIFF_DETECT_COPY.  I'll switch it over.

> > > +           renames->redo_after_renames = 0;
> > > +           renames->cached_pairs_valid_side = 0;
> > > +           goto cleanup;
> > > +   }
> > >
> > >     trace2_region_enter("merge", "regular renames", opt->repo);
> > >     detection_run |= detect_regular_renames(opt, MERGE_SIDE1);
> >
> > Do we want to add a test that demonstrates that the option works as
> > expected?
>
> Yeah, having a test here would be nice.

Will add.
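
Elijah notes above that only `DIFF_DETECT_RENAME` and `DIFF_DETECT_COPY` are defined, with no named "none" value, which is why the check becomes `!opt->detect_renames`. The sketch below illustrates that convention. It assumes the values 1 and 2 used in Git's `diff.h`; `parse_rename_setting()` is a hypothetical, simplified stand-in loosely modeled on what `git_config_rename()` in `diff.c` does, and `parse_bool_sketch()` is a hypothetical replacement for Git's boolean config parsing.

```c
#include <string.h>

/* Values assumed to match Git's diff.h. There is no DIFF_DETECT_NONE,
 * so "rename detection disabled" is simply 0. */
#define DIFF_DETECT_RENAME 1
#define DIFF_DETECT_COPY   2

/* Hypothetical, simplified boolean parser (lowercase words only).
 * Returns 1 for true, 0 for false, -1 if unrecognized. */
static int parse_bool_sketch(const char *value)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0" };
	size_t i;

	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
		if (!strcmp(value, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
		if (!strcmp(value, falsy[i]))
			return 0;
	return -1;
}

/* Map a merge.renames-style value to a detect_renames setting. This
 * sketch treats unrecognized values as false; Git reports an error. */
static int parse_rename_setting(const char *value)
{
	if (!value)	/* a bare boolean key without "=" means true */
		return DIFF_DETECT_RENAME;
	if (!strcmp(value, "copy") || !strcmp(value, "copies"))
		return DIFF_DETECT_COPY;
	return parse_bool_sketch(value) == 1 ? DIFF_DETECT_RENAME : 0;
}

/* The form suggested in review: any nonzero value means "detect". */
static int renames_disabled(int detect_renames)
{
	return !detect_renames;
}
```

Because every enabled setting is nonzero, `!opt->detect_renames` and `opt->detect_renames == 0` are equivalent here; the former is simply the more idiomatic spelling in Git's C code.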
Taylor Blau March 12, 2025, 9:50 p.m. UTC | #4
On Wed, Mar 12, 2025 at 02:40:35PM -0700, Elijah Newren wrote:
> > I don't know if a link exists; I suspect the request referred to here is
> > an email that Johannes Schindelin wrote to Elijah privately.
>
> It exists: https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/
>
> ...which wasn't Johannes' request.

Ah, thanks for the link!

> > But I am almost certain that the behavior requested here is to disable
> > rename detection to match the behavior of GitHub's prior use of libgit2
> > to perform merges, where we also had rename detection disabled (for
> > reasons that are unclear to me, but Peff might know).
>
> No, if that were the sole reason, I'd say it probably only belongs in
> our internal fork.  Disabling of rename detection within GitHub was a
> temporary internal migration measure, not a desired end state -- at
> least that's the way Johannes portrayed it to me.  I know that
> "temporary" sometimes lasts longer than we want, but now that I've
> become internal to GitHub, one of the things I want to do is add some
> weight to that "temporary" modifier.

:-).

Thanks,
Taylor
Jeff King March 13, 2025, 5:25 a.m. UTC | #5
On Wed, Mar 12, 2025 at 02:40:35PM -0700, Elijah Newren wrote:

> > But I am almost certain that the behavior requested here is to disable
> > rename detection to match the behavior of GitHub's prior use of libgit2
> > to perform merges, where we also had rename detection disabled (for
> > reasons that are unclear to me, but Peff might know).
> 
> No, if that were the sole reason, I'd say it probably only belongs in
> our internal fork.  Disabling of rename detection within GitHub was a
> temporary internal migration measure, not a desired end state -- at
> least that's the way Johannes portrayed it to me.  I know that
> "temporary" sometimes lasts longer than we want, but now that I've
> become internal to GitHub, one of the things I want to do is add some
> weight to that "temporary" modifier.

Yes, I think it was a series of hysterical raisins. The original PR
merge test at GitHub was done using a shell script around git-merge-file
(because git-merge insisted on a working tree). And naturally that did
not support renames. (I think I probably wrote that script, but it's so
long ago I could be wrong, and I don't have access to the repo anymore).

And then we switched from that to libgit2, after Ed Thomson implemented
merge support there (mostly for performance). And the decision was made
to disable renames there at first, to confirm that it otherwise
performed identically to the existing shell script (to confirm the
results, but also because it was unclear if rename detection for
automated merges would always produce what the user wanted, or have bad
corner cases). So it was mostly temporary, with the idea that somebody
would explore turning it on later. But I don't think that ever happened.
Those with access to the correct repositories can probably find the
arguments in the issue tracker. ;)

I don't think I was around for switching from libgit2 to merge-tree, but
I'd guess the same "only change one thing at a time" logic applied.

So yes, mostly temporary-but-never-revisited, with a dash of
conservatism.

I don't have any real opinion on what should happen in the future,
except that renames on GitHub are probably reasonable, and having an
option to disable renames for everyone is probably also a reasonable
feature. ;)

-Peff

Patch

diff --git a/Documentation/merge-strategies.adoc b/Documentation/merge-strategies.adoc
index 93822ebc4e8..59f5ae36ccb 100644
--- a/Documentation/merge-strategies.adoc
+++ b/Documentation/merge-strategies.adoc
@@ -82,6 +82,11 @@ find-renames[=<n>];;
 rename-threshold=<n>;;
 	Deprecated synonym for `find-renames=<n>`.
 
+no-renames;;
+	Turn off rename detection. This overrides the `merge.renames`
+	configuration variable.
+	See also linkgit:git-diff[1] `--no-renames`.
+
 subtree[=<path>];;
 	This option is a more advanced form of 'subtree' strategy, where
 	the strategy makes a guess on how two trees must be shifted to
@@ -107,7 +112,7 @@ For a path that is a submodule, the same caution as 'ort' applies to this
 strategy.
 +
 The 'recursive' strategy takes the same options as 'ort'.  However,
-there are three additional options that 'ort' ignores (not documented
+there are two additional options that 'ort' ignores (not documented
 above) that are potentially useful with the 'recursive' strategy:
 
 patience;;
@@ -121,11 +126,6 @@ diff-algorithm=[patience|minimal|histogram|myers];;
 	specifically uses `diff-algorithm=histogram`, while `recursive`
 	defaults to the `diff.algorithm` config setting.
 
-no-renames;;
-	Turn off rename detection. This overrides the `merge.renames`
-	configuration variable.
-	See also linkgit:git-diff[1] `--no-renames`.
-
 resolve::
 	This can only resolve two heads (i.e. the current branch
 	and another branch you pulled from) using a 3-way merge
diff --git a/merge-ort.c b/merge-ort.c
index b4ff24403a1..a6960b6a1b4 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3448,6 +3448,11 @@ static int detect_and_process_renames(struct merge_options *opt)
 
 	if (!possible_renames(renames))
 		goto cleanup;
+	if (opt->detect_renames == 0) {
+		renames->redo_after_renames = 0;
+		renames->cached_pairs_valid_side = 0;
+		goto cleanup;
+	}
 
 	trace2_region_enter("merge", "regular renames", opt->repo);
 	detection_run |= detect_regular_renames(opt, MERGE_SIDE1);
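
Both reviewers asked for a test showing that the option works. A real test would live in Git's shell-based test suite; short of that, the control flow of the new guard can be modeled in isolation. The structs below are hypothetical, trimmed-down stand-ins keeping only the fields this patch touches (`detect_renames`, `redo_after_renames`, `cached_pairs_valid_side`). The `possible` parameter replaces the call to `possible_renames()`, and a counter replaces the real `detect_regular_renames()` calls. The check uses the `!opt->detect_renames` form suggested in review.

```c
/* Hypothetical stand-ins for merge-ort's structures, keeping only the
 * fields this patch touches plus a counter for the sketch. */
struct sketch_merge_options {
	int detect_renames;		/* 0 = rename detection disabled */
};

struct sketch_rename_info {
	int redo_after_renames;		/* nonzero: redo the merge after renames */
	int cached_pairs_valid_side;	/* 0: no side's cached pairs reusable */
	int detection_passes;		/* counts stand-in detection calls */
};

/* Models the early exits at the top of detect_and_process_renames(). */
static void sketch_detect_and_process_renames(const struct sketch_merge_options *opt,
					      struct sketch_rename_info *renames,
					      int possible)
{
	if (!possible)			/* stands in for !possible_renames(renames) */
		goto cleanup;
	if (!opt->detect_renames) {
		/* Nothing is detected, so request no redo pass and mark no
		 * cached rename pairs as reusable. */
		renames->redo_after_renames = 0;
		renames->cached_pairs_valid_side = 0;
		goto cleanup;
	}

	/* Stand-in for detect_regular_renames() on MERGE_SIDE1 and MERGE_SIDE2. */
	renames->detection_passes += 2;

cleanup:
	return;
}
```

In this model, as in the patch, the `possible_renames()` exit comes first, so when there is nothing to detect the two flags are left as they were; only when detection would otherwise run does the new check clear them and skip it.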