diff mbox series

[v2,7/7] merge-ort: restart merge with cached renames to reduce process entry cost

Message ID 7133f0efa520b3d0cadb059151daa12484fdb003.1626204784.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Optimization batch 14: trivial directory resolution | expand

Commit Message

Elijah Newren July 13, 2021, 7:33 p.m. UTC
From: Elijah Newren <newren@gmail.com>

The merge algorithm mostly consists of the following three functions:
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames().  When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.

Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames.  It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.

Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.

However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
into
   collect_merge_info()
   detect_and_process_renames()
   <cache all the renames, and restart>
   collect_merge_info()
   detect_and_process_renames()
   process_entries()

This may seem odd and look like more work.  However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small.  While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up).  And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.

This restarting only makes sense if we can save recursing into enough
directories to make it worth our while.  Introduce a simple heuristic to
guide this.  Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air.  It could be that
tweaking this number to make it either higher or lower improves the
optimization.  (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases.  I was
certain it was higher than needed.  Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little.  If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)

Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      205.1  ms ±  3.8  ms   204.2  ms ±  3.0  ms
    mega-renames:      1.564 s ±  0.010 s     1.076 s ±  0.015 s
    just-one-mega:   479.5  ms ±  3.9  ms   364.1  ms ±  7.0  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                         | 111 ++++++++++++++++++++++++++--
 t/t6423-merge-rename-directories.sh |   2 +-
 2 files changed, 106 insertions(+), 7 deletions(-)

Comments

Derrick Stolee July 15, 2021, 3:09 p.m. UTC | #1
On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
...
> Often, I noticed that when the optimization does not apply, it is
> because there are a handful of relevant sources -- maybe even only one.
> It felt frustrating to need to recurse into potentially hundreds or even
> thousands of directories just for a single rename, but it was needed for
> correctness.
> 
> However, staring at this list of functions and noticing that
> process_entries() is the most expensive and knowing I could avoid it if
> I had cached renames suggested a simple idea: change
>    collect_merge_info()
>    detect_and_process_renames()
>    process_entries()
> into
>    collect_merge_info()
>    detect_and_process_renames()
>    <cache all the renames, and restart>
>    collect_merge_info()
>    detect_and_process_renames()
>    process_entries()
> 
> This may seem odd and look like more work.  However, note that although
> we run collect_merge_info() twice, the second time we get to employ
> trivial directory resolves, which makes it much faster, so the increased
> time in collect_merge_info() is small.  While we run
> detect_and_process_renames() again, all renames are cached so it's
> nearly a no-op (we don't call into diffcore_rename_extended() but we do
> have a little bit of data structure checking and fixing up).  And the
> big payoff comes from the fact that process_entries(), will be much
> faster due to having far fewer entries to process.

I enjoyed the story you tell here.

> +	if (path_count_after) {
> +		/*
> +		 * Not sure were the right cut-off is for the optimization
> +		 * to redo collect_merge_info after we've cached the
> +		 * regular renames is.  Basically, collect_merge_info(),
> +		 * detect_regular_renames(), and process_entries() are
> +		 * similar costs and all big tent poles.  Caching the
> +		 * result of detect_regular_renames() means that redoing
> +		 * that one function will cost us virtually 0 extra, so it
> +		 * depends on the other two functions, which are both O(N)
> +		 * cost in the number of paths.  Thus, it makes sense that
> +		 * if we can cut the number of paths in half, then redoing
> +		 * collect_merge_info() at half cost in order to get
> +		 * process_entries() at half cost should be about equal
> +		 * cost.  If we can cut by more than half, then we would
> +		 * win.  The fact that process_entries() is about 10%-20%
> +		 * more expensive than collect_merge_info() suggests we
> +		 * could make the factor be less than two.  The fact that
> +		 * even when we have renames cached, we still have to
> +		 * traverse down to the individual (relevant) renames,
> +		 * which suggests we should perhaps use a bigger factor.
> +		 *
> +		 * The exact number isn't critical, since the code will
> +		 * work even if we get the factor wrong -- it just might be
> +		 * slightly slower if we're a bit off.  For now, just error
> +		 * on the side of a bigger fudge.  For the linux kernel

super-nit: s/linux/Linux/

> +		 * testcases I was looking at with massive renames, the
> +		 * ratio came in around 50 to 250, which clearly would
> +		 * trigger this optimization and provided some *very* nice
> +		 * speedups.

This bit of your testing might be more appropriate for your commit
message. This discussion of a test made at a certain point in time
is more likely to go stale than the description of how this factor
does not change correctness, only performance.

> +		 */
> +		int wanted_factor = 3;

and perhaps make it 'const'?

> +
> +		/* We should only redo collect_merge_info one time */
> +		assert(renames->redo_after_renames == 0);
> +
> +		if (path_count_after / path_count_before > wanted_factor) {

With truncation from integer division, this condition is equivalent* to

	path_count_after >= 4 * path_count_before

Or, do you want to change this to a ">=" so that the factor of 3 seems
more accurate?

*I understand the intention of using division to avoid (unlikely)
overflow via multiplication. The truncation is causing some confusion.
  
> -test_expect_merge_algorithm failure failure '12f: Trivial directory resolve, caching, all kinds of fun' '
> +test_expect_merge_algorithm failure success '12f: Trivial directory resolve, caching, all kinds of fun' '
>  	test_setup_12f &&
>  	(
>  		cd 12f &&
> 

Oh, and a bonus test success! Excellent.

Thanks,
-Stolee
Elijah Newren July 15, 2021, 4:53 p.m. UTC | #2
On Thu, Jul 15, 2021 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> ...
> > Often, I noticed that when the optimization does not apply, it is
> > because there are a handful of relevant sources -- maybe even only one.
> > It felt frustrating to need to recurse into potentially hundreds or even
> > thousands of directories just for a single rename, but it was needed for
> > correctness.
> >
> > However, staring at this list of functions and noticing that
> > process_entries() is the most expensive and knowing I could avoid it if
> > I had cached renames suggested a simple idea: change
> >    collect_merge_info()
> >    detect_and_process_renames()
> >    process_entries()
> > into
> >    collect_merge_info()
> >    detect_and_process_renames()
> >    <cache all the renames, and restart>
> >    collect_merge_info()
> >    detect_and_process_renames()
> >    process_entries()
> >
> > This may seem odd and look like more work.  However, note that although
> > we run collect_merge_info() twice, the second time we get to employ
> > trivial directory resolves, which makes it much faster, so the increased
> > time in collect_merge_info() is small.  While we run
> > detect_and_process_renames() again, all renames are cached so it's
> > nearly a no-op (we don't call into diffcore_rename_extended() but we do
> > have a little bit of data structure checking and fixing up).  And the
> > big payoff comes from the fact that process_entries(), will be much
> > faster due to having far fewer entries to process.
>
> I enjoyed the story you tell here.

:-)

> > +     if (path_count_after) {
> > +             /*
> > +              * Not sure were the right cut-off is for the optimization
> > +              * to redo collect_merge_info after we've cached the
> > +              * regular renames is.  Basically, collect_merge_info(),
> > +              * detect_regular_renames(), and process_entries() are
> > +              * similar costs and all big tent poles.  Caching the
> > +              * result of detect_regular_renames() means that redoing
> > +              * that one function will cost us virtually 0 extra, so it
> > +              * depends on the other two functions, which are both O(N)
> > +              * cost in the number of paths.  Thus, it makes sense that
> > +              * if we can cut the number of paths in half, then redoing
> > +              * collect_merge_info() at half cost in order to get
> > +              * process_entries() at half cost should be about equal
> > +              * cost.  If we can cut by more than half, then we would
> > +              * win.  The fact that process_entries() is about 10%-20%
> > +              * more expensive than collect_merge_info() suggests we
> > +              * could make the factor be less than two.  The fact that
> > +              * even when we have renames cached, we still have to
> > +              * traverse down to the individual (relevant) renames,
> > +              * which suggests we should perhaps use a bigger factor.
> > +              *
> > +              * The exact number isn't critical, since the code will
> > +              * work even if we get the factor wrong -- it just might be
> > +              * slightly slower if we're a bit off.  For now, just error
> > +              * on the side of a bigger fudge.  For the linux kernel
>
> super-nit: s/linux/Linux/

Yeah, I tend to refer to projects by the name of their repository
instead of their proper name.  (I do it with git too.)  Bad habit.
Will fix.  That is, I will fix this instance.  Not sure I can fix the
habit.

> > +              * testcases I was looking at with massive renames, the
> > +              * ratio came in around 50 to 250, which clearly would
> > +              * trigger this optimization and provided some *very* nice
> > +              * speedups.
>
> This bit of your testing might be more appropriate for your commit
> message. This discussion of a test made at a certain point in time
> is more likely to go stale than the description of how this factor
> does not change correctness, only performance.

The commit message does include discussion on how this factor only
changes performance, not correctness.  I left this comment in the code
because I figured it looked weird and magic and deserved an
explanation without resorting to git-log or git-blame.  Granted, I
wrote this comment and the commit message at much different times (I
wrote the comment first, then the commit message many months later)
and thus summarized a bit differently.  But I thought I had the same
relevant content in both places.

Are there pieces you feel are missing from the commit message?  Should
I trim this comment down in the code and just let people look for the
commit message for more details?

> > +              */
> > +             int wanted_factor = 3;
>
> and perhaps make it 'const'?

Sure, will do.

> > +
> > +             /* We should only redo collect_merge_info one time */
> > +             assert(renames->redo_after_renames == 0);
> > +
> > +             if (path_count_after / path_count_before > wanted_factor) {
>
> With truncation from integer division, this condition is equivalent* to
>
>         path_count_after >= 4 * path_count_before
>
> Or, do you want to change this to a ">=" so that the factor of 3 seems
> more accurate?
>
> *I understand the intention of using division to avoid (unlikely)
> overflow via multiplication. The truncation is causing some confusion.

Good catch; I'll fix it up to use ">=".

> > -test_expect_merge_algorithm failure failure '12f: Trivial directory resolve, caching, all kinds of fun' '
> > +test_expect_merge_algorithm failure success '12f: Trivial directory resolve, caching, all kinds of fun' '
> >       test_setup_12f &&
> >       (
> >               cd 12f &&
> >
>
> Oh, and a bonus test success! Excellent.

Yeah, this testcase was slightly weird in the order I sent it
upstream.  12f was written specifically with this optimization in mind
as a way of ensuring code coverage of the restart logic.  I would have
waited to submit the 12f testcase with this series, but the testcase
also demonstrated an important directory rename detection bug that I
found existed in both merge-recursive and merge-ort at the time.  See
commit 902c521a35 ("t6423: more involved directory rename test",
2020-10-15)

The merge-ort bug was fixed with commit 203c872c4f ("merge-ort: fix a
directory rename detection bug", 2021-01-19).  The merge-recursive bug
still exists.

So, this series just fixed up the final thing that 12f was testing for
-- namely that it included two collect_merge_info() region_enter
trace2 calls per commit instead of just one.

Perhaps I could have split the test, but both things require a
relatively big set of files which makes the test a bit more expensive
so I didn't want to duplicate it.  Besides, having both factors
involved makes it a better stress test of the restart logic.
Derrick Stolee July 15, 2021, 5:19 p.m. UTC | #3
On 7/15/2021 12:53 PM, Elijah Newren wrote:
> On Thu, Jul 15, 2021 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
...
>>> +              * The exact number isn't critical, since the code will
>>> +              * work even if we get the factor wrong -- it just might be
>>> +              * slightly slower if we're a bit off.  For now, just error
>>> +              * on the side of a bigger fudge.  For the linux kernel
>>
>> super-nit: s/linux/Linux/
> 
> Yeah, I tend to refer to projects by the name of their repository
> instead of their proper name.  (I do it with git too.)  Bad habit.
> Will fix.  That is, I will fix this instance.  Not sure I can fix the
> habit.

If you had written it as torvalds/linux, then I wouldn't have batted
an eye, because that is clearly a repo name (at least, clear to me).

>>> +              * testcases I was looking at with massive renames, the
>>> +              * ratio came in around 50 to 250, which clearly would
>>> +              * trigger this optimization and provided some *very* nice
>>> +              * speedups.
>>
>> This bit of your testing might be more appropriate for your commit
>> message. This discussion of a test made at a certain point in time
>> is more likely to go stale than the description of how this factor
>> does not change correctness, only performance.
> 
> The commit message does include discussion on how this factor only
> changes performance, not correctness.  I left this comment in the code
> because I figured it looked weird and magic and deserved an
> explanation without resorting to git-log or git-blame.  Granted, I
> wrote this comment and the commit message at much different times (I
> wrote the comment first, then the commit message many months later)
> and thus summarized a bit differently.  But I thought I had the same
> relevant content in both places.
> 
> Are there pieces you feel are missing from the commit message?  Should
> I trim this comment down in the code and just let people look for the
> commit message for more details?

I meant to say "these kinds of details are better for the commit
message instead of comments" and not say "your commit message
doesn't have enough."

I don't feel strongly about this being present in the comment or
not, but it seems like something that could be dropped from the
comment without loss of information.

Thanks,
-Stolee
Elijah Newren July 15, 2021, 5:32 p.m. UTC | #4
On Thu, Jul 15, 2021 at 10:19 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/15/2021 12:53 PM, Elijah Newren wrote:
> > On Thu, Jul 15, 2021 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
> >>> From: Elijah Newren <newren@gmail.com>
> ...
> >>> +              * The exact number isn't critical, since the code will
> >>> +              * work even if we get the factor wrong -- it just might be
> >>> +              * slightly slower if we're a bit off.  For now, just error
> >>> +              * on the side of a bigger fudge.  For the linux kernel
> >>
> >> super-nit: s/linux/Linux/
> >
> > Yeah, I tend to refer to projects by the name of their repository
> > instead of their proper name.  (I do it with git too.)  Bad habit.
> > Will fix.  That is, I will fix this instance.  Not sure I can fix the
> > habit.
>
> If you had written it as torvalds/linux, then I wouldn't have batted
> an eye, because that is clearly a repo name (at least, clear to me).

Yeah, the thing is I use the repo name even in cases where it's not
ambiguous whether the project or the repo is meant.  For example, I
would often write something like "I'm part of the git project."
Christian has been trying to fix my capitalization.  Folks in another
project tried to "fix" my capitalization too.  Perhaps by demanding
five letters of all caps, they were able to get one out of me[1].  I
suspect, though, that if they had just a single repo instead of having
hundreds of repositories, they might not have even gotten that one
letter of capitalization from me.  ;-)

[1] https://blogs.gnome.org/newren/2006/04/22/enlightening-mankind-about-the-correct-spelling-of-gnome/

> >>> +              * testcases I was looking at with massive renames, the
> >>> +              * ratio came in around 50 to 250, which clearly would
> >>> +              * trigger this optimization and provided some *very* nice
> >>> +              * speedups.
> >>
> >> This bit of your testing might be more appropriate for your commit
> >> message. This discussion of a test made at a certain point in time
> >> is more likely to go stale than the description of how this factor
> >> does not change correctness, only performance.
> >
> > The commit message does include discussion on how this factor only
> > changes performance, not correctness.  I left this comment in the code
> > because I figured it looked weird and magic and deserved an
> > explanation without resorting to git-log or git-blame.  Granted, I
> > wrote this comment and the commit message at much different times (I
> > wrote the comment first, then the commit message many months later)
> > and thus summarized a bit differently.  But I thought I had the same
> > relevant content in both places.
> >
> > Are there pieces you feel are missing from the commit message?  Should
> > I trim this comment down in the code and just let people look for the
> > commit message for more details?
>
> I meant to say "these kinds of details are better for the commit
> message instead of comments" and not say "your commit message
> doesn't have enough."
>
> I don't feel strongly about this being present in the comment or
> not, but it seems like something that could be dropped from the
> comment without loss of information.

Ah, makes sense.  I'll trim it down.
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index c9cf7a158c8..99af64fce7d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -209,6 +209,7 @@  struct rename_info {
 	 *   MERGE_SIDE2: cached data from side2 can be reused
 	 *   MERGE_SIDE1: cached data from side1 can be reused
 	 *   0:           no cached data can be reused
+	 *   -1:          See redo_after_renames; both sides can be reused.
 	 */
 	int cached_pairs_valid_side;
 
@@ -254,6 +255,28 @@  struct rename_info {
 	 */
 	struct strset cached_irrelevant[3];
 
+	/*
+	 * redo_after_renames: optimization flag for "restarting" the merge
+	 *
+	 * Sometimes it pays to detect renames, cache them, and then
+	 * restart the merge operation from the beginning.  The reason for
+	 * this is that when we know where all the renames are, we know
+	 * whether a certain directory has any paths under it affected --
+	 * and if a directory is not affected then it permits us to do
+	 * trivial tree merging in more cases.  Doing trivial tree merging
+	 * prevents the need to run process_entry() on every path
+	 * underneath trees that can be trivially merged, and
+	 * process_entry() is more expensive than collect_merge_info() --
+	 * plus, the second collect_merge_info() will be much faster since
+	 * it doesn't have to recurse into the relevant trees.
+	 *
+	 * Values for this flag:
+	 *   0 = don't bother, not worth it (or conditions not yet checked)
+	 *   1 = conditions for optimization met, optimization worthwhile
+	 *   2 = we already did it (don't restart merge yet again)
+	 */
+	unsigned redo_after_renames;
+
 	/*
 	 * needed_limit: value needed for inexact rename detection to run
 	 *
@@ -540,7 +563,8 @@  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		renames->trivial_merges_okay[i] = 1; /* 1 == maybe */
 		if (!reinitialize)
 			assert(renames->cached_pairs_valid_side == 0);
-		if (i != renames->cached_pairs_valid_side) {
+		if (i != renames->cached_pairs_valid_side &&
+		    -1 != renames->cached_pairs_valid_side) {
 			strset_func(&renames->cached_target_names[i]);
 			strmap_func(&renames->cached_pairs[i], 1);
 			strset_func(&renames->cached_irrelevant[i]);
@@ -1242,7 +1266,9 @@  static int handle_deferred_entries(struct merge_options *opt,
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
 	int side, ret = 0;
+	int path_count_before, path_count_after = 0;
 
+	path_count_before = strmap_get_size(&opt->priv->paths);
 	for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) {
 		unsigned optimization_okay = 1;
 		struct strintmap copy;
@@ -1377,7 +1403,51 @@  static int handle_deferred_entries(struct merge_options *opt,
 						path));
 			resolve_trivial_directory_merge(ci, side);
 		}
+		if (!optimization_okay || path_count_after)
+			path_count_after = strmap_get_size(&opt->priv->paths);
 	}
+	if (path_count_after) {
+		/*
+		 * Not sure were the right cut-off is for the optimization
+		 * to redo collect_merge_info after we've cached the
+		 * regular renames is.  Basically, collect_merge_info(),
+		 * detect_regular_renames(), and process_entries() are
+		 * similar costs and all big tent poles.  Caching the
+		 * result of detect_regular_renames() means that redoing
+		 * that one function will cost us virtually 0 extra, so it
+		 * depends on the other two functions, which are both O(N)
+		 * cost in the number of paths.  Thus, it makes sense that
+		 * if we can cut the number of paths in half, then redoing
+		 * collect_merge_info() at half cost in order to get
+		 * process_entries() at half cost should be about equal
+		 * cost.  If we can cut by more than half, then we would
+		 * win.  The fact that process_entries() is about 10%-20%
+		 * more expensive than collect_merge_info() suggests we
+		 * could make the factor be less than two.  The fact that
+		 * even when we have renames cached, we still have to
+		 * traverse down to the individual (relevant) renames,
+		 * which suggests we should perhaps use a bigger factor.
+		 *
+		 * The exact number isn't critical, since the code will
+		 * work even if we get the factor wrong -- it just might be
+		 * slightly slower if we're a bit off.  For now, just error
+		 * on the side of a bigger fudge.  For the linux kernel
+		 * testcases I was looking at with massive renames, the
+		 * ratio came in around 50 to 250, which clearly would
+		 * trigger this optimization and provided some *very* nice
+		 * speedups.
+		 */
+		int wanted_factor = 3;
+
+		/* We should only redo collect_merge_info one time */
+		assert(renames->redo_after_renames == 0);
+
+		if (path_count_after / path_count_before > wanted_factor) {
+			renames->redo_after_renames = 1;
+			renames->cached_pairs_valid_side = -1;
+		}
+	} else if (renames->redo_after_renames == 2)
+		renames->redo_after_renames = 0;
 	return ret;
 }
 
@@ -2820,8 +2890,8 @@  static int compare_pairs(const void *a_, const void *b_)
 }
 
 /* Call diffcore_rename() to update deleted/added pairs into rename pairs */
-static void detect_regular_renames(struct merge_options *opt,
-				   unsigned side_index)
+static int detect_regular_renames(struct merge_options *opt,
+				  unsigned side_index)
 {
 	struct diff_options diff_opts;
 	struct rename_info *renames = &opt->priv->renames;
@@ -2834,7 +2904,7 @@  static void detect_regular_renames(struct merge_options *opt,
 		 * side had directory renames.
 		 */
 		resolve_diffpair_statuses(&renames->pairs[side_index]);
-		return;
+		return 0;
 	}
 
 	partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]);
@@ -2860,6 +2930,8 @@  static void detect_regular_renames(struct merge_options *opt,
 	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 	resolve_diffpair_statuses(&diff_queued_diff);
 
+	if (diff_opts.needed_rename_limit > 0)
+		renames->redo_after_renames = 0;
 	if (diff_opts.needed_rename_limit > renames->needed_limit)
 		renames->needed_limit = diff_opts.needed_rename_limit;
 
@@ -2869,6 +2941,8 @@  static void detect_regular_renames(struct merge_options *opt,
 	diff_queued_diff.nr = 0;
 	diff_queued_diff.queue = NULL;
 	diff_flush(&diff_opts);
+
+	return 1;
 }
 
 /*
@@ -2958,14 +3032,32 @@  static int detect_and_process_renames(struct merge_options *opt,
 	struct diff_queue_struct combined;
 	struct rename_info *renames = &opt->priv->renames;
 	int need_dir_renames, s, clean = 1;
+	unsigned detection_run = 0;
 
 	memset(&combined, 0, sizeof(combined));
 	if (!possible_renames(renames))
 		goto cleanup;
 
 	trace2_region_enter("merge", "regular renames", opt->repo);
-	detect_regular_renames(opt, MERGE_SIDE1);
-	detect_regular_renames(opt, MERGE_SIDE2);
+	detection_run |= detect_regular_renames(opt, MERGE_SIDE1);
+	detection_run |= detect_regular_renames(opt, MERGE_SIDE2);
+	if (renames->redo_after_renames && detection_run) {
+		int i, side;
+		struct diff_filepair *p;
+
+		/* Cache the renames, we found */
+		for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) {
+			for (i = 0; i < renames->pairs[side].nr; ++i) {
+				p = renames->pairs[side].queue[i];
+				possibly_cache_new_pair(renames, p, side, NULL);
+			}
+		}
+
+		/* Restart the merge with the cached renames */
+		renames->redo_after_renames = 2;
+		trace2_region_leave("merge", "regular renames", opt->repo);
+		goto cleanup;
+	}
 	use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]);
 	use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]);
 	trace2_region_leave("merge", "regular renames", opt->repo);
@@ -4380,6 +4472,7 @@  static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 					       opt->subtree_shift);
 	}
 
+redo:
 	trace2_region_enter("merge", "collect_merge_info", opt->repo);
 	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
 		/*
@@ -4399,6 +4492,12 @@  static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
 	trace2_region_leave("merge", "renames", opt->repo);
+	if (opt->priv->renames.redo_after_renames == 2) {
+		trace2_region_enter("merge", "reset_maps", opt->repo);
+		clear_or_reinit_internal_opts(opt->priv, 1);
+		trace2_region_leave("merge", "reset_maps", opt->repo);
+		goto redo;
+	}
 
 	trace2_region_enter("merge", "process_entries", opt->repo);
 	process_entries(opt, &working_tree_oid);
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index e834b7e6efe..d8919d276a1 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -4797,7 +4797,7 @@  test_setup_12f () {
 	)
 }
 
-test_expect_merge_algorithm failure failure '12f: Trivial directory resolve, caching, all kinds of fun' '
+test_expect_merge_algorithm failure success '12f: Trivial directory resolve, caching, all kinds of fun' '
 	test_setup_12f &&
 	(
 		cd 12f &&