diff mbox series

[v2,5/7] merge-ort: defer recursing into directories when merge base is matched

Message ID 317553eadb68ce162b5ebea24045a9ca75342e91.1626204784.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Optimization batch 14: trivial directory resolution | expand

Commit Message

Elijah Newren July 13, 2021, 7:33 p.m. UTC
From: Elijah Newren <newren@gmail.com>

When one side of history matches the merge base (including when the
merge base has no entry for the given directory), have
collect_merge_info_callback() defer recursing into the directory.  To
ensure those entries are eventually handled, add a call to
handled_deferred_entries() in collect_merge_info() after
traverse_trees() returns.

Note that the condition in collect_merge_info_callback() may look more
complicated than necessary at first glance;
renames->trivial_merges_okay[side] is always true until
handle_deferred_entries() is called, and possible_trivial_merges[side]
is always empty right now (and in the future won't be filled until
handle_deferred_entries() is called).  However, when
handle_deferred_entries() calls traverse_trees() for the relevant
deferred directories, those traverse_trees() calls will once again end
up in collect_merge_info_callback() for all the entries under those
subdirectories.  The extra conditions are there for such deferred cases
and will be used more as we do more with those variables.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

Comments

Derrick Stolee July 15, 2021, 2:43 p.m. UTC | #1
On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
...
> +		/*
> +		 * Check for whether we can avoid recursing due to one side
> +		 * matching the merge base.  The side that does NOT match is
> +		 * the one that might have a rename destination we need.
> +		 */
> +		assert(!side1_matches_mbase || !side2_matches_mbase);
> +		side = side1_matches_mbase ? MERGE_SIDE2 :
> +			side2_matches_mbase ? MERGE_SIDE1 : MERGE_BASE;
> +		if (filemask == 0 && (dirmask == 2 || dirmask == 4)) {
> +			/*
> +			 * Also defer recursing into new directories; set up a
> +			 * few variables to let us do so.
> +			 */
> +			ci->match_mask = (7 - dirmask);
> +			side = dirmask / 2;

Clever.

> +		}
> +		if (renames->dir_rename_mask != 0x07 &&
> +		    (side != MERGE_BASE) &&

nit: these parens are not necessary?

> +		    renames->trivial_merges_okay[side] &&
> +		    !strset_contains(&renames->target_dirs[side], pi.string)) {

Does this last condition mean "this directory is not already a parent of a
chosen rename target"?

> +			strintmap_set(&renames->possible_trivial_merges[side],
> +				      pi.string, renames->dir_rename_mask);
> +			renames->dir_rename_mask = prev_dir_rename_mask;
> +			return mask;
> +		}
> +
> +		/* We need to recurse */
>  		ci->match_mask &= filemask;
Thanks,
-Stolee
Elijah Newren July 15, 2021, 4:03 p.m. UTC | #2
On Thu, Jul 15, 2021 at 7:43 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> ...
> > +             /*
> > +              * Check for whether we can avoid recursing due to one side
> > +              * matching the merge base.  The side that does NOT match is
> > +              * the one that might have a rename destination we need.
> > +              */
> > +             assert(!side1_matches_mbase || !side2_matches_mbase);
> > +             side = side1_matches_mbase ? MERGE_SIDE2 :
> > +                     side2_matches_mbase ? MERGE_SIDE1 : MERGE_BASE;
> > +             if (filemask == 0 && (dirmask == 2 || dirmask == 4)) {
> > +                     /*
> > +                      * Also defer recursing into new directories; set up a
> > +                      * few variables to let us do so.
> > +                      */
> > +                     ci->match_mask = (7 - dirmask);
> > +                     side = dirmask / 2;
>
> Clever.
>
> > +             }
> > +             if (renames->dir_rename_mask != 0x07 &&
> > +                 (side != MERGE_BASE) &&
>
> nit: these parens are not necessary?

Indeed; I'll remove them.

> > +                 renames->trivial_merges_okay[side] &&
> > +                 !strset_contains(&renames->target_dirs[side], pi.string)) {
>
> Does this last condition mean "this directory is not already a parent of a
> chosen rename target"?

I'm not sure what you mean by "chosen" here.  If by "chosen" you mean
"cached" (which comes from ort-perf-batch-11's caching of upstream
renames from previously cherry-picked commits), then yes.

Perhaps it's worth noting that rename detection has not yet been run
for this merge by the time we hit this logic, so the only renames we
can know are the cached ones from a previous pick.

> > +                     strintmap_set(&renames->possible_trivial_merges[side],
> > +                                   pi.string, renames->dir_rename_mask);
> > +                     renames->dir_rename_mask = prev_dir_rename_mask;
> > +                     return mask;
> > +             }
> > +
> > +             /* We need to recurse */
> >               ci->match_mask &= filemask;
> Thanks,
> -Stolee
Derrick Stolee July 15, 2021, 5:14 p.m. UTC | #3
On 7/15/2021 12:03 PM, Elijah Newren wrote:
> On Thu, Jul 15, 2021 at 7:43 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 7/13/2021 3:33 PM, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
...
>>> +                 renames->trivial_merges_okay[side] &&
>>> +                 !strset_contains(&renames->target_dirs[side], pi.string)) {
>>
>> Does this last condition mean "this directory is not already a parent of a
>> chosen rename target"?
> 
> I'm not sure what you mean by "chosen" here.  If by "chosen" you mean
> "cached" (which comes from ort-perf-batch-11's caching of upstream
> renames from previously cherry-picked commits), then yes.
> 
> Perhaps it's worth noting that rename detection has not yet been run
> for this merge by the time we hit this logic, so the only renames we
> can know are the cached ones from a previous pick.

I was missing the interaction with the cached results, which now makes
much more sense here.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index eb0e18d7546..bf0712d18a0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1141,8 +1141,35 @@  static int collect_merge_info_callback(int n,
 		struct tree_desc t[3];
 		void *buf[3] = {NULL, NULL, NULL};
 		const char *original_dir_name;
-		int i, ret;
+		int i, ret, side;
 
+		/*
+		 * Check for whether we can avoid recursing due to one side
+		 * matching the merge base.  The side that does NOT match is
+		 * the one that might have a rename destination we need.
+		 */
+		assert(!side1_matches_mbase || !side2_matches_mbase);
+		side = side1_matches_mbase ? MERGE_SIDE2 :
+			side2_matches_mbase ? MERGE_SIDE1 : MERGE_BASE;
+		if (filemask == 0 && (dirmask == 2 || dirmask == 4)) {
+			/*
+			 * Also defer recursing into new directories; set up a
+			 * few variables to let us do so.
+			 */
+			ci->match_mask = (7 - dirmask);
+			side = dirmask / 2;
+		}
+		if (renames->dir_rename_mask != 0x07 &&
+		    (side != MERGE_BASE) &&
+		    renames->trivial_merges_okay[side] &&
+		    !strset_contains(&renames->target_dirs[side], pi.string)) {
+			strintmap_set(&renames->possible_trivial_merges[side],
+				      pi.string, renames->dir_rename_mask);
+			renames->dir_rename_mask = prev_dir_rename_mask;
+			return mask;
+		}
+
+		/* We need to recurse */
 		ci->match_mask &= filemask;
 		newinfo = *info;
 		newinfo.prev = info;
@@ -1196,7 +1223,6 @@  static int collect_merge_info_callback(int n,
 	return mask;
 }
 
-MAYBE_UNUSED
 static int handle_deferred_entries(struct merge_options *opt,
 				   struct traverse_info *info)
 {
@@ -1285,6 +1311,8 @@  static int collect_merge_info(struct merge_options *opt,
 
 	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
+	if (ret == 0)
+		ret = handle_deferred_entries(opt, &info);
 	trace2_region_leave("merge", "traverse_trees", opt->repo);
 
 	return ret;