Message ID | 317bcc7f56cb718a8be625838576f33ce788c3ef.1623796907.git.gitgitgadget@gmail.com |
---|---|
State | New, archived |
Series | Optimization batch 13: partial clone optimizations for merge-ort |
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > + /* Ignore clean entries */ > + if (ci->merged.clean) > + continue; > + > + /* Ignore entries that don't need a content merge */ > + if (ci->match_mask || ci->filemask < 6 || > + !S_ISREG(ci->stages[1].mode) || > + !S_ISREG(ci->stages[2].mode) || > + oideq(&ci->stages[1].oid, &ci->stages[2].oid)) > + continue; > + > + /* Also don't need content merge if base matches either side */ > + if (ci->filemask == 7 && > + S_ISREG(ci->stages[0].mode) && > + (oideq(&ci->stages[0].oid, &ci->stages[1].oid) || > + oideq(&ci->stages[0].oid, &ci->stages[2].oid))) > + continue; Even though this is unlikely to change, it is unsatisfactory that we reproduce the knowledge on the situations when a merge will trivially resolve and when it will need to go content level. One obvious way to solve it would be to fold this logic into the main code that actually merges a list of "ci"s by making it a two pass process (the first pass does essentially the same as this new function, the second pass does the tree-level merge where the above says "continue", fills mmfiles with the loop below, and calls into ll_merge() after the loop to merge), but the logic duplication is not too big and it may not be worth such a code churn. > + for (i = 0; i < 3; i++) { > + unsigned side_mask = (1 << i); > + struct version_info *vi = &ci->stages[i]; > + > + if ((ci->filemask & side_mask) && > + S_ISREG(vi->mode) && > + oid_object_info_extended(opt->repo, &vi->oid, NULL, > + OBJECT_INFO_FOR_PREFETCH)) > + oid_array_append(&to_fetch, &vi->oid); > + } > + } > + > + promisor_remote_get_direct(opt->repo, to_fetch.oid, to_fetch.nr); > + oid_array_clear(&to_fetch); > +} > +
On Wed, Jun 16, 2021 at 10:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +		/* Ignore clean entries */
> > +		if (ci->merged.clean)
> > +			continue;
> > +
> > +		/* Ignore entries that don't need a content merge */
> > +		if (ci->match_mask || ci->filemask < 6 ||
> > +		    !S_ISREG(ci->stages[1].mode) ||
> > +		    !S_ISREG(ci->stages[2].mode) ||
> > +		    oideq(&ci->stages[1].oid, &ci->stages[2].oid))
> > +			continue;
> > +
> > +		/* Also don't need content merge if base matches either side */
> > +		if (ci->filemask == 7 &&
> > +		    S_ISREG(ci->stages[0].mode) &&
> > +		    (oideq(&ci->stages[0].oid, &ci->stages[1].oid) ||
> > +		     oideq(&ci->stages[0].oid, &ci->stages[2].oid)))
> > +			continue;
>
> Even though this is unlikely to change, it is unsatisfactory that we
> reproduce the knowledge on the situations when a merge will
> trivially resolve and when it will need to go content level.

I agree, it's not the nicest.

> One obvious way to solve it would be to fold this logic into the
> main code that actually merges a list of "ci"s by making it a two
> pass process (the first pass does essentially the same as this new
> function, the second pass does the tree-level merge where the above
> says "continue", fills mmfiles with the loop below, and calls into
> ll_merge() after the loop to merge), but the logic duplication is
> not too big and it may not be worth such a code churn.

I'm worried even more about the resulting complexity than the code
churn. The two-pass model, which I considered, would require special
casing so many of the branches of process_entry() that it feels like
it would increase code complexity more than introducing a function
with a few duplicated checks. process_entry() was already a function
that Stolee reported as coming across as pretty complex in earlier
rounds of review, but that seems to be intrinsic to the number of
special cases it handles: entries with D/F conflicts, different file
types, match_mask being precomputed, recursive vs. normal cases,
modify/delete, normalization, added on one side, deleted on both
sides, and three-way content merges. The three-way content merges
are just one of the 9-ish different branches, and are the only one
that we're prefetching for. It just seems easier and cleaner overall
to add these three checks to pick off the cases that will end up
going through the three-way content merges. I've looked at it again
a couple of times over the past few days based on your comment, but
I still can't see a way to restructure it that feels cleaner than
what I've currently got.

Also, it may be worth noting that if these checks fell out of date
with process_entry() in some manner, it still would not affect the
correctness of the code. At worst, it would only affect whether too
few or too many objects are prefetched. If too many, some extra
objects would be downloaded; if too few, we would end up fetching
the additional objects 1-by-1 on demand later.

So I'm going to agree with the not-worth-it portion of your final
sentence and leave this out of the next roll.
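For reference, the checks being discussed boil down to the following
predicate. This is an explanatory restatement of the quoted hunk, not a
proposed change to the patch, and the function name is made up for
illustration. Stage 0 is the merge base, stages 1 and 2 are the two sides
being merged, and bit (1 << n) of filemask records whether stage n is
present.

/*
 * Explanatory restatement of the prefetch filter; the function name is
 * hypothetical and this is not part of the patch.
 */
static int would_need_content_merge(const struct conflict_info *ci)
{
	if (ci->merged.clean)
		return 0;	/* already resolved cleanly */
	if (ci->match_mask)
		return 0;	/* stages match each other; trivially resolvable */
	if (ci->filemask < 6)
		return 0;	/* bits 2 and 4 required: both sides must be present */
	if (!S_ISREG(ci->stages[1].mode) || !S_ISREG(ci->stages[2].mode))
		return 0;	/* only regular files get a content merge */
	if (oideq(&ci->stages[1].oid, &ci->stages[2].oid))
		return 0;	/* the two sides are already identical */
	if (ci->filemask == 7 &&	/* base present as well... */
	    S_ISREG(ci->stages[0].mode) &&
	    (oideq(&ci->stages[0].oid, &ci->stages[1].oid) ||
	     oideq(&ci->stages[0].oid, &ci->stages[2].oid)))
		return 0;	/* ...and equal to one side: take the other side */
	return 1;		/* a three-way ll_merge() will be needed */
}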
diff --git a/merge-ort.c b/merge-ort.c
index cfa751053b01..e3a5dfc7b312 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -29,6 +29,7 @@
 #include "entry.h"
 #include "ll-merge.h"
 #include "object-store.h"
+#include "promisor-remote.h"
 #include "revision.h"
 #include "strmap.h"
 #include "submodule.h"
@@ -3485,6 +3486,54 @@ static void process_entry(struct merge_options *opt,
 	record_entry_for_tree(dir_metadata, path, &ci->merged);
 }
 
+static void prefetch_for_content_merges(struct merge_options *opt,
+					struct string_list *plist)
+{
+	struct string_list_item *e;
+	struct oid_array to_fetch = OID_ARRAY_INIT;
+
+	if (opt->repo != the_repository || !has_promisor_remote())
+		return;
+
+	for (e = &plist->items[plist->nr-1]; e >= plist->items; --e) {
+		/* char *path = e->string; */
+		struct conflict_info *ci = e->util;
+		int i;
+
+		/* Ignore clean entries */
+		if (ci->merged.clean)
+			continue;
+
+		/* Ignore entries that don't need a content merge */
+		if (ci->match_mask || ci->filemask < 6 ||
+		    !S_ISREG(ci->stages[1].mode) ||
+		    !S_ISREG(ci->stages[2].mode) ||
+		    oideq(&ci->stages[1].oid, &ci->stages[2].oid))
+			continue;
+
+		/* Also don't need content merge if base matches either side */
+		if (ci->filemask == 7 &&
+		    S_ISREG(ci->stages[0].mode) &&
+		    (oideq(&ci->stages[0].oid, &ci->stages[1].oid) ||
+		     oideq(&ci->stages[0].oid, &ci->stages[2].oid)))
+			continue;
+
+		for (i = 0; i < 3; i++) {
+			unsigned side_mask = (1 << i);
+			struct version_info *vi = &ci->stages[i];
+
+			if ((ci->filemask & side_mask) &&
+			    S_ISREG(vi->mode) &&
+			    oid_object_info_extended(opt->repo, &vi->oid, NULL,
+						     OBJECT_INFO_FOR_PREFETCH))
+				oid_array_append(&to_fetch, &vi->oid);
+		}
+	}
+
+	promisor_remote_get_direct(opt->repo, to_fetch.oid, to_fetch.nr);
+	oid_array_clear(&to_fetch);
+}
+
 static void process_entries(struct merge_options *opt,
 			    struct object_id *result_oid)
 {
@@ -3531,6 +3580,7 @@ static void process_entries(struct merge_options *opt,
 	 * the way when it is time to process the file at the same path).
 	 */
 	trace2_region_enter("merge", "processing", opt->repo);
+	prefetch_for_content_merges(opt, &plist);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
 		/*
diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh
index a011f8d27867..26964aa56256 100755
--- a/t/t6421-merge-partial-clone.sh
+++ b/t/t6421-merge-partial-clone.sh
@@ -396,7 +396,7 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory
 #
 # Summary: 4 fetches (1 for 6 objects, 1 for 8, 1 for 3, 1 for 2)
 #
-test_expect_merge_algorithm failure failure 'Objects downloaded with lots of renames and modifications' '
+test_expect_merge_algorithm failure success 'Objects downloaded with lots of renames and modifications' '
 	test_setup_repo &&
 	git clone --sparse --filter=blob:none "file://$(pwd)/server" objects-many &&
 	(
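As a usage note on the APIs the new helper relies on:
OBJECT_INFO_FOR_PREFETCH makes oid_object_info_extended() a quick,
local-only probe (it does not itself trigger a lazy fetch), so a nonzero
return means the object is missing locally and worth batching;
promisor_remote_get_direct() then fetches the whole batch in one request
instead of one lazy fetch per blob. Below is a minimal sketch of that
pattern in isolation, using git's internal headers; the function name is
hypothetical and it is not part of the patch.

/*
 * Minimal sketch of the batched-prefetch pattern, pulled out of
 * merge-ort for illustration.  Hypothetical function name.
 */
#include "cache.h"
#include "oid-array.h"
#include "object-store.h"
#include "promisor-remote.h"

static void prefetch_missing_blobs(struct repository *repo,
				   const struct object_id *oids, size_t nr)
{
	struct oid_array to_fetch = OID_ARRAY_INIT;
	size_t i;

	if (!has_promisor_remote())
		return;

	for (i = 0; i < nr; i++) {
		/*
		 * Local-only, quick probe: a nonzero return means the
		 * object is not available locally, so queue it.
		 */
		if (oid_object_info_extended(repo, &oids[i], NULL,
					     OBJECT_INFO_FOR_PREFETCH))
			oid_array_append(&to_fetch, &oids[i]);
	}

	/* One request for the whole batch, then release the array. */
	if (to_fetch.nr)
		promisor_remote_get_direct(repo, to_fetch.oid, to_fetch.nr);
	oid_array_clear(&to_fetch);
}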