Message ID | 20240522201559.1677959-1-tom@compton.nu (mailing list archive) |
---|---|
State | Accepted |
Commit | 6549c41ead833c8d8c4098806a29399433065516 |
Series | [v2] push: don't fetch commit object when checking existence |
Tom Hughes <tom@compton.nu> writes:

> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 88a66f0904..7797391c03 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -689,6 +689,25 @@ test_expect_success 'lazy-fetch when accessing object not in the_repository' '
>  	! grep "[?]$FILE_HASH" out
>  '
>
> +test_expect_success 'push should not fetch new commit objects' '
> +	rm -rf server client &&
> +	test_create_repo server &&
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	test_commit -C server server1 &&

OK, we create the source that allows a partial clone.

> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
> +	test_commit -C client client1 &&

And make a clone out of it, without blobs.

> +	test_commit -C server server2 &&
> +	COMMIT=$(git -C server rev-parse server2) &&

Then we create a new commit that the client does not yet have.

> +	test_must_fail git -C client push 2>err &&

We try to overwrite it. We expect it to fail with "not a fast forward".

> +	grep "fetch first" err &&

May want to use "test_grep" but this script does not use it, so
being consistent with the surrounding tests is good.

> +	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
> +	grep "^[?]$COMMIT" objects
> +'

OK.

>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd

Looking good. Thanks, will queue.
On 22/05/2024 21:55, Junio C Hamano wrote:

> Tom Hughes <tom@compton.nu> writes:
>
>> +test_expect_success 'push should not fetch new commit objects' '
>> +	rm -rf server client &&
>> +	test_create_repo server &&
>> +	test_config -C server uploadpack.allowfilter 1 &&
>> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
>> +	test_commit -C server server1 &&
>
> OK, we create the source that allows a partial clone.
>
>> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
>> +	test_commit -C client client1 &&
>
> And make a clone out of it, without blobs.
>
>> +	test_commit -C server server2 &&
>> +	COMMIT=$(git -C server rev-parse server2) &&
>
> Then we create a new commit that the client does not yet have.
>
>> +	test_must_fail git -C client push 2>err &&
>
> We try to overwrite it. We expect it to fail with "not a fast forward".

Well that is what it would fail with at the moment, but it's not
what would happen with a non-partial clone - a non-partial clone
would fail with "fetch first" instead.

This patch makes both cases consistent, although that wasn't the
main driver - the main driver was to stop it fetching 100MB or
more of history in the large repository I was working with when
the upstream has one new commit.

>> +	grep "fetch first" err &&
>
> May want to use "test_grep" but this script does not use it, so
> being consistent with the surrounding tests is good.

So here we are testing that it's a "fetch first" rather than
"not a fast forward".

>> +	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
>> +	grep "^[?]$COMMIT" objects
>> +'
>
> OK.

and also that it hasn't fetched the new commit.

Tom
Tom Hughes <tom@compton.nu> writes:

>>> +	test_must_fail git -C client push 2>err &&
>> We try to overwrite it. We expect it to fail with "not a fast
>> forward".
>
> Well that is what it would fail with at the moment, but it's not
> what would happen with a non-partial clone - a non-partial clone
> would fail with "fetch first" instead.

Oh, don't get me wrong. I wasn't trying to split hairs between the
two error modes and their phrasing. The "fetch first" rejection from
set_ref_status_for_push() is a cheap check, done before we even
initiate the transfer, that stops an operation which would eventually
lead to a "not a fast forward" error. IOW, in my mind, they are the
same errors, just diagnosed at two different places in the code and
their messages phrased differently.

> So here we are testing that it's a "fetch first" rather
> than "not a fast forward".

I think that is being overly specific, but that is fine. As I said,
to the end users, these two errors mean the same thing (they would
need to fetch first and then integrate their changes before pushing
it out again), so it is plausible that we may in the future decide
that we want to use the same message. When it happens, this test must
change, which may even be a good thing (it makes it clear what the
fallout from such a change looks like).

>>> +	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
>>> +	grep "^[?]$COMMIT" objects
>>> +'
>> OK.
>
> and also that it hasn't fetched the new commit.

Yes, and this is a good check that will stand the test of time, even
across a change to rephrase the error message.

Thanks.
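As a rough illustration of the point that both rejections mean the same thing to the user, a caller outside the test suite could accept either phrasing. This is only a sketch: the helper name push_needs_fetch is hypothetical, and it assumes the usual human-readable `git push` status lines, which end in "(fetch first)" or "(non-fast-forward)" for these two rejections.

```shell
#!/bin/sh
# Hypothetical helper (not part of git or of this patch): succeed if the
# captured stderr of a failed "git push" shows a rejection that the user
# resolves by fetching and integrating first, whichever phrasing git used.
push_needs_fetch() {
	# $1: file holding the stderr output of "git push"
	grep -E -q '\[rejected\].*\((fetch first|non-fast-forward)\)' "$1"
}

# Possible usage, assuming a clone in ./client:
#   git -C client push 2>err || push_needs_fetch err && echo "fetch first"
```

A check like this would survive a future change that unifies the two messages only if the new wording were added to the pattern, which is the fallout described above.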
On Wed, May 22, 2024 at 09:15:40PM +0100, Tom Hughes wrote:

> diff --git a/remote.c b/remote.c
> index 2b650b813b..20395bbbd0 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1773,7 +1773,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>  		if (!reject_reason && !ref->deletion && !is_null_oid(&ref->old_oid)) {
>  			if (starts_with(ref->name, "refs/tags/"))
>  				reject_reason = REF_STATUS_REJECT_ALREADY_EXISTS;
> -			else if (!repo_has_object_file(the_repository, &ref->old_oid))
> +			else if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
>  				reject_reason = REF_STATUS_REJECT_FETCH_FIRST;
>  			else if (!lookup_commit_reference_gently(the_repository, &ref->old_oid, 1) ||
>  				 !lookup_commit_reference_gently(the_repository, &ref->new_oid, 1))

This makes sense to me, as we're just speculatively asking "do we have
the object".

I think for that reason it would also be reasonable to use
OBJECT_INFO_QUICK here, which would avoid a fruitless re-scan of the
local objects/ directory. We often pair the two[1].

In practice, though, I think fetching the missing object is going to be
much more expensive than a local re-scan. We tend to notice the latter
only when you have a large number of objects to check, and here we're
basically limited by the number of non-fast-forward refs you're trying
to push. So I also think it would be OK to leave it here and only do
QUICK if somebody ever notices it.

-Peff

[1] We've talked about unifying those two flags, since they so often
    come together. There's some discussion in:

      https://lore.kernel.org/git/20191011220822.154063-1-jonathantanmy@google.com/

    that they could become one flag, but these two:

      https://lore.kernel.org/git/20190909222101.GB31319@sigill.intra.peff.net/
      https://lore.kernel.org/git/20200322054916.GB578498@coredump.intra.peff.net/

    argue that QUICK implies SKIP_FETCH, but not always the other way
    around.

    (Obviously getting a bit off topic for your patch; if anything, I
    think this call site would just use both for now).
diff --git a/remote.c b/remote.c
index 2b650b813b..20395bbbd0 100644
--- a/remote.c
+++ b/remote.c
@@ -1773,7 +1773,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 		if (!reject_reason && !ref->deletion && !is_null_oid(&ref->old_oid)) {
 			if (starts_with(ref->name, "refs/tags/"))
 				reject_reason = REF_STATUS_REJECT_ALREADY_EXISTS;
-			else if (!repo_has_object_file(the_repository, &ref->old_oid))
+			else if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
 				reject_reason = REF_STATUS_REJECT_FETCH_FIRST;
 			else if (!lookup_commit_reference_gently(the_repository, &ref->old_oid, 1) ||
 				 !lookup_commit_reference_gently(the_repository, &ref->new_oid, 1))
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 88a66f0904..7797391c03 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -689,6 +689,25 @@ test_expect_success 'lazy-fetch when accessing object not in the_repository' '
 	! grep "[?]$FILE_HASH" out
 '
 
+test_expect_success 'push should not fetch new commit objects' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	test_commit -C server server1 &&
+
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	test_commit -C client client1 &&
+
+	test_commit -C server server2 &&
+	COMMIT=$(git -C server rev-parse server2) &&
+
+	test_must_fail git -C client push 2>err &&
+	grep "fetch first" err &&
+	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
+	grep "^[?]$COMMIT" objects
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
If we're checking whether to tell the user to fetch before pushing,
there's no need to actually fetch the object from the remote when
the clone is partial.

Because the promisor doesn't do negotiation, actually trying to
fetch the new head can be very expensive, as it will try to include
history that we already have. It also just results in rejecting the
push with a different message, and in behavior that differs from
that of a clone that is not partial.

Signed-off-by: Tom Hughes <tom@compton.nu>
---
 remote.c                 |  2 +-
 t/t0410-partial-clone.sh | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
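The scenario described in the commit message can be tried outside the test suite with plain git commands. What follows is a minimal sketch, not the test from the patch: the scratch directory, the commit helper and the use of empty commits are assumptions made for illustration. The rejection reason and whether the new commit ends up in the client depend on whether the git being run includes this change, so the script only reports what it observes.

```shell
#!/bin/sh
# Sketch: reproduce the partial-clone push rejection with plain git.
# All names below (work, commit, new_commit, ...) are illustrative.
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: make an empty commit with a throwaway identity.
commit() {
	git -C "$1" -c user.name=demo -c user.email=demo@example.com \
		commit --quiet --allow-empty -m "$2"
}

# A server that permits partial clones and fetching arbitrary objects.
git init --quiet server
git -C server config uploadpack.allowfilter true
git -C server config uploadpack.allowanysha1inwant true
commit server server1

# A blob-less partial clone with one local commit of its own.
git clone --quiet --filter=blob:none "file://$work/server" client
commit client client1

# The server moves ahead by one commit that the client has not seen.
commit server server2
new_commit=$(git -C server rev-parse HEAD)

# The push is expected to be rejected; record the exit status.
if git -C client push >push.out 2>&1
then
	push_status=0
else
	push_status=$?
fi
grep "rejected" push.out || true

# rev-list prints missing objects as "?<oid>" under --missing=print.
git -C client rev-list --objects --missing=print "$new_commit" >objects 2>/dev/null || true
if grep -q "^?$new_commit" objects
then
	fetched=no
else
	fetched=yes
fi
echo "push exit status: $push_status; new commit fetched: $fetched"
```

With the change applied, one would expect the rejection line to mention "fetch first" and the final line to report that the new commit was not fetched; without it, the push should still fail, but only after lazily fetching the commit.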