Message ID | 20180920184843.20898-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fetch: in partial clone, check presence of targets | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 61bec5d21..e9640fe5a 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -938,6 +938,25 @@ static int quickfetch(struct ref *ref_map) > */ > if (deepen) > return -1; > + > + if (repository_format_partial_clone) { > + /* > + * For the purposes of the connectivity check, > + * check_connected() considers all objects promised by > + * promisor objects as existing, which means that the > + * connectivity check would pass even if a target object > + * in rm is merely promised and not present. When > + * fetching objects, we need all of them to be present > + * (in addition to having correct connectivity), so > + * check them directly. > + */ > + struct ref *r; > + for (r = rm; r; r = r->next) { > + if (!has_object_file(&r->old_oid)) > + return -1; > + } Because check_connected() lies in the presense of "promisor", we can defeat it this way, which makes sense. I wonder if it makes sense to do this check unconditionally, even when we are in a fully functioning clone. The check is quite cheap and can reject a ref_map that has an object we do not know about, without check_connected() having to ask the rev-list. > + } > + > opt.quiet = 1; > return check_connected(iterate_ref_map, &rm, &opt); > } > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh > index bbbe7537d..359d27d02 100755 > --- a/t/t5616-partial-clone.sh > +++ b/t/t5616-partial-clone.sh > @@ -170,6 +170,23 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm > git -C dst fsck > ' > > +test_expect_success 'fetch what is specified on CLI even if already promised' ' > + rm -rf src dst.git && > + git init src && > + test_commit -C src foo && > + test_config -C src uploadpack.allowfilter 1 && > + test_config -C src uploadpack.allowanysha1inwant 1 && > + > + git hash-object --stdin <src/foo.t >blob && > + > + git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git && > + git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before && > + grep "?$(cat blob)" missing_before && > + git -C dst.git fetch origin $(cat blob) && > + git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after && > + ! grep "?$(cat blob)" missing_after > +' > + > . "$TEST_DIRECTORY"/lib-httpd.sh > start_httpd
> Jonathan Tan <jonathantanmy@google.com> writes: > > > + if (repository_format_partial_clone) { > > + /* > > + * For the purposes of the connectivity check, > > + * check_connected() considers all objects promised by > > + * promisor objects as existing, which means that the > > + * connectivity check would pass even if a target object > > + * in rm is merely promised and not present. When > > + * fetching objects, we need all of them to be present > > + * (in addition to having correct connectivity), so > > + * check them directly. > > + */ > > + struct ref *r; > > + for (r = rm; r; r = r->next) { > > + if (!has_object_file(&r->old_oid)) > > + return -1; > > + } > > Because check_connected() lies in the presense of "promisor", we can > defeat it this way, which makes sense. > > I wonder if it makes sense to do this check unconditionally, even > when we are in a fully functioning clone. The check is quite cheap > and can reject a ref_map that has an object we do not know about, > without check_connected() having to ask the rev-list. It makes sense to me from a runtime point of view - the usual case, for me, is when we're missing at least one object that we're fetching, and doing the cheap check even allows us to skip the expensive check. The hard part for me lies in how to communicate to future readers of the code that they cannot remove this section to simplify the code. We would need a more complicated comment, something like this: /* * Check if all wanted objects are present. * * If the local repository is a partial clone, check_connected() is not * sufficient - it would pass even if a wanted object is merely * promised and not present. This is because, for the purposes of the * connectivity check, check_connected() considers all objects promised * by promisor objects as existing. * * If the local repository is not a partial clone, check_connected() is * sufficient, but this loop allows us to avoid the expensive * check_connected() in the usual case that at least one wanted object * is missing. */
Jonathan Tan <jonathantanmy@google.com> writes: > The hard part for me lies in how to communicate to future readers of the > code that they cannot remove this section to simplify the code. We would > need a more complicated comment, something like this: That suggests two things. - Perhaps quickfetch() is misnamed. It is to ensure "these exist, and are 'connected'"; renaming the helper to convey that would be sufficient to prevent future readers from removing the "these exist" check from it. - Perhaps check_connected() is also misnamed, if the above "these exist, and are 'connected'" is not a sufficient warning against removal of the "these exist" test, perhaps "check_connected()" is not telling the readers that things that are 'connected' do not have to exist. What does being 'connected' mean in the world with "promised" objects anyway? The designer of the feature should probably have a concise and clear answer. > /* > * Check if all wanted objects are present. Here 'wanted' means... the tip that was directly requested must exist, and in addition, anything that is reachable from it must either exist locally or available from the lazy-clone source? But that is not quite it. Your definition of 'present' is fuzzy and mean two different things---for the wanted tips, they must exist. For the objects that are required for these wanted tips to be well formed, they do not have to exist but it is OK for them to be merely promised. Perhaps the comment for the quickfetch() function itself should say /* * Ensure that the requested tips exist locally, and anything that is * reachable from them either exist locally or promised to be available. */ Adding a similar comment to check_connected() function is left as an exercise, but I suspect it would be the latter half of the above sentence. It may be worth renaming both functions for clarity, as I mentioned already.
diff --git a/builtin/fetch.c b/builtin/fetch.c index 61bec5d21..e9640fe5a 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -938,6 +938,25 @@ static int quickfetch(struct ref *ref_map) */ if (deepen) return -1; + + if (repository_format_partial_clone) { + /* + * For the purposes of the connectivity check, + * check_connected() considers all objects promised by + * promisor objects as existing, which means that the + * connectivity check would pass even if a target object + * in rm is merely promised and not present. When + * fetching objects, we need all of them to be present + * (in addition to having correct connectivity), so + * check them directly. + */ + struct ref *r; + for (r = rm; r; r = r->next) { + if (!has_object_file(&r->old_oid)) + return -1; + } + } + opt.quiet = 1; return check_connected(iterate_ref_map, &rm, &opt); } diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index bbbe7537d..359d27d02 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -170,6 +170,23 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm git -C dst fsck ' +test_expect_success 'fetch what is specified on CLI even if already promised' ' + rm -rf src dst.git && + git init src && + test_commit -C src foo && + test_config -C src uploadpack.allowfilter 1 && + test_config -C src uploadpack.allowanysha1inwant 1 && + + git hash-object --stdin <src/foo.t >blob && + + git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git && + git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before && + grep "?$(cat blob)" missing_before && + git -C dst.git fetch origin $(cat blob) && + git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after && + ! grep "?$(cat blob)" missing_after +' + . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd
When fetching an object that is known as a promisor object to the local repository, the connectivity check in quickfetch() in builtin/fetch.c succeeds, causing object transfer to be bypassed. However, this should not happen if that object is merely promised and not actually present. Because this happens, when a user invokes "git fetch origin <sha-1>" on the command-line, the <sha-1> object may not actually be fetched even though the command returns an exit code of 0. This is a similar issue (but with a different cause) to the one fixed by a0c9016abd ("upload-pack: send refs' objects despite "filter"", 2018-07-09). Therefore, update quickfetch() to also directly check for the presence of all objects to be fetched. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- In this patch, I have striven to avoid piping from Git commands that may fail, following the guidelines in [1]. [1] https://public-inbox.org/git/c625bfe2205d51b3158ef71e4bf472708642c146.1537223021.git.matvore@google.com/ --- builtin/fetch.c | 19 +++++++++++++++++++ t/t5616-partial-clone.sh | 17 +++++++++++++++++ 2 files changed, 36 insertions(+)