Message ID | 20191101203830.231676-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fetch: remove fetch_if_missing=0 | expand |
Hi, Jonathan Tan wrote: > In fetch_pack() (and all functions it calls), pass > OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be > a tree or blob that we do not want to be lazy-fetched even if it is > absent. Thus, the only lazy-fetches occurring for trees and blobs are > when resolving deltas. > > Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove > this, and also add a test ensuring that such objects are not > lazy-fetched. (We might be able to remove fetch_if_missing=0 from other > places too, but I have limited myself to builtin/fetch.c in this commit > because I have not written tests for the other commands yet.) Hooray! Thanks much, this looks easier to maintain. > Note that commits and tags may still be lazy-fetched. I limited myself > to objects that could be trees or blobs here because Git does not > support creating such commit- and tag-excluding clones yet, and even if > such a clone were manually created, Git does not have good support for > fetching a single commit (when fetching a commit, it and all its > ancestors would be sent). Is there a place we could put a NEEDSWORK comment to avoid confusion when debugging if this gets introduced later? Even if not, this seems like a sensible choice. > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > I've verified that this also solves the bug explained in: > https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/ Might be worth mentioning the example from there in the commit message as well, to help explain the context behind the change. I would still be in favor of applying that more conservative change to "master", even this late in the -rc cycle. [...] > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map) > * we need all direct targets to exist. > */ > for (r = rm; r; r = r->next) { > - if (!has_object_file(&r->old_oid)) > + if (!has_object_file_with_flags(&r->old_oid, > + OBJECT_INFO_SKIP_FETCH_OBJECT)) Yes. [...] > @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) > > packet_trace_identity("fetch"); > > - fetch_if_missing = 0; > - This is the scary part, but in an "uncomfortably exciting" sense rather than a worrying one. Thanks for adding a test. [...] > --- a/fetch-pack.c > +++ b/fetch-pack.c > @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator, > struct object *o; > > if (!has_object_file_with_flags(&ref->old_oid, > - OBJECT_INFO_QUICK)) > + OBJECT_INFO_QUICK | > + OBJECT_INFO_SKIP_FETCH_OBJECT)) Should we make OBJECT_INFO_QUICK always imply OBJECT_INFO_SKIP_FETCH_OBJECT? I would suspect that if we are willing to avoid checking thoroughly locally, checking remotely would be even more undesirable. [...] > --- a/t/t5616-partial-clone.sh > +++ b/t/t5616-partial-clone.sh > @@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly > test_i18ngrep "unable to parse sparse filter data in" err > ' > > +setup_triangle () { > + rm -rf big-blob.txt server client promisor-remote && > + > + touch big-blob.txt && Tests seem to prefer spelling this as >big-blob.txt && because that specifes the content of the file. > + for i in $(seq 1 100) > + do > + echo line $i >>big-blob.txt > + done && Should this use test_seq for better portability? nit: can avoid a subshell: test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt [...] > +test_expect_success 'fetch lazy-fetches only to resolve deltas' ' > + setup_triangle && > + > + # Exercise to make sure it works. Git will not fetch anything from the > + # promisor remote other than for the big blob (because it needs to > + # resolve the delta). > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \ > + fetch "file://$(pwd)/server" master && > + > + # Verify the assumption that the client needed to fetch the delta base > + # to resolve the delta. > + git hash-object big-blob.txt >hash && > + grep "want $(cat hash)" trace nit: can avoid using cat: hash=$(git hash-object big-blob.txt) && grep "want $hash" trace Thanks and hope that helps, Jonathan
Jonathan Nieder <jrnieder@gmail.com> writes: >> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) >> >> packet_trace_identity("fetch"); >> >> - fetch_if_missing = 0; >> - > > This is the scary part, but in an "uncomfortably exciting" sense rather > than a worrying one. Thanks for adding a test. > > [...] >> --- a/fetch-pack.c >> +++ b/fetch-pack.c >> @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator, >> struct object *o; >> >> if (!has_object_file_with_flags(&ref->old_oid, >> - OBJECT_INFO_QUICK)) >> + OBJECT_INFO_QUICK | >> + OBJECT_INFO_SKIP_FETCH_OBJECT)) > > Should we make OBJECT_INFO_QUICK always imply > OBJECT_INFO_SKIP_FETCH_OBJECT? I would suspect that if we are willing to > avoid checking thoroughly locally, checking remotely would be even more > undesirable. I think I've seen this mentioned a few times during this cycle in the list archive ;-) >> + for i in $(seq 1 100) >> + do >> + echo line $i >>big-blob.txt >> + done && > > Should this use test_seq for better portability? Yup. > nit: can avoid a subshell: > > test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt Yeah, but it costs process start-up and "sed" that may be rather heavyweight. At least for i in $(test_seq ...) do echo line $i done >big-blob.txt would save repeated opening and closing the file, I'd think. >> + git hash-object big-blob.txt >hash && >> + grep "want $(cat hash)" trace > > nit: can avoid using cat: > > hash=$(git hash-object big-blob.txt) && > grep "want $hash" trace > > Thanks and hope that helps, Thanks, both.
Jonathan Nieder <jrnieder@gmail.com> writes: > I would still be in favor of applying that more conservative change to > "master", even this late in the -rc cycle. I believe <20191023233403.145457-1-jonathantanmy@google.com> is the latest incarnation of the "conservative" approach. I think I can buy that, but let me see if there are interactions with other topics first.
On Sat, Nov 2, 2019 at 1:55 AM Junio C Hamano <gitster@pobox.com> wrote: > Jonathan Nieder <jrnieder@gmail.com> writes: > > nit: can avoid a subshell: > > > > test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt > > Yeah, but it costs process start-up and "sed" that may be rather > heavyweight. At least > > for i in $(test_seq ...) > do > echo line $i > done >big-blob.txt > > would save repeated opening and closing the file, I'd think. More bikeshedding: printf "line %d\n" $(test_seq 1 100) >big-blob.txt is reasonably concise, perhaps easier to grok than the one-liner incorporating 'sed', and shouldn't run afoul of command-line length limitation.
Thanks for your review. > > Note that commits and tags may still be lazy-fetched. I limited myself > > to objects that could be trees or blobs here because Git does not > > support creating such commit- and tag-excluding clones yet, and even if > > such a clone were manually created, Git does not have good support for > > fetching a single commit (when fetching a commit, it and all its > > ancestors would be sent). > > Is there a place we could put a NEEDSWORK comment to avoid confusion > when debugging if this gets introduced later? > > Even if not, this seems like a sensible choice. Done in the test. > > I've verified that this also solves the bug explained in: > > https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/ > > Might be worth mentioning the example from there in the commit message > as well, to help explain the context behind the change. > > I would still be in favor of applying that more conservative change to > "master", even this late in the -rc cycle. If we're applying that change first, then this no longer fixes any bug, but is just a code cleanup. (If we don't apply that change, then I'll include that example in the commit message.) > Should we make OBJECT_INFO_QUICK always imply > OBJECT_INFO_SKIP_FETCH_OBJECT? I would suspect that if we are willing to > avoid checking thoroughly locally, checking remotely would be even more > undesirable. As I wrote in [1], the implication does not really go both ways. I think it's better to keep them separate. (Or, at least, we don't need to make the decision in this patch.) [1] https://public-inbox.org/git/20191011220822.154063-1-jonathantanmy@google.com/ > > +test_expect_success 'fetch lazy-fetches only to resolve deltas' ' > > + setup_triangle && > > + > > + # Exercise to make sure it works. Git will not fetch anything from the > > + # promisor remote other than for the big blob (because it needs to > > + # resolve the delta). > > + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \ > > + fetch "file://$(pwd)/server" master && > > + > > + # Verify the assumption that the client needed to fetch the delta base > > + # to resolve the delta. > > + git hash-object big-blob.txt >hash && > > + grep "want $(cat hash)" trace > > nit: can avoid using cat: > > hash=$(git hash-object big-blob.txt) && > grep "want $hash" trace I think it's less error-prone if we always have a "git" command on its own on a line, to avoid losing its error code. When piped into another invocation, or when command-substituted into an argument (e.g. "echo $(git hash-object foo)"), we lose its error code.
Jonathan Tan wrote: > I think it's less error-prone if we always have a "git" command on its > own on a line, to avoid losing its error code. When piped into another > invocation, or when command-substituted into an argument (e.g. "echo > $(git hash-object foo)"), we lose its error code. Sure, but assignment is a special case: if myvar=$(false) then echo yes else echo no fi prints "no". See "Don't use command substitution in a way that discards git's exit code" in t/README for more details. Thanks, Jonathan
diff --git a/builtin/fetch.c b/builtin/fetch.c index 0c345b5dfe..5ff7367dd7 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map) * we need all direct targets to exist. */ for (r = rm; r; r = r->next) { - if (!has_object_file(&r->old_oid)) + if (!has_object_file_with_flags(&r->old_oid, + OBJECT_INFO_SKIP_FETCH_OBJECT)) return -1; } @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) packet_trace_identity("fetch"); - fetch_if_missing = 0; - /* Record the command line for the reflog */ strbuf_addstr(&default_rla, "fetch"); for (i = 1; i < argc; i++) diff --git a/fetch-pack.c b/fetch-pack.c index 0130b44112..37178e2d34 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator, struct object *o; if (!has_object_file_with_flags(&ref->old_oid, - OBJECT_INFO_QUICK)) + OBJECT_INFO_QUICK | + OBJECT_INFO_SKIP_FETCH_OBJECT)) continue; o = parse_object(the_repository, &ref->old_oid); if (!o) diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index 79f7b65f8c..1081ed2de0 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly test_i18ngrep "unable to parse sparse filter data in" err ' +setup_triangle () { + rm -rf big-blob.txt server client promisor-remote && + + touch big-blob.txt && + for i in $(seq 1 100) + do + echo line $i >>big-blob.txt + done && + + # Create a server with 2 commits: a commit with a big blob and a child + # commit with an incremental change. Also, create a partial clone + # client that only contains the first commit. + git init server && + git -C server config --local uploadpack.allowfilter 1 && + cp big-blob.txt server && + git -C server add big-blob.txt && + git -C server commit -m "initial" && + git clone --bare --filter=tree:0 "file://$(pwd)/server" client && + echo another line >>server/big-blob.txt && + git -C server commit -am "append line to big blob" && + + # Create a promisor remote that only contains the blob from the first + # commit, and set it as the promisor remote of client. Thus, whenever + # the client lazy fetches, the lazy fetch will succeed only if it is + # for this blob. + git init promisor-remote && + test_commit -C promisor-remote one && # so that ref advertisement is not empty + git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 && + git -C promisor-remote hash-object -w --stdin <big-blob.txt && + git -C client remote set-url origin "file://$(pwd)/promisor-remote" +} + +test_expect_success 'fetch lazy-fetches only to resolve deltas' ' + setup_triangle && + + # Exercise to make sure it works. Git will not fetch anything from the + # promisor remote other than for the big blob (because it needs to + # resolve the delta). + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \ + fetch "file://$(pwd)/server" master && + + # Verify the assumption that the client needed to fetch the delta base + # to resolve the delta. + git hash-object big-blob.txt >hash && + grep "want $(cat hash)" trace +' + +test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' ' + setup_triangle && + + git -C server config --local protocol.version 2 && + git -C client config --local protocol.version 2 && + git -C promisor-remote config --local protocol.version 2 && + + # Exercise to make sure it works. Git will not fetch anything from the + # promisor remote other than for the big blob (because it needs to + # resolve the delta). + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \ + fetch "file://$(pwd)/server" master && + + # Verify that protocol version 2 was used. + grep "fetch< version 2" trace && + + # Verify the assumption that the client needed to fetch the delta base + # to resolve the delta. + git hash-object big-blob.txt >hash && + grep "want $(cat hash)" trace +' + . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd