Message ID | 20230426205324.326501-2-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix empty SHA-256 clones with v0 and v1 | expand |
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > From: "brian m. carlson" <bk2204@github.com> > > When cloning an empty repository, the HTTP protocol version 0 currently > offers nothing but the header and flush packets for the /info/refs > endpoint. This means that no capabilities are provided, so the client > side doesn't know what capabilities are present. > > However, this does pose a problem when working with SHA-256 > repositories, since we use the capabilities to know the remote side's > object format (hash algorithm). It used to be possible to set the > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS > testsuite did), but this no longer works as of 8b214c2e9d ("clone: "this no longer works as of" -> "this was a mistake and was fixed by". > propagate object-format when cloning from void", 2023-04-05), since > there we always read the hash algorithm from the remote. If there is no > hash algorithm provided, we default to SHA-1 for backwards > compatibility. Other than that, looks good to me. Thanks. Will queue.
On 2023-04-26 at 21:14:37, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > From: "brian m. carlson" <bk2204@github.com> > > > > When cloning an empty repository, the HTTP protocol version 0 currently > > offers nothing but the header and flush packets for the /info/refs > > endpoint. This means that no capabilities are provided, so the client > > side doesn't know what capabilities are present. > > > > However, this does pose a problem when working with SHA-256 > > repositories, since we use the capabilities to know the remote side's > > object format (hash algorithm). It used to be possible to set the > > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS > > testsuite did), but this no longer works as of 8b214c2e9d ("clone: > > "this no longer works as of" -> "this was a mistake and was fixed by". I tend to disagree. While I agree that change is valuable because it fixes v2, which we want, it does cause a change in user-visible behaviour, which broke the Git LFS testsuite. Whether we like things working that way or not, clearly there were people relying on it. Fortunately, in that case, Git LFS can just enable protocol v2 and things work again, but I think "this no longer works" is accurate and more neutral, and addresses the issue. We wouldn't have to deal with that issue if we could gracefully handle git clone --local with older versions of the protocol, but one of the tests fails when we do that. I'll take some more time to see if I can come up with a nice way to gracefully handle that, and if so, I'll send a v2.
On Wed, Apr 26, 2023 at 09:28:31PM +0000, brian m. carlson wrote: > > > However, this does pose a problem when working with SHA-256 > > > repositories, since we use the capabilities to know the remote side's > > > object format (hash algorithm). It used to be possible to set the > > > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS > > > testsuite did), but this no longer works as of 8b214c2e9d ("clone: > > > > "this no longer works as of" -> "this was a mistake and was fixed by". > > I tend to disagree. While I agree that change is valuable because it > fixes v2, which we want, it does cause a change in user-visible > behaviour, which broke the Git LFS testsuite. Whether we like things > working that way or not, clearly there were people relying on it. > > Fortunately, in that case, Git LFS can just enable protocol v2 and > things work again, but I think "this no longer works" is accurate and > more neutral, and addresses the issue. We wouldn't have to deal with > that issue if we could gracefully handle git clone --local with older > versions of the protocol, but one of the tests fails when we do that. > I'll take some more time to see if I can come up with a nice way to > gracefully handle that, and if so, I'll send a v2. Reiterating what I said upthread, I think it was always 50% broken. Taking the local hash format over the remote one was always the wrong thing to do, but it sometimes worked out (because we happened to match the remote). But the opposite case: git init --object-format=sha1 dst.git GIT_DEFAULT_HASH=sha256 git clone dst.git would previously have done the wrong thing. We just flipped which half was broken and which was not. -Peff
On Wed, Apr 26, 2023 at 08:53:23PM +0000, brian m. carlson wrote: > From: "brian m. carlson" <bk2204@github.com> > > When cloning an empty repository, the HTTP protocol version 0 currently > offers nothing but the header and flush packets for the /info/refs > endpoint. This means that no capabilities are provided, so the client > side doesn't know what capabilities are present. Is this really an HTTP problem? If I do: git init --bare --object-format=sha256 remote.git git -c protocol.version=0 clone --bare remote.git local.git git -C local.git rev-parse --show-object-format I will get sha1, which is wrong. Likewise with GIT_DEFAULT_HASH=sha256 on the clone (after Junio's recent patch), regardless of what the server claims. This is really a git-protocol issue that affects all transports. So I think in this hunk: > @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, > data.no_done = 1; > head_ref_namespaced(send_ref, &data); > for_each_namespaced_ref(send_ref, &data); > + if (!data.sent_capabilities && advertise_refs) > + send_ref("capabilities^{}", null_oid(), 0, &data); > /* > * fflush stdout before calling advertise_shallow_grafts because send_ref > * uses stdio. you would want to drop the "&& advertise_refs" bit, after which both of the cases above would yield a sha256 repository. There is one other catch, though. Doing as I suggest results in a failure in t5509, because the new code does not interact correctly with namespaces. That is true of your version, as well; it's just that the test suite does not cover the combination of namespaces, http, and empty repos. The issue is that send_ref() will try to strip the namespace, and end up with NULL (which on my glibc system ends up with a ref named "(null)", but obviously could segfault, too). Something like this fixes it: diff --git a/environment.c b/environment.c index 8a96997539..37cd66b295 100644 --- a/environment.c +++ b/environment.c @@ -234,6 +234,8 @@ const char *get_git_namespace(void) const char *strip_namespace(const char *namespaced_ref) { const char *out; + if (!strcmp(namespaced_ref, "capabilities^{}")) + return namespaced_ref; /* magic ref */ if (skip_prefix(namespaced_ref, get_git_namespace(), &out)) return out; return NULL; but I suspect it would be cleaner to refactor send_ref() to allow sending a name more directly. (As an aside, it feels like send_ref() is also wrong not to check for NULL from strip_namespace(), but I guess in practice we do not feed it names outside of the namespace. Might be a good candidate for a BUG() check or other assertion). > +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' > + rm -fr sha256 && > + echo sha256 >expected && > + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && > + git -C sha256 rev-parse --show-object-format >actual && > + test_cmp actual expected && > + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && > + test_must_be_empty actual > +' This looks reasonable, though I think if we do not need HTTP to demonstrate the issue (and I don't think we do), then we should probably avoid it, just to get test coverage on platforms that don't support HTTP. -Peff
Jeff King <peff@peff.net> writes: > So I think in this hunk: > >> @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, >> data.no_done = 1; >> head_ref_namespaced(send_ref, &data); >> for_each_namespaced_ref(send_ref, &data); >> + if (!data.sent_capabilities && advertise_refs) >> + send_ref("capabilities^{}", null_oid(), 0, &data); >> /* >> * fflush stdout before calling advertise_shallow_grafts because send_ref >> * uses stdio. > > you would want to drop the "&& advertise_refs" bit, after which both of > the cases above would yield a sha256 repository. Good suggestion. >> +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' >> + rm -fr sha256 && >> + echo sha256 >expected && >> + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && >> + git -C sha256 rev-parse --show-object-format >actual && >> + test_cmp actual expected && >> + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && >> + test_must_be_empty actual >> +' > > This looks reasonable, though I think if we do not need HTTP to > demonstrate the issue (and I don't think we do), then we should probably > avoid it, just to get test coverage on platforms that don't support > HTTP. HTTP tests tend to be more cumbersome to set up and harder to debug than the plain vanilla "over the pipe on the same machine" transport, so I tend to agree with the statement. They however represent a more common use case, so having HTTP tests in addition to non-HTTP tests would be nicer, if we can afford to. Thanks.
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 0908534f25..21b7767cbd 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -611,6 +611,33 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' grep symref=HEAD:refs/heads/ trace ' +test_expect_success 'create empty http-accessible SHA-256 repository' ' + mkdir "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + (cd "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" && + git --bare init --object-format=sha256 + ) +' + +test_expect_success 'clone empty SHA-256 repository with protocol v2' ' + rm -fr sha256 && + echo sha256 >expected && + git -c protocol.version=2 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + +test_expect_success 'clone empty SHA-256 repository with protocol v0' ' + rm -fr sha256 && + echo sha256 >expected && + GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp actual expected && + git ls-remote "$HTTPD_URL/smart/sha256.git" >actual && + test_must_be_empty actual +' + test_expect_success 'passing hostname resolution information works' ' BOGUS_HOST=gitbogusexamplehost.invalid && BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT && diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh index 6c8d4c6cf1..3cd9db9012 100755 --- a/t/t5700-protocol-v1.sh +++ b/t/t5700-protocol-v1.sh @@ -249,10 +249,12 @@ test_expect_success 'push with ssh:// using protocol v1' ' . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd -test_expect_success 'create repo to be served by http:// transport' ' +test_expect_success 'create repos to be served by http:// transport' ' git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" && git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true && - test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one + test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one && + git init --object-format=sha256 "$HTTPD_DOCUMENT_ROOT_PATH/sha256" && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/sha256" config http.receivepack true ' test_expect_success 'clone with http:// using protocol v1' ' @@ -269,6 +271,21 @@ test_expect_success 'clone with http:// using protocol v1' ' grep "git< version 1" log ' +test_expect_success 'clone with http:// using protocol v1 with empty SHA-256 repo' ' + GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 git -c protocol.version=1 \ + clone "$HTTPD_URL/smart/sha256" sha256 2>log && + + cat log && + echo sha256 >expect && + git -C sha256 rev-parse --show-object-format >actual && + test_cmp expect actual && + + # Client requested to use protocol v1 + grep "Git-Protocol: version=1" log && + # Server responded using protocol v1 + grep "git< version 1" log +' + test_expect_success 'fetch with http:// using protocol v1' ' test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" two && diff --git a/upload-pack.c b/upload-pack.c index 08633dc121..5ef9b162b6 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -120,6 +120,7 @@ struct upload_pack_data { unsigned allow_ref_in_want : 1; /* v2 only */ unsigned allow_sideband_all : 1; /* v2 only */ unsigned advertise_sid : 1; + unsigned sent_capabilities : 1; }; static void upload_pack_data_init(struct upload_pack_data *data) @@ -1240,6 +1241,7 @@ static int send_ref(const char *refname, const struct object_id *oid, git_user_agent_sanitized()); strbuf_release(&symref_info); strbuf_release(&session_id); + data->sent_capabilities = 1; } else { packet_fwrite_fmt(stdout, "%s %s\n", oid_to_hex(oid), refname_nons); } @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc, data.no_done = 1; head_ref_namespaced(send_ref, &data); for_each_namespaced_ref(send_ref, &data); + if (!data.sent_capabilities && advertise_refs) + send_ref("capabilities^{}", null_oid(), 0, &data); /* * fflush stdout before calling advertise_shallow_grafts because send_ref * uses stdio.