Message ID | b5f8b81498e1d152014acab92fa1b6e9701b3a0e.1585363771.git.liu.denton@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] http-backend: write error packet if backend command fails | expand |
On Fri, Mar 27, 2020 at 10:50:32PM -0400, Denton Liu wrote: > If one tries to fetch packs with an incorrectly formatted parameter > while using the smart HTTP protocol v2, the client will block forever. > This is seen with the following command which does not terminate: > > $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 > Cloning into 'git'... That can't possibly be fixed by a change to http-backend.c, because github.com does not use git-http-backend. :) We have a custom proxy layer that terminates the http connection, and speaks to "upload-pack --stateless-rpc" on the backend. There may be a bug in that proxy layer, but it's usually pretty robust to upload-pack hanging up the connection. But since it works for v1 (which also dies!), and since you were able to reproduce the problem locally, I suspect it may be an issue in upload-pack itself. > This happens because when upload-pack detects invalid parameters, it > will die(). When http-backend calls finish_command() and detects a > non-zero exit code, it will simply call exit(1). Since there is no way > in a CGI script to force a connection to terminate, the client keeps > blocking on the read(), expecting more output. When the CGI exits, that should close the descriptor it holds. If the webserver hands off the socket to the CGI, that would be sufficient. If it doesn't, then it should notice the CGI exiting and close the socket itself. > Write an error packet if the backend command fails to start or finishes > with an error so that the client can terminate the connection. This is probably not a good idea; we don't know what state the protocol was in when the child died. -Peff
On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote: > But since it works for v1 (which also dies!), and since you were able to > reproduce the problem locally, I suspect it may be an issue in > upload-pack itself. Actually, I think the problem is on the client side. If I run your test without the http-backend change, then nothing is left running on the server side, but on the client we have two processes: git-clone and the git-remote-https helper. And they're deadlocked trying to read from each other. I think the issue is that git-remote-https in v2 mode calls into stateless_connect(), and just pumps http request/response pairs from git-clone to the server. But if a request generates an empty response, then clone has no idea that anything was received at all. It continues waiting, but remote-https has looped to expect another request from it. Your patch fixes _this case_ because it causes the server to send a non-empty response. But the client can't rely on that. Besides that not being a documented server behavior, in the worst case the connection could be severed mid-stream. So I think remote-https needs to complain about an empty response. This isn't a problem in v1 because it would actually try to look at that empty response; in v2 it's just proxying bytes around. -Peff
On Sat, Mar 28, 2020 at 11:13:00AM -0400, Jeff King wrote: > On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote: > > > But since it works for v1 (which also dies!), and since you were able to > > reproduce the problem locally, I suspect it may be an issue in > > upload-pack itself. > > Actually, I think the problem is on the client side. > > If I run your test without the http-backend change, then nothing is left > running on the server side, but on the client we have two processes: > git-clone and the git-remote-https helper. And they're deadlocked trying > to read from each other. > > I think the issue is that git-remote-https in v2 mode calls into > stateless_connect(), and just pumps http request/response pairs from > git-clone to the server. But if a request generates an empty response, > then clone has no idea that anything was received at all. It continues > waiting, but remote-https has looped to expect another request from it. > > Your patch fixes _this case_ because it causes the server to send a > non-empty response. But the client can't rely on that. Besides that not > being a documented server behavior, in the worst case the connection > could be severed mid-stream. So I think remote-https needs to complain > about an empty response. This isn't a problem in v1 because it would > actually try to look at that empty response; in v2 it's just proxying > bytes around. Ugh, it's actually much worse than this. We dealt with the empty-response case in 296b847c0d (remote-curl: don't hang when a server dies before any output, 2016-11-18). The issue here is that we got a _partial_ response, and clone is waiting for the rest of it. The server said "0011shallow-info\n" before it bailed. So from the perspective of git-clone, it's now waiting to read through a flush packet. But remote-https has nothing left to send. The root of the issue is that it has no way to signal to clone that it has already sent everything the server gave it. In non-helper code, clone would see the EOF. And in v1 https, I think there's some extra framing between remote-https and "fetch-pack --stateless-rpc" so that EOF effectively gets passed along. But v2's stateless_connect() strategy is fundamentally missing this signal, and there are probably other spots in the protocol that could cause similar deadlocks. I wonder if remote-https's stateless_connect() could complain if there's no flush packet in the output it writes back. That would certainly fix this case, but I'd worry there are rpc messages that don't end in a flush. And it would be susceptible to data cut-offs that happened to look like a flush packet. I think the robust solution is to introduce an extra level of pktline framing into the remote-helper stateless-connect protocol. -Peff
Hi Peff, On Sat, Mar 28, 2020 at 11:49:36AM -0400, Jeff King wrote: > On Sat, Mar 28, 2020 at 11:13:00AM -0400, Jeff King wrote: > > > On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote: > > > > > But since it works for v1 (which also dies!), and since you were able to > > > reproduce the problem locally, I suspect it may be an issue in > > > upload-pack itself. > > > > Actually, I think the problem is on the client side. > > > > If I run your test without the http-backend change, then nothing is left > > running on the server side, but on the client we have two processes: > > git-clone and the git-remote-https helper. And they're deadlocked trying > > to read from each other. > > > > I think the issue is that git-remote-https in v2 mode calls into > > stateless_connect(), and just pumps http request/response pairs from > > git-clone to the server. But if a request generates an empty response, > > then clone has no idea that anything was received at all. It continues > > waiting, but remote-https has looped to expect another request from it. > > > > Your patch fixes _this case_ because it causes the server to send a > > non-empty response. But the client can't rely on that. Besides that not > > being a documented server behavior, in the worst case the connection > > could be severed mid-stream. So I think remote-https needs to complain > > about an empty response. This isn't a problem in v1 because it would > > actually try to look at that empty response; in v2 it's just proxying > > bytes around. > > Ugh, it's actually much worse than this. We dealt with the > empty-response case in 296b847c0d (remote-curl: don't hang when a server > dies before any output, 2016-11-18). The issue here is that we got a > _partial_ response, and clone is waiting for the rest of it. > > The server said "0011shallow-info\n" before it bailed. So from the > perspective of git-clone, it's now waiting to read through a flush > packet. But remote-https has nothing left to send. The root of the issue > is that it has no way to signal to clone that it has already sent > everything the server gave it. In non-helper code, clone would see the > EOF. And in v1 https, I think there's some extra framing between > remote-https and "fetch-pack --stateless-rpc" so that EOF effectively > gets passed along. But v2's stateless_connect() strategy is > fundamentally missing this signal, and there are probably other spots in > the protocol that could cause similar deadlocks. > > I wonder if remote-https's stateless_connect() could complain if there's > no flush packet in the output it writes back. That would certainly fix > this case, but I'd worry there are rpc messages that don't end in a > flush. And it would be susceptible to data cut-offs that happened to > look like a flush packet. > > I think the robust solution is to introduce an extra level of pktline > framing into the remote-helper stateless-connect protocol. Thanks for doing some more investigation and correcting my faulty analysis. I'll use this information to try and create a new patch. -Denton > -Peff
diff --git a/http-backend.c b/http-backend.c index ec3144b444..7ea3a688fd 100644 --- a/http-backend.c +++ b/http-backend.c @@ -488,10 +488,11 @@ static void run_service(const char **argv, int buffer_input) cld.git_cmd = 1; cld.clean_on_exit = 1; cld.wait_after_clean = 1; - if (start_command(&cld)) + if (start_command(&cld)) { + packet_write_fmt(1, "ERR %s failed to start\n", argv[0]); exit(1); + } - close(1); if (gzipped_request) inflate_request(argv[0], cld.in, buffer_input, req_len); else if (buffer_input) @@ -501,8 +502,11 @@ static void run_service(const char **argv, int buffer_input) else close(0); - if (finish_command(&cld)) + if (finish_command(&cld)) { + packet_write_fmt(1, "ERR %s failed\n", argv[0]); exit(1); + } + close(1); } static int show_text_ref(const char *name, const struct object_id *oid, diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh index 5039e66dc4..69a920a34b 100755 --- a/t/t5702-protocol-v2.sh +++ b/t/t5702-protocol-v2.sh @@ -586,6 +586,23 @@ test_expect_success 'clone with http:// using protocol v2' ' ! grep "Send header: Transfer-Encoding: chunked" log ' +test_expect_success 'clone with http:// using protocol v2 and invalid parameters' ' + test_when_finished "rm -f log" && + + test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" \ + git -c protocol.version=2 \ + clone --shallow-since=20151012 "$HTTPD_URL/smart/http_parent" http_child_invalid && + + # Client requested to use protocol v2 + grep "Git-Protocol: version=2" log && + # Server responded using protocol v2 + grep "git< version 2" log && + # Verify that we sucessfully errored out + grep -F "ERR upload-pack failed" log && + # Verify that the chunked encoding sending codepath is NOT exercised + ! grep "Send header: Transfer-Encoding: chunked" log +' + test_expect_success 'clone big repository with http:// using protocol v2' ' test_when_finished "rm -f log" &&
If one tries to fetch packs with an incorrectly formatted parameter while using the smart HTTP protocol v2, the client will block forever. This is seen with the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... Meanwhile, if one uses v1, the command will terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly This happens because when upload-pack detects invalid parameters, it will die(). When http-backend calls finish_command() and detects a non-zero exit code, it will simply call exit(1). Since there is no way in a CGI script to force a connection to terminate, the client keeps blocking on the read(), expecting more output. Write an error packet if the backend command fails to start or finishes with an error so that the client can terminate the connection. Note that if the included test case is run without the remainder of the patch, it will run forever and fail to terminate. Signed-off-by: Denton Liu <liu.denton@gmail.com> --- Notes: So this is the first time I've touched the networking code of Git. There are a few problems with this patch that I'd appreciate some help addressing. First of all, is this even the right approach? I tried to find some other way to force a CGI script to terminate a connection but I don't think that it's possible so I had to get the client to do it instead. Next, with this patch applied it seems like t5539-fetch-http-shallow fails on 'no shallow lines after receiving ACK ready'. I've been trying to dig into why this happens but I can't quite figure it out. Finally, I'd like the test case I added in t5702 to be guarded by some timeout in case we regress instead of blocking the test suite forever but I'm not sure if that's even available functionality. It seems like we don't use the `timeout` program yet so I'm not sure if it's a good idea to introduce it here. Thanks, Denton http-backend.c | 10 +++++++--- t/t5702-protocol-v2.sh | 17 +++++++++++++++++ 2 files changed, 24 insertions(+), 3 deletions(-)