[RFC] http-backend: write error packet if backend command fails

Message ID b5f8b81498e1d152014acab92fa1b6e9701b3a0e.1585363771.git.liu.denton@gmail.com (mailing list archive)
State New, archived
Series [RFC] http-backend: write error packet if backend command fails

Commit Message

Denton Liu March 28, 2020, 2:50 a.m. UTC
If one tries to fetch packs with an incorrectly formatted parameter
while using the smart HTTP protocol v2, the client will block forever.
This is seen with the following command which does not terminate:

	$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
	Cloning into 'git'...

Meanwhile, if one uses v1, the command will terminate as expected:

	$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
	Cloning into 'git'...
	fatal: the remote end hung up unexpectedly

This happens because when upload-pack detects invalid parameters, it
will die(). When http-backend calls finish_command() and detects a
non-zero exit code, it will simply call exit(1). Since there is no way
in a CGI script to force a connection to terminate, the client keeps
blocking on the read(), expecting more output.

Write an error packet if the backend command fails to start or finishes
with an error so that the client can terminate the connection.
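
For illustration, the error packet written by this patch would look roughly
like this on the wire (pkt-line framing: a four-hex-digit length that counts
its own 4-byte header, followed by the payload):

	001bERR upload-pack failed\n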

Note that if the included test case is run without the remainder of the
patch, it will run forever and fail to terminate.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---

Notes:
    So this is the first time I've touched the networking code of Git. There
    are a few problems with this patch that I'd appreciate some help
    addressing.
    
    First of all, is this even the right approach? I tried to find some
    other way to force a CGI script to terminate a connection but I don't
    think that it's possible so I had to get the client to do it instead.
    
    Next, with this patch applied it seems like t5539-fetch-http-shallow
    fails on 'no shallow lines after receiving ACK ready'. I've been trying
    to dig into why this happens but I can't quite figure it out.
    
    Finally, I'd like the test case I added in t5702 to be guarded by some
    timeout in case we regress, instead of blocking the test suite forever,
    but I'm not sure that functionality is even available. It seems like we
    don't use the `timeout` program yet, so I'm not sure if it's a good
    idea to introduce it here.
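    
    For illustration, such a guard might look something like this
    (hypothetical, since it would add a dependency on the coreutils
    `timeout` utility that the test suite does not currently have):
    
    	timeout 60 git -c protocol.version=2 \
    		clone --shallow-since=20151012 "$HTTPD_URL/smart/http_parent" http_child_invalid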
    
    Thanks,
    
    Denton

 http-backend.c         | 10 +++++++---
 t/t5702-protocol-v2.sh | 17 +++++++++++++++++
 2 files changed, 24 insertions(+), 3 deletions(-)

Comments

Jeff King March 28, 2020, 2:57 p.m. UTC | #1
On Fri, Mar 27, 2020 at 10:50:32PM -0400, Denton Liu wrote:

> If one tries to fetch packs with an incorrectly formatted parameter
> while using the smart HTTP protocol v2, the client will block forever.
> This is seen with the following command which does not terminate:
> 
> 	$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
> 	Cloning into 'git'...

That can't possibly be fixed by a change to http-backend.c, because
github.com does not use git-http-backend. :)

We have a custom proxy layer that terminates the http connection, and
speaks to "upload-pack --stateless-rpc" on the backend. There may be a
bug in that proxy layer, but it's usually pretty robust to upload-pack
hanging up the connection.

But since it works for v1 (which also dies!), and since you were able to
reproduce the problem locally, I suspect it may be an issue in
upload-pack itself.

> This happens because when upload-pack detects invalid parameters, it
> will die(). When http-backend calls finish_command() and detects a
> non-zero exit code, it will simply call exit(1). Since there is no way
> in a CGI script to force a connection to terminate, the client keeps
> blocking on the read(), expecting more output.

When the CGI exits, that should close the descriptor it holds. If the
webserver hands off the socket to the CGI, that would be sufficient. If
it doesn't, then it should notice the CGI exiting and close the socket
itself.

> Write an error packet if the backend command fails to start or finishes
> with an error so that the client can terminate the connection.

This is probably not a good idea; we don't know what state the protocol
was in when the child died.

-Peff
Jeff King March 28, 2020, 3:13 p.m. UTC | #2
On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote:

> But since it works for v1 (which also dies!), and since you were able to
> reproduce the problem locally, I suspect it may be an issue in
> upload-pack itself.

Actually, I think the problem is on the client side.

If I run your test without the http-backend change, then nothing is left
running on the server side, but on the client we have two processes:
git-clone and the git-remote-https helper. And they're deadlocked trying
to read from each other.

I think the issue is that git-remote-https in v2 mode calls into
stateless_connect(), and just pumps http request/response pairs from
git-clone to the server. But if a request generates an empty response,
then clone has no idea that anything was received at all. It continues
waiting, but remote-https has looped to expect another request from it.

Your patch fixes _this case_ because it causes the server to send a
non-empty response. But the client can't rely on that. Besides that not
being a documented server behavior, in the worst case the connection
could be severed mid-stream. So I think remote-https needs to complain
about an empty response. This isn't a problem in v1 because it would
actually try to look at that empty response; in v2 it's just proxying
bytes around.

-Peff
Jeff King March 28, 2020, 3:49 p.m. UTC | #3
On Sat, Mar 28, 2020 at 11:13:00AM -0400, Jeff King wrote:

> On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote:
> 
> > But since it works for v1 (which also dies!), and since you were able to
> > reproduce the problem locally, I suspect it may be an issue in
> > upload-pack itself.
> 
> Actually, I think the problem is on the client side.
> 
> If I run your test without the http-backend change, then nothing is left
> running on the server side, but on the client we have two processes:
> git-clone and the git-remote-https helper. And they're deadlocked trying
> to read from each other.
> 
> I think the issue is that git-remote-https in v2 mode calls into
> stateless_connect(), and just pumps http request/response pairs from
> git-clone to the server. But if a request generates an empty response,
> then clone has no idea that anything was received at all. It continues
> waiting, but remote-https has looped to expect another request from it.
> 
> Your patch fixes _this case_ because it causes the server to send a
> non-empty response. But the client can't rely on that. Besides that not
> being a documented server behavior, in the worst case the connection
> could be severed mid-stream. So I think remote-https needs to complain
> about an empty response. This isn't a problem in v1 because it would
> actually try to look at that empty response; in v2 it's just proxying
> bytes around.

Ugh, it's actually much worse than this. We dealt with the
empty-response case in 296b847c0d (remote-curl: don't hang when a server
dies before any output, 2016-11-18). The issue here is that we got a
_partial_ response, and clone is waiting for the rest of it.

The server said "0011shallow-info\n" before it bailed. So from the
perspective of git-clone, it's now waiting to read through a flush
packet. But remote-https has nothing left to send. The root of the issue
is that it has no way to signal to clone that it has already sent
everything the server gave it. In non-helper code, clone would see the
EOF. And in v1 https, I think there's some extra framing between
remote-https and "fetch-pack --stateless-rpc" so that EOF effectively
gets passed along. But v2's stateless_connect() strategy is
fundamentally missing this signal, and there are probably other spots in
the protocol that could cause similar deadlocks.
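
To make that framing concrete (illustrative bytes only; each pkt-line starts
with a four-hex-digit length that includes its own 4-byte header, and "0000"
is the flush packet):

	0011shallow-info\n    <- the section header the server got out
	...                   <- clone expects shallow/unshallow lines here
	0000                  <- and eventually a flush packet, which never arrives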

I wonder if remote-https's stateless_connect() could complain if there's
no flush packet in the output it writes back. That would certainly fix
this case, but I'd worry there are rpc messages that don't end in a
flush. And it would be susceptible to data cut-offs that happened to
look like a flush packet.

I think the robust solution is to introduce an extra level of pktline
framing into the remote-helper stateless-connect protocol.

-Peff
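
A minimal sketch of the "complain if there's no flush packet" idea from the
message above (a hypothetical helper, not actual remote-curl.c code; as
noted, a response cut off just after something that looks like a flush would
still fool it):

	#include <stddef.h>
	#include <string.h>

	/*
	 * Hypothetical check: did the bytes proxied back to the client end
	 * with a pkt-line flush packet ("0000")?
	 */
	static int ends_with_flush_pkt(const char *buf, size_t len)
	{
		return len >= 4 && !memcmp(buf + len - 4, "0000", 4);
	}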
Denton Liu March 29, 2020, 5:34 a.m. UTC | #4
Hi Peff,

On Sat, Mar 28, 2020 at 11:49:36AM -0400, Jeff King wrote:
> On Sat, Mar 28, 2020 at 11:13:00AM -0400, Jeff King wrote:
> 
> > On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote:
> > 
> > > But since it works for v1 (which also dies!), and since you were able to
> > > reproduce the problem locally, I suspect it may be an issue in
> > > upload-pack itself.
> > 
> > Actually, I think the problem is on the client side.
> > 
> > If I run your test without the http-backend change, then nothing is left
> > running on the server side, but on the client we have two processes:
> > git-clone and the git-remote-https helper. And they're deadlocked trying
> > to read from each other.
> > 
> > I think the issue is that git-remote-https in v2 mode calls into
> > stateless_connect(), and just pumps http request/response pairs from
> > git-clone to the server. But if a request generates an empty response,
> > then clone has no idea that anything was received at all. It continues
> > waiting, but remote-https has looped to expect another request from it.
> > 
> > Your patch fixes _this case_ because it causes the server to send a
> > non-empty response. But the client can't rely on that. Besides that not
> > being a documented server behavior, in the worst case the connection
> > could be severed mid-stream. So I think remote-https needs to complain
> > about an empty response. This isn't a problem in v1 because it would
> > actually try to look at that empty response; in v2 it's just proxying
> > bytes around.
> 
> Ugh, it's actually much worse than this. We dealt with the
> empty-response case in 296b847c0d (remote-curl: don't hang when a server
> dies before any output, 2016-11-18). The issue here is that we got a
> _partial_ response, and clone is waiting for the rest of it.
> 
> The server said "0011shallow-info\n" before it bailed. So from the
> perspective of git-clone, it's now waiting to read through a flush
> packet. But remote-https has nothing left to send. The root of the issue
> is that it has no way to signal to clone that it has already sent
> everything the server gave it. In non-helper code, clone would see the
> EOF. And in v1 https, I think there's some extra framing between
> remote-https and "fetch-pack --stateless-rpc" so that EOF effectively
> gets passed along. But v2's stateless_connect() strategy is
> fundamentally missing this signal, and there are probably other spots in
> the protocol that could cause similar deadlocks.
> 
> I wonder if remote-https's stateless_connect() could complain if there's
> no flush packet in the output it writes back. That would certainly fix
> this case, but I'd worry there are rpc messages that don't end in a
> flush. And it would be susceptible to data cut-offs that happened to
> look like a flush packet.
> 
> I think the robust solution is to introduce an extra level of pktline
> framing into the remote-helper stateless-connect protocol.

Thanks for doing some more investigation and correcting my faulty
analysis. I'll use this information to try and create a new patch.

-Denton

> -Peff

Patch

diff --git a/http-backend.c b/http-backend.c
index ec3144b444..7ea3a688fd 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -488,10 +488,11 @@  static void run_service(const char **argv, int buffer_input)
 	cld.git_cmd = 1;
 	cld.clean_on_exit = 1;
 	cld.wait_after_clean = 1;
-	if (start_command(&cld))
+	if (start_command(&cld)) {
+		packet_write_fmt(1, "ERR %s failed to start\n", argv[0]);
 		exit(1);
+	}
 
-	close(1);
 	if (gzipped_request)
 		inflate_request(argv[0], cld.in, buffer_input, req_len);
 	else if (buffer_input)
@@ -501,8 +502,11 @@  static void run_service(const char **argv, int buffer_input)
 	else
 		close(0);
 
-	if (finish_command(&cld))
+	if (finish_command(&cld)) {
+		packet_write_fmt(1, "ERR %s failed\n", argv[0]);
 		exit(1);
+	}
+	close(1);
 }
 
 static int show_text_ref(const char *name, const struct object_id *oid,
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 5039e66dc4..69a920a34b 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -586,6 +586,23 @@  test_expect_success 'clone with http:// using protocol v2' '
 	! grep "Send header: Transfer-Encoding: chunked" log
 '
 
+test_expect_success 'clone with http:// using protocol v2 and invalid parameters' '
+	test_when_finished "rm -f log" &&
+
+	test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" \
+		git -c protocol.version=2 \
+		clone --shallow-since=20151012 "$HTTPD_URL/smart/http_parent" http_child_invalid &&
+
+	# Client requested to use protocol v2
+	grep "Git-Protocol: version=2" log &&
+	# Server responded using protocol v2
+	grep "git< version 2" log &&
+	# Verify that we successfully errored out
+	grep -F "ERR upload-pack failed" log &&
+	# Verify that the chunked encoding sending codepath is NOT exercised
+	! grep "Send header: Transfer-Encoding: chunked" log
+'
+
 test_expect_success 'clone big repository with http:// using protocol v2' '
 	test_when_finished "rm -f log" &&