diff mbox series

[v2,7/7] stateless-connect: send response end packet

Message ID 4b079bcd83ea80b8a0e81b0c1e3d5e083efeb9c6.1589816719.git.liu.denton@gmail.com (mailing list archive)
State New, archived
Headers show
Series remote-curl: fix deadlocks when remote server disconnects | expand

Commit Message

Denton Liu May 18, 2020, 3:47 p.m. UTC
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated before the transaction is
complete, remote-curl will blindly forward the packets before waiting on
more input from fetch-pack. Meanwhile, fetch-pack will read the
transaction and continue reading, expecting more input to continue the
transaction. This results in a deadlock between the two processes.

This can be seen in the following command which does not terminate:

	$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
	Cloning into 'git'...

whereas the v1 version does terminate as expected:

	$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
	Cloning into 'git'...
	fatal: the remote end hung up unexpectedly

Instead of blindly forwarding packets, make remote-curl insert response
end and flush packets after proxying the responses from the remote
server when using stateless_connect(). On the RPC client side, ensure
that each response ends as described.

A separate control packet is chosen because we need to be able to
differentiate between what the remote server sends and remote-curl's
control packets. By ensuring in the remote-curl code that a server
cannot send response end packets, we prevent a malicious server from
being able to perform a denial of service attack in which they spoof a
response end packet and cause the described deadlock to happen.

Reported-by: Force Charlie <charlieio@outlook.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 Documentation/gitremote-helpers.txt     |  4 +++-
 Documentation/technical/protocol-v2.txt |  2 ++
 builtin/fetch-pack.c                    |  2 +-
 connect.c                               | 10 +++++++++-
 fetch-pack.c                            | 12 ++++++++++++
 remote-curl.c                           |  7 +++++++
 remote.h                                |  3 ++-
 t/t5702-protocol-v2.sh                  | 13 +++++++++++++
 transport.c                             |  3 ++-
 9 files changed, 51 insertions(+), 5 deletions(-)

Comments

Jeff King May 18, 2020, 4:43 p.m. UTC | #1
On Mon, May 18, 2020 at 11:47:24AM -0400, Denton Liu wrote:

> A separate control packet is chosen because we need to be able to
> differentiate between what the remote server sends and remote-curl's
> control packets. By ensuring in the remote-curl code that a server
> cannot send response end packets, we prevent a malicious server from
> being able to perform a denial of service attack in which they spoof a
> response end packet and cause the described deadlock to happen.

Thanks for thinking about this. I was wondering what mischief a server
might be able to cause by sending such a packet.

> @@ -446,6 +447,13 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
>  	if (reader->status != PACKET_READ_FLUSH)
>  		die(_("expected flush after ref listing"));
>  
> +	if (stateless_rpc) {
> +		if (packet_reader_read(reader) != PACKET_READ_RESPONSE_END)
> +			die(_("expected response end packet after ref listing"));
> +		if (packet_reader_read(reader) != PACKET_READ_FLUSH)
> +			die(_("expected flush packet after response end"));
> +	}

Having two packets here surprised me. We'd have already received the
actual flush from the server (as you can see in the context above).
Wouldn't a single RESPONSE_END be enough to signal the reader?

> diff --git a/fetch-pack.c b/fetch-pack.c
> index f73a2ce6cb..bcbbb7e2fb 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1468,6 +1468,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  	struct fetch_negotiator negotiator_alloc;
>  	struct fetch_negotiator *negotiator;
>  	int seen_ack = 0;
> +	int check_http_delimiter;

This flag was more complicated than I expected, and I'm not sure how we
can easily be certain that all necessary paths are covered.

E.g., in FETCH_PROCESS_ACKS, we'll always be reading a response via
process_acks(). Why do COMMON_FOUND and NO_COMMON_FOUND check for
end-of-response, but READY doesn't? I think the answer is that we'd
continue to read the same response via FETCH_GET_PACK in this instance.

I just wonder if there is a better place to put this logic that would be
more certain to catch every place we'd expect to read to the end of a
response. But I suppose not. We could push it down into process_acks(),
but it would have the same READY logic that's here. I'll admit part of
my complaint is that the existing do_fetch_pack_v2 state-machine switch
is kind of hard to follow, but that's not your fault.

Two things that might help:

  - put a comment in READY to indicate why we don't need to check
    end-of-response (but do for its sibling cases)

  - instead of setting and clearing a variable and then using it to
    check at the end of the loop, call check_http_delimiter(&reader,
    args->stateless_rpc) at the moment we're done with the packet. The
    actual executed code would have the same behavior, I think, but it
    might be easier to understand that we're trying to hit the end of
    each response.

Also, it probably should be check_stateless_delimiter() or something.
There could be other helpers that support stateless-connect besides
http.

Speaking of which: this is a change to the remote-helper protocol, since
we're now expecting stateless-connect helpers to produce these delim
packets (and failing if they don't). I kind of doubt that anybody but
remote-curl has implemented v2 stateless-connect, but should we be
marking this with an extra capability to be on the safe side?

> +		if (args->stateless_rpc && check_http_delimiter) {
> +			if (packet_reader_read(&reader) != PACKET_READ_RESPONSE_END)
> +				die(_("git fetch-pack: expected response end packet"));
> +			if (packet_reader_read(&reader) != PACKET_READ_FLUSH)
> +				die(_("git fetch-pack: expected flush packet"));
> +		}

I wondered also if stateless_rpc might catch cases besides
stateless-connect. But I guess we're in the v2 fetch-pack path, and that
_only_ understands stateless-connect.

-Peff
Denton Liu May 18, 2020, 5:12 p.m. UTC | #2
Hi Peff,

On Mon, May 18, 2020 at 12:43:08PM -0400, Jeff King wrote:
> On Mon, May 18, 2020 at 11:47:24AM -0400, Denton Liu wrote:

[...]

> > @@ -446,6 +447,13 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
> >  	if (reader->status != PACKET_READ_FLUSH)
> >  		die(_("expected flush after ref listing"));
> >  
> > +	if (stateless_rpc) {
> > +		if (packet_reader_read(reader) != PACKET_READ_RESPONSE_END)
> > +			die(_("expected response end packet after ref listing"));
> > +		if (packet_reader_read(reader) != PACKET_READ_FLUSH)
> > +			die(_("expected flush packet after response end"));
> > +	}
> 
> Having two packets here surprised me. We'd have already received the
> actual flush from the server (as you can see in the context above).
> Wouldn't a single RESPONSE_END be enough to signal the reader?

Yes, having a single RESPONSE_END is enough. I sent a flush packet
because it seemed appropriate when I was writing the patch to end all
messages with a flush but I suppose that's just cargo culting. I'll
remove it in the next iteration.

> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index f73a2ce6cb..bcbbb7e2fb 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -1468,6 +1468,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> >  	struct fetch_negotiator negotiator_alloc;
> >  	struct fetch_negotiator *negotiator;
> >  	int seen_ack = 0;
> > +	int check_http_delimiter;
> 
> This flag was more complicated than I expected, and I'm not sure how we
> can easily be certain that all necessary paths are covered.

Essentially, any time we're in a path where state is set to
FETCH_SEND_REQUEST or FETCH_DONE, we know that we've finished processing
the current request and we can check for response end packets. Of course,
the one exception to this is the first iteration of the loop when we're
transitioning from FETCH_CHECK_LOCAL to FETCH_SEND_REQUEST where we
can't check for response end packets since nothing has been requested
yet.

> E.g., in FETCH_PROCESS_ACKS, we'll always be reading a response via
> process_acks(). Why do COMMON_FOUND and NO_COMMON_FOUND check for
> end-of-response, but READY doesn't? I think the answer is that we'd
> continue to read the same response via FETCH_GET_PACK in this instance.
> 
> I just wonder if there is a better place to put this logic that would be
> more certain to catch every place we'd expect to read to the end of a
> response. But I suppose not. We could push it down into process_acks(),
> but it would have the same READY logic that's here. I'll admit part of
> my complaint is that the existing do_fetch_pack_v2 state-machine switch
> is kind of hard to follow, but that's not your fault.

I debated between the current implementation and doing something like

	int first_iteration = 1;

	...

	while (state != FETCH_DONE) {
		switch (...) {
			...
		}

		if (args->stateless_rpc && !first_iteration && (state == FETCH_SEND_REQUEST || state == FETCH_DONE)) {
			if (packet_reader_read(&reader) != PACKET_READ_RESPONSE_END)
				die(_("git fetch-pack: expected response end packet"));
			if (packet_reader_read(&reader) != PACKET_READ_FLUSH)
				die(_("git fetch-pack: expected flush packet"));
		}
		first_iteration = 0;
	}

I think that this catches _all_ the cases without fiddling with any of
the state machine logic.

[...]

> Also, it probably should be check_stateless_delimiter() or something.
> There could be other helpers that support stateless-connect besides
> http.

Yeah... I forgot to change the check_http_delimiter name when I was
doing cleanup. Whoops.

> Speaking of which: this is a change to the remote-helper protocol, since
> we're now expecting stateless-connect helpers to produce these delim
> packets (and failing if they don't). I kind of doubt that anybody but
> remote-curl has implemented v2 stateless-connect, but should we be
> marking this with an extra capability to be on the safe side?

I think that we're probably safe from breaking anything external.
According to the gitremote-helpers documentation, 

	'stateless-connect'::
		Experimental; for internal use only.

so I think that gives us a bit of leeway in terms of the changes we can
make to the stateless-connect protocol. They've been warned ;)

Thanks,

Denton
Jeff King May 18, 2020, 5:26 p.m. UTC | #3
On Mon, May 18, 2020 at 01:12:49PM -0400, Denton Liu wrote:

> > I just wonder if there is a better place to put this logic that would be
> > more certain to catch every place we'd expect to read to the end of a
> > response. But I suppose not. We could push it down into process_acks(),
> > but it would have the same READY logic that's here. I'll admit part of
> > my complaint is that the existing do_fetch_pack_v2 state-machine switch
> > is kind of hard to follow, but that's not your fault.
> 
> I debated between the current implementation and doing something like
> 
> 	int first_iteration = 1;
> 
> 	...
> 
> 	while (state != FETCH_DONE) {
> 		switch (...) {
> 			...
> 		}
> 
> 		if (args->stateless_rpc && !first_iteration && (state == FETCH_SEND_REQUEST || state == FETCH_DONE)) {
> 			if (packet_reader_read(&reader) != PACKET_READ_RESPONSE_END)
> 				die(_("git fetch-pack: expected response end packet"));
> 			if (packet_reader_read(&reader) != PACKET_READ_FLUSH)
> 				die(_("git fetch-pack: expected flush packet"));
> 		}
> 		first_iteration = 0;
> 	}
> 
> I think that this catches _all_ the cases without fiddling with any of
> the state machine logic.

I think you're right that it does, but TBH I find it harder to follow,
because now we're even further away from the actual response reads. What
I most want to verify is:

  - there is no response-reading code path that does not check the
    delimiters

  - there is no code path where we check delimiters but did not actually
    read a response

So to me, putting the check as close to the response-reads as possible
makes sense. I think the patch below is equivalent and easier to verify,
but I admit it's somewhat subjective:

diff --git a/fetch-pack.c b/fetch-pack.c
index 5325649eda..0bf1e5760c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1443,6 +1443,17 @@ static void receive_wanted_refs(struct packet_reader *reader,
 		die(_("error processing wanted refs: %d"), reader->status);
 }
 
+static void check_stateless_delimiter(struct fetch_pack_args *args,
+				      struct packet_reader *reader)
+{
+	if (!args->stateless_rpc)
+		return; /* not in stateless mode, no delimiter expected */
+	if (packet_reader_read(reader) != PACKET_READ_RESPONSE_END)
+		die(_("git fetch-pack: expected response end packet"));
+	if (packet_reader_read(reader) != PACKET_READ_FLUSH)
+		die(_("git fetch-pack: expected flush packet"));
+}
+
 enum fetch_state {
 	FETCH_CHECK_LOCAL = 0,
 	FETCH_SEND_REQUEST,
@@ -1469,7 +1480,6 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
 	int seen_ack = 0;
-	int check_http_delimiter;
 
 	if (args->no_dependents) {
 		negotiator = NULL;
@@ -1488,8 +1498,6 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	}
 
 	while (state != FETCH_DONE) {
-		check_http_delimiter = 0;
-
 		switch (state) {
 		case FETCH_CHECK_LOCAL:
 			sort_ref_list(&ref, ref_compare_name);
@@ -1538,15 +1546,19 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			/* Process ACKs/NAKs */
 			switch (process_acks(negotiator, &reader, &common)) {
 			case READY:
+				/*
+				 * Don't check for response delimiter; get_pack() will
+				 * read the rest of this response.
+				 */
 				state = FETCH_GET_PACK;
 				break;
 			case COMMON_FOUND:
 				in_vain = 0;
 				seen_ack = 1;
 				/* fallthrough */
 			case NO_COMMON_FOUND:
 				state = FETCH_SEND_REQUEST;
-				check_http_delimiter = 1;
+				check_stateless_delimiter(args, &reader);
 				break;
 			}
 			break;
@@ -1565,20 +1577,13 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfile, sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
+			check_stateless_delimiter(args, &reader);
 
 			state = FETCH_DONE;
-			check_http_delimiter = 1;
 			break;
 		case FETCH_DONE:
 			continue;
 		}
-
-		if (args->stateless_rpc && check_http_delimiter) {
-			if (packet_reader_read(&reader) != PACKET_READ_RESPONSE_END)
-				die(_("git fetch-pack: expected response end packet"));
-			if (packet_reader_read(&reader) != PACKET_READ_FLUSH)
-				die(_("git fetch-pack: expected flush packet"));
-		}
 	}
 
 	if (negotiator)

> > Speaking of which: this is a change to the remote-helper protocol, since
> > we're now expecting stateless-connect helpers to produce these delim
> > packets (and failing if they don't). I kind of doubt that anybody but
> > remote-curl has implemented v2 stateless-connect, but should we be
> > marking this with an extra capability to be on the safe side?
> 
> I think that we're probably safe from breaking anything external.
> According to the gitremote-helpers documentation, 
> 
> 	'stateless-connect'::
> 		Experimental; for internal use only.
> 
> so I think that gives us a bit of leeway in terms of the changes we can
> make to the stateless-connect protocol. They've been warned ;)

OK, I didn't realize we had explicitly marked it as experimental. I
agree we're probably fine here, then (now that v2 is starting to settle
a bit more, we might think about lifting that warning, but this patch
series shows we're not quite there yet :) ).

-Peff
diff mbox series

Patch

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index f48a031dc3..84f8e92b23 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -405,7 +405,9 @@  Supported if the helper has the "connect" capability.
 	trying to fall back).  After line feed terminating the positive
 	(empty) response, the output of the service starts.  Messages
 	(both request and response) must consist of zero or more
-	PKT-LINEs, terminating in a flush packet. The client must not
+	PKT-LINEs, terminating in a flush packet. Response messages will
+	have a response end packet before the flush packet to indicate
+	the end of a response.  The client must not
 	expect the server to store any state in between request-response
 	pairs.  After the connection ends, the remote helper exits.
 +
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 7e3766cafb..3996d70891 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -33,6 +33,8 @@  In protocol v2 these special packets will have the following semantics:
 
   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
   * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+  * '0002' Message Packet (response-end-pkt) - indicates the end of a response
+    for stateless connections
 
 Initial Client Request
 ----------------------
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 4771100072..94b0c89b82 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -224,7 +224,7 @@  int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	version = discover_version(&reader);
 	switch (version) {
 	case protocol_v2:
-		get_remote_refs(fd[1], &reader, &ref, 0, NULL, NULL);
+		get_remote_refs(fd[1], &reader, &ref, 0, NULL, NULL, args.stateless_rpc);
 		break;
 	case protocol_v1:
 	case protocol_v0:
diff --git a/connect.c b/connect.c
index 11c6ec70a0..12b57f5c0a 100644
--- a/connect.c
+++ b/connect.c
@@ -409,7 +409,8 @@  static int process_ref_v2(const char *line, struct ref ***list)
 struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     struct ref **list, int for_push,
 			     const struct argv_array *ref_prefixes,
-			     const struct string_list *server_options)
+			     const struct string_list *server_options,
+			     int stateless_rpc)
 {
 	int i;
 	*list = NULL;
@@ -446,6 +447,13 @@  struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	if (reader->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ref listing"));
 
+	if (stateless_rpc) {
+		if (packet_reader_read(reader) != PACKET_READ_RESPONSE_END)
+			die(_("expected response end packet after ref listing"));
+		if (packet_reader_read(reader) != PACKET_READ_FLUSH)
+			die(_("expected flush packet after response end"));
+	}
+
 	return list;
 }
 
diff --git a/fetch-pack.c b/fetch-pack.c
index f73a2ce6cb..bcbbb7e2fb 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1468,6 +1468,7 @@  static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
 	int seen_ack = 0;
+	int check_http_delimiter;
 
 	if (args->no_dependents) {
 		negotiator = NULL;
@@ -1486,6 +1487,8 @@  static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	}
 
 	while (state != FETCH_DONE) {
+		check_http_delimiter = 0;
+
 		switch (state) {
 		case FETCH_CHECK_LOCAL:
 			sort_ref_list(&ref, ref_compare_name);
@@ -1542,6 +1545,7 @@  static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				/* fallthrough */
 			case NO_COMMON_FOUND:
 				state = FETCH_SEND_REQUEST;
+				check_http_delimiter = 1;
 				break;
 			}
 			break;
@@ -1562,10 +1566,18 @@  static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				die(_("git fetch-pack: fetch failed."));
 
 			state = FETCH_DONE;
+			check_http_delimiter = 1;
 			break;
 		case FETCH_DONE:
 			continue;
 		}
+
+		if (args->stateless_rpc && check_http_delimiter) {
+			if (packet_reader_read(&reader) != PACKET_READ_RESPONSE_END)
+				die(_("git fetch-pack: expected response end packet"));
+			if (packet_reader_read(&reader) != PACKET_READ_FLUSH)
+				die(_("git fetch-pack: expected flush packet"));
+		}
 	}
 
 	if (negotiator)
diff --git a/remote-curl.c b/remote-curl.c
index d02cb547e9..8a72b5ee7a 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -703,6 +703,8 @@  static void check_pktline(struct check_pktline_state *state, const char *ptr, si
 				state->remaining = packet_length(state->len_buf);
 				if (state->remaining < 0) {
 					die(_("remote-curl: bad line length character: %.4s"), state->len_buf);
+				} else if (state->remaining == 2) {
+					die(_("remote-curl: unexpected response end packet"));
 				} else if (state->remaining < 4) {
 					state->remaining = 0;
 				} else {
@@ -991,6 +993,11 @@  static int post_rpc(struct rpc_state *rpc, int stateless_connect, int flush_rece
 	if (rpc_in_data.pktline_state.remaining)
 		err = error(_("%d bytes of body are still expected"), rpc_in_data.pktline_state.remaining);
 
+	if (stateless_connect) {
+		packet_response_end(rpc->in);
+		packet_flush(rpc->in);
+	}
+
 	curl_slist_free_all(headers);
 	free(gzip_body);
 	return err;
diff --git a/remote.h b/remote.h
index 11d8719b58..5cc26c1b3b 100644
--- a/remote.h
+++ b/remote.h
@@ -179,7 +179,8 @@  struct ref **get_remote_heads(struct packet_reader *reader,
 struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     struct ref **list, int for_push,
 			     const struct argv_array *ref_prefixes,
-			     const struct string_list *server_options);
+			     const struct string_list *server_options,
+			     int stateless_rpc);
 
 int resolve_remote_symref(struct ref *ref, struct ref *list);
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 4eb81ba2d4..8da65e60de 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -620,6 +620,19 @@  test_expect_success 'clone repository with http:// using protocol v2 with incomp
 	test_i18ngrep "bytes of body are still expected" err
 '
 
+test_expect_success 'clone with http:// using protocol v2 and invalid parameters' '
+	test_when_finished "rm -f log" &&
+
+	test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" \
+		git -c protocol.version=2 \
+		clone --shallow-since=20151012 "$HTTPD_URL/smart/http_parent" http_child_invalid &&
+
+	# Client requested to use protocol v2
+	grep "Git-Protocol: version=2" log &&
+	# Server responded using protocol v2
+	grep "git< version 2" log
+'
+
 test_expect_success 'clone big repository with http:// using protocol v2' '
 	test_when_finished "rm -f log" &&
 
diff --git a/transport.c b/transport.c
index a6002e502f..182978f4be 100644
--- a/transport.c
+++ b/transport.c
@@ -297,7 +297,8 @@  static struct ref *handshake(struct transport *transport, int for_push,
 		if (must_list_refs)
 			get_remote_refs(data->fd[1], &reader, &refs, for_push,
 					ref_prefixes,
-					transport->server_options);
+					transport->server_options,
+					transport->stateless_rpc);
 		break;
 	case protocol_v1:
 	case protocol_v0: