diff mbox series

[v2] fetch-pack: try harder to read an ERR packet

Message ID 20200428074442.29830-1-chriscool@tuxfamily.org (mailing list archive)
State New, archived
Headers show
Series [v2] fetch-pack: try harder to read an ERR packet | expand

Commit Message

Christian Couder April 28, 2020, 7:44 a.m. UTC
From: SZEDER Gábor <szeder.dev@gmail.com>

When the server has hung up after sending an ERR packet to the
client, the client might still be writing, for example a "done"
line. Therefore the client might get a write error before reading
the ERR packet.

When fetching, this could result in the client displaying a
"Broken pipe" error, instead of the more useful error sent by
the server in the ERR packet.

Instead of using write_in_full() which immediately returns an
error when the server has hung up, let's use a new
write_in_full_read_err() function which will read all the
packets left before returning an error.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h      | 10 +++++++++-
 fetch-pack.c | 13 +++++++------
 wrapper.c    |  7 +++++--
 3 files changed, 21 insertions(+), 9 deletions(-)

Comments

Junio C Hamano April 28, 2020, 7:24 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> From: SZEDER Gábor <szeder.dev@gmail.com>
>
> When the server has hung up after sending an ERR packet to the
> client, the client might still be writing, for example a "done"
> line. Therefore the client might get a write error before reading
> the ERR packet.
>
> When fetching, this could result in the client displaying a
> "Broken pipe" error, instead of the more useful error sent by
> the server in the ERR packet.

Hmm, if the connection gets severed just before the ERR packet the
other side has written, we will see "Broken pipe" if we write
"done", and no amount of "try to read to collect as much what they
said as possible" would help.  If you are lucky and the connection
is broken after the ERR reaches on this side, such an "extra effort"
may help, but is it really worth the effort?  It is not clear to me
if the extra complexity, one more API function people need to learn,
and the need to think which one to use every time they want to say
write_in_full(), are justifiable.

I dunno.
Taylor Blau April 28, 2020, 7:59 p.m. UTC | #2
On Tue, Apr 28, 2020 at 12:24:06PM -0700, Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
> > From: SZEDER Gábor <szeder.dev@gmail.com>
> >
> > When the server has hung up after sending an ERR packet to the
> > client, the client might still be writing, for example a "done"
> > line. Therefore the client might get a write error before reading
> > the ERR packet.
> >
> > When fetching, this could result in the client displaying a
> > "Broken pipe" error, instead of the more useful error sent by
> > the server in the ERR packet.
>
> Hmm, if the connection gets severed just before the ERR packet the
> other side has written, we will see "Broken pipe" if we write
> "done", and no amount of "try to read to collect as much what they
> said as possible" would help.  If you are lucky and the connection
> is broken after the ERR reaches on this side, such an "extra effort"
> may help, but is it really worth the effort?  It is not clear to me
> if the extra complexity, one more API function people need to learn,
> and the need to think which one to use every time they want to say
> write_in_full(), are justifiable.
>
> I dunno.

I think that you're right. The more that I thought about my suggestion,
the more dumb I was convinced that it was.

Sorry.

Thanks,
Taylor
Jeff King April 28, 2020, 8:49 p.m. UTC | #3
On Tue, Apr 28, 2020 at 12:24:06PM -0700, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
> 
> > From: SZEDER Gábor <szeder.dev@gmail.com>
> >
> > When the server has hung up after sending an ERR packet to the
> > client, the client might still be writing, for example a "done"
> > line. Therefore the client might get a write error before reading
> > the ERR packet.
> >
> > When fetching, this could result in the client displaying a
> > "Broken pipe" error, instead of the more useful error sent by
> > the server in the ERR packet.
> 
> Hmm, if the connection gets severed just before the ERR packet the
> other side has written, we will see "Broken pipe" if we write
> "done", and no amount of "try to read to collect as much what they
> said as possible" would help.  If you are lucky and the connection
> is broken after the ERR reaches on this side, such an "extra effort"
> may help, but is it really worth the effort?  It is not clear to me
> if the extra complexity, one more API function people need to learn,
> and the need to think which one to use every time they want to say
> write_in_full(), are justifiable.

I think the "lucky" case happens pretty routinely. The situation we're
trying to catch here is that server does:

   packet_write("ERR I don't like your request for some reason");
   die("exiting");

On the client side, we'll get that final packet and then see that the
connection is closed. So if we get EPIPE or similar on write(), that
means we're seeing the closed connection. Which means we'll _always_
have gotten the ERR packet (in this situation).

So the problem it is solving is that there isn't really flow control in
the protocol. The server might be aborting and dumping out an error
response while the client is still writing. If the server were to
continue reading the client request before closing the connection, this
wouldn't happen. And that's what an HTTP server would be doing.

But I think that's pretty tricky to do in our programs, where the
protocol framing isn't so structured, and deciding when the client is
done talking often involves parsing their request. E.g., imagine we die
with "not our ref" due to a "want" line. We'd have to return an error up
the stack to the code that is reading want/have/done lines, so that it
knows to keep pumping the socket until it gets to a break point, and
_then_ die().

I think by contrast, just having the client handle EPIPE more gracefully
is a simpler fix.

-Peff
Jeff King April 28, 2020, 8:51 p.m. UTC | #4
On Tue, Apr 28, 2020 at 01:59:57PM -0600, Taylor Blau wrote:

> > > When the server has hung up after sending an ERR packet to the
> > > client, the client might still be writing, for example a "done"
> > > line. Therefore the client might get a write error before reading
> > > the ERR packet.
> > >
> > > When fetching, this could result in the client displaying a
> > > "Broken pipe" error, instead of the more useful error sent by
> > > the server in the ERR packet.
> >
> > Hmm, if the connection gets severed just before the ERR packet the
> > other side has written, we will see "Broken pipe" if we write
> > "done", and no amount of "try to read to collect as much what they
> > said as possible" would help.  If you are lucky and the connection
> > is broken after the ERR reaches on this side, such an "extra effort"
> > may help, but is it really worth the effort?  It is not clear to me
> > if the extra complexity, one more API function people need to learn,
> > and the need to think which one to use every time they want to say
> > write_in_full(), are justifiable.
> >
> > I dunno.
> 
> I think that you're right. The more that I thought about my suggestion,
> the more dumb I was convinced that it was.

Now I'm confused about which suggestion. The overall patch's purpose is
still good, I think (see my other response). But I agree that having an
alternate write_in_full() is spreading the hack too far outside of
fetch-pack (and as I wrote in [1], I really don't understand what it's
trying to do).

-Peff

[1] https://lore.kernel.org/git/20200428083336.GA2405176@coredump.intra.peff.net/
Junio C Hamano April 28, 2020, 9:02 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

>> Hmm, if the connection gets severed just before the ERR packet the
>> other side has written, we will see "Broken pipe" if we write
>> "done", and no amount of "try to read to collect as much what they
>> said as possible" would help.  If you are lucky and the connection
>> is broken after the ERR reaches on this side, such an "extra effort"
>> may help, but is it really worth the effort?  It is not clear to me
>> if the extra complexity, one more API function people need to learn,
>> and the need to think which one to use every time they want to say
>> write_in_full(), are justifiable.
>
> I think the "lucky" case happens pretty routinely. The situation we're
> trying to catch here is that server does:
>
>    packet_write("ERR I don't like your request for some reason");
>    die("exiting");

OK, if we assume that the communication route is never flaky and
everything writtten will go through before TCP shutdown that would
happen when die() kills the process (or like test environment that
most communication goes locally between two processes), sure, it may
look common enough to be worth "fixing".  I simply did not realize
that was the shared assumption behind this patch before I went back
to the original discussion that was about a racy test.

As long as an extra read on the side that just got a write error
won't throw us into a deadlock, I think I am OK, but I am still not
sure if the code complexity to have two write_in_full() is worth it.

If the mechanism to do this were limited to the packet IO layer, it
may be more palatable, though.
Jeff King April 28, 2020, 9:17 p.m. UTC | #6
On Tue, Apr 28, 2020 at 02:02:33PM -0700, Junio C Hamano wrote:

> > I think the "lucky" case happens pretty routinely. The situation we're
> > trying to catch here is that server does:
> >
> >    packet_write("ERR I don't like your request for some reason");
> >    die("exiting");
> 
> OK, if we assume that the communication route is never flaky and
> everything writtten will go through before TCP shutdown that would
> happen when die() kills the process (or like test environment that
> most communication goes locally between two processes), sure, it may
> look common enough to be worth "fixing".  I simply did not realize
> that was the shared assumption behind this patch before I went back
> to the original discussion that was about a racy test.

Certainly the communication route will _sometimes_ be flaky. And there's
nothing we can do there except say "Broken pipe" or similar, whether the
other side said ERR or not. So any reads have to be speculative. But
assuming TCP is functioning, some portion of our ERR packets are simply
dropped from the incoming TCP buffers. I suspect we don't get more
reports of this in the wild (and are mostly annoyed by it in racy tests)
because:

  - ERR conditions don't happen all that often (though we have been
    adding more recently)

  - users may not realize they _could_ have gotten a good message
    (especially since historically many of these conditions did just
    involve the server hanging up)

  - I think the race may be easier to win on real networks with latency.
    Locally, if I write "want bogus" and then "done", the other side may
    process the "want" between the two and close the pipe. But if
    there's 50ms between client and server, we've usually managed to
    "done" into the systems TCP buffer, if not onto the wire, by the
    time the other side has figured out the error and gotten a FIN
    packet back to us.

> As long as an extra read on the side that just got a write error
> won't throw us into a deadlock, I think I am OK, but I am still not
> sure if the code complexity to have two write_in_full() is worth it.

Yes, I don't see the point of that.

> If the mechanism to do this were limited to the packet IO layer, it
> may be more palatable, though.

Agreed. There's not a single place where we write, though, since we
often form packets in local buffers and then write() them out manually.
So it does have to be sprinkled around fetch-pack.c. But certainly the
damage can be limited to that client network code.

-Peff
Junio C Hamano April 28, 2020, 9:23 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

>> If the mechanism to do this were limited to the packet IO layer, it
>> may be more palatable, though.
>
> Agreed. There's not a single place where we write, though, since we
> often form packets in local buffers and then write() them out manually.
> So it does have to be sprinkled around fetch-pack.c. But certainly the
> damage can be limited to that client network code.

Yeah, I do not mind "sprinkling all over the place in client network
code" at all.  The ideal is if we never used write_in_full() and
always used this "write but if we get a write error then check for
ERR packet" helper consistently (which means we'd convert all the
writes done in that layer, and by definition that may have to be
"all over the place in the client networking code").

Thanks.
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index 0f0485ecfe..efeee637b8 100644
--- a/cache.h
+++ b/cache.h
@@ -1808,10 +1808,18 @@  int copy_file_with_time(const char *dst, const char *src, int mode);
 void write_or_die(int fd, const void *buf, size_t count);
 void fsync_or_die(int fd, const char *);
 
+struct packet_reader;
+
 ssize_t read_in_full(int fd, void *buf, size_t count);
-ssize_t write_in_full(int fd, const void *buf, size_t count);
+ssize_t write_in_full_read_err(int fd, const void *buf, size_t count,
+			       struct packet_reader *reader);
 ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
 
+static inline ssize_t write_in_full(int fd, const void *buf, size_t count)
+{
+	return write_in_full_read_err(fd, buf, count, NULL);
+}
+
 static inline ssize_t write_str_in_full(int fd, const char *str)
 {
 	return write_in_full(fd, str, strlen(str));
diff --git a/fetch-pack.c b/fetch-pack.c
index 0b07b3ee73..febdf54bf4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -185,13 +185,14 @@  static enum ack_type get_ack(struct packet_reader *reader,
 }
 
 static void send_request(struct fetch_pack_args *args,
-			 int fd, struct strbuf *buf)
+			 int fd, struct strbuf *buf,
+			 struct packet_reader *reader)
 {
 	if (args->stateless_rpc) {
 		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
 		packet_flush(fd);
 	} else {
-		if (write_in_full(fd, buf->buf, buf->len) < 0)
+		if (write_in_full_read_err(fd, buf->buf, buf->len, reader) < 0)
 			die_errno(_("unable to write to remote"));
 	}
 }
@@ -349,7 +350,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 		const char *arg;
 		struct object_id oid;
 
-		send_request(args, fd[1], &req_buf);
+		send_request(args, fd[1], &req_buf, &reader);
 		while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
 			if (skip_prefix(reader.line, "shallow ", &arg)) {
 				if (get_oid_hex(arg, &oid))
@@ -372,7 +373,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 			die(_("expected shallow/unshallow, got %s"), reader.line);
 		}
 	} else if (!args->stateless_rpc)
-		send_request(args, fd[1], &req_buf);
+		send_request(args, fd[1], &req_buf, &reader);
 
 	if (!args->stateless_rpc) {
 		/* If we aren't using the stateless-rpc interface
@@ -395,7 +396,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 			int ack;
 
 			packet_buf_flush(&req_buf);
-			send_request(args, fd[1], &req_buf);
+			send_request(args, fd[1], &req_buf, &reader);
 			strbuf_setlen(&req_buf, state_len);
 			flushes++;
 			flush_at = next_flush(args->stateless_rpc, count);
@@ -470,7 +471,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 	trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
 	if (!got_ready || !no_done) {
 		packet_buf_write(&req_buf, "done\n");
-		send_request(args, fd[1], &req_buf);
+		send_request(args, fd[1], &req_buf, &reader);
 	}
 	print_verbose(args, _("done"));
 	if (retval != 0) {
diff --git a/wrapper.c b/wrapper.c
index 3a1c0e0526..2da7a15382 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -3,6 +3,7 @@ 
  */
 #include "cache.h"
 #include "config.h"
+#include "pkt-line.h"
 
 static int memory_limit_check(size_t size, int gentle)
 {
@@ -291,7 +292,8 @@  ssize_t read_in_full(int fd, void *buf, size_t count)
 	return total;
 }
 
-ssize_t write_in_full(int fd, const void *buf, size_t count)
+ssize_t write_in_full_read_err(int fd, const void *buf, size_t count,
+			       struct packet_reader *reader)
 {
 	const char *p = buf;
 	ssize_t total = 0;
@@ -300,7 +302,8 @@  ssize_t write_in_full(int fd, const void *buf, size_t count)
 		ssize_t written = xwrite(fd, p, count);
 		if (written < 0)
 			return -1;
-		if (!written) {
+		if (!written &&
+		    (!reader || packet_reader_read(reader) == PACKET_READ_EOF)) {
 			errno = ENOSPC;
 			return -1;
 		}