diff mbox series

[v2,02/14] pkt-line: promote static buffer in packet_write_gently() to callers

Message ID 3b03a8ff7a72c101f82a685cc6f34a5dd37a9c4b.1612208747.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Simple IPC Mechanism | expand

Commit Message

Jeff Hostetler Feb. 1, 2021, 7:45 p.m. UTC
From: Jeff Hostetler <jeffhost@microsoft.com>

Move the static buffer used in `packet_write_gently()` to its callers.
This is a first step to make packet writing more thread-safe.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 pkt-line.c | 33 ++++++++++++++++++++++++---------
 pkt-line.h | 10 ++++++++--
 2 files changed, 32 insertions(+), 11 deletions(-)

Comments

Jeff King Feb. 2, 2021, 9:41 a.m. UTC | #1
On Mon, Feb 01, 2021 at 07:45:35PM +0000, Jeff Hostetler via GitGitGadget wrote:

> -static int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +/*
> + * Use the provided scratch space to build a combined <hdr><buf> buffer
> + * and write it to the file descriptor (in one write if possible).
> + */
> +static int packet_write_gently(const int fd_out, const char *buf, size_t size,
> +			       struct packet_scratch_space *scratch)

Thanks for addressing my stack space concern.

This solution does work (and I like wrapping it in a struct like this),
though I have to wonder if we're not just punting on the thread issues
in an ever-so-slight way with things like this:

>  void packet_write(int fd_out, const char *buf, size_t size)
>  {
> -	if (packet_write_gently(fd_out, buf, size))
> +	static struct packet_scratch_space scratch;
> +
> +	if (packet_write_gently(fd_out, buf, size, &scratch))
>  		die_errno(_("packet write failed"));
>  }

Where we just moved it one step up the call stack.

>  int write_packetized_from_fd(int fd_in, int fd_out)
>  {
> +	/*
> +	 * TODO We could save a memcpy() if we essentially inline
> +	 * TODO packet_write_gently() here and change the xread()
> +	 * TODO to pass &buf[4].
> +	 */

And comments like this make me wonder if the current crop of pktline
functions are just mis-designed in the first place. There are two
obvious directions here.

One, we can observe that the only reason we need the scratch space is to
ship out the whole thing in a single write():

> [in packet_write_gently]
> -	set_packet_header(packet_write_buffer, packet_size);
> -	memcpy(packet_write_buffer + 4, buf, size);
> -	if (write_in_full(fd_out, packet_write_buffer, packet_size) < 0)
> +
> +	set_packet_header(scratch->buffer, packet_size);
> +	memcpy(scratch->buffer + 4, buf, size);
> +
> +	if (write_in_full(fd_out, scratch->buffer, packet_size) < 0)
>  		return error(_("packet write failed"));

Would it really be so bad to do:

  char header[4];
  set_packet_header(header, packet_size);
  if (write_in_full(fd_out, header, 4) < 0 ||
      write_in_full(fd_out, buf, size) < 0)
          return error(...);

I doubt that two syscalls is breaking the bank here, but if people are
really concerned, using writev() would be a much better solution.
Obviously we can't rely on it being available everywhere, but it's quite
easy to emulate with a wrapper (and I'd be happy punt on any writev
stuff until somebody actually measures a difference).

The other direction is that callers could be using a correctly-sized
buffer in the first place. I.e., something like:

  struct packet_buffer {
          char full_packet[LARGE_PACKET_MAX];
  };
  static inline char *packet_data(struct packet_buffer *pb)
  {
	return pb->full_packet + 4;
  }

That lets people work with the oversized buffer in a natural-ish way
that would be hard to get wrong, like:

  memcpy(packet_data(pb), some_other_buf, len);

(though if we wanted to go even further, we could provide accessors that
actually do the writing and sanity-check the lengths; the downside is
that I'm not sure how callers typically get the bytes into these bufs in
the first place).

That's a much bigger change, of course, and I'd guess you'd much prefer
to focus on the actual point of your series. ;)

-Peff
Jeff Hostetler Feb. 2, 2021, 8:33 p.m. UTC | #2
On 2/2/21 4:41 AM, Jeff King wrote:
> On Mon, Feb 01, 2021 at 07:45:35PM +0000, Jeff Hostetler via GitGitGadget wrote:
> 
>> -static int packet_write_gently(const int fd_out, const char *buf, size_t size)
>> +/*
>> + * Use the provided scratch space to build a combined <hdr><buf> buffer
>> + * and write it to the file descriptor (in one write if possible).
>> + */
>> +static int packet_write_gently(const int fd_out, const char *buf, size_t size,
>> +			       struct packet_scratch_space *scratch)
> 
> Thanks for addressing my stack space concern.
> 
> This solution does work (and I like wrapping it in a struct like this),
> though I have to wonder if we're not just punting on the thread issues
> in an ever-so-slight way with things like this:
> 
>>   void packet_write(int fd_out, const char *buf, size_t size)
>>   {
>> -	if (packet_write_gently(fd_out, buf, size))
>> +	static struct packet_scratch_space scratch;
>> +
>> +	if (packet_write_gently(fd_out, buf, size, &scratch))
>>   		die_errno(_("packet write failed"));
>>   }
> 
> Where we just moved it one step up the call stack.
> 
>>   int write_packetized_from_fd(int fd_in, int fd_out)
>>   {
>> +	/*
>> +	 * TODO We could save a memcpy() if we essentially inline
>> +	 * TODO packet_write_gently() here and change the xread()
>> +	 * TODO to pass &buf[4].
>> +	 */
> 
> And comments like this make me wonder if the current crop of pktline
> functions are just mis-designed in the first place. There are two
> obvious directions here.
> 
> One, we can observe that the only reason we need the scratch space is to
> ship out the whole thing in a single write():
> 
>> [in packet_write_gently]
>> -	set_packet_header(packet_write_buffer, packet_size);
>> -	memcpy(packet_write_buffer + 4, buf, size);
>> -	if (write_in_full(fd_out, packet_write_buffer, packet_size) < 0)
>> +
>> +	set_packet_header(scratch->buffer, packet_size);
>> +	memcpy(scratch->buffer + 4, buf, size);
>> +
>> +	if (write_in_full(fd_out, scratch->buffer, packet_size) < 0)
>>   		return error(_("packet write failed"));
> 
> Would it really be so bad to do:
> 
>    char header[4];
>    set_packet_header(header, packet_size);
>    if (write_in_full(fd_out, header, 4) < 0 ||
>        write_in_full(fd_out, buf, size) < 0)
>            return error(...);
> 
> I doubt that two syscalls is breaking the bank here, but if people are
> really concerned, using writev() would be a much better solution.
> Obviously we can't rely on it being available everywhere, but it's quite
> easy to emulate with a wrapper (and I'd be happy punt on any writev
> stuff until somebody actually measures a difference).
> 
> The other direction is that callers could be using a correctly-sized
> buffer in the first place. I.e., something like:
> 
>    struct packet_buffer {
>            char full_packet[LARGE_PACKET_MAX];
>    };
>    static inline char *packet_data(struct packet_buffer *pb)
>    {
> 	return pb->full_packet + 4;
>    }
> 
> That lets people work with the oversized buffer in a natural-ish way
> that would be hard to get wrong, like:
> 
>    memcpy(packet_data(pb), some_other_buf, len);
> 
> (though if we wanted to go even further, we could provide accessors that
> actually do the writing and sanity-check the lengths; the downside is
> that I'm not sure how callers typically get the bytes into these bufs in
> the first place).
> 
> That's a much bigger change, of course, and I'd guess you'd much prefer
> to focus on the actual point of your series. ;)
> 
> -Peff
> 

Yeah, I had all of those thoughts and debates in my head.  I'm not sure
there is a clear winner here.  And I was trying to prevent this change
from having a massive footprint and all that.  The FSMonitor stuff is
enough to worry about...

Personally, I like the 2 syscall model (for now at least and not mess
with writev()).  There are only 3 calls to packet_write_gently() and
this fixes 2 of them without any local buffers.  I might as well update
the 1 caller of write_packetized_from_fd() to pass a buffer rather than
have a static buffer while we're at it.  Then all of those routines
are fixed.

Let me see what that looks like.

Thanks
Jeff
Johannes Schindelin Feb. 2, 2021, 10:54 p.m. UTC | #3
Hi Peff,

On Tue, 2 Feb 2021, Jeff King wrote:

> On Mon, Feb 01, 2021 at 07:45:35PM +0000, Jeff Hostetler via GitGitGadget wrote:
>
> > [in packet_write_gently]
> > -	set_packet_header(packet_write_buffer, packet_size);
> > -	memcpy(packet_write_buffer + 4, buf, size);
> > -	if (write_in_full(fd_out, packet_write_buffer, packet_size) < 0)
> > +
> > +	set_packet_header(scratch->buffer, packet_size);
> > +	memcpy(scratch->buffer + 4, buf, size);
> > +
> > +	if (write_in_full(fd_out, scratch->buffer, packet_size) < 0)
> >  		return error(_("packet write failed"));
>
> Would it really be so bad to do:
>
>   char header[4];
>   set_packet_header(header, packet_size);
>   if (write_in_full(fd_out, header, 4) < 0 ||
>       write_in_full(fd_out, buf, size) < 0)
>           return error(...);

There must have been a reason why the original code went out of its way to
copy the data. At least that's what I _assume_.

I could see, for example, that these extra round-trips just for the
header, really have a negative impact on network operations.

> I doubt that two syscalls is breaking the bank here, but if people are
> really concerned, using writev() would be a much better solution.

No, because there is no equivalent for that on Windows. And since Windows
is the primary target of our Simple IPC/FSMonitor work, that would break
the bank.

> Obviously we can't rely on it being available everywhere, but it's quite
> easy to emulate with a wrapper (and I'd be happy punt on any writev
> stuff until somebody actually measures a difference).
>
> The other direction is that callers could be using a correctly-sized
> buffer in the first place. I.e., something like:
>
>   struct packet_buffer {
>           char full_packet[LARGE_PACKET_MAX];
>   };
>   static inline char *packet_data(struct packet_buffer *pb)
>   {
> 	return pb->full_packet + 4;
>   }

Or we change it to

	struct packet_buffer {
		char count[4];
		char payload[LARGE_PACKET_MAX - 4];
	};

and then ask the callers to allocate one of those beauties
Not sure how well we can guarantee that the compiler won't pad this,
though.

And then there is `write_packetized_from_buf()` whose `src` parameter can
come from `convert_to_git()` that _definitely_ would not be of the desired
form.

So I guess if we can get away with the 2-syscall version, that's kind of
better than that.

Ciao,
Dscho

>
> That lets people work with the oversized buffer in a natural-ish way
> that would be hard to get wrong, like:
>
>   memcpy(packet_data(pb), some_other_buf, len);
>
> (though if we wanted to go even further, we could provide accessors that
> actually do the writing and sanity-check the lengths; the downside is
> that I'm not sure how callers typically get the bytes into these bufs in
> the first place).
>
> That's a much bigger change, of course, and I'd guess you'd much prefer
> to focus on the actual point of your series. ;)
>
> -Peff
>
Jeff King Feb. 3, 2021, 4:52 a.m. UTC | #4
On Tue, Feb 02, 2021 at 11:54:43PM +0100, Johannes Schindelin wrote:

> > Would it really be so bad to do:
> >
> >   char header[4];
> >   set_packet_header(header, packet_size);
> >   if (write_in_full(fd_out, header, 4) < 0 ||
> >       write_in_full(fd_out, buf, size) < 0)
> >           return error(...);
> 
> There must have been a reason why the original code went out of its way to
> copy the data. At least that's what I _assume_.

Having looked at the history, including the original mailing list
threads, it doesn't seem to be.

> I could see, for example, that these extra round-trips just for the
> header, really have a negative impact on network operations.

Keep in mind these won't be network round-trips. They're just syscall
round-trips. The OS would keep writing without an ACK while filling a
TCP window. The worst case may be an extra packet on the wire, though
the OS may end up coalescing the writes into a single packet anyway.

> > I doubt that two syscalls is breaking the bank here, but if people are
> > really concerned, using writev() would be a much better solution.
> 
> No, because there is no equivalent for that on Windows. And since Windows
> is the primary target of our Simple IPC/FSMonitor work, that would break
> the bank.

Are you concerned about the performance implications, or just
portability? Falling back to two writes (and wrapping that in a
function) would be easy for the latter. For the former, there's WSASend,
but I have no idea what kind of difficulties/caveats we might run into.

> > The other direction is that callers could be using a correctly-sized
> > buffer in the first place. I.e., something like:
> >
> >   struct packet_buffer {
> >           char full_packet[LARGE_PACKET_MAX];
> >   };
> >   static inline char *packet_data(struct packet_buffer *pb)
> >   {
> > 	return pb->full_packet + 4;
> >   }
> 
> Or we change it to
> 
> 	struct packet_buffer {
> 		char count[4];
> 		char payload[LARGE_PACKET_MAX - 4];
> 	};
> 
> and then ask the callers to allocate one of those beauties
> Not sure how well we can guarantee that the compiler won't pad this,
> though.

Yeah, I almost suggested the same, but wasn't sure about padding. I
think the standard allows there to be arbitrary padding between the two,
so it's really up to the ABI to define. I'd be surprised if this struct
is a problem in practice (we already have some structs which assume
4-byte alignment, and nobody seems to have complained).

> And then there is `write_packetized_from_buf()` whose `src` parameter can
> come from `convert_to_git()` that _definitely_ would not be of the desired
> form.

Yep. It really does need to either use two writes or to copy, because
it's slicing up a much larger buffer (it wouldn't be the end of the
world for it to allocate a single LARGE_PACKET_MAX heap buffer for the
duration of its run, though).

> So I guess if we can get away with the 2-syscall version, that's kind of
> better than that.

I do prefer it, because then the whole thing just becomes an
implementation detail that callers don't need to care about.

-Peff
diff mbox series

Patch

diff --git a/pkt-line.c b/pkt-line.c
index d633005ef74..14af049cd9c 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -194,26 +194,34 @@  int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }
 
-static int packet_write_gently(const int fd_out, const char *buf, size_t size)
+/*
+ * Use the provided scratch space to build a combined <hdr><buf> buffer
+ * and write it to the file descriptor (in one write if possible).
+ */
+static int packet_write_gently(const int fd_out, const char *buf, size_t size,
+			       struct packet_scratch_space *scratch)
 {
-	static char packet_write_buffer[LARGE_PACKET_MAX];
 	size_t packet_size;
 
-	if (size > sizeof(packet_write_buffer) - 4)
+	if (size > sizeof(scratch->buffer) - 4)
 		return error(_("packet write failed - data exceeds max packet size"));
 
 	packet_trace(buf, size, 1);
 	packet_size = size + 4;
-	set_packet_header(packet_write_buffer, packet_size);
-	memcpy(packet_write_buffer + 4, buf, size);
-	if (write_in_full(fd_out, packet_write_buffer, packet_size) < 0)
+
+	set_packet_header(scratch->buffer, packet_size);
+	memcpy(scratch->buffer + 4, buf, size);
+
+	if (write_in_full(fd_out, scratch->buffer, packet_size) < 0)
 		return error(_("packet write failed"));
 	return 0;
 }
 
 void packet_write(int fd_out, const char *buf, size_t size)
 {
-	if (packet_write_gently(fd_out, buf, size))
+	static struct packet_scratch_space scratch;
+
+	if (packet_write_gently(fd_out, buf, size, &scratch))
 		die_errno(_("packet write failed"));
 }
 
@@ -244,6 +252,12 @@  void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len)
 
 int write_packetized_from_fd(int fd_in, int fd_out)
 {
+	/*
+	 * TODO We could save a memcpy() if we essentially inline
+	 * TODO packet_write_gently() here and change the xread()
+	 * TODO to pass &buf[4].
+	 */
+	static struct packet_scratch_space scratch;
 	static char buf[LARGE_PACKET_DATA_MAX];
 	int err = 0;
 	ssize_t bytes_to_write;
@@ -254,7 +268,7 @@  int write_packetized_from_fd(int fd_in, int fd_out)
 			return COPY_READ_ERROR;
 		if (bytes_to_write == 0)
 			break;
-		err = packet_write_gently(fd_out, buf, bytes_to_write);
+		err = packet_write_gently(fd_out, buf, bytes_to_write, &scratch);
 	}
 	if (!err)
 		err = packet_flush_gently(fd_out);
@@ -263,6 +277,7 @@  int write_packetized_from_fd(int fd_in, int fd_out)
 
 int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
 {
+	static struct packet_scratch_space scratch;
 	int err = 0;
 	size_t bytes_written = 0;
 	size_t bytes_to_write;
@@ -274,7 +289,7 @@  int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
 			bytes_to_write = len - bytes_written;
 		if (bytes_to_write == 0)
 			break;
-		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
+		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write, &scratch);
 		bytes_written += bytes_to_write;
 	}
 	if (!err)
diff --git a/pkt-line.h b/pkt-line.h
index 8c90daa59ef..4ccd6f88926 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -5,6 +5,13 @@ 
 #include "strbuf.h"
 #include "sideband.h"
 
+#define LARGE_PACKET_MAX 65520
+#define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)
+
+struct packet_scratch_space {
+	char buffer[LARGE_PACKET_MAX];
+};
+
 /*
  * Write a packetized stream, where each line is preceded by
  * its length (including the header) as a 4-byte hex number.
@@ -213,8 +220,7 @@  enum packet_read_status packet_reader_read(struct packet_reader *reader);
 enum packet_read_status packet_reader_peek(struct packet_reader *reader);
 
 #define DEFAULT_PACKET_MAX 1000
-#define LARGE_PACKET_MAX 65520
-#define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)
+
 extern char packet_buffer[LARGE_PACKET_MAX];
 
 struct packet_writer {