diff mbox series

mingw: handle writes to non-blocking pipe

Message ID 72d007c5-9429-1612-24d7-af5db906dd63@web.de (mailing list archive)
State New, archived
Headers show
Series mingw: handle writes to non-blocking pipe | expand

Commit Message

René Scharfe Aug. 10, 2022, 5:39 a.m. UTC
write() on Windows reports ENOSPC when writing to a non-blocking pipe
whose buffer is full and rejects writes bigger than the buffer outright.
Change the error code to EAGAIN and try a buffer-sized partial write to
comply with POSIX and the expections of our Git-internal callers.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 compat/mingw.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

--
2.37.1.windows.1

Comments

Johannes Schindelin Aug. 10, 2022, 9:07 a.m. UTC | #1
Hi René,

On Wed, 10 Aug 2022, René Scharfe wrote:

> write() on Windows reports ENOSPC when writing to a non-blocking pipe
> whose buffer is full and rejects writes bigger than the buffer outright.
> Change the error code to EAGAIN and try a buffer-sized partial write to
> comply with POSIX and the expections of our Git-internal callers.

Excellent analysis, thank you!

However, let's reword this to clarify that the error code is set to EAGAIN
only if the buffer-sized partial write fails.

>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  compat/mingw.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/compat/mingw.c b/compat/mingw.c
> index b5502997e2..c6f244c0fe 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -689,6 +689,8 @@ int mingw_fflush(FILE *stream)
>  	return ret;
>  }
>
> +#define PIPE_BUFFER_SIZE (8192)

This constant hails all the way back from 897bb8cb2c2 (Windows: A pipe()
replacement whose ends are not inherited to children., 2007-12-07), in
case anyone wondered like I did where that number came from (and why it is
so low).

It is outside the purview of this patch to change that constant, therefore
I am fine with leaving this as-is.

> +
>  #undef write
>  ssize_t mingw_write(int fd, const void *buf, size_t len)
>  {
> @@ -702,6 +704,21 @@ ssize_t mingw_write(int fd, const void *buf, size_t len)
>  		else
>  			errno = EINVAL;
>  	}
> +	if (result < 0 && errno == ENOSPC) {

It might make the code clearer to turn this into an `else if`.

> +		/* check if fd is a non-blocking pipe */
> +		HANDLE h = (HANDLE) _get_osfhandle(fd);
> +		DWORD s;
> +		if (GetFileType(h) == FILE_TYPE_PIPE &&
> +		    GetNamedPipeHandleState(h, &s, NULL, NULL, NULL, NULL, 0) &&
> +		    (s & PIPE_NOWAIT)) {
> +			DWORD obuflen;
> +			if (!GetNamedPipeInfo(h, NULL, &obuflen, NULL, NULL))
> +				obuflen = PIPE_BUFFER_SIZE;
> +			if (len > obuflen)
> +				return mingw_write(fd, buf, obuflen);

It is probably easier to reason about to recurse instead of using a `goto`
here.

Thank you for this patch!
Dscho

> +			errno = EAGAIN;
> +		}
> +	}
>
>  	return result;
>  }
> @@ -1079,7 +1096,7 @@ int pipe(int filedes[2])
>  	HANDLE h[2];
>
>  	/* this creates non-inheritable handles */
> -	if (!CreatePipe(&h[0], &h[1], NULL, 8192)) {
> +	if (!CreatePipe(&h[0], &h[1], NULL, PIPE_BUFFER_SIZE)) {
>  		errno = err_win_to_posix(GetLastError());
>  		return -1;
>  	}
> --
> 2.37.1.windows.1
>
Jeff King Aug. 10, 2022, 8:02 p.m. UTC | #2
On Wed, Aug 10, 2022 at 07:39:40AM +0200, René Scharfe wrote:

> write() on Windows reports ENOSPC when writing to a non-blocking pipe
> whose buffer is full and rejects writes bigger than the buffer outright.
> Change the error code to EAGAIN and try a buffer-sized partial write to
> comply with POSIX and the expections of our Git-internal callers.

Bearing in mind that I have no qualifications for reviewing
Windows-specific patches, this seems like the right thing to be doing
from the behavior you saw.

One question:

> +	if (result < 0 && errno == ENOSPC) {
> +		/* check if fd is a non-blocking pipe */
> +		HANDLE h = (HANDLE) _get_osfhandle(fd);
> +		DWORD s;
> +		if (GetFileType(h) == FILE_TYPE_PIPE &&
> +		    GetNamedPipeHandleState(h, &s, NULL, NULL, NULL, NULL, 0) &&
> +		    (s & PIPE_NOWAIT)) {
> +			DWORD obuflen;
> +			if (!GetNamedPipeInfo(h, NULL, &obuflen, NULL, NULL))
> +				obuflen = PIPE_BUFFER_SIZE;
> +			if (len > obuflen)
> +				return mingw_write(fd, buf, obuflen);
> +			errno = EAGAIN;
> +		}

OK, so we call GetNamedPipeInfo() to find the size of the pipe buffer.
It's unclear to me from Microsoft's docs if that is the _total_ size, or
if it's the remaining available size. Hopefully the latter, since none
of this works otherwise. ;)

But two corner cases:

  - If we fail to get the size, we guess that it's the maximum. Is this
    dangerous? I'm not sure why the call would fail, but if for some
    reason it did fail and we can't make forward progress, would we
    enter an infinite recursion of mingw_write()? Would it be safer to
    bail with EAGAIN in such a case (through granted, that probably just
    puts us into an infinite loop in xwrite())?

  - According to the docs:

      If the buffer size is zero, the buffer is allocated as needed.

    If we see this case, we'd then call mingw_write() with 0 bytes,
    which I imagine also makes no forward progress (though maybe we
    eventually return a successful 0-byte write?).

-Peff
René Scharfe Aug. 10, 2022, 10:34 p.m. UTC | #3
Am 10.08.2022 um 22:02 schrieb Jeff King:
> On Wed, Aug 10, 2022 at 07:39:40AM +0200, René Scharfe wrote:
>
>> write() on Windows reports ENOSPC when writing to a non-blocking pipe
>> whose buffer is full and rejects writes bigger than the buffer outright.
>> Change the error code to EAGAIN and try a buffer-sized partial write to
>> comply with POSIX and the expections of our Git-internal callers.
>
> Bearing in mind that I have no qualifications for reviewing
> Windows-specific patches, this seems like the right thing to be doing
> from the behavior you saw.
>
> One question:
>
>> +	if (result < 0 && errno == ENOSPC) {
>> +		/* check if fd is a non-blocking pipe */
>> +		HANDLE h = (HANDLE) _get_osfhandle(fd);
>> +		DWORD s;
>> +		if (GetFileType(h) == FILE_TYPE_PIPE &&
>> +		    GetNamedPipeHandleState(h, &s, NULL, NULL, NULL, NULL, 0) &&
>> +		    (s & PIPE_NOWAIT)) {
>> +			DWORD obuflen;
>> +			if (!GetNamedPipeInfo(h, NULL, &obuflen, NULL, NULL))
>> +				obuflen = PIPE_BUFFER_SIZE;
>> +			if (len > obuflen)
>> +				return mingw_write(fd, buf, obuflen);
>> +			errno = EAGAIN;
>> +		}
>
> OK, so we call GetNamedPipeInfo() to find the size of the pipe buffer.
> It's unclear to me from Microsoft's docs if that is the _total_ size, or
> if it's the remaining available size. Hopefully the latter, since none
> of this works otherwise. ;)
>
> But two corner cases:
>
>   - If we fail to get the size, we guess that it's the maximum. Is this
>     dangerous? I'm not sure why the call would fail, but if for some
>     reason it did fail and we can't make forward progress, would we
>     enter an infinite recursion of mingw_write()? Would it be safer to
>     bail with EAGAIN in such a case (through granted, that probably just
>     puts us into an infinite loop in xwrite())?

AFAIU it's the total size, not the available space.  I think I confused
it with PIPE_BUF, which we should use instead.

Alternatively we could retry with ever smaller sizes, down to one byte,
to avoid EAGAIN as much as possible.  Sounds costly, though.

>
>   - According to the docs:
>
>       If the buffer size is zero, the buffer is allocated as needed.
>
>     If we see this case, we'd then call mingw_write() with 0 bytes,
>     which I imagine also makes no forward progress (though maybe we
>     eventually return a successful 0-byte write?).

Ah, yes, forgot that case.

René
Jeff King Aug. 11, 2022, 8:47 a.m. UTC | #4
On Thu, Aug 11, 2022 at 12:34:46AM +0200, René Scharfe wrote:

> > OK, so we call GetNamedPipeInfo() to find the size of the pipe buffer.
> > It's unclear to me from Microsoft's docs if that is the _total_ size, or
> > if it's the remaining available size. Hopefully the latter, since none
> > of this works otherwise. ;)
> >
> > But two corner cases:
> >
> >   - If we fail to get the size, we guess that it's the maximum. Is this
> >     dangerous? I'm not sure why the call would fail, but if for some
> >     reason it did fail and we can't make forward progress, would we
> >     enter an infinite recursion of mingw_write()? Would it be safer to
> >     bail with EAGAIN in such a case (through granted, that probably just
> >     puts us into an infinite loop in xwrite())?
> 
> AFAIU it's the total size, not the available space.  I think I confused
> it with PIPE_BUF, which we should use instead.

Hmm. If it's giving us the total size, that seems like it may fail in a
case like this:

  - we write a smaller amount to the pipe, say 7168 bytes. That leaves
    1024 bytes in the buffer. The reader reads nothing yet.

  - now we try to write another 4096 bytes. That's too big, so we get
    ENOSPC. But when we ask for the pipe size, it tells us 8192. So we
    _don't_ try to do a partial write of the remaining size, and return
    EAGAIN. But now we've made no forward progress (i.e., even though
    poll() said we could write, we don't, and we end up in the xwrite
    loop).

So we really do want to try to get a partial write. Even a single byte
means we are making forward progress.

> Alternatively we could retry with ever smaller sizes, down to one byte,
> to avoid EAGAIN as much as possible.  Sounds costly, though.

It's definitely not optimal, but it may not be too bad. If we cut the
size in half each time, then at worst we make log2(N) extra write
attempts, and we end up with a partial write within 50% of the optimal
size.

-Peff
René Scharfe Aug. 11, 2022, 5:35 p.m. UTC | #5
Am 11.08.2022 um 10:47 schrieb Jeff King:
> On Thu, Aug 11, 2022 at 12:34:46AM +0200, René Scharfe wrote:
>
>>> OK, so we call GetNamedPipeInfo() to find the size of the pipe buffer.
>>> It's unclear to me from Microsoft's docs if that is the _total_ size, or
>>> if it's the remaining available size. Hopefully the latter, since none
>>> of this works otherwise. ;)
>>>
>>> But two corner cases:
>>>
>>>   - If we fail to get the size, we guess that it's the maximum. Is this
>>>     dangerous? I'm not sure why the call would fail, but if for some
>>>     reason it did fail and we can't make forward progress, would we
>>>     enter an infinite recursion of mingw_write()? Would it be safer to
>>>     bail with EAGAIN in such a case (through granted, that probably just
>>>     puts us into an infinite loop in xwrite())?
>>
>> AFAIU it's the total size, not the available space.  I think I confused
>> it with PIPE_BUF, which we should use instead.
>
> Hmm. If it's giving us the total size, that seems like it may fail in a
> case like this:
>
>   - we write a smaller amount to the pipe, say 7168 bytes. That leaves
>     1024 bytes in the buffer. The reader reads nothing yet.
>
>   - now we try to write another 4096 bytes. That's too big, so we get
>     ENOSPC. But when we ask for the pipe size, it tells us 8192. So we
>     _don't_ try to do a partial write of the remaining size, and return
>     EAGAIN. But now we've made no forward progress (i.e., even though
>     poll() said we could write, we don't, and we end up in the xwrite
>     loop).
>
> So we really do want to try to get a partial write. Even a single byte
> means we are making forward progress.
>
>> Alternatively we could retry with ever smaller sizes, down to one byte,
>> to avoid EAGAIN as much as possible.  Sounds costly, though.
>
> It's definitely not optimal, but it may not be too bad. If we cut the
> size in half each time, then at worst we make log2(N) extra write
> attempts, and we end up with a partial write within 50% of the optimal
> size.

OK, but we can't just split anything up as we like: POSIX wants us to
keep writes up to PIPE_BUF atomic.  When I read that name I mistakenly
thought it was the size of the pipe buffer, but it's a different value.
The minimum value according to POSIX is 512 bytes; on Linux it's 4KB.

And Windows doesn't seem to define it.  Bash's ulimit -p returns 8,
which is in units of 512 bytes, so it's 4KB like on Linux.  But that's
apparently not respected by Windows' write.

I just realized that we can interprete the situation slightly
differently.  Windows has effectively PIPE_BUF = 2^32, which means all
writes are atomic.  I don't see POSIX specifying a maximum allowed
value, so this must be allowed.  Which means you cannot rely on write
performing partial writes.  Makes sense?

If we accept that, then we need a special write function for
non-blocking pipes that chops the data into small enough chunks.

Another oddity: t3701 with yesterday's patch finishes fine with -i, but
hangs without it (not just when running it via prove).  O_o

René
Jeff King Aug. 11, 2022, 6:20 p.m. UTC | #6
On Thu, Aug 11, 2022 at 07:35:33PM +0200, René Scharfe wrote:

> OK, but we can't just split anything up as we like: POSIX wants us to
> keep writes up to PIPE_BUF atomic.  When I read that name I mistakenly
> thought it was the size of the pipe buffer, but it's a different value.
> The minimum value according to POSIX is 512 bytes; on Linux it's 4KB.
> 
> And Windows doesn't seem to define it.  Bash's ulimit -p returns 8,
> which is in units of 512 bytes, so it's 4KB like on Linux.  But that's
> apparently not respected by Windows' write.
> 
> I just realized that we can interprete the situation slightly
> differently.  Windows has effectively PIPE_BUF = 2^32, which means all
> writes are atomic.  I don't see POSIX specifying a maximum allowed
> value, so this must be allowed.  Which means you cannot rely on write
> performing partial writes.  Makes sense?

Hmm, I'm not sure. POSIX does allow writing a single byte if that's all
the space there is, but only if the original _request_ was for more than
PIPE_BUF. Which I guess matches what you're saying.

If the original request is smaller than PIPE_BUF, it does seem to say
that EAGAIN is the correct output. But it also says:

  There is no exception regarding partial writes when O_NONBLOCK is set.
  With the exception of writing to an empty pipe, this volume of
  POSIX.1-2017 does not specify exactly when a partial write is
  performed since that would require specifying internal details of the
  implementation. Every application should be prepared to handle partial
  writes when O_NONBLOCK is set and the requested amount is greater than
  {PIPE_BUF}, just as every application should be prepared to handle
  partial writes on other kinds of file descriptors.

  The intent of forcing writing at least one byte if any can be written
  is to assure that each write makes progress if there is any room in
  the pipe. If the pipe is empty, {PIPE_BUF} bytes must be written; if
  not, at least some progress must have been made.

So they are clearly aware of the "we need to make forward progress"
problem. But how are you supposed to do that if you only have less than
PIPE_BUF bytes left to write? And likewise, even if it is technically
legal, having a PIPE_BUF of 2^32 seems like a quality-of-implementation
issue.

And that doesn't really match what poll() is saying. The response from
poll() told us we could write without blocking, which implies at least
PIPE_BUF bytes available. But clearly it isn't available, since the
write fails (with EAGAIN, but that is equivalent to blocking in POSIX's
view here).

Lawyering aside, I think it would be OK to move forward with cutting up
writes at least to a size of 512 bytes. Git runs on lots of platforms,
and none of the code can make an assumption that PIPE_BUF is larger than
that. I.e., we are reducing atomicity provided by Windows, but that's
OK.

I don't think that solves our problem fully, though. We might need to
write 5 bytes, and telling mingw_write() to do so means it's supposed to
abide by PIPE_BUF conventions. But again, we are in control of the
calling code here. I don't think there's any reason that we care about
PIPE_BUF atomicity. Certainly we don't get those atomicity guarantees on
the socket side, which is where much of our writing happens, and our
code is prepared to handle partial writes of any size. So we could just
ignore that guarantee here.

> If we accept that, then we need a special write function for
> non-blocking pipes that chops the data into small enough chunks.

I'm not sure what "small enough" we can rely on, though. Really it is
the interplay between poll() and write() that we care about here. We
would like to know at what point poll() will tell us it's OK to write().
But we don't know what the OS thinks of that.

Or maybe we do, since I think poll() is our own compat layer. But it
just seems to be based on select(). We do seem to use PeekNamedPipe()
there to look on the reading side, but I don't know if there's an
equivalent for writing.

> Another oddity: t3701 with yesterday's patch finishes fine with -i, but
> hangs without it (not just when running it via prove).  O_o

Yes, definitely strange. I'd expect weirdness with "-v", for example,
because of terminal stuff, but "-i" should have no impact on the running
environment.

-Peff
René Scharfe Aug. 14, 2022, 3:37 p.m. UTC | #7
Am 11.08.2022 um 20:20 schrieb Jeff King:
> On Thu, Aug 11, 2022 at 07:35:33PM +0200, René Scharfe wrote:
>
>> OK, but we can't just split anything up as we like: POSIX wants us to
>> keep writes up to PIPE_BUF atomic.  When I read that name I mistakenly
>> thought it was the size of the pipe buffer, but it's a different value.
>> The minimum value according to POSIX is 512 bytes; on Linux it's 4KB.
>>
>> And Windows doesn't seem to define it.  Bash's ulimit -p returns 8,
>> which is in units of 512 bytes, so it's 4KB like on Linux.  But that's
>> apparently not respected by Windows' write.
>>
>> I just realized that we can interprete the situation slightly
>> differently.  Windows has effectively PIPE_BUF = 2^32, which means all
>> writes are atomic.  I don't see POSIX specifying a maximum allowed
>> value, so this must be allowed.  Which means you cannot rely on write
>> performing partial writes.  Makes sense?
>
> Hmm, I'm not sure. POSIX does allow writing a single byte if that's all
> the space there is, but only if the original _request_ was for more than
> PIPE_BUF. Which I guess matches what you're saying.
>
> If the original request is smaller than PIPE_BUF, it does seem to say
> that EAGAIN is the correct output. But it also says:
>
>   There is no exception regarding partial writes when O_NONBLOCK is set.
>   With the exception of writing to an empty pipe, this volume of
>   POSIX.1-2017 does not specify exactly when a partial write is
>   performed since that would require specifying internal details of the
>   implementation. Every application should be prepared to handle partial
>   writes when O_NONBLOCK is set and the requested amount is greater than
>   {PIPE_BUF}, just as every application should be prepared to handle
>   partial writes on other kinds of file descriptors.
>
>   The intent of forcing writing at least one byte if any can be written
>   is to assure that each write makes progress if there is any room in
>   the pipe. If the pipe is empty, {PIPE_BUF} bytes must be written; if
>   not, at least some progress must have been made.
>
> So they are clearly aware of the "we need to make forward progress"
> problem. But how are you supposed to do that if you only have less than
> PIPE_BUF bytes left to write? And likewise, even if it is technically
> legal, having a PIPE_BUF of 2^32 seems like a quality-of-implementation
> issue.
>
> And that doesn't really match what poll() is saying. The response from
> poll() told us we could write without blocking, which implies at least
> PIPE_BUF bytes available. But clearly it isn't available, since the
> write fails (with EAGAIN, but that is equivalent to blocking in POSIX's
> view here).

Turns out that's not the case on Windows: 94f4d01932 (mingw: workaround
for hangs when sending STDIN, 2020-02-17) changed the compatibility
implementation to 'Make `poll()` always reply "writable" for write end
of the pipe.'.

An updated version of the test helper confirms it (patch below) by
reporting on Windows:

chunk size: 1 bytes
 EAGAIN after 8192 bytes
chunk size: 500 bytes
 EAGAIN after 8000 bytes
chunk size: 1000 bytes
 EAGAIN after 8000 bytes
chunk size: 1024 bytes
 EAGAIN after 8192 bytes
chunk size: 100000 bytes
 partial write of 8192 bytes after 0 bytes
 EAGAIN after 8192 bytes

On Debian I get this instead:

chunk size: 1 bytes
 POLLOUT gone after 61441 bytes
 EAGAIN after 65536 bytes
chunk size: 500 bytes
 POLLOUT gone after 60500 bytes
 EAGAIN after 64000 bytes
chunk size: 1000 bytes
 POLLOUT gone after 61000 bytes
 EAGAIN after 64000 bytes
chunk size: 1024 bytes
 POLLOUT gone after 62464 bytes
 EAGAIN after 65536 bytes
chunk size: 100000 bytes
 partial write of 65536 bytes after 0 bytes
 POLLOUT gone after 65536 bytes
 EAGAIN after 65536 bytes

So on Windows the POLLOUT bit is indeed never unset.

> Lawyering aside, I think it would be OK to move forward with cutting up
> writes at least to a size of 512 bytes. Git runs on lots of platforms,
> and none of the code can make an assumption that PIPE_BUF is larger than
> that. I.e., we are reducing atomicity provided by Windows, but that's
> OK.
>
> I don't think that solves our problem fully, though. We might need to
> write 5 bytes, and telling mingw_write() to do so means it's supposed to
> abide by PIPE_BUF conventions. But again, we are in control of the
> calling code here. I don't think there's any reason that we care about
> PIPE_BUF atomicity. Certainly we don't get those atomicity guarantees on
> the socket side, which is where much of our writing happens, and our
> code is prepared to handle partial writes of any size. So we could just
> ignore that guarantee here.
>
>> If we accept that, then we need a special write function for
>> non-blocking pipes that chops the data into small enough chunks.
>
> I'm not sure what "small enough" we can rely on, though. Really it is
> the interplay between poll() and write() that we care about here. We
> would like to know at what point poll() will tell us it's OK to write().
> But we don't know what the OS thinks of that.

Based on the output above I think Linux' poll() won't consider a pipe
writable that has less than PIPE_BUF (4096) available bytes.

> Or maybe we do, since I think poll() is our own compat layer. But it
> just seems to be based on select(). We do seem to use PeekNamedPipe()
> there to look on the reading side, but I don't know if there's an
> equivalent for writing.

This has been elusive so far (see 94f4d01932).

Perhaps we should take the advice about PIPE_NOWAIT in the docs serious
and use overlapping (asynchronous) writes on Windows instead.  This
would mean reimplementing the whole pipe_command() with Windows API
commands, I imagine.

>> Another oddity: t3701 with yesterday's patch finishes fine with -i, but
>> hangs without it (not just when running it via prove).  O_o
>
> Yes, definitely strange. I'd expect weirdness with "-v", for example,
> because of terminal stuff, but "-i" should have no impact on the running
> environment.
It's especially grating because test runs also take very looong.

Avoiding xwrite() in pump_io_round() on top lets the test suite
finish successfully.

René


---
 Makefile                 |  1 +
 t/helper/test-nonblock.c | 73 ++++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 4 files changed, 76 insertions(+)
 create mode 100644 t/helper/test-nonblock.c

diff --git a/Makefile b/Makefile
index d9c00cc05d..0bc028ca00 100644
--- a/Makefile
+++ b/Makefile
@@ -751,6 +751,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
 TEST_BUILTINS_OBJS += test-match-trees.o
 TEST_BUILTINS_OBJS += test-mergesort.o
 TEST_BUILTINS_OBJS += test-mktemp.o
+TEST_BUILTINS_OBJS += test-nonblock.o
 TEST_BUILTINS_OBJS += test-oid-array.o
 TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-oidtree.o
diff --git a/t/helper/test-nonblock.c b/t/helper/test-nonblock.c
new file mode 100644
index 0000000000..c9350ce894
--- /dev/null
+++ b/t/helper/test-nonblock.c
@@ -0,0 +1,73 @@
+#include "test-tool.h"
+#include "compat/nonblock.h"
+
+static void fill_pipe(size_t write_len)
+{
+	void *buf = xcalloc(1, write_len);
+	int fds[2];
+	size_t total_written = 0;
+	int last = 0;
+	struct pollfd pfd;
+	int writable = 1;
+
+	if (pipe(fds))
+		die_errno("pipe failed");
+	if (enable_nonblock(fds[1]))
+		die_errno("enable_nonblock failed");
+	pfd.fd = fds[1];
+	pfd.events = POLLOUT;
+
+	printf("chunk size: %"PRIuMAX" bytes\n", write_len);
+	for (;;) {
+		ssize_t written;
+		if (poll(&pfd, 1, 0) < 0) {
+			if (errno == EINTR)
+				continue;
+			die_errno("poll failed");
+		}
+		if (writable && !(pfd.revents & POLLOUT)) {
+			writable = 0;
+			printf(" POLLOUT gone after %"PRIuMAX" bytes\n",
+			       total_written);
+		}
+		written = write(fds[1], buf, write_len);
+		if (written < 0) {
+			switch (errno) {
+			case EAGAIN:
+				printf(" EAGAIN after %"PRIuMAX" bytes\n",
+				       total_written);
+				break;
+			case ENOSPC:
+				printf(" ENOSPC after %"PRIuMAX" bytes\n",
+				       total_written);
+				break;
+			default:
+				printf(" errno %d after %"PRIuMAX" bytes\n",
+				       errno, total_written);
+			}
+			break;
+		} else if (written != write_len)
+			printf(" partial write of %"PRIuMAX" bytes"
+			       " after %"PRIuMAX" bytes\n",
+			       (uintmax_t)written, total_written);
+		if (last)
+			break;
+		if (written > 0)
+			total_written += written;
+		last = !written;
+	};
+
+	close(fds[1]);
+	close(fds[0]);
+	free(buf);
+}
+
+int cmd__nonblock(int argc, const char **argv)
+{
+	fill_pipe(1);
+	fill_pipe(500);
+	fill_pipe(1000);
+	fill_pipe(1024);
+	fill_pipe(100000);
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c..562d7a9161 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -45,6 +45,7 @@ static struct test_cmd cmds[] = {
 	{ "match-trees", cmd__match_trees },
 	{ "mergesort", cmd__mergesort },
 	{ "mktemp", cmd__mktemp },
+	{ "nonblock", cmd__nonblock },
 	{ "oid-array", cmd__oid_array },
 	{ "oidmap", cmd__oidmap },
 	{ "oidtree", cmd__oidtree },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb79927163..d9006a5298 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -36,6 +36,7 @@ int cmd__lazy_init_name_hash(int argc, const char **argv);
 int cmd__match_trees(int argc, const char **argv);
 int cmd__mergesort(int argc, const char **argv);
 int cmd__mktemp(int argc, const char **argv);
+int cmd__nonblock(int argc, const char **argv);
 int cmd__oidmap(int argc, const char **argv);
 int cmd__oidtree(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
--
2.37.1.windows.1
Jeff King Aug. 17, 2022, 5:39 a.m. UTC | #8
On Sun, Aug 14, 2022 at 05:37:08PM +0200, René Scharfe wrote:

> Turns out that's not the case on Windows: 94f4d01932 (mingw: workaround
> for hangs when sending STDIN, 2020-02-17) changed the compatibility
> implementation to 'Make `poll()` always reply "writable" for write end
> of the pipe.'.

Ah, good find. That kind of explains everything, then, I think. ;)

> > I'm not sure what "small enough" we can rely on, though. Really it is
> > the interplay between poll() and write() that we care about here. We
> > would like to know at what point poll() will tell us it's OK to write().
> > But we don't know what the OS thinks of that.
> 
> Based on the output above I think Linux' poll() won't consider a pipe
> writable that has less than PIPE_BUF (4096) available bytes.

Right, that makes sense. It would have to in order to meet the atomicity
requirement for write(), but still always make forward progress for each
write().

> Perhaps we should take the advice about PIPE_NOWAIT in the docs serious
> and use overlapping (asynchronous) writes on Windows instead.  This
> would mean reimplementing the whole pipe_command() with Windows API
> commands, I imagine.

I wouldn't be opposed to that, in the sense that it's supposed to be a
black box to the caller, and it's relatively small in size. But I think
we're pretty close to having something usable without that, so I'd like
to pursue a smaller fix in the interim.

> Avoiding xwrite() in pump_io_round() on top lets the test suite
> finish successfully.

That makes sense. We end up busy-looping between poll() and write()
while we wait for our read descriptor to become available. But if poll()
doesn't block, that's the best we can do.

-Peff
diff mbox series

Patch

diff --git a/compat/mingw.c b/compat/mingw.c
index b5502997e2..c6f244c0fe 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -689,6 +689,8 @@  int mingw_fflush(FILE *stream)
 	return ret;
 }

+#define PIPE_BUFFER_SIZE (8192)
+
 #undef write
 ssize_t mingw_write(int fd, const void *buf, size_t len)
 {
@@ -702,6 +704,21 @@  ssize_t mingw_write(int fd, const void *buf, size_t len)
 		else
 			errno = EINVAL;
 	}
+	if (result < 0 && errno == ENOSPC) {
+		/* check if fd is a non-blocking pipe */
+		HANDLE h = (HANDLE) _get_osfhandle(fd);
+		DWORD s;
+		if (GetFileType(h) == FILE_TYPE_PIPE &&
+		    GetNamedPipeHandleState(h, &s, NULL, NULL, NULL, NULL, 0) &&
+		    (s & PIPE_NOWAIT)) {
+			DWORD obuflen;
+			if (!GetNamedPipeInfo(h, NULL, &obuflen, NULL, NULL))
+				obuflen = PIPE_BUFFER_SIZE;
+			if (len > obuflen)
+				return mingw_write(fd, buf, obuflen);
+			errno = EAGAIN;
+		}
+	}

 	return result;
 }
@@ -1079,7 +1096,7 @@  int pipe(int filedes[2])
 	HANDLE h[2];

 	/* this creates non-inheritable handles */
-	if (!CreatePipe(&h[0], &h[1], NULL, 8192)) {
+	if (!CreatePipe(&h[0], &h[1], NULL, PIPE_BUFFER_SIZE)) {
 		errno = err_win_to_posix(GetLastError());
 		return -1;
 	}