diff mbox series

t1400: avoid SIGPIPE race condition on fifo

Message ID YUIsZCTqm56KfilP@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series t1400: avoid SIGPIPE race condition on fifo | expand

Commit Message

Jeff King Sept. 15, 2021, 5:24 p.m. UTC
On Wed, Sep 15, 2021 at 01:13:25PM -0400, Jeff King wrote:

> I think this test may be racy. I saw a strange failure from it in CI:
> 
>   https://github.com/peff/git/runs/3605506649?check_suite_focus=true#step:5:6734
> 
> I can't reproduce the problem locally with "--stress", but the failure
> there is on macOS (and likewise, a nearby run failed with a timeout just
> for macOS, which could be caused by a racy deadlock).

Ah, I just wasn't trying hard enough. Using --run=1,190 lets it run a
lot more tightly, and I got a failure pretty quickly. Here's the fix (on
top of ps/update-ref-batch-flush).

-- >8 --
Subject: [PATCH] t1400: avoid SIGPIPE race condition on fifo

t1400.190 sometimes fails or even hangs because of the way it uses
fifos. Our goal is to interactively read and write lines from
update-ref, so we have two fifos, in and out. We open a descriptor
connected to "in" and redirect output to that, so that update-ref does
not see EOF as it would if we opened and closed it for each "echo" call.

But we don't do the same for the output. This leads to a race where our
"read response <out" has not yet opened the fifo, but update-ref tries
to write to it and gets SIGPIPE. This can result in the test failing, or
worse, hanging as we wait forever for somebody to write to the pipe.

This is the same proble we fixed in 4783e7ea83 (t0008: avoid SIGPIPE
race condition on fifo, 2013-07-12), and we can fix it the same way, by
opening a second long-running descriptor.

Before this patch, running:

  ./t1400-update-ref.sh --run=1,190 --stress

failed or hung within a few dozen iterations. After it, I ran it for
several hundred without problems.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1400-update-ref.sh | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Sept. 15, 2021, 8:08 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> On Wed, Sep 15, 2021 at 01:13:25PM -0400, Jeff King wrote:
>
>> I think this test may be racy. I saw a strange failure from it in CI:
>> 
>>   https://github.com/peff/git/runs/3605506649?check_suite_focus=true#step:5:6734
>> 
>> I can't reproduce the problem locally with "--stress", but the failure
>> there is on macOS (and likewise, a nearby run failed with a timeout just
>> for macOS, which could be caused by a racy deadlock).
>
> Ah, I just wasn't trying hard enough. Using --run=1,190 lets it run a
> lot more tightly, and I got a failure pretty quickly. Here's the fix (on
> top of ps/update-ref-batch-flush).
>
> -- >8 --
> Subject: [PATCH] t1400: avoid SIGPIPE race condition on fifo
>
> t1400.190 sometimes fails or even hangs because of the way it uses
> fifos. Our goal is to interactively read and write lines from
> update-ref, so we have two fifos, in and out. We open a descriptor
> connected to "in" and redirect output to that, so that update-ref does
> not see EOF as it would if we opened and closed it for each "echo" call.

Thanks.  I think I see the same breakage in last night's pushout of
'seen'.

Will queue.
Patrick Steinhardt Sept. 16, 2021, 4:39 a.m. UTC | #2
On Wed, Sep 15, 2021 at 01:24:52PM -0400, Jeff King wrote:
> On Wed, Sep 15, 2021 at 01:13:25PM -0400, Jeff King wrote:
> 
> > I think this test may be racy. I saw a strange failure from it in CI:
> > 
> >   https://github.com/peff/git/runs/3605506649?check_suite_focus=true#step:5:6734
> > 
> > I can't reproduce the problem locally with "--stress", but the failure
> > there is on macOS (and likewise, a nearby run failed with a timeout just
> > for macOS, which could be caused by a racy deadlock).
> 
> Ah, I just wasn't trying hard enough. Using --run=1,190 lets it run a
> lot more tightly, and I got a failure pretty quickly. Here's the fix (on
> top of ps/update-ref-batch-flush).
> 
> -- >8 --
> Subject: [PATCH] t1400: avoid SIGPIPE race condition on fifo
> 
> t1400.190 sometimes fails or even hangs because of the way it uses
> fifos. Our goal is to interactively read and write lines from
> update-ref, so we have two fifos, in and out. We open a descriptor
> connected to "in" and redirect output to that, so that update-ref does
> not see EOF as it would if we opened and closed it for each "echo" call.
> 
> But we don't do the same for the output. This leads to a race where our
> "read response <out" has not yet opened the fifo, but update-ref tries
> to write to it and gets SIGPIPE. This can result in the test failing, or
> worse, hanging as we wait forever for somebody to write to the pipe.
> 
> This is the same proble we fixed in 4783e7ea83 (t0008: avoid SIGPIPE

Type: "proble" -> "problem".

> race condition on fifo, 2013-07-12), and we can fix it the same way, by
> opening a second long-running descriptor.
> 
> Before this patch, running:
> 
>   ./t1400-update-ref.sh --run=1,190 --stress
> 
> failed or hung within a few dozen iterations. After it, I ran it for
> several hundred without problems.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t1400-update-ref.sh | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index 1e754e258f..0d4f73acaa 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -1603,19 +1603,21 @@ test_expect_success PIPE 'transaction flushes status updates' '
>  	(git update-ref --stdin <in >out &) &&
>  
>  	exec 9>in &&
> +	exec 8<out &&
>  	test_when_finished "exec 9>&-" &&
> +	test_when_finished "exec 8<&-" &&
>  
>  	echo "start" >&9 &&
>  	echo "start: ok" >expected &&
> -	read line <out &&
> +	read line <&8 &&
>  	echo "$line" >actual &&
>  	test_cmp expected actual &&
>  
>  	echo "create refs/heads/flush $A" >&9 &&
>  
>  	echo prepare >&9 &&
>  	echo "prepare: ok" >expected &&
> -	read line <out &&
> +	read line <&8 &&
>  	echo "$line" >actual &&
>  	test_cmp expected actual &&
>  
> @@ -1625,7 +1627,7 @@ test_expect_success PIPE 'transaction flushes status updates' '
>  
>  	echo commit >&9 &&
>  	echo "commit: ok" >expected &&
> -	read line <out &&
> +	read line <&8 &&
>  	echo "$line" >actual &&
>  	test_cmp expected actual
>  '
> -- 
> 2.33.0.917.g33ebf6a5f6
> 

Thanks a lot for digging and fixing my test. The patch looks good to me.

Patrick
diff mbox series

Patch

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index 1e754e258f..0d4f73acaa 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -1603,19 +1603,21 @@  test_expect_success PIPE 'transaction flushes status updates' '
 	(git update-ref --stdin <in >out &) &&
 
 	exec 9>in &&
+	exec 8<out &&
 	test_when_finished "exec 9>&-" &&
+	test_when_finished "exec 8<&-" &&
 
 	echo "start" >&9 &&
 	echo "start: ok" >expected &&
-	read line <out &&
+	read line <&8 &&
 	echo "$line" >actual &&
 	test_cmp expected actual &&
 
 	echo "create refs/heads/flush $A" >&9 &&
 
 	echo prepare >&9 &&
 	echo "prepare: ok" >expected &&
-	read line <out &&
+	read line <&8 &&
 	echo "$line" >actual &&
 	test_cmp expected actual &&
 
@@ -1625,7 +1627,7 @@  test_expect_success PIPE 'transaction flushes status updates' '
 
 	echo commit >&9 &&
 	echo "commit: ok" >expected &&
-	read line <out &&
+	read line <&8 &&
 	echo "$line" >actual &&
 	test_cmp expected actual
 '