diff mbox series

[v2,1/5] merge-tree --stdin: flush stdout to avoid deadlock

Message ID 3b3179785098580f3336bb24bdbaf0aa1366bfcd.1739895879.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 344a107b557604d3f958e7bf7ebf0901290c50d5
Headers show
Series merge-tree --stdin: flush stdout | expand

Commit Message

Phillip Wood Feb. 18, 2025, 4:24 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

If a process tries to read the output from "git merge-tree --stdin"
before it closes merge-tree's stdin then it deadlocks. This happens
because merge-tree does not flush its output before trying to read
another line of input and means that it is not possible to cherry-pick a
sequence of commits using "git merge-tree --stdin". Fix this by calling
maybe_flush_or_die() before trying to read the next line of
input. Flushing the output after each merge does not seem to affect the
performance, any difference is lost in the noise even after increasing
the number of runs.

$ git rev-list --merges --parents -n100 origin/master |
	sed 's/^[^ ]* //' >/tmp/merges
$ hyperfine -L flush 0,1 --warmup 1 --runs 30 \
	'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges'
Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges
  Time (mean ± σ):     546.6 ms ±  11.7 ms    [User: 503.2 ms, System: 40.9 ms]
  Range (min … max):   535.9 ms … 567.7 ms    30 runs

Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges
  Time (mean ± σ):     546.9 ms ±  12.0 ms    [User: 505.9 ms, System: 38.9 ms]
  Range (min … max):   529.8 ms … 570.0 ms    30 runs

Summary
  'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran
    1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges'

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/merge-tree.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Patrick Steinhardt Feb. 19, 2025, 6:23 a.m. UTC | #1
On Tue, Feb 18, 2025 at 04:24:35PM +0000, Phillip Wood via GitGitGadget wrote:
> diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
> index 9a6c8b4e4cf..57f4340faba 100644
> --- a/builtin/merge-tree.c
> +++ b/builtin/merge-tree.c
> @@ -18,6 +18,7 @@
>  #include "tree.h"
>  #include "config.h"
>  #include "strvec.h"
> +#include "write-or-die.h"
>  
>  static int line_termination = '\n';
>  
> @@ -623,6 +624,7 @@ int cmd_merge_tree(int argc,
>  			} else {
>  				die(_("malformed input line: '%s'."), buf.buf);
>  			}
> +			maybe_flush_or_die(stdout, "stdout");
>  
>  			if (result < 0)
>  				die(_("merging cannot continue; got unclean result of %d"), result);

I was briefly wondering whether we should rather move this into
`real_merge()` itself, which is responsible for writing to stdout. But
the only other callsite doesn't really care as it will exit immediately
anyway. So this is probably fine.

Overall the series looks good to me, thanks!

Patrick
Phillip Wood Feb. 19, 2025, 2:55 p.m. UTC | #2
Hi Patrick

On 19/02/2025 06:23, Patrick Steinhardt wrote:
> 
> I was briefly wondering whether we should rather move this into
> `real_merge()` itself, which is responsible for writing to stdout. But
> the only other callsite doesn't really care as it will exit immediately
> anyway. So this is probably fine.

I did wonder about that when I was writing the patch but decided it only 
mattered when --stdin was given.

> Overall the series looks good to me, thanks!

Thanks

Phillip
Junio C Hamano Feb. 19, 2025, 4:02 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Feb 18, 2025 at 04:24:35PM +0000, Phillip Wood via GitGitGadget wrote:
>> diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
>> index 9a6c8b4e4cf..57f4340faba 100644
>> --- a/builtin/merge-tree.c
>> +++ b/builtin/merge-tree.c
>> @@ -18,6 +18,7 @@
>>  #include "tree.h"
>>  #include "config.h"
>>  #include "strvec.h"
>> +#include "write-or-die.h"
>>  
>>  static int line_termination = '\n';
>>  
>> @@ -623,6 +624,7 @@ int cmd_merge_tree(int argc,
>>  			} else {
>>  				die(_("malformed input line: '%s'."), buf.buf);
>>  			}
>> +			maybe_flush_or_die(stdout, "stdout");
>>  
>>  			if (result < 0)
>>  				die(_("merging cannot continue; got unclean result of %d"), result);
>
> I was briefly wondering whether we should rather move this into
> `real_merge()` itself, which is responsible for writing to stdout.

Indeed.  Its "if we are working stdin mode, emit a line break" near
the end does hint that the original intent was to ensure the output
goes out before we return to read more (even though the code forgets
that the stdio layer buffers), so it doubly makes sense to place the
logic to flush before the function returns, no?

> But
> the only other callsite doesn't really care as it will exit immediately
> anyway. So this is probably fine.

With the current set of callers, yes.  When we write these
sentences, our imagination most likely fails short to come up with a
reason why the next callsite needs to be added to the program, so I
would strongly suggest flushing inside real_merge() IF it were a
public function, but because it is a file-scope static helper, we
hopefully will remember we would need to add a flush to the caller
when we add the fourth caller.  So I do not care too deeply either
way.

> Overall the series looks good to me, thanks!

Thanks, all.  Let's mark it for 'next'.
diff mbox series

Patch

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 9a6c8b4e4cf..57f4340faba 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@ 
 #include "tree.h"
 #include "config.h"
 #include "strvec.h"
+#include "write-or-die.h"
 
 static int line_termination = '\n';
 
@@ -623,6 +624,7 @@  int cmd_merge_tree(int argc,
 			} else {
 				die(_("malformed input line: '%s'."), buf.buf);
 			}
+			maybe_flush_or_die(stdout, "stdout");
 
 			if (result < 0)
 				die(_("merging cannot continue; got unclean result of %d"), result);