diff mbox series

[v3,2/6] run-command: add hide_output to run_processes_parallel_opts

Message ID 20221020232532.1128326-3-calvinwan@google.com (mailing list archive)
State Superseded
Headers show
Series submodule: parallelize diff | expand

Commit Message

Calvin Wan Oct. 20, 2022, 11:25 p.m. UTC
Output from child processes and callbacks may not be necessary to
print out for every invoker of run_processes_parallel. Add
hide_output as an option to not print said output.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 run-command.c               | 8 +++++---
 run-command.h               | 9 +++++++++
 t/helper/test-run-command.c | 6 ++++++
 t/t0061-run-command.sh      | 6 ++++++
 4 files changed, 26 insertions(+), 3 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Oct. 21, 2022, 2:54 a.m. UTC | #1
On Thu, Oct 20 2022, Calvin Wan wrote:

> Output from child processes and callbacks may not be necessary to
> print out for every invoker of run_processes_parallel. Add
> hide_output as an option to not print said output.
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  run-command.c               | 8 +++++---
>  run-command.h               | 9 +++++++++
>  t/helper/test-run-command.c | 6 ++++++
>  t/t0061-run-command.sh      | 6 ++++++
>  4 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index 03787bc7f5..3aa28ad6dc 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1603,7 +1603,8 @@ static void pp_cleanup(struct parallel_processes *pp,
>  	 * When get_next_task added messages to the buffer in its last
>  	 * iteration, the buffered output is non empty.
>  	 */
> -	strbuf_write(&pp->buffered_output, stderr);
> +	if (!opts->hide_output)
> +		strbuf_write(&pp->buffered_output, stderr);
>  	strbuf_release(&pp->buffered_output);
>  
>  	sigchain_pop_common();
> @@ -1754,7 +1755,7 @@ static int pp_collect_finished(struct parallel_processes *pp,
>  			pp->pfd[i].fd = -1;
>  		child_process_init(&pp->children[i].process);
>  
> -		if (opts->ungroup) {
> +		if (opts->ungroup || opts->hide_output) {
>  			; /* no strbuf_*() work to do here */
>  		} else if (i != pp->output_owner) {
>  			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
> @@ -1826,7 +1827,8 @@ void run_processes_parallel(const struct run_process_parallel_opts *opts)
>  				pp.children[i].state = GIT_CP_WAIT_CLEANUP;
>  		} else {
>  			pp_buffer_stderr(&pp, opts, output_timeout);
> -			pp_output(&pp);
> +			if (!opts->hide_output)
> +				pp_output(&pp);
>  		}
>  		code = pp_collect_finished(&pp, opts);
>  		if (code) {
> diff --git a/run-command.h b/run-command.h
> index b4584c3698..4570812c08 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -496,6 +496,11 @@ struct run_process_parallel_opts
>  	 */
>  	unsigned int ungroup:1;
>  
> +	/**
> +	 * hide_output: see 'hide_output' in run_processes_parallel() below.
> +	 */
> +	unsigned int hide_output:1;
> +
>  	/**
>  	 * get_next_task: See get_next_task_fn() above. This must be
>  	 * specified.
> @@ -547,6 +552,10 @@ struct run_process_parallel_opts
>   * NULL "struct strbuf *out" parameter, and are responsible for
>   * emitting their own output, including dealing with any race
>   * conditions due to writing in parallel to stdout and stderr.
> + * 
> + * If the "hide_output" option is set, any output in the "struct strbuf
> + * *out" parameter will not be printed by this function. This includes
> + * output from child processes as well as callbacks.
>   */
>  void run_processes_parallel(const struct run_process_parallel_opts *opts);
>  
> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> index e9b41419a0..1af443db4d 100644
> --- a/t/helper/test-run-command.c
> +++ b/t/helper/test-run-command.c
> @@ -446,6 +446,12 @@ int cmd__run_command(int argc, const char **argv)
>  		opts.ungroup = 1;
>  	}
>  
> +	if (!strcmp(argv[1], "--hide-output")) {
> +		argv += 1;
> +		argc -= 1;
> +		opts.hide_output = 1;
> +	}
> +
>  	if (!strcmp(argv[1], "--pipe-output")) {
>  		argv += 1;
>  		argc -= 1;
> diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
> index e50e57db89..a0219f6093 100755
> --- a/t/t0061-run-command.sh
> +++ b/t/t0061-run-command.sh
> @@ -180,6 +180,12 @@ test_expect_success 'run_command runs ungrouped in parallel with more tasks than
>  	test_line_count = 4 err
>  '
>  
> +test_expect_success 'run_command hides output when run in parallel' '
> +	test-tool run-command --hide-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> +	test_must_be_empty out &&
> +	test_must_be_empty err
> +'
> +
>  cat >expect <<-EOF
>  preloaded output of a child
>  asking for a quick stop

I may just be missing something, but doesn't "struct child_process"
already have e.g. "no_stderr", "no_stdout" etc. that we can use?
I.e. isn't this thing equivalent to running:

	your-command >/dev/null 2>/dev/null

Which is what the non-parallel API already supports.

Now, IIRC if you just set that in the "get_next_task" callback it won't
work in the parallel API, or you'll block waiting for I/O that'll never
come or whatever.

But that'll be because the parallel interface currently only suppors a
subset of the full "child_process" combination of options, and maybe it
doesn't grok this.

But if that's the case we should just extend the API to support
"no_stdout", "no_stderr" etc., no?

I.e. hypothetically the parallel one could support 100% of the "struct
child_process" combination of options, we just haven't bothered yet.

But I don't see why the parallel API should grow options that we already
have in "struct child_process", instead we should set them there, and it
should gradually learn to deal with them.

I think it's also fine to have some basic sanity checks there, e.g. I
could see how for something like this we don't want to support piping
only some children to /dev/null but not others, and that it should be
all or nothing (maybe it makes state management when we loop over them
easier).

Or again, maybe I'm missing something...
Calvin Wan Oct. 24, 2022, 7:24 p.m. UTC | #2
> I may just be missing something, but doesn't "struct child_process"
> already have e.g. "no_stderr", "no_stdout" etc. that we can use?
> I.e. isn't this thing equivalent to running:
>
>         your-command >/dev/null 2>/dev/null
>
> Which is what the non-parallel API already supports.
>
> Now, IIRC if you just set that in the "get_next_task" callback it won't
> work in the parallel API, or you'll block waiting for I/O that'll never
> come or whatever.
>
> But that'll be because the parallel interface currently only suppors a
> subset of the full "child_process" combination of options, and maybe it
> doesn't grok this.
>
> But if that's the case we should just extend the API to support
> "no_stdout", "no_stderr" etc., no?
>
> I.e. hypothetically the parallel one could support 100% of the "struct
> child_process" combination of options, we just haven't bothered yet.
>
> But I don't see why the parallel API should grow options that we already
> have in "struct child_process", instead we should set them there, and it
> should gradually learn to deal with them.
>
> I think it's also fine to have some basic sanity checks there, e.g. I
> could see how for something like this we don't want to support piping
> only some children to /dev/null but not others, and that it should be
> all or nothing (maybe it makes state management when we loop over them
> easier).
>
> Or again, maybe I'm missing something...

Shouldn't the options that are set in "child_process" be abstracted away
from "parallel_processes"? Setting "no_stdout", "no_stderr", etc. in a
"child_process" shouldn't imply that we still pass the stdout and stderr to
 "parallel_processes" and then we send the output to "/dev/null".

That being said, I can understand the aversion to adding an option like
this that doesn't also add support for stdout and stderr. I can remove this
patch and instead reset the buffer inside of pipe_output and task_finished
in a later patch
Ævar Arnfjörð Bjarmason Oct. 25, 2022, 7:32 p.m. UTC | #3
On Mon, Oct 24 2022, Calvin Wan wrote:

>> I may just be missing something, but doesn't "struct child_process"
>> already have e.g. "no_stderr", "no_stdout" etc. that we can use?
>> I.e. isn't this thing equivalent to running:
>>
>>         your-command >/dev/null 2>/dev/null
>>
>> Which is what the non-parallel API already supports.
>>
>> Now, IIRC if you just set that in the "get_next_task" callback it won't
>> work in the parallel API, or you'll block waiting for I/O that'll never
>> come or whatever.
>>
>> But that'll be because the parallel interface currently only suppors a
>> subset of the full "child_process" combination of options, and maybe it
>> doesn't grok this.
>>
>> But if that's the case we should just extend the API to support
>> "no_stdout", "no_stderr" etc., no?
>>
>> I.e. hypothetically the parallel one could support 100% of the "struct
>> child_process" combination of options, we just haven't bothered yet.
>>
>> But I don't see why the parallel API should grow options that we already
>> have in "struct child_process", instead we should set them there, and it
>> should gradually learn to deal with them.
>>
>> I think it's also fine to have some basic sanity checks there, e.g. I
>> could see how for something like this we don't want to support piping
>> only some children to /dev/null but not others, and that it should be
>> all or nothing (maybe it makes state management when we loop over them
>> easier).
>>
>> Or again, maybe I'm missing something...
>
> Shouldn't the options that are set in "child_process" be abstracted away
> from "parallel_processes"?

In general yes, and no :)

Our main interafce should probably be "just set
these in the 'struct child_process' we hand you", but the parallel API
might want to assert certain things about those settings, as some of
them may conflict with its assumptions.

> Setting "no_stdout", "no_stderr", etc. in a
> "child_process" shouldn't imply that we still pass the stdout and stderr to
>  "parallel_processes" and then we send the output to "/dev/null".

Sure, but if they're not producing any output because it's being piped
to /dev/null how worthwhile is it to optimize that?

We still can optimize it, but I still think the interface should just be
the equivalent of:

	parallel -k -j100% 'sleep 0.0$RANDOM && echo {} >/dev/null' ::: {1..100}

Whereas what you seem to be trying to implement is the equivalent of a:

	parallel -u -j100% 'sleep 0.0$RANDOM && echo {} ::: {1..100} >/dev/null

Except as an option to the parallel API, but the end result seems to be
equivalent.

> That being said, I can understand the aversion to adding an option like
> this that doesn't also add support for stdout and stderr. I can remove this
> patch and instead reset the buffer inside of pipe_output and task_finished
> in a later patch

I'm not necessarily opposed to it, just puzzled about it, maybe I don't
have the full picture.

In general I highly recomend looking at whatever GNU parallel is doing,
and seeing if new features in run-command.[ch] can map to that mental
model.

Our API is basically a small subset of its featureset, and I've found it
useful both to steal ideas from there, and to test
assumptions. E.g. "ungroup" is just a straight rip-off of the
"--ungroup" option, it's also had to think about combining various
options we don't have yet (but might want).

In that case the supervisor API/parallel(1) needs to do something
special, but for "I don't want output" it seems best to just do that at
the worker level, i.e. equivalent to piping to /dev/null.

Having a bias towards that approach also makes it easier to convert
things to running in parallel, i.e. you just (mostly) keep your current
"struct child_process", and don't need to find the equivalents in the
parallel API.
Calvin Wan Oct. 25, 2022, 9:22 p.m. UTC | #4
> > Setting "no_stdout", "no_stderr", etc. in a
> > "child_process" shouldn't imply that we still pass the stdout and stderr to
> >  "parallel_processes" and then we send the output to "/dev/null".
>
> Sure, but if they're not producing any output because it's being piped
> to /dev/null how worthwhile is it to optimize that?
>
> We still can optimize it, but I still think the interface should just be
> the equivalent of:
>
>         parallel -k -j100% 'sleep 0.0$RANDOM && echo {} >/dev/null' ::: {1..100}
>
> Whereas what you seem to be trying to implement is the equivalent of a:
>
>         parallel -u -j100% 'sleep 0.0$RANDOM && echo {} ::: {1..100} >/dev/null
>
> Except as an option to the parallel API, but the end result seems to be
> equivalent.
>
> > That being said, I can understand the aversion to adding an option like
> > this that doesn't also add support for stdout and stderr. I can remove this
> > patch and instead reset the buffer inside of pipe_output and task_finished
> > in a later patch
>
> I'm not necessarily opposed to it, just puzzled about it, maybe I don't
> have the full picture.
>
> In general I highly recomend looking at whatever GNU parallel is doing,
> and seeing if new features in run-command.[ch] can map to that mental
> model.
>
> Our API is basically a small subset of its featureset, and I've found it
> useful both to steal ideas from there, and to test
> assumptions. E.g. "ungroup" is just a straight rip-off of the
> "--ungroup" option, it's also had to think about combining various
> options we don't have yet (but might want).
>
> In that case the supervisor API/parallel(1) needs to do something
> special, but for "I don't want output" it seems best to just do that at
> the worker level, i.e. equivalent to piping to /dev/null.

Well I want the output to be able to parse it, but not to print it. Piping
to /dev/null at the worker level denies me the ability to parse it in the
parent process.

Am I understanding correctly that what you're suggesting is if a child
process has "no_stderr" and "no_stdout" set to true, then
parallel_processes would temporarily set them to false before
start_command, and then honor it later after the output is read?
This would allow me to call pipe_output and parse it before sending
the output to /dev/null without the need for "hide_output"
diff mbox series

Patch

diff --git a/run-command.c b/run-command.c
index 03787bc7f5..3aa28ad6dc 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1603,7 +1603,8 @@  static void pp_cleanup(struct parallel_processes *pp,
 	 * When get_next_task added messages to the buffer in its last
 	 * iteration, the buffered output is non empty.
 	 */
-	strbuf_write(&pp->buffered_output, stderr);
+	if (!opts->hide_output)
+		strbuf_write(&pp->buffered_output, stderr);
 	strbuf_release(&pp->buffered_output);
 
 	sigchain_pop_common();
@@ -1754,7 +1755,7 @@  static int pp_collect_finished(struct parallel_processes *pp,
 			pp->pfd[i].fd = -1;
 		child_process_init(&pp->children[i].process);
 
-		if (opts->ungroup) {
+		if (opts->ungroup || opts->hide_output) {
 			; /* no strbuf_*() work to do here */
 		} else if (i != pp->output_owner) {
 			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
@@ -1826,7 +1827,8 @@  void run_processes_parallel(const struct run_process_parallel_opts *opts)
 				pp.children[i].state = GIT_CP_WAIT_CLEANUP;
 		} else {
 			pp_buffer_stderr(&pp, opts, output_timeout);
-			pp_output(&pp);
+			if (!opts->hide_output)
+				pp_output(&pp);
 		}
 		code = pp_collect_finished(&pp, opts);
 		if (code) {
diff --git a/run-command.h b/run-command.h
index b4584c3698..4570812c08 100644
--- a/run-command.h
+++ b/run-command.h
@@ -496,6 +496,11 @@  struct run_process_parallel_opts
 	 */
 	unsigned int ungroup:1;
 
+	/**
+	 * hide_output: see 'hide_output' in run_processes_parallel() below.
+	 */
+	unsigned int hide_output:1;
+
 	/**
 	 * get_next_task: See get_next_task_fn() above. This must be
 	 * specified.
@@ -547,6 +552,10 @@  struct run_process_parallel_opts
  * NULL "struct strbuf *out" parameter, and are responsible for
  * emitting their own output, including dealing with any race
  * conditions due to writing in parallel to stdout and stderr.
+ * 
+ * If the "hide_output" option is set, any output in the "struct strbuf
+ * *out" parameter will not be printed by this function. This includes
+ * output from child processes as well as callbacks.
  */
 void run_processes_parallel(const struct run_process_parallel_opts *opts);
 
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
index e9b41419a0..1af443db4d 100644
--- a/t/helper/test-run-command.c
+++ b/t/helper/test-run-command.c
@@ -446,6 +446,12 @@  int cmd__run_command(int argc, const char **argv)
 		opts.ungroup = 1;
 	}
 
+	if (!strcmp(argv[1], "--hide-output")) {
+		argv += 1;
+		argc -= 1;
+		opts.hide_output = 1;
+	}
+
 	if (!strcmp(argv[1], "--pipe-output")) {
 		argv += 1;
 		argc -= 1;
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index e50e57db89..a0219f6093 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -180,6 +180,12 @@  test_expect_success 'run_command runs ungrouped in parallel with more tasks than
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command hides output when run in parallel' '
+	test-tool run-command --hide-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
+	test_must_be_empty out &&
+	test_must_be_empty err
+'
+
 cat >expect <<-EOF
 preloaded output of a child
 asking for a quick stop