Message ID | 20221020232532.1128326-3-calvinwan@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | submodule: parallelize diff | expand |
On Thu, Oct 20 2022, Calvin Wan wrote: > Output from child processes and callbacks may not be necessary to > print out for every invoker of run_processes_parallel. Add > hide_output as an option to not print said output. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > run-command.c | 8 +++++--- > run-command.h | 9 +++++++++ > t/helper/test-run-command.c | 6 ++++++ > t/t0061-run-command.sh | 6 ++++++ > 4 files changed, 26 insertions(+), 3 deletions(-) > > diff --git a/run-command.c b/run-command.c > index 03787bc7f5..3aa28ad6dc 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -1603,7 +1603,8 @@ static void pp_cleanup(struct parallel_processes *pp, > * When get_next_task added messages to the buffer in its last > * iteration, the buffered output is non empty. > */ > - strbuf_write(&pp->buffered_output, stderr); > + if (!opts->hide_output) > + strbuf_write(&pp->buffered_output, stderr); > strbuf_release(&pp->buffered_output); > > sigchain_pop_common(); > @@ -1754,7 +1755,7 @@ static int pp_collect_finished(struct parallel_processes *pp, > pp->pfd[i].fd = -1; > child_process_init(&pp->children[i].process); > > - if (opts->ungroup) { > + if (opts->ungroup || opts->hide_output) { > ; /* no strbuf_*() work to do here */ > } else if (i != pp->output_owner) { > strbuf_addbuf(&pp->buffered_output, &pp->children[i].err); > @@ -1826,7 +1827,8 @@ void run_processes_parallel(const struct run_process_parallel_opts *opts) > pp.children[i].state = GIT_CP_WAIT_CLEANUP; > } else { > pp_buffer_stderr(&pp, opts, output_timeout); > - pp_output(&pp); > + if (!opts->hide_output) > + pp_output(&pp); > } > code = pp_collect_finished(&pp, opts); > if (code) { > diff --git a/run-command.h b/run-command.h > index b4584c3698..4570812c08 100644 > --- a/run-command.h > +++ b/run-command.h > @@ -496,6 +496,11 @@ struct run_process_parallel_opts > */ > unsigned int ungroup:1; > > + /** > + * hide_output: see 'hide_output' in run_processes_parallel() below. > + */ > + unsigned int hide_output:1; > + > /** > * get_next_task: See get_next_task_fn() above. This must be > * specified. > @@ -547,6 +552,10 @@ struct run_process_parallel_opts > * NULL "struct strbuf *out" parameter, and are responsible for > * emitting their own output, including dealing with any race > * conditions due to writing in parallel to stdout and stderr. > + * > + * If the "hide_output" option is set, any output in the "struct strbuf > + * *out" parameter will not be printed by this function. This includes > + * output from child processes as well as callbacks. > */ > void run_processes_parallel(const struct run_process_parallel_opts *opts); > > diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c > index e9b41419a0..1af443db4d 100644 > --- a/t/helper/test-run-command.c > +++ b/t/helper/test-run-command.c > @@ -446,6 +446,12 @@ int cmd__run_command(int argc, const char **argv) > opts.ungroup = 1; > } > > + if (!strcmp(argv[1], "--hide-output")) { > + argv += 1; > + argc -= 1; > + opts.hide_output = 1; > + } > + > if (!strcmp(argv[1], "--pipe-output")) { > argv += 1; > argc -= 1; > diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh > index e50e57db89..a0219f6093 100755 > --- a/t/t0061-run-command.sh > +++ b/t/t0061-run-command.sh > @@ -180,6 +180,12 @@ test_expect_success 'run_command runs ungrouped in parallel with more tasks than > test_line_count = 4 err > ' > > +test_expect_success 'run_command hides output when run in parallel' ' > + test-tool run-command --hide-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > + test_must_be_empty out && > + test_must_be_empty err > +' > + > cat >expect <<-EOF > preloaded output of a child > asking for a quick stop I may just be missing something, but doesn't "struct child_process" already have e.g. "no_stderr", "no_stdout" etc. that we can use? I.e. isn't this thing equivalent to running: your-command >/dev/null 2>/dev/null Which is what the non-parallel API already supports. Now, IIRC if you just set that in the "get_next_task" callback it won't work in the parallel API, or you'll block waiting for I/O that'll never come or whatever. But that'll be because the parallel interface currently only suppors a subset of the full "child_process" combination of options, and maybe it doesn't grok this. But if that's the case we should just extend the API to support "no_stdout", "no_stderr" etc., no? I.e. hypothetically the parallel one could support 100% of the "struct child_process" combination of options, we just haven't bothered yet. But I don't see why the parallel API should grow options that we already have in "struct child_process", instead we should set them there, and it should gradually learn to deal with them. I think it's also fine to have some basic sanity checks there, e.g. I could see how for something like this we don't want to support piping only some children to /dev/null but not others, and that it should be all or nothing (maybe it makes state management when we loop over them easier). Or again, maybe I'm missing something...
> I may just be missing something, but doesn't "struct child_process" > already have e.g. "no_stderr", "no_stdout" etc. that we can use? > I.e. isn't this thing equivalent to running: > > your-command >/dev/null 2>/dev/null > > Which is what the non-parallel API already supports. > > Now, IIRC if you just set that in the "get_next_task" callback it won't > work in the parallel API, or you'll block waiting for I/O that'll never > come or whatever. > > But that'll be because the parallel interface currently only suppors a > subset of the full "child_process" combination of options, and maybe it > doesn't grok this. > > But if that's the case we should just extend the API to support > "no_stdout", "no_stderr" etc., no? > > I.e. hypothetically the parallel one could support 100% of the "struct > child_process" combination of options, we just haven't bothered yet. > > But I don't see why the parallel API should grow options that we already > have in "struct child_process", instead we should set them there, and it > should gradually learn to deal with them. > > I think it's also fine to have some basic sanity checks there, e.g. I > could see how for something like this we don't want to support piping > only some children to /dev/null but not others, and that it should be > all or nothing (maybe it makes state management when we loop over them > easier). > > Or again, maybe I'm missing something... Shouldn't the options that are set in "child_process" be abstracted away from "parallel_processes"? Setting "no_stdout", "no_stderr", etc. in a "child_process" shouldn't imply that we still pass the stdout and stderr to "parallel_processes" and then we send the output to "/dev/null". That being said, I can understand the aversion to adding an option like this that doesn't also add support for stdout and stderr. I can remove this patch and instead reset the buffer inside of pipe_output and task_finished in a later patch
On Mon, Oct 24 2022, Calvin Wan wrote: >> I may just be missing something, but doesn't "struct child_process" >> already have e.g. "no_stderr", "no_stdout" etc. that we can use? >> I.e. isn't this thing equivalent to running: >> >> your-command >/dev/null 2>/dev/null >> >> Which is what the non-parallel API already supports. >> >> Now, IIRC if you just set that in the "get_next_task" callback it won't >> work in the parallel API, or you'll block waiting for I/O that'll never >> come or whatever. >> >> But that'll be because the parallel interface currently only suppors a >> subset of the full "child_process" combination of options, and maybe it >> doesn't grok this. >> >> But if that's the case we should just extend the API to support >> "no_stdout", "no_stderr" etc., no? >> >> I.e. hypothetically the parallel one could support 100% of the "struct >> child_process" combination of options, we just haven't bothered yet. >> >> But I don't see why the parallel API should grow options that we already >> have in "struct child_process", instead we should set them there, and it >> should gradually learn to deal with them. >> >> I think it's also fine to have some basic sanity checks there, e.g. I >> could see how for something like this we don't want to support piping >> only some children to /dev/null but not others, and that it should be >> all or nothing (maybe it makes state management when we loop over them >> easier). >> >> Or again, maybe I'm missing something... > > Shouldn't the options that are set in "child_process" be abstracted away > from "parallel_processes"? In general yes, and no :) Our main interafce should probably be "just set these in the 'struct child_process' we hand you", but the parallel API might want to assert certain things about those settings, as some of them may conflict with its assumptions. > Setting "no_stdout", "no_stderr", etc. in a > "child_process" shouldn't imply that we still pass the stdout and stderr to > "parallel_processes" and then we send the output to "/dev/null". Sure, but if they're not producing any output because it's being piped to /dev/null how worthwhile is it to optimize that? We still can optimize it, but I still think the interface should just be the equivalent of: parallel -k -j100% 'sleep 0.0$RANDOM && echo {} >/dev/null' ::: {1..100} Whereas what you seem to be trying to implement is the equivalent of a: parallel -u -j100% 'sleep 0.0$RANDOM && echo {} ::: {1..100} >/dev/null Except as an option to the parallel API, but the end result seems to be equivalent. > That being said, I can understand the aversion to adding an option like > this that doesn't also add support for stdout and stderr. I can remove this > patch and instead reset the buffer inside of pipe_output and task_finished > in a later patch I'm not necessarily opposed to it, just puzzled about it, maybe I don't have the full picture. In general I highly recomend looking at whatever GNU parallel is doing, and seeing if new features in run-command.[ch] can map to that mental model. Our API is basically a small subset of its featureset, and I've found it useful both to steal ideas from there, and to test assumptions. E.g. "ungroup" is just a straight rip-off of the "--ungroup" option, it's also had to think about combining various options we don't have yet (but might want). In that case the supervisor API/parallel(1) needs to do something special, but for "I don't want output" it seems best to just do that at the worker level, i.e. equivalent to piping to /dev/null. Having a bias towards that approach also makes it easier to convert things to running in parallel, i.e. you just (mostly) keep your current "struct child_process", and don't need to find the equivalents in the parallel API.
> > Setting "no_stdout", "no_stderr", etc. in a > > "child_process" shouldn't imply that we still pass the stdout and stderr to > > "parallel_processes" and then we send the output to "/dev/null". > > Sure, but if they're not producing any output because it's being piped > to /dev/null how worthwhile is it to optimize that? > > We still can optimize it, but I still think the interface should just be > the equivalent of: > > parallel -k -j100% 'sleep 0.0$RANDOM && echo {} >/dev/null' ::: {1..100} > > Whereas what you seem to be trying to implement is the equivalent of a: > > parallel -u -j100% 'sleep 0.0$RANDOM && echo {} ::: {1..100} >/dev/null > > Except as an option to the parallel API, but the end result seems to be > equivalent. > > > That being said, I can understand the aversion to adding an option like > > this that doesn't also add support for stdout and stderr. I can remove this > > patch and instead reset the buffer inside of pipe_output and task_finished > > in a later patch > > I'm not necessarily opposed to it, just puzzled about it, maybe I don't > have the full picture. > > In general I highly recomend looking at whatever GNU parallel is doing, > and seeing if new features in run-command.[ch] can map to that mental > model. > > Our API is basically a small subset of its featureset, and I've found it > useful both to steal ideas from there, and to test > assumptions. E.g. "ungroup" is just a straight rip-off of the > "--ungroup" option, it's also had to think about combining various > options we don't have yet (but might want). > > In that case the supervisor API/parallel(1) needs to do something > special, but for "I don't want output" it seems best to just do that at > the worker level, i.e. equivalent to piping to /dev/null. Well I want the output to be able to parse it, but not to print it. Piping to /dev/null at the worker level denies me the ability to parse it in the parent process. Am I understanding correctly that what you're suggesting is if a child process has "no_stderr" and "no_stdout" set to true, then parallel_processes would temporarily set them to false before start_command, and then honor it later after the output is read? This would allow me to call pipe_output and parse it before sending the output to /dev/null without the need for "hide_output"
diff --git a/run-command.c b/run-command.c index 03787bc7f5..3aa28ad6dc 100644 --- a/run-command.c +++ b/run-command.c @@ -1603,7 +1603,8 @@ static void pp_cleanup(struct parallel_processes *pp, * When get_next_task added messages to the buffer in its last * iteration, the buffered output is non empty. */ - strbuf_write(&pp->buffered_output, stderr); + if (!opts->hide_output) + strbuf_write(&pp->buffered_output, stderr); strbuf_release(&pp->buffered_output); sigchain_pop_common(); @@ -1754,7 +1755,7 @@ static int pp_collect_finished(struct parallel_processes *pp, pp->pfd[i].fd = -1; child_process_init(&pp->children[i].process); - if (opts->ungroup) { + if (opts->ungroup || opts->hide_output) { ; /* no strbuf_*() work to do here */ } else if (i != pp->output_owner) { strbuf_addbuf(&pp->buffered_output, &pp->children[i].err); @@ -1826,7 +1827,8 @@ void run_processes_parallel(const struct run_process_parallel_opts *opts) pp.children[i].state = GIT_CP_WAIT_CLEANUP; } else { pp_buffer_stderr(&pp, opts, output_timeout); - pp_output(&pp); + if (!opts->hide_output) + pp_output(&pp); } code = pp_collect_finished(&pp, opts); if (code) { diff --git a/run-command.h b/run-command.h index b4584c3698..4570812c08 100644 --- a/run-command.h +++ b/run-command.h @@ -496,6 +496,11 @@ struct run_process_parallel_opts */ unsigned int ungroup:1; + /** + * hide_output: see 'hide_output' in run_processes_parallel() below. + */ + unsigned int hide_output:1; + /** * get_next_task: See get_next_task_fn() above. This must be * specified. @@ -547,6 +552,10 @@ struct run_process_parallel_opts * NULL "struct strbuf *out" parameter, and are responsible for * emitting their own output, including dealing with any race * conditions due to writing in parallel to stdout and stderr. + * + * If the "hide_output" option is set, any output in the "struct strbuf + * *out" parameter will not be printed by this function. This includes + * output from child processes as well as callbacks. */ void run_processes_parallel(const struct run_process_parallel_opts *opts); diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index e9b41419a0..1af443db4d 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -446,6 +446,12 @@ int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } + if (!strcmp(argv[1], "--hide-output")) { + argv += 1; + argc -= 1; + opts.hide_output = 1; + } + if (!strcmp(argv[1], "--pipe-output")) { argv += 1; argc -= 1; diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index e50e57db89..a0219f6093 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -180,6 +180,12 @@ test_expect_success 'run_command runs ungrouped in parallel with more tasks than test_line_count = 4 err ' +test_expect_success 'run_command hides output when run in parallel' ' + test-tool run-command --hide-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_must_be_empty err +' + cat >expect <<-EOF preloaded output of a child asking for a quick stop
Output from child processes and callbacks may not be necessary to print out for every invoker of run_processes_parallel. Add hide_output as an option to not print said output. Signed-off-by: Calvin Wan <calvinwan@google.com> --- run-command.c | 8 +++++--- run-command.h | 9 +++++++++ t/helper/test-run-command.c | 6 ++++++ t/t0061-run-command.sh | 6 ++++++ 4 files changed, 26 insertions(+), 3 deletions(-)