Message ID: 20220922232947.631309-2-calvinwan@google.com
State: New, archived
Series: submodule: parallelize status
On Thu, Sep 22 2022, Calvin Wan wrote:

> diff --git a/run-command.c b/run-command.c
> index 14f17830f5..893bc1d294 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1474,6 +1474,7 @@ enum child_state {
>  };
>  
>  int run_processes_parallel_ungroup;
> +int run_processes_parallel_pipe_output;
>  struct parallel_processes {
>  	void *data;
>  
> @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
>  	int output_timeout = 100;
>  	int spawn_cap = 4;
>  	int ungroup = run_processes_parallel_ungroup;
> +	int pipe_output = run_processes_parallel_pipe_output;
>  	struct parallel_processes pp;
>  
>  	/* unset for the next API user */
>  	run_processes_parallel_ungroup = 0;
> +	run_processes_parallel_pipe_output = 0;
>  
>  	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
>  		ungroup);

I guess we could live with this, but passing a function argument through
a global variable is an interface that came out of a topic to fix a
release regression:
https://lore.kernel.org/git/cover-0.6-00000000000-20220421T122108Z-avarab@gmail.com/

An earlier version of that series simply changed the API to pass an
"opts" struct instead:
https://lore.kernel.org/git/patch-v2-2.8-5f0a6e9925f-20220518T195858Z-avarab@gmail.com/

I really should have submitted those post-release cleanup patches
already, and I'm not sure whether the right thing at this point is to
take this & do the cleanup for "ungroup" *and* this new argument later.

But maybe you're interested in cherry-picking & adjusting the relevant
part of that series for this one? I.e. we're not in some post-release
regression hurry, so rather than extending the use of this odd interface
we could (and maybe should) just fix how we're doing it first.

On the implementation:

> + * If the "pipe_output" option is specified, the output will be piped
> + * to task_finished_fn in the "struct strbuf *out" variable. The output
> + * will still be printed unless the callback resets the strbuf. The
> + * "pipe_output" option can be enabled by setting the global
> + * "run_processes_parallel_pipe_output" to "1" before invoking
> + * run_processes_parallel(), it will be set back to "0" as soon as the
> + * API reads that setting.

...okay, but...

> +static int task_finished_pipe_output(int result,
> +				     struct strbuf *err,
> +				     void *pp_cb,
> +				     void *pp_task_cb)
> +{
> +	if (err && pipe_output) {
> +		fprintf(stderr, "%s", err->buf);
> +		strbuf_reset(err);

...my memory's hazy, and I haven't re-read the logs in any detail, but
is it really the API interface here that the "output" callback function
is responsible for resetting the strbuf that the API gives to it?

That seems backwards to me, and e.g. a look at "start_failure" shows
that we strbuf_reset() the "err" ourselves.

What's the point of doing it in the API consumer? If it doesn't do it
we'll presumably keep accumulating output. Is there a use-case for that?

Or perhaps it's not needed & this is really just misleading boilerplate?

> @@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
>  	test_line_count = 4 err
>  '
>  
> +test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
> +	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
> +	test_cmp expect actual
> +'
> +

Like the global argument, the copy/pasting for "ungroup" was mostly a
matter of expediency.

But at least in that case we have a different assertion (test_cmp
vs. test_line_count).
But here this test case seems to be exactly the same as the one for the
"vanilla" version.

So can't we make this some:

	for opt in '' '--pipe-output'
	do
		test_expect_success ...
	done

?
Calvin Wan <calvinwan@google.com> writes:

> run_processes_parallel periodically collects output from its child
> processes, prints it, and then resets the buffers for each child.
> Add a run_processes_parallel_pipe_output variable so that output can
> be collected and fed to task_finished. When set, the function
> referenced by task_finished should parse the output of each child
> process.

I am puzzled.

 * Why are we configuring an API behaviour via a global variable in
   the 21st century?

 * The name "task_finished" is mentioned, but it is unclear what it
   is. Is it one of the parameters to run_processes_parallel()?

 * Is the effect of the new feature that the task_finished callback is
   called with the output, in addition to the normal output? I am not
   sure why it is called "pipe". The task_finished callback may be free
   to fork a child and send the received output from the task to that
   child over a pipe, but that is something the client code could do on
   its own, and it is inappropriate to base the name of the mechanism
   on that, isn't it?

> @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
>  	int output_timeout = 100;
>  	int spawn_cap = 4;
>  	int ungroup = run_processes_parallel_ungroup;
> +	int pipe_output = run_processes_parallel_pipe_output;
>  	struct parallel_processes pp;
>  
>  	/* unset for the next API user */
>  	run_processes_parallel_ungroup = 0;
> +	run_processes_parallel_pipe_output = 0;
>  
>  	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
>  		ungroup);
> @@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
>  			pp.children[i].state = GIT_CP_WAIT_CLEANUP;
>  		} else {
>  			pp_buffer_stderr(&pp, output_timeout);
> -			pp_output(&pp);
> +			if (!pipe_output)
> +				pp_output(&pp);

So, we do not send the output from the child to the regular output
channel when pipe_output is in effect. OK.

>  		}
>  		code = pp_collect_finished(&pp);
>  		if (code) {

And no other code changes? This is quite different from what I
expected from reading the proposed log message.

Am I correct to say that under this new mode, we no longer flush any
output while the child task is running (due to the change in the
above hunk to omit calls to pp_output() during the run) and instead
keep accumulating in the strbuf, until the child task finishes, at
which time pp_collect_finished() will call the task_finished callback?

Even though the callback usually consumes only the piece of the output
produced since the most recent pp_output() call made during the normal
execution of the run_processes_parallel() loop, because we omitted
these calls, we have the full output from the child task accumulated
in the children[].err strbuf. We may still not output .err for real,
as we may not be the output_owner, in which case we may only append to
the .buffered_output member.

I am puzzled simply because, if the above summary is correct, I do not
see how the word "pipe" has a chance to come into the picture.

I can sort of see that in this mode, we would end up buffering the
entire output from each child task into one strbuf each, and can
avoid stalling the child tasks waiting for their turn to have their
output pipes drained. But is this a reasonable thing to do? How do we
control memory consumption to avoid having to spool an unbounded
amount of output from child tasks in core, or do we have a good
reason to believe that we do not have to bother?

Thanks.
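To make the behaviour being discussed concrete, here is a standalone
toy model of "accumulate until finished" (illustrative C only, not
run-command.c itself; the fixed-size buffer stands in for the growable
strbuf):

	#include <stdio.h>
	#include <string.h>

	/* toy stand-in for a child's "err" strbuf */
	struct child {
		char err[256];
		size_t len;
	};

	static void poll_child(struct child *c, const char *chunk)
	{
		/* like pp_buffer_stderr(): drain the pipe into the buffer */
		size_t n = strlen(chunk);
		memcpy(c->err + c->len, chunk, n);
		c->len += n;
	}

	static void task_finished(struct child *c)
	{
		/* with pipe_output, this is the first time anyone prints
		 * c->err, and it holds everything the child ever wrote */
		fwrite(c->err, 1, c->len, stderr);
		c->len = 0;
	}

	int main(void)
	{
		struct child c = { "", 0 };
		int pipe_output = 1;

		poll_child(&c, "first chunk\n");
		if (!pipe_output)
			; /* vanilla mode: pp_output() would print and reset here */
		poll_child(&c, "second chunk\n");
		task_finished(&c); /* prints "first chunk\nsecond chunk\n" */
		return 0;
	}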
> An earlier version of that series simply changed the API to pass an
> "opts" struct instead:
> https://lore.kernel.org/git/patch-v2-2.8-5f0a6e9925f-20220518T195858Z-avarab@gmail.com/
>
> I really should have submitted those post-release cleanup patches
> already, and I'm not sure whether the right thing at this point is to
> take this & do the cleanup for "ungroup" *and* this new argument later.
>
> But maybe you're interested in cherry-picking & adjusting the relevant
> part of that series for this one? I.e. we're not in some post-release
> regression hurry, so rather than extending the use of this odd interface
> we could (and maybe should) just fix how we're doing it first.

I'll go ahead and give this a try. I was also a little surprised that
"ungroup" was set this way, but didn't realize it was for a quick fix.

> On the implementation:
>
> > + * If the "pipe_output" option is specified, the output will be piped
> > + * to task_finished_fn in the "struct strbuf *out" variable. The output
> > + * will still be printed unless the callback resets the strbuf. The
> > + * "pipe_output" option can be enabled by setting the global
> > + * "run_processes_parallel_pipe_output" to "1" before invoking
> > + * run_processes_parallel(), it will be set back to "0" as soon as the
> > + * API reads that setting.
>
> ...okay, but...
>
> > +static int task_finished_pipe_output(int result,
> > +				     struct strbuf *err,
> > +				     void *pp_cb,
> > +				     void *pp_task_cb)
> > +{
> > +	if (err && pipe_output) {
> > +		fprintf(stderr, "%s", err->buf);
> > +		strbuf_reset(err);
>
> ...my memory's hazy, and I haven't re-read the logs in any detail, but
> is it really the API interface here that the "output" callback function
> is responsible for resetting the strbuf that the API gives to it?
>
> That seems backwards to me, and e.g. a look at "start_failure" shows
> that we strbuf_reset() the "err" ourselves.
>
> What's the point of doing it in the API consumer? If it doesn't do it
> we'll presumably keep accumulating output. Is there a use-case for that?
>
> Or perhaps it's not needed & this is really just misleading boilerplate?

Ultimately it is not needed -- I added it as an example to showcase that
the output is correctly being piped to "task_finished_pipe_output". The
reset is necessary in this case to prevent the output from being printed
twice. I'm not sure how else I would go about testing "pipe_output".

> > @@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
> >  	test_line_count = 4 err
> >  '
> >  
> > +test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
> > +	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
> > +	test_cmp expect actual
> > +'
> > +
>
> Like the global argument, the copy/pasting for "ungroup" was mostly a
> matter of expediency.
>
> But at least in that case we have a different assertion (test_cmp
> vs. test_line_count).
>
> But here this test case seems to be exactly the same as the one for the
> "vanilla" version.
>
> So can't we make this some:
>
> 	for opt in '' '--pipe-output'
> 	do
> 		test_expect_success ...
> 	done
>
> ?

Yes we can -- but I may need to rethink how I should be testing this
option.
> * Why are we configuring an API behaviour via a global variable in
>   the 21st century?

I was mimicking how "ungroup" worked, but now that Ævar mentions that
pattern was for a quick regression fix, I can fix it to pass it in as a
parameter.

> * The name "task_finished" is mentioned, but it is unclear what it
>   is. Is it one of the parameters to run_processes_parallel()?

It is one of the callback functions passed in as a parameter to
run_processes_parallel(). I'll go ahead and clarify that.

> * Is the effect of the new feature that the task_finished callback is
>   called with the output, in addition to the normal output? I am not
>   sure why it is called "pipe". The task_finished callback may be free
>   to fork a child and send the received output from the task to that
>   child over a pipe, but that is something the client code could do on
>   its own, and it is inappropriate to base the name of the mechanism
>   on that, isn't it?

Before pipe_output, the output seen by the task_finished callback
contains either part of the output or the entire output of the child
process, since the output is periodically collected into stderr and
then reset. The intention of that parameter, I believe, is for the
caller to be able to append anything they would like to the end (this
can be seen with functions like fetch_finished() in builtin/fetch.c).
My intention with pipe_output is to guarantee that the parameter
contains the entire output of the child process, so that task_finished
can utilize it.

> > @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
> >  	int output_timeout = 100;
> >  	int spawn_cap = 4;
> >  	int ungroup = run_processes_parallel_ungroup;
> > +	int pipe_output = run_processes_parallel_pipe_output;
> >  	struct parallel_processes pp;
> >  
> >  	/* unset for the next API user */
> >  	run_processes_parallel_ungroup = 0;
> > +	run_processes_parallel_pipe_output = 0;
> >  
> >  	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
> >  		ungroup);
> > @@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
> >  			pp.children[i].state = GIT_CP_WAIT_CLEANUP;
> >  		} else {
> >  			pp_buffer_stderr(&pp, output_timeout);
> > -			pp_output(&pp);
> > +			if (!pipe_output)
> > +				pp_output(&pp);
>
> So, we do not send the output from the child to the regular output
> channel when pipe_output is in effect. OK.
>
> >  		}
> >  		code = pp_collect_finished(&pp);
> >  		if (code) {
>
> And no other code changes? This is quite different from what I
> expected from reading the proposed log message.
>
> Am I correct to say that under this new mode, we no longer flush any
> output while the child task is running (due to the change in the
> above hunk to omit calls to pp_output() during the run) and instead
> keep accumulating in the strbuf, until the child task finishes, at
> which time pp_collect_finished() will call the task_finished callback?
>
> Even though the callback usually consumes only the piece of the output
> produced since the most recent pp_output() call made during the normal
> execution of the run_processes_parallel() loop, because we omitted
> these calls, we have the full output from the child task accumulated
> in the children[].err strbuf. We may still not output .err for real,
> as we may not be the output_owner, in which case we may only append to
> the .buffered_output member.
>
> I am puzzled simply because, if the above summary is correct, I do not
> see how the word "pipe" has a chance to come into the picture.

Ah, I see what you mean here -- your summary is correct. Something like
"buffer_output" would make much more sense.
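For illustration, the kind of consumer that guarantee enables (a
sketch; "struct submodule_result" and parse_child_output() are
hypothetical, only the signature is the task_finished_fn one from
run-command.h):

	static int status_task_finished(int result, struct strbuf *err,
					void *pp_cb, void *pp_task_cb)
	{
		/*
		 * Hypothetical per-task state; with pipe_output, "err"
		 * holds the child's complete output by the time this
		 * callback runs, so it can all be parsed in one place.
		 */
		struct submodule_result *res = pp_task_cb;

		if (err && err->len)
			res->dirty = parse_child_output(err->buf, err->len);
		return 0;
	}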
> I can sort of see that in this mode, we would end up buffering the
> entire output from each child task into one strbuf each, and can
> avoid stalling the child tasks waiting for their turn to have their
> output pipes drained. But is this a reasonable thing to do? How do we
> control memory consumption to avoid having to spool an unbounded
> amount of output from child tasks in core, or do we have a good
> reason to believe that we do not have to bother?

You are correct that storing unbounded output doesn't seem like a good
idea. One idea I have is to parse the output during the periodic
collection rather than waiting until the end. The other idea I had was
to add another "git status --porcelain" option that would output only
the necessary pieces of information, so we wouldn't have to worry about
unbounded output.

Any other thoughts as to how I can work around this?

Thanks!
Calvin Wan <calvinwan@google.com> writes:

> You are correct that storing unbounded output doesn't seem like a good
> idea. One idea I have is to parse the output during the periodic
> collection rather than waiting until the end. The other idea I had was
> to add another "git status --porcelain" option that would output only
> the necessary pieces of information, so we wouldn't have to worry
> about unbounded output.
>
> Any other thoughts as to how I can work around this?

I wonder if you can arrange not to let them make unbounded progress?

In order to run diff-files with paths A B C D E ... where B and D are
submodules and the others are not, you do not have to run and finish
the comparison for B and D before you can do the comparison for the
other paths in order to preserve the proper output order. You can
start child tasks for B and D (and arrange for the same to happen for
any other submodules), and then you

 - run the comparison for A. The child tasks for B and D may be
   running and starting to talk back to you, in which case their
   writes may get stuck waiting for you to read from them, but that is
   OK, as you will read from them shortly.

 - wait for the child task for B. This is done by reading from the
   pipe connected to it and waiting for its death synchronously. The
   child task for D is still running and may be making progress, but
   you are not obligated to read its output to the end. You can
   postpone reading to conserve memory and it will fill the pipe and
   stall automatically. Then accept the result for B.

 - run the comparison for C.

 - wait for the child task for D. You may have some data read already
   while dealing with B, but you may still have some reading to do at
   this point. Let it finish synchronously.

 - run the comparison for E.

etc., perhaps?
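The same schedule as a sketch (every function name here is
hypothetical; this is not actual diff-files code):

	/*
	 * Submodule children are spawned up front, then drained and
	 * reaped synchronously in path order. An unread child can never
	 * buffer more than the kernel's pipe capacity before it stalls,
	 * so memory use stays bounded without any extra bookkeeping.
	 */
	static void diff_paths(const char **paths, int nr)
	{
		struct child_process *child[nr];
		int i;

		for (i = 0; i < nr; i++)
			if (is_submodule(paths[i]))
				child[i] = spawn_submodule_diff(paths[i]); /* async */

		for (i = 0; i < nr; i++) {
			if (is_submodule(paths[i]))
				report(read_all_and_reap(child[i])); /* sync wait */
			else
				report(diff_regular_file(paths[i])); /* in-process */
		}
	}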
On Mon, Sep 26 2022, Calvin Wan wrote:

>> * Why are we configuring an API behaviour via a global variable in
>>   the 21st century?
>
> I was mimicking how "ungroup" worked, but now that Ævar mentions that
> pattern was for a quick regression fix, I can fix it to pass it in as
> a parameter.

I don't know if you started this work or not, but I had a WIP rebase of
those on-list patches lying around in my tree, which I'm currently
rebasing on "master".

Some of it's a bit tricky with the in-tree activity since then. I'll
send them in when they're ready; maybe you already did your own rebasing
of them, or maybe it helps & you'd like to base a re-submission on top
of them.
On Mon, Sep 26 2022, Calvin Wan wrote:

>> On the implementation:
>>
>> > + * If the "pipe_output" option is specified, the output will be piped
>> > + * to task_finished_fn in the "struct strbuf *out" variable. The output
>> > + * will still be printed unless the callback resets the strbuf. The
>> > + * "pipe_output" option can be enabled by setting the global
>> > + * "run_processes_parallel_pipe_output" to "1" before invoking
>> > + * run_processes_parallel(), it will be set back to "0" as soon as the
>> > + * API reads that setting.
>>
>> ...okay, but...
>>
>> > +static int task_finished_pipe_output(int result,
>> > +				     struct strbuf *err,
>> > +				     void *pp_cb,
>> > +				     void *pp_task_cb)
>> > +{
>> > +	if (err && pipe_output) {
>> > +		fprintf(stderr, "%s", err->buf);
>> > +		strbuf_reset(err);
>>
>> ...my memory's hazy, and I haven't re-read the logs in any detail, but
>> is it really the API interface here that the "output" callback function
>> is responsible for resetting the strbuf that the API gives to it?
>>
>> That seems backwards to me, and e.g. a look at "start_failure" shows
>> that we strbuf_reset() the "err" ourselves.
>>
>> What's the point of doing it in the API consumer? If it doesn't do it
>> we'll presumably keep accumulating output. Is there a use-case for that?
>>
>> Or perhaps it's not needed & this is really just misleading boilerplate?
>
> Ultimately it is not needed -- I added it as an example to showcase that
> the output is correctly being piped to "task_finished_pipe_output". The
> reset is necessary in this case to prevent the output from being printed
> twice. I'm not sure how else I would go about testing "pipe_output".

If that's the intent, then having that reset there seems to me to be
doing the exact opposite of what you want.

If the API were broken and passed the output along twice without
clearing it in between the two calls, your strbuf_reset() would be
sweeping that issue under the rug; that API brokenness would be
"repaired" by your test.

Whereas if you remove the strbuf_reset() it should behave as it does
now, and if it doesn't, the API itself is broken, i.e. after calling
the callback it should be resetting the buffer.
> I don't know if you started this work or not, but I had a WIP rebase of
> those on-list patches lying around in my tree, which I'm currently
> rebasing on "master".
>
> Some of it's a bit tricky with the in-tree activity since then. I'll
> send them in when they're ready; maybe you already did your own rebasing
> of them, or maybe it helps & you'd like to base a re-submission on top
> of them.

I have not started on this part yet. Do you have an estimate as to when
you're planning on submitting your rebase? I'm also considering not
using my pipe_output option and going a different route, since there is
the issue of dealing with potentially unbounded output.
> I wonder if you can arrange not to let them make unbounded progress?
>
> In order to run diff-files with paths A B C D E ... where B and D are
> submodules and the others are not, you do not have to run and finish
> the comparison for B and D before you can do the comparison for the
> other paths in order to preserve the proper output order. You can
> start child tasks for B and D (and arrange for the same to happen for
> any other submodules), and then you

There is no need to preserve the proper output order, as the output is
sorted at the end.

>  - run the comparison for A. The child tasks for B and D may be
>    running and starting to talk back to you, in which case their
>    writes may get stuck waiting for you to read from them, but that is
>    OK, as you will read from them shortly.
>
>  - wait for the child task for B. This is done by reading from the
>    pipe connected to it and waiting for its death synchronously. The
>    child task for D is still running and may be making progress, but
>    you are not obligated to read its output to the end. You can
>    postpone reading to conserve memory and it will fill the pipe and
>    stall automatically. Then accept the result for B.
>
>  - run the comparison for C.
>
>  - wait for the child task for D. You may have some data read already
>    while dealing with B, but you may still have some reading to do at
>    this point. Let it finish synchronously.
>
>  - run the comparison for E.
>
> etc., perhaps?

I understand the idea you're suggesting and I think it would work, but
I'm worried about the overhead it would create. I would rather
implement a separate "git status --porcelain" output for just this
submodule case, so that 1) we would not have to worry about unbounded
output, and 2) both the output parsing and the command could be
optimized. In parse_status_porcelain, the function returns early if the
submodule is found to have both untracked and modified changes. That
early termination could happen on the command side rather than on the
parsing side.
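The early-out being referred to, as a simplified model (this is not the
actual parse_status_porcelain; the flag names mirror git's
DIRTY_SUBMODULE_* bits, but the values and parsing here are purely
illustrative):

	#define DIRTY_SUBMODULE_UNTRACKED 1	/* illustrative values */
	#define DIRTY_SUBMODULE_MODIFIED  2

	static unsigned parse_status(const char *buf, size_t len)
	{
		unsigned dirty = 0;
		const char *p = buf, *end = buf + len;

		while (p < end) {
			/* porcelain: lines starting with '?' are untracked */
			if (*p == '?')
				dirty |= DIRTY_SUBMODULE_UNTRACKED;
			else
				dirty |= DIRTY_SUBMODULE_MODIFIED;
			if ((dirty & DIRTY_SUBMODULE_UNTRACKED) &&
			    (dirty & DIRTY_SUBMODULE_MODIFIED))
				break;	/* nothing later can change the answer */
			while (p < end && *p != '\n')
				p++;
			if (p < end)
				p++;	/* skip the newline */
		}
		return dirty;
	}

Doing the equivalent cut-off in the child command itself would bound
how much output ever needs to cross the pipe in the first place.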
On Tue, Sep 27 2022, Calvin Wan wrote:

>> I don't know if you started this work or not, but I had a WIP rebase of
>> those on-list patches lying around in my tree, which I'm currently
>> rebasing on "master".
>>
>> Some of it's a bit tricky with the in-tree activity since then. I'll
>> send them in when they're ready; maybe you already did your own rebasing
>> of them, or maybe it helps & you'd like to base a re-submission on top
>> of them.
>
> I have not started on this part yet. Do you have an estimate as to when
> you're planning on submitting your rebase?

I hacked this up today; it passes CI at least:
https://github.com/git/git/compare/master...avar:avar/hook-run-process-parallel-tty-regression-2-argument-passing

> I'm also considering not using my pipe_output option and going a
> different route, since there is the issue of dealing with potentially
> unbounded output.

Okay, if it's not blocking a re-submission of yours then I'll definitely
wait until after the RC to submit the above, at least, but if you'd like
it earlier...
> Okay, if it's not blocking a re-submission of yours then I'll definitely
> wait until after the RC to submit the above, at least, but if you'd like
> it earlier...

I just tested a solution where I add a pipe_output_fn to
parallel_processes instead of having it as a variable. Not only does it
work, but it also solves my unbounded output problem, so this no longer
blocks resubmission. Thanks!
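Roughly, the shape of that solution (a sketch; only the name
pipe_output_fn comes from the description above, the rest is
hypothetical):

	/*
	 * A per-invocation callback that receives each child's buffered
	 * output at every collection interval, instead of letting it
	 * accumulate until the task finishes -- which is what keeps the
	 * memory use bounded.
	 */
	typedef int (*pipe_output_fn)(struct strbuf *out,
				      void *pp_cb, void *pp_task_cb);

	struct parallel_processes {
		/* ... existing members ... */
		pipe_output_fn pipe_output; /* if set, used instead of printing */
	};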
Calvin Wan <calvinwan@google.com> writes:

> ... I would rather implement a separate "git status --porcelain"
> output for just this submodule case, so that 1) we would not have to
> worry about unbounded output, and 2) both the output parsing and the
> command could be optimized.

Sounds good.
On Tue, Sep 27 2022, Calvin Wan wrote:

>> Okay, if it's not blocking a re-submission of yours then I'll definitely
>> wait until after the RC to submit the above, at least, but if you'd like
>> it earlier...
>
> I just tested a solution where I add a pipe_output_fn to
> parallel_processes instead of having it as a variable. Not only does it
> work, but it also solves my unbounded output problem, so this no longer
> blocks resubmission.

You mean the internal "struct parallel_processes"? How do you get the
parameter there, presumably by passing it to
run_processes_parallel{,_tr2}() as a new parameter?

The reason the "ungroup" option wasn't added as a parameter at the time
was to avoid the churn of changing every single caller of the API.

But it should really be a "parameter", and doing it via a struct means
that adding such parameters doesn't need to change every single caller.

Then we have outstanding WIP patches for the hook.[ch] API which needed
to add two other parameters...

So I think first ripping off the band-aid of making it painless to
extend the interface is the right thing to do, unless I've missed some
way of doing it that you've just discovered...
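For reference, the kind of interface being argued for could look
something like this (a sketch; the field names are guesses and the
on-list series may differ):

	/*
	 * Options become struct members, so adding a new one no longer
	 * has to touch every caller of the function.
	 */
	struct run_process_parallel_opts {
		int processes;			/* was the "n" argument */
		unsigned int ungroup:1;		/* replaces the global */
		unsigned int pipe_output:1;	/* ditto, for this series */

		get_next_task_fn get_next_task;
		start_failure_fn start_failure;
		task_finished_fn task_finished;
		void *data;			/* was the "pp_cb" argument */
	};

	int run_processes_parallel(const struct run_process_parallel_opts *opts);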
> You mean the internal "struct parallel_processes"? How do you get the
> parameter there, presumably by passing it to
> run_processes_parallel{,_tr2}() as a new parameter?

Yes, I added it as a new parameter...

> The reason the "ungroup" option wasn't added as a parameter at the time
> was to avoid the churn of changing every single caller of the API.
>
> But it should really be a "parameter", and doing it via a struct means
> that adding such parameters doesn't need to change every single caller.
>
> Then we have outstanding WIP patches for the hook.[ch] API which needed
> to add two other parameters...
>
> So I think first ripping off the band-aid of making it painless to
> extend the interface is the right thing to do, unless I've missed some
> way of doing it that you've just discovered...

In that case my patch does depend on yours for resubmission, so it
sounds like if I want to resubmit quickly I should cherry-pick the
relevant commits from your WIP branch.
diff --git a/run-command.c b/run-command.c
index 14f17830f5..893bc1d294 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1474,6 +1474,7 @@ enum child_state {
 };
 
 int run_processes_parallel_ungroup;
+int run_processes_parallel_pipe_output;
 struct parallel_processes {
 	void *data;
 
@@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
 	int output_timeout = 100;
 	int spawn_cap = 4;
 	int ungroup = run_processes_parallel_ungroup;
+	int pipe_output = run_processes_parallel_pipe_output;
 	struct parallel_processes pp;
 
 	/* unset for the next API user */
 	run_processes_parallel_ungroup = 0;
+	run_processes_parallel_pipe_output = 0;
 
 	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
 		ungroup);
@@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
 			pp.children[i].state = GIT_CP_WAIT_CLEANUP;
 		} else {
 			pp_buffer_stderr(&pp, output_timeout);
-			pp_output(&pp);
+			if (!pipe_output)
+				pp_output(&pp);
 		}
 		code = pp_collect_finished(&pp);
 		if (code) {
diff --git a/run-command.h b/run-command.h
index 0e85e5846a..a5b1d63f49 100644
--- a/run-command.h
+++ b/run-command.h
@@ -483,8 +483,17 @@ typedef int (*task_finished_fn)(int result,
  * "run_processes_parallel_ungroup" to "1" before invoking
  * run_processes_parallel(), it will be set back to "0" as soon as the
  * API reads that setting.
+ *
+ * If the "pipe_output" option is specified, the output will be piped
+ * to task_finished_fn in the "struct strbuf *out" variable. The output
+ * will still be printed unless the callback resets the strbuf. The
+ * "pipe_output" option can be enabled by setting the global
+ * "run_processes_parallel_pipe_output" to "1" before invoking
+ * run_processes_parallel(), it will be set back to "0" as soon as the
+ * API reads that setting.
  */
 extern int run_processes_parallel_ungroup;
+extern int run_processes_parallel_pipe_output;
 
 int run_processes_parallel(int n, get_next_task_fn, start_failure_fn,
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
index c9283b47af..030e533c6b 100644
--- a/t/helper/test-run-command.c
+++ b/t/helper/test-run-command.c
@@ -20,6 +20,7 @@
 #include "wildmatch.h"
 #include "gettext.h"
 
+static int pipe_output = 0;
 static int number_callbacks;
 static int parallel_next(struct child_process *cp,
 			 struct strbuf *err,
@@ -52,15 +53,32 @@ static int no_job(struct child_process *cp,
 	return 0;
 }
 
+static int task_finished_pipe_output(int result,
+				     struct strbuf *err,
+				     void *pp_cb,
+				     void *pp_task_cb)
+{
+	if (err && pipe_output) {
+		fprintf(stderr, "%s", err->buf);
+		strbuf_reset(err);
+	}
+	return 0;
+}
+
 static int task_finished(int result,
 			 struct strbuf *err,
 			 void *pp_cb,
 			 void *pp_task_cb)
 {
-	if (err)
+	if (err) {
 		strbuf_addstr(err, "asking for a quick stop\n");
-	else
+		if (pipe_output) {
+			fprintf(stderr, "%s", err->buf);
+			strbuf_reset(err);
+		}
+	} else {
 		fprintf(stderr, "asking for a quick stop\n");
+	}
 	return 1;
 }
 
@@ -423,13 +441,20 @@ int cmd__run_command(int argc, const char **argv)
 		run_processes_parallel_ungroup = 1;
 	}
 
+	if (!strcmp(argv[1], "--pipe-output")) {
+		argv += 1;
+		argc -= 1;
+		run_processes_parallel_pipe_output = 1;
+		pipe_output = 1;
+	}
+
 	jobs = atoi(argv[2]);
 	strvec_clear(&proc.args);
 	strvec_pushv(&proc.args, (const char **)argv + 3);
 
 	if (!strcmp(argv[1], "run-command-parallel"))
 		exit(run_processes_parallel(jobs, parallel_next,
-					    NULL, NULL, &proc));
+					    NULL, task_finished_pipe_output, &proc));
 
 	if (!strcmp(argv[1], "run-command-abort"))
 		exit(run_processes_parallel(jobs, parallel_next,
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 7b5423eebd..97ca942a74 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
+	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
 	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
@@ -151,6 +156,11 @@ test_expect_success 'run_command runs ungrouped in parallel with as many jobs as
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with as many jobs as tasks' '
+	test-tool run-command --pipe-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
 	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
@@ -162,6 +172,12 @@ test_expect_success 'run_command runs ungrouped in parallel with more tasks than
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with more tasks than jobs available' '
+	test-tool run-command --pipe-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+
 cat >expect <<-EOF
 	preloaded output of a child
 	asking for a quick stop
@@ -182,6 +198,11 @@ test_expect_success 'run_command is asked to abort gracefully (ungroup)' '
 	test_line_count = 6 err
 '
 
+test_expect_success 'run_command is asked to abort gracefully (pipe_output)' '
+	test-tool run-command --pipe-output run-command-abort 3 false 2>actual &&
+	test_cmp expect actual
+'
+
 cat >expect <<-EOF
 	no further jobs available
 EOF
@@ -197,6 +218,11 @@ test_expect_success 'run_command outputs (ungroup) ' '
 	test_cmp expect err
 '
 
+test_expect_success 'run_command outputs (pipe_output) ' '
+	test-tool run-command --pipe-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_trace () {
 	expect="$1"
 	shift
run_processes_parallel periodically collects output from its child
processes, prints it, and then resets the buffers for each child.
Add a run_processes_parallel_pipe_output variable so that output can
be collected and fed to task_finished. When set, the function
referenced by task_finished should parse the output of each child
process.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 run-command.c               |  6 +++++-
 run-command.h               |  9 +++++++++
 t/helper/test-run-command.c | 31 ++++++++++++++++++++++++++++---
 t/t0061-run-command.sh      | 26 ++++++++++++++++++++++++++
 4 files changed, 68 insertions(+), 4 deletions(-)