Message ID | xmqqikxv4t1v.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Re* [PATCH v5] describe: refresh the index when 'broken' flag is used | expand |
Junio C Hamano <gitster@pobox.com> writes: > Subject: [PATCH] fsck: clear child_process after failed run_command() > > There are two loops that calls run_command() using the same > child_process struct near the end of cmd_fsck(). 4d0984be (fsck: do > not reuse child_process structs, 2018-11-12) tightened these code > paths to reinitialize the structure in order to safely reuse it. > > The run-command API makes no promises about what is left in a struct > child_process after a command finishes, and it's not safe to simply > reuse it again for a similar command. > > Upon failure, run_command() can return without releasing the > resource held by the child_process structure, which is done by > calling finish_command() which in turn calls child_process_clear(). Sorry, but I have to take this back. The error return code paths in the start_command() function does seem to call clear_child_process(), which means that there is no need to call clear_child_process() ourselves after a failed run_command() call. The need for reinitializing that established by 4d0984be (fsck: do not reuse child_process structs, 2018-11-12) still stand, so ... the first run ... run_command(&cp); /* no need for separate child_process_clear(&cp) */ child_process_init(&cp); /* prepare for reuse */ ... setup for the second run ... strvec_pushv(&cp.args, diff_index_args); cp.git_cmd = 1; ... full set-up without relying on anything done earlier ... would still be needed. Or alternatively, we could do this to ensure that the child_process structure is always reusable. run-command.c | 1 + run-command.h | 6 +++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git c/run-command.c w/run-command.c index 6ac1d14516..aba250fbe1 100644 --- c/run-command.c +++ w/run-command.c @@ -26,6 +26,7 @@ void child_process_clear(struct child_process *child) { strvec_clear(&child->args); strvec_clear(&child->env); + child_process_init(child); } struct child_to_clean { diff --git c/run-command.h w/run-command.h index 55f6631a2a..6e203c22f6 100644 --- c/run-command.h +++ w/run-command.h @@ -204,7 +204,8 @@ int start_command(struct child_process *); /** * Wait for the completion of a sub-process that was started with - * start_command(). + * start_command(). The child_process structure is cleared and + * reinitialized. */ int finish_command(struct child_process *); @@ -214,6 +215,9 @@ int finish_command_in_signal(struct child_process *); * A convenience function that encapsulates a sequence of * start_command() followed by finish_command(). Takes a pointer * to a `struct child_process` that specifies the details. + * The child_process structure is cleared and reinitialized, + * even when the command fails to start or an error is detected + * in finish_command(). */ int run_command(struct child_process *);
On 26/06/24 21:47, Junio C Hamano wrote: > Or alternatively, we could do this to ensure that the child_process > structure is always reusable. > > run-command.c | 1 + > run-command.h | 6 +++++- > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git c/run-command.c w/run-command.c > index 6ac1d14516..aba250fbe1 100644 > --- c/run-command.c > +++ w/run-command.c > @@ -26,6 +26,7 @@ void child_process_clear(struct child_process *child) > { > strvec_clear(&child->args); > strvec_clear(&child->env); > + child_process_init(child); > } To me, this looks much better. child_process_clear's name already suggests that is sort of like a destructor, so it makes sense to re-initialize everything here. I even wonder why it was not that way to begin with. I suppose no callers are assuming that it only clears args and env though? > struct child_to_clean { > diff --git c/run-command.h w/run-command.h > index 55f6631a2a..6e203c22f6 100644 > --- c/run-command.h > +++ w/run-command.h > @@ -204,7 +204,8 @@ int start_command(struct child_process *); > > /** > * Wait for the completion of a sub-process that was started with > - * start_command(). > + * start_command(). The child_process structure is cleared and > + * reinitialized. > */ > int finish_command(struct child_process *); > > @@ -214,6 +215,9 @@ int finish_command_in_signal(struct child_process *); > * A convenience function that encapsulates a sequence of > * start_command() followed by finish_command(). Takes a pointer > * to a `struct child_process` that specifies the details. > + * The child_process structure is cleared and reinitialized, > + * even when the command fails to start or an error is detected > + * in finish_command(). > */ > int run_command(struct child_process *); >
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > To me, this looks much better. child_process_clear's name already > suggests that is sort of like a destructor, so it makes sense to > re-initialize everything here. I even wonder why it was not that way to > begin with. I suppose no callers are assuming that it only clears args > and env though? I guess that validating that supposition is a prerequisite to declare the change as "much better" and "makes sense".
Junio C Hamano <gitster@pobox.com> writes: > Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > >> To me, this looks much better. child_process_clear's name already >> suggests that is sort of like a destructor, so it makes sense to >> re-initialize everything here. I even wonder why it was not that way to >> begin with. I suppose no callers are assuming that it only clears args >> and env though? > > I guess that validating that supposition is a prerequisite to > declare the change as "much better" and "makes sense". Having said that, a lot more plausible reason is that most callers are expected to use a single struct just once without reusing it, so the cost of re-initialization is wasted cycles for them if it were part of child_process_clear() which is primarily about releasing resources (instead of leaking them). It can be argued that it is a sensible design decision to make those minority callers that do reuse the same struct to pay the cost of reinitialization themselves, instead of the majority callers that use it once and then release resources held in there.
On 26/06/24 23:05, Junio C Hamano wrote: > Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > >> To me, this looks much better. child_process_clear's name already >> suggests that is sort of like a destructor, so it makes sense to >> re-initialize everything here. I even wonder why it was not that way to >> begin with. I suppose no callers are assuming that it only clears args >> and env though? > > I guess that validating that supposition is a prerequisite to > declare the change as "much better" and "makes sense". OK. I found one: at the end of submodule.c:push_submodule() if (...) { ...some setup... if (run_command(&cp)) return 0; close(cp.out); }
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > On 26/06/24 23:05, Junio C Hamano wrote: >> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: >> >>> To me, this looks much better. child_process_clear's name already >>> suggests that is sort of like a destructor, so it makes sense to >>> re-initialize everything here. I even wonder why it was not that way to >>> begin with. I suppose no callers are assuming that it only clears args >>> and env though? >> >> I guess that validating that supposition is a prerequisite to >> declare the change as "much better" and "makes sense". > > OK. I found one: at the end of submodule.c:push_submodule() > > if (...) { > ...some setup... > if (run_command(&cp)) > return 0; > close(cp.out); > } This is curious. * What is this thing trying to do? When run_command() fails, it wants to leave cp.out open, so that the caller this returns to can write into it??? That cannot be the case, as cp itself is internal. So does this "close(cp.out)" really matter? * Even though we are running child_process_clear() to release the resources in run_command() we are not closing the file descriptor cp.out in the child_process_clear() and force the caller to close it instead. An open file descriptor is a resource, and a file descriptor opened but forgotten is considered a leak. I wonder if child_process_clear() should be closing the file descriptor, at least the ones it opened or dup2()ed. In any case, you found a case where child_process_clear() may not want to do the full re-initialization and at the same time it is not doing its job sufficiently well. Let's decide, at least for now, not to do the reinitialization from child_process_clear(), then. Thanks.
On Wed, Jun 26, 2024 at 11:49:22AM -0700, Junio C Hamano wrote: > Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > > > On 26/06/24 23:05, Junio C Hamano wrote: > >> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > >> > >>> To me, this looks much better. child_process_clear's name already > >>> suggests that is sort of like a destructor, so it makes sense to > >>> re-initialize everything here. I even wonder why it was not that way to > >>> begin with. I suppose no callers are assuming that it only clears args > >>> and env though? > >> > >> I guess that validating that supposition is a prerequisite to > >> declare the change as "much better" and "makes sense". > > > > OK. I found one: at the end of submodule.c:push_submodule() > > > > if (...) { > > ...some setup... > > if (run_command(&cp)) > > return 0; > > close(cp.out); > > } > > This is curious. > > * What is this thing trying to do? When run_command() fails, it > wants to leave cp.out open, so that the caller this returns to > can write into it??? That cannot be the case, as cp itself is > internal. So does this "close(cp.out)" really matter? I think it's totally broken. Using cp.out, cp.in, etc, with run_command() is a deadlock waiting to happen, since it implies opening a pipe, not doing anything with our end, and then doing a waitpid() on the child. You'd always want to use start_command(), and then do something useful with the pipe, and then finish_command(). Arguably run_command() should bug if cp.out, etc are non-zero. In this case the code leaves cp.out as 0, so we are not asking for a pipe. That is good, and we are not subject to any race/deadlock. But then...what is the cleanup trying to do? This will always just close(0), the parent's stdin. That is certainly surprising, but I guess nobody ever noticed because git-push does not usually read from its stdin. So I think the close() is superfluous and should be deleted. It goes all the way back to eb21c732d6 (push: teach --recurse-submodules the on-demand option, 2012-03-29). I guess at one point the author thought we'd read output from a pipe (rather than letting it just go to the parent's stdout) and this was leftover? > * Even though we are running child_process_clear() to release the > resources in run_command() we are not closing the file descriptor > cp.out in the child_process_clear() and force the caller to close > it instead. An open file descriptor is a resource, and a file > descriptor opened but forgotten is considered a leak. I wonder > if child_process_clear() should be closing the file descriptor, > at least the ones it opened or dup2()ed. Usually the caller will have handled this already (since it cares about exactly _when_ to close), so we'd end up double-closing in most cases. This is sometimes harmless (you'll get EBADF), but is a problem if the descriptor was assigned to something else in the meantime. In most cases callers shouldn't be using child_process_clear() at all after start_command(), etc. Either: - start_command() failed, in which case it cleans up everything already (both in the struct and any pipes it opened) - we succeeded in starting the command, in which case child_process_clear() is insufficient. You need to actually finish_command() to avoid leaking the pid. - after finish_command(), the struct has been cleaned already. You do need to close pipes in the caller, but you'd have to do so before calling finish_command() anyway, to avoid the deadlock. You really only need to call child_process_clear() yourself when you set up a struct but didn't actually start the process. And from a quick skim over the grep results, it looks like that's how it's used. -Peff
Junio C Hamano <gitster@pobox.com> writes: > Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: > >> On 26/06/24 23:05, Junio C Hamano wrote: >>> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes: >>> >>>> To me, this looks much better. child_process_clear's name already >>>> suggests that is sort of like a destructor, so it makes sense to >>>> re-initialize everything here. I even wonder why it was not that way to >>>> begin with. I suppose no callers are assuming that it only clears args >>>> and env though? >>> >>> I guess that validating that supposition is a prerequisite to >>> declare the change as "much better" and "makes sense". >> >> OK. I found one: at the end of submodule.c:push_submodule() >> >> if (...) { >> ...some setup... >> if (run_command(&cp)) >> return 0; >> close(cp.out); >> } > > This is curious. > > * What is this thing trying to do? When run_command() fails, it > wants to leave cp.out open, so that the caller this returns to > can write into it??? That cannot be the case, as cp itself is > internal. So does this "close(cp.out)" really matter? > This is curious one indeed, we only need to close the file descriptor when we set `cp.out = -1`, otherwise there is no need to close `cp.out` since `start_command()` takes care of it internally. So the `close(cp.out)` can really be removed here. Wonder if this was copied over from other parts of the submodule code, where we actually set `cp.out = -1`. > > * Even though we are running child_process_clear() to release the > resources in run_command() we are not closing the file descriptor > cp.out in the child_process_clear() and force the caller to close > it instead. An open file descriptor is a resource, and a file > descriptor opened but forgotten is considered a leak. I wonder > if child_process_clear() should be closing the file descriptor, > at least the ones it opened or dup2()ed. > If you see the documentation for run_command.h::child_process, it states /* * Using .in, .out, .err: * - Specify 0 for no redirections. No new file descriptor is allocated. * (child inherits stdin, stdout, stderr from parent). * - Specify -1 to have a pipe allocated as follows: * .in: returns the writable pipe end; parent writes to it, * the readable pipe end becomes child's stdin * .out, .err: returns the readable pipe end; parent reads from * it, the writable pipe end becomes child's stdout/stderr * The caller of start_command() must close the returned FDs * after it has completed reading from/writing to it! * - Specify > 0 to set a channel to a particular FD as follows: * .in: a readable FD, becomes child's stdin * .out: a writable FD, becomes child's stdout/stderr * .err: a writable FD, becomes child's stderr * The specified FD is closed by start_command(), even in case * of errors! */ int in; int out; int err; So this makes sense right? run_command() doesn't support the pipe options, meaning clients have to call start_command() + finish_command() manually. When clients do call start_command() with '-1' set to these options, they are in charge of closing the created pipes. > In any case, you found a case where child_process_clear() may not > want to do the full re-initialization and at the same time it is not > doing its job sufficiently well. Let's decide, at least for now, > not to do the reinitialization from child_process_clear(), then. > > Thanks.
On Wed, Jun 26, 2024 at 04:34:17PM -0400, Jeff King wrote: > > * What is this thing trying to do? When run_command() fails, it > > wants to leave cp.out open, so that the caller this returns to > > can write into it??? That cannot be the case, as cp itself is > > internal. So does this "close(cp.out)" really matter? > > I think it's totally broken. Using cp.out, cp.in, etc, with > run_command() is a deadlock waiting to happen, since it implies opening > a pipe, not doing anything with our end, and then doing a waitpid() on > the child. > > You'd always want to use start_command(), and then do something useful > with the pipe, and then finish_command(). Arguably run_command() should > bug if cp.out, etc are non-zero. Oh, I guess I should have looked at the code. We do indeed BUG() in this case already, courtesy of c29b3962af (run-command: forbid using run_command with piped output, 2015-03-22). So all is well there. This caller is still completely wrong to call close() though. -Peff
diff --git c/builtin/fsck.c w/builtin/fsck.c index d13a226c2e..398b492184 100644 --- c/builtin/fsck.c +++ w/builtin/fsck.c @@ -1078,8 +1078,10 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) strvec_push(&commit_graph_verify.args, "--progress"); else strvec_push(&commit_graph_verify.args, "--no-progress"); - if (run_command(&commit_graph_verify)) + if (run_command(&commit_graph_verify)) { + child_process_clear(&commit_graph_verify); errors_found |= ERROR_COMMIT_GRAPH; + } } } @@ -1096,8 +1098,10 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) strvec_push(&midx_verify.args, "--progress"); else strvec_push(&midx_verify.args, "--no-progress"); - if (run_command(&midx_verify)) + if (run_command(&midx_verify)) { + child_process_clear(&midx_verify); errors_found |= ERROR_MULTI_PACK_INDEX; + } } }