mbox series

[0/8] run-command: remove run_command_v_*()

Message ID 7407e074-4bd8-b351-7fa4-baf59b41880c@web.de (mailing list archive)
Headers show
Series run-command: remove run_command_v_*() | expand

Message

René Scharfe Oct. 27, 2022, 4:30 p.m. UTC
Replace the convenience functions run_command_v_opt() et. al. and use
struct child_process and run_command() directly instead, for an overall
code reduction and a simpler and more flexible API that allows creating
argument lists without magic numbers and reduced risk of memory leaks.

This is a broken-out and polished version of the original scratch at
https://lore.kernel.org/git/9d924a5d-5c72-fbe6-270c-a8f6c5fc5850@web.de/

Ævar Arnfjörð Bjarmason (1):
  merge: remove always-the-same "verbose" arguments

René Scharfe (7):
  bisect--helper: factor out do_bisect_run()
  use child_process members "args" and "env" directly
  use child_process member "args" instead of string array variable
  replace and remove run_command_v_opt_cd_env()
  replace and remove run_command_v_opt_tr2()
  replace and remove run_command_v_opt_cd_env_tr2()
  replace and remove run_command_v_opt()

 add-interactive.c        |   9 ++-
 bisect.c                 |  12 ++--
 builtin/add.c            |  19 +++--
 builtin/am.c             |  12 ++--
 builtin/bisect--helper.c |  68 +++++++++---------
 builtin/clone.c          |  41 ++++++-----
 builtin/difftool.c       |  24 ++++---
 builtin/fetch.c          |   9 ++-
 builtin/gc.c             |  55 ++++++++++-----
 builtin/merge-index.c    |   4 +-
 builtin/merge.c          |  53 ++++++--------
 builtin/pull.c           | 147 +++++++++++++++++++--------------------
 builtin/remote.c         |  40 +++++------
 compat/mingw.c           |  11 +--
 diff.c                   |  27 ++++---
 fsmonitor-ipc.c          |  10 ++-
 git.c                    |  15 ++--
 ll-merge.c               |   7 +-
 merge.c                  |  18 ++---
 run-command.c            |  35 ----------
 run-command.h            |  33 +--------
 scalar.c                 |  13 ++--
 sequencer.c              |  32 ++++-----
 shell.c                  |  17 +++--
 t/helper/test-fake-ssh.c |   7 +-
 t/helper/test-trace2.c   |   4 +-
 tmp-objdir.h             |   4 +-
 27 files changed, 344 insertions(+), 382 deletions(-)

--
2.38.1

Comments

Jeff King Oct. 27, 2022, 8:11 p.m. UTC | #1
On Thu, Oct 27, 2022 at 06:30:36PM +0200, René Scharfe wrote:

> Replace the convenience functions run_command_v_opt() et. al. and use
> struct child_process and run_command() directly instead, for an overall
> code reduction and a simpler and more flexible API that allows creating
> argument lists without magic numbers and reduced risk of memory leaks.
> 
> This is a broken-out and polished version of the original scratch at
> https://lore.kernel.org/git/9d924a5d-5c72-fbe6-270c-a8f6c5fc5850@web.de/

I read through this and it all looks fine to me. I was a bit puzzled at
the layout of your series at first. In particular, the difference
between cases in patch 4 versus the later ones.

I think it is that in patch 4, these are all unambiguously positive
because we are getting rid of magic numbers (or magically-sized arrays).
Whereas in patches 5-8, there's nothing inherently wrong with the
call-sites; but as we get rid of the API wrappers, we convert them. So
they are collateral damage, so to speak, from the simplification of the
API.

That makes sense to me, though I could point out that most of the sites
cleaned up in patch 4 _could_ be converted to look like the ones that
are converted in 5-8. Obviously that doesn't make sense to do, knowing
that 5-8 are coming. But if the point in splitting it this way is to
show that we could stop at patch 4, cleaning up call sites but not
shrinking the run-command API, then I just want to point out that there
is another way to do those cleanups. :)

All that said, you will be unsurprised to hear that I am on board with
the direction of 5-8, so this all looks good to me. I'm just laying out
my thought process while reviewing.

I also think it would have been fine to just drop the whole API in one
patch (i.e., squashing 5-8), which makes the intermediate diffs a bit
shorter. But I do not mind at all having the cases broken out.

Thanks for cleaning this up.

-Peff
Ævar Arnfjörð Bjarmason Oct. 27, 2022, 9:46 p.m. UTC | #2
On Thu, Oct 27 2022, René Scharfe wrote:

> Replace the convenience functions run_command_v_opt() et. al. and use
> struct child_process and run_command() directly instead, for an overall
> code reduction and a simpler and more flexible API that allows creating
> argument lists without magic numbers and reduced risk of memory leaks.
>
> This is a broken-out and polished version of the original scratch at
> https://lore.kernel.org/git/9d924a5d-5c72-fbe6-270c-a8f6c5fc5850@web.de/
>
> Ævar Arnfjörð Bjarmason (1):
>   merge: remove always-the-same "verbose" arguments
>
> René Scharfe (7):
>   bisect--helper: factor out do_bisect_run()
>   use child_process members "args" and "env" directly
>   use child_process member "args" instead of string array variable
>   replace and remove run_command_v_opt_cd_env()
>   replace and remove run_command_v_opt_tr2()
>   replace and remove run_command_v_opt_cd_env_tr2()
>   replace and remove run_command_v_opt()

Even though I had a an earlier alternate series series[1] for this I'd
be happy to see this version go in. I left some comments here and there,
but with/without a re-roll am happy to give this:

	Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

I think I would have just gone for this in the first place, but thought
that people loved the convenience functions too much. It can be hard to
judge sentiments in advance :)

One reason I hadn't re-submitted something is that there were
outstanding new conflicts with "seen", and I see from applying this
topic & merging it that it conflicts somewhat heavily. Junio seems to be
on-board with this though, so maybe he won't mind.

I didn't see any glaring instances where this made things worse, so
maybe we didn't need these convenience wrappers in the first place.

But from the earlier discussion it does seema bit like we tossed the
very notion out because some didn't like the duplicating of struct
members with the flags (which I also doen't like).

So I came up with the below experiment on top, it's not an attempt to
convert all callers, just a demo.

Maybe you think some ideas here are worth using, I probably won't pursue
it (but maybe as ideas for some other future API).

It's a combination of stuff, some of which you might like, some not,
namely:

- Have the API work the same way, but just have a run_commandl(&opt,
  ...) in addition to the run_command(). It's just a trivial wrapper to
  push stuff into &cmd.args for you.

- I saw a few callers that could have perhaps used a similarly trivial
  run_commandv(), but that doesn't benefit from LAST_ARG_MUST_BE_NULL,
  so I didn't bother.

- I wish C had a nicer syntax for not just declaring but squashing
  together compile_time bracketed lists (think set operations). But the
  below "CHILD_PROCESS_INIT_LIST" is a pretty good poor man's version.

  I see gcc/clang nicely give us safety rails for that with
  "-Woverride-init", for this sort of "opts struct with internal stuff,
  but also user options" I think it works out very nicely.

- We have quite a few callers that want "on error, die", so maybe we
  should have something like that "on_error" sooner than later.

On clever (but perhaps overly clever) thing I didn't use, but played
with recently in another context, is that now with C99 you can also do:

	int run_commandl(struct child_process *cmd, ...);
	#define run_command(...) run_command_1(__VA_ARGS__, NULL)

I.e. make the API itself support all of:

	run_command(&cmd);
	run_command(&cmd, "reboot");
	run_command(&cmd, "reboot", NULL);

I haven't made up my mind on whether that's just overly clever, or
outright insane :)

diff --git a/bisect.c b/bisect.c
index ec7487e6836..ef4f80650f7 100644
--- a/bisect.c
+++ b/bisect.c
@@ -740,9 +740,8 @@ enum bisect_error bisect_checkout(const struct object_id *bisect_rev,
 		struct child_process cmd = CHILD_PROCESS_INIT;
 
 		cmd.git_cmd = 1;
-		strvec_pushl(&cmd.args, "checkout", "-q",
-			     oid_to_hex(bisect_rev), "--", NULL);
-		if (run_command(&cmd))
+		if (run_commandl(&cmd, "checkout", "-q",
+				 oid_to_hex(bisect_rev), "--", NULL))
 			/*
 			 * Errors in `run_command()` itself, signaled by res < 0,
 			 * and errors in the child process, signaled by res > 0
diff --git a/builtin/am.c b/builtin/am.c
index 20aea0d2487..3b7df32ce22 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2189,10 +2189,9 @@ static int show_patch(struct am_state *state, enum show_patch_type sub_mode)
 	if (!is_null_oid(&state->orig_commit)) {
 		struct child_process cmd = CHILD_PROCESS_INIT;
 
-		strvec_pushl(&cmd.args, "show", oid_to_hex(&state->orig_commit),
-			     "--", NULL);
 		cmd.git_cmd = 1;
-		return run_command(&cmd);
+		return run_commandl(&cmd, "show", oid_to_hex(&state->orig_commit),
+				    "--", NULL);
 	}
 
 	switch (sub_mode) {
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 1d2ce8a0e12..087d21c614a 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -764,12 +764,12 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, const char **a
 		strbuf_read_file(&start_head, git_path_bisect_start(), 0);
 		strbuf_trim(&start_head);
 		if (!no_checkout) {
-			struct child_process cmd = CHILD_PROCESS_INIT;
-
-			cmd.git_cmd = 1;
-			strvec_pushl(&cmd.args, "checkout", start_head.buf,
-				     "--", NULL);
-			if (run_command(&cmd)) {
+			struct child_process cmd = {
+				CHILD_PROCESS_INIT_LIST,
+				.git_cmd = 1,
+			};
+			if (run_commandl(&cmd, "checkout", start_head.buf,
+					 "--", NULL)) {
 				res = error(_("checking out '%s' failed."
 						 " Try 'git bisect start "
 						 "<valid-branch>'."),
@@ -1147,8 +1147,7 @@ static int do_bisect_run(const char *command)
 
 	printf(_("running %s\n"), command);
 	cmd.use_shell = 1;
-	strvec_push(&cmd.args, command);
-	return run_command(&cmd);
+	return run_commandl(&cmd, command, NULL);
 }
 
 static int verify_good(const struct bisect_terms *terms, const char *command)
diff --git a/builtin/clone.c b/builtin/clone.c
index 0e4348686b6..80d09e0fbf1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -655,7 +655,6 @@ static int git_sparse_checkout_init(const char *repo)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	int result = 0;
-	strvec_pushl(&cmd.args, "-C", repo, "sparse-checkout", "set", NULL);
 
 	/*
 	 * We must apply the setting in the current process
@@ -664,7 +663,7 @@ static int git_sparse_checkout_init(const char *repo)
 	core_apply_sparse_checkout = 1;
 
 	cmd.git_cmd = 1;
-	if (run_command(&cmd)) {
+	if (run_commandl(&cmd, "-C", repo, "sparse-checkout", "set", NULL)) {
 		error(_("failed to initialize sparse-checkout"));
 		result = 1;
 	}
@@ -868,13 +867,14 @@ static void dissociate_from_references(void)
 	char *alternates = git_pathdup("objects/info/alternates");
 
 	if (!access(alternates, F_OK)) {
-		struct child_process cmd = CHILD_PROCESS_INIT;
-
-		cmd.git_cmd = 1;
-		cmd.no_stdin = 1;
-		strvec_pushl(&cmd.args, "repack", "-a", "-d", NULL);
-		if (run_command(&cmd))
-			die(_("cannot repack to clean up"));
+		struct child_process cmd = {
+			CHILD_PROCESS_INIT_LIST,
+			.git_cmd = 1,
+			.no_stdin = 1,
+			.on_error = CHILD_PROCESS_ON_ERROR_DIE,
+		};
+
+		run_commandl(&cmd, "repack", "-a", "-d", NULL);
 		if (unlink(alternates) && errno != ENOENT)
 			die_errno(_("cannot unlink temporary alternates file"));
 	}
diff --git a/builtin/difftool.c b/builtin/difftool.c
index d7f08c8a7fa..b4165b5a8ae 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -44,11 +44,12 @@ static int difftool_config(const char *var, const char *value, void *cb)
 
 static int print_tool_help(void)
 {
-	struct child_process cmd = CHILD_PROCESS_INIT;
-
-	cmd.git_cmd = 1;
-	strvec_pushl(&cmd.args, "mergetool", "--tool-help=diff", NULL);
-	return run_command(&cmd);
+	struct child_process cmd = {
+		CHILD_PROCESS_INIT_LIST,
+		.git_cmd = 1,
+	};
+		
+	return run_commandl(&cmd, "mergetool", "--tool-help=diff", NULL);
 }
 
 static int parse_index_info(char *p, int *mode1, int *mode2,
diff --git a/git.c b/git.c
index 6662548986f..93179f44f78 100644
--- a/git.c
+++ b/git.c
@@ -724,7 +724,13 @@ static void handle_builtin(int argc, const char **argv)
 
 static void execv_dashed_external(const char **argv)
 {
-	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct child_process cmd = {
+		CHILD_PROCESS_INIT_LIST,
+		.clean_on_exit = 1,
+		.wait_after_clean = 1,
+		.silent_exec_failure = 1,
+		.trace2_child_class = "dashed",
+	};
 	int status;
 
 	if (get_super_prefix())
@@ -736,10 +742,6 @@ static void execv_dashed_external(const char **argv)
 
 	strvec_pushf(&cmd.args, "git-%s", argv[0]);
 	strvec_pushv(&cmd.args, argv + 1);
-	cmd.clean_on_exit = 1;
-	cmd.wait_after_clean = 1;
-	cmd.silent_exec_failure = 1;
-	cmd.trace2_child_class = "dashed";
 
 	trace2_cmd_name("_run_dashed_");
 
diff --git a/run-command.c b/run-command.c
index 23e100dffc4..4b20aa1b577 100644
--- a/run-command.c
+++ b/run-command.c
@@ -993,15 +993,45 @@ int finish_command_in_signal(struct child_process *cmd)
 
 int run_command(struct child_process *cmd)
 {
-	int code;
+	int starting = 1;
+	int code = 0;
 
 	if (cmd->out < 0 || cmd->err < 0)
 		BUG("run_command with a pipe can cause deadlock");
 
 	code = start_command(cmd);
 	if (code)
+		goto error;
+	starting = 0;
+	code = finish_command(cmd);
+	if (!code)
+		return 0;
+error:
+	switch (cmd->on_error) {
+	case CHILD_PROCESS_ON_ERROR_DIE:
+		die(starting ?
+		    _("start_command() for '%s' failed (error code %d)") :
+		    _("'%s': got non-zero exit code '%d'"),
+		    cmd->args.v[0], code);
+		break;
+	case CHILD_PROCESS_ON_ERROR_RETURN:
 		return code;
-	return finish_command(cmd);
+	default:
+		BUG("unreachable");
+	}
+}
+
+int run_commandl(struct child_process *cmd, ...)
+{
+	va_list ap;
+	const char *arg;
+
+	va_start(ap, cmd);
+	while ((arg = va_arg(ap, const char *)))
+		strvec_push(&cmd->args, arg);
+	va_end(ap);
+
+	return run_command(cmd);
 }
 
 #ifndef NO_PTHREADS
diff --git a/run-command.h b/run-command.h
index fe2717ad11e..71e390350ed 100644
--- a/run-command.h
+++ b/run-command.h
@@ -15,7 +15,22 @@
  * produces in the caller in order to process it.
  */
 
+enum child_process_on_error {
+	/**
+	 * Return a status code from run_command(). Set to 0 so that
+	 * it'll be { 0 } init'd. If it's specified in a
+	 * CHILD_PROCESS_INIT_LIST initialization we don't want a
+	 * "-Woverride-init" warning.
+	 */
+	CHILD_PROCESS_ON_ERROR_RETURN = 0,
 
+	/**
+	 * die() with some sensible message if run_command() would
+	 * have returned a non-zero exit code.
+	 */
+	CHILD_PROCESS_ON_ERROR_DIE,
+};
+	
 /**
  * This describes the arguments, redirections, and environment of a
  * command to run in a sub-process.
@@ -42,6 +57,10 @@
  *		redirected.
  */
 struct child_process {
+	/**
+	 * Error behavior, see "enum child_process_on_error" above.
+	 */
+	enum child_process_on_error on_error;
 
 	/**
 	 * The .args is a `struct strvec', use that API to manipulate
@@ -144,10 +163,11 @@ struct child_process {
 	void (*clean_on_exit_handler)(struct child_process *process);
 };
 
-#define CHILD_PROCESS_INIT { \
+#define CHILD_PROCESS_INIT_LIST \
+	/* .on_error = CHILD_PROCESS_ON_ERROR_RETURN */ \
 	.args = STRVEC_INIT, \
-	.env = STRVEC_INIT, \
-}
+	.env = STRVEC_INIT
+#define CHILD_PROCESS_INIT { CHILD_PROCESS_INIT_LIST }
 
 /**
  * The functions: child_process_init, start_command, finish_command,
@@ -218,6 +238,14 @@ int finish_command_in_signal(struct child_process *);
  */
 int run_command(struct child_process *);
 
+/**
+ * Like run_command() but takes a variable number of arguments, which
+ * will be appended with the equivalent of strvec_pushl(&cmd.args,
+ * ...) before invoking run_command().
+ */
+LAST_ARG_MUST_BE_NULL
+int run_commandl(struct child_process *cmd, ...);
+
 /*
  * Trigger an auto-gc
  */
René Scharfe Oct. 28, 2022, 2:23 p.m. UTC | #3
Am 27.10.22 um 22:11 schrieb Jeff King:
> On Thu, Oct 27, 2022 at 06:30:36PM +0200, René Scharfe wrote:
>
>> Replace the convenience functions run_command_v_opt() et. al. and use
>> struct child_process and run_command() directly instead, for an overall
>> code reduction and a simpler and more flexible API that allows creating
>> argument lists without magic numbers and reduced risk of memory leaks.
>>
>> This is a broken-out and polished version of the original scratch at
>> https://lore.kernel.org/git/9d924a5d-5c72-fbe6-270c-a8f6c5fc5850@web.de/
>
> I read through this and it all looks fine to me. I was a bit puzzled at
> the layout of your series at first. In particular, the difference
> between cases in patch 4 versus the later ones.
>
> I think it is that in patch 4, these are all unambiguously positive
> because we are getting rid of magic numbers (or magically-sized arrays).
> Whereas in patches 5-8, there's nothing inherently wrong with the
> call-sites; but as we get rid of the API wrappers, we convert them. So
> they are collateral damage, so to speak, from the simplification of the
> API.
>
> That makes sense to me, though I could point out that most of the sites
> cleaned up in patch 4 _could_ be converted to look like the ones that
> are converted in 5-8. Obviously that doesn't make sense to do, knowing
> that 5-8 are coming. But if the point in splitting it this way is to
> show that we could stop at patch 4, cleaning up call sites but not
> shrinking the run-command API, then I just want to point out that there
> is another way to do those cleanups. :)

Yes, almost, except that I think 5-7 are doing necessary pruning and 8
requires a small leap of faith in the value of simplicity.  And I wanted
to shrink down and simplify that last patch.

René
René Scharfe Oct. 28, 2022, 4:04 p.m. UTC | #4
Am 27.10.22 um 23:46 schrieb Ævar Arnfjörð Bjarmason:
>
> On Thu, Oct 27 2022, René Scharfe wrote:
>
>> Replace the convenience functions run_command_v_opt() et. al. and use
>> struct child_process and run_command() directly instead, for an overall
>> code reduction and a simpler and more flexible API that allows creating
>> argument lists without magic numbers and reduced risk of memory leaks.
>>
>> This is a broken-out and polished version of the original scratch at
>> https://lore.kernel.org/git/9d924a5d-5c72-fbe6-270c-a8f6c5fc5850@web.de/
>>
>> Ævar Arnfjörð Bjarmason (1):
>>   merge: remove always-the-same "verbose" arguments
>>
>> René Scharfe (7):
>>   bisect--helper: factor out do_bisect_run()
>>   use child_process members "args" and "env" directly
>>   use child_process member "args" instead of string array variable
>>   replace and remove run_command_v_opt_cd_env()
>>   replace and remove run_command_v_opt_tr2()
>>   replace and remove run_command_v_opt_cd_env_tr2()
>>   replace and remove run_command_v_opt()
>
> Even though I had a an earlier alternate series series[1] for this I'd
> be happy to see this version go in. I left some comments here and there,
> but with/without a re-roll am happy to give this:
>
> 	Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> I think I would have just gone for this in the first place, but thought
> that people loved the convenience functions too much. It can be hard to
> judge sentiments in advance :)
>
> One reason I hadn't re-submitted something is that there were
> outstanding new conflicts with "seen", and I see from applying this
> topic & merging it that it conflicts somewhat heavily. Junio seems to be
> on-board with this though, so maybe he won't mind.
>
> I didn't see any glaring instances where this made things worse, so
> maybe we didn't need these convenience wrappers in the first place.

Right, and I think this is important.

>
> But from the earlier discussion it does seema bit like we tossed the
> very notion out because some didn't like the duplicating of struct
> members with the flags (which I also doen't like).
>
> So I came up with the below experiment on top, it's not an attempt to
> convert all callers, just a demo.
>
> Maybe you think some ideas here are worth using, I probably won't pursue
> it (but maybe as ideas for some other future API).
>
> It's a combination of stuff, some of which you might like, some not,
> namely:
>
> - Have the API work the same way, but just have a run_commandl(&opt,
>   ...) in addition to the run_command(). It's just a trivial wrapper to
>   push stuff into &cmd.args for you.
>
> - I saw a few callers that could have perhaps used a similarly trivial
>   run_commandv(), but that doesn't benefit from LAST_ARG_MUST_BE_NULL,
>   so I didn't bother.

I thought about that as well, but at least for me is probably a just a
case of loss aversion, which I have aplenty.  run_command_v_opt() alone
isn't bad and patch 8 on its own probably wouldn't fly, so I mentally
cling to it.  But without it the API is untangled and simpler.  Only a
single way to specify flags and arguments, no shortcuts for special
combinations.  Overall easier to use once we forget the old ways.

>
> - I wish C had a nicer syntax for not just declaring but squashing
>   together compile_time bracketed lists (think set operations). But the
>   below "CHILD_PROCESS_INIT_LIST" is a pretty good poor man's version.
>
>   I see gcc/clang nicely give us safety rails for that with
>   "-Woverride-init", for this sort of "opts struct with internal stuff,
>   but also user options" I think it works out very nicely.
>

That's a nice and simple macro.  I played with a gross variant à la

  #define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }

which would allow e.g.

  struct child_process cmd = CHILD_PROCESS_INIT_EX(.git_cmd = 1);

Yours is better, but they share the downside of not actually saving any
lines of code..

> - We have quite a few callers that want "on error, die", so maybe we
>   should have something like that "on_error" sooner than later.

We could add a die_on_error bit for that, or start_command_or_die() and
run_command_or_die() variants (there I go again, multiplying APIs..).
They could report the failed command, which a caller can't do because
the internal strvec is already cleared once it learns of the failure.

>
> On clever (but perhaps overly clever) thing I didn't use, but played
> with recently in another context, is that now with C99 you can also do:
>
> 	int run_commandl(struct child_process *cmd, ...);
> 	#define run_command(...) run_command_1(__VA_ARGS__, NULL)
>
> I.e. make the API itself support all of:
>
> 	run_command(&cmd);
> 	run_command(&cmd, "reboot");
> 	run_command(&cmd, "reboot", NULL);
>
> I haven't made up my mind on whether that's just overly clever, or
> outright insane :)

Neither, I'd say.  By combining two operations it's less flexible than
a pure run method plus direct access to a strvec member.  It's a
shortcut that may save a line but requires more effort to extend its
callers, e.g. to conditionally add a new argument.

>
> diff --git a/bisect.c b/bisect.c
> index ec7487e6836..ef4f80650f7 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -740,9 +740,8 @@ enum bisect_error bisect_checkout(const struct object_id *bisect_rev,
>  		struct child_process cmd = CHILD_PROCESS_INIT;
>
>  		cmd.git_cmd = 1;
> -		strvec_pushl(&cmd.args, "checkout", "-q",
> -			     oid_to_hex(bisect_rev), "--", NULL);
> -		if (run_command(&cmd))
> +		if (run_commandl(&cmd, "checkout", "-q",
> +				 oid_to_hex(bisect_rev), "--", NULL))
>  			/*
>  			 * Errors in `run_command()` itself, signaled by res < 0,
>  			 * and errors in the child process, signaled by res > 0
> diff --git a/builtin/am.c b/builtin/am.c
> index 20aea0d2487..3b7df32ce22 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -2189,10 +2189,9 @@ static int show_patch(struct am_state *state, enum show_patch_type sub_mode)
>  	if (!is_null_oid(&state->orig_commit)) {
>  		struct child_process cmd = CHILD_PROCESS_INIT;
>
> -		strvec_pushl(&cmd.args, "show", oid_to_hex(&state->orig_commit),
> -			     "--", NULL);
>  		cmd.git_cmd = 1;
> -		return run_command(&cmd);
> +		return run_commandl(&cmd, "show", oid_to_hex(&state->orig_commit),
> +				    "--", NULL);
>  	}
>
>  	switch (sub_mode) {
> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
> index 1d2ce8a0e12..087d21c614a 100644
> --- a/builtin/bisect--helper.c
> +++ b/builtin/bisect--helper.c
> @@ -764,12 +764,12 @@ static enum bisect_error bisect_start(struct bisect_terms *terms, const char **a
>  		strbuf_read_file(&start_head, git_path_bisect_start(), 0);
>  		strbuf_trim(&start_head);
>  		if (!no_checkout) {
> -			struct child_process cmd = CHILD_PROCESS_INIT;
> -
> -			cmd.git_cmd = 1;
> -			strvec_pushl(&cmd.args, "checkout", start_head.buf,
> -				     "--", NULL);
> -			if (run_command(&cmd)) {
> +			struct child_process cmd = {
> +				CHILD_PROCESS_INIT_LIST,
> +				.git_cmd = 1,
> +			};
> +			if (run_commandl(&cmd, "checkout", start_head.buf,
> +					 "--", NULL)) {
>  				res = error(_("checking out '%s' failed."
>  						 " Try 'git bisect start "
>  						 "<valid-branch>'."),
> @@ -1147,8 +1147,7 @@ static int do_bisect_run(const char *command)
>
>  	printf(_("running %s\n"), command);
>  	cmd.use_shell = 1;
> -	strvec_push(&cmd.args, command);
> -	return run_command(&cmd);
> +	return run_commandl(&cmd, command, NULL);
>  }
>
>  static int verify_good(const struct bisect_terms *terms, const char *command)
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 0e4348686b6..80d09e0fbf1 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -655,7 +655,6 @@ static int git_sparse_checkout_init(const char *repo)
>  {
>  	struct child_process cmd = CHILD_PROCESS_INIT;
>  	int result = 0;
> -	strvec_pushl(&cmd.args, "-C", repo, "sparse-checkout", "set", NULL);
>
>  	/*
>  	 * We must apply the setting in the current process
> @@ -664,7 +663,7 @@ static int git_sparse_checkout_init(const char *repo)
>  	core_apply_sparse_checkout = 1;
>
>  	cmd.git_cmd = 1;
> -	if (run_command(&cmd)) {
> +	if (run_commandl(&cmd, "-C", repo, "sparse-checkout", "set", NULL)) {
>  		error(_("failed to initialize sparse-checkout"));
>  		result = 1;
>  	}
> @@ -868,13 +867,14 @@ static void dissociate_from_references(void)
>  	char *alternates = git_pathdup("objects/info/alternates");
>
>  	if (!access(alternates, F_OK)) {
> -		struct child_process cmd = CHILD_PROCESS_INIT;
> -
> -		cmd.git_cmd = 1;
> -		cmd.no_stdin = 1;
> -		strvec_pushl(&cmd.args, "repack", "-a", "-d", NULL);
> -		if (run_command(&cmd))
> -			die(_("cannot repack to clean up"));
> +		struct child_process cmd = {
> +			CHILD_PROCESS_INIT_LIST,
> +			.git_cmd = 1,
> +			.no_stdin = 1,
> +			.on_error = CHILD_PROCESS_ON_ERROR_DIE,
> +		};
> +
> +		run_commandl(&cmd, "repack", "-a", "-d", NULL);
>  		if (unlink(alternates) && errno != ENOENT)
>  			die_errno(_("cannot unlink temporary alternates file"));
>  	}
> diff --git a/builtin/difftool.c b/builtin/difftool.c
> index d7f08c8a7fa..b4165b5a8ae 100644
> --- a/builtin/difftool.c
> +++ b/builtin/difftool.c
> @@ -44,11 +44,12 @@ static int difftool_config(const char *var, const char *value, void *cb)
>
>  static int print_tool_help(void)
>  {
> -	struct child_process cmd = CHILD_PROCESS_INIT;
> -
> -	cmd.git_cmd = 1;
> -	strvec_pushl(&cmd.args, "mergetool", "--tool-help=diff", NULL);
> -	return run_command(&cmd);
> +	struct child_process cmd = {
> +		CHILD_PROCESS_INIT_LIST,
> +		.git_cmd = 1,
> +	};
> +
> +	return run_commandl(&cmd, "mergetool", "--tool-help=diff", NULL);
>  }
>
>  static int parse_index_info(char *p, int *mode1, int *mode2,
> diff --git a/git.c b/git.c
> index 6662548986f..93179f44f78 100644
> --- a/git.c
> +++ b/git.c
> @@ -724,7 +724,13 @@ static void handle_builtin(int argc, const char **argv)
>
>  static void execv_dashed_external(const char **argv)
>  {
> -	struct child_process cmd = CHILD_PROCESS_INIT;
> +	struct child_process cmd = {
> +		CHILD_PROCESS_INIT_LIST,
> +		.clean_on_exit = 1,
> +		.wait_after_clean = 1,
> +		.silent_exec_failure = 1,
> +		.trace2_child_class = "dashed",
> +	};
>  	int status;
>
>  	if (get_super_prefix())
> @@ -736,10 +742,6 @@ static void execv_dashed_external(const char **argv)
>
>  	strvec_pushf(&cmd.args, "git-%s", argv[0]);
>  	strvec_pushv(&cmd.args, argv + 1);
> -	cmd.clean_on_exit = 1;
> -	cmd.wait_after_clean = 1;
> -	cmd.silent_exec_failure = 1;
> -	cmd.trace2_child_class = "dashed";
>
>  	trace2_cmd_name("_run_dashed_");
>
> diff --git a/run-command.c b/run-command.c
> index 23e100dffc4..4b20aa1b577 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -993,15 +993,45 @@ int finish_command_in_signal(struct child_process *cmd)
>
>  int run_command(struct child_process *cmd)
>  {
> -	int code;
> +	int starting = 1;
> +	int code = 0;
>
>  	if (cmd->out < 0 || cmd->err < 0)
>  		BUG("run_command with a pipe can cause deadlock");
>
>  	code = start_command(cmd);
>  	if (code)
> +		goto error;
> +	starting = 0;
> +	code = finish_command(cmd);
> +	if (!code)
> +		return 0;
> +error:
> +	switch (cmd->on_error) {
> +	case CHILD_PROCESS_ON_ERROR_DIE:
> +		die(starting ?
> +		    _("start_command() for '%s' failed (error code %d)") :
> +		    _("'%s': got non-zero exit code '%d'"),
> +		    cmd->args.v[0], code);
> +		break;
> +	case CHILD_PROCESS_ON_ERROR_RETURN:
>  		return code;
> -	return finish_command(cmd);
> +	default:
> +		BUG("unreachable");
> +	}
> +}
> +
> +int run_commandl(struct child_process *cmd, ...)
> +{
> +	va_list ap;
> +	const char *arg;
> +
> +	va_start(ap, cmd);
> +	while ((arg = va_arg(ap, const char *)))
> +		strvec_push(&cmd->args, arg);
> +	va_end(ap);
> +
> +	return run_command(cmd);
>  }
>
>  #ifndef NO_PTHREADS
> diff --git a/run-command.h b/run-command.h
> index fe2717ad11e..71e390350ed 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -15,7 +15,22 @@
>   * produces in the caller in order to process it.
>   */
>
> +enum child_process_on_error {
> +	/**
> +	 * Return a status code from run_command(). Set to 0 so that
> +	 * it'll be { 0 } init'd. If it's specified in a
> +	 * CHILD_PROCESS_INIT_LIST initialization we don't want a
> +	 * "-Woverride-init" warning.
> +	 */
> +	CHILD_PROCESS_ON_ERROR_RETURN = 0,
>
> +	/**
> +	 * die() with some sensible message if run_command() would
> +	 * have returned a non-zero exit code.
> +	 */
> +	CHILD_PROCESS_ON_ERROR_DIE,
> +};
> +
>  /**
>   * This describes the arguments, redirections, and environment of a
>   * command to run in a sub-process.
> @@ -42,6 +57,10 @@
>   *		redirected.
>   */
>  struct child_process {
> +	/**
> +	 * Error behavior, see "enum child_process_on_error" above.
> +	 */
> +	enum child_process_on_error on_error;
>
>  	/**
>  	 * The .args is a `struct strvec', use that API to manipulate
> @@ -144,10 +163,11 @@ struct child_process {
>  	void (*clean_on_exit_handler)(struct child_process *process);
>  };
>
> -#define CHILD_PROCESS_INIT { \
> +#define CHILD_PROCESS_INIT_LIST \
> +	/* .on_error = CHILD_PROCESS_ON_ERROR_RETURN */ \
>  	.args = STRVEC_INIT, \
> -	.env = STRVEC_INIT, \
> -}
> +	.env = STRVEC_INIT
> +#define CHILD_PROCESS_INIT { CHILD_PROCESS_INIT_LIST }
>
>  /**
>   * The functions: child_process_init, start_command, finish_command,
> @@ -218,6 +238,14 @@ int finish_command_in_signal(struct child_process *);
>   */
>  int run_command(struct child_process *);
>
> +/**
> + * Like run_command() but takes a variable number of arguments, which
> + * will be appended with the equivalent of strvec_pushl(&cmd.args,
> + * ...) before invoking run_command().
> + */
> +LAST_ARG_MUST_BE_NULL
> +int run_commandl(struct child_process *cmd, ...);
> +
>  /*
>   * Trigger an auto-gc
>   */
Ævar Arnfjörð Bjarmason Oct. 28, 2022, 4:11 p.m. UTC | #5
On Fri, Oct 28 2022, René Scharfe wrote:

> Am 27.10.22 um 23:46 schrieb Ævar Arnfjörð Bjarmason:
>>
>> - I wish C had a nicer syntax for not just declaring but squashing
>>   together compile_time bracketed lists (think set operations). But the
>>   below "CHILD_PROCESS_INIT_LIST" is a pretty good poor man's version.
>>
>>   I see gcc/clang nicely give us safety rails for that with
>>   "-Woverride-init", for this sort of "opts struct with internal stuff,
>>   but also user options" I think it works out very nicely.
>>
>
> That's a nice and simple macro.  I played with a gross variant à la
>
>   #define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>
> which would allow e.g.
>
>   struct child_process cmd = CHILD_PROCESS_INIT_EX(.git_cmd = 1);
>
> Yours is better, 

I actually think yours is better, anyway...

> but they share the downside of not actually saving any lines of code..

To me it's not about saving code, but that it's immediately obvious when
reading the code that this set of options can be determined and set at
function or scope entry.

We tend to otherwise have creep where the decl and option init drifts
apart over time, and with complex init's you might stare at it for 30s,
before realizing that between the decl and fully init ing it often 50
lines later nothing actually changed vis-a-vis the state, we could have
just done it earlier.

I think that's worth it in general, whether it's worth the churn in this
case...

>> - We have quite a few callers that want "on error, die", so maybe we
>>   should have something like that "on_error" sooner than later.
>
> We could add a die_on_error bit for that, or start_command_or_die() and
> run_command_or_die() variants (there I go again, multiplying APIs..).
> They could report the failed command, which a caller can't do because
> the internal strvec is already cleared once it learns of the failure.

*nod*
René Scharfe Oct. 28, 2022, 5:16 p.m. UTC | #6
Am 28.10.22 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
>
> On Fri, Oct 28 2022, René Scharfe wrote:
>
>> Am 27.10.22 um 23:46 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> - I wish C had a nicer syntax for not just declaring but squashing
>>>   together compile_time bracketed lists (think set operations). But the
>>>   below "CHILD_PROCESS_INIT_LIST" is a pretty good poor man's version.
>>>
>>>   I see gcc/clang nicely give us safety rails for that with
>>>   "-Woverride-init", for this sort of "opts struct with internal stuff,
>>>   but also user options" I think it works out very nicely.
>>>
>>
>> That's a nice and simple macro.  I played with a gross variant à la
>>
>>   #define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>>
>> which would allow e.g.
>>
>>   struct child_process cmd = CHILD_PROCESS_INIT_EX(.git_cmd = 1);
>>
>> Yours is better,
>
> I actually think yours is better, anyway...
>
>> but they share the downside of not actually saving any lines of code..
>
> To me it's not about saving code, but that it's immediately obvious when
> reading the code that this set of options can be determined and set at
> function or scope entry.
>
> We tend to otherwise have creep where the decl and option init drifts
> apart over time, and with complex init's you might stare at it for 30s,
> before realizing that between the decl and fully init ing it often 50
> lines later nothing actually changed vis-a-vis the state, we could have
> just done it earlier.

Hmm, we could do that by collecting the flag setting parts at the top,
without the need for a new macro.  Or at the bottom, before the
run_command() call.

> I think that's worth it in general, whether it's worth the churn in this
> case...
>
>>> - We have quite a few callers that want "on error, die", so maybe we
>>>   should have something like that "on_error" sooner than later.
>>
>> We could add a die_on_error bit for that, or start_command_or_die() and
>> run_command_or_die() variants (there I go again, multiplying APIs..).
>> They could report the failed command, which a caller can't do because
>> the internal strvec is already cleared once it learns of the failure.
>
> *nod*

Wait a second: Does it even make sense to mention the command in a die()
message after start_command() failed?  Unless .silent_exec_failure is
set, start_command() already reports it.  E.g. archive-tar.c has:

	if (start_command(&filter) < 0)
		die_errno(_("unable to start '%s' filter"), cmd.buf);

... and the result looks like this upon failure:

   $ git -c tar.tgz.command=nonsense archive --format=tgz HEAD
   error: cannot run nonsense: No such file or directory
   fatal: unable to start 'nonsense' filter: No such file or directory

The second message is mostly redundant, but it mentions that the failed
command was a filter.  Probably .silent_exec_failure should be set here,
then the die() message is no longer redundant.  This requires args[0] to
be stored outside of struct child_process, though, which is already done
here, but may be a bit tedious in other cases.

So for start_command(), would it be a generally useful to support a
scenario where upon failure

- the program terminates,
- but before that prints a single message,
- which includes the command that could not be started
- and some kind of hint why we tried to start it?

For run_command() we'd need to distinguish between not being able to
run the command and getting an error code from it after a successful
start.

René
Ævar Arnfjörð Bjarmason Oct. 29, 2022, 2:17 a.m. UTC | #7
On Fri, Oct 28 2022, René Scharfe wrote:

> Am 28.10.22 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Fri, Oct 28 2022, René Scharfe wrote:
>>
>>> Am 27.10.22 um 23:46 schrieb Ævar Arnfjörð Bjarmason:
>>>>
>>>> - I wish C had a nicer syntax for not just declaring but squashing
>>>>   together compile_time bracketed lists (think set operations). But the
>>>>   below "CHILD_PROCESS_INIT_LIST" is a pretty good poor man's version.
>>>>
>>>>   I see gcc/clang nicely give us safety rails for that with
>>>>   "-Woverride-init", for this sort of "opts struct with internal stuff,
>>>>   but also user options" I think it works out very nicely.
>>>>
>>>
>>> That's a nice and simple macro.  I played with a gross variant à la
>>>
>>>   #define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>>>
>>> which would allow e.g.
>>>
>>>   struct child_process cmd = CHILD_PROCESS_INIT_EX(.git_cmd = 1);
>>>
>>> Yours is better,
>>
>> I actually think yours is better, anyway...
>>
>>> but they share the downside of not actually saving any lines of code..
>>
>> To me it's not about saving code, but that it's immediately obvious when
>> reading the code that this set of options can be determined and set at
>> function or scope entry.
>>
>> We tend to otherwise have creep where the decl and option init drifts
>> apart over time, and with complex init's you might stare at it for 30s,
>> before realizing that between the decl and fully init ing it often 50
>> lines later nothing actually changed vis-a-vis the state, we could have
>> just done it earlier.
>
> Hmm, we could do that by collecting the flag setting parts at the top,
> without the need for a new macro.

You mean just to pinky promise to always try to set the flags right
after we declare variables. Yes, we can try to aim for that, but
sometimes you need to declare quite a few of them (e.g. that difftools
caller), so not having distance between the decl & setting can help.
>> I think that's worth it in general, whether it's worth the churn in this
>> case...
>>
>>>> - We have quite a few callers that want "on error, die", so maybe we
>>>>   should have something like that "on_error" sooner than later.
>>>
>>> We could add a die_on_error bit for that, or start_command_or_die() and
>>> run_command_or_die() variants (there I go again, multiplying APIs..).
>>> They could report the failed command, which a caller can't do because
>>> the internal strvec is already cleared once it learns of the failure.
>>
>> *nod*
>
> Wait a second: Does it even make sense to mention the command in a die()
> message after start_command() failed?  Unless .silent_exec_failure is
> set, start_command() already reports it.  E.g. archive-tar.c has:
>
> 	if (start_command(&filter) < 0)
> 		die_errno(_("unable to start '%s' filter"), cmd.buf);
>
> ... and the result looks like this upon failure:
>
>    $ git -c tar.tgz.command=nonsense archive --format=tgz HEAD
>    error: cannot run nonsense: No such file or directory
>    fatal: unable to start 'nonsense' filter: No such file or directory
>
> The second message is mostly redundant, but it mentions that the failed
> command was a filter.  Probably .silent_exec_failure should be set here,
> then the die() message is no longer redundant.  This requires args[0] to
> be stored outside of struct child_process, though, which is already done
> here, but may be a bit tedious in other cases.

Yes, maybe these messages aren't all that useful. But note that that
"error" is emittted by thy "#ifndef ...WINDOWS..." part of the code. See
also the recent 255a6f91ae4 (t1800: correct test to handle Cygwin,
2022-09-15).

But I suspect some of the "die" messaging is just boilerplate/legacy.

> So for start_command(), would it be a generally useful to support a
> scenario where upon failure
>
> - the program terminates,
> - but before that prints a single message,
> - which includes the command that could not be started

Isn't this just the "cannot spawn"/"cannot fork" etc. messaging from
start_command() now?

> - and some kind of hint why we tried to start it?
>
> For run_command() we'd need to distinguish between not being able to
> run the command and getting an error code from it after a successful
> start.

*nod*, it would be nice if that messaging was portable. Part of that
just seems to be that we need to do the same (or share) the whole "set
die message" part we do on *nix.

Some of the reasons a command fails aren't portable, but we should be
able to portably emit e.g. "can't execute this command because it
doesn't exist" etc.
René Scharfe Oct. 29, 2022, 10:05 a.m. UTC | #8
Am 29.10.2022 um 04:17 schrieb Ævar Arnfjörð Bjarmason:
>
> On Fri, Oct 28 2022, René Scharfe wrote:
>
>> Wait a second: Does it even make sense to mention the command in a die()
>> message after start_command() failed?  Unless .silent_exec_failure is
>> set, start_command() already reports it.  E.g. archive-tar.c has:
>>
>> 	if (start_command(&filter) < 0)
>> 		die_errno(_("unable to start '%s' filter"), cmd.buf);
>>
>> ... and the result looks like this upon failure:
>>
>>    $ git -c tar.tgz.command=nonsense archive --format=tgz HEAD
>>    error: cannot run nonsense: No such file or directory
>>    fatal: unable to start 'nonsense' filter: No such file or directory
>>
>> The second message is mostly redundant, but it mentions that the failed
>> command was a filter.  Probably .silent_exec_failure should be set here,
>> then the die() message is no longer redundant.  This requires args[0] to
>> be stored outside of struct child_process, though, which is already done
>> here, but may be a bit tedious in other cases.
>
> Yes, maybe these messages aren't all that useful. But note that that
> "error" is emittted by thy "#ifndef ...WINDOWS..." part of the code. See
> also the recent 255a6f91ae4 (t1800: correct test to handle Cygwin,
> 2022-09-15).

On Windows I get:

  $ git -c tar.tgz.command=nonsense archive --format=tgz HEAD
  error: cannot spawn nonsense: No such file or directory
  fatal: unable to start 'nonsense' filter: No such file or directory

>> So for start_command(), would it be a generally useful to support a
>> scenario where upon failure
>>
>> - the program terminates,
>> - but before that prints a single message,
>> - which includes the command that could not be started
>
> Isn't this just the "cannot spawn"/"cannot fork" etc. messaging from
> start_command() now?

Sure, a subset is easy, but do we need all four points?

>> - and some kind of hint why we tried to start it?

I suspect .silent_exec_failure suffices for most cases.  Testing calls
of commands that usually exist is a bit hard without source code
changes, though.  Then again: Beatifying messages that will almost
never be seen isn't probably worth it anyway.

But I'll send a patch for the archive case above.

René
Taylor Blau Oct. 29, 2022, 7:32 p.m. UTC | #9
On Thu, Oct 27, 2022 at 06:30:36PM +0200, René Scharfe wrote:
> Replace the convenience functions run_command_v_opt() et. al. and use
> struct child_process and run_command() directly instead, for an overall
> code reduction and a simpler and more flexible API that allows creating
> argument lists without magic numbers and reduced risk of memory leaks.
>
> This is a broken-out and polished version of the original scratch at
> https://lore.kernel.org/git/9d924a5d-5c72-fbe6-270c-a8f6c5fc5850@web.de/

Thanks, this round is looking pretty good to me. It sounds like there
were a couple of small comments for which a reroll would be appreciated.

Assuming that that all looks good, I can queue the rerolled version.
Thanks.

Thanks,
Taylor