diff mbox series

[v8,10/37] hook: support passing stdin to hooks

Message ID 20210311021037.3001235-11-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series config-based hooks | expand

Commit Message

Emily Shaffer March 11, 2021, 2:10 a.m. UTC
Some hooks (such as post-rewrite) need to take input via stdin.
Previously, callers provided stdin to hooks by setting
run-command.h:child_process.in, which takes a FD. Callers would open the
file in question themselves before calling run-command(). However, since
we will now need to seek to the front of the file and read it again for
every hook which runs, hook.h:run_command() takes a path and handles FD
management itself. Since this file is opened for read only, it should
not prevent later parallel execution support.

On the frontend, this is supported by asking for a file path, rather
than by reading stdin. Reading directly from stdin would involve caching
the entire stdin (to memory or to disk) and reading it back from the
beginning to each hook. We'd want to support cases like insufficient
memory or storage for the file. While this may prove useful later, for
now the path of least resistance is to just ask the user to make this
interim file themselves.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Documentation/git-hook.txt    | 11 +++++++++--
 builtin/hook.c                |  5 ++++-
 hook.c                        |  8 +++++++-
 hook.h                        |  6 +++++-
 t/t1360-config-based-hooks.sh | 24 ++++++++++++++++++++++++
 5 files changed, 49 insertions(+), 5 deletions(-)

Comments

Ævar Arnfjörð Bjarmason March 12, 2021, 9 a.m. UTC | #1
On Thu, Mar 11 2021, Emily Shaffer wrote:

> On the frontend, this is supported by asking for a file path, rather
> than by reading stdin. Reading directly from stdin would involve caching
> the entire stdin (to memory or to disk) and reading it back from the
> beginning to each hook. We'd want to support cases like insufficient
> memory or storage for the file. While this may prove useful later, for
> now the path of least resistance is to just ask the user to make this
> interim file themselves.

We need to worry about cases where we wouldn't have enough memory to
buffer the stdin, but still need to then do repo operations on the input
from such a file, presumably some giant pre-receive update or something.

Seems unlikely, and the convenience of having stdin just work by just
allocating that seems appealing, but let's read on to the rest of the
series...
Junio C Hamano March 12, 2021, 10:22 a.m. UTC | #2
Emily Shaffer <emilyshaffer@google.com> writes:

> diff --git a/hook.c b/hook.c
> index 118931f273..f906e8c61c 100644
> --- a/hook.c
> +++ b/hook.c
> @@ -240,6 +240,7 @@ void run_hooks_opt_init(struct run_hooks_opt *o)
>  {
>  	strvec_init(&o->env);
>  	strvec_init(&o->args);
> +	o->path_to_stdin = NULL;
>  	o->run_hookdir = configured_hookdir_opt();
>  }

The new member is initialized to NULL, and presumably the user of
the API would point an existing string with it.  Since there is no
free() in opt_clear() introduced by this patch, the member is
obviously a pointer to a borrowed piece of memory.

> diff --git a/hook.h b/hook.h
> index 0df785add5..2314ec5962 100644
> --- a/hook.h
> +++ b/hook.h
> @@ -52,6 +52,9 @@ struct run_hooks_opt
>  	 * to be overridden if the user can override it at the command line.
>  	 */
>  	enum hookdir_opt run_hookdir;
> +
> +	/* Path to file which should be piped to stdin for each hook */
> +	const char *path_to_stdin;
>  };

And we mark the fact that hook subsystem does not own it by making
it "const char *".  Looks quite consistent.  Good.
diff mbox series

Patch

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index 8f96c347ea..96a857c682 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -9,7 +9,8 @@  SYNOPSIS
 --------
 [verse]
 'git hook' list <hook-name>
-'git hook' run [(-e|--env)=<var>...] [(-a|--arg)=<arg>...] <hook-name>
+'git hook' run [(-e|--env)=<var>...] [(-a|--arg)=<arg>...] [--to-stdin=<path>]
+	<hook-name>
 
 DESCRIPTION
 -----------
@@ -98,7 +99,7 @@  in the order they should be run, and print the config scope where the relevant
 `hook.<hook-name>.command` was specified, not the `hookcmd` (if applicable).
 This output is human-readable and the format is subject to change over time.
 
-run [(-e|--env)=<var>...] [(-a|--arg)=<arg>...] `<hook-name>`::
+run [(-e|--env)=<var>...] [(-a|--arg)=<arg>...] [--to-stdin=<path>] `<hook-name>`::
 
 Runs hooks configured for `<hook-name>`, in the same order displayed by `git
 hook list`. Hooks configured this way may be run prepended with `sh -c`, so
@@ -124,6 +125,12 @@  Specify arguments to pass to every hook that is run.
 +
 Specify environment variables to set for every hook that is run.
 
+--to-stdin::
+	Only valid for `run`.
++
+Specify a file which will be streamed into stdin for every hook that is run.
+Each hook will receive the entire file from beginning to EOF.
+
 CONFIGURATION
 -------------
 include::config/hook.txt[]
diff --git a/builtin/hook.c b/builtin/hook.c
index e823a96238..38a4555e05 100644
--- a/builtin/hook.c
+++ b/builtin/hook.c
@@ -8,7 +8,8 @@ 
 
 static const char * const builtin_hook_usage[] = {
 	N_("git hook list <hookname>"),
-	N_("git hook run [(-e|--env)=<var>...] [(-a|--arg)=<arg>...] <hookname>"),
+	N_("git hook run [(-e|--env)=<var>...] [(-a|--arg)=<arg>...]"
+	   "[--to-stdin=<path>] <hookname>"),
 	NULL
 };
 
@@ -101,6 +102,8 @@  static int run(int argc, const char **argv, const char *prefix)
 			   N_("environment variables for hook to use")),
 		OPT_STRVEC('a', "arg", &opt.args, N_("args"),
 			   N_("argument to pass to hook")),
+		OPT_STRING(0, "to-stdin", &opt.path_to_stdin, N_("path"),
+			   N_("file to read into hooks' stdin")),
 		OPT_END(),
 	};
 
diff --git a/hook.c b/hook.c
index 118931f273..f906e8c61c 100644
--- a/hook.c
+++ b/hook.c
@@ -240,6 +240,7 @@  void run_hooks_opt_init(struct run_hooks_opt *o)
 {
 	strvec_init(&o->env);
 	strvec_init(&o->args);
+	o->path_to_stdin = NULL;
 	o->run_hookdir = configured_hookdir_opt();
 }
 
@@ -274,7 +275,12 @@  static void prepare_hook_cp(struct hook *hook, struct run_hooks_opt *options,
 	if (!hook)
 		return;
 
-	cp->no_stdin = 1;
+	/* reopen the file for stdin; run_command closes it. */
+	if (options->path_to_stdin)
+		cp->in = xopen(options->path_to_stdin, O_RDONLY);
+	else
+		cp->no_stdin = 1;
+
 	cp->env = options->env.v;
 	cp->stdout_to_stderr = 1;
 	cp->trace2_hook_name = hook->command.buf;
diff --git a/hook.h b/hook.h
index 0df785add5..2314ec5962 100644
--- a/hook.h
+++ b/hook.h
@@ -52,6 +52,9 @@  struct run_hooks_opt
 	 * to be overridden if the user can override it at the command line.
 	 */
 	enum hookdir_opt run_hookdir;
+
+	/* Path to file which should be piped to stdin for each hook */
+	const char *path_to_stdin;
 };
 
 void run_hooks_opt_init(struct run_hooks_opt *o);
@@ -68,7 +71,8 @@  int hook_exists(const char *hookname, enum hookdir_opt should_run_hookdir);
 
 /*
  * Runs all hooks associated to the 'hookname' event in order. Each hook will be
- * passed 'env' and 'args'.
+ * passed 'env' and 'args'. The file at 'stdin_path' will be closed and reopened
+ * for each hook that runs.
  */
 int run_hooks(const char *hookname, struct run_hooks_opt *options);
 
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
index 1fca83d536..cace5a23c1 100755
--- a/t/t1360-config-based-hooks.sh
+++ b/t/t1360-config-based-hooks.sh
@@ -276,4 +276,28 @@  test_expect_success 'hook.runHookDir is tolerant to unknown values' '
 	test_i18ncmp expected actual
 '
 
+test_expect_success 'stdin to multiple hooks' '
+	git config --add hook.test.command "xargs -P1 -I% echo a%" &&
+	git config --add hook.test.command "xargs -P1 -I% echo b%" &&
+	test_when_finished "test_unconfig hook.test.command" &&
+
+	cat >input <<-EOF &&
+	1
+	2
+	3
+	EOF
+
+	cat >expected <<-EOF &&
+	a1
+	a2
+	a3
+	b1
+	b2
+	b3
+	EOF
+
+	git hook run --to-stdin=input test 2>actual &&
+	test_cmp expected actual
+'
+
 test_done