diff mbox series

[v2,4/7] for-each-repo: run subcommands on configured repos

Message ID 0314258c5cbb8fd771c35e433bf6be95297c4597.1598380805.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series Maintenance III: background maintenance

Commit Message

Philippe Blain via GitGitGadget Aug. 25, 2020, 6:40 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                          |  1 +
 Documentation/git-for-each-repo.txt | 59 +++++++++++++++++++++++++++++
 Makefile                            |  1 +
 builtin.h                           |  1 +
 builtin/for-each-repo.c             | 58 ++++++++++++++++++++++++++++
 command-list.txt                    |  1 +
 git.c                               |  1 +
 t/t0068-for-each-repo.sh            | 30 +++++++++++++++
 8 files changed, 152 insertions(+)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100755 t/t0068-for-each-repo.sh

Comments

Junio C Hamano Aug. 25, 2020, 10:19 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +SYNOPSIS
> +--------
> +[verse]
> +'git for-each-repo' --config=<config> [--] <arguments>
> + ...
> +--config=<config>::
> +	Use the given config variable as a multi-valued list storing
> +	absolute path names.

Would it make sense to allow this config to be read from the current
repository, I wonder.  It is probably designed to be written to
either ~/.gitconfig or /etc/gitconfig because it is probably a need
that is not per-repository to list repositories for various purposes
specified by the config key, but I suspect there _might_ be a good
use case for storing some custom list of repositories in the
configuration file local to a repository, but it is not quite
obvious what it is.

If we have a good example, we may want to spell it out---that would
help future readers who wonder about this (just like I am doing now).

Also, if we do read from local config, should there be a way to say
"ah, you may have read values from /etc/gitconfig and ~/.gitconfig,
but please forget them---I have a full list I care about when you are
running in this repository", i.e. clear the list.  It is purely a
convention and there is no built-in mechanism for this in the config
API, but often it is signalled by giving an empty string as a value.
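
A minimal standalone sketch of that convention, assuming the values arrive
in the order they were read (system, then global, then local). This is not
Git's config API: `effective_start` is a hypothetical helper and the paths
in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Git: given the values of a
 * multi-valued config variable in the order they were read, return
 * the index of the first value that is still in effect. An empty
 * string acts as a "reset" marker that discards everything read
 * before it.
 */
static size_t effective_start(const char *const *values, size_t nr)
{
	size_t i, start = 0;

	assert(values || !nr);	/* a NULL list is only valid when empty */
	for (i = 0; i < nr; i++)
		if (!*values[i])
			start = i + 1;	/* forget all earlier entries */
	return start;
}
```

Given the list "/srv/a", "/srv/b", "", "/home/me/c", the helper returns 3,
so a caller iterating from that index would visit only "/home/me/c".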

By the way, I do not have a good concrete suggestion, but can we use
something better than <config> as the placeholder?  I first thought
this was naming the name of a file that lists repositories, not the
config variable name in our usual config namespace.

> +static int run_command_on_repo(const char *path,
> +			       void *cbdata)

Is that on repo or in repo?  When I saw "-C" on the command line, I
immediately thought of "in repo".

> +{
> +	int i;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	struct strvec *args = (struct strvec *)cbdata;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "-C", path, NULL);
> +
> +	for (i = 0; i < args->nr; i++)
> +		strvec_push(&child.args, args->v[i]);

Would strvec_pushv() work, or is args->v[] not NULL terminated?

> +	return run_command(&child);
> +}


> +	values = repo_config_get_value_multi(the_repository,
> +					     config_key);

Not your fault, but it is a bit unsatisfactory that we do not have
special "type" meant for paths in the config API, unlike the
parse-options API where there is a "filename" type that is a bit
richer than a vanilla "string" type by allowing "prefix" handling.
For the purposes of this, as the values are limited to absolute/full
pathnames, it does not hurt as much, though.

Thanks.

Derrick Stolee Aug. 26, 2020, 4:03 p.m. UTC | #2
On 8/25/2020 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git for-each-repo' --config=<config> [--] <arguments>
>> + ...
>> +--config=<config>::
>> +	Use the given config variable as a multi-valued list storing
>> +	absolute path names.
> 
> Would it make sense to allow this config to be read from the current
> repository, I wonder.  It is probably designed to be written to
> either ~/.gitconfig or /etc/gitconfig because it is probably a need
> that is not per-repository to list repositories for various purposes
> specified by the config key, but I suspect there _might_ be a good
> use case for storing some custom list of repositories in the
> configuration file local to a repository, but it is not quite
> obvious what it is.
> 
> If we have a good example, we may want to spell it out---that would
> help future readers who wonder about this (just like I am doing now).
> 
> Also, if we do read from local config, should there be a way to say
> "ah, you may have read values from /etc/gitconfig and ~/.gitconfig,
> but please forget them---I have a full list I care about when you are
> running in this repository", i.e. clear the list.  It is purely a
> convention and there is no built-in mechanism for this in the config
> API, but often it is signalled by giving an empty string as a value.

I guess I should test this, but if I ask for a multi-valued config,
will I not get _all_ of the results from /etc/gitconfig, ~/.gitconfig,
AND .git/config? That was my expectation, which is why I don't specify
"local" or "global" config anywhere in the discussion.

> By the way, I do not have a good concrete suggestion, but can we use
> something better than <config> as the placeholder?  I first thought
> this was naming the name of a file that lists repositories, not the
> config variable name in our usual config namespace.

Sure. How about "<key>"?

>> +static int run_command_on_repo(const char *path,
>> +			       void *cbdata)
> 
> Is that on repo or in repo?  When I saw "-C" on the command line, I
> immediately thought of "in repo".

"in" is better.

>> +{
>> +	int i;
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +	struct strvec *args = (struct strvec *)cbdata;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_pushl(&child.args, "-C", path, NULL);
>> +
>> +	for (i = 0; i < args->nr; i++)
>> +		strvec_push(&child.args, args->v[i]);
> 
> Would strvec_pushv() work, or is args->v[] not NULL terminated?

Yeah, pushv should work.
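
For reference, the equivalence can be sketched outside of Git with a small
stand-in type. The `argvec` names below are hypothetical; they only mimic
the property of strvec that matters here, namely that its array is always
NULL-terminated, so a "pushv"-style append needs no element count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Git's strvec: a growable, always NULL-terminated array. */
struct argvec {
	char **v;
	size_t nr, alloc;
};

/* Append a copy of one string, keeping room for the terminator. */
static void argvec_push(struct argvec *a, const char *s)
{
	if (a->nr + 2 > a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 8;
		a->v = realloc(a->v, a->alloc * sizeof(*a->v));
		assert(a->v);
	}
	a->v[a->nr] = malloc(strlen(s) + 1);
	assert(a->v[a->nr]);
	strcpy(a->v[a->nr++], s);
	a->v[a->nr] = NULL;	/* keep the terminator in place */
}

/* Append every element of a NULL-terminated array; no count needed. */
static void argvec_pushv(struct argvec *a, const char *const *src)
{
	for (; *src; src++)
		argvec_push(a, *src);
}
```

With this shape, the explicit `for (i = 0; i < args->nr; i++)` loop in the
patch collapses to a single call that walks the source array until it hits
the terminating NULL.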

>> +	return run_command(&child);
>> +}
> 
> 
>> +	values = repo_config_get_value_multi(the_repository,
>> +					     config_key);
> 
> Not your fault, but it is a bit unsatisfactory that we do not have
> special "type" meant for paths in the config API, unlike the
> parse-options API where there is a "filename" type that is a bit
> richer than a vanilla "string" type by allowing "prefix" handling.
> For the purposes of this, as the values are limited to absolute/full
> pathnames, it does not hurt as much, though.

Interesting. Noted.

Thanks,
-Stolee

Patch

diff --git a/.gitignore b/.gitignore
index a5808fa30d..5eb2a2be71 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,6 +67,7 @@ 
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
+/git-for-each-repo
 /git-format-patch
 /git-fsck
 /git-fsck-objects
diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt
new file mode 100644
index 0000000000..94bd19da26
--- /dev/null
+++ b/Documentation/git-for-each-repo.txt
@@ -0,0 +1,59 @@ 
+git-for-each-repo(1)
+====================
+
+NAME
+----
+git-for-each-repo - Run a Git command on a list of repositories
+
+
+SYNOPSIS
+--------
+[verse]
+'git for-each-repo' --config=<config> [--] <arguments>
+
+
+DESCRIPTION
+-----------
+Run a Git command on a list of repositories. The arguments after the
+known options or `--` indicator are used as the arguments for the Git
+subprocess.
+
+THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
+
+For example, we could run maintenance on each of a list of repositories
+stored in a `maintenance.repo` config variable using
+
+-------------
+git for-each-repo --config=maintenance.repo maintenance run
+-------------
+
+This will run `git -C <repo> maintenance run` for each value `<repo>`
+in the multi-valued config variable `maintenance.repo`.
+
+
+OPTIONS
+-------
+--config=<config>::
+	Use the given config variable as a multi-valued list storing
+	absolute path names. Iterate on that list of paths to run
+	the given arguments.
++
+These config values are loaded from system, global, and local Git config,
+as available. If `git for-each-repo` is run in a directory that is not a
+Git repository, then only the system and global config is used.
+
+
+SUBPROCESS BEHAVIOR
+-------------------
+
+If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
+then the `git for-each-repo` process returns that exit code without running
+more subprocesses.
+
+Each `git -C <repo> <arguments>` subprocess inherits the standard file
+descriptors `stdin`, `stdout`, and `stderr`.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 65f8cfb236..7c588ff036 100644
--- a/Makefile
+++ b/Makefile
@@ -1071,6 +1071,7 @@  BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
+BUILTIN_OBJS += builtin/for-each-repo.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
diff --git a/builtin.h b/builtin.h
index 17c1c0ce49..ff7c6e5aa9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -150,6 +150,7 @@  int cmd_fetch(int argc, const char **argv, const char *prefix);
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
 int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
 int cmd_format_patch(int argc, const char **argv, const char *prefix);
 int cmd_fsck(int argc, const char **argv, const char *prefix);
 int cmd_gc(int argc, const char **argv, const char *prefix);
diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
new file mode 100644
index 0000000000..5bba623ff1
--- /dev/null
+++ b/builtin/for-each-repo.c
@@ -0,0 +1,58 @@ 
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);
+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && values && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
diff --git a/command-list.txt b/command-list.txt
index 0e3204e7d1..581499be82 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -94,6 +94,7 @@  git-fetch-pack                          synchingrepositories
 git-filter-branch                       ancillarymanipulators
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
+git-for-each-repo                       plumbinginterrogators
 git-format-patch                        mainporcelain
 git-fsck                                ancillaryinterrogators          complete
 git-gc                                  mainporcelain
diff --git a/git.c b/git.c
index 24f250d29a..1cab64b5d1 100644
--- a/git.c
+++ b/git.c
@@ -511,6 +511,7 @@  static struct cmd_struct commands[] = {
 	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
 	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
 	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
+	{ "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY },
 	{ "format-patch", cmd_format_patch, RUN_SETUP },
 	{ "fsck", cmd_fsck, RUN_SETUP },
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
new file mode 100755
index 0000000000..136b4ec839
--- /dev/null
+++ b/t/t0068-for-each-repo.sh
@@ -0,0 +1,30 @@ 
+#!/bin/sh
+
+test_description='git for-each-repo builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'run based on configured value' '
+	git init one &&
+	git init two &&
+	git init three &&
+	git -C two commit --allow-empty -m "DID NOT RUN" &&
+	git config run.key "$TRASH_DIRECTORY/one" &&
+	git config --add run.key "$TRASH_DIRECTORY/three" &&
+	git for-each-repo --config=run.key commit --allow-empty -m "ran" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep ran message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep again message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep again message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep again message
+'
+
+test_done