Message ID | 20190619094630.32557-1-pclouds@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | fetch: only run 'gc' once when fetching multiple remotes |
On Wed, Jun 19, 2019 at 04:46:30PM +0700, Nguyễn Thái Ngọc Duy wrote:

> In multiple remotes mode, git-fetch is launched for n-1 remotes and the
> last remote is handled by the current process. Each of these processes
> will in turn run 'gc' at the end.
>
> This is not really a problem because even if multiple 'gc --auto' is run
> at the same time we still handle it correctly. It does show multiple
> "auto packing in the background" messages though. And we may waste some
> resources when gc actually runs because we still do some stuff before
> checking the lock and moving it to background.
>
> So let's try to avoid that. We should only need one 'gc' run after all
> objects and references are added anyway. Add a new option --no-auto-gc
> that will be used by those n-1 processes. 'gc --auto' will always run on
> the main fetch process (*).

Yeah, that makes sense.

I was surprised that we needed a new command-line option here, but I
guess the sub-fetch processes really have no idea that they're
subservient to a multi-remote fetch (they do get "--append", but of
course somebody could specify that independently).

Another option would be to just pass "-c gc.auto=0" to the child
processes to inhibit auto-gc. But maybe it makes sense to have a nicer
interface (after all, somebody else could be doing the same "let's do a
bunch of fetches in a row" without using the multi-fetch code).

Though there I kind of wonder if this would apply to other scripted
uses, too. E.g., if I'm doing a bunch of commits, I might want to
inhibit auto-gc and then run it myself at the end. Should we support
"GIT_AUTO_GC=0" in the environment (and a matching "git --no-auto-gc
..." option that could be used here)?

>  Documentation/fetch-options.txt |  4 ++++
>  builtin/fetch.c                 | 17 +++++++++++------
>  t/t5514-fetch-multiple.sh       |  7 +++++--
>  3 files changed, 20 insertions(+), 8 deletions(-)

My musings above aside, the patch looks correct to me.

-Peff
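The scripted use case raised above (run several commands with auto-gc inhibited, then run gc once at the end) could be sketched roughly as follows. This is only an illustration, not part of the patch: the function name `fetch_remotes_gc_once` and the variable `rc` are hypothetical, and the sketch relies on the existing `-c gc.auto=0` mechanism rather than on any new option.

```shell
# Hypothetical helper (not from the patch): fetch each named remote with
# auto-gc inhibited, then give 'gc --auto' a single chance to run.
fetch_remotes_gc_once () {
	rc=0
	for remote in "$@"
	do
		# -c gc.auto=0 disables auto-gc for this one invocation only.
		git -c gc.auto=0 fetch "$remote" || rc=$?
	done
	# One gc check after all objects and refs have been added.
	git gc --auto
	return $rc
}
```

It would be called as `fetch_remotes_gc_once one two`, where `one` and `two` are remote names.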
On Thu, Jun 20, 2019 at 1:59 AM Jeff King <peff@peff.net> wrote:

> I was surprised that we needed a new command-line option here, but I
> guess the sub-fetch processes really have no idea that they're
> subservient to a multi-remote fetch (they do get "--append", but of
> course somebody could specify that independently).
>
> Another option would be to just pass "-c gc.auto=0" to the child
> processes to inhibit auto-gc. But maybe it makes sense to have a nicer
> interface (after all, somebody else could be doing the same "let's do a
> bunch of fetches in a row" without using the multi-fetch code).

Nah to me -c is much nicer (and flexible too). The only thing I'm not
sure about is whether a user could override it. If fetch.c adds -c
gc.auto=0 automatically, and the user wants auto gc back, will "git -c
gc.auto=non-zero fetch --multiple" still work?

I haven't checked git_config_push_parameter() carefully, but I have an
impression that the parameter order there is "wrong", at least in this
case.

> Though there I kind of wonder if this would apply to other scripted
> uses, too. E.g., if I'm doing a bunch of commits, I might want to
> inhibit auto-gc and then run it myself at the end. Should we support
> "GIT_AUTO_GC=0" in the environment (and a matching "git --no-auto-gc
> ..." option that could be used here)?

export GIT_CONFIG=gc.auto=0 ?
On Thu, Jun 20, 2019 at 05:11:03PM +0700, Duy Nguyen wrote:

> > Another option would be to just pass "-c gc.auto=0" to the child
> > processes to inhibit auto-gc. But maybe it makes sense to have a nicer
> > interface (after all, somebody else could be doing the same "let's do a
> > bunch of fetches in a row" without using the multi-fetch code).
>
> Nah to me -c is much nicer (and flexible too). The only thing I'm not
> sure about is whether a user could override it. If fetch.c adds -c
> gc.auto=0 automatically, and the user wants auto gc back, will "git -c
> gc.auto=non-zero fetch --multiple" still work?
>
> I haven't checked git_config_push_parameter() carefully, but I have an
> impression that the parameter order there is "wrong", at least in this
> case.

It depends what you're trying to get it to do. I'd expect that command
to turn on auto-gc for just the outer fetch (i.e., overriding any
on-disk disabling of auto-gc), but keep it off for the child fetches.
And I think that is how it would behave: the outer fetch only sees the
config you provided. The inner ones see "gc.auto=non-zero gc.auto=0"
because git_config_push_parameter() appends. And they accept the latter
under last-one-wins rules.

If you expect to be able to re-enable auto-gc in the child fetches, then
I don't think there would be a way to do that.

> > Though there I kind of wonder if this would apply to other scripted
> > uses, too. E.g., if I'm doing a bunch of commits, I might want to
> > inhibit auto-gc and then run it myself at the end. Should we support
> > "GIT_AUTO_GC=0" in the environment (and a matching "git --no-auto-gc
> > ..." option that could be used here)?
>
> export GIT_CONFIG=gc.auto=0 ?

Almost. The parser for GIT_CONFIG_PARAMETERS is very picky about seeing
single quotes around each item, which makes it a little unfriendly to
use manually. Plus there may be existing values if your script was
invoked as "git -c whatever my-script".

It's probably not that big a deal for a script to just use "git -c
gc.auto=0" to inhibit auto-gc for each command that needs it.

-Peff
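To make the last-one-wins point concrete, here is a small simulation of how a list of "key=value" items resolves when later entries override earlier ones. It is a toy model only: `last_one_wins` is a hypothetical name, and the function does not call git or reproduce the real GIT_CONFIG_PARAMETERS parser (including its quoting rules).

```shell
# Toy model, not git's parser: print the value of KEY from a list of
# "key=value" items, letting later items override earlier ones.
last_one_wins () {
	key=$1
	shift
	value=
	for item in "$@"
	do
		case "$item" in
		"$key"=*)
			# Strip the "key=" prefix and remember this value.
			value=${item#"$key"=}
			;;
		esac
	done
	printf '%s\n' "$value"
}
```

Under this model the outer fetch, which sees only the user's setting, resolves `last_one_wins gc.auto gc.auto=100` to 100, while a child that also receives the appended item resolves `last_one_wins gc.auto gc.auto=100 gc.auto=0` to 0 (the value 100 is just a stand-in for "non-zero").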
diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 91c47752ec..592f391298 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -88,6 +88,10 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-gc::
+	Run `git gc --auto` at the end to perform garbage collection
+	if needed. This is enabled by default.
+
 -p::
 --prune::
 	Before fetching, remove any remote-tracking references that no
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 4ba63d5ac6..6a3c507897 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -48,6 +48,7 @@ static int prune_tags = -1; /* unspecified */
 
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity, deepen_relative;
 static int progress = -1;
+static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
 static int max_children = 1;
 static enum transport_family family;
@@ -169,6 +170,8 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
+		 N_("run 'gc --auto' after fetching")),
 	OPT_END()
 };
 
@@ -1424,7 +1427,7 @@ static int fetch_multiple(struct string_list *list)
 			return errcode;
 	}
 
-	argv_array_pushl(&argv, "fetch", "--append", NULL);
+	argv_array_pushl(&argv, "fetch", "--append", "--no-auto-gc", NULL);
 	add_options_to_argv(&argv);
 
 	for (i = 0; i < list->nr; i++) {
@@ -1674,11 +1677,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	close_all_packs(the_repository->objects);
 
-	argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (verbosity < 0)
-		argv_array_push(&argv_gc_auto, "--quiet");
-	run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD);
-	argv_array_clear(&argv_gc_auto);
+	if (enable_auto_gc) {
+		argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);
+		if (verbosity < 0)
+			argv_array_push(&argv_gc_auto, "--quiet");
+		run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD);
+		argv_array_clear(&argv_gc_auto);
+	}
 
 	return result;
 }
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index 0030c92e1a..5426d4b5ab 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -105,9 +105,12 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 git remote rm origin &&
 	 git remote add one ../one &&
 	 git remote add two ../two &&
-	 git fetch --multiple one two &&
+	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
-	 test_cmp ../expect output &&
+	 test_cmp ../expect output &&
+	 grep "built-in: git gc" trace >gc &&
+	 test_line_count = 1 gc
+	)
 '
 
 test_expect_success 'git fetch --multiple (bad remote names)' '
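The check the updated test performs (exactly one "built-in: git gc" line in the GIT_TRACE output) can also be done by hand outside the test suite. A minimal sketch, assuming the trace was captured to a file; `count_gc_runs` is a hypothetical helper name, not something defined in git's test library.

```shell
# Hypothetical helper: count how many times the trace in file $1 records
# git starting its built-in 'gc' command.
count_gc_runs () {
	grep -c "built-in: git gc" "$1"
}
```

For example, after `GIT_TRACE=1 git fetch --multiple one two 2>trace`, running `count_gc_runs trace` should report a single gc invocation with this patch applied. Note that `grep -c` exits non-zero when the count is 0.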
In multiple remotes mode, git-fetch is launched for n-1 remotes and the
last remote is handled by the current process. Each of these processes
will in turn run 'gc' at the end.

This is not really a problem because even if multiple 'gc --auto' is run
at the same time we still handle it correctly. It does show multiple
"auto packing in the background" messages though. And we may waste some
resources when gc actually runs because we still do some stuff before
checking the lock and moving it to background.

So let's try to avoid that. We should only need one 'gc' run after all
objects and references are added anyway. Add a new option --no-auto-gc
that will be used by those n-1 processes. 'gc --auto' will always run on
the main fetch process (*).

(*) even if we fetch remotes in parallel at some point in future, this
    should still be fine because we should "join" all those processes
    before this step.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/fetch-options.txt |  4 ++++
 builtin/fetch.c                 | 17 +++++++++++------
 t/t5514-fetch-multiple.sh       |  7 +++++--
 3 files changed, 20 insertions(+), 8 deletions(-)