Message ID | pull.1838.git.1734946566885.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | maintenance: add prune-remote-refs task | expand |
Thanks for a patch. "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes: You'd want to check your procedure to tell GGG about addresses; I am seeing these From: "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> To: git@vger.kernel.org Cc: "mailto:gitster@pobox.com" <[gitster@pobox.com]>, "mailto:ps@pks.im" <[ps@pks.im]>, Shubham Kanodia <shubham.kanodia10@gmail.com>, Shubham Kanodia <shubham.kanodia10@gmail.com> and Cc addresses in it would probably not work as-is (I've fixed them up manually). > From: Shubham Kanodia <shubham.kanodia10@gmail.com> > > Remote-tracking refs can accumulate in local repositories even as branches > are deleted on remotes, impacting git performance negatively. Existing > alternatives to keep refs pruned have a few issues: > > 1. The `fetch.prune` config automatically cleans up remote ref on fetch, > but also pulls in new ref from remote which is an undesirable side-effect. This makes it sound as if fetch.prune configuration makes new refs pulled, but that is not what happens and that is not what you wanted to hint. If you run "git fetch" with the "--prune" option (or with the fetch.prune configuration set to true) while having the default refspec "+refs/heads/*:refs/remotes/$name/*" configured in remote.$name.fetch, then ... > diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > index 6e6651309d3..0c8f1e01ccd 100644 > --- a/Documentation/git-maintenance.txt > +++ b/Documentation/git-maintenance.txt > @@ -158,6 +158,26 @@ pack-refs:: > need to iterate across many references. See linkgit:git-pack-refs[1] > for more information. > > +prune-remote-refs:: > + The `prune-remote-refs` task runs `git remote prune` on each remote > + repository registered in the local repository. This task helps clean > + up deleted remote branches, improving the performance of operations > + that iterate through the refs. See linkgit:git-remote[1] for more > + information. This task is disabled by default. > ++ > +NOTE: This task is opt-in to prevent unexpected removal of remote refs > +for users of git-maintenance. For most users, configuring `fetch.prune=true` > +is a acceptable solution, as it will automatically clean up stale remote-tracking > +branches during normal fetch operations. However, this task can be useful in > +specific scenarios: > ++ > +-- > +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) > + where `fetch.prune` would not affect refs outside the fetched hierarchy The word "hierarchy" hints that things under refs/remotes/origin/ (which is the hierarchy 'foo' is fetched into) that went away would be pruned, but that is not what happens (otherwise you would not be adding this feature). > +* When third-party tools might perform unexpected full fetches, and you want > + periodic cleanup independently of fetch operations You'd want a full-stop after these two sentences, by the way. > diff --git a/builtin/gc.c b/builtin/gc.c > index 4ae5196aedf..9acf1d29895 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -20,6 +20,7 @@ > #include "lockfile.h" > #include "parse-options.h" > #include "run-command.h" > +#include "remote.h" > #include "sigchain.h" > #include "strvec.h" > #include "commit.h" > @@ -913,6 +914,40 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg, > return 0; > } > > +static int collect_remote(struct remote *remote, void *cb_data) > +{ > + struct string_list *list = cb_data; > + > + if (!remote->url.nr) > + return 0; > + > + string_list_append(list, remote->name); > + return 0; > +} > + > +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED, > + struct gc_config *cfg UNUSED) > +{ > + struct string_list_item *item; > + struct string_list remotes_list = STRING_LIST_INIT_NODUP; > + struct child_process child = CHILD_PROCESS_INIT; > + int result = 0; > + > + for_each_remote(collect_remote, &remotes_list); > + > + for_each_string_list_item (item, &remotes_list) { > + const char *remote_name = item->string; > + child.git_cmd = 1; > + strvec_pushl(&child.args, "remote", "prune", remote_name, NULL); > + > + if (run_command(&child)) > + result = error(_("failed to prune '%s'"), remote_name); > + } Hmph, is there a reason why you need two loops, instead of for-each-remote calling a function that does the run_command() thing? "git grep for_each_string_list_item \*.c" tells me that we almost never write SP between the macro name and the opening parenthesis. This loop does not stop at the first error, but returns a non-zero error after noticing even a single remote fail to run prune, which sounds like a seneible design. Would an error percolate up the same way when two different tasks run and one of them fails in the control folow in "git maintenance"? Just want to see if we are being consistent with the surrounding code. Thanks.
On Fri, Dec 27, 2024 at 2:37 PM Junio C Hamano <gitster@pobox.com> wrote: > > Thanks for a patch. > > > "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes: > > You'd want to check your procedure to tell GGG about addresses; I am > seeing these > > From: "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> > To: git@vger.kernel.org > Cc: "mailto:gitster@pobox.com" <[gitster@pobox.com]>, > "mailto:ps@pks.im" <[ps@pks.im]>, > Shubham Kanodia <shubham.kanodia10@gmail.com>, > Shubham Kanodia <shubham.kanodia10@gmail.com> > > and Cc addresses in it would probably not work as-is (I've fixed > them up manually). I think the GGG comment had a few formatting errors. Thanks for fixing the cc. > Hmph, is there a reason why you need two loops, instead of > for-each-remote calling a function that does the run_command() > thing? It can be collapsed into one. > This loop does not stop at the first error, but returns a non-zero > error after noticing even a single remote fail to run prune, which > sounds like a seneible design. Would an error percolate up the same > way when two different tasks run and one of them fails in the > control folow in "git maintenance"? Just want to see if we are > being consistent with the surrounding code. Fair point. I'll make the process flow identical to the prefetch refs task that works similarly across remotes. It returns as soon as the first remote fails (without necessarily affecting other tasks).
Shubham Kanodia <shubham.kanodia10@gmail.com> writes: >> Hmph, is there a reason why you need two loops, instead of >> for-each-remote calling a function that does the run_command() >> thing? > > It can be collapsed into one. Sorry, but that is not an answer, as my question was not a suggestion to change anything. It was a question asking you if there was a specific reason why the code was structured the way it was written. If there is another way to write it, you need to answer why the alternative wasn't picked. >> This loop does not stop at the first error, but returns a non-zero >> error after noticing even a single remote fail to run prune, which >> sounds like a seneible design. Would an error percolate up the same >> way when two different tasks run and one of them fails in the >> control folow in "git maintenance"? Just want to see if we are >> being consistent with the surrounding code. > > Fair point. I'll make the process flow identical to the prefetch refs > task that works similarly across remotes. > It returns as soon as the first remote fails (without necessarily > affecting other tasks). ... and the first failure signals the caller a failure? That would match what you did in your new feature, which is perfect. Thanks.
On Sat, Dec 28, 2024 at 9:35 PM Junio C Hamano <gitster@pobox.com> wrote: > > Shubham Kanodia <shubham.kanodia10@gmail.com> writes: > > >> Hmph, is there a reason why you need two loops, instead of > >> for-each-remote calling a function that does the run_command() > >> thing? > > > > It can be collapsed into one. > > Sorry, but that is not an answer, as my question was not a > suggestion to change anything. > > It was a question asking you if there was a specific reason why the > code was structured the way it was written. If there is another way > to write it, you need to answer why the alternative wasn't picked. There wasn't a good reason for doing it that way. I guess I was trying to understand the second argument for `for_each_remote` would be best used if the command was called directly (while avoiding a compilation warning), but looking at a few other usages of `for_each_remote` I realised that it could just be marked unused in this case (since we aren't doing anything with it). I should've probably looked deeper and learnt from existing patterns (e.g. `maintenance_task_prefetch`) — which I have in my last patch. > >> This loop does not stop at the first error, but returns a non-zero > >> error after noticing even a single remote fail to run prune, which > >> sounds like a seneible design. Would an error percolate up the same > >> way when two different tasks run and one of them fails in the > >> control folow in "git maintenance"? Just want to see if we are > >> being consistent with the surrounding code. > > > > Fair point. I'll make the process flow identical to the prefetch refs > > task that works similarly across remotes. > > It returns as soon as the first remote fails (without necessarily > > affecting other tasks). > > ... and the first failure signals the caller a failure? That would > match what you did in your new feature, which is perfect. Exactly — the first failing remote will signal that the `prune-remote-refs` task has failed via an immediate `return 1`. The maintenance command uses this to register the exit code of the top level command to 1, while continuing to execute all other tasks anyway. Thanks, Shubham K
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt index 6e6651309d3..0c8f1e01ccd 100644 --- a/Documentation/git-maintenance.txt +++ b/Documentation/git-maintenance.txt @@ -158,6 +158,26 @@ pack-refs:: need to iterate across many references. See linkgit:git-pack-refs[1] for more information. +prune-remote-refs:: + The `prune-remote-refs` task runs `git remote prune` on each remote + repository registered in the local repository. This task helps clean + up deleted remote branches, improving the performance of operations + that iterate through the refs. See linkgit:git-remote[1] for more + information. This task is disabled by default. ++ +NOTE: This task is opt-in to prevent unexpected removal of remote refs +for users of git-maintenance. For most users, configuring `fetch.prune=true` +is a acceptable solution, as it will automatically clean up stale remote-tracking +branches during normal fetch operations. However, this task can be useful in +specific scenarios: ++ +-- +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) + where `fetch.prune` would not affect refs outside the fetched hierarchy +* When third-party tools might perform unexpected full fetches, and you want + periodic cleanup independently of fetch operations +-- + OPTIONS ------- --auto:: diff --git a/builtin/gc.c b/builtin/gc.c index 4ae5196aedf..9acf1d29895 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -20,6 +20,7 @@ #include "lockfile.h" #include "parse-options.h" #include "run-command.h" +#include "remote.h" #include "sigchain.h" #include "strvec.h" #include "commit.h" @@ -913,6 +914,40 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg, return 0; } +static int collect_remote(struct remote *remote, void *cb_data) +{ + struct string_list *list = cb_data; + + if (!remote->url.nr) + return 0; + + string_list_append(list, remote->name); + return 0; +} + +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED, + struct gc_config *cfg UNUSED) +{ + struct string_list_item *item; + struct string_list remotes_list = STRING_LIST_INIT_NODUP; + struct child_process child = CHILD_PROCESS_INIT; + int result = 0; + + for_each_remote(collect_remote, &remotes_list); + + for_each_string_list_item (item, &remotes_list) { + const char *remote_name = item->string; + child.git_cmd = 1; + strvec_pushl(&child.args, "remote", "prune", remote_name, NULL); + + if (run_command(&child)) + result = error(_("failed to prune '%s'"), remote_name); + } + + string_list_clear(&remotes_list, 0); + return result; +} + /* Remember to update object flag allocation in object.h */ #define SEEN (1u<<0) @@ -1375,6 +1410,7 @@ enum maintenance_task_label { TASK_GC, TASK_COMMIT_GRAPH, TASK_PACK_REFS, + TASK_PRUNE_REMOTE_REFS, /* Leave as final value */ TASK__COUNT @@ -1411,6 +1447,10 @@ static struct maintenance_task tasks[] = { maintenance_task_pack_refs, pack_refs_condition, }, + [TASK_PRUNE_REMOTE_REFS] = { + "prune-remote-refs", + maintenance_task_prune_remote, + }, }; static int compare_tasks_by_selection(const void *a_, const void *b_) @@ -1505,6 +1545,8 @@ static void initialize_maintenance_strategy(void) tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY; tasks[TASK_PACK_REFS].enabled = 1; tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY; + tasks[TASK_PRUNE_REMOTE_REFS].enabled = 0; + tasks[TASK_PRUNE_REMOTE_REFS].schedule = SCHEDULE_DAILY; } } diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 0ce4ba1cbef..60a0c3f8353 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -446,6 +446,50 @@ test_expect_success 'pack-refs task' ' test_subcommand git pack-refs --all --prune <pack-refs.txt ' +test_expect_success 'prune-remote-refs task not enabled by default' ' + git clone . prune-test && + ( + cd prune-test && + GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run 2>err && + test_subcommand ! git remote prune origin <prune.txt + ) +' + +test_expect_success 'prune-remote-refs task cleans stale remote refs' ' + test_commit initial && + + # Create two separate remote repos + git clone . remote1 && + git clone . remote2 && + + git clone . prune-test-clean && + ( + cd prune-test-clean && + git config maintenance.prune-remote-refs.enabled true && + + # Add both remotes + git remote add remote1 "../remote1" && + git remote add remote2 "../remote2" && + + # Create and push branches to both remotes + git branch -f side2 HEAD && + git push remote1 side2 && + git push remote2 side2 && + + # Rename branches in each remote to simulate a stale branch + git -C ../remote1 branch -m side2 side3 && + git -C ../remote2 branch -m side2 side4 && + + GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs && + + # Verify pruning happened for both remotes + test_subcommand git remote prune remote1 <prune.txt && + test_subcommand git remote prune remote2 <prune.txt && + test_must_fail git rev-parse refs/remotes/remote1/side2 && + test_must_fail git rev-parse refs/remotes/remote2/side2 + ) +' + test_expect_success '--auto and --schedule incompatible' ' test_must_fail git maintenance run --auto --schedule=daily 2>err && test_grep "at most one" err