Message ID | pull.1838.v2.git.1735380461980.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] maintenance: add prune-remote-refs task | expand |
"Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Shubham Kanodia <shubham.kanodia10@gmail.com> > > Remote-tracking refs can accumulate in local repositories even as branches > are deleted on remotes, impacting git performance negatively. Existing > alternatives to keep refs pruned have a few issues — > > 1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will > prune stale refs, but requires a manual operation and also pulls in new > refs from remote which can be an undesirable side-effect. It is only true if you cloned without any tweaks. For example, if you cloned with the single-branch option, you would not pull in new refs, wouldn't you? Also "requires a manual operation" is not quite a good rationale, as you could have placed such a "git fetch" instead of "git remote prune", in the maintenance schedule. For this to become an issue, the condition we saw in earlier discussion, i.e. while having the *default* refspec "+refs/heads/*:refs/remotes/$name/*" configured in remote.$name.fetch is crucial. Since that is the default refspec "git clone" gives you, your "git fetch --prune" would give you full set of refs while pruning, and the end result is that you have up-to-date set of remote-tracking branches (which you may not want). > 2.`git remote prune` cleans up refs without adding to the existing list > but requires periodic user intervention. You have a SP after "1." but not after "2.". > This adds a new maintenance task 'prune-remote-refs' that runs > 'git remote prune' for each configured remote daily. This provides an > automated way to clean up stale remote-tracking refs — especially when > users may not do a full fetch. "This adds" -> "Add". I'd strike the latter sentence. Regardless of what users do or do not do, the automated clean-up is performed. > This task is disabled by default. > > Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com> > --- > +NOTE: This task is opt-in to prevent unexpected removal of remote refs > +for users of git-maintenance. For most users, configuring `fetch.prune=true` > +is a acceptable solution, as it will automatically clean up stale remote-tracking "a acceptable" -> "an acceptable". > +branches during normal fetch operations. However, this task can be useful in > +specific scenarios: > ++ > +-- > +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) > + where `fetch.prune` would only affect refs that are explicitly fetched. > +* When third-party tools might perform unexpected full fetches, and you want > + periodic cleanup independently of fetch operations. > +-- Very well written. > @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg, > return 0; > } > > +static int prune_remote(struct remote *remote, void *cb_data UNUSED) > +{ > + struct child_process child = CHILD_PROCESS_INIT; > + > + if (!remote->url.nr) > + return 0; > + > + child.git_cmd = 1; > + strvec_pushl(&child.args, "remote", "prune", remote->name, NULL); > + > + return !!run_command(&child); > +} > + > +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts, > + struct gc_config *cfg UNUSED) > +{ > + if (for_each_remote(prune_remote, opts)) { > + error(_("failed to prune remotes")); > + return 1; > + } > + > + return 0; > +} Nice reuse of the program structure, which is very clean and easy to read. Overall very well written. Will queue, with attached range-diff. Thanks. --- >8 --- 1: 0ae9668b5c ! 1: 8a40f8b319 maintenance: add prune-remote-refs task @@ Commit message Remote-tracking refs can accumulate in local repositories even as branches are deleted on remotes, impacting git performance negatively. Existing - alternatives to keep refs pruned have a few issues — + alternatives to keep refs pruned have a few issues: - 1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will - prune stale refs, but requires a manual operation and also pulls in new - refs from remote which can be an undesirable side-effect. + 1. Running `git fetch` with either `--prune` or `fetch.prune=true` + set, with the default refspec to copy all their branches into + our remote-tracking branches, will prune stale refs, but also + pulls in new branches from remote. That is undesirable if the + user wants to only work with a selected few remote branches. - 2.`git remote prune` cleans up refs without adding to the existing list - but requires periodic user intervention. + 2. `git remote prune` cleans up refs without adding to the + existing list but requires periodic user intervention. - This adds a new maintenance task 'prune-remote-refs' that runs - 'git remote prune' for each configured remote daily. This provides an - automated way to clean up stale remote-tracking refs — especially when - users may not do a full fetch. - - This task is disabled by default. + Add a new maintenance task 'prune-remote-refs' that runs 'git remote + prune' for each configured remote daily. Leave the task disabled by + default, as it may be unexpected to see their remote-tracking + branches to disappear while they are not watching for unsuspecting + users. Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> @@ Documentation/git-maintenance.txt: pack-refs:: ++ +NOTE: This task is opt-in to prevent unexpected removal of remote refs +for users of git-maintenance. For most users, configuring `fetch.prune=true` -+is a acceptable solution, as it will automatically clean up stale remote-tracking ++is an acceptable solution, as it will automatically clean up stale remote-tracking +branches during normal fetch operations. However, this task can be useful in +specific scenarios: ++
On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote: > diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > index 6e6651309d3..8b3e496c8ef 100644 > --- a/Documentation/git-maintenance.txt > +++ b/Documentation/git-maintenance.txt > @@ -158,6 +158,26 @@ pack-refs:: > need to iterate across many references. See linkgit:git-pack-refs[1] > for more information. > > +prune-remote-refs:: > + The `prune-remote-refs` task runs `git remote prune` on each remote > + repository registered in the local repository. This task helps clean > + up deleted remote branches, improving the performance of operations > + that iterate through the refs. See linkgit:git-remote[1] for more > + information. This task is disabled by default. > ++ > +NOTE: This task is opt-in to prevent unexpected removal of remote refs > +for users of git-maintenance. For most users, configuring `fetch.prune=true` Do we want to make this linkgit:git-maintenance[1] even though this is self-referential? > +is a acceptable solution, as it will automatically clean up stale remote-tracking > +branches during normal fetch operations. However, this task can be useful in > +specific scenarios: > ++ > +-- > +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) > + where `fetch.prune` would only affect refs that are explicitly fetched. > +* When third-party tools might perform unexpected full fetches, and you want > + periodic cleanup independently of fetch operations. > +-- Nicely explained. I wish we had more such documentation that is taking the user by their hand and explains why they may or may not want to have a specific thing. > diff --git a/builtin/gc.c b/builtin/gc.c > index 4ae5196aedf..329c764f300 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -20,6 +20,7 @@ > #include "lockfile.h" > #include "parse-options.h" > #include "run-command.h" > +#include "remote.h" > #include "sigchain.h" > #include "strvec.h" > #include "commit.h" > @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg, > return 0; > } > > +static int prune_remote(struct remote *remote, void *cb_data UNUSED) > +{ > + struct child_process child = CHILD_PROCESS_INIT; > + > + if (!remote->url.nr) > + return 0; > + > + child.git_cmd = 1; > + strvec_pushl(&child.args, "remote", "prune", remote->name, NULL); > + > + return !!run_command(&child); > +} > + > +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts, > + struct gc_config *cfg UNUSED) > +{ > + if (for_each_remote(prune_remote, opts)) { > + error(_("failed to prune remotes")); > + return 1; I wonder whether we should adapt the loop to be eager. Erroring out on the first failed remote would potentially mean that none of the other remotes may get pruned. So if you had a now-unreachable remote as first remote then none of your remotes would be pruned. If so, we may want to collect the names of failed remotes and print them, as well. Patrick
Patrick Steinhardt <ps@pks.im> writes: > On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote: >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt >> index 6e6651309d3..8b3e496c8ef 100644 >> --- a/Documentation/git-maintenance.txt >> +++ b/Documentation/git-maintenance.txt >> @@ -158,6 +158,26 @@ pack-refs:: >> need to iterate across many references. See linkgit:git-pack-refs[1] >> for more information. >> >> +prune-remote-refs:: >> + The `prune-remote-refs` task runs `git remote prune` on each remote >> + repository registered in the local repository. This task helps clean >> + up deleted remote branches, improving the performance of operations >> + that iterate through the refs. See linkgit:git-remote[1] for more >> + information. This task is disabled by default. >> ++ >> +NOTE: This task is opt-in to prevent unexpected removal of remote refs >> +for users of git-maintenance. For most users, configuring `fetch.prune=true` > > Do we want to make this linkgit:git-maintenance[1] even though this is > self-referential? That certainly is a thought---the rule could be "whenever we refer to a Git command, we refer to it in a uniform way". An alternative would be "of git-maintenance" -> "of this command" to weaken it. This refers to those users who want to use the command for other reasons (you use the scheduled tasks driven by 'git maintenance' only because you wanted the 'gc' and 'pack-refs' tasks to run, you do not necessarily want to run a new kind of task the new version of Git started supporting, especially when the task is destructive, like this one). We might want to stress that point, perhaps? If a reader reads this part of the documentation, finds this task useful and decides to use 'git maintenance', the note would sound somewhat nonsensical to them---"I thought about the ramifications, I decided I wanted to use the command, why would it be opt-in?" is a plausible confusion. >> +is a acceptable solution, as it will automatically clean up stale remote-tracking >> +branches during normal fetch operations. However, this task can be useful in >> +specific scenarios: >> ++ >> +-- >> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) >> + where `fetch.prune` would only affect refs that are explicitly fetched. >> +* When third-party tools might perform unexpected full fetches, and you want >> + periodic cleanup independently of fetch operations. >> +-- > > Nicely explained. I wish we had more such documentation that is taking > the user by their hand and explains why they may or may not want to have > a specific thing. Yes, a configuration or an option that are not for everybody and for every situation need such a guidance, and this one is done nicely. >> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts, >> + struct gc_config *cfg UNUSED) >> +{ >> + if (for_each_remote(prune_remote, opts)) { >> + error(_("failed to prune remotes")); >> + return 1; > > I wonder whether we should adapt the loop to be eager. Erroring out on > the first failed remote would potentially mean that none of the other > remotes may get pruned. So if you had a now-unreachable remote as first > remote then none of your remotes would be pruned. I think the structure, hence the behaviour, is shared with an existing prefetch task. I think the current way is OK-ish, but given that we are not in a hurry, we may want to correct the semantics for both of them before unleashing this new task to the world. For that, we need the callback functions given to for_each_remote (i.e., fetch_remote and prune_remote) to always return "success" in the sense to tell "I am done with this remote" to allow the loop to continue to the next remote, and convey the failure from the subcommand via some other means (like flipping a bit in the cbdata). Thanks.
On Mon, Dec 30, 2024 at 7:35 PM Junio C Hamano <gitster@pobox.com> wrote: > > Patrick Steinhardt <ps@pks.im> writes: > > > On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote: > >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > >> index 6e6651309d3..8b3e496c8ef 100644 > >> --- a/Documentation/git-maintenance.txt > >> +++ b/Documentation/git-maintenance.txt > >> @@ -158,6 +158,26 @@ pack-refs:: > >> need to iterate across many references. See linkgit:git-pack-refs[1] > >> for more information. > >> > >> +prune-remote-refs:: > >> + The `prune-remote-refs` task runs `git remote prune` on each remote > >> + repository registered in the local repository. This task helps clean > >> + up deleted remote branches, improving the performance of operations > >> + that iterate through the refs. See linkgit:git-remote[1] for more > >> + information. This task is disabled by default. > >> ++ > >> +NOTE: This task is opt-in to prevent unexpected removal of remote refs > >> +for users of git-maintenance. For most users, configuring `fetch.prune=true` > > > > Do we want to make this linkgit:git-maintenance[1] even though this is > > self-referential? > > That certainly is a thought---the rule could be "whenever we refer > to a Git command, we refer to it in a uniform way". An alternative > would be "of git-maintenance" -> "of this command" to weaken it. > > This refers to those users who want to use the command for other > reasons (you use the scheduled tasks driven by 'git maintenance' > only because you wanted the 'gc' and 'pack-refs' tasks to run, you > do not necessarily want to run a new kind of task the new version of > Git started supporting, especially when the task is destructive, > like this one). We might want to stress that point, perhaps? If a > reader reads this part of the documentation, finds this task useful > and decides to use 'git maintenance', the note would sound somewhat > nonsensical to them---"I thought about the ramifications, I decided > I wanted to use the command, why would it be opt-in?" is a plausible > confusion. > > >> +is a acceptable solution, as it will automatically clean up stale remote-tracking > >> +branches during normal fetch operations. However, this task can be useful in > >> +specific scenarios: > >> ++ > >> +-- > >> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) > >> + where `fetch.prune` would only affect refs that are explicitly fetched. > >> +* When third-party tools might perform unexpected full fetches, and you want > >> + periodic cleanup independently of fetch operations. > >> +-- > > > > Nicely explained. I wish we had more such documentation that is taking > > the user by their hand and explains why they may or may not want to have > > a specific thing. > > Yes, a configuration or an option that are not for everybody and for > every situation need such a guidance, and this one is done nicely. > > >> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts, > >> + struct gc_config *cfg UNUSED) > >> +{ > >> + if (for_each_remote(prune_remote, opts)) { > >> + error(_("failed to prune remotes")); > >> + return 1; > > > > I wonder whether we should adapt the loop to be eager. Erroring out on > > the first failed remote would potentially mean that none of the other > > remotes may get pruned. So if you had a now-unreachable remote as first > > remote then none of your remotes would be pruned. > > I think the structure, hence the behaviour, is shared with an > existing prefetch task. I think the current way is OK-ish, but > given that we are not in a hurry, we may want to correct the > semantics for both of them before unleashing this new task to the > world. > > For that, we need the callback functions given to for_each_remote > (i.e., fetch_remote and prune_remote) to always return "success" in > the sense to tell "I am done with this remote" to allow the loop to > continue to the next remote, and convey the failure from the > subcommand via some other means (like flipping a bit in the cbdata). > > Thanks. Curious — I submitted my patches through GGG, but Junio was kind enough to apply a few other fixes to it. Is there a place I can now get the whole diff (with the range diff patched in) so I can pull that into GGG? Thanks, Shubham
On Fri, Jan 03, 2025 at 12:20:07PM +0530, Shubham Kanodia wrote: > Curious — I submitted my patches through GGG, but Junio was kind > enough to apply a few other fixes to it. > Is there a place I can now get the whole diff (with the range diff > patched in) so I can pull that into GGG? Junio publishes his branches at [1], and yours specifically is called "sk/maintenance-remote-prune". I'd typically just update my local branch to match what he has in there. Patrick [1]: https://github.com/gitster/git.git
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt index 6e6651309d3..8b3e496c8ef 100644 --- a/Documentation/git-maintenance.txt +++ b/Documentation/git-maintenance.txt @@ -158,6 +158,26 @@ pack-refs:: need to iterate across many references. See linkgit:git-pack-refs[1] for more information. +prune-remote-refs:: + The `prune-remote-refs` task runs `git remote prune` on each remote + repository registered in the local repository. This task helps clean + up deleted remote branches, improving the performance of operations + that iterate through the refs. See linkgit:git-remote[1] for more + information. This task is disabled by default. ++ +NOTE: This task is opt-in to prevent unexpected removal of remote refs +for users of git-maintenance. For most users, configuring `fetch.prune=true` +is a acceptable solution, as it will automatically clean up stale remote-tracking +branches during normal fetch operations. However, this task can be useful in +specific scenarios: ++ +-- +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`) + where `fetch.prune` would only affect refs that are explicitly fetched. +* When third-party tools might perform unexpected full fetches, and you want + periodic cleanup independently of fetch operations. +-- + OPTIONS ------- --auto:: diff --git a/builtin/gc.c b/builtin/gc.c index 4ae5196aedf..329c764f300 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -20,6 +20,7 @@ #include "lockfile.h" #include "parse-options.h" #include "run-command.h" +#include "remote.h" #include "sigchain.h" #include "strvec.h" #include "commit.h" @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg, return 0; } +static int prune_remote(struct remote *remote, void *cb_data UNUSED) +{ + struct child_process child = CHILD_PROCESS_INIT; + + if (!remote->url.nr) + return 0; + + child.git_cmd = 1; + strvec_pushl(&child.args, "remote", "prune", remote->name, NULL); + + return !!run_command(&child); +} + +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts, + struct gc_config *cfg UNUSED) +{ + if (for_each_remote(prune_remote, opts)) { + error(_("failed to prune remotes")); + return 1; + } + + return 0; +} + /* Remember to update object flag allocation in object.h */ #define SEEN (1u<<0) @@ -1375,6 +1400,7 @@ enum maintenance_task_label { TASK_GC, TASK_COMMIT_GRAPH, TASK_PACK_REFS, + TASK_PRUNE_REMOTE_REFS, /* Leave as final value */ TASK__COUNT @@ -1411,6 +1437,10 @@ static struct maintenance_task tasks[] = { maintenance_task_pack_refs, pack_refs_condition, }, + [TASK_PRUNE_REMOTE_REFS] = { + "prune-remote-refs", + maintenance_task_prune_remote, + }, }; static int compare_tasks_by_selection(const void *a_, const void *b_) @@ -1505,6 +1535,8 @@ static void initialize_maintenance_strategy(void) tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY; tasks[TASK_PACK_REFS].enabled = 1; tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY; + tasks[TASK_PRUNE_REMOTE_REFS].enabled = 0; + tasks[TASK_PRUNE_REMOTE_REFS].schedule = SCHEDULE_DAILY; } } diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 0ce4ba1cbef..60a0c3f8353 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -446,6 +446,50 @@ test_expect_success 'pack-refs task' ' test_subcommand git pack-refs --all --prune <pack-refs.txt ' +test_expect_success 'prune-remote-refs task not enabled by default' ' + git clone . prune-test && + ( + cd prune-test && + GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run 2>err && + test_subcommand ! git remote prune origin <prune.txt + ) +' + +test_expect_success 'prune-remote-refs task cleans stale remote refs' ' + test_commit initial && + + # Create two separate remote repos + git clone . remote1 && + git clone . remote2 && + + git clone . prune-test-clean && + ( + cd prune-test-clean && + git config maintenance.prune-remote-refs.enabled true && + + # Add both remotes + git remote add remote1 "../remote1" && + git remote add remote2 "../remote2" && + + # Create and push branches to both remotes + git branch -f side2 HEAD && + git push remote1 side2 && + git push remote2 side2 && + + # Rename branches in each remote to simulate a stale branch + git -C ../remote1 branch -m side2 side3 && + git -C ../remote2 branch -m side2 side4 && + + GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs && + + # Verify pruning happened for both remotes + test_subcommand git remote prune remote1 <prune.txt && + test_subcommand git remote prune remote2 <prune.txt && + test_must_fail git rev-parse refs/remotes/remote1/side2 && + test_must_fail git rev-parse refs/remotes/remote2/side2 + ) +' + test_expect_success '--auto and --schedule incompatible' ' test_must_fail git maintenance run --auto --schedule=daily 2>err && test_grep "at most one" err