Message ID | 20240910203835.2288291-2-bence@ferdinandy.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | set remote/HEAD with fetch | expand |
Bence Ferdinandy <bence@ferdinandy.com> writes: > When cloning a repository refs/remotes/origin/HEAD is set automatically. > In contrast, when using init, remote add and fetch to set a remote, one > needs to call remote set-head --auto to achieve the same result. In other words, "clone" is roughly equivalent to "init && remote add && fetch && remote set-head". I do not see a problem with it. You'd do such a repository creation task only once per repository anyway. > Add a --set-head option to git fetch to automatically set heads on > remotes. This practically would help only those who want to blindly follow what the remote considers their primary branch. If the remote flip-flops that every day betweein 'master' and 'main', you'd want to either "fetch && remote set-head" or "fetch --set-head", but it would be weird to flip-flop the HEAD every day to begin with. And the feature does not help those who want to inspect and then switch. When they say "--set-head", they will unconditionally switch whatever the remote uses and there is no opportunity for them to check. Ideally, because every "git fetch" you do will automatically learn what their HEAD of the day points at, even without "--set-head", it may be nice to let the user know when their HEAD changed, so that the user can inspect the situation and decide. If "fetch $repo", when it notices that refs/remotes/$repo/HEAD is missing, always unconditionally stores where their HEAD points at in refs/remotes/$repo/HEAD, and did nothing else, wouldn't that be sufficient? The users have "remote set-head" to do this when needed. What is the true motivation that "fetch" (which presumably happens a lot more often) needs to be involved in this process? The *only* upside I can see with "fetch --set-head" to blindly follow every switch of HEAD on the remote end is that you can keep up with the remote that flips its HEAD very often, but is that really a realistic need? If we reject such a use case as invalid, I suspect that the end-user experience would be simplified quite a lot. Imagine that we teach "git fetch $repo" to notice refs/remotes/$repo/HEAD is missing, and create it from the remote HEAD information automatically. And we do NOTHING ELSE. What would the end-user experience look like? * Usually, you start with "git clone" and 'origin' will know which branch 'origin/HEAD' points at. * You may run "git remote add -f $repo $URL" to add one. Because this runs "git fetch $repo", the previous addition to the "git fetch" will make sure refs/remotes/$repo/HEAD would be there. * You may run "git remote add $repo $URL" to add one, and then separately "git fetch $repo" yourself. The end result would be the same; refs/remotes/$repo/HEAD will be there. Having said all that, the implementation here ... > +static int run_set_head(const char *name) > +{ > + struct child_process cmd = CHILD_PROCESS_INIT; > + strvec_push(&cmd.args, "remote"); > + strvec_push(&cmd.args, "set-head"); > + strvec_push(&cmd.args, "--auto"); > + strvec_push(&cmd.args, name); > + cmd.git_cmd = 1; > + return run_command(&cmd); > +} ... does look quite straight-forward. > +static int fetch_multiple(struct string_list *list, int max_children, int set_head, > + const struct fetch_config *config) > { > int i, result = 0; > struct strvec argv = STRVEC_INIT; > @@ -2014,6 +2025,8 @@ static int fetch_multiple(struct string_list *list, int max_children, > error(_("could not fetch %s"), name); > result = 1; > } > + if (set_head && run_set_head(name)) > + result = 1; This looked a bit questionable---I expected that we'd run the subfetches with "--set-head" option, but this makes us do the set-head step ourselves after the subfetches are done. What are pros-and-cons considered to reach the decision to do it this way? Thanks.
Hey Junio, thanks for the very speedy feedback! On Wed Sep 11, 2024 at 00:19, Junio C Hamano <gitster@pobox.com> wrote: [snip] > > Ideally, because every "git fetch" you do will automatically learn > what their HEAD of the day points at, even without "--set-head", it > may be nice to let the user know when their HEAD changed, so that > the user can inspect the situation and decide. That would actually make sense, we could print a message saying HEAD has changed and I guess helpfully print the exact set-head command they would need to manually update should they wish to do so. > > If "fetch $repo", when it notices that refs/remotes/$repo/HEAD is > missing, always unconditionally stores where their HEAD points at > in refs/remotes/$repo/HEAD, and did nothing else, wouldn't that be > sufficient? > > The users have "remote set-head" to do this when needed. What is > the true motivation that "fetch" (which presumably happens a lot > more often) needs to be involved in this process? The *only* upside > I can see with "fetch --set-head" to blindly follow every switch of > HEAD on the remote end is that you can keep up with the remote that > flips its HEAD very often, but is that really a realistic need? If > we reject such a use case as invalid, I suspect that the end-user > experience would be simplified quite a lot. Imagine that we teach > "git fetch $repo" to notice refs/remotes/$repo/HEAD is missing, and > create it from the remote HEAD information automatically. And we do > NOTHING ELSE. What would the end-user experience look like? > > * Usually, you start with "git clone" and 'origin' will know which > branch 'origin/HEAD' points at. > > * You may run "git remote add -f $repo $URL" to add one. Because > this runs "git fetch $repo", the previous addition to the "git > fetch" will make sure refs/remotes/$repo/HEAD would be there. > > * You may run "git remote add $repo $URL" to add one, and then > separately "git fetch $repo" yourself. The end result would be > the same; refs/remotes/$repo/HEAD will be there. I'm convinced :) It also does drop a lot of complexity. So there will be no extra flag for fetch, rather: - if the remote HEAD does not exist, we create it - if it does exist, but has changed we print a message saying it was changed from X to Y and print the required command to update to this - no configuration needed The only place I can imagine this not being sufficient is in a CI/CD process that is fetching into a cached repo needs to be updated, but then those people can just always run remote set-head -a. > > Having said all that, the implementation here ... > > > +static int run_set_head(const char *name) > > +{ > > + struct child_process cmd = CHILD_PROCESS_INIT; > > + strvec_push(&cmd.args, "remote"); > > + strvec_push(&cmd.args, "set-head"); > > + strvec_push(&cmd.args, "--auto"); > > + strvec_push(&cmd.args, name); > > + cmd.git_cmd = 1; > > + return run_command(&cmd); > > +} > > ... does look quite straight-forward. Unfortunately, as Jeff has pointed out in the other thread, this implementation requires the user to authenticate twice ... Now, we could still rely on set-head to actually do the writing, since if you explicitly give it what to set to it will not do a network query, but considering all of the above, I think it would make more sense not to do this, rather just bring in the required logic here. This would also allow for the second patch to be a bit more explicit on what did or did not happen during a remote set-head. > > > +static int fetch_multiple(struct string_list *list, int max_children, int set_head, > > + const struct fetch_config *config) > > { > > int i, result = 0; > > struct strvec argv = STRVEC_INIT; > > @@ -2014,6 +2025,8 @@ static int fetch_multiple(struct string_list *list, int max_children, > > error(_("could not fetch %s"), name); > > result = 1; > > } > > + if (set_head && run_set_head(name)) > > + result = 1; > > This looked a bit questionable---I expected that we'd run the > subfetches with "--set-head" option, but this makes us do the > set-head step ourselves after the subfetches are done. What are > pros-and-cons considered to reach the decision to do it this way? Just noobish oversight, I figured out how to do it in the subfetch, but with the above there will be no need for this anyway I guess. I'll send a v2 soonish. Best, Bence
"Bence Ferdinandy" <bence@ferdinandy.com> writes: >> Ideally, because every "git fetch" you do will automatically learn >> what their HEAD of the day points at, even without "--set-head", it >> may be nice to let the user know when their HEAD changed, so that >> the user can inspect the situation and decide. > > That would actually make sense, we could print a message saying HEAD has > changed and I guess helpfully print the exact set-head command they would need > to manually update should they wish to do so. We need to be careful if we were to do this, though. A user may see that their HEAD points at their 'main' branch, and they know the remote-tracking HEAD points at 'next' branch. But the only thing the latter tells us is that 'next' is, as far as this user is concerned, the primary branch they are interested in from this remote. It may have been set earlier with "git remtoe set-head" explicitly, or it may have been set back when "git clone" created this repository and back then the remote had 'next' marked as its primary branch. In other words, the HEAD we learned from remote while we are fetching from it may or may not be the same from the remote-tracking HEAD we have, but it does not mean that the remote recently flipped its HEAD if these two are different, but what we want to report is "their HEAD used to point at 'master' but now it is pointing at 'main'". > So there will be no > extra flag for fetch, rather: > > - if the remote HEAD does not exist, we create it > > - if it does exist, but has changed we print a message saying it was changed from X to Y and print the required command to update to this > > - no configuration needed Very good, with two reservations. - if the remote HEAD does not exist, we may still not want to create it. Imagine [remote "his"] url = https://git.kernel.org/pub/scm/git/git push = refs/heads/maint push = refs/heads/master push = refs/heads/next push = +refs/heads/seen without any refspec for the fetching side. "git fetch his master" may learn where the remote HEAD is, and it may even be pointing at their 'master' branch, but because we do not maintain any remote tracking information for their 'master' (in other words, refs/remotes/his/master is not updated by this 'fetch' and there is no configuration to make future 'fetch' to do so). - it would be very nice to report when we notice that they changed their HEAD, but I do not think we have enough information to do so. Our existing refs/remotes/origin/HEAD may not have been set to match their HEAD in the first place. Thanks.
diff --git a/builtin/fetch.c b/builtin/fetch.c index b2b5aee5bf..6392314c6a 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1961,8 +1961,19 @@ static int fetch_finished(int result, struct strbuf *out, return 0; } -static int fetch_multiple(struct string_list *list, int max_children, - const struct fetch_config *config) +static int run_set_head(const char *name) +{ + struct child_process cmd = CHILD_PROCESS_INIT; + strvec_push(&cmd.args, "remote"); + strvec_push(&cmd.args, "set-head"); + strvec_push(&cmd.args, "--auto"); + strvec_push(&cmd.args, name); + cmd.git_cmd = 1; + return run_command(&cmd); +} + +static int fetch_multiple(struct string_list *list, int max_children, int set_head, + const struct fetch_config *config) { int i, result = 0; struct strvec argv = STRVEC_INIT; @@ -2014,6 +2025,8 @@ static int fetch_multiple(struct string_list *list, int max_children, error(_("could not fetch %s"), name); result = 1; } + if (set_head && run_set_head(name)) + result = 1; } strvec_clear(&argv); @@ -2062,7 +2075,7 @@ static inline void fetch_one_setup_partial(struct remote *remote) } static int fetch_one(struct remote *remote, int argc, const char **argv, - int prune_tags_ok, int use_stdin_refspecs, + int prune_tags_ok, int set_head, int use_stdin_refspecs, const struct fetch_config *config) { struct refspec rs = REFSPEC_INIT_FETCH; @@ -2135,9 +2148,12 @@ static int fetch_one(struct remote *remote, int argc, const char **argv, refspec_clear(&rs); transport_disconnect(gtransport); gtransport = NULL; + if (set_head && run_set_head(remote -> name)) + exit_code = 1; return exit_code; } + int cmd_fetch(int argc, const char **argv, const char *prefix) { struct fetch_config config = { @@ -2154,6 +2170,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) struct string_list list = STRING_LIST_INIT_DUP; struct remote *remote = NULL; int all = -1, multiple = 0; + int set_head = 0; int result = 0; int prune_tags_ok = 1; int enable_auto_gc = 1; @@ -2171,6 +2188,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) OPT__VERBOSITY(&verbosity), OPT_BOOL(0, "all", &all, N_("fetch from all remotes")), + OPT_BOOL(0, "set-head", &set_head, + N_("auto set remote HEAD")), OPT_BOOL(0, "set-upstream", &set_upstream, N_("set upstream for git pull/fetch")), OPT_BOOL('a', "append", &append, @@ -2436,7 +2455,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) trace2_region_leave("fetch", "setup-partial", the_repository); } trace2_region_enter("fetch", "fetch-one", the_repository); - result = fetch_one(remote, argc, argv, prune_tags_ok, stdin_refspecs, + result = fetch_one(remote, argc, argv, prune_tags_ok, set_head, stdin_refspecs, &config); trace2_region_leave("fetch", "fetch-one", the_repository); } else { @@ -2459,7 +2478,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) /* TODO should this also die if we have a previous partial-clone? */ trace2_region_enter("fetch", "fetch-multiple", the_repository); - result = fetch_multiple(&list, max_children, &config); + result = fetch_multiple(&list, max_children, set_head, &config); trace2_region_leave("fetch", "fetch-multiple", the_repository); } diff --git a/builtin/remote.c b/builtin/remote.c index 0acc547d69..35c54dd103 100644 --- a/builtin/remote.c +++ b/builtin/remote.c @@ -1536,9 +1536,12 @@ static int get_remote_default(const char *key, const char *value UNUSED, static int update(int argc, const char **argv, const char *prefix) { int i, prune = -1; + int set_head = 0; struct option options[] = { OPT_BOOL('p', "prune", &prune, N_("prune remotes after fetching")), + OPT_BOOL(0, "set-head", &set_head, + N_("auto set remote HEAD")), OPT_END() }; struct child_process cmd = CHILD_PROCESS_INIT; @@ -1552,6 +1555,8 @@ static int update(int argc, const char **argv, const char *prefix) if (prune != -1) strvec_push(&cmd.args, prune ? "--prune" : "--no-prune"); + if (set_head) + strvec_push(&cmd.args, "--set-head"); if (verbose) strvec_push(&cmd.args, "-v"); strvec_push(&cmd.args, "--multiple");
When cloning a repository refs/remotes/origin/HEAD is set automatically. In contrast, when using init, remote add and fetch to set a remote, one needs to call remote set-head --auto to achieve the same result. Add a --set-head option to git fetch to automatically set heads on remotes. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> --- builtin/fetch.c | 29 ++++++++++++++++++++++++----- builtin/remote.c | 5 +++++ 2 files changed, 29 insertions(+), 5 deletions(-)