Message ID | 20230206195503.3113048-1-calvinwan@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Documentation: clarify multiple pushurls vs urls | expand |
On Mon, Feb 06 2023, Calvin Wan wrote: > While it is possible to define multiple `url` fields in a remote to > push to multiple remotes at once, it is preferable to achieve this by > defining multiple `pushurl` fields. The idea with "url" and "pushurl" surely is to disambiguate whether you want the url for both push & fetch, or just push. I don't see why it's a given that it's preferrable to use "pushurl" over "url" yet, if your setup is e.g. 3 repos you push to, but it won't matter what you pull from why not use "url"? As opposed to pushing "pushurl" to push to read-only mirrors you yourself are updating? But let's read on... > Defining multiple `url` fields can cause confusion for users since > running `git config remote.<remote>.url` returns the last defined url > which doesn't align with the url `git fetch <remote>` uses (the first). I'm certainly confused, I had no idea it worked this way, I'd have thought it was last-set-wins like most things. From a glance fb0cc87ec0f (Allow programs to not depend on remotes having urls, 2009-11-18) mentions it as a known factor, but with: diff --git a/transport.c b/transport.c index 77a61a9d7bb..06159c4184e 100644 --- a/transport.c +++ b/transport.c @@ -1115,7 +1115,7 @@ struct transport *transport_get(struct remote *remote, const char *url) helper = remote->foreign_vcs; if (!url && remote->url) - url = remote->url[0]; + url = remote->url[remote->url_nr - 1]; ret->url = url; /* maybe it is a foreign URL? */ All tests pass for me, and it's selecting the last URL now. I can't find any other mention of these semantics in the docs (but maybe I didn't look in the right places). So is this just some accident, does anyone rely on it, and would we be better off just "fixing" this, rather than steering people away from "url"? > Add documentation to clarify how fetch interacts with multiple urls > and the recommended method to push to multiple remotes. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > Documentation/urls-remotes.txt | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/Documentation/urls-remotes.txt b/Documentation/urls-remotes.txt > index 86d0008f94..61aaded645 100644 > --- a/Documentation/urls-remotes.txt > +++ b/Documentation/urls-remotes.txt > @@ -33,7 +33,10 @@ config file would appear like this: > ------------ > > The `<pushurl>` is used for pushes only. It is optional and defaults > -to `<URL>`. > +to `<URL>`. Additional pushurls can be defined to push to multiple > +remotes. While multiple URLs can be defined to achieve the same > +outcome, this is not recommended since fetch only uses the first > +defined URL. Maybe it's just me, but I feel more confused reading this docs than before :) Surely if there's confusion about the priority of the *.url config variable we should be documenting that explicitly where we discuss "url" itself (e.g. in Documentation/config/remote.txt). Just mentioning it in passing as we document "pushUrl" feels like the wrong place. But I still don't quite see the premise. "git push" has a feature to push to all N urls, whether that's Url or pushUrl. When I configure it to have multiple URLs it pushes to the first configured one first, if the source of the confusion was that it didn't prefer the last configured one first, shouldn't it be doing them in reverse order? I don't think that would make sense, but I also don't see how recommending "pushurl" over "url" un-confuses things. So why is it confusing that "fetch" would use the same order, but due to the semantics of a "fetch" we'd stop after the first one?
> > Defining multiple `url` fields can cause confusion for users since > > running `git config remote.<remote>.url` returns the last defined url > > which doesn't align with the url `git fetch <remote>` uses (the first). > > I'm certainly confused, I had no idea it worked this way, I'd have thought it was last-set-wins like most things. > > From a glance fb0cc87ec0f (Allow programs to not depend on remotes > having urls, 2009-11-18) mentions it as a known factor, but with: > > diff --git a/transport.c b/transport.c > index 77a61a9d7bb..06159c4184e 100644 > --- a/transport.c > +++ b/transport.c > @@ -1115,7 +1115,7 @@ struct transport *transport_get(struct remote *remote, const char *url) > helper = remote->foreign_vcs; > > if (!url && remote->url) > - url = remote->url[0]; > + url = remote->url[remote->url_nr - 1]; > ret->url = url; > > /* maybe it is a foreign URL? */ > > All tests pass for me, and it's selecting the last URL now. I can't find > any other mention of these semantics in the docs (but maybe I didn't > look in the right places). > > So is this just some accident, does anyone rely on it, and would we be > better off just "fixing" this, rather than steering people away from > "url"? I should've mentioned running `git remote -v` on a config with multiple urls shows the correct fetch url, so functionally everything is working as intended -- just needs a doc update somewhere. > Surely if there's confusion about the priority of the *.url config > variable we should be documenting that explicitly where we discuss "url" > itself (e.g. in Documentation/config/remote.txt). Just mentioning it in > passing as we document "pushUrl" feels like the wrong place. > > But I still don't quite see the premise. "git push" has a feature to > push to all N urls, whether that's Url or pushUrl. > > When I configure it to have multiple URLs it pushes to the first > configured one first, if the source of the confusion was that it didn't > prefer the last configured one first, shouldn't it be doing them in > reverse order? > > I don't think that would make sense, but I also don't see how > recommending "pushurl" over "url" un-confuses things. > > So why is it confusing that "fetch" would use the same order, but due to > the semantics of a "fetch" we'd stop after the first one? I agree with you now that updating the documentation in Documentation/config/remote.txt is the ideal way to go about this, but I'll mention what my original thought process was: If a user wants one url to push/fetch to, then he defines 'url' If a user wants to push to multiple urls, then he can either define multiple urls or pushurls (one of the pushurls can be the same as the url). But if a user has say url #2 and #3 defined, they act as pushurls anyways, so defining them as such removes any speculation as to what else they could do (and also clears up the confusion when running `git config remote.<remote>.url`).
On Mon, Feb 06 2023, Calvin Wan wrote: >> > Defining multiple `url` fields can cause confusion for users since >> > running `git config remote.<remote>.url` returns the last defined url >> > which doesn't align with the url `git fetch <remote>` uses (the first). >> >> I'm certainly confused, I had no idea it worked this way, I'd have thought it was last-set-wins like most things. >> >> From a glance fb0cc87ec0f (Allow programs to not depend on remotes >> having urls, 2009-11-18) mentions it as a known factor, but with: >> >> diff --git a/transport.c b/transport.c >> index 77a61a9d7bb..06159c4184e 100644 >> --- a/transport.c >> +++ b/transport.c >> @@ -1115,7 +1115,7 @@ struct transport *transport_get(struct remote *remote, const char *url) >> helper = remote->foreign_vcs; >> >> if (!url && remote->url) >> - url = remote->url[0]; >> + url = remote->url[remote->url_nr - 1]; >> ret->url = url; >> >> /* maybe it is a foreign URL? */ >> >> All tests pass for me, and it's selecting the last URL now. I can't find >> any other mention of these semantics in the docs (but maybe I didn't >> look in the right places). >> >> So is this just some accident, does anyone rely on it, and would we be >> better off just "fixing" this, rather than steering people away from >> "url"? > > I should've mentioned running `git remote -v` on a config with multiple urls > shows the correct fetch url, so functionally everything is working as > intended -- just needs a doc update somewhere. Ah, yes, it seems to prefer the first configured one, whether that's what anyone intended (and should we use last configured?) is another matter. But in any case, figuring that out and having a test in-tree that fails if you pick the first or last of the list (depending on what we go for) would be most welcome... >> Surely if there's confusion about the priority of the *.url config >> variable we should be documenting that explicitly where we discuss "url" >> itself (e.g. in Documentation/config/remote.txt). Just mentioning it in >> passing as we document "pushUrl" feels like the wrong place. >> >> But I still don't quite see the premise. "git push" has a feature to >> push to all N urls, whether that's Url or pushUrl. >> >> When I configure it to have multiple URLs it pushes to the first >> configured one first, if the source of the confusion was that it didn't >> prefer the last configured one first, shouldn't it be doing them in >> reverse order? >> >> I don't think that would make sense, but I also don't see how >> recommending "pushurl" over "url" un-confuses things. >> >> So why is it confusing that "fetch" would use the same order, but due to >> the semantics of a "fetch" we'd stop after the first one? > > I agree with you now that updating the documentation in > Documentation/config/remote.txt is the ideal way to go about this, but Aside: I actually think near where you made the change in Documentation/urls-remotes.txt is probably better, in remote.txt we just point to git-fetch.txt or git-pull.txt etc., which in turn include that. It just seems we'd need a short blurb about how URLs are selected, we prefer the first one, and for fetch it's always "stop at the first", and for push "push to all, from first to last". I may have gotten that wrong, but that's my current understanding from looking at it briefly. > I'll mention what my original thought process was: > If a user wants one url to push/fetch to, then he defines 'url' > If a user wants to push to multiple urls, then he can either define > multiple urls or pushurls (one of the pushurls can be the same as the url). > But if a user has say url #2 and #3 defined, they act as pushurls anyways, > so defining them as such removes any speculation as to what else they > could do (and also clears up the confusion when running > `git config remote.<remote>.url`). I'm coming away from this with the impression that we should almost never recommend "pushUrl", not that it's worthwhile to use it to solve this ambiguity. Trying this out, given a config like: [remote "avar"] url = git@github.com:avar/git.git url = git@github.com:avar/git1.git url = git@github.com:avar/git2.git I'll get: $ ./git remote -v|grep avar avar git@github.com:avar/git.git (fetch) avar git@github.com:avar/git.git (push) avar git@github.com:avar/git1.git (push) avar git@github.com:avar/git2.git (push) So, the semantics of "url" is that the first one is always the fetch, there's no such thing as multiple fetch URLs, and the push URLs are the list of all "url"'s. But now let's change that to: [remote "avar"] url = git@github.com:avar/git.git url = git@github.com:avar/git1.git pushUrl = git@github.com:avar/git2.git Which gives us: $ ./git remote -v|grep avar avar git@github.com:avar/git.git (fetch) avar git@github.com:avar/git2.git (push) So, by defining a "pushUrl" all subsequent URLs in "url" have been shadowed. Maybe that was considered at the time, but it wouldn't surprise me if that's a blindspot in the original "pushUrl" patch (203462347fc (Allow push and fetch urls to be different, 2009-06-09)), i.e. shouldn't we at least warn on "push" now or somewhere else about the useless second "url"? So, back to "pushurl". It seems to me that the only real use-case for it is the case where the URL really must by different in the case of "fetch" and "push", which I daresay is increasingly rare these days (but used to be more common when dumb http read + ssh write or whatever was more common). Whereas the more common case is just wanting to fetch/push from/to github.com, and also wanting to mirror to gitlab.com or whathever. In that case using "pushUrl" requires you to duplicate the "fetch" url, whereas just having multiple "url" sections without "pushurl" does the same thing without the duplication.
Calvin Wan <calvinwan@google.com> writes: > If a user wants one url to push/fetch to, then he defines 'url' > If a user wants to push to multiple urls, then he can either define meaning "the user wants to push to multiple, but never wants to fetch from the other ones"? If so, then ... > multiple urls or pushurls (one of the pushurls can be the same as the url). ... using multiple urls is not desirable as it becomes hard to tell which one should be used for fetching. When you define additional places you want to publish to, you do not expect to see that new definition affect where you fetch from. In a sense, using the first one to fetch, not the last one, gives us less damage, for that reason. Adding more URLs would not change where you fetch from. But still, I think what you gave is a good rule, regardless of the answer to the "which among multiple non-push urls do we fetch" question. If we were designing this part of the system from scratch, we probably - would have forbidden multiple URLs, or - would allowed multiple URLs, but used the 'last one wins', or - would allowed multiple URLs, but and used later ones as fallback, only to be used when earlier ones fail. It certainly is not a designed behaviour what we do when we have multiple URLs, but if we haven't changed how we react to such a configuration in the past (and I do not think we ever did), changing the behaviour in any way this late in the game likely breaks real users' configuration. Thanks.
diff --git a/Documentation/urls-remotes.txt b/Documentation/urls-remotes.txt index 86d0008f94..61aaded645 100644 --- a/Documentation/urls-remotes.txt +++ b/Documentation/urls-remotes.txt @@ -33,7 +33,10 @@ config file would appear like this: ------------ The `<pushurl>` is used for pushes only. It is optional and defaults -to `<URL>`. +to `<URL>`. Additional pushurls can be defined to push to multiple +remotes. While multiple URLs can be defined to achieve the same +outcome, this is not recommended since fetch only uses the first +defined URL. Named file in `$GIT_DIR/remotes` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While it is possible to define multiple `url` fields in a remote to push to multiple remotes at once, it is preferable to achieve this by defining multiple `pushurl` fields. Defining multiple `url` fields can cause confusion for users since running `git config remote.<remote>.url` returns the last defined url which doesn't align with the url `git fetch <remote>` uses (the first). Add documentation to clarify how fetch interacts with multiple urls and the recommended method to push to multiple remotes. Signed-off-by: Calvin Wan <calvinwan@google.com> --- Documentation/urls-remotes.txt | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)