Documentation: clarify multiple pushurls vs urls

Message ID	20230206195503.3113048-1-calvinwan@google.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@vger.kernel.org> Date: Mon, 6 Feb 2023 19:55:03 +0000 Mime-Version: 1.0 Message-ID: <20230206195503.3113048-1-calvinwan@google.com> Subject: [PATCH] Documentation: clarify multiple pushurls vs urls From: Calvin Wan <calvinwan@google.com> To: git@vger.kernel.org Cc: Calvin Wan <calvinwan@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	Documentation: clarify multiple pushurls vs urls \| expand Documentation: clarify multiple pushurls vs urls

Message ID

20230206195503.3113048-1-calvinwan@google.com (mailing list archive)

State

Superseded

Headers

Date: Mon,  6 Feb 2023 19:55:03 +0000
Mime-Version: 1.0
Message-ID: <20230206195503.3113048-1-calvinwan@google.com>
Subject: [PATCH] Documentation: clarify multiple pushurls vs urls
From: Calvin Wan <calvinwan@google.com>
To: git@vger.kernel.org
Cc: Calvin Wan <calvinwan@google.com>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

Series

Documentation: clarify multiple pushurls vs urls | expand

Commit Message

Calvin Wan Feb. 6, 2023, 7:55 p.m. UTC

While it is possible to define multiple `url` fields in a remote to
push to multiple remotes at once, it is preferable to achieve this by
defining multiple `pushurl` fields.

Defining multiple `url` fields can cause confusion for users since
running `git config remote.<remote>.url` returns the last defined url
which doesn't align with the url `git fetch <remote>` uses (the first).

Add documentation to clarify how fetch interacts with multiple urls
and the recommended method to push to multiple remotes.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 Documentation/urls-remotes.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Feb. 6, 2023, 8:11 p.m. UTC | #1

On Mon, Feb 06 2023, Calvin Wan wrote:

> While it is possible to define multiple `url` fields in a remote to
> push to multiple remotes at once, it is preferable to achieve this by
> defining multiple `pushurl` fields.

The idea with "url" and "pushurl" surely is to disambiguate whether you
want the url for both push & fetch, or just push.

I don't see why it's a given that it's preferrable to use "pushurl" over
"url" yet, if your setup is e.g. 3 repos you push to, but it won't
matter what you pull from why not use "url"? As opposed to pushing
"pushurl" to push to read-only mirrors you yourself are updating?

But let's read on...

> Defining multiple `url` fields can cause confusion for users since
> running `git config remote.<remote>.url` returns the last defined url
> which doesn't align with the url `git fetch <remote>` uses (the first).

I'm certainly confused, I had no idea it worked this way, I'd have thought it was last-set-wins like most things.

From a glance fb0cc87ec0f (Allow programs to not depend on remotes
having urls, 2009-11-18) mentions it as a known factor, but with:

	diff --git a/transport.c b/transport.c
	index 77a61a9d7bb..06159c4184e 100644
	--- a/transport.c
	+++ b/transport.c
	@@ -1115,7 +1115,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
	 	helper = remote->foreign_vcs;

	 	if (!url && remote->url)
	-		url = remote->url[0];
	+		url = remote->url[remote->url_nr - 1];
	 	ret->url = url;

	 	/* maybe it is a foreign URL? */

All tests pass for me, and it's selecting the last URL now. I can't find
any other mention of these semantics in the docs (but maybe I didn't
look in the right places).

So is this just some accident, does anyone rely on it, and would we be
better off just "fixing" this, rather than steering people away from
"url"?

> Add documentation to clarify how fetch interacts with multiple urls
> and the recommended method to push to multiple remotes.
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  Documentation/urls-remotes.txt | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/urls-remotes.txt b/Documentation/urls-remotes.txt
> index 86d0008f94..61aaded645 100644
> --- a/Documentation/urls-remotes.txt
> +++ b/Documentation/urls-remotes.txt
> @@ -33,7 +33,10 @@ config file would appear like this:
>  ------------
>  
>  The `<pushurl>` is used for pushes only. It is optional and defaults
> -to `<URL>`.
> +to `<URL>`. Additional pushurls can be defined to push to multiple
> +remotes. While multiple URLs can be defined to achieve the same
> +outcome, this is not recommended since fetch only uses the first
> +defined URL.

Maybe it's just me, but I feel more confused reading this docs than
before :)

Surely if there's confusion about the priority of the *.url config
variable we should be documenting that explicitly where we discuss "url"
itself (e.g. in Documentation/config/remote.txt). Just mentioning it in
passing as we document "pushUrl" feels like the wrong place.

But I still don't quite see the premise. "git push" has a feature to
push to all N urls, whether that's Url or pushUrl.

When I configure it to have multiple URLs it pushes to the first
configured one first, if the source of the confusion was that it didn't
prefer the last configured one first, shouldn't it be doing them in
reverse order?

I don't think that would make sense, but I also don't see how
recommending "pushurl" over "url" un-confuses things.

So why is it confusing that "fetch" would use the same order, but due to
the semantics of a "fetch" we'd stop after the first one?

Calvin Wan Feb. 6, 2023, 9:12 p.m. UTC | #2

> > Defining multiple `url` fields can cause confusion for users since
> > running `git config remote.<remote>.url` returns the last defined url
> > which doesn't align with the url `git fetch <remote>` uses (the first).
>
> I'm certainly confused, I had no idea it worked this way, I'd have thought it was last-set-wins like most things.
>
> From a glance fb0cc87ec0f (Allow programs to not depend on remotes
> having urls, 2009-11-18) mentions it as a known factor, but with:
>
>         diff --git a/transport.c b/transport.c
>         index 77a61a9d7bb..06159c4184e 100644
>         --- a/transport.c
>         +++ b/transport.c
>         @@ -1115,7 +1115,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
>                 helper = remote->foreign_vcs;
>
>                 if (!url && remote->url)
>         -               url = remote->url[0];
>         +               url = remote->url[remote->url_nr - 1];
>                 ret->url = url;
>
>                 /* maybe it is a foreign URL? */
>
> All tests pass for me, and it's selecting the last URL now. I can't find
> any other mention of these semantics in the docs (but maybe I didn't
> look in the right places).
>
> So is this just some accident, does anyone rely on it, and would we be
> better off just "fixing" this, rather than steering people away from
> "url"?

I should've mentioned running `git remote -v` on a config with multiple urls
 shows the correct fetch url, so functionally everything is working as
intended -- just needs a doc update somewhere.

> Surely if there's confusion about the priority of the *.url config
> variable we should be documenting that explicitly where we discuss "url"
> itself (e.g. in Documentation/config/remote.txt). Just mentioning it in
> passing as we document "pushUrl" feels like the wrong place.
>
> But I still don't quite see the premise. "git push" has a feature to
> push to all N urls, whether that's Url or pushUrl.
>
> When I configure it to have multiple URLs it pushes to the first
> configured one first, if the source of the confusion was that it didn't
> prefer the last configured one first, shouldn't it be doing them in
> reverse order?
>
> I don't think that would make sense, but I also don't see how
> recommending "pushurl" over "url" un-confuses things.
>
> So why is it confusing that "fetch" would use the same order, but due to
> the semantics of a "fetch" we'd stop after the first one?

I agree with you now that updating the documentation in
Documentation/config/remote.txt is the ideal way to go about this, but
I'll mention what my original thought process was:

If a user wants one url to push/fetch to, then he defines 'url'
If a user wants to push to multiple urls, then he can either define
multiple urls or pushurls (one of the pushurls can be the same as the url).
But if a user has say url #2 and #3 defined, they act as pushurls anyways,
so defining them as such removes any speculation as to what else they
could do (and also clears up the confusion when running
`git config remote.<remote>.url`).

Ævar Arnfjörð Bjarmason Feb. 6, 2023, 9:55 p.m. UTC | #3

On Mon, Feb 06 2023, Calvin Wan wrote:

>> > Defining multiple `url` fields can cause confusion for users since
>> > running `git config remote.<remote>.url` returns the last defined url
>> > which doesn't align with the url `git fetch <remote>` uses (the first).
>>
>> I'm certainly confused, I had no idea it worked this way, I'd have thought it was last-set-wins like most things.
>>
>> From a glance fb0cc87ec0f (Allow programs to not depend on remotes
>> having urls, 2009-11-18) mentions it as a known factor, but with:
>>
>>         diff --git a/transport.c b/transport.c
>>         index 77a61a9d7bb..06159c4184e 100644
>>         --- a/transport.c
>>         +++ b/transport.c
>>         @@ -1115,7 +1115,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
>>                 helper = remote->foreign_vcs;
>>
>>                 if (!url && remote->url)
>>         -               url = remote->url[0];
>>         +               url = remote->url[remote->url_nr - 1];
>>                 ret->url = url;
>>
>>                 /* maybe it is a foreign URL? */
>>
>> All tests pass for me, and it's selecting the last URL now. I can't find
>> any other mention of these semantics in the docs (but maybe I didn't
>> look in the right places).
>>
>> So is this just some accident, does anyone rely on it, and would we be
>> better off just "fixing" this, rather than steering people away from
>> "url"?
>
> I should've mentioned running `git remote -v` on a config with multiple urls
>  shows the correct fetch url, so functionally everything is working as
> intended -- just needs a doc update somewhere.

Ah, yes, it seems to prefer the first configured one, whether that's
what anyone intended (and should we use last configured?) is another
matter.

But in any case, figuring that out and having a test in-tree that fails
if you pick the first or last of the list (depending on what we go for)
would be most welcome...

>> Surely if there's confusion about the priority of the *.url config
>> variable we should be documenting that explicitly where we discuss "url"
>> itself (e.g. in Documentation/config/remote.txt). Just mentioning it in
>> passing as we document "pushUrl" feels like the wrong place.
>>
>> But I still don't quite see the premise. "git push" has a feature to
>> push to all N urls, whether that's Url or pushUrl.
>>
>> When I configure it to have multiple URLs it pushes to the first
>> configured one first, if the source of the confusion was that it didn't
>> prefer the last configured one first, shouldn't it be doing them in
>> reverse order?
>>
>> I don't think that would make sense, but I also don't see how
>> recommending "pushurl" over "url" un-confuses things.
>>
>> So why is it confusing that "fetch" would use the same order, but due to
>> the semantics of a "fetch" we'd stop after the first one?
>
> I agree with you now that updating the documentation in
> Documentation/config/remote.txt is the ideal way to go about this, but

Aside: I actually think near where you made the change in
Documentation/urls-remotes.txt is probably better, in remote.txt we just
point to git-fetch.txt or git-pull.txt etc., which in turn include that.

It just seems we'd need a short blurb about how URLs are selected, we
prefer the first one, and for fetch it's always "stop at the first", and
for push "push to all, from first to last".

I may have gotten that wrong, but that's my current understanding from
looking at it briefly.

> I'll mention what my original thought process was:
> If a user wants one url to push/fetch to, then he defines 'url'
> If a user wants to push to multiple urls, then he can either define
> multiple urls or pushurls (one of the pushurls can be the same as the url).
> But if a user has say url #2 and #3 defined, they act as pushurls anyways,
> so defining them as such removes any speculation as to what else they
> could do (and also clears up the confusion when running
> `git config remote.<remote>.url`).

I'm coming away from this with the impression that we should almost
never recommend "pushUrl", not that it's worthwhile to use it to solve
this ambiguity.

Trying this out, given a config like:

	[remote "avar"]
	        url = git@github.com:avar/git.git
	        url = git@github.com:avar/git1.git
	        url = git@github.com:avar/git2.git

I'll get:

	$ ./git remote -v|grep avar
	avar    git@github.com:avar/git.git (fetch)
	avar    git@github.com:avar/git.git (push)
	avar    git@github.com:avar/git1.git (push)
	avar    git@github.com:avar/git2.git (push)

So, the semantics of "url" is that the first one is always the fetch,
there's no such thing as multiple fetch URLs, and the push URLs are the
list of all "url"'s.

But now let's change that to:

	[remote "avar"]
	        url = git@github.com:avar/git.git
	        url = git@github.com:avar/git1.git
	        pushUrl = git@github.com:avar/git2.git

Which gives us:

	$ ./git remote -v|grep avar
	avar    git@github.com:avar/git.git (fetch)
	avar    git@github.com:avar/git2.git (push)

So, by defining a "pushUrl" all subsequent URLs in "url" have been
shadowed.

Maybe that was considered at the time, but it wouldn't surprise me if
that's a blindspot in the original "pushUrl" patch (203462347fc (Allow
push and fetch urls to be different, 2009-06-09)), i.e. shouldn't we at
least warn on "push" now or somewhere else about the useless second
"url"?

So, back to "pushurl". It seems to me that the only real use-case for it
is the case where the URL really must by different in the case of
"fetch" and "push", which I daresay is increasingly rare these days (but
used to be more common when dumb http read + ssh write or whatever was
more common).

Whereas the more common case is just wanting to fetch/push from/to
github.com, and also wanting to mirror to gitlab.com or whathever.

In that case using "pushUrl" requires you to duplicate the "fetch" url,
whereas just having multiple "url" sections without "pushurl" does the
same thing without the duplication.

Junio C Hamano Feb. 6, 2023, 11 p.m. UTC | #4

Calvin Wan <calvinwan@google.com> writes:

> If a user wants one url to push/fetch to, then he defines 'url'
> If a user wants to push to multiple urls, then he can either define

meaning "the user wants to push to multiple, but never wants to
fetch from the other ones"?  If so, then ...

> multiple urls or pushurls (one of the pushurls can be the same as the url).

... using multiple urls is not desirable as it becomes hard to tell
which one should be used for fetching.  

When you define additional places you want to publish to, you do not
expect to see that new definition affect where you fetch from.  In a
sense, using the first one to fetch, not the last one, gives us less
damage, for that reason.  Adding more URLs would not change where
you fetch from.

But still, I think what you gave is a good rule, regardless of the
answer to the "which among multiple non-push urls do we fetch"
question.  If we were designing this part of the system from
scratch, we probably 

 - would have forbidden multiple URLs, or
 - would allowed multiple URLs, but used the 'last one wins', or
 - would allowed multiple URLs, but and used later ones as fallback,
   only to be used when earlier ones fail.

It certainly is not a designed behaviour what we do when we have
multiple URLs, but if we haven't changed how we react to such a
configuration in the past (and I do not think we ever did), changing
the behaviour in any way this late in the game likely breaks real
users' configuration.

Thanks.

diff --git a/Documentation/urls-remotes.txt b/Documentation/urls-remotes.txt
index 86d0008f94..61aaded645 100644
--- a/Documentation/urls-remotes.txt
+++ b/Documentation/urls-remotes.txt
@@ -33,7 +33,10 @@  config file would appear like this:
 ------------
 
 The `<pushurl>` is used for pushes only. It is optional and defaults
-to `<URL>`.
+to `<URL>`. Additional pushurls can be defined to push to multiple
+remotes. While multiple URLs can be defined to achieve the same
+outcome, this is not recommended since fetch only uses the first
+defined URL.
 
 Named file in `$GIT_DIR/remotes`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Documentation: clarify multiple pushurls vs urls

Commit Message

Comments

Patch