diff mbox series

[3/4] Add 'promisor-remote' capability to protocol v2

Message ID 20240731134014.2299361-4-christian.couder@gmail.com (mailing list archive)
State New, archived
Headers show
Series Introduce a "promisor-remote" capability | expand

Commit Message

Christian Couder July 31, 2024, 1:40 p.m. UTC
When a server repository S borrows some objects from a promisor remote X,
then a client repository C which would like to clone or fetch from S might,
or might not, want to also borrow objects from X. Also S might, or might
not, want to advertise X as a good way for C to directly get objects from,
instead of C getting everything through S.

To allow S and C to agree on C using X or not, let's introduce a new
"promisor-remote" capability in the protocol v2, as well as a few new
configuration variables:

  - "promisor.advertise" on the server side, and:
  - "promisor.acceptFromServer" on the client side.

By default, or if "promisor.advertise" is set to 'false', a server S will
advertise only the "promisor-remote" capability without passing any
argument through this capability. This means that S supports the new
capability but doesn't wish any client C to directly access any promisor
remote X S might use.

If "promisor.advertise" is set to 'true', S will advertise its promisor
remotes with a string like:

  promisor-remote=<pm-info>[;<pm-info>]...

where each <pm-info> element contains information about a single
promisor remote in the form:

  name=<pm-name>[,url=<pm-url>]

where <pm-name> is the name of a promisor remote and <pm-url> is the
urlencoded url of the promisor remote named <pm-name>.

For now, the URL is passed in addition to the name. In the future, it
might be possible to pass other information like a filter-spec that the
client should use when cloning from S, or a token that the client should
use when retrieving objects from X.

It might also be possible in the future for "promisor.advertise" to have
other values like "onlyName", so that no URL is advertised.

By default or if "promisor.acceptFromServer" is set to "None", the
client will not accept to use the promisor remotes that might have been
advertised by the server. In this case, the client will advertise only
"promisor-remote" in its reply to the server. This means that the client
has the "promisor-remote" capability but decided not to use any of the
promisor remotes that the server might have advertised.

If "promisor.acceptFromServer" is set to "All", on the contrary, the
client will accept to use all the promisor remotes that the server
advertised and it will reply with a string like:

  promisor-remote=<pm-name>[;<pm-name>]...

where the <pm-name> elements are the names of all the promisor remotes
the server advertised. If the server advertised no promisor remote
though, the client will reply with just "promisor-remote".

In a following commit, other values for "promisor.acceptFromServer" will
be implemented so that the client will be able to decide the promisor
remotes it accepts depending on the name and URL it received from the
server. So even if that name and URL information is not used much right
now, it will be needed soon.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.txt     |  13 ++
 Documentation/gitprotocol-v2.txt      |  37 ++++++
 connect.c                             |   7 +
 promisor-remote.c                     | 182 ++++++++++++++++++++++++++
 promisor-remote.h                     |  26 +++-
 serve.c                               |  21 +++
 t/t5555-http-smart-common.sh          |   1 +
 t/t5701-git-serve.sh                  |   1 +
 t/t5710-promisor-remote-capability.sh | 124 ++++++++++++++++++
 upload-pack.c                         |   3 +
 10 files changed, 414 insertions(+), 1 deletion(-)
 create mode 100755 t/t5710-promisor-remote-capability.sh

Comments

Taylor Blau July 31, 2024, 3:40 p.m. UTC | #1
On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> By default, or if "promisor.advertise" is set to 'false', a server S will
> advertise only the "promisor-remote" capability without passing any
> argument through this capability. This means that S supports the new
> capability but doesn't wish any client C to directly access any promisor
> remote X S might use.

Even if the server supports this new capability, is there a reason to
advertise it to the client if the server knows ahead of time that it has
no promisor remotes to advertise?

I am not sure what action the client would take if it knows the server
supports this capability, but does not actually have any promisor
remotes to advertise. I would suggest that setting promisor.advertise to
false indeed prevents advertising it as a capability in the first place.

Selfishly, it prevents some issues that I have when rolling out new Git
versions within GitHub's infrastructure, since our push proxy layer
picks a single replica to replay the capabilities from, but obviously
replays the client's response to all replicas. So if only some replicas
understand the new 'promisor-remote' capability, we can run into issues.

I'm not sure if the client even bothers to send back promisor-remote if
the server did not send any such remotes to begin with, but between that
and what I wrote in the second paragraph here, I don't see a reason to
advertise the capability when promisor.advertise is false.

Thanks,
Taylor
Taylor Blau July 31, 2024, 4:16 p.m. UTC | #2
On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> index 414bc625d5..4d8d3839c4 100644
> --- a/Documentation/gitprotocol-v2.txt
> +++ b/Documentation/gitprotocol-v2.txt
> @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>
> +promisor-remote=<pr-infos>
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The server may advertise some promisor remotes it is using, if it's OK
> +for the server that a client uses them too. In this case <pr-infos>
> +should be of the form:
> +
> +	pr-infos = pr-info | pr-infos ";" pr-info

So <pr-infos> uses the ';' character to delimit multiple <pr-info>s,
which means that <pr-info> can't use ';' itself. You mention above that
<pr-info> is supposed to be generic so that we can add other fields to
it in the future. Do you imagine that any of those fields might want to
use the ';' in their values?

One that comes to mind is the shared token example you wrote about
above. It would be nice to not restrict what characters the token can
contain.

I wonder if it would instead be useful to have <pr-infos> first write
out how many <pr-info>s it contains, and then write out each <pr-info>
separated by a NUL byte, so that none of the files in the <pr-info>
itself are restricted in what characters they can use.

> +static void promisor_info_vecs(struct repository *repo,
> +			       struct strvec *names,
> +			       struct strvec *urls)
> +{
> +	struct promisor_remote *r;
> +
> +	promisor_remote_init(repo);
> +
> +	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> +		char *url;
> +		char *url_key = xstrfmt("remote.%s.url", r->name);
> +
> +		strvec_push(names, r->name);
> +		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);

Do you mean to push NULL onto urls here? It seems risky since you have
to check that each entry in the strvec is non-NULL before printing it
out (which you do below in promisor_remote_info()).

Or maybe you need to in order to advertise promisor remotes without
URLs? If so, I'm not sure what the benefit would be to the client if it
doesn't know where to go to retrieve any objects without having a URL.

Thanks,
Taylor
Junio C Hamano July 31, 2024, 6:25 p.m. UTC | #3
Christian Couder <christian.couder@gmail.com> writes:

> When a server repository S borrows some objects from a promisor remote X,
> then a client repository C which would like to clone or fetch from S might,
> or might not, want to also borrow objects from X. Also S might, or might
> not, want to advertise X as a good way for C to directly get objects from,
> instead of C getting everything through S.

If S is a clone that is keeping up to date with X, even if it does
not borrow anything from X, as long as X is known to be much better
connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY
datacenter with petabit/s backbone connections) than S is (e.g., it
is my deskside box on a cable modem), it may be beneficial if S can
omit objects from its "git fetch" response to C, if C is willing to
fill the gap using X.

So it is of dubious value to limit the feature only to cases where S
"borrows" from X, is it?

> To allow S and C to agree on C using X or not, let's introduce a new
> "promisor-remote" capability in the protocol v2, as well as a few new
> configuration variables:
>
>   - "promisor.advertise" on the server side, and:
>   - "promisor.acceptFromServer" on the client side.
>
> By default, or if "promisor.advertise" is set to 'false', a server S will
> advertise only the "promisor-remote" capability without passing any
> argument through this capability. This means that S supports the new
> capability but doesn't wish any client C to directly access any promisor
> remote X S might use.

I would find it more natural if .advertise is turned off by setting
it explicitly to "false", we would pretend as if we have never even
heard of such a capability.

> If "promisor.advertise" is set to 'true', S will advertise its promisor
> remotes with a string like:
>
>   promisor-remote=<pm-info>[;<pm-info>]...
>
> where each <pm-info> element contains information about a single
> promisor remote in the form:
>
>   name=<pm-name>[,url=<pm-url>]
> where <pm-name> is the name of a promisor remote and <pm-url> is the
> urlencoded url of the promisor remote named <pm-name>.

OK, so pm-name cannot have ";," in it (which is sensible, or define
pm-name more tightly, like "only lowercase alnum").  URL cannot have
';' or ',' in it that is an OK limitation as URL encoding can hide
them.

> For now, the URL is passed in addition to the name. In the future, it
> might be possible to pass other information like a filter-spec that the
> client should use when cloning from S, or a token that the client should
> use when retrieving objects from X.

OK.  And obviously they cannot have ';," in them without encoding
similarly.

> It might also be possible in the future for "promisor.advertise" to have
> other values like "onlyName", so that no URL is advertised.

Saying "<pm-info> is expected to be extended" should be sufficient,
without inviting discussions like "what good does it do to give only
names" that is irrelevant at least at this moment.

> By default or if "promisor.acceptFromServer" is set to "None", the
> client will not accept to use the promisor remotes that might have been
> advertised by the server. In this case, the client will advertise only
> "promisor-remote" in its reply to the server. This means that the client
> has the "promisor-remote" capability but decided not to use any of the
> promisor remotes that the server might have advertised.

OK, that is a signal to the server side that it is not allowed to
omit any objects from its response to "git fetch" request, even
though they might be available via a better connected remotes.

> If "promisor.acceptFromServer" is set to "All", on the contrary, the
> client will accept to use all the promisor remotes that the server
> advertised and it will reply with a string like:
>
>   promisor-remote=<pm-name>[;<pm-name>]...
>
> where the <pm-name> elements are the names of all the promisor remotes
> the server advertised.

So, this is why we need "name" for each "pm-info"---to give a short
name associated with the URL of the remote repository.

Presumably, C has an option to see if each of the remote suggested
is reachable and omit remotes that are not available to C from its
response, so even when .accept is set to "all", the response may not
list all the names of remotes that S advertised, in general.

> If the server advertised no promisor remote
> though, the client will reply with just "promisor-remote".

In other words, at the protocol level:

 - S uses promisor-remote capability to tell C what are potentially
   useful alternate remotes to obtain objects that C may want to
   fetch from S

 - C uses promisor-remote capability to tell S that among the
   remotes advertised by S, it is willing to use the named remotes
   as its promisor, which permits S from omitting objects from its
   response to "git fetch" request from C as long as they are known
   to be available from these remotes.

I think that makes sense, but I do not see the point of sending an
empty promisor-remote capability at all.

What practical difference would it make to S and C, if S chooses not
to advertise the capability at all, instead of advertising an empty
remote list with the capability?  Both tells C that it is useless to
request promistor-remote capability to S in its response.

What practical difference would it make to S and C, if C chooses not
to advertise the capability at all, instead of advertising an empty
remote list with the capability?  Both tells S that S is not allowed
to omit objects that are obtainable from elsewhere.

> In a following commit, other values for "promisor.acceptFromServer" will
> be implemented so that the client will be able to decide the promisor
> remotes it accepts depending on the name and URL it received from the
> server. So even if that name and URL information is not used much right
> now, it will be needed soon.

OK.

> diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> index 98c5cb2ec2..e3939d83a9 100644
> --- a/Documentation/config/promisor.txt
> +++ b/Documentation/config/promisor.txt
> @@ -1,3 +1,16 @@
>  promisor.quiet::
>  	If set to "true" assume `--quiet` when fetching additional
>  	objects for a partial clone.
> +
> +promisor.advertise::
> +	If set to "true", a server will use the "promisor-remote"
> +	capability, see linkgit:gitprotocol-v2[5], to advertise the
> +	promisor remotes it is using if any. Default is "false", which
> +	means no promisor remote is advertised.

Even though I said that there logically is not much reason to tie
this advertisement to the use of promistor remote by the serving
side, I am OK if the initial implementation is limited to that
arrangement.  It would be an easy change to allow this variable
to take a list of remote repositories that may (or may not) be a
promisor remote of this repository (in other words, "they are clones
that are better connected than me") in the future, but that does not
have to happen in the initial iteration.

It would be less confusing to first-time readers if you described
the intent a bit better.  Why would the server want to advertise and
how would the client take advantage of the information?  I see that
the update in this patch to protocol document is skimpy on this point,
and end-user facing documentation has better exposure anyway, so
let's see what we can do here.

    The "promisor-remote" protocol capability can be used by the
    responder to "git fetch" to advertise better-connected remotes
    that the requester can use as promisor remotes, instead of this
    repository, so that "git fetch" requestor can lazily fetch
    objects from these other better-connected remotes.  If this
    configuration variable is set to "true",...

or something, perhaps?

"no promisor remote is advertised" -> "no promisor-remote capability
is advertised".

> +promisor.acceptFromServer::
> +	If set to "all", a client will accept all the promisor remotes
> +	a server might advertise using the "promisor-remote"
> +	capability, see linkgit:gitprotocol-v2[5]. Default is "none",
> +	which means no promisor remote advertised by a server will be
> +	accepted.

Similarly, readers would want to know what the implication is to
"accept" promisor remotes.

	accept ..., and adds them to its promisor remotes, allowing
        the server to omit objects from its response to "fetch"
        requests that are lazily fetchable from these promisor
        remotes, see linkgit:gitprotocol-v2[5].

or something?

> diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> index 414bc625d5..4d8d3839c4 100644
> --- a/Documentation/gitprotocol-v2.txt
> +++ b/Documentation/gitprotocol-v2.txt
> @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> +promisor-remote=<pr-infos>
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The server may advertise some promisor remotes it is using, if it's OK
> +for the server that a client uses them too. In this case <pr-infos>
> +should be of the form:

As this is the protocol documentation, we should describe what goes
over the wire and what they mean, regardless of how limited the
initial implementation on either end is.  Advertising the promisor
remotes the server side relies on is probably not what we want to
see this capability limited to forever (remember the previous "X is
much better connected than S" example).

    "it is using, if it's OK ..." -> "the other side may want to use
    as its promisot remotes, instead of this repository"

> +	pr-infos = pr-info | pr-infos ";" pr-info
> +
> +	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> +
> +where `pr-name` is the name of a promisor remote, and `pr-url` the
> +urlencoded URL of that promisor remote.

Clarify what the syntax for a valid name here.  Also stress the fact
that ';' and ',' MUST be encoded if it appears in 'pr-url'.

> +In this case a client wanting to use one or more promisor remotes the
> +server advertised should reply with "promisor-remote=<pr-names>" where
> +<pr-names> should be of the form:
> +
> +	pr-names = pr-name | pr-names ";" pr-name
> +
> +where `pr-name` is the name of a promisor remote the server
> +advertised.

After seeing advertisement, client can use some it picked but it
does not have to tell the server about it.  Why would it respond
with the promisor remotes, and what effect does it have to give the
list of promisor remotes it uses?

    If the "git fetch" side decides to use one or more promisor
    remotes advertised, it can reply with ...
    ...
    where ... the server advertised.  Doing so allows the server to
    make its response smaller by omitting objects that are known to
    be lazily fetchable from these other promisor remotes
    repositories.

perhaps?

> +If the server prefers a client not to use any promisor remote the
> +server uses, or if the server doesn't use any promisor remote, it
> +should only advertise "promisor-remote" without any value or "=" sign
> +after it.

It should not advertise "promisor-remote" capability at all.

> +In this case, or if the client doesn't want to use any promisor remote
> +the server advertised, the client should reply only "promisor-remote"
> +without any value or "=" sign after it.

Likewise.  It should not advertise "promisor-remote" capability at
all.

> +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> +options can be used on the server and client side respectively to
> +control what they advertise or accept respectively. See the
> +documentation of these configuration options for more information.

OK.
Junio C Hamano July 31, 2024, 7:34 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> When a server repository S borrows some objects from a promisor remote X,
>> then a client repository C which would like to clone or fetch from S might,
>> or might not, want to also borrow objects from X. Also S might, or might
>> not, want to advertise X as a good way for C to directly get objects from,
>> instead of C getting everything through S.
>
> If S is a clone that is keeping up to date with X, even if it does
> not borrow anything from X, as long as X is known to be much better
> connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY
> datacenter with petabit/s backbone connections) than S is (e.g., it
> is my deskside box on a cable modem), it may be beneficial if S can
> omit objects from its "git fetch" response to C, if C is willing to
> fill the gap using X.
>
> So it is of dubious value to limit the feature only to cases where S
> "borrows" from X, is it?

An even better example is if S on my deskside box is the source of
truth, and X in $LARGEINTERNETCOMPANY datacenter that is much better
connected is a publishing repository I use to push from S to X.

Even if you originally cloned from S and use S as your promisor
remote, as the operator of S, I would like you to always consult X
first to reduce the load on S when lazily fetching objects that you
are missing.
Patrick Steinhardt Aug. 5, 2024, 1:48 p.m. UTC | #5
On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> index 414bc625d5..4d8d3839c4 100644
> --- a/Documentation/gitprotocol-v2.txt
> +++ b/Documentation/gitprotocol-v2.txt
> @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> +promisor-remote=<pr-infos>
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The server may advertise some promisor remotes it is using, if it's OK
> +for the server that a client uses them too. In this case <pr-infos>
> +should be of the form:
> +
> +	pr-infos = pr-info | pr-infos ";" pr-info
> +
> +	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> +
> +where `pr-name` is the name of a promisor remote, and `pr-url` the
> +urlencoded URL of that promisor remote.
> +
> +In this case a client wanting to use one or more promisor remotes the
> +server advertised should reply with "promisor-remote=<pr-names>" where
> +<pr-names> should be of the form:
> +
> +	pr-names = pr-name | pr-names ";" pr-name
> +
> +where `pr-name` is the name of a promisor remote the server
> +advertised.
> +
> +If the server prefers a client not to use any promisor remote the
> +server uses, or if the server doesn't use any promisor remote, it
> +should only advertise "promisor-remote" without any value or "=" sign
> +after it.
> +
> +In this case, or if the client doesn't want to use any promisor remote
> +the server advertised, the client should reply only "promisor-remote"
> +without any value or "=" sign after it.

Why does the client have to advertise anything if they don't want to use
any of the promisor remotes?

> +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> +options can be used on the server and client side respectively to
> +control what they advertise or accept respectively. See the
> +documentation of these configuration options for more information.

One thing I'm not totally clear on is the consequence of this
capability. What is the expected consequence if the client accepts one
of the promisor remotes? What is the consequence if the client accepts
none?

In the former case I'd expect that the server is free to omit objects,
but that isn't made explicit anywhere, I think. Also, is there any
mechanism that tells the client exactly which objects have been omitted?
In the latter case I assume that the result will be a full clone, that
is the server fetched any objects it didn't have from the promisor?

Or does the server side continue to only honor whatever the client has
provided as object filters, but signals to the client that it shall
please contact somebody else when backfilling those promised objects?

Patrick
Junio C Hamano Aug. 19, 2024, 8 p.m. UTC | #6
Patrick Steinhardt <ps@pks.im> writes:

>> +In this case, or if the client doesn't want to use any promisor remote
>> +the server advertised, the client should reply only "promisor-remote"
>> +without any value or "=" sign after it.
>
> Why does the client have to advertise anything if they don't want to use
> any of the promisor remotes?

Yeah, it is not very well justified why an empty capability needs to
be sent (from both sides).  My recommendation is to drop that part
of the design, but if there is a reason to keep, it should be done
by explaining how differently the other side should behave when the
capability is not sent at all and when the capability with no
promisor remote is sent.

>> +The "promisor.advertise" and "promisor.acceptFromServer" configuration
>> +options can be used on the server and client side respectively to
>> +control what they advertise or accept respectively. See the
>> +documentation of these configuration options for more information.
>
> One thing I'm not totally clear on is the consequence of this
> capability. What is the expected consequence if the client accepts one
> of the promisor remotes? What is the consequence if the client accepts
> none?

Yes, I also found the documentation lacking in that respect.  The
series talks about how the exchange can proceed, without saying much
(if anything) about what both sides want to exchange promisor-remote
for---what effect does it have on the behaviour of both sides to
send one.  I covered this point in one of my reviews a bit more.

  https://lore.kernel.org/git/xmqqikwl2zca.fsf@gitster.g/

> In the former case I'd expect that the server is free to omit objects,
> but that isn't made explicit anywhere, I think. Also, is there any
> mechanism that tells the client exactly which objects have been omitted?
> In the latter case I assume that the result will be a full clone, that
> is the server fetched any objects it didn't have from the promisor?
>
> Or does the server side continue to only honor whatever the client has
> provided as object filters, but signals to the client that it shall
> please contact somebody else when backfilling those promised objects?

Thanks.
Christian Couder Aug. 20, 2024, 11:32 a.m. UTC | #7
On Wed, Jul 31, 2024 at 6:16 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> > index 414bc625d5..4d8d3839c4 100644
> > --- a/Documentation/gitprotocol-v2.txt
> > +++ b/Documentation/gitprotocol-v2.txt
> > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
> >  save themselves and the server(s) the request(s) needed to inspect the
> >  headers of that bundle or bundles.
> >
> > +promisor-remote=<pr-infos>
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The server may advertise some promisor remotes it is using, if it's OK
> > +for the server that a client uses them too. In this case <pr-infos>
> > +should be of the form:
> > +
> > +     pr-infos = pr-info | pr-infos ";" pr-info
>
> So <pr-infos> uses the ';' character to delimit multiple <pr-info>s,
> which means that <pr-info> can't use ';' itself. You mention above that
> <pr-info> is supposed to be generic so that we can add other fields to
> it in the future. Do you imagine that any of those fields might want to
> use the ';' in their values?

Yeah, but, as for the 'url' field where the value is urlencoded, the
value can be encoded if it could contain some special characters.

> One that comes to mind is the shared token example you wrote about
> above. It would be nice to not restrict what characters the token can
> contain.

I agree but I think urlencoding, or maybe other simple encodings like
base64, should be easy and simple enough to work around this.

> I wonder if it would instead be useful to have <pr-infos> first write
> out how many <pr-info>s it contains, and then write out each <pr-info>
> separated by a NUL byte, so that none of the files in the <pr-info>
> itself are restricted in what characters they can use.

I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though.

> > +static void promisor_info_vecs(struct repository *repo,
> > +                            struct strvec *names,
> > +                            struct strvec *urls)
> > +{
> > +     struct promisor_remote *r;
> > +
> > +     promisor_remote_init(repo);
> > +
> > +     for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> > +             char *url;
> > +             char *url_key = xstrfmt("remote.%s.url", r->name);
> > +
> > +             strvec_push(names, r->name);
> > +             strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
>
> Do you mean to push NULL onto urls here? It seems risky since you have
> to check that each entry in the strvec is non-NULL before printing it
> out (which you do below in promisor_remote_info()).

The code doesn't seem risky to me as it allows us to treat the case
when git_config_get_string() fails and when it succeeds but possibly
sets 'url' to NULL (not sure if it's possible though as I didn't
check) in the same way.

Yeah, it means that we have to check if each entry in the strvec is
non-NULL, but I think it's quite easy, and honestly I didn't want to
ask myself questions like should we treat an URL of a remote
configured as an empty string in the same way as the URL not
configured. I think it's much simpler to just pass as-is the content,
if any, that we get from git_config_get_string().

> Or maybe you need to in order to advertise promisor remotes without
> URLs?

Yeah, I think we should advertise promisor remotes that don't have an
URL configured. It might seem strange, but maybe servers might want in
the future to have hidden/secret URLs (URLs that they use, likely
internally on the server, but don't want to pass for some reason to a
client).

> If so, I'm not sure what the benefit would be to the client if it
> doesn't know where to go to retrieve any objects without having a URL.

The client might already have an URL for the promisor-remote (and it
might be a different one than the one the server would use if
hidden/secret URLs become a thing). That's why patch 4/4 implements
the "KnownName" value for the "promisor.acceptFromServer" config
option.
Christian Couder Aug. 20, 2024, 11:32 a.m. UTC | #8
On Wed, Jul 31, 2024 at 5:40 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> > By default, or if "promisor.advertise" is set to 'false', a server S will
> > advertise only the "promisor-remote" capability without passing any
> > argument through this capability. This means that S supports the new
> > capability but doesn't wish any client C to directly access any promisor
> > remote X S might use.
>
> Even if the server supports this new capability, is there a reason to
> advertise it to the client if the server knows ahead of time that it has
> no promisor remotes to advertise?

I think it could be useful at least in some cases for C to know that S
has the capability to advertise promisor remotes but decided not to
advertise any. For example, if C knows that the repo has a lot of very
large files, it might realize that S is likely not a good mirror of
the repo if it doesn't have the 'promisor-remote' capability.

I agree that it's more useful the other way though. That is for a
server to know that the client has the capability but might not want
to use it.

For example, when C clones without using X directly, it can be a
burden for S to have to fetch large objects from X (as it would use
precious disk space on S, and unnecessarily duplicate large objects).
So S might want to say "please use a newer or different client that
has the 'promisor-remote' capability" if it knows that the client
doesn't have this capability. If S knows that C has the capability but
didn't configure it or doesn't want to use it, it could instead say
something like "please consider activating the 'promisor-remote'
capability by doing this and that to avoid burdening this server and
get a faster clone".

Note that the client might not be 'git'. It might be a "compatible"
implementation (libgit2, gix, JGit, etc), so using the version passed
in the "agent" protocol capability is not a good way to detect if the
client has the capability or not.

In the end, as it looks very useful for S to know if C has the
capability or not, and as it seems natural that S and C behave the
same regarding advertising the capability, I think the choice of
always advertising the capability, even when not using it, is the
right one.

> I am not sure what action the client would take if it knows the server
> supports this capability, but does not actually have any promisor
> remotes to advertise. I would suggest that setting promisor.advertise to
> false indeed prevents advertising it as a capability in the first place.

It could, in some cases, help C realize that S is likely using old or
unoptimized server software for the repo, and C could decide based on
this to use a different mirror repo. For example if C wants to clone
some well known open source AI repo that has a lot of very large files
and is mirrored on many common repo hosting platforms (GitHub, GitLab,
etc), C might be happy to get a clue of how likely the different
mirrors are to be optimized to serve that repo.

I agree that it might not be a very good reason right now, but I think
it might be in the future. Anyway the main reason for such a behavior
is (as I said above) that it is very useful for S to know if C has the
'promisor-remote' capability or not.

> Selfishly, it prevents some issues that I have when rolling out new Git
> versions within GitHub's infrastructure, since our push proxy layer
> picks a single replica to replay the capabilities from, but obviously
> replays the client's response to all replicas. So if only some replicas
> understand the new 'promisor-remote' capability, we can run into issues.

I understand the problem, but I think it might be worked around by
first deploying on a single replica with the new 'promisor-remote'
capability disabled in the config, which is the default. Yeah, that
replica might behave a bit differently than the others, but the client
behavior shouldn't change much. And when things work well with a
single replica with the capability disabled, then more replicas with
that capability disabled can be rolled out until they all have it.

More issues are likely to happen when actually enabling the
capability, but this is independent of the fact that the
'promisor-remote' capability is advertised even if it is not enabled.

> I'm not sure if the client even bothers to send back promisor-remote if
> the server did not send any such remotes to begin with,

If S sends 'promisor-remote' even without sending any remote
information then C should reply using 'promisor-remote' too. I think
it can help S decide if setting up promisor remotes is worth it or
not, if S can easily know if many of its clients could use them or
not.


> but between that
> and what I wrote in the second paragraph here, I don't see a reason to
> advertise the capability when promisor.advertise is false.
Christian Couder Aug. 20, 2024, 12:21 p.m. UTC | #9
On Wed, Jul 31, 2024 at 8:25 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > When a server repository S borrows some objects from a promisor remote X,
> > then a client repository C which would like to clone or fetch from S might,
> > or might not, want to also borrow objects from X. Also S might, or might
> > not, want to advertise X as a good way for C to directly get objects from,
> > instead of C getting everything through S.
>
> If S is a clone that is keeping up to date with X, even if it does
> not borrow anything from X, as long as X is known to be much better
> connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY
> datacenter with petabit/s backbone connections) than S is (e.g., it
> is my deskside box on a cable modem), it may be beneficial if S can
> omit objects from its "git fetch" response to C, if C is willing to
> fill the gap using X.
>
> So it is of dubious value to limit the feature only to cases where S
> "borrows" from X, is it?

I agree. I just blindly copied the word from something you said in a
previous thread, but I should have thought more about the use case you
suggest.

> > To allow S and C to agree on C using X or not, let's introduce a new
> > "promisor-remote" capability in the protocol v2, as well as a few new
> > configuration variables:
> >
> >   - "promisor.advertise" on the server side, and:
> >   - "promisor.acceptFromServer" on the client side.
> >
> > By default, or if "promisor.advertise" is set to 'false', a server S will
> > advertise only the "promisor-remote" capability without passing any
> > argument through this capability. This means that S supports the new
> > capability but doesn't wish any client C to directly access any promisor
> > remote X S might use.
>
> I would find it more natural if .advertise is turned off by setting
> it explicitly to "false", we would pretend as if we have never even
> heard of such a capability.

In a reply to Taylor I just sent, I tried to explain why I think it's
a good thing that S can know if C has the capability even if neither S
nor C actually use it.

If, in the future, many servers and repos are transitioned to setups
where promisor remotes and this capability are used, then I think it
could help a lot if S can help C take advantage of X, and better
diagnose issues. And perhaps the other way around (C knowing that S
has the capability or not) could help a bit too.

> > If "promisor.advertise" is set to 'true', S will advertise its promisor
> > remotes with a string like:
> >
> >   promisor-remote=<pm-info>[;<pm-info>]...
> >
> > where each <pm-info> element contains information about a single
> > promisor remote in the form:
> >
> >   name=<pm-name>[,url=<pm-url>]
> > where <pm-name> is the name of a promisor remote and <pm-url> is the
> > urlencoded url of the promisor remote named <pm-name>.
>
> OK, so pm-name cannot have ";," in it (which is sensible, or define
> pm-name more tightly, like "only lowercase alnum").

Not sure what our config allows here. Ideally I think the capability
should support everything our config supports.

>  URL cannot have
> ';' or ',' in it that is an OK limitation as URL encoding can hide
> them.

Yeah, right.

> > For now, the URL is passed in addition to the name. In the future, it
> > might be possible to pass other information like a filter-spec that the
> > client should use when cloning from S, or a token that the client should
> > use when retrieving objects from X.
>
> OK.  And obviously they cannot have ';," in them without encoding
> similarly.

Yeah, or maybe using a different encoding if it's better for some reason.

> > It might also be possible in the future for "promisor.advertise" to have
> > other values like "onlyName", so that no URL is advertised.
>
> Saying "<pm-info> is expected to be extended" should be sufficient,
> without inviting discussions like "what good does it do to give only
> names" that is irrelevant at least at this moment.

Yeah, but in this case Taylor had comments related to advertising
remotes that have a name but no URL, so I think this example could
help people having related questions understand that it could be a
good thing to advertise only a URL name and no URL.

> > By default or if "promisor.acceptFromServer" is set to "None", the
> > client will not accept to use the promisor remotes that might have been
> > advertised by the server. In this case, the client will advertise only
> > "promisor-remote" in its reply to the server. This means that the client
> > has the "promisor-remote" capability but decided not to use any of the
> > promisor remotes that the server might have advertised.
>
> OK, that is a signal to the server side that it is not allowed to
> omit any objects from its response to "git fetch" request, even
> though they might be available via a better connected remotes.

Yeah, right.

> > If "promisor.acceptFromServer" is set to "All", on the contrary, the
> > client will accept to use all the promisor remotes that the server
> > advertised and it will reply with a string like:
> >
> >   promisor-remote=<pm-name>[;<pm-name>]...
> >
> > where the <pm-name> elements are the names of all the promisor remotes
> > the server advertised.
>
> So, this is why we need "name" for each "pm-info"---to give a short
> name associated with the URL of the remote repository.

Yeah, I think it can be valuable to use names to agree on
promisor-remotes that should or shouldn't be used, like we commonly
use remote names with `git clone`, `git fetch`, etc.

> Presumably, C has an option to see if each of the remote suggested
> is reachable and omit remotes that are not available to C from its
> response, so even when .accept is set to "all", the response may not
> list all the names of remotes that S advertised, in general.

I didn't think about reachability or connectivity specifically, but I
agree it might be useful for C to be able to filter in some ways the
promisor remotes that S advertised.

> > If the server advertised no promisor remote
> > though, the client will reply with just "promisor-remote".
>
> In other words, at the protocol level:
>
>  - S uses promisor-remote capability to tell C what are potentially
>    useful alternate remotes to obtain objects that C may want to
>    fetch from S
>
>  - C uses promisor-remote capability to tell S that among the
>    remotes advertised by S, it is willing to use the named remotes
>    as its promisor, which permits S from omitting objects from its
>    response to "git fetch" request from C as long as they are known
>    to be available from these remotes.

Yeah, right.

> I think that makes sense, but I do not see the point of sending an
> empty promisor-remote capability at all.

I hope I gave good enough reasons above and in my replies to Taylor.

> What practical difference would it make to S and C, if S chooses not
> to advertise the capability at all, instead of advertising an empty
> remote list with the capability?  Both tells C that it is useless to
> request promistor-remote capability to S in its response.
>
> What practical difference would it make to S and C, if C chooses not
> to advertise the capability at all, instead of advertising an empty
> remote list with the capability?  Both tells S that S is not allowed
> to omit objects that are obtainable from elsewhere.

I agree that when looking at how Git related things technically work,
it doesn't change anything, but I think we should look at the big
picture too. For GitLab, for example (but I suppose it will be similar
for GitHub and other hosting sites), it will be important to help
users take advantage, as seamlessly as possible, of the feature, and
the more we know about the client they use and its capabilities, the
better job we can do to help them.

> > In a following commit, other values for "promisor.acceptFromServer" will
> > be implemented so that the client will be able to decide the promisor
> > remotes it accepts depending on the name and URL it received from the
> > server. So even if that name and URL information is not used much right
> > now, it will be needed soon.
>
> OK.
>
> > diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> > index 98c5cb2ec2..e3939d83a9 100644
> > --- a/Documentation/config/promisor.txt
> > +++ b/Documentation/config/promisor.txt
> > @@ -1,3 +1,16 @@
> >  promisor.quiet::
> >       If set to "true" assume `--quiet` when fetching additional
> >       objects for a partial clone.
> > +
> > +promisor.advertise::
> > +     If set to "true", a server will use the "promisor-remote"
> > +     capability, see linkgit:gitprotocol-v2[5], to advertise the
> > +     promisor remotes it is using if any. Default is "false", which
> > +     means no promisor remote is advertised.
>
> Even though I said that there logically is not much reason to tie
> this advertisement to the use of promistor remote by the serving
> side, I am OK if the initial implementation is limited to that
> arrangement.  It would be an easy change to allow this variable
> to take a list of remote repositories that may (or may not) be a
> promisor remote of this repository (in other words, "they are clones
> that are better connected than me") in the future, but that does not
> have to happen in the initial iteration.

Yeah, I will mention something like this in the next iteration.

> It would be less confusing to first-time readers if you described
> the intent a bit better.  Why would the server want to advertise and
> how would the client take advantage of the information?

Yeah, I will try to better answer that question in the doc, cover
letter and commit messages.

> I see that
> the update in this patch to protocol document is skimpy on this point,
> and end-user facing documentation has better exposure anyway, so
> let's see what we can do here.
>
>     The "promisor-remote" protocol capability can be used by the
>     responder to "git fetch" to advertise better-connected remotes
>     that the requester can use as promisor remotes, instead of this
>     repository, so that "git fetch" requestor can lazily fetch
>     objects from these other better-connected remotes.  If this
>     configuration variable is set to "true",...
>
> or something, perhaps?

Thanks for the suggestion. I will base the changes in version 2 on it.

> "no promisor remote is advertised" -> "no promisor-remote capability
> is advertised".

Right.

> > +promisor.acceptFromServer::
> > +     If set to "all", a client will accept all the promisor remotes
> > +     a server might advertise using the "promisor-remote"
> > +     capability, see linkgit:gitprotocol-v2[5]. Default is "none",
> > +     which means no promisor remote advertised by a server will be
> > +     accepted.
>
> Similarly, readers would want to know what the implication is to
> "accept" promisor remotes.
>
>         accept ..., and adds them to its promisor remotes, allowing
>         the server to omit objects from its response to "fetch"
>         requests that are lazily fetchable from these promisor
>         remotes, see linkgit:gitprotocol-v2[5].
>
> or something?

Thanks for the suggestion. I will improve the wording based on it.

> > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> > index 414bc625d5..4d8d3839c4 100644
> > --- a/Documentation/gitprotocol-v2.txt
> > +++ b/Documentation/gitprotocol-v2.txt
> > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
> >  save themselves and the server(s) the request(s) needed to inspect the
> >  headers of that bundle or bundles.
> >
> > +promisor-remote=<pr-infos>
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The server may advertise some promisor remotes it is using, if it's OK
> > +for the server that a client uses them too. In this case <pr-infos>
> > +should be of the form:
>
> As this is the protocol documentation, we should describe what goes
> over the wire and what they mean, regardless of how limited the
> initial implementation on either end is.  Advertising the promisor
> remotes the server side relies on is probably not what we want to
> see this capability limited to forever (remember the previous "X is
> much better connected than S" example).
>
>     "it is using, if it's OK ..." -> "the other side may want to use
>     as its promisot remotes, instead of this repository"

Right, thanks.

> > +     pr-infos = pr-info | pr-infos ";" pr-info
> > +
> > +     pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> > +
> > +where `pr-name` is the name of a promisor remote, and `pr-url` the
> > +urlencoded URL of that promisor remote.
>
> Clarify what the syntax for a valid name here.  Also stress the fact
> that ';' and ',' MUST be encoded if it appears in 'pr-url'.

Yeah, I will do that.

> > +In this case a client wanting to use one or more promisor remotes the
> > +server advertised should reply with "promisor-remote=<pr-names>" where
> > +<pr-names> should be of the form:
> > +
> > +     pr-names = pr-name | pr-names ";" pr-name
> > +
> > +where `pr-name` is the name of a promisor remote the server
> > +advertised.
>
> After seeing advertisement, client can use some it picked but it
> does not have to tell the server about it.  Why would it respond
> with the promisor remotes, and what effect does it have to give the
> list of promisor remotes it uses?
>
>     If the "git fetch" side decides to use one or more promisor
>     remotes advertised, it can reply with ...
>     ...
>     where ... the server advertised.  Doing so allows the server to
>     make its response smaller by omitting objects that are known to
>     be lazily fetchable from these other promisor remotes
>     repositories.
>
> perhaps?

Yeah, right.

> > +If the server prefers a client not to use any promisor remote the
> > +server uses, or if the server doesn't use any promisor remote, it
> > +should only advertise "promisor-remote" without any value or "=" sign
> > +after it.
>
> It should not advertise "promisor-remote" capability at all.

Let's discuss this above or in a thread started by Taylor's reviews.

> > +In this case, or if the client doesn't want to use any promisor remote
> > +the server advertised, the client should reply only "promisor-remote"
> > +without any value or "=" sign after it.
>
> Likewise.  It should not advertise "promisor-remote" capability at
> all.
>
> > +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> > +options can be used on the server and client side respectively to
> > +control what they advertise or accept respectively. See the
> > +documentation of these configuration options for more information.
>
> OK.

Thanks for the review and suggestions.
Junio C Hamano Aug. 20, 2024, 4:55 p.m. UTC | #10
Christian Couder <christian.couder@gmail.com> writes:

> On Wed, Jul 31, 2024 at 6:16 PM Taylor Blau <me@ttaylorr.com> wrote:
> ...
> I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though.

Heh, you had 20 days to compose this response, which would be
pleanty to see that pkt-line.[ch] is about <length> and <bytes>.
After all, it is used to transfer the pack data stream that can have
arbitrary bytes ;-)
Junio C Hamano Aug. 20, 2024, 5:01 p.m. UTC | #11
Christian Couder <christian.couder@gmail.com> writes:

> I agree that it's more useful the other way though. That is for a
> server to know that the client has the capability but might not want
> to use it.
>
> For example, when C clones without using X directly, it can be a
> burden for S to have to fetch large objects from X (as it would use
> precious disk space on S, and unnecessarily duplicate large objects).
> So S might want to say "please use a newer or different client that
> has the 'promisor-remote' capability" if it knows that the client
> doesn't have this capability. If S knows that C has the capability but
> didn't configure it or doesn't want to use it, it could instead say
> something like "please consider activating the 'promisor-remote'
> capability by doing this and that to avoid burdening this server and
> get a faster clone".
>
> Note that the client might not be 'git'. It might be a "compatible"
> implementation (libgit2, gix, JGit, etc), so using the version passed
> in the "agent" protocol capability is not a good way to detect if the
> client has the capability or not.

It is none of S's business to even know about C's "true" capability,
if C does not want to use it with S.  I do not quite find the above
a credible justification.
Christian Couder Sept. 10, 2024, 4:31 p.m. UTC | #12
On Mon, Aug 5, 2024 at 8:01 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:

> > +The server may advertise some promisor remotes it is using, if it's OK
> > +for the server that a client uses them too. In this case <pr-infos>
> > +should be of the form:
> > +
> > +     pr-infos = pr-info | pr-infos ";" pr-info
> > +
> > +     pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> > +
> > +where `pr-name` is the name of a promisor remote, and `pr-url` the
> > +urlencoded URL of that promisor remote.
> > +
> > +In this case a client wanting to use one or more promisor remotes the
> > +server advertised should reply with "promisor-remote=<pr-names>" where
> > +<pr-names> should be of the form:
> > +
> > +     pr-names = pr-name | pr-names ";" pr-name
> > +
> > +where `pr-name` is the name of a promisor remote the server
> > +advertised.
> > +
> > +If the server prefers a client not to use any promisor remote the
> > +server uses, or if the server doesn't use any promisor remote, it
> > +should only advertise "promisor-remote" without any value or "=" sign
> > +after it.
> > +
> > +In this case, or if the client doesn't want to use any promisor remote
> > +the server advertised, the client should reply only "promisor-remote"
> > +without any value or "=" sign after it.
>
> Why does the client have to advertise anything if they don't want to use
> any of the promisor remotes?

I have tried to explain it in a reply to Taylor, but as you, Junio and
others seem to prefer the capability not to be advertised at all when
not used, I have changed this in version 2.

> > +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> > +options can be used on the server and client side respectively to
> > +control what they advertise or accept respectively. See the
> > +documentation of these configuration options for more information.
>
> One thing I'm not totally clear on is the consequence of this
> capability. What is the expected consequence if the client accepts one
> of the promisor remotes? What is the consequence if the client accepts
> none?

I have tried to improve the documentation significatively, especially
according to Junio's suggestion, in version 2.

> In the former case I'd expect that the server is free to omit objects,
> but that isn't made explicit anywhere, I think.

Junio also suggested making it explicit so I have done that in version 2.

> Also, is there any
> mechanism that tells the client exactly which objects have been omitted?

I don't think it's necessary. Agreeing on which promisor remote (name
and URL) to use should be enough security wise. When using bundle-uri,
for example, the server is not telling the client exactly which
objects are in the bundle.

> In the latter case I assume that the result will be a full clone, that
> is the server fetched any objects it didn't have from the promisor?

Yeah, the server should fetch any objects it doesn't have from the
promisor, so it can send everything to the client.

> Or does the server side continue to only honor whatever the client has
> provided as object filters, but signals to the client that it shall
> please contact somebody else when backfilling those promised objects?

No. Options to enable things like this could be built on top later if
needed though.

Thanks for the review.
Christian Couder Sept. 10, 2024, 4:32 p.m. UTC | #13
On Tue, Aug 20, 2024 at 1:32 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Jul 31, 2024 at 6:16 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> > > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> > > index 414bc625d5..4d8d3839c4 100644
> > > --- a/Documentation/gitprotocol-v2.txt
> > > +++ b/Documentation/gitprotocol-v2.txt
> > > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
> > >  save themselves and the server(s) the request(s) needed to inspect the
> > >  headers of that bundle or bundles.
> > >
> > > +promisor-remote=<pr-infos>
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +The server may advertise some promisor remotes it is using, if it's OK
> > > +for the server that a client uses them too. In this case <pr-infos>
> > > +should be of the form:
> > > +
> > > +     pr-infos = pr-info | pr-infos ";" pr-info

[...]

> > I wonder if it would instead be useful to have <pr-infos> first write
> > out how many <pr-info>s it contains, and then write out each <pr-info>
> > separated by a NUL byte, so that none of the files in the <pr-info>
> > itself are restricted in what characters they can use.
>
> I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though.

As Junio said pkt-line.[ch] is about <length> and <bytes> and it is
used to transfer the pack data stream that can have arbitrary bytes,
so there is no problem with NUL bytes. Sorry for not checking.

However I still think that capabilities have been using a simple text
format for now which works well, and that it's better to respect that
format and not introduce complexity in it if it's not necessary.

For example t5555-http-smart-common.sh has:

cat >expect <<-EOF &&
    version 2
    agent=FAKE
    ls-refs=unborn
    fetch=shallow wait-for-done
    server-option
    object-format=$(test_oid algo)
    0000
    EOF

to check the capabilities sent by `git upload-pack --advertise-refs`.

t5701 also uses similar instructions to check protocol v2 server commands.

So I think it's nice for tests and debugging if we keep using a simple
text format.

Also writing the number of <pr-info>s and then each <pr-info>
separated by a NUL byte might not save a lot of bytes compared to
urlencoded content if necessary, as I don't think many special
characters will need to be urlencoded most of the time.

So in the version 2 of this patch series, I haven't changed this.

Thanks for the review.
Christian Couder Sept. 10, 2024, 4:32 p.m. UTC | #14
On Tue, Aug 20, 2024 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > I agree that it's more useful the other way though. That is for a
> > server to know that the client has the capability but might not want
> > to use it.
> >
> > For example, when C clones without using X directly, it can be a
> > burden for S to have to fetch large objects from X (as it would use
> > precious disk space on S, and unnecessarily duplicate large objects).
> > So S might want to say "please use a newer or different client that
> > has the 'promisor-remote' capability" if it knows that the client
> > doesn't have this capability. If S knows that C has the capability but
> > didn't configure it or doesn't want to use it, it could instead say
> > something like "please consider activating the 'promisor-remote'
> > capability by doing this and that to avoid burdening this server and
> > get a faster clone".
> >
> > Note that the client might not be 'git'. It might be a "compatible"
> > implementation (libgit2, gix, JGit, etc), so using the version passed
> > in the "agent" protocol capability is not a good way to detect if the
> > client has the capability or not.
>
> It is none of S's business to even know about C's "true" capability,
> if C does not want to use it with S.  I do not quite find the above
> a credible justification.

Ok, as you and others have said that the "promisor-remote" capability
should not be advertised by the server or the client if they aren't
actually using it, then I have changed the implementation in the
version 2 of the patch series according to that.

I still think that this change might make it harder than necessary
(for example for support teams at GitHub and GitLab) to help users and
debug issues related to this.

The only downside I saw with always advertising the "promisor-remote"
capability even when not using it, was that it added a bit of bloat in
the protocol, but there are a number of things that could be done to
avoid that. For example changing the name of the capability to just
"promisor" or even "pr" instead of "promisor-remote" could reduce the
size of the overhead.

Thanks for the review.
Junio C Hamano Sept. 10, 2024, 5:46 p.m. UTC | #15
Christian Couder <christian.couder@gmail.com> writes:

>> > > +     pr-infos = pr-info | pr-infos ";" pr-info
>
> [...]
>
>> > I wonder if it would instead be useful to have <pr-infos> first write
>> > out how many <pr-info>s it contains, and then write out each <pr-info>
>> > separated by a NUL byte, so that none of the files in the <pr-info>
>> > itself are restricted in what characters they can use.
>>
>> I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though.
> ...
> However I still think that capabilities have been using a simple text
> format for now which works well, and that it's better to respect that
> format and not introduce complexity in it if it's not necessary.

Yup, especially when we are in control of what goes into "pr-info",
I do not see much reason to go binary.  It helps debuggability
greatly to stay in text format when you can.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
index 98c5cb2ec2..e3939d83a9 100644
--- a/Documentation/config/promisor.txt
+++ b/Documentation/config/promisor.txt
@@ -1,3 +1,16 @@ 
 promisor.quiet::
 	If set to "true" assume `--quiet` when fetching additional
 	objects for a partial clone.
+
+promisor.advertise::
+	If set to "true", a server will use the "promisor-remote"
+	capability, see linkgit:gitprotocol-v2[5], to advertise the
+	promisor remotes it is using if any. Default is "false", which
+	means no promisor remote is advertised.
+
+promisor.acceptFromServer::
+	If set to "all", a client will accept all the promisor remotes
+	a server might advertise using the "promisor-remote"
+	capability, see linkgit:gitprotocol-v2[5]. Default is "none",
+	which means no promisor remote advertised by a server will be
+	accepted.
diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
index 414bc625d5..4d8d3839c4 100644
--- a/Documentation/gitprotocol-v2.txt
+++ b/Documentation/gitprotocol-v2.txt
@@ -781,6 +781,43 @@  retrieving the header from a bundle at the indicated URI, and thus
 save themselves and the server(s) the request(s) needed to inspect the
 headers of that bundle or bundles.
 
+promisor-remote=<pr-infos>
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The server may advertise some promisor remotes it is using, if it's OK
+for the server that a client uses them too. In this case <pr-infos>
+should be of the form:
+
+	pr-infos = pr-info | pr-infos ";" pr-info
+
+	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
+
+where `pr-name` is the name of a promisor remote, and `pr-url` the
+urlencoded URL of that promisor remote.
+
+In this case a client wanting to use one or more promisor remotes the
+server advertised should reply with "promisor-remote=<pr-names>" where
+<pr-names> should be of the form:
+
+	pr-names = pr-name | pr-names ";" pr-name
+
+where `pr-name` is the name of a promisor remote the server
+advertised.
+
+If the server prefers a client not to use any promisor remote the
+server uses, or if the server doesn't use any promisor remote, it
+should only advertise "promisor-remote" without any value or "=" sign
+after it.
+
+In this case, or if the client doesn't want to use any promisor remote
+the server advertised, the client should reply only "promisor-remote"
+without any value or "=" sign after it.
+
+The "promisor.advertise" and "promisor.acceptFromServer" configuration
+options can be used on the server and client side respectively to
+control what they advertise or accept respectively. See the
+documentation of these configuration options for more information.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/connect.c b/connect.c
index cf84e631e9..284ea3cf12 100644
--- a/connect.c
+++ b/connect.c
@@ -20,6 +20,7 @@ 
 #include "protocol.h"
 #include "alias.h"
 #include "bundle-uri.h"
+#include "promisor-remote.h"
 
 static char *server_capabilities_v1;
 static struct strvec server_capabilities_v2 = STRVEC_INIT;
@@ -485,6 +486,7 @@  void check_stateless_delimiter(int stateless_rpc,
 static void send_capabilities(int fd_out, struct packet_reader *reader)
 {
 	const char *hash_name;
+	const char *promisor_remote_info;
 
 	if (server_supports_v2("agent"))
 		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
@@ -498,6 +500,11 @@  static void send_capabilities(int fd_out, struct packet_reader *reader)
 	} else {
 		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 	}
+	if (server_feature_v2("promisor-remote", &promisor_remote_info)) {
+		char *reply = promisor_remote_reply(promisor_remote_info);
+		packet_write_fmt(fd_out, "promisor-remote%s", reply ? reply : "");
+		free(reply);
+	}
 }
 
 int get_remote_bundle_uri(int fd_out, struct packet_reader *reader,
diff --git a/promisor-remote.c b/promisor-remote.c
index 317e1b127f..d347f4d9b5 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -11,6 +11,7 @@ 
 #include "strvec.h"
 #include "packfile.h"
 #include "environment.h"
+#include "url.h"
 
 struct promisor_remote_config {
 	struct promisor_remote *promisors;
@@ -219,6 +220,18 @@  int repo_has_promisor_remote(struct repository *r)
 	return !!repo_promisor_remote_find(r, NULL);
 }
 
+int repo_has_accepted_promisor_remote(struct repository *r)
+{
+	struct promisor_remote *p;
+
+	promisor_remote_init(r);
+
+	for (p = r->promisor_remote_config->promisors; p; p = p->next)
+		if (p->accepted)
+			return 1;
+	return 0;
+}
+
 static int remove_fetched_oids(struct repository *repo,
 			       struct object_id **oids,
 			       int oid_nr, int to_free)
@@ -290,3 +303,172 @@  void promisor_remote_get_direct(struct repository *repo,
 	if (to_free)
 		free(remaining_oids);
 }
+
+static int allow_unsanitized(char ch)
+{
+	if (ch == ',' || ch == ';' || ch == '%')
+		return 0;
+	return ch > 32 && ch < 127;
+}
+
+static void promisor_info_vecs(struct repository *repo,
+			       struct strvec *names,
+			       struct strvec *urls)
+{
+	struct promisor_remote *r;
+
+	promisor_remote_init(repo);
+
+	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
+		char *url;
+		char *url_key = xstrfmt("remote.%s.url", r->name);
+
+		strvec_push(names, r->name);
+		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+
+		free(url);
+		free(url_key);
+	}
+}
+
+void promisor_remote_info(struct repository *repo, struct strbuf *buf)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int advertise_promisors = 0;
+	struct strvec names = STRVEC_INIT;
+	struct strvec urls = STRVEC_INIT;
+
+	git_config_get_bool("promisor.advertise", &advertise_promisors);
+
+	if (!advertise_promisors)
+		return;
+
+	promisor_info_vecs(repo, &names, &urls);
+
+	for (size_t i = 0; i < names.nr; i++) {
+		if (sb.len)
+			strbuf_addch(&sb, ';');
+		strbuf_addf(&sb, "name=%s", names.v[i]);
+		if (urls.v[i]) {
+			strbuf_addstr(&sb, ",url=");
+			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
+		}
+	}
+
+	strbuf_sanitize(&sb);
+	strbuf_addbuf(buf, &sb);
+
+	strvec_clear(&names);
+	strvec_clear(&urls);
+}
+
+enum accept_promisor {
+	ACCEPT_NONE = 0,
+	ACCEPT_ALL
+};
+
+static int should_accept_remote(enum accept_promisor accept,
+				const char *remote_name UNUSED,
+				const char *remote_url UNUSED)
+{
+	if (accept == ACCEPT_ALL)
+		return 1;
+
+	BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+}
+
+static void filter_promisor_remote(struct repository *repo,
+				   struct strvec *accepted,
+				   const char *info)
+{
+	struct strbuf **remotes;
+	char *accept_str;
+	enum accept_promisor accept = ACCEPT_NONE;
+
+	if (!git_config_get_string("promisor.acceptfromserver", &accept_str)) {
+		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))
+			accept = ACCEPT_NONE;
+		else if (!strcasecmp("All", accept_str))
+			accept = ACCEPT_ALL;
+		else
+			warning(_("unknown '%s' value for '%s' config option"),
+				accept_str, "promisor.acceptfromserver");
+	}
+
+	if (accept == ACCEPT_NONE)
+		return;
+
+	/* Parse remote info received */
+
+	remotes = strbuf_split_str(info, ';', 0);
+
+	for (size_t i = 0; remotes[i]; i++) {
+		struct strbuf **elems;
+		const char *remote_name = NULL;
+		const char *remote_url = NULL;
+		char *decoded_url = NULL;
+
+		strbuf_trim_trailing_ch(remotes[i], ';');
+		elems = strbuf_split_str(remotes[i]->buf, ',', 0);
+
+		for (size_t j = 0; elems[j]; j++) {
+			int res;
+			strbuf_trim_trailing_ch(elems[j], ',');
+			res = skip_prefix(elems[j]->buf, "name=", &remote_name) ||
+				skip_prefix(elems[j]->buf, "url=", &remote_url);
+			if (!res)
+				warning(_("unknown element '%s' from remote info"),
+					elems[j]->buf);
+		}
+
+		decoded_url = url_decode(remote_url);
+
+		if (should_accept_remote(accept, remote_name, decoded_url))
+			strvec_push(accepted, remote_name);
+
+		strbuf_list_free(elems);
+		free(decoded_url);
+	}
+
+	free(accept_str);
+	strbuf_list_free(remotes);
+}
+
+char *promisor_remote_reply(const char *info)
+{
+	struct strvec accepted = STRVEC_INIT;
+	struct strbuf reply = STRBUF_INIT;
+
+	filter_promisor_remote(the_repository, &accepted, info);
+
+	strbuf_addch(&reply, '=');
+
+	for (size_t i = 0; i < accepted.nr; i++) {
+		if (i != 0)
+			strbuf_addch(&reply, ';');
+		strbuf_addstr(&reply, accepted.v[i]);
+	}
+
+	strvec_clear(&accepted);
+
+	return strbuf_detach(&reply, NULL);
+}
+
+void mark_promisor_remotes_as_accepted(struct repository *r, const char *remotes)
+{
+	struct strbuf **accepted_remotes = strbuf_split_str(remotes, ';', 0);
+
+	for (size_t i = 0; accepted_remotes[i]; i++) {
+		struct promisor_remote *p;
+
+		strbuf_trim_trailing_ch(accepted_remotes[i], ';');
+		p = repo_promisor_remote_find(r, accepted_remotes[i]->buf);
+		if (p)
+			p->accepted = 1;
+		else
+			warning(_("accepted promisor remote '%s' not found"),
+				accepted_remotes[i]->buf);
+	}
+
+	strbuf_list_free(accepted_remotes);
+}
diff --git a/promisor-remote.h b/promisor-remote.h
index 88cb599c39..82f060b5af 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -9,11 +9,13 @@  struct object_id;
  * Promisor remote linked list
  *
  * Information in its fields come from remote.XXX config entries or
- * from extensions.partialclone.
+ * from extensions.partialclone, except for 'accepted' which comes
+ * from protocol v2 capabilities exchange.
  */
 struct promisor_remote {
 	struct promisor_remote *next;
 	char *partial_clone_filter;
+	unsigned int accepted : 1;
 	const char name[FLEX_ARRAY];
 };
 
@@ -32,4 +34,26 @@  void promisor_remote_get_direct(struct repository *repo,
 				const struct object_id *oids,
 				int oid_nr);
 
+/*
+ * Append promisor remote info to buf. Useful for a server to
+ * advertise the promisor remotes it uses.
+ */
+void promisor_remote_info(struct repository *repo, struct strbuf *buf);
+
+/*
+ * Prepare a reply to a "promisor-remote" advertisement from a server.
+ */
+char *promisor_remote_reply(const char *info);
+
+/*
+ * Set the 'accepted' flag for some promisor remotes. Useful when some
+ * promisor remotes have been accepted by the client.
+ */
+void mark_promisor_remotes_as_accepted(struct repository *repo, const char *remotes);
+
+/*
+ * Has any promisor remote been accepted by the client?
+ */
+int repo_has_accepted_promisor_remote(struct repository *r);
+
 #endif /* PROMISOR_REMOTE_H */
diff --git a/serve.c b/serve.c
index 884cd84ca8..7c5c7c9856 100644
--- a/serve.c
+++ b/serve.c
@@ -12,6 +12,7 @@ 
 #include "upload-pack.h"
 #include "bundle-uri.h"
 #include "trace2.h"
+#include "promisor-remote.h"
 
 static int advertise_sid = -1;
 static int advertise_object_info = -1;
@@ -31,6 +32,21 @@  static int agent_advertise(struct repository *r UNUSED,
 	return 1;
 }
 
+static int promisor_remote_advertise(struct repository *r,
+				     struct strbuf *value)
+{
+       if (value)
+	       promisor_remote_info(r, value);
+       return 1;
+}
+
+static void promisor_remote_receive(struct repository *r,
+				    const char *remotes)
+{
+	mark_promisor_remotes_as_accepted(r, remotes);
+}
+
+
 static int object_format_advertise(struct repository *r,
 				   struct strbuf *value)
 {
@@ -157,6 +173,11 @@  static struct protocol_capability capabilities[] = {
 		.advertise = bundle_uri_advertise,
 		.command = bundle_uri_command,
 	},
+	{
+		.name = "promisor-remote",
+		.advertise = promisor_remote_advertise,
+		.receive = promisor_remote_receive,
+	},
 };
 
 void protocol_v2_advertise_capabilities(void)
diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh
index 3dcb3340a3..27300a8bf5 100755
--- a/t/t5555-http-smart-common.sh
+++ b/t/t5555-http-smart-common.sh
@@ -131,6 +131,7 @@  test_expect_success 'git upload-pack --advertise-refs: v2' '
 	fetch=shallow wait-for-done
 	server-option
 	object-format=$(test_oid algo)
+	promisor-remote
 	0000
 	EOF
 
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index c48830de8f..c858c43db2 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -22,6 +22,7 @@  test_expect_success 'test capability advertisement' '
 	object-format=$(test_oid algo)
 	EOF
 	cat >expect.trailer <<-EOF &&
+	promisor-remote
 	0000
 	EOF
 	cat expect.base expect.trailer >expect &&
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
new file mode 100755
index 0000000000..7e44ad15ce
--- /dev/null
+++ b/t/t5710-promisor-remote-capability.sh
@@ -0,0 +1,124 @@ 
+#!/bin/sh
+
+test_description='handling of promisor remote advertisement'
+
+. ./test-lib.sh
+
+# Setup the repository with three commits, this way HEAD is always
+# available and we can hide commit 1 or 2.
+test_expect_success 'setup: create "template" repository' '
+	git init template &&
+	test_commit -C template 1 &&
+	test_commit -C template 2 &&
+	test_commit -C template 3 &&
+	test-tool genrandom foo 10240 >template/foo &&
+	git -C template add foo &&
+	git -C template commit -m foo
+'
+
+# A bare repo will act as a server repo with unpacked objects.
+test_expect_success 'setup: create bare "server" repository' '
+	git clone --bare --no-local template server &&
+	mv server/objects/pack/pack-* . &&
+	packfile=$(ls pack-*.pack) &&
+	git -C server unpack-objects --strict <"$packfile"
+'
+
+check_missing_objects () {
+	git -C "$1" rev-list --objects --all --missing=print > all.txt &&
+	perl -ne 'print if s/^[?]//' all.txt >missing.txt &&
+	test_line_count = "$2" missing.txt &&
+	test "$3" = "$(cat missing.txt)"
+}
+
+initialize_server () {
+	# Repack everything first
+	git -C server -c repack.writebitmaps=false repack -a -d &&
+
+	# Remove promisor file in case they exist, useful when reinitializing
+	rm -rf server/objects/pack/*.promisor &&
+
+	# Repack without the largest object and create a promisor pack on server
+	git -C server -c repack.writebitmaps=false repack -a -d \
+	    --filter=blob:limit=5k --filter-to="$(pwd)" &&
+	promisor_file=$(ls server/objects/pack/*.pack | sed "s/\.pack/.promisor/") &&
+	touch "$promisor_file" &&
+
+	# Check that only one object is missing on the server
+	check_missing_objects server 1 "$oid"
+}
+
+test_expect_success "setup for testing promisor remote advertisement" '
+	# Create another bare repo called "server2"
+	git init --bare server2 &&
+
+	# Copy the largest object from server to server2
+	obj="HEAD:foo" &&
+	oid="$(git -C server rev-parse $obj)" &&
+	oid_path="$(test_oid_to_path $oid)" &&
+	path="server/objects/$oid_path" &&
+	path2="server2/objects/$oid_path" &&
+	mkdir -p $(dirname "$path2") &&
+	cp "$path" "$path2" &&
+
+	initialize_server &&
+
+	# Configure server2 as promisor remote for server
+	git -C server remote add server2 "file://$(pwd)/server2" &&
+	git -C server config remote.server2.promisor true &&
+
+	git -C server2 config uploadpack.allowFilter true &&
+	git -C server2 config uploadpack.allowAnySHA1InWant true &&
+	git -C server config uploadpack.allowFilter true &&
+	git -C server config uploadpack.allowAnySHA1InWant true
+'
+
+test_expect_success "fetch with promisor.advertise set to 'true'" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=All \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "fetch with promisor.advertise set to 'false'" '
+	git -C server config promisor.advertise false &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=All \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server
+'
+
+test_expect_success "fetch with promisor.acceptfromserver set to 'None'" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=None \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 ""
+'
+
+test_done
diff --git a/upload-pack.c b/upload-pack.c
index 0052c6a4dc..0cff76c845 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -31,6 +31,7 @@ 
 #include "write-or-die.h"
 #include "json-writer.h"
 #include "strmap.h"
+#include "promisor-remote.h"
 
 /* Remember to update object flag allocation in object.h */
 #define THEY_HAVE	(1u << 11)
@@ -317,6 +318,8 @@  static void create_pack_file(struct upload_pack_data *pack_data,
 		strvec_push(&pack_objects.args, "--delta-base-offset");
 	if (pack_data->use_include_tag)
 		strvec_push(&pack_objects.args, "--include-tag");
+	if (repo_has_accepted_promisor_remote(the_repository))
+		strvec_push(&pack_objects.args, "--missing=allow-promisor");
 	if (pack_data->filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&pack_data->filter_options);