Message ID | 20240731134014.2299361-4-christian.couder@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce a "promisor-remote" capability | expand |
On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > By default, or if "promisor.advertise" is set to 'false', a server S will > advertise only the "promisor-remote" capability without passing any > argument through this capability. This means that S supports the new > capability but doesn't wish any client C to directly access any promisor > remote X S might use. Even if the server supports this new capability, is there a reason to advertise it to the client if the server knows ahead of time that it has no promisor remotes to advertise? I am not sure what action the client would take if it knows the server supports this capability, but does not actually have any promisor remotes to advertise. I would suggest that setting promisor.advertise to false indeed prevents advertising it as a capability in the first place. Selfishly, it prevents some issues that I have when rolling out new Git versions within GitHub's infrastructure, since our push proxy layer picks a single replica to replay the capabilities from, but obviously replays the client's response to all replicas. So if only some replicas understand the new 'promisor-remote' capability, we can run into issues. I'm not sure if the client even bothers to send back promisor-remote if the server did not send any such remotes to begin with, but between that and what I wrote in the second paragraph here, I don't see a reason to advertise the capability when promisor.advertise is false. Thanks, Taylor
On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > index 414bc625d5..4d8d3839c4 100644 > --- a/Documentation/gitprotocol-v2.txt > +++ b/Documentation/gitprotocol-v2.txt > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus > save themselves and the server(s) the request(s) needed to inspect the > headers of that bundle or bundles. > > +promisor-remote=<pr-infos> > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The server may advertise some promisor remotes it is using, if it's OK > +for the server that a client uses them too. In this case <pr-infos> > +should be of the form: > + > + pr-infos = pr-info | pr-infos ";" pr-info So <pr-infos> uses the ';' character to delimit multiple <pr-info>s, which means that <pr-info> can't use ';' itself. You mention above that <pr-info> is supposed to be generic so that we can add other fields to it in the future. Do you imagine that any of those fields might want to use the ';' in their values? One that comes to mind is the shared token example you wrote about above. It would be nice to not restrict what characters the token can contain. I wonder if it would instead be useful to have <pr-infos> first write out how many <pr-info>s it contains, and then write out each <pr-info> separated by a NUL byte, so that none of the files in the <pr-info> itself are restricted in what characters they can use. > +static void promisor_info_vecs(struct repository *repo, > + struct strvec *names, > + struct strvec *urls) > +{ > + struct promisor_remote *r; > + > + promisor_remote_init(repo); > + > + for (r = repo->promisor_remote_config->promisors; r; r = r->next) { > + char *url; > + char *url_key = xstrfmt("remote.%s.url", r->name); > + > + strvec_push(names, r->name); > + strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); Do you mean to push NULL onto urls here? It seems risky since you have to check that each entry in the strvec is non-NULL before printing it out (which you do below in promisor_remote_info()). Or maybe you need to in order to advertise promisor remotes without URLs? If so, I'm not sure what the benefit would be to the client if it doesn't know where to go to retrieve any objects without having a URL. Thanks, Taylor
Christian Couder <christian.couder@gmail.com> writes: > When a server repository S borrows some objects from a promisor remote X, > then a client repository C which would like to clone or fetch from S might, > or might not, want to also borrow objects from X. Also S might, or might > not, want to advertise X as a good way for C to directly get objects from, > instead of C getting everything through S. If S is a clone that is keeping up to date with X, even if it does not borrow anything from X, as long as X is known to be much better connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections) than S is (e.g., it is my deskside box on a cable modem), it may be beneficial if S can omit objects from its "git fetch" response to C, if C is willing to fill the gap using X. So it is of dubious value to limit the feature only to cases where S "borrows" from X, is it? > To allow S and C to agree on C using X or not, let's introduce a new > "promisor-remote" capability in the protocol v2, as well as a few new > configuration variables: > > - "promisor.advertise" on the server side, and: > - "promisor.acceptFromServer" on the client side. > > By default, or if "promisor.advertise" is set to 'false', a server S will > advertise only the "promisor-remote" capability without passing any > argument through this capability. This means that S supports the new > capability but doesn't wish any client C to directly access any promisor > remote X S might use. I would find it more natural if .advertise is turned off by setting it explicitly to "false", we would pretend as if we have never even heard of such a capability. > If "promisor.advertise" is set to 'true', S will advertise its promisor > remotes with a string like: > > promisor-remote=<pm-info>[;<pm-info>]... > > where each <pm-info> element contains information about a single > promisor remote in the form: > > name=<pm-name>[,url=<pm-url>] > where <pm-name> is the name of a promisor remote and <pm-url> is the > urlencoded url of the promisor remote named <pm-name>. OK, so pm-name cannot have ";," in it (which is sensible, or define pm-name more tightly, like "only lowercase alnum"). URL cannot have ';' or ',' in it that is an OK limitation as URL encoding can hide them. > For now, the URL is passed in addition to the name. In the future, it > might be possible to pass other information like a filter-spec that the > client should use when cloning from S, or a token that the client should > use when retrieving objects from X. OK. And obviously they cannot have ';," in them without encoding similarly. > It might also be possible in the future for "promisor.advertise" to have > other values like "onlyName", so that no URL is advertised. Saying "<pm-info> is expected to be extended" should be sufficient, without inviting discussions like "what good does it do to give only names" that is irrelevant at least at this moment. > By default or if "promisor.acceptFromServer" is set to "None", the > client will not accept to use the promisor remotes that might have been > advertised by the server. In this case, the client will advertise only > "promisor-remote" in its reply to the server. This means that the client > has the "promisor-remote" capability but decided not to use any of the > promisor remotes that the server might have advertised. OK, that is a signal to the server side that it is not allowed to omit any objects from its response to "git fetch" request, even though they might be available via a better connected remotes. > If "promisor.acceptFromServer" is set to "All", on the contrary, the > client will accept to use all the promisor remotes that the server > advertised and it will reply with a string like: > > promisor-remote=<pm-name>[;<pm-name>]... > > where the <pm-name> elements are the names of all the promisor remotes > the server advertised. So, this is why we need "name" for each "pm-info"---to give a short name associated with the URL of the remote repository. Presumably, C has an option to see if each of the remote suggested is reachable and omit remotes that are not available to C from its response, so even when .accept is set to "all", the response may not list all the names of remotes that S advertised, in general. > If the server advertised no promisor remote > though, the client will reply with just "promisor-remote". In other words, at the protocol level: - S uses promisor-remote capability to tell C what are potentially useful alternate remotes to obtain objects that C may want to fetch from S - C uses promisor-remote capability to tell S that among the remotes advertised by S, it is willing to use the named remotes as its promisor, which permits S from omitting objects from its response to "git fetch" request from C as long as they are known to be available from these remotes. I think that makes sense, but I do not see the point of sending an empty promisor-remote capability at all. What practical difference would it make to S and C, if S chooses not to advertise the capability at all, instead of advertising an empty remote list with the capability? Both tells C that it is useless to request promistor-remote capability to S in its response. What practical difference would it make to S and C, if C chooses not to advertise the capability at all, instead of advertising an empty remote list with the capability? Both tells S that S is not allowed to omit objects that are obtainable from elsewhere. > In a following commit, other values for "promisor.acceptFromServer" will > be implemented so that the client will be able to decide the promisor > remotes it accepts depending on the name and URL it received from the > server. So even if that name and URL information is not used much right > now, it will be needed soon. OK. > diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt > index 98c5cb2ec2..e3939d83a9 100644 > --- a/Documentation/config/promisor.txt > +++ b/Documentation/config/promisor.txt > @@ -1,3 +1,16 @@ > promisor.quiet:: > If set to "true" assume `--quiet` when fetching additional > objects for a partial clone. > + > +promisor.advertise:: > + If set to "true", a server will use the "promisor-remote" > + capability, see linkgit:gitprotocol-v2[5], to advertise the > + promisor remotes it is using if any. Default is "false", which > + means no promisor remote is advertised. Even though I said that there logically is not much reason to tie this advertisement to the use of promistor remote by the serving side, I am OK if the initial implementation is limited to that arrangement. It would be an easy change to allow this variable to take a list of remote repositories that may (or may not) be a promisor remote of this repository (in other words, "they are clones that are better connected than me") in the future, but that does not have to happen in the initial iteration. It would be less confusing to first-time readers if you described the intent a bit better. Why would the server want to advertise and how would the client take advantage of the information? I see that the update in this patch to protocol document is skimpy on this point, and end-user facing documentation has better exposure anyway, so let's see what we can do here. The "promisor-remote" protocol capability can be used by the responder to "git fetch" to advertise better-connected remotes that the requester can use as promisor remotes, instead of this repository, so that "git fetch" requestor can lazily fetch objects from these other better-connected remotes. If this configuration variable is set to "true",... or something, perhaps? "no promisor remote is advertised" -> "no promisor-remote capability is advertised". > +promisor.acceptFromServer:: > + If set to "all", a client will accept all the promisor remotes > + a server might advertise using the "promisor-remote" > + capability, see linkgit:gitprotocol-v2[5]. Default is "none", > + which means no promisor remote advertised by a server will be > + accepted. Similarly, readers would want to know what the implication is to "accept" promisor remotes. accept ..., and adds them to its promisor remotes, allowing the server to omit objects from its response to "fetch" requests that are lazily fetchable from these promisor remotes, see linkgit:gitprotocol-v2[5]. or something? > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > index 414bc625d5..4d8d3839c4 100644 > --- a/Documentation/gitprotocol-v2.txt > +++ b/Documentation/gitprotocol-v2.txt > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus > save themselves and the server(s) the request(s) needed to inspect the > headers of that bundle or bundles. > > +promisor-remote=<pr-infos> > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The server may advertise some promisor remotes it is using, if it's OK > +for the server that a client uses them too. In this case <pr-infos> > +should be of the form: As this is the protocol documentation, we should describe what goes over the wire and what they mean, regardless of how limited the initial implementation on either end is. Advertising the promisor remotes the server side relies on is probably not what we want to see this capability limited to forever (remember the previous "X is much better connected than S" example). "it is using, if it's OK ..." -> "the other side may want to use as its promisot remotes, instead of this repository" > + pr-infos = pr-info | pr-infos ";" pr-info > + > + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url > + > +where `pr-name` is the name of a promisor remote, and `pr-url` the > +urlencoded URL of that promisor remote. Clarify what the syntax for a valid name here. Also stress the fact that ';' and ',' MUST be encoded if it appears in 'pr-url'. > +In this case a client wanting to use one or more promisor remotes the > +server advertised should reply with "promisor-remote=<pr-names>" where > +<pr-names> should be of the form: > + > + pr-names = pr-name | pr-names ";" pr-name > + > +where `pr-name` is the name of a promisor remote the server > +advertised. After seeing advertisement, client can use some it picked but it does not have to tell the server about it. Why would it respond with the promisor remotes, and what effect does it have to give the list of promisor remotes it uses? If the "git fetch" side decides to use one or more promisor remotes advertised, it can reply with ... ... where ... the server advertised. Doing so allows the server to make its response smaller by omitting objects that are known to be lazily fetchable from these other promisor remotes repositories. perhaps? > +If the server prefers a client not to use any promisor remote the > +server uses, or if the server doesn't use any promisor remote, it > +should only advertise "promisor-remote" without any value or "=" sign > +after it. It should not advertise "promisor-remote" capability at all. > +In this case, or if the client doesn't want to use any promisor remote > +the server advertised, the client should reply only "promisor-remote" > +without any value or "=" sign after it. Likewise. It should not advertise "promisor-remote" capability at all. > +The "promisor.advertise" and "promisor.acceptFromServer" configuration > +options can be used on the server and client side respectively to > +control what they advertise or accept respectively. See the > +documentation of these configuration options for more information. OK.
Junio C Hamano <gitster@pobox.com> writes: > Christian Couder <christian.couder@gmail.com> writes: > >> When a server repository S borrows some objects from a promisor remote X, >> then a client repository C which would like to clone or fetch from S might, >> or might not, want to also borrow objects from X. Also S might, or might >> not, want to advertise X as a good way for C to directly get objects from, >> instead of C getting everything through S. > > If S is a clone that is keeping up to date with X, even if it does > not borrow anything from X, as long as X is known to be much better > connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY > datacenter with petabit/s backbone connections) than S is (e.g., it > is my deskside box on a cable modem), it may be beneficial if S can > omit objects from its "git fetch" response to C, if C is willing to > fill the gap using X. > > So it is of dubious value to limit the feature only to cases where S > "borrows" from X, is it? An even better example is if S on my deskside box is the source of truth, and X in $LARGEINTERNETCOMPANY datacenter that is much better connected is a publishing repository I use to push from S to X. Even if you originally cloned from S and use S as your promisor remote, as the operator of S, I would like you to always consult X first to reduce the load on S when lazily fetching objects that you are missing.
On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > index 414bc625d5..4d8d3839c4 100644 > --- a/Documentation/gitprotocol-v2.txt > +++ b/Documentation/gitprotocol-v2.txt > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus > save themselves and the server(s) the request(s) needed to inspect the > headers of that bundle or bundles. > > +promisor-remote=<pr-infos> > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The server may advertise some promisor remotes it is using, if it's OK > +for the server that a client uses them too. In this case <pr-infos> > +should be of the form: > + > + pr-infos = pr-info | pr-infos ";" pr-info > + > + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url > + > +where `pr-name` is the name of a promisor remote, and `pr-url` the > +urlencoded URL of that promisor remote. > + > +In this case a client wanting to use one or more promisor remotes the > +server advertised should reply with "promisor-remote=<pr-names>" where > +<pr-names> should be of the form: > + > + pr-names = pr-name | pr-names ";" pr-name > + > +where `pr-name` is the name of a promisor remote the server > +advertised. > + > +If the server prefers a client not to use any promisor remote the > +server uses, or if the server doesn't use any promisor remote, it > +should only advertise "promisor-remote" without any value or "=" sign > +after it. > + > +In this case, or if the client doesn't want to use any promisor remote > +the server advertised, the client should reply only "promisor-remote" > +without any value or "=" sign after it. Why does the client have to advertise anything if they don't want to use any of the promisor remotes? > +The "promisor.advertise" and "promisor.acceptFromServer" configuration > +options can be used on the server and client side respectively to > +control what they advertise or accept respectively. See the > +documentation of these configuration options for more information. One thing I'm not totally clear on is the consequence of this capability. What is the expected consequence if the client accepts one of the promisor remotes? What is the consequence if the client accepts none? In the former case I'd expect that the server is free to omit objects, but that isn't made explicit anywhere, I think. Also, is there any mechanism that tells the client exactly which objects have been omitted? In the latter case I assume that the result will be a full clone, that is the server fetched any objects it didn't have from the promisor? Or does the server side continue to only honor whatever the client has provided as object filters, but signals to the client that it shall please contact somebody else when backfilling those promised objects? Patrick
Patrick Steinhardt <ps@pks.im> writes: >> +In this case, or if the client doesn't want to use any promisor remote >> +the server advertised, the client should reply only "promisor-remote" >> +without any value or "=" sign after it. > > Why does the client have to advertise anything if they don't want to use > any of the promisor remotes? Yeah, it is not very well justified why an empty capability needs to be sent (from both sides). My recommendation is to drop that part of the design, but if there is a reason to keep, it should be done by explaining how differently the other side should behave when the capability is not sent at all and when the capability with no promisor remote is sent. >> +The "promisor.advertise" and "promisor.acceptFromServer" configuration >> +options can be used on the server and client side respectively to >> +control what they advertise or accept respectively. See the >> +documentation of these configuration options for more information. > > One thing I'm not totally clear on is the consequence of this > capability. What is the expected consequence if the client accepts one > of the promisor remotes? What is the consequence if the client accepts > none? Yes, I also found the documentation lacking in that respect. The series talks about how the exchange can proceed, without saying much (if anything) about what both sides want to exchange promisor-remote for---what effect does it have on the behaviour of both sides to send one. I covered this point in one of my reviews a bit more. https://lore.kernel.org/git/xmqqikwl2zca.fsf@gitster.g/ > In the former case I'd expect that the server is free to omit objects, > but that isn't made explicit anywhere, I think. Also, is there any > mechanism that tells the client exactly which objects have been omitted? > In the latter case I assume that the result will be a full clone, that > is the server fetched any objects it didn't have from the promisor? > > Or does the server side continue to only honor whatever the client has > provided as object filters, but signals to the client that it shall > please contact somebody else when backfilling those promised objects? Thanks.
On Wed, Jul 31, 2024 at 6:16 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > > index 414bc625d5..4d8d3839c4 100644 > > --- a/Documentation/gitprotocol-v2.txt > > +++ b/Documentation/gitprotocol-v2.txt > > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus > > save themselves and the server(s) the request(s) needed to inspect the > > headers of that bundle or bundles. > > > > +promisor-remote=<pr-infos> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The server may advertise some promisor remotes it is using, if it's OK > > +for the server that a client uses them too. In this case <pr-infos> > > +should be of the form: > > + > > + pr-infos = pr-info | pr-infos ";" pr-info > > So <pr-infos> uses the ';' character to delimit multiple <pr-info>s, > which means that <pr-info> can't use ';' itself. You mention above that > <pr-info> is supposed to be generic so that we can add other fields to > it in the future. Do you imagine that any of those fields might want to > use the ';' in their values? Yeah, but, as for the 'url' field where the value is urlencoded, the value can be encoded if it could contain some special characters. > One that comes to mind is the shared token example you wrote about > above. It would be nice to not restrict what characters the token can > contain. I agree but I think urlencoding, or maybe other simple encodings like base64, should be easy and simple enough to work around this. > I wonder if it would instead be useful to have <pr-infos> first write > out how many <pr-info>s it contains, and then write out each <pr-info> > separated by a NUL byte, so that none of the files in the <pr-info> > itself are restricted in what characters they can use. I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though. > > +static void promisor_info_vecs(struct repository *repo, > > + struct strvec *names, > > + struct strvec *urls) > > +{ > > + struct promisor_remote *r; > > + > > + promisor_remote_init(repo); > > + > > + for (r = repo->promisor_remote_config->promisors; r; r = r->next) { > > + char *url; > > + char *url_key = xstrfmt("remote.%s.url", r->name); > > + > > + strvec_push(names, r->name); > > + strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); > > Do you mean to push NULL onto urls here? It seems risky since you have > to check that each entry in the strvec is non-NULL before printing it > out (which you do below in promisor_remote_info()). The code doesn't seem risky to me as it allows us to treat the case when git_config_get_string() fails and when it succeeds but possibly sets 'url' to NULL (not sure if it's possible though as I didn't check) in the same way. Yeah, it means that we have to check if each entry in the strvec is non-NULL, but I think it's quite easy, and honestly I didn't want to ask myself questions like should we treat an URL of a remote configured as an empty string in the same way as the URL not configured. I think it's much simpler to just pass as-is the content, if any, that we get from git_config_get_string(). > Or maybe you need to in order to advertise promisor remotes without > URLs? Yeah, I think we should advertise promisor remotes that don't have an URL configured. It might seem strange, but maybe servers might want in the future to have hidden/secret URLs (URLs that they use, likely internally on the server, but don't want to pass for some reason to a client). > If so, I'm not sure what the benefit would be to the client if it > doesn't know where to go to retrieve any objects without having a URL. The client might already have an URL for the promisor-remote (and it might be a different one than the one the server would use if hidden/secret URLs become a thing). That's why patch 4/4 implements the "KnownName" value for the "promisor.acceptFromServer" config option.
On Wed, Jul 31, 2024 at 5:40 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > > By default, or if "promisor.advertise" is set to 'false', a server S will > > advertise only the "promisor-remote" capability without passing any > > argument through this capability. This means that S supports the new > > capability but doesn't wish any client C to directly access any promisor > > remote X S might use. > > Even if the server supports this new capability, is there a reason to > advertise it to the client if the server knows ahead of time that it has > no promisor remotes to advertise? I think it could be useful at least in some cases for C to know that S has the capability to advertise promisor remotes but decided not to advertise any. For example, if C knows that the repo has a lot of very large files, it might realize that S is likely not a good mirror of the repo if it doesn't have the 'promisor-remote' capability. I agree that it's more useful the other way though. That is for a server to know that the client has the capability but might not want to use it. For example, when C clones without using X directly, it can be a burden for S to have to fetch large objects from X (as it would use precious disk space on S, and unnecessarily duplicate large objects). So S might want to say "please use a newer or different client that has the 'promisor-remote' capability" if it knows that the client doesn't have this capability. If S knows that C has the capability but didn't configure it or doesn't want to use it, it could instead say something like "please consider activating the 'promisor-remote' capability by doing this and that to avoid burdening this server and get a faster clone". Note that the client might not be 'git'. It might be a "compatible" implementation (libgit2, gix, JGit, etc), so using the version passed in the "agent" protocol capability is not a good way to detect if the client has the capability or not. In the end, as it looks very useful for S to know if C has the capability or not, and as it seems natural that S and C behave the same regarding advertising the capability, I think the choice of always advertising the capability, even when not using it, is the right one. > I am not sure what action the client would take if it knows the server > supports this capability, but does not actually have any promisor > remotes to advertise. I would suggest that setting promisor.advertise to > false indeed prevents advertising it as a capability in the first place. It could, in some cases, help C realize that S is likely using old or unoptimized server software for the repo, and C could decide based on this to use a different mirror repo. For example if C wants to clone some well known open source AI repo that has a lot of very large files and is mirrored on many common repo hosting platforms (GitHub, GitLab, etc), C might be happy to get a clue of how likely the different mirrors are to be optimized to serve that repo. I agree that it might not be a very good reason right now, but I think it might be in the future. Anyway the main reason for such a behavior is (as I said above) that it is very useful for S to know if C has the 'promisor-remote' capability or not. > Selfishly, it prevents some issues that I have when rolling out new Git > versions within GitHub's infrastructure, since our push proxy layer > picks a single replica to replay the capabilities from, but obviously > replays the client's response to all replicas. So if only some replicas > understand the new 'promisor-remote' capability, we can run into issues. I understand the problem, but I think it might be worked around by first deploying on a single replica with the new 'promisor-remote' capability disabled in the config, which is the default. Yeah, that replica might behave a bit differently than the others, but the client behavior shouldn't change much. And when things work well with a single replica with the capability disabled, then more replicas with that capability disabled can be rolled out until they all have it. More issues are likely to happen when actually enabling the capability, but this is independent of the fact that the 'promisor-remote' capability is advertised even if it is not enabled. > I'm not sure if the client even bothers to send back promisor-remote if > the server did not send any such remotes to begin with, If S sends 'promisor-remote' even without sending any remote information then C should reply using 'promisor-remote' too. I think it can help S decide if setting up promisor remotes is worth it or not, if S can easily know if many of its clients could use them or not. > but between that > and what I wrote in the second paragraph here, I don't see a reason to > advertise the capability when promisor.advertise is false.
On Wed, Jul 31, 2024 at 8:25 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > When a server repository S borrows some objects from a promisor remote X, > > then a client repository C which would like to clone or fetch from S might, > > or might not, want to also borrow objects from X. Also S might, or might > > not, want to advertise X as a good way for C to directly get objects from, > > instead of C getting everything through S. > > If S is a clone that is keeping up to date with X, even if it does > not borrow anything from X, as long as X is known to be much better > connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY > datacenter with petabit/s backbone connections) than S is (e.g., it > is my deskside box on a cable modem), it may be beneficial if S can > omit objects from its "git fetch" response to C, if C is willing to > fill the gap using X. > > So it is of dubious value to limit the feature only to cases where S > "borrows" from X, is it? I agree. I just blindly copied the word from something you said in a previous thread, but I should have thought more about the use case you suggest. > > To allow S and C to agree on C using X or not, let's introduce a new > > "promisor-remote" capability in the protocol v2, as well as a few new > > configuration variables: > > > > - "promisor.advertise" on the server side, and: > > - "promisor.acceptFromServer" on the client side. > > > > By default, or if "promisor.advertise" is set to 'false', a server S will > > advertise only the "promisor-remote" capability without passing any > > argument through this capability. This means that S supports the new > > capability but doesn't wish any client C to directly access any promisor > > remote X S might use. > > I would find it more natural if .advertise is turned off by setting > it explicitly to "false", we would pretend as if we have never even > heard of such a capability. In a reply to Taylor I just sent, I tried to explain why I think it's a good thing that S can know if C has the capability even if neither S nor C actually use it. If, in the future, many servers and repos are transitioned to setups where promisor remotes and this capability are used, then I think it could help a lot if S can help C take advantage of X, and better diagnose issues. And perhaps the other way around (C knowing that S has the capability or not) could help a bit too. > > If "promisor.advertise" is set to 'true', S will advertise its promisor > > remotes with a string like: > > > > promisor-remote=<pm-info>[;<pm-info>]... > > > > where each <pm-info> element contains information about a single > > promisor remote in the form: > > > > name=<pm-name>[,url=<pm-url>] > > where <pm-name> is the name of a promisor remote and <pm-url> is the > > urlencoded url of the promisor remote named <pm-name>. > > OK, so pm-name cannot have ";," in it (which is sensible, or define > pm-name more tightly, like "only lowercase alnum"). Not sure what our config allows here. Ideally I think the capability should support everything our config supports. > URL cannot have > ';' or ',' in it that is an OK limitation as URL encoding can hide > them. Yeah, right. > > For now, the URL is passed in addition to the name. In the future, it > > might be possible to pass other information like a filter-spec that the > > client should use when cloning from S, or a token that the client should > > use when retrieving objects from X. > > OK. And obviously they cannot have ';," in them without encoding > similarly. Yeah, or maybe using a different encoding if it's better for some reason. > > It might also be possible in the future for "promisor.advertise" to have > > other values like "onlyName", so that no URL is advertised. > > Saying "<pm-info> is expected to be extended" should be sufficient, > without inviting discussions like "what good does it do to give only > names" that is irrelevant at least at this moment. Yeah, but in this case Taylor had comments related to advertising remotes that have a name but no URL, so I think this example could help people having related questions understand that it could be a good thing to advertise only a URL name and no URL. > > By default or if "promisor.acceptFromServer" is set to "None", the > > client will not accept to use the promisor remotes that might have been > > advertised by the server. In this case, the client will advertise only > > "promisor-remote" in its reply to the server. This means that the client > > has the "promisor-remote" capability but decided not to use any of the > > promisor remotes that the server might have advertised. > > OK, that is a signal to the server side that it is not allowed to > omit any objects from its response to "git fetch" request, even > though they might be available via a better connected remotes. Yeah, right. > > If "promisor.acceptFromServer" is set to "All", on the contrary, the > > client will accept to use all the promisor remotes that the server > > advertised and it will reply with a string like: > > > > promisor-remote=<pm-name>[;<pm-name>]... > > > > where the <pm-name> elements are the names of all the promisor remotes > > the server advertised. > > So, this is why we need "name" for each "pm-info"---to give a short > name associated with the URL of the remote repository. Yeah, I think it can be valuable to use names to agree on promisor-remotes that should or shouldn't be used, like we commonly use remote names with `git clone`, `git fetch`, etc. > Presumably, C has an option to see if each of the remote suggested > is reachable and omit remotes that are not available to C from its > response, so even when .accept is set to "all", the response may not > list all the names of remotes that S advertised, in general. I didn't think about reachability or connectivity specifically, but I agree it might be useful for C to be able to filter in some ways the promisor remotes that S advertised. > > If the server advertised no promisor remote > > though, the client will reply with just "promisor-remote". > > In other words, at the protocol level: > > - S uses promisor-remote capability to tell C what are potentially > useful alternate remotes to obtain objects that C may want to > fetch from S > > - C uses promisor-remote capability to tell S that among the > remotes advertised by S, it is willing to use the named remotes > as its promisor, which permits S from omitting objects from its > response to "git fetch" request from C as long as they are known > to be available from these remotes. Yeah, right. > I think that makes sense, but I do not see the point of sending an > empty promisor-remote capability at all. I hope I gave good enough reasons above and in my replies to Taylor. > What practical difference would it make to S and C, if S chooses not > to advertise the capability at all, instead of advertising an empty > remote list with the capability? Both tells C that it is useless to > request promistor-remote capability to S in its response. > > What practical difference would it make to S and C, if C chooses not > to advertise the capability at all, instead of advertising an empty > remote list with the capability? Both tells S that S is not allowed > to omit objects that are obtainable from elsewhere. I agree that when looking at how Git related things technically work, it doesn't change anything, but I think we should look at the big picture too. For GitLab, for example (but I suppose it will be similar for GitHub and other hosting sites), it will be important to help users take advantage, as seamlessly as possible, of the feature, and the more we know about the client they use and its capabilities, the better job we can do to help them. > > In a following commit, other values for "promisor.acceptFromServer" will > > be implemented so that the client will be able to decide the promisor > > remotes it accepts depending on the name and URL it received from the > > server. So even if that name and URL information is not used much right > > now, it will be needed soon. > > OK. > > > diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt > > index 98c5cb2ec2..e3939d83a9 100644 > > --- a/Documentation/config/promisor.txt > > +++ b/Documentation/config/promisor.txt > > @@ -1,3 +1,16 @@ > > promisor.quiet:: > > If set to "true" assume `--quiet` when fetching additional > > objects for a partial clone. > > + > > +promisor.advertise:: > > + If set to "true", a server will use the "promisor-remote" > > + capability, see linkgit:gitprotocol-v2[5], to advertise the > > + promisor remotes it is using if any. Default is "false", which > > + means no promisor remote is advertised. > > Even though I said that there logically is not much reason to tie > this advertisement to the use of promistor remote by the serving > side, I am OK if the initial implementation is limited to that > arrangement. It would be an easy change to allow this variable > to take a list of remote repositories that may (or may not) be a > promisor remote of this repository (in other words, "they are clones > that are better connected than me") in the future, but that does not > have to happen in the initial iteration. Yeah, I will mention something like this in the next iteration. > It would be less confusing to first-time readers if you described > the intent a bit better. Why would the server want to advertise and > how would the client take advantage of the information? Yeah, I will try to better answer that question in the doc, cover letter and commit messages. > I see that > the update in this patch to protocol document is skimpy on this point, > and end-user facing documentation has better exposure anyway, so > let's see what we can do here. > > The "promisor-remote" protocol capability can be used by the > responder to "git fetch" to advertise better-connected remotes > that the requester can use as promisor remotes, instead of this > repository, so that "git fetch" requestor can lazily fetch > objects from these other better-connected remotes. If this > configuration variable is set to "true",... > > or something, perhaps? Thanks for the suggestion. I will base the changes in version 2 on it. > "no promisor remote is advertised" -> "no promisor-remote capability > is advertised". Right. > > +promisor.acceptFromServer:: > > + If set to "all", a client will accept all the promisor remotes > > + a server might advertise using the "promisor-remote" > > + capability, see linkgit:gitprotocol-v2[5]. Default is "none", > > + which means no promisor remote advertised by a server will be > > + accepted. > > Similarly, readers would want to know what the implication is to > "accept" promisor remotes. > > accept ..., and adds them to its promisor remotes, allowing > the server to omit objects from its response to "fetch" > requests that are lazily fetchable from these promisor > remotes, see linkgit:gitprotocol-v2[5]. > > or something? Thanks for the suggestion. I will improve the wording based on it. > > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > > index 414bc625d5..4d8d3839c4 100644 > > --- a/Documentation/gitprotocol-v2.txt > > +++ b/Documentation/gitprotocol-v2.txt > > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus > > save themselves and the server(s) the request(s) needed to inspect the > > headers of that bundle or bundles. > > > > +promisor-remote=<pr-infos> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The server may advertise some promisor remotes it is using, if it's OK > > +for the server that a client uses them too. In this case <pr-infos> > > +should be of the form: > > As this is the protocol documentation, we should describe what goes > over the wire and what they mean, regardless of how limited the > initial implementation on either end is. Advertising the promisor > remotes the server side relies on is probably not what we want to > see this capability limited to forever (remember the previous "X is > much better connected than S" example). > > "it is using, if it's OK ..." -> "the other side may want to use > as its promisot remotes, instead of this repository" Right, thanks. > > + pr-infos = pr-info | pr-infos ";" pr-info > > + > > + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url > > + > > +where `pr-name` is the name of a promisor remote, and `pr-url` the > > +urlencoded URL of that promisor remote. > > Clarify what the syntax for a valid name here. Also stress the fact > that ';' and ',' MUST be encoded if it appears in 'pr-url'. Yeah, I will do that. > > +In this case a client wanting to use one or more promisor remotes the > > +server advertised should reply with "promisor-remote=<pr-names>" where > > +<pr-names> should be of the form: > > + > > + pr-names = pr-name | pr-names ";" pr-name > > + > > +where `pr-name` is the name of a promisor remote the server > > +advertised. > > After seeing advertisement, client can use some it picked but it > does not have to tell the server about it. Why would it respond > with the promisor remotes, and what effect does it have to give the > list of promisor remotes it uses? > > If the "git fetch" side decides to use one or more promisor > remotes advertised, it can reply with ... > ... > where ... the server advertised. Doing so allows the server to > make its response smaller by omitting objects that are known to > be lazily fetchable from these other promisor remotes > repositories. > > perhaps? Yeah, right. > > +If the server prefers a client not to use any promisor remote the > > +server uses, or if the server doesn't use any promisor remote, it > > +should only advertise "promisor-remote" without any value or "=" sign > > +after it. > > It should not advertise "promisor-remote" capability at all. Let's discuss this above or in a thread started by Taylor's reviews. > > +In this case, or if the client doesn't want to use any promisor remote > > +the server advertised, the client should reply only "promisor-remote" > > +without any value or "=" sign after it. > > Likewise. It should not advertise "promisor-remote" capability at > all. > > > +The "promisor.advertise" and "promisor.acceptFromServer" configuration > > +options can be used on the server and client side respectively to > > +control what they advertise or accept respectively. See the > > +documentation of these configuration options for more information. > > OK. Thanks for the review and suggestions.
Christian Couder <christian.couder@gmail.com> writes: > On Wed, Jul 31, 2024 at 6:16 PM Taylor Blau <me@ttaylorr.com> wrote: > ... > I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though. Heh, you had 20 days to compose this response, which would be pleanty to see that pkt-line.[ch] is about <length> and <bytes>. After all, it is used to transfer the pack data stream that can have arbitrary bytes ;-)
Christian Couder <christian.couder@gmail.com> writes: > I agree that it's more useful the other way though. That is for a > server to know that the client has the capability but might not want > to use it. > > For example, when C clones without using X directly, it can be a > burden for S to have to fetch large objects from X (as it would use > precious disk space on S, and unnecessarily duplicate large objects). > So S might want to say "please use a newer or different client that > has the 'promisor-remote' capability" if it knows that the client > doesn't have this capability. If S knows that C has the capability but > didn't configure it or doesn't want to use it, it could instead say > something like "please consider activating the 'promisor-remote' > capability by doing this and that to avoid burdening this server and > get a faster clone". > > Note that the client might not be 'git'. It might be a "compatible" > implementation (libgit2, gix, JGit, etc), so using the version passed > in the "agent" protocol capability is not a good way to detect if the > client has the capability or not. It is none of S's business to even know about C's "true" capability, if C does not want to use it with S. I do not quite find the above a credible justification.
On Mon, Aug 5, 2024 at 8:01 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > > +The server may advertise some promisor remotes it is using, if it's OK > > +for the server that a client uses them too. In this case <pr-infos> > > +should be of the form: > > + > > + pr-infos = pr-info | pr-infos ";" pr-info > > + > > + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url > > + > > +where `pr-name` is the name of a promisor remote, and `pr-url` the > > +urlencoded URL of that promisor remote. > > + > > +In this case a client wanting to use one or more promisor remotes the > > +server advertised should reply with "promisor-remote=<pr-names>" where > > +<pr-names> should be of the form: > > + > > + pr-names = pr-name | pr-names ";" pr-name > > + > > +where `pr-name` is the name of a promisor remote the server > > +advertised. > > + > > +If the server prefers a client not to use any promisor remote the > > +server uses, or if the server doesn't use any promisor remote, it > > +should only advertise "promisor-remote" without any value or "=" sign > > +after it. > > + > > +In this case, or if the client doesn't want to use any promisor remote > > +the server advertised, the client should reply only "promisor-remote" > > +without any value or "=" sign after it. > > Why does the client have to advertise anything if they don't want to use > any of the promisor remotes? I have tried to explain it in a reply to Taylor, but as you, Junio and others seem to prefer the capability not to be advertised at all when not used, I have changed this in version 2. > > +The "promisor.advertise" and "promisor.acceptFromServer" configuration > > +options can be used on the server and client side respectively to > > +control what they advertise or accept respectively. See the > > +documentation of these configuration options for more information. > > One thing I'm not totally clear on is the consequence of this > capability. What is the expected consequence if the client accepts one > of the promisor remotes? What is the consequence if the client accepts > none? I have tried to improve the documentation significatively, especially according to Junio's suggestion, in version 2. > In the former case I'd expect that the server is free to omit objects, > but that isn't made explicit anywhere, I think. Junio also suggested making it explicit so I have done that in version 2. > Also, is there any > mechanism that tells the client exactly which objects have been omitted? I don't think it's necessary. Agreeing on which promisor remote (name and URL) to use should be enough security wise. When using bundle-uri, for example, the server is not telling the client exactly which objects are in the bundle. > In the latter case I assume that the result will be a full clone, that > is the server fetched any objects it didn't have from the promisor? Yeah, the server should fetch any objects it doesn't have from the promisor, so it can send everything to the client. > Or does the server side continue to only honor whatever the client has > provided as object filters, but signals to the client that it shall > please contact somebody else when backfilling those promised objects? No. Options to enable things like this could be built on top later if needed though. Thanks for the review.
On Tue, Aug 20, 2024 at 1:32 PM Christian Couder <christian.couder@gmail.com> wrote: > > On Wed, Jul 31, 2024 at 6:16 PM Taylor Blau <me@ttaylorr.com> wrote: > > > > On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote: > > > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > > > index 414bc625d5..4d8d3839c4 100644 > > > --- a/Documentation/gitprotocol-v2.txt > > > +++ b/Documentation/gitprotocol-v2.txt > > > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus > > > save themselves and the server(s) the request(s) needed to inspect the > > > headers of that bundle or bundles. > > > > > > +promisor-remote=<pr-infos> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > + > > > +The server may advertise some promisor remotes it is using, if it's OK > > > +for the server that a client uses them too. In this case <pr-infos> > > > +should be of the form: > > > + > > > + pr-infos = pr-info | pr-infos ";" pr-info [...] > > I wonder if it would instead be useful to have <pr-infos> first write > > out how many <pr-info>s it contains, and then write out each <pr-info> > > separated by a NUL byte, so that none of the files in the <pr-info> > > itself are restricted in what characters they can use. > > I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though. As Junio said pkt-line.[ch] is about <length> and <bytes> and it is used to transfer the pack data stream that can have arbitrary bytes, so there is no problem with NUL bytes. Sorry for not checking. However I still think that capabilities have been using a simple text format for now which works well, and that it's better to respect that format and not introduce complexity in it if it's not necessary. For example t5555-http-smart-common.sh has: cat >expect <<-EOF && version 2 agent=FAKE ls-refs=unborn fetch=shallow wait-for-done server-option object-format=$(test_oid algo) 0000 EOF to check the capabilities sent by `git upload-pack --advertise-refs`. t5701 also uses similar instructions to check protocol v2 server commands. So I think it's nice for tests and debugging if we keep using a simple text format. Also writing the number of <pr-info>s and then each <pr-info> separated by a NUL byte might not save a lot of bytes compared to urlencoded content if necessary, as I don't think many special characters will need to be urlencoded most of the time. So in the version 2 of this patch series, I haven't changed this. Thanks for the review.
On Tue, Aug 20, 2024 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > I agree that it's more useful the other way though. That is for a > > server to know that the client has the capability but might not want > > to use it. > > > > For example, when C clones without using X directly, it can be a > > burden for S to have to fetch large objects from X (as it would use > > precious disk space on S, and unnecessarily duplicate large objects). > > So S might want to say "please use a newer or different client that > > has the 'promisor-remote' capability" if it knows that the client > > doesn't have this capability. If S knows that C has the capability but > > didn't configure it or doesn't want to use it, it could instead say > > something like "please consider activating the 'promisor-remote' > > capability by doing this and that to avoid burdening this server and > > get a faster clone". > > > > Note that the client might not be 'git'. It might be a "compatible" > > implementation (libgit2, gix, JGit, etc), so using the version passed > > in the "agent" protocol capability is not a good way to detect if the > > client has the capability or not. > > It is none of S's business to even know about C's "true" capability, > if C does not want to use it with S. I do not quite find the above > a credible justification. Ok, as you and others have said that the "promisor-remote" capability should not be advertised by the server or the client if they aren't actually using it, then I have changed the implementation in the version 2 of the patch series according to that. I still think that this change might make it harder than necessary (for example for support teams at GitHub and GitLab) to help users and debug issues related to this. The only downside I saw with always advertising the "promisor-remote" capability even when not using it, was that it added a bit of bloat in the protocol, but there are a number of things that could be done to avoid that. For example changing the name of the capability to just "promisor" or even "pr" instead of "promisor-remote" could reduce the size of the overhead. Thanks for the review.
Christian Couder <christian.couder@gmail.com> writes: >> > > + pr-infos = pr-info | pr-infos ";" pr-info > > [...] > >> > I wonder if it would instead be useful to have <pr-infos> first write >> > out how many <pr-info>s it contains, and then write out each <pr-info> >> > separated by a NUL byte, so that none of the files in the <pr-info> >> > itself are restricted in what characters they can use. >> >> I am not sure how NUL bytes would interfere with the pkt-line.[c,h] code though. > ... > However I still think that capabilities have been using a simple text > format for now which works well, and that it's better to respect that > format and not introduce complexity in it if it's not necessary. Yup, especially when we are in control of what goes into "pr-info", I do not see much reason to go binary. It helps debuggability greatly to stay in text format when you can. Thanks.
diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt index 98c5cb2ec2..e3939d83a9 100644 --- a/Documentation/config/promisor.txt +++ b/Documentation/config/promisor.txt @@ -1,3 +1,16 @@ promisor.quiet:: If set to "true" assume `--quiet` when fetching additional objects for a partial clone. + +promisor.advertise:: + If set to "true", a server will use the "promisor-remote" + capability, see linkgit:gitprotocol-v2[5], to advertise the + promisor remotes it is using if any. Default is "false", which + means no promisor remote is advertised. + +promisor.acceptFromServer:: + If set to "all", a client will accept all the promisor remotes + a server might advertise using the "promisor-remote" + capability, see linkgit:gitprotocol-v2[5]. Default is "none", + which means no promisor remote advertised by a server will be + accepted. diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt index 414bc625d5..4d8d3839c4 100644 --- a/Documentation/gitprotocol-v2.txt +++ b/Documentation/gitprotocol-v2.txt @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus save themselves and the server(s) the request(s) needed to inspect the headers of that bundle or bundles. +promisor-remote=<pr-infos> +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The server may advertise some promisor remotes it is using, if it's OK +for the server that a client uses them too. In this case <pr-infos> +should be of the form: + + pr-infos = pr-info | pr-infos ";" pr-info + + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url + +where `pr-name` is the name of a promisor remote, and `pr-url` the +urlencoded URL of that promisor remote. + +In this case a client wanting to use one or more promisor remotes the +server advertised should reply with "promisor-remote=<pr-names>" where +<pr-names> should be of the form: + + pr-names = pr-name | pr-names ";" pr-name + +where `pr-name` is the name of a promisor remote the server +advertised. + +If the server prefers a client not to use any promisor remote the +server uses, or if the server doesn't use any promisor remote, it +should only advertise "promisor-remote" without any value or "=" sign +after it. + +In this case, or if the client doesn't want to use any promisor remote +the server advertised, the client should reply only "promisor-remote" +without any value or "=" sign after it. + +The "promisor.advertise" and "promisor.acceptFromServer" configuration +options can be used on the server and client side respectively to +control what they advertise or accept respectively. See the +documentation of these configuration options for more information. + GIT --- Part of the linkgit:git[1] suite diff --git a/connect.c b/connect.c index cf84e631e9..284ea3cf12 100644 --- a/connect.c +++ b/connect.c @@ -20,6 +20,7 @@ #include "protocol.h" #include "alias.h" #include "bundle-uri.h" +#include "promisor-remote.h" static char *server_capabilities_v1; static struct strvec server_capabilities_v2 = STRVEC_INIT; @@ -485,6 +486,7 @@ void check_stateless_delimiter(int stateless_rpc, static void send_capabilities(int fd_out, struct packet_reader *reader) { const char *hash_name; + const char *promisor_remote_info; if (server_supports_v2("agent")) packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized()); @@ -498,6 +500,11 @@ static void send_capabilities(int fd_out, struct packet_reader *reader) } else { reader->hash_algo = &hash_algos[GIT_HASH_SHA1]; } + if (server_feature_v2("promisor-remote", &promisor_remote_info)) { + char *reply = promisor_remote_reply(promisor_remote_info); + packet_write_fmt(fd_out, "promisor-remote%s", reply ? reply : ""); + free(reply); + } } int get_remote_bundle_uri(int fd_out, struct packet_reader *reader, diff --git a/promisor-remote.c b/promisor-remote.c index 317e1b127f..d347f4d9b5 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -11,6 +11,7 @@ #include "strvec.h" #include "packfile.h" #include "environment.h" +#include "url.h" struct promisor_remote_config { struct promisor_remote *promisors; @@ -219,6 +220,18 @@ int repo_has_promisor_remote(struct repository *r) return !!repo_promisor_remote_find(r, NULL); } +int repo_has_accepted_promisor_remote(struct repository *r) +{ + struct promisor_remote *p; + + promisor_remote_init(r); + + for (p = r->promisor_remote_config->promisors; p; p = p->next) + if (p->accepted) + return 1; + return 0; +} + static int remove_fetched_oids(struct repository *repo, struct object_id **oids, int oid_nr, int to_free) @@ -290,3 +303,172 @@ void promisor_remote_get_direct(struct repository *repo, if (to_free) free(remaining_oids); } + +static int allow_unsanitized(char ch) +{ + if (ch == ',' || ch == ';' || ch == '%') + return 0; + return ch > 32 && ch < 127; +} + +static void promisor_info_vecs(struct repository *repo, + struct strvec *names, + struct strvec *urls) +{ + struct promisor_remote *r; + + promisor_remote_init(repo); + + for (r = repo->promisor_remote_config->promisors; r; r = r->next) { + char *url; + char *url_key = xstrfmt("remote.%s.url", r->name); + + strvec_push(names, r->name); + strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); + + free(url); + free(url_key); + } +} + +void promisor_remote_info(struct repository *repo, struct strbuf *buf) +{ + struct strbuf sb = STRBUF_INIT; + int advertise_promisors = 0; + struct strvec names = STRVEC_INIT; + struct strvec urls = STRVEC_INIT; + + git_config_get_bool("promisor.advertise", &advertise_promisors); + + if (!advertise_promisors) + return; + + promisor_info_vecs(repo, &names, &urls); + + for (size_t i = 0; i < names.nr; i++) { + if (sb.len) + strbuf_addch(&sb, ';'); + strbuf_addf(&sb, "name=%s", names.v[i]); + if (urls.v[i]) { + strbuf_addstr(&sb, ",url="); + strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized); + } + } + + strbuf_sanitize(&sb); + strbuf_addbuf(buf, &sb); + + strvec_clear(&names); + strvec_clear(&urls); +} + +enum accept_promisor { + ACCEPT_NONE = 0, + ACCEPT_ALL +}; + +static int should_accept_remote(enum accept_promisor accept, + const char *remote_name UNUSED, + const char *remote_url UNUSED) +{ + if (accept == ACCEPT_ALL) + return 1; + + BUG("Unhandled 'enum accept_promisor' value '%d'", accept); +} + +static void filter_promisor_remote(struct repository *repo, + struct strvec *accepted, + const char *info) +{ + struct strbuf **remotes; + char *accept_str; + enum accept_promisor accept = ACCEPT_NONE; + + if (!git_config_get_string("promisor.acceptfromserver", &accept_str)) { + if (!accept_str || !*accept_str || !strcasecmp("None", accept_str)) + accept = ACCEPT_NONE; + else if (!strcasecmp("All", accept_str)) + accept = ACCEPT_ALL; + else + warning(_("unknown '%s' value for '%s' config option"), + accept_str, "promisor.acceptfromserver"); + } + + if (accept == ACCEPT_NONE) + return; + + /* Parse remote info received */ + + remotes = strbuf_split_str(info, ';', 0); + + for (size_t i = 0; remotes[i]; i++) { + struct strbuf **elems; + const char *remote_name = NULL; + const char *remote_url = NULL; + char *decoded_url = NULL; + + strbuf_trim_trailing_ch(remotes[i], ';'); + elems = strbuf_split_str(remotes[i]->buf, ',', 0); + + for (size_t j = 0; elems[j]; j++) { + int res; + strbuf_trim_trailing_ch(elems[j], ','); + res = skip_prefix(elems[j]->buf, "name=", &remote_name) || + skip_prefix(elems[j]->buf, "url=", &remote_url); + if (!res) + warning(_("unknown element '%s' from remote info"), + elems[j]->buf); + } + + decoded_url = url_decode(remote_url); + + if (should_accept_remote(accept, remote_name, decoded_url)) + strvec_push(accepted, remote_name); + + strbuf_list_free(elems); + free(decoded_url); + } + + free(accept_str); + strbuf_list_free(remotes); +} + +char *promisor_remote_reply(const char *info) +{ + struct strvec accepted = STRVEC_INIT; + struct strbuf reply = STRBUF_INIT; + + filter_promisor_remote(the_repository, &accepted, info); + + strbuf_addch(&reply, '='); + + for (size_t i = 0; i < accepted.nr; i++) { + if (i != 0) + strbuf_addch(&reply, ';'); + strbuf_addstr(&reply, accepted.v[i]); + } + + strvec_clear(&accepted); + + return strbuf_detach(&reply, NULL); +} + +void mark_promisor_remotes_as_accepted(struct repository *r, const char *remotes) +{ + struct strbuf **accepted_remotes = strbuf_split_str(remotes, ';', 0); + + for (size_t i = 0; accepted_remotes[i]; i++) { + struct promisor_remote *p; + + strbuf_trim_trailing_ch(accepted_remotes[i], ';'); + p = repo_promisor_remote_find(r, accepted_remotes[i]->buf); + if (p) + p->accepted = 1; + else + warning(_("accepted promisor remote '%s' not found"), + accepted_remotes[i]->buf); + } + + strbuf_list_free(accepted_remotes); +} diff --git a/promisor-remote.h b/promisor-remote.h index 88cb599c39..82f060b5af 100644 --- a/promisor-remote.h +++ b/promisor-remote.h @@ -9,11 +9,13 @@ struct object_id; * Promisor remote linked list * * Information in its fields come from remote.XXX config entries or - * from extensions.partialclone. + * from extensions.partialclone, except for 'accepted' which comes + * from protocol v2 capabilities exchange. */ struct promisor_remote { struct promisor_remote *next; char *partial_clone_filter; + unsigned int accepted : 1; const char name[FLEX_ARRAY]; }; @@ -32,4 +34,26 @@ void promisor_remote_get_direct(struct repository *repo, const struct object_id *oids, int oid_nr); +/* + * Append promisor remote info to buf. Useful for a server to + * advertise the promisor remotes it uses. + */ +void promisor_remote_info(struct repository *repo, struct strbuf *buf); + +/* + * Prepare a reply to a "promisor-remote" advertisement from a server. + */ +char *promisor_remote_reply(const char *info); + +/* + * Set the 'accepted' flag for some promisor remotes. Useful when some + * promisor remotes have been accepted by the client. + */ +void mark_promisor_remotes_as_accepted(struct repository *repo, const char *remotes); + +/* + * Has any promisor remote been accepted by the client? + */ +int repo_has_accepted_promisor_remote(struct repository *r); + #endif /* PROMISOR_REMOTE_H */ diff --git a/serve.c b/serve.c index 884cd84ca8..7c5c7c9856 100644 --- a/serve.c +++ b/serve.c @@ -12,6 +12,7 @@ #include "upload-pack.h" #include "bundle-uri.h" #include "trace2.h" +#include "promisor-remote.h" static int advertise_sid = -1; static int advertise_object_info = -1; @@ -31,6 +32,21 @@ static int agent_advertise(struct repository *r UNUSED, return 1; } +static int promisor_remote_advertise(struct repository *r, + struct strbuf *value) +{ + if (value) + promisor_remote_info(r, value); + return 1; +} + +static void promisor_remote_receive(struct repository *r, + const char *remotes) +{ + mark_promisor_remotes_as_accepted(r, remotes); +} + + static int object_format_advertise(struct repository *r, struct strbuf *value) { @@ -157,6 +173,11 @@ static struct protocol_capability capabilities[] = { .advertise = bundle_uri_advertise, .command = bundle_uri_command, }, + { + .name = "promisor-remote", + .advertise = promisor_remote_advertise, + .receive = promisor_remote_receive, + }, }; void protocol_v2_advertise_capabilities(void) diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh index 3dcb3340a3..27300a8bf5 100755 --- a/t/t5555-http-smart-common.sh +++ b/t/t5555-http-smart-common.sh @@ -131,6 +131,7 @@ test_expect_success 'git upload-pack --advertise-refs: v2' ' fetch=shallow wait-for-done server-option object-format=$(test_oid algo) + promisor-remote 0000 EOF diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh index c48830de8f..c858c43db2 100755 --- a/t/t5701-git-serve.sh +++ b/t/t5701-git-serve.sh @@ -22,6 +22,7 @@ test_expect_success 'test capability advertisement' ' object-format=$(test_oid algo) EOF cat >expect.trailer <<-EOF && + promisor-remote 0000 EOF cat expect.base expect.trailer >expect && diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh new file mode 100755 index 0000000000..7e44ad15ce --- /dev/null +++ b/t/t5710-promisor-remote-capability.sh @@ -0,0 +1,124 @@ +#!/bin/sh + +test_description='handling of promisor remote advertisement' + +. ./test-lib.sh + +# Setup the repository with three commits, this way HEAD is always +# available and we can hide commit 1 or 2. +test_expect_success 'setup: create "template" repository' ' + git init template && + test_commit -C template 1 && + test_commit -C template 2 && + test_commit -C template 3 && + test-tool genrandom foo 10240 >template/foo && + git -C template add foo && + git -C template commit -m foo +' + +# A bare repo will act as a server repo with unpacked objects. +test_expect_success 'setup: create bare "server" repository' ' + git clone --bare --no-local template server && + mv server/objects/pack/pack-* . && + packfile=$(ls pack-*.pack) && + git -C server unpack-objects --strict <"$packfile" +' + +check_missing_objects () { + git -C "$1" rev-list --objects --all --missing=print > all.txt && + perl -ne 'print if s/^[?]//' all.txt >missing.txt && + test_line_count = "$2" missing.txt && + test "$3" = "$(cat missing.txt)" +} + +initialize_server () { + # Repack everything first + git -C server -c repack.writebitmaps=false repack -a -d && + + # Remove promisor file in case they exist, useful when reinitializing + rm -rf server/objects/pack/*.promisor && + + # Repack without the largest object and create a promisor pack on server + git -C server -c repack.writebitmaps=false repack -a -d \ + --filter=blob:limit=5k --filter-to="$(pwd)" && + promisor_file=$(ls server/objects/pack/*.pack | sed "s/\.pack/.promisor/") && + touch "$promisor_file" && + + # Check that only one object is missing on the server + check_missing_objects server 1 "$oid" +} + +test_expect_success "setup for testing promisor remote advertisement" ' + # Create another bare repo called "server2" + git init --bare server2 && + + # Copy the largest object from server to server2 + obj="HEAD:foo" && + oid="$(git -C server rev-parse $obj)" && + oid_path="$(test_oid_to_path $oid)" && + path="server/objects/$oid_path" && + path2="server2/objects/$oid_path" && + mkdir -p $(dirname "$path2") && + cp "$path" "$path2" && + + initialize_server && + + # Configure server2 as promisor remote for server + git -C server remote add server2 "file://$(pwd)/server2" && + git -C server config remote.server2.promisor true && + + git -C server2 config uploadpack.allowFilter true && + git -C server2 config uploadpack.allowAnySHA1InWant true && + git -C server config uploadpack.allowFilter true && + git -C server config uploadpack.allowAnySHA1InWant true +' + +test_expect_success "fetch with promisor.advertise set to 'true'" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=All \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + +test_expect_success "fetch with promisor.advertise set to 'false'" ' + git -C server config promisor.advertise false && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=All \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" && + + # Reinitialize server so that the largest object is missing again + initialize_server +' + +test_expect_success "fetch with promisor.acceptfromserver set to 'None'" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \ + -c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \ + -c remote.server2.url="file://$(pwd)/server2" \ + -c promisor.acceptfromserver=None \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" +' + +test_done diff --git a/upload-pack.c b/upload-pack.c index 0052c6a4dc..0cff76c845 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -31,6 +31,7 @@ #include "write-or-die.h" #include "json-writer.h" #include "strmap.h" +#include "promisor-remote.h" /* Remember to update object flag allocation in object.h */ #define THEY_HAVE (1u << 11) @@ -317,6 +318,8 @@ static void create_pack_file(struct upload_pack_data *pack_data, strvec_push(&pack_objects.args, "--delta-base-offset"); if (pack_data->use_include_tag) strvec_push(&pack_objects.args, "--include-tag"); + if (repo_has_accepted_promisor_remote(the_repository)) + strvec_push(&pack_objects.args, "--missing=allow-promisor"); if (pack_data->filter_options.choice) { const char *spec = expand_list_objects_filter_spec(&pack_data->filter_options);
When a server repository S borrows some objects from a promisor remote X, then a client repository C which would like to clone or fetch from S might, or might not, want to also borrow objects from X. Also S might, or might not, want to advertise X as a good way for C to directly get objects from, instead of C getting everything through S. To allow S and C to agree on C using X or not, let's introduce a new "promisor-remote" capability in the protocol v2, as well as a few new configuration variables: - "promisor.advertise" on the server side, and: - "promisor.acceptFromServer" on the client side. By default, or if "promisor.advertise" is set to 'false', a server S will advertise only the "promisor-remote" capability without passing any argument through this capability. This means that S supports the new capability but doesn't wish any client C to directly access any promisor remote X S might use. If "promisor.advertise" is set to 'true', S will advertise its promisor remotes with a string like: promisor-remote=<pm-info>[;<pm-info>]... where each <pm-info> element contains information about a single promisor remote in the form: name=<pm-name>[,url=<pm-url>] where <pm-name> is the name of a promisor remote and <pm-url> is the urlencoded url of the promisor remote named <pm-name>. For now, the URL is passed in addition to the name. In the future, it might be possible to pass other information like a filter-spec that the client should use when cloning from S, or a token that the client should use when retrieving objects from X. It might also be possible in the future for "promisor.advertise" to have other values like "onlyName", so that no URL is advertised. By default or if "promisor.acceptFromServer" is set to "None", the client will not accept to use the promisor remotes that might have been advertised by the server. In this case, the client will advertise only "promisor-remote" in its reply to the server. This means that the client has the "promisor-remote" capability but decided not to use any of the promisor remotes that the server might have advertised. If "promisor.acceptFromServer" is set to "All", on the contrary, the client will accept to use all the promisor remotes that the server advertised and it will reply with a string like: promisor-remote=<pm-name>[;<pm-name>]... where the <pm-name> elements are the names of all the promisor remotes the server advertised. If the server advertised no promisor remote though, the client will reply with just "promisor-remote". In a following commit, other values for "promisor.acceptFromServer" will be implemented so that the client will be able to decide the promisor remotes it accepts depending on the name and URL it received from the server. So even if that name and URL information is not used much right now, it will be needed soon. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> --- Documentation/config/promisor.txt | 13 ++ Documentation/gitprotocol-v2.txt | 37 ++++++ connect.c | 7 + promisor-remote.c | 182 ++++++++++++++++++++++++++ promisor-remote.h | 26 +++- serve.c | 21 +++ t/t5555-http-smart-common.sh | 1 + t/t5701-git-serve.sh | 1 + t/t5710-promisor-remote-capability.sh | 124 ++++++++++++++++++ upload-pack.c | 3 + 10 files changed, 414 insertions(+), 1 deletion(-) create mode 100755 t/t5710-promisor-remote-capability.sh