Message ID | 20240910163000.1985723-4-christian.couder@gmail.com (mailing list archive) |
---|---|
State | New |
Series | Introduce a "promisor-remote" capability |
On Tue, Sep 10, 2024 at 06:29:59PM +0200, Christian Couder wrote:
> diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> index 98c5cb2ec2..9cbfe3e59e 100644
> --- a/Documentation/config/promisor.txt
> +++ b/Documentation/config/promisor.txt
> @@ -1,3 +1,20 @@
>  promisor.quiet::
>  	If set to "true" assume `--quiet` when fetching additional
>  	objects for a partial clone.
> +
> +promisor.advertise::
> +	If set to "true", a server will use the "promisor-remote"
> +	capability, see linkgit:gitprotocol-v2[5], to advertise the
> +	promisor remotes it is using, if it uses some. Default is
> +	"false", which means the "promisor-remote" capability is not
> +	advertised.
> +
> +promisor.acceptFromServer::
> +	If set to "all", a client will accept all the promisor remotes
> +	a server might advertise using the "promisor-remote"
> +	capability. Default is "none", which means no promisor remote
> +	advertised by a server will be accepted. By accepting a
> +	promisor remote, the client agrees that the server might omit
> +	objects that are lazily fetchable from this promisor remote
> +	from its responses to "fetch" and "clone" requests from the
> +	client. See linkgit:gitprotocol-v2[5].

I wonder a bit about whether making this an option is all that sensible,
because that would of course apply globally to every server that you
might want to clone from. Wouldn't it be more sensible to make this
configurable per server?

Another question: servers may advertise bogus addresses to us, and as
far as I can see there are currently no precautions in place against
malicious cases. The server might for example use this to redirect us to
a remote that uses no encryption, the Git protocol or even the "file://"
protocol. I guess the sane thing here would be to default to allow
clones via "https://" only, but make the set of accepted protocols
configurable.

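[Editorial illustration] The protocol allow-list suggested above could look roughly like the sketch below. This is not code from the series: `url_scheme_allowed()` is a hypothetical helper, and the list of accepted prefixes is assumed to come from some not-yet-existing configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): return 1 if `url` starts with
 * one of the `nr` scheme prefixes in `allowed` (for example "https://"),
 * and 0 otherwise. The comparison is case-sensitive to keep the sketch
 * short.
 */
static int url_scheme_allowed(const char *url,
			      const char *const *allowed, size_t nr)
{
	assert(url != NULL);

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(allowed[i]);

		if (!strncmp(url, allowed[i], len))
			return 1;
	}
	return 0;
}
```

With a default list containing only "https://", an advertised "file://" or "git://" URL would be rejected before the client ever contacts it.
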
> diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> index 414bc625d5..65d5256baf 100644
> --- a/Documentation/gitprotocol-v2.txt
> +++ b/Documentation/gitprotocol-v2.txt
> @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> +promisor-remote=<pr-infos>
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The server may advertise some promisor remotes it is using or knows
> +about to a client which may want to use them as its promisor remotes,
> +instead of this repository. In this case <pr-infos> should be of the
> +form:
> +
> +	pr-infos = pr-info | pr-infos ";" pr-info

Wouldn't it be preferable to make this multiple lines so that we cannot
ever burst through the pktline limits?

> +	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> +
> +where `pr-name` is the urlencoded name of a promisor remote, and
> +`pr-url` the urlencoded URL of that promisor remote.
> +In this case, if the client decides to use one or more promisor
> +remotes the server advertised, it can reply with
> +"promisor-remote=<pr-names>" where <pr-names> should be of the form:

One of the things that LFS provides is custom transfer types. It is for
example possible to use NFS or some other arbitrary protocol to fetch or
upload data. It should be possible to provide similar functionality on
the Git side via custom transport helpers, too, and if we make the
accepted set of helpers configurable as proposed further up this could
be made safe, too.

But one thing I'm missing here is any documentation around how the
client would know which promisor-remote to pick when the remote
advertises multiple of them. The easiest schema would of course be to
pick the first one whose transport helper the client understands and
considers to be safe. But given that we're talking about offloading of
large blobs, would we have usecases for advertising e.g. region-scoped
remotes that require more information on the client-side?

Also, are the promisor remotes promising to each contain all objects? Or
would the client have to ask each promisor remote until it finds a
desired object?

> +	pr-names = pr-name | pr-names ";" pr-name
> +
> +where `pr-name` is the urlencoded name of a promisor remote the server
> +advertised and the client accepts.
> +
> +Note that, everywhere in this document, `pr-name` MUST be a valid
> +remote name, and the ';' and ',' characters MUST be encoded if they
> +appear in `pr-name` or `pr-url`.

So I assume the intent here is to let the client add that promisor
remote with that exact, server-provided name? That makes me wonder about
two different scenarios:

  - We must keep the remote from announcing "origin".

  - What if we eventually decide to allow users to provide their own
    names for remotes during git-clone(1)?

Overall, I don't think that it's a good idea to let the remote dictate
which name a client's remotes have.

> +If the server doesn't know any promisor remote that could be good for
> +a client to use, or prefers a client not to use any promisor remote it
> +uses or knows about, it shouldn't advertise the "promisor-remote"
> +capability at all.
> +
> +In this case, or if the client doesn't want to use any promisor remote
> +the server advertised, the client shouldn't advertise the
> +"promisor-remote" capability at all in its reply.
> +
> +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> +options can be used on the server and client side respectively to
> +control what they advertise or accept respectively. See the
> +documentation of these configuration options for more information.
> +
> +Note that in the future it would be nice if the "promisor-remote"
> +protocol capability could be used by the server, when responding to
> +`git fetch` or `git clone`, to advertise better-connected remotes that
> +the client can use as promisor remotes, instead of this repository, so
> +that the client can lazily fetch objects from these other
> +better-connected remotes. This would require the server to omit in its
> +response the objects available on the better-connected remotes that
> +the client has accepted. This hasn't been implemented yet though. So
> +for now this "promisor-remote" capability is useful only when the
> +server advertises some promisor remotes it already uses to borrow
> +objects from.

In the cover letter you mention that the server may not even have some
objects at all in the future. I wonder how that is supposed to interact
with clients that do not know about the "promisor-remote" capability at
all though.

From my point of view the server should be able to handle that just
fine and provide a full packfile to the client. That would of course
require the server to fetch missing objects from its own promisor
remotes. Do we want to state explicitly that this is a MUST for servers
so that we don't end up in a future where clients wouldn't be able to
fetch from some forges anymore?

Patrick
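[Editorial illustration] To make the quoted `pr-infos` grammar concrete, here is a minimal standalone parser sketch. It is not the code from the patch (which uses Git's strbuf and strvec helpers); `struct pr_info`, `pct_decode()` and `parse_pr_infos()` are hypothetical names, and the fixed buffer sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one decoded pr-info entry. */
struct pr_info {
	char name[64];
	char url[256];	/* empty string when no "url=" field was sent */
};

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Percent-decode `len` bytes of `src` into `dst`; -1 on bad input or overflow. */
static int pct_decode(const char *src, size_t len, char *dst, size_t cap)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		int c = (unsigned char)src[i];

		if (c == '%') {
			int hi, lo;

			if (i + 2 >= len)
				return -1;	/* truncated escape */
			hi = hexval((unsigned char)src[i + 1]);
			lo = hexval((unsigned char)src[i + 2]);
			if (hi < 0 || lo < 0)
				return -1;
			c = hi * 16 + lo;
			i += 2;
		}
		if (o + 1 >= cap)
			return -1;
		dst[o++] = (char)c;
	}
	dst[o] = '\0';
	return 0;
}

/* Split on ';' into entries and on ',' into fields; returns the entry count or -1. */
static int parse_pr_infos(const char *infos, struct pr_info *out, size_t max)
{
	size_t nr = 0;
	const char *p = infos;

	assert(infos != NULL);

	while (*p) {
		const char *end = p + strcspn(p, ";");
		const char *f = p;

		if (nr == max)
			return -1;
		out[nr].name[0] = out[nr].url[0] = '\0';

		while (f < end) {
			size_t flen = strcspn(f, ",;");

			if (flen >= 5 && !strncmp(f, "name=", 5)) {
				if (pct_decode(f + 5, flen - 5, out[nr].name,
					       sizeof(out[nr].name)))
					return -1;
			} else if (flen >= 4 && !strncmp(f, "url=", 4)) {
				if (pct_decode(f + 4, flen - 4, out[nr].url,
					       sizeof(out[nr].url)))
					return -1;
			}
			/* Unknown fields are skipped in this sketch. */
			f += flen;
			if (f < end)
				f++;	/* step over ',' */
		}
		if (!out[nr].name[0])
			return -1;	/* the grammar requires "name=" */
		nr++;
		p = end;
		if (*p == ';')
			p++;
	}
	return (int)nr;
}
```

Because ';' and ',' must be percent-encoded inside `pr-name` and `pr-url`, splitting on the raw separators before decoding is safe.
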
On Mon, Sep 30, 2024 at 9:57 AM Patrick Steinhardt <ps@pks.im> wrote: > > On Tue, Sep 10, 2024 at 06:29:59PM +0200, Christian Couder wrote: > > diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt > > index 98c5cb2ec2..9cbfe3e59e 100644 > > --- a/Documentation/config/promisor.txt > > +++ b/Documentation/config/promisor.txt > > @@ -1,3 +1,20 @@ > > promisor.quiet:: > > If set to "true" assume `--quiet` when fetching additional > > objects for a partial clone. > > + > > +promisor.advertise:: > > + If set to "true", a server will use the "promisor-remote" > > + capability, see linkgit:gitprotocol-v2[5], to advertise the > > + promisor remotes it is using, if it uses some. Default is > > + "false", which means the "promisor-remote" capability is not > > + advertised. > > + > > +promisor.acceptFromServer:: > > + If set to "all", a client will accept all the promisor remotes > > + a server might advertise using the "promisor-remote" > > + capability. Default is "none", which means no promisor remote > > + advertised by a server will be accepted. By accepting a > > + promisor remote, the client agrees that the server might omit > > + objects that are lazily fetchable from this promisor remote > > + from its responses to "fetch" and "clone" requests from the > > + client. See linkgit:gitprotocol-v2[5]. > > I wonder a bit about whether making this an option is all that sensible, > because that would of course apply globally to every server that you > might want to clone from. Wouldn't it be more sensible to make this > configurabe per server? It depends. If, for example, you are in a corporate environment where you will interact only with trusted servers, then it might be easier to have only one option and configure it once for all the servers you are going to interact with. I am Ok to also have an option configurable per server in the future though. 
> Another question: servers may advertise bogus addresses to us, and as > far as I can see there are currently no precautions in place against > malicious cases. The commit message says: "In a following commit, other values for "promisor.acceptFromServer" will be implemented, so that C will be able to decide the promisor remotes it accepts depending on the name and URL it received from S." and indeed as you noticed in your review of patch 4/4, this concern is addressed by patch 4/4. > The server might for example use this to redirect us to > a remote that uses no encryption, the Git protocol or even the "file://" > protocol. I guess the sane thing here would be to default to allow > clones via "https://" only, but make the set of accepted protocols > configurable. Yeah, it is another potential config option that could be added. At this stage I don't want to send a lot of patches with a large number of possibly useful configuration options as it might appear later that very few are actually used and useful. > > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt > > index 414bc625d5..65d5256baf 100644 > > --- a/Documentation/gitprotocol-v2.txt > > +++ b/Documentation/gitprotocol-v2.txt > > @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus > > save themselves and the server(s) the request(s) needed to inspect the > > headers of that bundle or bundles. > > > > +promisor-remote=<pr-infos> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The server may advertise some promisor remotes it is using or knows > > +about to a client which may want to use them as its promisor remotes, > > +instead of this repository. In this case <pr-infos> should be of the > > +form: > > + > > + pr-infos = pr-info | pr-infos ";" pr-info > > Wouldn't it be preferable to make this multiple lines so that we cannot > ever burst through the pktline limits? LARGE_PACKET_MAX is 65520 which looks more than enough to me. 
So having the pktline limit could actually prevent malicious servers from sending too much junk. I wouldn't be against multiple lines if there were reasonable cases where the current pktline limit might not be enough (or if such cases appeared in the future) though. It's just that I can't think of any such reasonable case. > > + pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url > > + > > +where `pr-name` is the urlencoded name of a promisor remote, and > > +`pr-url` the urlencoded URL of that promisor remote. > > +In this case, if the client decides to use one or more promisor > > +remotes the server advertised, it can reply with > > +"promisor-remote=<pr-names>" where <pr-names> should be of the form: > > One of the things that LFS provides is custom transfer types. It is for > example possible to use NFS or some other arbitrary protocol to fetch or > upload data. It should be possible to provide similar functionality on > the Git side via custom transport helpers, too, and if we make the > accepted set of helpers configurable as proposed further up this could > be made safe, too. It's already possible to use remote helpers using different protocols with promisor remotes and the URL security issues are addressed by patch 4/4. > But one thing I'm missing here is any documentation around how the > client would know which promisor-remote to pick when the remote > advertises multiple of them. In most cases for now, I think the server should advertise only one, and the client should configure that promisor remote on its own and set "promisor.acceptFromServer" to "KnownUrl", or maybe "KnownName" in a corporate setting, (see patch 4/4). If a server advertises more than one, it should have some docs to explain why it does that and which one(s) should be picked by which client. 
For example, it could say something like "Users in this part of the world might want to pick only promisor remote A as it is likely to be better connected to them, while users in other parts of the world should pick only promisor remote B for the same reason." > The easiest schema would of course be to > pick the first one whose transport helper the client understands and > considers to be safe. But given that we're talking about offloading of > large blobs, would we have usecases for advertising e.g. region-scoped > remotes that require more information on the client-side? If region-scoped means something like the example I talk about above, then yeah, as also discussed with Junio, this could be an interesting use case. > Also, are the promisor remotes promising to each contain all objects? Or > would the client have to ask each promisor remote until it finds a > desired object? I think both use cases could be interesting. > > + pr-names = pr-name | pr-names ";" pr-name > > + > > +where `pr-name` is the urlencoded name of a promisor remote the server > > +advertised and the client accepts. > > + > > +Note that, everywhere in this document, `pr-name` MUST be a valid > > +remote name, and the ';' and ',' characters MUST be encoded if they > > +appear in `pr-name` or `pr-url`. > > So I assume the intent here is to let the client add that promisor > remote with that exact, server-provided name? That makes me wonder about > two different scenarios: > > - We must keep the remote from announcing "origin". I agree that it might not be a good idea to have something else than the main remote named origin. I am not sure it's necessary to explicitly disallow it though. > - What if we eventually decide to allow users to provide their own > names for remotes during git-clone(1)? I think it could be confusing, so I would say that we should wait until a concrete case where it could be useful appear before allowing this. 
> Overall, I don't think that it's a good idea to let the remote dictate > which name a client's remotes have. Maybe a new mode like "KnownURL" but where only the URL and not the name should match could be interesting in some cases then? If that's the case it's very simple to add it. I just prefer not to do it for now as I am not yet convinced there is a very relevant use case. I think that if a client doesn't want to trust and cooperate with the server at all, it might just be better in most cases for it to just leave the server alone and not access it at all, independently of using promisor remote or not. > > +If the server doesn't know any promisor remote that could be good for > > +a client to use, or prefers a client not to use any promisor remote it > > +uses or knows about, it shouldn't advertise the "promisor-remote" > > +capability at all. > > + > > +In this case, or if the client doesn't want to use any promisor remote > > +the server advertised, the client shouldn't advertise the > > +"promisor-remote" capability at all in its reply. > > + > > +The "promisor.advertise" and "promisor.acceptFromServer" configuration > > +options can be used on the server and client side respectively to > > +control what they advertise or accept respectively. See the > > +documentation of these configuration options for more information. > > + > > +Note that in the future it would be nice if the "promisor-remote" > > +protocol capability could be used by the server, when responding to > > +`git fetch` or `git clone`, to advertise better-connected remotes that > > +the client can use as promisor remotes, instead of this repository, so > > +that the client can lazily fetch objects from these other > > +better-connected remotes. This would require the server to omit in its > > +response the objects available on the better-connected remotes that > > +the client has accepted. This hasn't been implemented yet though. 
So > > +for now this "promisor-remote" capability is useful only when the > > +server advertises some promisor remotes it already uses to borrow > > +objects from. > > In the cover letter you mention that the server may not even have some > objects at all in the future. I am not sure which part of the cover letter this refers to. If S uses X as a promisor remote, then yeah, it might not have some objects that are on X. But perhaps there is some wrong wording or a misunderstanding here. > I wonder how that is supposed to interact > with clients that do not know about the "promisor-remote" capability at > all though. When that happens, S can fetch from X the objects it doesn't have, and then proceed as usual to respond to the client. This has the drawback of duplicating these objects on S, but perhaps there could be some kind of garbage collection process that would regularly remove those duplicated objects from S. Another possibility that could be added in the future would be for S to warn the client that it should be upgraded to have the "promisor-remote" capability. Or S could just refuse to serve the client in that case. I don't think we should implement these possibilities right now, but it could be useful to do it in the future. > From my point of view the server should be able tot handle that just > fine and provide a full packfile to the client. That would of course > require the server to fetch missing objects from its own promisor > remotes. It's what already happens. > Do we want to state explicitly that this is a MUST for servers > so that we don't end up in a future where clients wouldn't be able to > fetch from some forges anymore? I don't think we should enforce anything like this. For example in corporate setups, it might be easy to install the latest version of Git and it might be a good thing to make sure the server doesn't get overloaded with large files when they are supposed to only be stored on a promisor remote.
On Mon, Sep 30, 2024 at 03:28:20PM +0200, Christian Couder wrote:
> On Mon, Sep 30, 2024 at 9:57 AM Patrick Steinhardt <ps@pks.im> wrote:
> > So I assume the intent here is to let the client add that promisor
> > remote with that exact, server-provided name? That makes me wonder about
> > two different scenarios:
> >
> >   - We must keep the remote from announcing "origin".
>
> I agree that it might not be a good idea to have something else than
> the main remote named origin. I am not sure it's necessary to
> explicitly disallow it though.
>
> >   - What if we eventually decide to allow users to provide their own
> >     names for remotes during git-clone(1)?
>
> I think it could be confusing, so I would say that we should wait
> until a concrete case where it could be useful appear before allowing
> this.

I think we've been talking past one another on this item. What I'm
worried about is a potential future where the default remote isn't
called "origin", but something else. I for example quite frequently
rename the remote right after cloning because I add a handful of
remotes, and "origin" would be too confusing. So there is a usecase that
may at one point in the future cause us to make this configurable at
clone-time.

Which brings me to the issue with the current design: if the remote
dictates the names of additional remotes we basically cannot do the
above change anymore because we have to assume that no matter which
remote name is chosen, it could already be used by a promisor remote.
Our hands are bound by potential implementations of this feature by a
third party, which I think is not a good idea in general.

Now I'm not against advertising a name and storing it in our config when
we create the additional remote, for example by storing it as a separate
key "remote.<generated>.promisor-name". But the name of the remote
itself should not be controlled by the server, but should instead be
generated by the client.

> > Overall, I don't think that it's a good idea to let the remote dictate
> > which name a client's remotes have.
>
> Maybe a new mode like "KnownURL" but where only the URL and not the
> name should match could be interesting in some cases then? If that's
> the case it's very simple to add it. I just prefer not to do it for
> now as I am not yet convinced there is a very relevant use case. I
> think that if a client doesn't want to trust and cooperate with the
> server at all, it might just be better in most cases for it to just
> leave the server alone and not access it at all, independently of
> using promisor remote or not.

It's not only about trust, as explained above. It's more about not
letting server operators dictate how Git can evolve in that context and
not taking away the ability of a user to configure their repository how
they want to.

> > I wonder how that is supposed to interact
> > with clients that do not know about the "promisor-remote" capability at
> > all though.
>
> When that happens, S can fetch from X the objects it doesn't have, and
> then proceed as usual to respond to the client. This has the drawback
> of duplicating these objects on S, but perhaps there could be some
> kind of garbage collection process that would regularly remove those
> duplicated objects from S.
>
> Another possibility that could be added in the future would be for S
> to warn the client that it should be upgraded to have the
> "promisor-remote" capability. Or S could just refuse to serve the
> client in that case. I don't think we should implement these
> possibilities right now, but it could be useful to do it in the
> future.
>
> > From my point of view the server should be able to handle that just
> > fine and provide a full packfile to the client. That would of course
> > require the server to fetch missing objects from its own promisor
> > remotes.
>
> It's what already happens.
>
> > Do we want to state explicitly that this is a MUST for servers
> > so that we don't end up in a future where clients wouldn't be able to
> > fetch from some forges anymore?
>
> I don't think we should enforce anything like this. For example in
> corporate setups, it might be easy to install the latest version of
> Git and it might be a good thing to make sure the server doesn't get
> overloaded with large files when they are supposed to only be stored
> on a promisor remote.

Partitioning the Git userbase depending on the Git version they can use
doesn't feel sensible to me. We have been able to get by without
breaking backwards compatibility on the transport layer until now. So it
would be too bad if this new feature would break that.

Also, the argument with a corporate setup cuts both ways, I think. If
the administrators tightly control the Git version anyway they can just
upgrade it for all clients, and consequently all of the clients would
know how to handle the new capability and thus the server wouldn't be
overloaded.

Patrick
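[Editorial illustration] The backward-compatible behavior argued for above can be stated as a small decision rule. The sketch below encodes only that position from the thread, not an agreed design; `enum pack_plan` and `plan_response()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum pack_plan {
	SEND_FULL_PACK,			/* fetch from own promisor remotes if needed */
	MAY_OMIT_PROMISED_OBJECTS	/* client can lazily fetch them itself */
};

/*
 * Hypothetical server-side rule: objects may be omitted only when the
 * client replied with "promisor-remote=<pr-names>" and accepted at least
 * one of the advertised remotes. Older clients that never send the
 * capability always get a full pack.
 */
static enum pack_plan plan_response(int client_replied,
				    size_t nr_accepted, size_t nr_advertised)
{
	assert(nr_accepted <= nr_advertised);

	if (!client_replied || !nr_accepted)
		return SEND_FULL_PACK;
	return MAY_OMIT_PROMISED_OBJECTS;
}
```
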
Patrick Steinhardt <ps@pks.im> writes:

> Now I'm not against advertising a name and storing it in our config when
> we create the additional remote, for example by storing it as a separate
> key "remote.<generated>.promisor-name". But the name of the remote
> itself should not be controlled by the server, but should instead be
> generated by the client.

Thanks. In an earlier round of the review, I noticed that the remote
side gives each promisor remote it suggests a name, but I failed to
realize that it is used without any say from the user at the receiving
end in the local repository---which is horrible.

The remote end wants to keep referring to a promisor remote in such a
way that both sides can understand when the same promisor remote is
referred to in the future, and I am OK for the protocol to allow the
remote to give a name to a promisor remote. Such a name needs to be
kept separate from the name the end-user locally uses to refer to the
promisor remote (if they follow the suggestion given over the
protocol). Do we need some mapping mechanism to do so? A name N the
remote A gave to another remote B has to keep referring to the remote
we know as B today, even if we rename B to C.

Thanks.
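[Editorial illustration] One possible shape for the mapping mechanism asked about above is a small table keyed by the advertised name. This is only a sketch under assumptions: `struct name_map` and its helpers are hypothetical, and a real implementation would presumably store the association in the repository config rather than in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MAPPINGS 8
#define NAME_MAX_LEN 64

struct name_mapping {
	char advertised[NAME_MAX_LEN];	/* name the server uses on the wire */
	char local[NAME_MAX_LEN];	/* name of the remote in the local config */
};

struct name_map {
	struct name_mapping v[MAX_MAPPINGS];
	size_t nr;
};

/* Record that the server's name `advertised` refers to local remote `local`. */
static int map_add(struct name_map *m, const char *advertised, const char *local)
{
	assert(m != NULL);

	if (m->nr == MAX_MAPPINGS ||
	    strlen(advertised) >= NAME_MAX_LEN || strlen(local) >= NAME_MAX_LEN)
		return -1;
	strcpy(m->v[m->nr].advertised, advertised);
	strcpy(m->v[m->nr].local, local);
	m->nr++;
	return 0;
}

/* Look up the local remote behind an advertised name, or NULL if unknown. */
static const char *map_local_name(const struct name_map *m, const char *advertised)
{
	for (size_t i = 0; i < m->nr; i++)
		if (!strcmp(m->v[i].advertised, advertised))
			return m->v[i].local;
	return NULL;
}

/* Renaming the local remote leaves the advertised name untouched. */
static int map_rename_local(struct name_map *m, const char *old_local,
			    const char *new_local)
{
	if (strlen(new_local) >= NAME_MAX_LEN)
		return -1;
	for (size_t i = 0; i < m->nr; i++) {
		if (!strcmp(m->v[i].local, old_local)) {
			strcpy(m->v[i].local, new_local);
			return 0;
		}
	}
	return -1;
}
```

In the terms used above: after the name N has been mapped to B and the user renames B to C, looking up N yields C, so the server's name keeps referring to the same remote.
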
On Tue, Sep 10, 2024 at 06:29:59PM +0200, Christian Couder wrote:
[snip]
> +static void filter_promisor_remote(struct repository *repo,
> +				   struct strvec *accepted,
> +				   const char *info)
> +{
> +	struct strbuf **remotes;
> +	char *accept_str;
> +	enum accept_promisor accept = ACCEPT_NONE;
> +
> +	if (!git_config_get_string("promisor.acceptfromserver", &accept_str)) {
> +		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))
> +			accept = ACCEPT_NONE;
> +		else if (!strcasecmp("All", accept_str))
> +			accept = ACCEPT_ALL;
> +		else
> +			warning(_("unknown '%s' value for '%s' config option"),
> +				accept_str, "promisor.acceptfromserver");
> +	}
> +
> +	if (accept == ACCEPT_NONE)
> +		return;

This code path is leaking memory because we don't free `accept_str`.
Once you reroll, I'd propose to have below patch on top to fix the leak.

Patrick

diff --git a/promisor-remote.c b/promisor-remote.c
index 06507b2ee1..0a4f7f1188 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -424,12 +424,12 @@ static void filter_promisor_remote(struct repository *repo,
 				   const char *info)
 {
 	struct strbuf **remotes;
-	char *accept_str;
+	const char *accept_str;
 	enum accept_promisor accept = ACCEPT_NONE;
 	struct strvec names = STRVEC_INIT;
 	struct strvec urls = STRVEC_INIT;
 
-	if (!git_config_get_string("promisor.acceptfromserver", &accept_str)) {
+	if (!git_config_get_string_tmp("promisor.acceptfromserver", &accept_str)) {
 		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))
 			accept = ACCEPT_NONE;
 		else if (!strcasecmp("KnownUrl", accept_str))
@@ -486,7 +486,6 @@ static void filter_promisor_remote(struct repository *repo,
 		free(decoded_url);
 	}
 
-	free(accept_str);
 	strvec_clear(&names);
 	strvec_clear(&urls);
 	strbuf_list_free(remotes);

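[Editorial illustration] The leak comes from an owning getter combined with an early return; the proposed fix switches to a borrowing getter so that no exit path needs a `free()`. The standalone sketch below mimics that difference without using Git's API: `get_string_owned()`, `get_string_borrowed()` and the `live_copies` counter are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *stored_value = "None";	/* stand-in for the config cache */
static int live_copies;				/* owned copies not yet released */

/* Owning getter: *dest is a fresh copy that the caller must release. */
static int get_string_owned(char **dest)
{
	size_t len = strlen(stored_value) + 1;

	assert(dest != NULL);
	*dest = malloc(len);
	if (!*dest)
		return -1;
	memcpy(*dest, stored_value, len);
	live_copies++;
	return 0;
}

static void release_owned(char *s)
{
	free(s);
	live_copies--;
}

/* Borrowing getter: *dest points into the cache and must not be freed. */
static int get_string_borrowed(const char **dest)
{
	assert(dest != NULL);
	*dest = stored_value;
	return 0;
}

/* Owned variant: every path after a successful get must release the copy. */
static int accept_is_none_owned(void)
{
	char *s;
	int is_none;

	if (get_string_owned(&s))
		return 1;
	is_none = !strcmp(s, "None");
	release_owned(s);	/* forgetting this before a return is the leak */
	return is_none;
}

/* Borrowed variant: early returns are harmless, nothing to release. */
static int accept_is_none_borrowed(void)
{
	const char *s;

	if (get_string_borrowed(&s))
		return 1;
	if (!strcmp(s, "None"))
		return 1;
	return 0;
}
```

The borrowed variant mirrors the intent of the fix: with nothing to release, the early `return` for the "None" case cannot leak.
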
diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt index 98c5cb2ec2..9cbfe3e59e 100644 --- a/Documentation/config/promisor.txt +++ b/Documentation/config/promisor.txt @@ -1,3 +1,20 @@ promisor.quiet:: If set to "true" assume `--quiet` when fetching additional objects for a partial clone. + +promisor.advertise:: + If set to "true", a server will use the "promisor-remote" + capability, see linkgit:gitprotocol-v2[5], to advertise the + promisor remotes it is using, if it uses some. Default is + "false", which means the "promisor-remote" capability is not + advertised. + +promisor.acceptFromServer:: + If set to "all", a client will accept all the promisor remotes + a server might advertise using the "promisor-remote" + capability. Default is "none", which means no promisor remote + advertised by a server will be accepted. By accepting a + promisor remote, the client agrees that the server might omit + objects that are lazily fetchable from this promisor remote + from its responses to "fetch" and "clone" requests from the + client. See linkgit:gitprotocol-v2[5]. diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt index 414bc625d5..65d5256baf 100644 --- a/Documentation/gitprotocol-v2.txt +++ b/Documentation/gitprotocol-v2.txt @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus save themselves and the server(s) the request(s) needed to inspect the headers of that bundle or bundles. +promisor-remote=<pr-infos> +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The server may advertise some promisor remotes it is using or knows +about to a client which may want to use them as its promisor remotes, +instead of this repository. 
In this case <pr-infos> should be of the
+form:
+
+	pr-infos = pr-info | pr-infos ";" pr-info
+
+	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
+
+where `pr-name` is the urlencoded name of a promisor remote, and
+`pr-url` the urlencoded URL of that promisor remote.
+
+In this case, if the client decides to use one or more promisor
+remotes the server advertised, it can reply with
+"promisor-remote=<pr-names>" where <pr-names> should be of the form:
+
+	pr-names = pr-name | pr-names ";" pr-name
+
+where `pr-name` is the urlencoded name of a promisor remote the server
+advertised and the client accepts.
+
+Note that, everywhere in this document, `pr-name` MUST be a valid
+remote name, and the ';' and ',' characters MUST be encoded if they
+appear in `pr-name` or `pr-url`.
+
+If the server doesn't know any promisor remote that could be good for
+a client to use, or prefers a client not to use any promisor remote it
+uses or knows about, it shouldn't advertise the "promisor-remote"
+capability at all.
+
+In this case, or if the client doesn't want to use any promisor remote
+the server advertised, the client shouldn't advertise the
+"promisor-remote" capability at all in its reply.
+
+The "promisor.advertise" and "promisor.acceptFromServer" configuration
+options can be used on the server and client side respectively to
+control what they advertise or accept respectively. See the
+documentation of these configuration options for more information.
+
+Note that in the future it would be nice if the "promisor-remote"
+protocol capability could be used by the server, when responding to
+`git fetch` or `git clone`, to advertise better-connected remotes that
+the client can use as promisor remotes, instead of this repository, so
+that the client can lazily fetch objects from these other
+better-connected remotes. This would require the server to omit in its
+response the objects available on the better-connected remotes that
+the client has accepted. This hasn't been implemented yet though. So
+for now this "promisor-remote" capability is useful only when the
+server advertises some promisor remotes it already uses to borrow
+objects from.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/connect.c b/connect.c
index cf84e631e9..1650bbd71d 100644
--- a/connect.c
+++ b/connect.c
@@ -20,6 +20,7 @@
 #include "protocol.h"
 #include "alias.h"
 #include "bundle-uri.h"
+#include "promisor-remote.h"

 static char *server_capabilities_v1;
 static struct strvec server_capabilities_v2 = STRVEC_INIT;
@@ -485,6 +486,7 @@ void check_stateless_delimiter(int stateless_rpc,
 static void send_capabilities(int fd_out, struct packet_reader *reader)
 {
 	const char *hash_name;
+	const char *promisor_remote_info;

 	if (server_supports_v2("agent"))
 		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
@@ -498,6 +500,13 @@ static void send_capabilities(int fd_out, struct packet_reader *reader)
 	} else {
 		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 	}
+	if (server_feature_v2("promisor-remote", &promisor_remote_info)) {
+		char *reply = promisor_remote_reply(promisor_remote_info);
+		if (reply) {
+			packet_write_fmt(fd_out, "promisor-remote=%s", reply);
+			free(reply);
+		}
+	}
 }

 int get_remote_bundle_uri(int fd_out, struct packet_reader *reader,
diff --git a/promisor-remote.c b/promisor-remote.c
index 317e1b127f..baacbe9d94 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -11,6 +11,7 @@
 #include "strvec.h"
 #include "packfile.h"
 #include "environment.h"
+#include "url.h"

 struct promisor_remote_config {
 	struct promisor_remote *promisors;
@@ -219,6 +220,18 @@ int repo_has_promisor_remote(struct repository *r)
 	return !!repo_promisor_remote_find(r, NULL);
 }

+int repo_has_accepted_promisor_remote(struct repository *r)
+{
+	struct promisor_remote *p;
+
+	promisor_remote_init(r);
+
+	for (p = r->promisor_remote_config->promisors; p; p = p->next)
+		if (p->accepted)
+			return 1;
+	return 0;
+}
+
 static int remove_fetched_oids(struct repository *repo,
 			       struct object_id **oids,
 			       int oid_nr, int to_free)
@@ -290,3 +303,188 @@ void promisor_remote_get_direct(struct repository *repo,
 	if (to_free)
 		free(remaining_oids);
 }
+
+static int allow_unsanitized(char ch)
+{
+	if (ch == ',' || ch == ';' || ch == '%')
+		return 0;
+	return ch > 32 && ch < 127;
+}
+
+static void promisor_info_vecs(struct repository *repo,
+			       struct strvec *names,
+			       struct strvec *urls)
+{
+	struct promisor_remote *r;
+
+	promisor_remote_init(repo);
+
+	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
+		char *url;
+		char *url_key = xstrfmt("remote.%s.url", r->name);
+
+		strvec_push(names, r->name);
+		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+
+		free(url);
+		free(url_key);
+	}
+}
+
+char *promisor_remote_info(struct repository *repo)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int advertise_promisors = 0;
+	struct strvec names = STRVEC_INIT;
+	struct strvec urls = STRVEC_INIT;
+
+	git_config_get_bool("promisor.advertise", &advertise_promisors);
+
+	if (!advertise_promisors)
+		return NULL;
+
+	promisor_info_vecs(repo, &names, &urls);
+
+	if (!names.nr)
+		return NULL;
+
+	for (size_t i = 0; i < names.nr; i++) {
+		if (i)
+			strbuf_addch(&sb, ';');
+		strbuf_addstr(&sb, "name=");
+		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
+		if (urls.v[i]) {
+			strbuf_addstr(&sb, ",url=");
+			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
+		}
+	}
+
+	strbuf_sanitize(&sb);
+
+	strvec_clear(&names);
+	strvec_clear(&urls);
+
+	return strbuf_detach(&sb, NULL);
+}
+
+enum accept_promisor {
+	ACCEPT_NONE = 0,
+	ACCEPT_ALL
+};
+
+static int should_accept_remote(enum accept_promisor accept,
+				const char *remote_name UNUSED,
+				const char *remote_url UNUSED)
+{
+	if (accept == ACCEPT_ALL)
+		return 1;
+
+	BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+}
+
+static void filter_promisor_remote(struct repository *repo,
+				   struct strvec *accepted,
+				   const char *info)
+{
+	struct strbuf **remotes;
+	char *accept_str;
+	enum accept_promisor accept = ACCEPT_NONE;
+
+	if (!git_config_get_string("promisor.acceptfromserver", &accept_str)) {
+		if (!accept_str || !*accept_str || !strcasecmp("None", accept_str))
+			accept = ACCEPT_NONE;
+		else if (!strcasecmp("All", accept_str))
+			accept = ACCEPT_ALL;
+		else
+			warning(_("unknown '%s' value for '%s' config option"),
+				accept_str, "promisor.acceptfromserver");
+	}
+
+	if (accept == ACCEPT_NONE)
+		return;
+
+	/* Parse remote info received */
+
+	remotes = strbuf_split_str(info, ';', 0);
+
+	for (size_t i = 0; remotes[i]; i++) {
+		struct strbuf **elems;
+		const char *remote_name = NULL;
+		const char *remote_url = NULL;
+		char *decoded_name = NULL;
+		char *decoded_url = NULL;
+
+		strbuf_trim_trailing_ch(remotes[i], ';');
+		elems = strbuf_split_str(remotes[i]->buf, ',', 0);
+
+		for (size_t j = 0; elems[j]; j++) {
+			int res;
+			strbuf_trim_trailing_ch(elems[j], ',');
+			res = skip_prefix(elems[j]->buf, "name=", &remote_name) ||
+				skip_prefix(elems[j]->buf, "url=", &remote_url);
+			if (!res)
+				warning(_("unknown element '%s' from remote info"),
+					elems[j]->buf);
+		}
+
+		if (remote_name)
+			decoded_name = url_percent_decode(remote_name);
+		if (remote_url)
+			decoded_url = url_percent_decode(remote_url);
+
+		if (decoded_name && should_accept_remote(accept, decoded_name, decoded_url))
+			strvec_push(accepted, decoded_name);
+
+		strbuf_list_free(elems);
+		free(decoded_name);
+		free(decoded_url);
+	}
+
+	free(accept_str);
+	strbuf_list_free(remotes);
+}
+
+char *promisor_remote_reply(const char *info)
+{
+	struct strvec accepted = STRVEC_INIT;
+	struct strbuf reply = STRBUF_INIT;
+
+	filter_promisor_remote(the_repository, &accepted, info);
+
+	if (!accepted.nr)
+		return NULL;
+
+	for (size_t i = 0; i < accepted.nr; i++) {
+		if (i)
+			strbuf_addch(&reply, ';');
+		strbuf_addstr_urlencode(&reply, accepted.v[i], allow_unsanitized);
+	}
+
+	strvec_clear(&accepted);
+
+	return strbuf_detach(&reply, NULL);
+}
+
+void mark_promisor_remotes_as_accepted(struct repository *r, const char *remotes)
+{
+	struct strbuf **accepted_remotes = strbuf_split_str(remotes, ';', 0);
+
+	for (size_t i = 0; accepted_remotes[i]; i++) {
+		struct promisor_remote *p;
+		char *decoded_remote;
+
+		strbuf_trim_trailing_ch(accepted_remotes[i], ';');
+		decoded_remote = url_percent_decode(accepted_remotes[i]->buf);
+
+		p = repo_promisor_remote_find(r, decoded_remote);
+		if (p)
+			p->accepted = 1;
+		else
+			warning(_("accepted promisor remote '%s' not found"),
+				decoded_remote);
+
+		free(decoded_remote);
+	}
+
+	strbuf_list_free(accepted_remotes);
+}
diff --git a/promisor-remote.h b/promisor-remote.h
index 88cb599c39..814ca248c7 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -9,11 +9,13 @@ struct object_id;
  * Promisor remote linked list
  *
  * Information in its fields come from remote.XXX config entries or
- * from extensions.partialclone.
+ * from extensions.partialclone, except for 'accepted' which comes
+ * from protocol v2 capabilities exchange.
  */
 struct promisor_remote {
 	struct promisor_remote *next;
 	char *partial_clone_filter;
+	unsigned int accepted : 1;
 	const char name[FLEX_ARRAY];
 };

@@ -32,4 +34,36 @@ void promisor_remote_get_direct(struct repository *repo,
 				const struct object_id *oids,
 				int oid_nr);

+/*
+ * Prepare a "promisor-remote" advertisement by a server.
+ * Check the value of "promisor.advertise" and maybe the configured
+ * promisor remotes, if any, to prepare information to send in an
+ * advertisement.
+ * Return value is NULL if no promisor remote advertisement should be
+ * made. Otherwise it contains the names and urls of the advertised
+ * promisor remotes separated by ';'
+ */
+char *promisor_remote_info(struct repository *repo);
+
+/*
+ * Prepare a reply to a "promisor-remote" advertisement from a server.
+ * Check the value of "promisor.acceptfromserver" and maybe the
+ * configured promisor remotes, if any, to prepare the reply.
+ * Return value is NULL if no promisor remote from the server
+ * is accepted. Otherwise it contains the names of the accepted promisor
+ * remotes separated by ';'.
+ */
+char *promisor_remote_reply(const char *info);
+
+/*
+ * Set the 'accepted' flag for some promisor remotes. Useful when some
+ * promisor remotes have been accepted by the client.
+ */
+void mark_promisor_remotes_as_accepted(struct repository *repo, const char *remotes);
+
+/*
+ * Has any promisor remote been accepted by the client?
+ */
+int repo_has_accepted_promisor_remote(struct repository *r);
+
 #endif /* PROMISOR_REMOTE_H */
diff --git a/serve.c b/serve.c
index 884cd84ca8..a8935571d6 100644
--- a/serve.c
+++ b/serve.c
@@ -12,6 +12,7 @@
 #include "upload-pack.h"
 #include "bundle-uri.h"
 #include "trace2.h"
+#include "promisor-remote.h"

 static int advertise_sid = -1;
 static int advertise_object_info = -1;
@@ -31,6 +32,26 @@ static int agent_advertise(struct repository *r UNUSED,
 	return 1;
 }

+static int promisor_remote_advertise(struct repository *r,
+				     struct strbuf *value)
+{
+	if (value) {
+		char *info = promisor_remote_info(r);
+		if (!info)
+			return 0;
+		strbuf_addstr(value, info);
+		free(info);
+	}
+	return 1;
+}
+
+static void promisor_remote_receive(struct repository *r,
+				    const char *remotes)
+{
+	mark_promisor_remotes_as_accepted(r, remotes);
+}
+
+
 static int object_format_advertise(struct repository *r,
 				   struct strbuf *value)
 {
@@ -157,6 +178,11 @@ static struct protocol_capability capabilities[] = {
 		.advertise = bundle_uri_advertise,
 		.command = bundle_uri_command,
 	},
+	{
+		.name = "promisor-remote",
+		.advertise = promisor_remote_advertise,
+		.receive = promisor_remote_receive,
+	},
 };

 void protocol_v2_advertise_capabilities(void)
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
new file mode 100755
index 0000000000..7e44ad15ce
--- /dev/null
+++ b/t/t5710-promisor-remote-capability.sh
@@ -0,0 +1,124 @@
+#!/bin/sh
+
+test_description='handling of promisor remote advertisement'
+
+. ./test-lib.sh
+
+# Setup the repository with three commits, this way HEAD is always
+# available and we can hide commit 1 or 2.
+test_expect_success 'setup: create "template" repository' '
+	git init template &&
+	test_commit -C template 1 &&
+	test_commit -C template 2 &&
+	test_commit -C template 3 &&
+	test-tool genrandom foo 10240 >template/foo &&
+	git -C template add foo &&
+	git -C template commit -m foo
+'
+
+# A bare repo will act as a server repo with unpacked objects.
+test_expect_success 'setup: create bare "server" repository' '
+	git clone --bare --no-local template server &&
+	mv server/objects/pack/pack-* . &&
+	packfile=$(ls pack-*.pack) &&
+	git -C server unpack-objects --strict <"$packfile"
+'
+
+check_missing_objects () {
+	git -C "$1" rev-list --objects --all --missing=print > all.txt &&
+	perl -ne 'print if s/^[?]//' all.txt >missing.txt &&
+	test_line_count = "$2" missing.txt &&
+	test "$3" = "$(cat missing.txt)"
+}
+
+initialize_server () {
+	# Repack everything first
+	git -C server -c repack.writebitmaps=false repack -a -d &&
+
+	# Remove promisor file in case they exist, useful when reinitializing
+	rm -rf server/objects/pack/*.promisor &&
+
+	# Repack without the largest object and create a promisor pack on server
+	git -C server -c repack.writebitmaps=false repack -a -d \
+	    --filter=blob:limit=5k --filter-to="$(pwd)" &&
+	promisor_file=$(ls server/objects/pack/*.pack | sed "s/\.pack/.promisor/") &&
+	touch "$promisor_file" &&
+
+	# Check that only one object is missing on the server
+	check_missing_objects server 1 "$oid"
+}
+
+test_expect_success "setup for testing promisor remote advertisement" '
+	# Create another bare repo called "server2"
+	git init --bare server2 &&
+
+	# Copy the largest object from server to server2
+	obj="HEAD:foo" &&
+	oid="$(git -C server rev-parse $obj)" &&
+	oid_path="$(test_oid_to_path $oid)" &&
+	path="server/objects/$oid_path" &&
+	path2="server2/objects/$oid_path" &&
+	mkdir -p $(dirname "$path2") &&
+	cp "$path" "$path2" &&
+
+	initialize_server &&
+
+	# Configure server2 as promisor remote for server
+	git -C server remote add server2 "file://$(pwd)/server2" &&
+	git -C server config remote.server2.promisor true &&
+
+	git -C server2 config uploadpack.allowFilter true &&
+	git -C server2 config uploadpack.allowAnySHA1InWant true &&
+	git -C server config uploadpack.allowFilter true &&
+	git -C server config uploadpack.allowAnySHA1InWant true
+'
+
+test_expect_success "fetch with promisor.advertise set to 'true'" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=All \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "fetch with promisor.advertise set to 'false'" '
+	git -C server config promisor.advertise false &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=All \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server
+'
+
+test_expect_success "fetch with promisor.acceptfromserver set to 'None'" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.server2.promisor=true \
+		-c remote.server2.fetch="+refs/heads/*:refs/remotes/server2/*" \
+		-c remote.server2.url="file://$(pwd)/server2" \
+		-c promisor.acceptfromserver=None \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 ""
+'
+
+test_done
diff --git a/upload-pack.c b/upload-pack.c
index 0052c6a4dc..0cff76c845 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -31,6 +31,7 @@
 #include "write-or-die.h"
 #include "json-writer.h"
 #include "strmap.h"
+#include "promisor-remote.h"

 /* Remember to update object flag allocation in object.h */
 #define THEY_HAVE	(1u << 11)
@@ -317,6 +318,8 @@ static void create_pack_file(struct upload_pack_data *pack_data,
 		strvec_push(&pack_objects.args, "--delta-base-offset");
 	if (pack_data->use_include_tag)
 		strvec_push(&pack_objects.args, "--include-tag");
+	if (repo_has_accepted_promisor_remote(the_repository))
+		strvec_push(&pack_objects.args, "--missing=allow-promisor");
 	if (pack_data->filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&pack_data->filter_options);
When a server S knows that some objects from a repository are available
from a promisor remote X, S might want to suggest to a client C cloning
or fetching the repo from S that C should use X directly instead of S
for these objects.

Note that this could happen both in the case S itself doesn't have the
objects and borrows them from X, and in the case S has the objects but
knows that X is better connected to the world (e.g., it is in a
$LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections)
than S. Implementation of the latter case, which would require S to
omit in its response the objects available on X, is left for future
improvement though.

Then C might or might not want to get the objects from X, and should
let S know about this.

To allow S and C to agree and let each other know about C using X or
not, let's introduce a new "promisor-remote" capability in the
protocol v2, as well as a few new configuration variables:

  - "promisor.advertise" on the server side, and:
  - "promisor.acceptFromServer" on the client side.

By default, or if "promisor.advertise" is set to 'false', a server S
will not advertise the "promisor-remote" capability.

If S doesn't advertise the "promisor-remote" capability, then a client
C replying to S shouldn't advertise the "promisor-remote" capability
either.

If "promisor.advertise" is set to 'true', S will advertise its promisor
remotes with a string like:

  promisor-remote=<pr-info>[;<pr-info>]...

where each <pr-info> element contains information about a single
promisor remote in the form:

  name=<pr-name>[,url=<pr-url>]

where <pr-name> is the urlencoded name of a promisor remote and
<pr-url> is the urlencoded URL of the promisor remote named <pr-name>.

For now, the URL is passed in addition to the name. In the future, it
might be possible to pass other information like a filter-spec that the
client should use when cloning from S, or a token that the client
should use when retrieving objects from X.
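
To make the advertised format concrete, here is a minimal standalone
sketch of how such a string could be assembled. It is not the code from
this patch, which relies on Git's strbuf helpers. The names
`struct pr_entry`, `is_plain()`, `append_literal()`, `append_encoded()`
and `build_pr_infos()` are hypothetical and exist only for this
illustration. The escaping rule is assumed to match allow_unsanitized()
in the patch: ',', ';', '%' and anything outside printable ASCII get
percent-encoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for illustration; not a structure from the patch. */
struct pr_entry {
	const char *name;
	const char *url;	/* may be NULL: the url= field is optional */
};

/* Assumed to mirror allow_unsanitized(): ',', ';' and '%' are escaped. */
static int is_plain(unsigned char ch)
{
	if (ch == ',' || ch == ';' || ch == '%')
		return 0;
	return ch > 32 && ch < 127;
}

/* Append s verbatim to the NUL-terminated buffer; -1 if it does not fit. */
static int append_literal(char *out, size_t cap, const char *s)
{
	size_t len = strlen(out), n = strlen(s);

	if (len + n + 1 > cap)
		return -1;
	memcpy(out + len, s, n + 1);
	return 0;
}

/* Append the percent-encoded form of src; -1 if it does not fit. */
static int append_encoded(char *out, size_t cap, const char *src)
{
	size_t len = strlen(out);

	for (; *src; src++) {
		unsigned char ch = (unsigned char)*src;

		/* Worst case per input byte: "%xx" plus the trailing NUL. */
		if (len + 4 > cap)
			return -1;
		if (is_plain(ch))
			out[len++] = (char)ch;
		else
			len += (size_t)snprintf(out + len, 4, "%%%02x",
						(unsigned int)ch);
		out[len] = '\0';
	}
	return 0;
}

/* Build "<pr-info>[;<pr-info>]..." into out; 0 on success, -1 on overflow. */
int build_pr_infos(char *out, size_t cap, const struct pr_entry *e, size_t nr)
{
	assert(out && cap > 0);
	out[0] = '\0';

	for (size_t i = 0; i < nr; i++) {
		if (i && append_literal(out, cap, ";"))
			return -1;
		if (append_literal(out, cap, "name=") ||
		    append_encoded(out, cap, e[i].name))
			return -1;
		if (e[i].url &&
		    (append_literal(out, cap, ",url=") ||
		     append_encoded(out, cap, e[i].url)))
			return -1;
	}
	return 0;
}
```

With two made-up entries, "server2" at "https://example.com/x,y" and
"backup" without a URL, this sketch should yield
"name=server2,url=https://example.com/x%2cy;name=backup", which S would
then send after the "promisor-remote=" prefix.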

It might also be possible in the future for "promisor.advertise" to have
other values. For example a value like "onlyName" could prevent S from
advertising URLs, which could help in case C should use a different URL
for X than the URL S is using. (The URL S is using might be an internal
one on the server side for example.)

By default or if "promisor.acceptFromServer" is set to "None", C will
not accept to use the promisor remotes that might have been advertised
by S. In this case, C will not advertise any "promisor-remote"
capability in its reply to S.

If "promisor.acceptFromServer" is set to "All" and S advertised some
promisor remotes, then on the contrary, C will accept to use all the
promisor remotes that S advertised and C will reply with a string like:

  promisor-remote=<pr-name>[;<pr-name>]...

where the <pr-name> elements are the urlencoded names of all the
promisor remotes S advertised.

In a following commit, other values for "promisor.acceptFromServer" will
be implemented, so that C will be able to decide the promisor remotes it
accepts depending on the name and URL it received from S. So even if
that name and URL information is not used much right now, it will be
needed soon.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.txt     |  17 +++
 Documentation/gitprotocol-v2.txt      |  54 +++++++
 connect.c                             |   9 ++
 promisor-remote.c                     | 198 ++++++++++++++++++++++++++
 promisor-remote.h                     |  36 ++++-
 serve.c                               |  26 ++++
 t/t5710-promisor-remote-capability.sh | 124 ++++++++++++++++
 upload-pack.c                         |   3 +
 8 files changed, 466 insertions(+), 1 deletion(-)
 create mode 100755 t/t5710-promisor-remote-capability.sh
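
For the client side described in the commit message, a rough standalone
sketch of the "All" behaviour could look like the following. It is only
an illustration under stated assumptions: `build_accept_all_reply()` is
a hypothetical name, and unlike filter_promisor_remote() in the patch
(which percent-decodes each name and re-encodes it for the reply) this
sketch copies each name through in its already-encoded form and takes
only the first name= field of each <pr-info>.

```c
#include <assert.h>
#include <string.h>

/*
 * Turn "<pr-info>[;<pr-info>]..." into "<pr-name>[;<pr-name>]...".
 * Returns the number of names accepted, or -1 if out is too small.
 */
int build_accept_all_reply(const char *info, char *out, size_t cap)
{
	size_t len = 0;
	int accepted = 0;

	assert(info && out && cap > 0);
	out[0] = '\0';

	while (*info) {
		/* One pr-info runs up to the next ';' or the end of input. */
		const char *end = info + strcspn(info, ";");
		const char *field = info;

		while (field < end) {
			/* Fields inside a pr-info are separated by ','. */
			size_t field_len = strcspn(field, ",;");

			if (field_len > 5 && !strncmp(field, "name=", 5)) {
				size_t n = field_len - 5;

				/* Room for an optional ';', the name, and NUL. */
				if (len + n + 2 > cap)
					return -1;
				if (accepted)
					out[len++] = ';';
				memcpy(out + len, field + 5, n);
				len += n;
				out[len] = '\0';
				accepted++;
				break;
			}
			field += field_len;
			if (field < end)
				field++;	/* skip the ',' separator */
		}

		info = end;
		if (*info == ';')
			info++;
	}
	return accepted;
}
```

A caller would send "promisor-remote=" followed by the buffer only when
the return value is greater than zero. That matches the rule above that
C does not advertise the capability at all when it accepts nothing.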