Message ID | 32e16dfdbd3f54c0d4b39cbd555e66aa3950fffd.1611686656.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Cloning with remote unborn HEAD | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > When cloning, we choose the default branch based on the remote HEAD. > But if there is no remote HEAD reported (which could happen if the > target of the remote HEAD is unborn), we'll fall back to using our local > init.defaultBranch. Traditionally this hasn't been a big deal, because > most repos used "master" as the default. But these days it is likely to > cause confusion if the server and client implementations choose > different values (e.g., if the remote started with "main", we may choose > "master" locally, create commits there, and then the user is surprised > when they push to "master" and not "main"). > > To solve this, the remote needs to communicate the target of the HEAD > symref, even if it is unborn, and "git clone" needs to use this > information. > > Currently, symrefs that have unborn targets (such as in this case) are > not communicated by the protocol. Teach Git to advertise and support the > "unborn" feature in "ls-refs" (by default, this is advertised, but > server administrators may turn this off through the lsrefs.allowunborn > config). This feature indicates that "ls-refs" supports the "unborn" > argument; when it is specified, "ls-refs" will send the HEAD symref with > the name of its unborn target. > > This change is only for protocol v2. A similar change for protocol v0 > would require independent protocol design (there being no analogous > position to signal support for "unborn") and client-side plumbing of the > data required, so the scope of this patch set is limited to protocol v2. > > The client side will be updated to use this in a subsequent commit. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > Documentation/config.txt | 2 + > Documentation/config/lsrefs.txt | 3 ++ > Documentation/technical/protocol-v2.txt | 10 ++++- > ls-refs.c | 53 +++++++++++++++++++++++-- > ls-refs.h | 1 + > serve.c | 2 +- > t/t5701-git-serve.sh | 2 +- > 7 files changed, 66 insertions(+), 7 deletions(-) > create mode 100644 Documentation/config/lsrefs.txt > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 6ba50b1104..d08e83a148 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -398,6 +398,8 @@ include::config/interactive.txt[] > > include::config/log.txt[] > > +include::config/lsrefs.txt[] > + > include::config/mailinfo.txt[] > > include::config/mailmap.txt[] > diff --git a/Documentation/config/lsrefs.txt b/Documentation/config/lsrefs.txt > new file mode 100644 > index 0000000000..dcbec11aaa > --- /dev/null > +++ b/Documentation/config/lsrefs.txt > @@ -0,0 +1,3 @@ > +lsrefs.allowUnborn:: > + Allow the server to send information about unborn symrefs during the > + protocol v2 ref advertisement. > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt > index 85daeb5d9e..4707511c10 100644 > --- a/Documentation/technical/protocol-v2.txt > +++ b/Documentation/technical/protocol-v2.txt > @@ -192,11 +192,19 @@ ls-refs takes in the following arguments: > When specified, only references having a prefix matching one of > the provided prefixes are displayed. > > +If the 'unborn' feature is advertised the following argument can be > +included in the client's request. > + > + unborn > + The server may send symrefs pointing to unborn branches in the form > + "unborn <refname> symref-target:<target>". > + > The output of ls-refs is as follows: > > output = *ref > flush-pkt > - ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF) > + obj-id-or-unborn = (obj-id | "unborn") > + ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF) > ref-attribute = (symref | peeled) > symref = "symref-target:" symref-target > peeled = "peeled:" obj-id > diff --git a/ls-refs.c b/ls-refs.c > index a1e0b473e4..4077adeb6a 100644 > --- a/ls-refs.c > +++ b/ls-refs.c > @@ -32,6 +32,8 @@ struct ls_refs_data { > unsigned peel; > unsigned symrefs; > struct strvec prefixes; > + unsigned allow_unborn : 1; > + unsigned unborn : 1; > }; > > static int send_ref(const char *refname, const struct object_id *oid, > @@ -47,7 +49,10 @@ static int send_ref(const char *refname, const struct object_id *oid, > if (!ref_match(&data->prefixes, refname_nons)) > return 0; > > - strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons); > + if (oid) > + strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons); > + else > + strbuf_addf(&refline, "unborn %s", refname_nons); When a call is made to this helper with NULL "oid", it unconditionally sends the "refname" out as an 'unborn' thing. If data->symrefs is not true, or flag does not have REF_ISSYMREF set, then we'd end up sending "unborn" SP refname LF without any ref-attribute. The caller is responsible for ensuring that it passes sensible data->symrefs and flag when it passes oid==NULL to this function, but it is OK because this is a private helper. OK. > if (data->symrefs && flag & REF_ISSYMREF) { > struct object_id unused; > const char *symref_target = resolve_ref_unsafe(refname, 0, > @@ -74,8 +79,30 @@ static int send_ref(const char *refname, const struct object_id *oid, > return 0; > } > > -static int ls_refs_config(const char *var, const char *value, void *data) > +static void send_possibly_unborn_head(struct ls_refs_data *data) > { > + struct strbuf namespaced = STRBUF_INIT; > + struct object_id oid; > + int flag; > + int oid_is_null; > + > + strbuf_addf(&namespaced, "%sHEAD", get_git_namespace()); > + if (!resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag)) > + return; /* bad ref */ > + oid_is_null = is_null_oid(&oid); > + if (!oid_is_null || > + (data->unborn && data->symrefs && (flag & REF_ISSYMREF))) > + send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data); And this caller makes sure that send_ref()'s expectation holds. > + strbuf_release(&namespaced); > +} > + > +static int ls_refs_config(const char *var, const char *value, void *cb_data) > +{ > + struct ls_refs_data *data = cb_data; > + > + if (!strcmp("lsrefs.allowunborn", var)) > + data->allow_unborn = git_config_bool(var, value); > + > /* > * We only serve fetches over v2 for now, so respect only "uploadpack" > * config. This may need to eventually be expanded to "receive", but we > @@ -91,7 +118,8 @@ int ls_refs(struct repository *r, struct strvec *keys, > > memset(&data, 0, sizeof(data)); > > - git_config(ls_refs_config, NULL); > + data.allow_unborn = 1; > + git_config(ls_refs_config, &data); The above is a usual sequence of "an unspecified allow-unborn defaults to true, but the configuration can turn it off". OK > > while (packet_reader_read(request) == PACKET_READ_NORMAL) { > const char *arg = request->line; > @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys, > data.symrefs = 1; > else if (skip_prefix(arg, "ref-prefix ", &out)) > strvec_push(&data.prefixes, out); > + else if (data.allow_unborn && !strcmp("unborn", arg)) > + data.unborn = 1; I think the use of &&-cascade is iffy here. Even when we are *not* accepting request for unborn, we should still parse it as such. This does not matter in today's code, but it is a basic courtesy for future developers who may add more "else if" after it. IOW else if (!strcmp("unborn", arg)) { if (!data.allow_unborn) ; /* we are not accepting the request */ else data.unborn = 1; } I wrote the above in longhand only for documentation purposes; in practice, else if (!strcmp("unborn", arg)) data.unborn = data.allow_unborn; may suffice. > } > > if (request->status != PACKET_READ_FLUSH) > die(_("expected flush after ls-refs arguments")); > > - head_ref_namespaced(send_ref, &data); > + send_possibly_unborn_head(&data); > for_each_namespaced_ref(send_ref, &data); And here is another caller of send_ref(). Are we sure that send_ref()'s expectation is satisfied by this caller when the iteration encounters a broken ref (e.g. refs/heads/broken not a symref but names an object that does not exist and get_sha1() yielding 0{40}), or a dangling symref (e.g. refs/remotes/origin/HEAD pointing at something that does not exist)? > packet_flush(1); > strvec_clear(&data.prefixes); > return 0; > } > + > +int ls_refs_advertise(struct repository *r, struct strbuf *value) > +{ > + if (value) { > + int allow_unborn_value; > + > + if (repo_config_get_bool(the_repository, > + "lsrefs.allowunborn", > + &allow_unborn_value) || > + allow_unborn_value) > + strbuf_addstr(value, "unborn"); > + } This reads "when not explicitly disabled, stuff "unborn" in there". It feels somewhat brittle that we have to read the same variable and apply the same "default to true" logic in two places and have to keep them in sync. Is this because the decision to advertize or not has to be made way before the code that is specific to the implementation of ls-refs is run? If ls_refs_advertise() is always called first before ls_refs(), I wonder if it makes sense to reuse what we found out about the configured (or left unconfigured) state here and use it when ls_refs() gets called? I know that the way serve.c infrastructure calls "do we advertise?" helper from each protocol-element handler is too narrow and does not allow us to pass such a necessary piece of information but I view it as a misdesign that can be corrected (and until that happens, we could use file-local static limited to ls-refs.c). > + return 1; > +} > diff --git a/ls-refs.h b/ls-refs.h > index 7b33a7c6b8..a99e4be0bd 100644 > --- a/ls-refs.h > +++ b/ls-refs.h > @@ -6,5 +6,6 @@ struct strvec; > struct packet_reader; > int ls_refs(struct repository *r, struct strvec *keys, > struct packet_reader *request); > +int ls_refs_advertise(struct repository *r, struct strbuf *value); > > #endif /* LS_REFS_H */ > diff --git a/serve.c b/serve.c > index eec2fe6f29..ac20c72763 100644 > --- a/serve.c > +++ b/serve.c > @@ -73,7 +73,7 @@ struct protocol_capability { > > static struct protocol_capability capabilities[] = { > { "agent", agent_advertise, NULL }, > - { "ls-refs", always_advertise, ls_refs }, > + { "ls-refs", ls_refs_advertise, ls_refs }, > { "fetch", upload_pack_advertise, upload_pack_v2 }, > { "server-option", always_advertise, NULL }, > { "object-format", object_format_advertise, NULL }, > diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh > index a1f5fdc9fd..df29504161 100755 > --- a/t/t5701-git-serve.sh > +++ b/t/t5701-git-serve.sh > @@ -12,7 +12,7 @@ test_expect_success 'test capability advertisement' ' > cat >expect <<-EOF && > version 2 > agent=git/$(git version | cut -d" " -f3) > - ls-refs > + ls-refs=unborn > fetch=shallow > server-option > object-format=$(test_oid algo)
Junio C Hamano <gitster@pobox.com> writes: > It feels somewhat brittle that we have to read the same variable and > apply the same "default to true" logic in two places and have to > keep them in sync. Is this because the decision to advertize or not > has to be made way before the code that is specific to the > implementation of ls-refs is run? > > If ls_refs_advertise() is always called first before ls_refs(), I > wonder if it makes sense to reuse what we found out about the > configured (or left unconfigured) state here and use it when > ls_refs() gets called? I know that the way serve.c infrastructure > calls "do we advertise?" helper from each protocol-element handler > is too narrow and does not allow us to pass such a necessary piece > of information but I view it as a misdesign that can be corrected > (and until that happens, we could use file-local static limited to > ls-refs.c). After giving the above a bit more thought, here are a few random thoughts around the area. * As "struct protocol_capability" indicates, we have <name of service, the logic to advertise, the logic to serve> as a three-tuple for services. The serving logic should know what advertising logic advertised (or more precisely, what information advertising logic used to make that decision) so that they can work consistently. For that, there should be a mechanism that advertising logic can use to leave a note to serving logic, perhaps by adding a "void *" to both of these functions. The advertising function would allocate a piece of memory it wants to use and returns the pointer to it to the caller in serve.c, and that pointer is given to the corresponding ls_refs() when it is called by serve.c. Then ls_refs_advertise can say "I found this configuration setting and decided to advertise" to later ls_refs() and the latter can say "ah, as you have advertised, I have to respond to such a request". * I am not sure if "lsrefs.allowunborn = yes/no" is a good way to configure this feature. Wouldn't it be more natural to make this three-way, i.e. "lsrefs.unborn = advertise/serve/ignore", where the server operator can choose among (1) advertise the presence of the capability and respond to requests, (2) do not advertise the capability but if a request comes, respond to it, and (3) do not advertise and do not respond. We could throw in 'deny' that causes the request to result in a failure but I do not care too deeply about that fourth option. Using such a configuration mechanism, ls_refs_advertise may leave the value of "lsrefs.unborn" (or lack thereof) it found and used to base its decision to advertise, for use by ls_refs. ls_refs in turn can use the value found there to decide if it ignores or responds to the "unborn" request.
On Tue, Jan 26, 2021 at 01:38:49PM -0800, Junio C Hamano wrote: > > @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys, > > data.symrefs = 1; > > else if (skip_prefix(arg, "ref-prefix ", &out)) > > strvec_push(&data.prefixes, out); > > + else if (data.allow_unborn && !strcmp("unborn", arg)) > > + data.unborn = 1; > > I think the use of &&-cascade is iffy here. Even when we are *not* > accepting request for unborn, we should still parse it as such. > This does not matter in today's code, but it is a basic courtesy for > future developers who may add more "else if" after it. > > IOW > > else if (!strcmp("unborn", arg)) { > if (!data.allow_unborn) > ; /* we are not accepting the request */ > else > data.unborn = 1; > } > > I wrote the above in longhand only for documentation purposes; in > practice, > > else if (!strcmp("unborn", arg)) > data.unborn = data.allow_unborn; > > may suffice. Doing it that way is friendlier, but loosens enforcement of: Client will then send a space separated list of capabilities it wants to be in effect. The client MUST NOT ask for capabilities the server did not say it supports. from Documentation/technical/protocol-capabilities.txt. It does solve Jonathan's racy cluster-deploy problem, though. See the discussion in the v4 thread (sorry, seems not to have hit the archive yet, but hopefully this link will work soon): https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/ -Peff
Jeff King <peff@peff.net> writes: > See the > discussion in the v4 thread (sorry, seems not to have hit the archive > yet, but hopefully this link will work soon): > > https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/ I guess vger having some sort of constipation, we (not you-and-me but list participants as a whole) would be doing this kind of back-and-force while untable to read what others have said. https://lore.kernel.org/git/xmqqmtwvz4g9.fsf@gitster.c.googlers.com/ will have the following. > It feels somewhat brittle that we have to read the same variable and > apply the same "default to true" logic in two places and have to > keep them in sync. Is this because the decision to advertize or not > has to be made way before the code that is specific to the > implementation of ls-refs is run? > > If ls_refs_advertise() is always called first before ls_refs(), I > wonder if it makes sense to reuse what we found out about the > configured (or left unconfigured) state here and use it when > ls_refs() gets called? I know that the way serve.c infrastructure > calls "do we advertise?" helper from each protocol-element handler > is too narrow and does not allow us to pass such a necessary piece > of information but I view it as a misdesign that can be corrected > (and until that happens, we could use file-local static limited to > ls-refs.c). After giving the above a bit more thought, here are a few random thoughts around the area. * As "struct protocol_capability" indicates, we have <name of service, the logic to advertise, the logic to serve> as a three-tuple for services. The serving logic should know what advertising logic advertised (or more precisely, what information advertising logic used to make that decision) so that they can work consistently. For that, there should be a mechanism that advertising logic can use to leave a note to serving logic, perhaps by adding a "void *" to both of these functions. The advertising function would allocate a piece of memory it wants to use and returns the pointer to it to the caller in serve.c, and that pointer is given to the corresponding ls_refs() when it is called by serve.c. Then ls_refs_advertise can say "I found this configuration setting and decided to advertise" to later ls_refs() and the latter can say "ah, as you have advertised, I have to respond to such a request". * I am not sure if "lsrefs.allowunborn = yes/no" is a good way to configure this feature. Wouldn't it be more natural to make this three-way, i.e. "lsrefs.unborn = advertise/serve/ignore", where the server operator can choose among (1) advertise the presence of the capability and respond to requests, (2) do not advertise the capability but if a request comes, respond to it, and (3) do not advertise and do not respond. We could throw in 'deny' that causes the request to result in a failure but I do not care too deeply about that fourth option. Using such a configuration mechanism, ls_refs_advertise may leave the value of "lsrefs.unborn" (or lack thereof) it found and used to base its decision to advertise, for use by ls_refs. ls_refs in turn can use the value found there to decide if it ignores or responds to the "unborn" request.
On Tue, Jan 26 2021, Jonathan Tan wrote: > +If the 'unborn' feature is advertised the following argument can be > +included in the client's request. > + > + unborn > + The server may send symrefs pointing to unborn branches in the form > + "unborn <refname> symref-target:<target>". > + "branches" as in things under refs/heads/*? What should happen if you send this for a refs/tags/* or refs/xyz/*? Maybe overly pedantic, but it seems we have no other explicit mention of refs/{heads,tags}/ in protocol-v2.txt before this[1]. 1. Although as I've learned from another recent thread include-tag is magical for refs/tags/* only.
> > @@ -47,7 +49,10 @@ static int send_ref(const char *refname, const struct object_id *oid, > > if (!ref_match(&data->prefixes, refname_nons)) > > return 0; > > > > - strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons); > > + if (oid) > > + strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons); > > + else > > + strbuf_addf(&refline, "unborn %s", refname_nons); > > When a call is made to this helper with NULL "oid", it unconditionally > sends the "refname" out as an 'unborn' thing. If data->symrefs is not > true, or flag does not have REF_ISSYMREF set, then we'd end up > sending > > "unborn" SP refname LF > > without any ref-attribute. The caller is responsible for ensuring > that it passes sensible data->symrefs and flag when it passes > oid==NULL to this function, but it is OK because this is a private > helper. > > OK. Thanks for checking. > > if (data->symrefs && flag & REF_ISSYMREF) { > > struct object_id unused; > > const char *symref_target = resolve_ref_unsafe(refname, 0, > > @@ -74,8 +79,30 @@ static int send_ref(const char *refname, const struct object_id *oid, > > return 0; > > } > > > > -static int ls_refs_config(const char *var, const char *value, void *data) > > +static void send_possibly_unborn_head(struct ls_refs_data *data) > > { > > + struct strbuf namespaced = STRBUF_INIT; > > + struct object_id oid; > > + int flag; > > + int oid_is_null; > > + > > + strbuf_addf(&namespaced, "%sHEAD", get_git_namespace()); > > + if (!resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag)) > > + return; /* bad ref */ > > + oid_is_null = is_null_oid(&oid); > > + if (!oid_is_null || > > + (data->unborn && data->symrefs && (flag & REF_ISSYMREF))) > > + send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data); > > And this caller makes sure that send_ref()'s expectation holds. Thanks for checking. > > + strbuf_release(&namespaced); > > +} > > + > > +static int ls_refs_config(const char *var, const char *value, void *cb_data) > > +{ > > + struct ls_refs_data *data = cb_data; > > + > > + if (!strcmp("lsrefs.allowunborn", var)) > > + data->allow_unborn = git_config_bool(var, value); > > + > > /* > > * We only serve fetches over v2 for now, so respect only "uploadpack" > > * config. This may need to eventually be expanded to "receive", but we > > @@ -91,7 +118,8 @@ int ls_refs(struct repository *r, struct strvec *keys, > > > > memset(&data, 0, sizeof(data)); > > > > - git_config(ls_refs_config, NULL); > > + data.allow_unborn = 1; > > + git_config(ls_refs_config, &data); > > The above is a usual sequence of "an unspecified allow-unborn > defaults to true, but the configuration can turn it off". OK Later, you address this issue again so I'll comment there. > > @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys, > > data.symrefs = 1; > > else if (skip_prefix(arg, "ref-prefix ", &out)) > > strvec_push(&data.prefixes, out); > > + else if (data.allow_unborn && !strcmp("unborn", arg)) > > + data.unborn = 1; > > I think the use of &&-cascade is iffy here. Even when we are *not* > accepting request for unborn, we should still parse it as such. > This does not matter in today's code, but it is a basic courtesy for > future developers who may add more "else if" after it. > > IOW > > else if (!strcmp("unborn", arg)) { > if (!data.allow_unborn) > ; /* we are not accepting the request */ > else > data.unborn = 1; > } > > I wrote the above in longhand only for documentation purposes; in > practice, > > else if (!strcmp("unborn", arg)) > data.unborn = data.allow_unborn; > > may suffice. My thinking was (and is) that falling through in the case of a disallowed argument (as opposed to a completely unrecognized argument) makes it more straightforward later if we ever decide to tighten validation of the ls-refs request - we would only have to put some code at the end that reports back to the user. If we write it as you suggest, we would have to remember to replace the "we are not accepting the request" part (as in the comment in your suggested code) with an error report, but perhaps that is a good thing - we would be able to insert a custom error message instead of an information-hiding "argument not supported". I'm OK either way. > > } > > > > if (request->status != PACKET_READ_FLUSH) > > die(_("expected flush after ls-refs arguments")); > > > > - head_ref_namespaced(send_ref, &data); > > + send_possibly_unborn_head(&data); > > for_each_namespaced_ref(send_ref, &data); > > And here is another caller of send_ref(). Are we sure that > send_ref()'s expectation is satisfied by this caller when the > iteration encounters a broken ref (e.g. refs/heads/broken not a > symref but names an object that does not exist and get_sha1() > yielding 0{40}), or a dangling symref (e.g. refs/remotes/origin/HEAD > pointing at something that does not exist)? I assume that by "this caller" you mean for_each_namespaced_ref(), since you mention an iteration. I believe so - send_ref has been changed to tolerate a NULL (as in (void*)0, not 0{40}) oid, and that is the only change, so if it worked previously, it should still work now. > > packet_flush(1); > > strvec_clear(&data.prefixes); > > return 0; > > } > > + > > +int ls_refs_advertise(struct repository *r, struct strbuf *value) > > +{ > > + if (value) { > > + int allow_unborn_value; > > + > > + if (repo_config_get_bool(the_repository, > > + "lsrefs.allowunborn", > > + &allow_unborn_value) || > > + allow_unborn_value) > > + strbuf_addstr(value, "unborn"); > > + } > > This reads "when not explicitly disabled, stuff "unborn" in there". > > It feels somewhat brittle that we have to read the same variable and > apply the same "default to true" logic in two places and have to > keep them in sync. Is this because the decision to advertize or not > has to be made way before the code that is specific to the > implementation of ls-refs is run? > > If ls_refs_advertise() is always called first before ls_refs(), I > wonder if it makes sense to reuse what we found out about the > configured (or left unconfigured) state here and use it when > ls_refs() gets called? I know that the way serve.c infrastructure > calls "do we advertise?" helper from each protocol-element handler > is too narrow and does not allow us to pass such a necessary piece > of information but I view it as a misdesign that can be corrected > (and until that happens, we could use file-local static limited to > ls-refs.c). Perhaps what I could do is have a static variable that tracks whether config has been read and what the config is (or if the default variable is used), and have each function call another function that sets that static variable if config has not yet been read. I think that will address this concern.
Jonathan Tan <jonathantanmy@google.com> writes: >> I think the use of &&-cascade is iffy here. Even when we are *not* >> accepting request for unborn, we should still parse it as such. >> This does not matter in today's code, but it is a basic courtesy for >> future developers who may add more "else if" after it. >> >> IOW >> >> else if (!strcmp("unborn", arg)) { >> if (!data.allow_unborn) >> ; /* we are not accepting the request */ >> else >> data.unborn = 1; >> } >> >> I wrote the above in longhand only for documentation purposes; in >> practice, >> >> else if (!strcmp("unborn", arg)) >> data.unborn = data.allow_unborn; >> >> may suffice. > > My thinking was (and is) that falling through in the case of a > disallowed argument (as opposed to a completely unrecognized argument) > makes it more straightforward later if we ever decide to tighten > validation of the ls-refs request - we would only have to put some code > at the end that reports back to the user. Sorry, I do not quite follow. If "unborn" is conditionally allowed, you can extend what I suggested above like so: if (we see we got an unborn request) { - if (allowed) + if (partially allowed) + record that we got unborn request and will + partially respond to it + else if (allowed) record that we got unborn request; + else + report that we don't accept unborn request; } This will matter even more if you write more else-if. The downstream of else-if clauses are forced to interpret (and fail) "unborn" request they are not interested in. >> > if (request->status != PACKET_READ_FLUSH) >> > die(_("expected flush after ls-refs arguments")); >> > >> > - head_ref_namespaced(send_ref, &data); >> > + send_possibly_unborn_head(&data); >> > for_each_namespaced_ref(send_ref, &data); >> >> And here is another caller of send_ref(). Are we sure that >> send_ref()'s expectation is satisfied by this caller when the >> iteration encounters a broken ref (e.g. refs/heads/broken not a >> symref but names an object that does not exist and get_sha1() >> yielding 0{40}), or a dangling symref (e.g. refs/remotes/origin/HEAD >> pointing at something that does not exist)? > > I assume that by "this caller" you mean for_each_namespaced_ref(), since > you mention an iteration. I believe so - send_ref has been changed to > tolerate a NULL (as in (void*)0, not 0{40}) oid, and that is the only > change, so if it worked previously, it should still work now. So a dangling symref, e.g. "refs/remotes/origin/HEAD -> trunk" when no "refs/remotes/origin/trunk" exists, is not reported to send_ref() in the same way as an unborn "HEAD"? I would have expected that we'd report where it points at, and for that to work, you'd have to use not just the vanilla send_ref() as the callback, but something that knows how to do "are we expected to send unborn symrefs" logic, like send_possibly_unborn_head does. That "changed to tolerate ... should work" worries me. If "for_each_namespaced_ref(send_ref, &data)" will never call send_ref() with NULL (as in (void *)0) oid, then that would be OK, but if it ends up calling with NULL somehow, it is responsible to ensure that data->symrefs is true and flag has REF_ISSYMREF set, or send_ref() would misbehave, (see the first part of your message, which I am responding to), no?
> Junio C Hamano <gitster@pobox.com> writes: > > > It feels somewhat brittle that we have to read the same variable and > > apply the same "default to true" logic in two places and have to > > keep them in sync. Is this because the decision to advertize or not > > has to be made way before the code that is specific to the > > implementation of ls-refs is run? > > > > If ls_refs_advertise() is always called first before ls_refs(), I > > wonder if it makes sense to reuse what we found out about the > > configured (or left unconfigured) state here and use it when > > ls_refs() gets called? I know that the way serve.c infrastructure > > calls "do we advertise?" helper from each protocol-element handler > > is too narrow and does not allow us to pass such a necessary piece > > of information but I view it as a misdesign that can be corrected > > (and until that happens, we could use file-local static limited to > > ls-refs.c). > > After giving the above a bit more thought, here are a few random > thoughts around the area. > > * As "struct protocol_capability" indicates, we have <name of > service, the logic to advertise, the logic to serve> as a > three-tuple for services. The serving logic should know what > advertising logic advertised (or more precisely, what information > advertising logic used to make that decision) so that they can > work consistently. > > For that, there should be a mechanism that advertising logic can > use to leave a note to serving logic, perhaps by adding a "void > *" to both of these functions. The advertising function would > allocate a piece of memory it wants to use and returns the > pointer to it to the caller in serve.c, and that pointer is given > to the corresponding ls_refs() when it is called by serve.c. > Then ls_refs_advertise can say "I found this configuration > setting and decided to advertise" to later ls_refs() and the > latter can say "ah, as you have advertised, I have to respond to > such a request". Usually the advertising is in the same file as the serving (true for ls-refs and fetch, so far) so I think it's easier if they just communicate on their own instead of through this "void *". I agree with the communication idea, though. > * I am not sure if "lsrefs.allowunborn = yes/no" is a good way to > configure this feature. Wouldn't it be more natural to make this > three-way, i.e. "lsrefs.unborn = advertise/serve/ignore", where > the server operator can choose among (1) advertise the presence > of the capability and respond to requests, (2) do not advertise > the capability but if a request comes, respond to it, and (3) do > not advertise and do not respond. We could throw in 'deny' that > causes the request to result in a failure but I do not care too > deeply about that fourth option. > > Using such a configuration mechanism, ls_refs_advertise may leave > the value of "lsrefs.unborn" (or lack thereof) it found and used > to base its decision to advertise, for use by ls_refs. ls_refs > in turn can use the value found there to decide if it ignores or > responds to the "unborn" request. lsrefs.unborn = advertise/serve/ignore was how it was in version 2 [1] (with different names) and I changed it due to Peff's suggestion, but perhaps I landed up in the unhappy middle [2]. I think we should just pick a standard and use it for this feature and whatever future features may come. The distinction between advertise and serve ("allow" in my version 2) is useful in some cases but is useless once migration has occurred (and thus clutters the code), but perhaps it could be argued that all servers need to do the migration at least once. [1] https://lore.kernel.org/git/cover.1608084282.git.jonathantanmy@google.com/ [2] https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/
> > On Tue, Jan 26 2021, Jonathan Tan wrote: > > > +If the 'unborn' feature is advertised the following argument can be > > +included in the client's request. > > + > > + unborn > > + The server may send symrefs pointing to unborn branches in the form > > + "unborn <refname> symref-target:<target>". > > + > > "branches" as in things under refs/heads/*? What should happen if you > send this for a refs/tags/* or refs/xyz/*? Maybe overly pedantic, but it > seems we have no other explicit mention of refs/{heads,tags}/ in > protocol-v2.txt before this[1]. > > 1. Although as I've learned from another recent thread include-tag is > magical for refs/tags/* only. Thanks for spotting this. Right now the server sends anything, but the client only uses the information if it is a branch. I think this is the most flexible approach so I'll keep it this way and document it explicitly.
> So a dangling symref, e.g. "refs/remotes/origin/HEAD -> trunk" when > no "refs/remotes/origin/trunk" exists, is not reported to send_ref() > in the same way as an unborn "HEAD"? I've tried it, and yes, for_each_namespaced_ref() will not report it. Looking at the code, I think it is packed_ref_iterator_advance() which checks for broken objects and skips over them. > I would have expected that we'd > report where it points at, and for that to work, you'd have to use > not just the vanilla send_ref() as the callback, but something that > knows how to do "are we expected to send unborn symrefs" logic, like > send_possibly_unborn_head does. > > That "changed to tolerate ... should work" worries me. > > If "for_each_namespaced_ref(send_ref, &data)" will never call send_ref() > with NULL (as in (void *)0) oid, then that would be OK, If it called send_ref() with (void *)0 OID, it would segfault with the current code, which calls oid_to_hex() on the OID unconditionally. > but if it > ends up calling with NULL somehow, it is responsible to ensure that > data->symrefs is true and flag has REF_ISSYMREF set, or send_ref() > would misbehave, (see the first part of your message, which I am > responding to), no? If it did, then yes.
Jonathan Tan <jonathantanmy@google.com> writes: >> So a dangling symref, e.g. "refs/remotes/origin/HEAD -> trunk" when >> no "refs/remotes/origin/trunk" exists, is not reported to send_ref() >> in the same way as an unborn "HEAD"? > > I've tried it, and yes, for_each_namespaced_ref() will not report it. > Looking at the code, I think it is packed_ref_iterator_advance() which > checks for broken objects and skips over them. > >> I would have expected that we'd >> report where it points at, and for that to work, you'd have to use >> not just the vanilla send_ref() as the callback, but something that >> knows how to do "are we expected to send unborn symrefs" logic, like >> send_possibly_unborn_head does. >> >> That "changed to tolerate ... should work" worries me. >> >> If "for_each_namespaced_ref(send_ref, &data)" will never call send_ref() >> with NULL (as in (void *)0) oid, then that would be OK, > > If it called send_ref() with (void *)0 OID, it would segfault with the > current code, which calls oid_to_hex() on the OID unconditionally. > >> but if it >> ends up calling with NULL somehow, it is responsible to ensure that >> data->symrefs is true and flag has REF_ISSYMREF set, or send_ref() >> would misbehave, (see the first part of your message, which I am >> responding to), no? > > If it did, then yes. So, in other words, the series is *only* about HEAD and no other symrefs are reported when dangling as "unborn"?
diff --git a/Documentation/config.txt b/Documentation/config.txt index 6ba50b1104..d08e83a148 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -398,6 +398,8 @@ include::config/interactive.txt[] include::config/log.txt[] +include::config/lsrefs.txt[] + include::config/mailinfo.txt[] include::config/mailmap.txt[] diff --git a/Documentation/config/lsrefs.txt b/Documentation/config/lsrefs.txt new file mode 100644 index 0000000000..dcbec11aaa --- /dev/null +++ b/Documentation/config/lsrefs.txt @@ -0,0 +1,3 @@ +lsrefs.allowUnborn:: + Allow the server to send information about unborn symrefs during the + protocol v2 ref advertisement. diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 85daeb5d9e..4707511c10 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -192,11 +192,19 @@ ls-refs takes in the following arguments: When specified, only references having a prefix matching one of the provided prefixes are displayed. +If the 'unborn' feature is advertised the following argument can be +included in the client's request. + + unborn + The server may send symrefs pointing to unborn branches in the form + "unborn <refname> symref-target:<target>". + The output of ls-refs is as follows: output = *ref flush-pkt - ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF) + obj-id-or-unborn = (obj-id | "unborn") + ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF) ref-attribute = (symref | peeled) symref = "symref-target:" symref-target peeled = "peeled:" obj-id diff --git a/ls-refs.c b/ls-refs.c index a1e0b473e4..4077adeb6a 100644 --- a/ls-refs.c +++ b/ls-refs.c @@ -32,6 +32,8 @@ struct ls_refs_data { unsigned peel; unsigned symrefs; struct strvec prefixes; + unsigned allow_unborn : 1; + unsigned unborn : 1; }; static int send_ref(const char *refname, const struct object_id *oid, @@ -47,7 +49,10 @@ static int send_ref(const char *refname, const struct object_id *oid, if (!ref_match(&data->prefixes, refname_nons)) return 0; - strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons); + if (oid) + strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons); + else + strbuf_addf(&refline, "unborn %s", refname_nons); if (data->symrefs && flag & REF_ISSYMREF) { struct object_id unused; const char *symref_target = resolve_ref_unsafe(refname, 0, @@ -74,8 +79,30 @@ static int send_ref(const char *refname, const struct object_id *oid, return 0; } -static int ls_refs_config(const char *var, const char *value, void *data) +static void send_possibly_unborn_head(struct ls_refs_data *data) { + struct strbuf namespaced = STRBUF_INIT; + struct object_id oid; + int flag; + int oid_is_null; + + strbuf_addf(&namespaced, "%sHEAD", get_git_namespace()); + if (!resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag)) + return; /* bad ref */ + oid_is_null = is_null_oid(&oid); + if (!oid_is_null || + (data->unborn && data->symrefs && (flag & REF_ISSYMREF))) + send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data); + strbuf_release(&namespaced); +} + +static int ls_refs_config(const char *var, const char *value, void *cb_data) +{ + struct ls_refs_data *data = cb_data; + + if (!strcmp("lsrefs.allowunborn", var)) + data->allow_unborn = git_config_bool(var, value); + /* * We only serve fetches over v2 for now, so respect only "uploadpack" * config. This may need to eventually be expanded to "receive", but we @@ -91,7 +118,8 @@ int ls_refs(struct repository *r, struct strvec *keys, memset(&data, 0, sizeof(data)); - git_config(ls_refs_config, NULL); + data.allow_unborn = 1; + git_config(ls_refs_config, &data); while (packet_reader_read(request) == PACKET_READ_NORMAL) { const char *arg = request->line; @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys, data.symrefs = 1; else if (skip_prefix(arg, "ref-prefix ", &out)) strvec_push(&data.prefixes, out); + else if (data.allow_unborn && !strcmp("unborn", arg)) + data.unborn = 1; } if (request->status != PACKET_READ_FLUSH) die(_("expected flush after ls-refs arguments")); - head_ref_namespaced(send_ref, &data); + send_possibly_unborn_head(&data); for_each_namespaced_ref(send_ref, &data); packet_flush(1); strvec_clear(&data.prefixes); return 0; } + +int ls_refs_advertise(struct repository *r, struct strbuf *value) +{ + if (value) { + int allow_unborn_value; + + if (repo_config_get_bool(the_repository, + "lsrefs.allowunborn", + &allow_unborn_value) || + allow_unborn_value) + strbuf_addstr(value, "unborn"); + } + + return 1; +} diff --git a/ls-refs.h b/ls-refs.h index 7b33a7c6b8..a99e4be0bd 100644 --- a/ls-refs.h +++ b/ls-refs.h @@ -6,5 +6,6 @@ struct strvec; struct packet_reader; int ls_refs(struct repository *r, struct strvec *keys, struct packet_reader *request); +int ls_refs_advertise(struct repository *r, struct strbuf *value); #endif /* LS_REFS_H */ diff --git a/serve.c b/serve.c index eec2fe6f29..ac20c72763 100644 --- a/serve.c +++ b/serve.c @@ -73,7 +73,7 @@ struct protocol_capability { static struct protocol_capability capabilities[] = { { "agent", agent_advertise, NULL }, - { "ls-refs", always_advertise, ls_refs }, + { "ls-refs", ls_refs_advertise, ls_refs }, { "fetch", upload_pack_advertise, upload_pack_v2 }, { "server-option", always_advertise, NULL }, { "object-format", object_format_advertise, NULL }, diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh index a1f5fdc9fd..df29504161 100755 --- a/t/t5701-git-serve.sh +++ b/t/t5701-git-serve.sh @@ -12,7 +12,7 @@ test_expect_success 'test capability advertisement' ' cat >expect <<-EOF && version 2 agent=git/$(git version | cut -d" " -f3) - ls-refs + ls-refs=unborn fetch=shallow server-option object-format=$(test_oid algo)