Message ID | 20240423212818.574123-1-knayak@gitlab.com (mailing list archive) |
---|---|
Headers | show |
Series | refs: add symref support to 'git-update-ref' | expand |
On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote: > Changes from v2: > 1. We no longer have separate commands for symrefs, instead the regular > commands learn to parse 'ref:<target>' as symref targets. This reduces > the code in this series. Thanks Patrick for the suggestion. Hmm. I can see how this makes things a lot simpler, but it introduces an ambiguity, since you can pass full ref expressions to "update-ref" (like "ref:foo" to find the "foo" entry in ref^{tree}). I see that you only kick in the symref "ref:" handling if the regular oid lookup failed, so there's no backwards-compatibility issue (anything that used to work will still take precedence, and the new code only runs when the old code would have reported an error). But I wonder if it would let somebody cause mischief in a repository they can push to, but which may get updates from other sources. For example, imagine a forge like GitLab runs the equivalent of: echo "create refs/whatever ref:refs/heads/main" | git update-ref --stdin as part of some system process. Now if I push up "refs/heads/ref" that contains the path "refs/heads/main" in its tree, that will take precedence, causing the system process to do something it did not expect. I think you'd have to pile on a lot of assumptions to get any kind of security problem. Something like: 1. The system has a hidden ref namespace like refs/gitlab that normal remote push/fetch users are not allowed to read/write to. 2. The system tries to make a symlink within that namespace. Say, "refs/gitlab/metadata/HEAD" to point to "refs/gitlab/metadata/branches/main" or something. 3. The user pushes up "refs/heads/ref" with a tree that contains "refs/gitlab/metadata/branches/main". Now when (2) happens, the hidden ref points to user-controlled data. That's pretty convoluted. But we can avoid it entirely if there's no ambiguity in the protocol at all. -Peff
Jeff King <peff@peff.net> writes: > That's pretty convoluted. But we can avoid it entirely if there's no > ambiguity in the protocol at all. ;-).
Jeff King <peff@peff.net> writes: > On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote: > >> Changes from v2: >> 1. We no longer have separate commands for symrefs, instead the regular >> commands learn to parse 'ref:<target>' as symref targets. This reduces >> the code in this series. Thanks Patrick for the suggestion. > > Hmm. I can see how this makes things a lot simpler, but it introduces an > ambiguity, since you can pass full ref expressions to "update-ref" (like > "ref:foo" to find the "foo" entry in ref^{tree}). I see that you only > kick in the symref "ref:" handling if the regular oid lookup failed, so > there's no backwards-compatibility issue (anything that used to work > will still take precedence, and the new code only runs when the old code > would have reported an error). > > But I wonder if it would let somebody cause mischief in a repository > they can push to, but which may get updates from other sources. For > example, imagine a forge like GitLab runs the equivalent of: > > echo "create refs/whatever ref:refs/heads/main" | > git update-ref --stdin > > as part of some system process. Now if I push up "refs/heads/ref" that > contains the path "refs/heads/main" in its tree, that will take > precedence, causing the system process to do something it did not > expect. > > I think you'd have to pile on a lot of assumptions to get any kind of > security problem. Something like: > > 1. The system has a hidden ref namespace like refs/gitlab that normal > remote push/fetch users are not allowed to read/write to. > > 2. The system tries to make a symlink within that namespace. Say, > "refs/gitlab/metadata/HEAD" to point to > "refs/gitlab/metadata/branches/main" or something. > > 3. The user pushes up "refs/heads/ref" with a tree that contains > "refs/gitlab/metadata/branches/main". Now when (2) happens, the > hidden ref points to user-controlled data. > > That's pretty convoluted. But we can avoid it entirely if there's no > ambiguity in the protocol at all. > > -Peff Thanks Peff, that is indeed something I totally missed thinking about. This also brings light onto the previous versions we were considering: symref-update SP <ref> SP <new-target> [SP (<old-target> | <old-oid>)] LF There is also some ambiguity here which we missed, especially when we support dangling refs. If we're updating a dangling ref <ref>, and we provide an old value. Then there is uncertainty around whether the provided value is actually a <old-target> or if it's an <old-oid>. For non dangling ref symref, we first parse it as an <old-target> and since the <old-target> would exist, we can move on. So I see two ways to go about this, 1. In the symref-update function, we need to parse and see if <ref> is a regular ref or a symref, if it is symref, we simply set the provided old value as <old-target>, if not, we set it as <old-oid>. This seems clunky because we'll be parsing the ref and trying to understand its type in 'update-ref.c', before the actual update. 2. We change the syntax to something like symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid <old-oid>)] LF this would remove any ambiguity since the user specifies the data type they're providing. Overall, I think it is best to discard this version and I will push v4 with the older schematics of introducing new commands. I'm currently considering going ahead with the [2], but will wait for a day or two to consider other opinions. Also on a sidenote, it's worth considering that with the direction of [2], we could also extrapolate to introduce {verify, update, create, delete} v2, which support both symrefs and regular refs. But require explicit types from the user: update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL [(oid <old-oid> | ref <old-target>)] NUL create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL This is similar to the v3 patches I've currently sent out, in that it would also allow cross operations between regular refs and symrefs.
On Wed, Apr 24, 2024 at 09:25:27AM -0700, Karthik Nayak wrote: > Jeff King <peff@peff.net> writes: > > On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote: [snip] > Also on a sidenote, it's worth considering that with the direction of > [2], we could also extrapolate to introduce {verify, update, create, > delete} v2, which support both symrefs and regular refs. But require > explicit types from the user: > > update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL > [(oid <old-oid> | ref <old-target>)] NUL > create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL > delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL > verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL > > This is similar to the v3 patches I've currently sent out, in that it > would also allow cross operations between regular refs and symrefs. One could put this new syntax behind a feature flag to avoid the "-v2" suffixes, e.g. `git update-ref --with-symrefs`. But I'm not sure whether this would be beneficial. Patrick
You'd want something like this squashed into an appropriate step to avoid breaking "make sparse". diff --git a/builtin/update-ref.c b/builtin/update-ref.c index 175579148f..1cdafc33f3 100644 --- a/builtin/update-ref.c +++ b/builtin/update-ref.c @@ -228,7 +228,7 @@ static void parse_cmd_update(struct ref_transaction *transaction, have_old = !parse_next_arg(&next, end, &old_oid, &old_target, "update", refname, PARSE_SHA1_OLD | PARSE_REFNAME_TARGETS); - have_old = have_old & !old_target.len; + have_old = have_old && !old_target.len; if (*next != line_termination) die("update %s: extra input: %s", refname, next);
Karthik Nayak <karthik.188@gmail.com> writes: > 2. We change the syntax to something like > > symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid > <old-oid>)] LF > > this would remove any ambiguity since the user specifies the data type > they're providing. Yup. Being explicit helps, especially if you are only dealing with programs that do not complain "that's too many keystrokes" like pesky humans ;-). When the topic's overall title is to add support for symbolic refs to the update-ref command, a natural expectation is that in a far enough future everything that can be done with "git symbolic-ref" can be done with "git update-ref" and we can, if we wanted to, depreate "git symbolic-ref". Is that really what is going on here? IOW, we should add support for operation modes other than "--stdin" as well, shouldn't we? Thanks.
Junio C Hamano <gitster@pobox.com> writes: > You'd want something like this squashed into an appropriate step to > avoid breaking "make sparse". > > diff --git a/builtin/update-ref.c b/builtin/update-ref.c > index 175579148f..1cdafc33f3 100644 > --- a/builtin/update-ref.c > +++ b/builtin/update-ref.c > @@ -228,7 +228,7 @@ static void parse_cmd_update(struct ref_transaction *transaction, > have_old = !parse_next_arg(&next, end, &old_oid, > &old_target, "update", refname, > PARSE_SHA1_OLD | PARSE_REFNAME_TARGETS); > - have_old = have_old & !old_target.len; > + have_old = have_old && !old_target.len; > > if (*next != line_termination) > die("update %s: extra input: %s", refname, next); Thanks, will check and squash this in.
Patrick Steinhardt <ps@pks.im> writes: > On Wed, Apr 24, 2024 at 09:25:27AM -0700, Karthik Nayak wrote: >> Jeff King <peff@peff.net> writes: >> > On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote: > [snip] >> Also on a sidenote, it's worth considering that with the direction of >> [2], we could also extrapolate to introduce {verify, update, create, >> delete} v2, which support both symrefs and regular refs. But require >> explicit types from the user: >> >> update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL >> [(oid <old-oid> | ref <old-target>)] NUL >> create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL >> delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL >> verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL >> >> This is similar to the v3 patches I've currently sent out, in that it >> would also allow cross operations between regular refs and symrefs. > > One could put this new syntax behind a feature flag to avoid the "-v2" > suffixes, e.g. `git update-ref --with-symrefs`. But I'm not sure whether > this would be beneficial. > Yeah, me neither. I mean it doesn't provide any new functionality. The only usecase we're missing currently (with old + symref-* commands), is that we have no way to convert a symref to a regular ref while checking the old_value. This could however simply be solved by introducing a new `symref-convert` command.
Junio C Hamano <gitster@pobox.com> writes: > Karthik Nayak <karthik.188@gmail.com> writes: > >> 2. We change the syntax to something like >> >> symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid >> <old-oid>)] LF >> >> this would remove any ambiguity since the user specifies the data type >> they're providing. > > Yup. Being explicit helps, especially if you are only dealing with > programs that do not complain "that's too many keystrokes" like > pesky humans ;-). > > When the topic's overall title is to add support for symbolic refs > to the update-ref command, a natural expectation is that in a far > enough future everything that can be done with "git symbolic-ref" > can be done with "git update-ref" and we can, if we wanted to, > depreate "git symbolic-ref". Is that really what is going on here? > > IOW, we should add support for operation modes other than "--stdin" > as well, shouldn't we? > > Thanks. That's a good point, I was thinking of something like this too, but didn't want to bloat this series. Would it make sense to add this functionality in a follow up series with the cleanup that Patrick mentioned [1] too? [1]: https://lore.kernel.org/git/ZidXrdi7hXdAnDhy@tanuki/
Karthik Nayak <karthik.188@gmail.com> writes: >> IOW, we should add support for operation modes other than "--stdin" >> as well, shouldn't we? >> >> Thanks. > > That's a good point, I was thinking of something like this too, but > didn't want to bloat this series. Would it make sense to add this > functionality in a follow up series with the cleanup that Patrick > mentioned [1] too? Going multi-step is usually a preferred approach but we have to be sure that we do not have to retract the way earlier steps gave our users when we move to later steps. "Earlier we said that your commands should look like this, but now you have to give your commands differently" is absolutely what we want to avoid. So, "in this iteration, you can use this enhanced feature only via the "--stdin" mode, but we promise we will make the same available on the command line" is perfectly fine. You may need to retitle the topic to clarify the scope of each iteration, though. Thansk.
Junio C Hamano <gitster@pobox.com> writes: > Karthik Nayak <karthik.188@gmail.com> writes: > >>> IOW, we should add support for operation modes other than "--stdin" >>> as well, shouldn't we? >>> >>> Thanks. >> >> That's a good point, I was thinking of something like this too, but >> didn't want to bloat this series. Would it make sense to add this >> functionality in a follow up series with the cleanup that Patrick >> mentioned [1] too? > > Going multi-step is usually a preferred approach but we have to be > sure that we do not have to retract the way earlier steps gave our > users when we move to later steps. "Earlier we said that your > commands should look like this, but now you have to give your > commands differently" is absolutely what we want to avoid. > > So, "in this iteration, you can use this enhanced feature only via > the "--stdin" mode, but we promise we will make the same available > on the command line" is perfectly fine. You may need to retitle the > topic to clarify the scope of each iteration, though. > > Thansk. Yeah, will do that. Possibly also some of the commits too, where we could mention that this is being added to the `--stdin` mode and can be extended further on.
On Wed, Apr 24, 2024 at 09:25:27AM -0700, Karthik Nayak wrote: > This also brings light onto the previous versions we were considering: > > symref-update SP <ref> SP <new-target> [SP (<old-target> | <old-oid>)] LF > > There is also some ambiguity here which we missed, especially when we > support dangling refs. If we're updating a dangling ref <ref>, and we > provide an old value. Then there is uncertainty around whether the > provided value is actually a <old-target> or if it's an <old-oid>. > > For non dangling ref symref, we first parse it as an <old-target> and > since the <old-target> would exist, we can move on. > > So I see two ways to go about this, > > 1. In the symref-update function, we need to parse and see if <ref> is a > regular ref or a symref, if it is symref, we simply set the provided old > value as <old-target>, if not, we set it as <old-oid>. This seems clunky > because we'll be parsing the ref and trying to understand its type in > 'update-ref.c', before the actual update. I think this avoids the "mischief" case I mentioned because it is about looking at what is in <ref> currently, not spooky action from a possibly-unrelated "refs/heads/ref". But in general, I think it is a good thing if we can tell what the caller is asking for based only on the syntax of the request, without taking into account repository state. It just makes things easier to reason about. > 2. We change the syntax to something like > > symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid > <old-oid>)] LF > > this would remove any ambiguity since the user specifies the data type > they're providing. Yeah, I was going to suggest that it could be resolved with any syntax that would not be a valid oid name. Certainly check-ref-format places some restrictions there (e.g., no "^") but name resolution relies on that, too (so foo^{tree} is a valid name). Probably something like "^foo" is unambiguous, but it's ugly and hard to explain. ;) But I think your "ref <old-target>" solves that. Resolved names can have spaces in them, but only after a "^" (e.g., "foo^{/some regex}") or ":" (e.g., "foo:path with spaces"). So seeing just "ref" by itself, followed by a space, I think is unambiguous. And it looks pretty. It gets a little tricky when the field delimiter is also space, and the item in question is not the final field. See below. > Also on a sidenote, it's worth considering that with the direction of > [2], we could also extrapolate to introduce {verify, update, create, > delete} v2, which support both symrefs and regular refs. But require > explicit types from the user: > > update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL > [(oid <old-oid> | ref <old-target>)] NUL > create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL > delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL > verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL > > This is similar to the v3 patches I've currently sent out, in that it > would also allow cross operations between regular refs and symrefs. So I _think_ that "<oid> | ref <symref-target>" is unambiguous. In which case you could just have: update SP <ref> NUL (<new-oid> | ref <new-target>) NUL [(<old-oid> | ref <old-target>)] which is backwards-compatible. With the NUL separator it's easy to parse, because "ref " with a trailing space always means a ref, even in the "new" field. But with spaces instead, it gets weird. If you have: update refs/heads/foo refs refs/heads/bar it can either be creating a symref to "bar" (with no "old" specifier) or it could be pointing it at the resolved-name "refs", which the old value coming from "bar". I guess one option would be to only allow "ref" syntax in "-z" mode, but that is probably getting to be a bit weird and hard to explain. -Peff