[v3,0/8] refs: add symref support to 'git-update-ref'

Message ID	20240423212818.574123-1-knayak@gitlab.com (mailing list archive)
Headers	show Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EEA913CFA4 for <git@vger.kernel.org>; Tue, 23 Apr 2024 21:28:23 +0000 (UTC) From: Karthik Nayak <karthik.188@gmail.com> To: karthik.188@gmail.com Cc: chris.torek@gmail.com, git@vger.kernel.org, gitster@pobox.com, ps@pks.im Subject: [PATCH v3 0/8] refs: add symref support to 'git-update-ref' Date: Tue, 23 Apr 2024 23:28:10 +0200 Message-ID: <20240423212818.574123-1-knayak@gitlab.com> In-Reply-To: <20240412095908.1134387-1-knayak@gitlab.com> References: <20240412095908.1134387-1-knayak@gitlab.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	refs: add symref support to 'git-update-ref' \| expand [v3,0/8] refs: add symref support to 'git-update-ref' [v3,1/8] refs: accept symref values in `ref_transaction[_add]_update` [v3,2/8] update-ref: support parsing ref targets in `parse_next_oid` [v3,3/8] files-backend: extract out `create_symref_lock` [v3,4/8] update-ref: support symrefs in the verify command [v3,5/8] update-ref: support symrefs in the delete command [v3,6/8] update-ref: support symrefs in the create command [v3,7/8] update-ref: support symrefs in the update command [v3,8/8] ref: support symrefs in 'reference-transaction' hook

karthik nayak April 23, 2024, 9:28 p.m. UTC

From: Karthik Nayak <karthik.188@gmail.com>

The 'git-update-ref(1)' command allows transactional reference updates.
But currently only supports regular reference updates. Meaning, if one
wants to update HEAD (symbolic ref) in a transaction, there is no tool
to do so.

One option to obtain transactional updates for the HEAD ref is to
manually create the HEAD.lock file and commit. This is intrusive, where
the user needs to mimic internal git behavior. Also, this only works
when using the files backend.

At GitLab, we've been using the manual process till date, to allow users
to set and change their default branch. But with the introduction of
reftables as a reference backend, this becomes a necessity to be solved
within git.

The patch series adds symref support to the existing commands {verify,
create, delete, update} within 'git-update-ref'. This is done by parsing
inputs with the 'ref:' prefix as symref targets. This updates our current
commands to:

update SP <ref> SP (<new-oid> | ref:<new-target>) [SP (<old-oid> | ref:<old-target>)] LF
create SP <ref> SP (<new-oid> | ref:<new-target>) LF
delete SP <ref> [SP (<old-oid> | ref:<old-target>)] LF
verify SP <ref> [SP (<old-oid> | ref:<old-target>)] LF

Wherein, when ref:<new-target> is provided, the update ensures that
the <ref> is a symbolic ref which targets <new-target>. When
ref:<old-target> is provided, we ensure that <ref> is a symbolic ref
which targets <old-target> before the update.

With this it is possible to:
1. Create, verify, delete, and update symrefs
2. Create dangling symrefs
3. Update regular refs to become symrefs
4. Update symrefs to become regular refs

V1 of the patch series can be found here:
https://lore.kernel.org/git/20240330224623.579457-1-knayak@gitlab.com/
V2 of the patch series can be found here:
https://lore.kernel.org/git/20240412095908.1134387-1-knayak@gitlab.com/

Changes from v2:
1. We no longer have separate commands for symrefs, instead the regular
commands learn to parse 'ref:<target>' as symref targets. This reduces
the code in this series. Thanks Patrick for the suggestion.
2. Apart from supporting regular refs => symrefs. We also support the
inverse now.
3. Also allow creation of dangling refs.
4. Bunch of cleanups
   - whitespace issues in the previous patch series
   - uneeded header includes
   - uneeded tests
5. Added more documentation and better error messages, especially in reftables.
6. Better assertions around the input data.

Thanks all for the reviews on the previous iteration. Appreciate the support!

Range diff against v2:

1:  3269d0e91e ! 1:  4e49d54dcc refs: accept symref values in `ref_transaction[_add]_update`
    @@ Commit message
         flags to create a `ref_update` and add it to the transaction at hand.
     
         To extend symref support in transactions, we need to also accept the
    -    old and new ref values and process it. In this commit, let's add the
    -    required paramaters to the function and modify all call sites.
    +    old and new ref targets and process it. In this commit, let's add the
    +    required parameters to the function and modify all call sites.
     
    -    The two paramaters added are `new_ref` and `old_ref`. The `new_ref` is
    -    used to denote what the reference should point to when the transaction
    -    is applied. Some functions allow this parameter to be NULL, meaning that
    -    the reference is not changed, or `""`, meaning that the reference should
    -    be deleted.
    +    The two parameters added are `new_target` and `old_target`. The
    +    `new_target` is used to denote what the reference should point to when
    +    the transaction is applied. Some functions allow this parameter to be
    +    NULL, meaning that the reference is not changed.
     
    -    The `old_ref` denotes the value of that the reference must have before
    -    the update. Some functions allow this parameter to be NULL, meaning that
    -    the old value of the reference is not checked, or `""`, meaning that the
    -    reference must not exist before the update. A copy of this value is made
    +    The `old_target` denotes the value the reference must have before the
    +    update. Some functions allow this parameter to be NULL, meaning that the
    +    old value of the reference is not checked. A copy of this value is made
         in the transaction.
     
         The handling logic of these parameters will be added in consequent
    -    commits as we implement symref-{create, update, delete, verify}.
    +    commits as we add symref support to the existing 'git-update-ref'
    +    commands.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ refs.c: struct ref_update *ref_transaction_add_update(
      		const char *refname, unsigned int flags,
      		const struct object_id *new_oid,
      		const struct object_id *old_oid,
    -+		const char *new_ref, const char *old_ref,
    ++		const char *new_target, const char *old_target,
      		const char *msg)
      {
      	struct ref_update *update;
    +@@ refs.c: struct ref_update *ref_transaction_add_update(
    + 	if (transaction->state != REF_TRANSACTION_OPEN)
    + 		BUG("update called for transaction that is not open");
    + 
    ++	if (old_oid && !is_null_oid(old_oid) && old_target)
    ++		BUG("Only one of old_oid and old_target should be non NULL");
    ++	if (new_oid && !is_null_oid(new_oid) && new_target)
    ++		BUG("Only one of new_oid and new_target should be non NULL");
    ++
    + 	FLEX_ALLOC_STR(update, refname, refname);
    + 	ALLOC_GROW(transaction->updates, transaction->nr + 1, transaction->alloc);
    + 	transaction->updates[transaction->nr++] = update;
     @@ refs.c: int ref_transaction_update(struct ref_transaction *transaction,
      			   const char *refname,
      			   const struct object_id *new_oid,
      			   const struct object_id *old_oid,
    -+			   const char *new_ref, const char *old_ref,
    ++			   const char *new_target,
    ++			   const char *old_target,
      			   unsigned int flags, const char *msg,
      			   struct strbuf *err)
      {
    @@ refs.c: int ref_transaction_update(struct ref_transaction *transaction,
      
      	ref_transaction_add_update(transaction, refname, flags,
     -				   new_oid, old_oid, msg);
    -+				   new_oid, old_oid, new_ref, old_ref, msg);
    ++				   new_oid, old_oid, new_target,
    ++				   old_target, msg);
      	return 0;
      }
      
    @@ refs.c: int refs_update_ref(struct ref_store *refs, const char *msg,
     
      ## refs.h ##
     @@ refs.h: struct ref_transaction *ref_transaction_begin(struct strbuf *err);
    -  */
    - #define REF_SKIP_REFNAME_VERIFICATION (1 << 11)
    - 
    -+/*
    -+ * The reference update is considered to be done on a symbolic reference. This
    -+ * ensures that we verify, delete, create and update the ref correspondingly.
    -+ */
    -+#define REF_SYMREF_UPDATE (1 << 12)
    -+
    - /*
    -  * Bitmask of all of the flags that are allowed to be passed in to
    -  * ref_transaction_update() and friends:
    -  */
    - #define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS                                  \
    - 	(REF_NO_DEREF | REF_FORCE_CREATE_REFLOG | REF_SKIP_OID_VERIFICATION | \
    --	 REF_SKIP_REFNAME_VERIFICATION)
    -+	 REF_SKIP_REFNAME_VERIFICATION | REF_SYMREF_UPDATE )
    - 
    - /*
    -  * Add a reference update to transaction. `new_oid` is the value that
    +  *         before the update. A copy of this value is made in the
    +  *         transaction.
    +  *
    ++ *     new_target -- the target reference that the reference will be
    ++ *         update to point to. This takes precedence over new_oid when
    ++ *         set. If the reference is a regular reference, it will be
    ++ *         converted to a symbolic reference.
    ++ *
    ++ *     old_target -- the reference that the reference must be pointing to.
    ++ *         Will only be taken into account when the reference is a symbolic
    ++ *         reference.
    ++ *
    +  *     flags -- flags affecting the update, passed to
    +  *         update_ref_lock(). Possible flags: REF_NO_DEREF,
    +  *         REF_FORCE_CREATE_REFLOG. See those constants for more
    +@@ refs.h: struct ref_transaction *ref_transaction_begin(struct strbuf *err);
    +  * beforehand. The old value is checked after the lock is taken to
    +  * prevent races. If the old value doesn't agree with old_oid, the
    +  * whole transaction fails. If old_oid is NULL, then the previous
    +- * value is not checked.
    ++ * value is not checked. If `old_target` is not NULL, treat the reference
    ++ * as a symbolic ref and validate that its target before the update is
    ++ * `old_target`. If the `new_target` is not NULL, then the reference
    ++ * will be updated to a symbolic ref which targets `new_target`.
    ++ * Together, these allow us to update between regular refs and symrefs.
    +  *
    +  * See the above comment "Reference transaction updates" for more
    +  * information.
     @@ refs.h: int ref_transaction_update(struct ref_transaction *transaction,
      			   const char *refname,
      			   const struct object_id *new_oid,
      			   const struct object_id *old_oid,
    -+			   const char *new_ref, const char *old_ref,
    ++			   const char *new_target,
    ++			   const char *old_target,
      			   unsigned int flags, const char *msg,
      			   struct strbuf *err);
      
    @@ refs/refs-internal.h: struct ref_update {
      	struct object_id old_oid;
      
     +	/*
    -+	 * If (flags & REF_SYMREF_UPDATE), set the reference to this
    -+	 * value (or delete it, if `new_ref` is an empty string).
    ++	 * If set, point the reference to this value. This can also be
    ++	 * used to convert regular references to become symbolic refs.
     +	 */
    -+	const char *new_ref;
    ++	const char *new_target;
     +
     +	/*
    -+	 * If (type & REF_SYMREF_UPDATE), check that the reference
    -+	 * previously had this value (or didn't previously exist,
    -+	 * if `old_ref` is an empty string).
    ++	 * If set and the reference is a symbolic ref, check that the
    ++	 * reference previously pointed to this value.
     +	 */
    -+	const char *old_ref;
    ++	const char *old_target;
     +
      	/*
      	 * One or more of REF_NO_DEREF, REF_FORCE_CREATE_REFLOG,
    @@ refs/refs-internal.h: struct ref_update *ref_transaction_add_update(
      		const char *refname, unsigned int flags,
      		const struct object_id *new_oid,
      		const struct object_id *old_oid,
    -+		const char *new_ref, const char *old_ref,
    ++		const char *new_target, const char *old_target,
      		const char *msg);
      
      /*
2:  a8cb0e0a1d < -:  ---------- update-ref: add support for symref-verify
3:  37c3e006da < -:  ---------- update-ref: add support for symref-delete
-:  ---------- > 2:  37b7aadca4 update-ref: support parsing ref targets in `parse_next_oid`
4:  53fdb408ef = 3:  9cb7817f94 files-backend: extract out `create_symref_lock`
5:  8fa0151f94 < -:  ---------- update-ref: add support for symref-create
6:  714492ede3 < -:  ---------- update-ref: add support for symref-update
-:  ---------- > 4:  c7f43f6058 update-ref: support symrefs in the verify command
-:  ---------- > 5:  4016d6ca98 update-ref: support symrefs in the delete command
-:  ---------- > 6:  9f19e82f00 update-ref: support symrefs in the create command
-:  ---------- > 7:  132dbfcc5f update-ref: support symrefs in the update command
7:  c483104562 ! 8:  1b709f995b refs: support symrefs in 'reference-transaction' hook
    @@ Metadata
     Author: Karthik Nayak <karthik.188@gmail.com>
     
      ## Commit message ##
    -    refs: support symrefs in 'reference-transaction' hook
    +    ref: support symrefs in 'reference-transaction' hook
     
         The 'reference-transaction' hook runs whenever a reference update is
    -    made to the system. In the previous commits, we added support for
    -    various symref commands in `git-update-ref`. While it allowed us to now
    +    made to the system. In the previous commits, we added symref support for
    +    various commands in `git-update-ref`. While it allowed us to now
         manipulate symbolic refs via `git-update-ref`, it didn't activate the
         'reference-transaction' hook.
     
    @@ Commit message
         new format described for this and we stick to the existing format of:
             <old-value> SP <new-value> SP <ref-name> LF
         but now, <old-value> and <new-value> could also denote references
    -    instead of objects.
    +    instead of objects, where the format is similar to that in
    +    'git-update-ref', i.e. 'ref:<ref-target>'.
     
         While this seems to be backward incompatible, it is okay, since the only
         way the `reference-transaction` hook has refs in its output is when
    -    `git-update-ref` is used with `update-symref` command. Also the
    -    documentation for reference-transaction hook always stated that support
    -    for symbolic references may be added in the future.
    +    `git-update-ref` is used to manipulate symrefs. Also the documentation
    +    for reference-transaction hook always stated that support for symbolic
    +    references may be added in the future.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ Documentation/githooks.txt: given reference transaction is in:
      `<ref-name>` via `git rev-parse`.
      
     +For symbolic reference updates the `<old_value>` and `<new-value>`
    -+fields could denote references instead of objects.
    ++fields could denote references instead of objects, denoted via the
    ++`ref:<ref-target>` format.
     +
      The exit status of the hook is ignored for any state except for the
      "prepared" state. In the "prepared" state, a non-zero exit status will
    @@ refs.c: static int run_transaction_hook(struct ref_transaction *transaction,
      
      	for (i = 0; i < transaction->nr; i++) {
      		struct ref_update *update = transaction->updates[i];
    -+		const char *new_value, *old_value;
    ++		strbuf_reset(&buf);
      
    --		if (update->flags & REF_SYMREF_UPDATE)
    +-		/*
    +-		 * Skip reference transaction for symbolic refs.
    +-		 */
    +-		if (update->new_target || update->old_target)
     -			continue;
    -+		new_value = oid_to_hex(&update->new_oid);
    -+		old_value = oid_to_hex(&update->old_oid);
    -+
    -+		if (update->flags & REF_SYMREF_UPDATE) {
    -+			if (update->flags & REF_HAVE_NEW && !null_new_value(update))
    -+				new_value = update->new_ref;
    -+			if (update->flags & REF_HAVE_OLD && update->old_ref)
    -+				old_value = update->old_ref;
    -+		}
    ++		if (update->flags & REF_HAVE_OLD && update->old_target)
    ++			strbuf_addf(&buf, "ref:%s ", update->old_target);
    ++		else
    ++			strbuf_addf(&buf, "%s ", oid_to_hex(&update->old_oid));
      
    - 		strbuf_reset(&buf);
    +-		strbuf_reset(&buf);
     -		strbuf_addf(&buf, "%s %s %s\n",
     -			    oid_to_hex(&update->old_oid),
     -			    oid_to_hex(&update->new_oid),
     -			    update->refname);
    -+		strbuf_addf(&buf, "%s %s %s\n", old_value, new_value, update->refname);
    ++		if (update->flags & REF_HAVE_NEW && update->new_target)
    ++			strbuf_addf(&buf, "ref:%s ", update->new_target);
    ++		else
    ++			strbuf_addf(&buf, "%s ", oid_to_hex(&update->new_oid));
    ++
    ++		strbuf_addf(&buf, "%s\n", update->refname);
      
      		if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
      			if (errno != EPIPE) {
     
      ## t/t1416-ref-transaction-hooks.sh ##
    -@@ t/t1416-ref-transaction-hooks.sh: test_expect_success 'hook gets all queued updates in aborted state' '
    - 	test_cmp expect actual
    +@@ t/t1416-ref-transaction-hooks.sh: test_expect_success 'interleaving hook calls succeed' '
    + 	test_cmp expect target-repo.git/actual
      '
      
    -+# This test doesn't add a check for 'symref-delete' since there is a
    ++# This test doesn't add a check for symref 'delete' since there is a
     +# variation between the ref backends WRT 'delete'. In the files backend,
     +# 'delete' also triggers an additional transaction update on the
     +# packed-refs backend, which constitutes additional reflog entries.
    - test_expect_success 'interleaving hook calls succeed' '
    - 	test_when_finished "rm -r target-repo.git" &&
    - 
    -@@ t/t1416-ref-transaction-hooks.sh: test_expect_success 'interleaving hook calls succeed' '
    - 	test_cmp expect target-repo.git/actual
    - '
    - 
     +test_expect_success 'hook gets all queued symref updates' '
     +	test_when_finished "rm actual" &&
     +
    @@ t/t1416-ref-transaction-hooks.sh: test_expect_success 'interleaving hook calls s
     +
     +	cat >expect <<-EOF &&
     +		prepared
    -+		refs/heads/main $ZERO_OID refs/heads/symref
    -+		$ZERO_OID refs/heads/main refs/heads/symrefc
    -+		refs/heads/main refs/heads/branch refs/heads/symrefu
    ++		ref:refs/heads/main $ZERO_OID refs/heads/symref
    ++		$ZERO_OID ref:refs/heads/main refs/heads/symrefc
    ++		ref:refs/heads/main ref:refs/heads/branch refs/heads/symrefu
     +		committed
    -+		refs/heads/main $ZERO_OID refs/heads/symref
    -+		$ZERO_OID refs/heads/main refs/heads/symrefc
    -+		refs/heads/main refs/heads/branch refs/heads/symrefu
    ++		ref:refs/heads/main $ZERO_OID refs/heads/symref
    ++		$ZERO_OID ref:refs/heads/main refs/heads/symrefc
    ++		ref:refs/heads/main ref:refs/heads/branch refs/heads/symrefu
     +	EOF
     +
     +	git update-ref --no-deref --stdin <<-EOF &&
     +		start
    -+		symref-verify refs/heads/symref refs/heads/main
    -+		symref-create refs/heads/symrefc refs/heads/main
    -+		symref-update refs/heads/symrefu refs/heads/branch refs/heads/main
    ++		verify refs/heads/symref ref:refs/heads/main
    ++		create refs/heads/symrefc ref:refs/heads/main
    ++		update refs/heads/symrefu ref:refs/heads/branch ref:refs/heads/main
     +		prepare
     +		commit
     +	EOF


Karthik Nayak (8):
  refs: accept symref values in `ref_transaction[_add]_update`
  update-ref: support parsing ref targets in `parse_next_oid`
  files-backend: extract out `create_symref_lock`
  update-ref: support symrefs in the verify command
  update-ref: support symrefs in the delete command
  update-ref: support symrefs in the create command
  update-ref: support symrefs in the update command
  ref: support symrefs in 'reference-transaction' hook

 Documentation/git-update-ref.txt |  41 ++--
 Documentation/githooks.txt       |  14 +-
 branch.c                         |   2 +-
 builtin/clone.c                  |   2 +-
 builtin/fast-import.c            |   5 +-
 builtin/fetch.c                  |   4 +-
 builtin/receive-pack.c           |   4 +-
 builtin/replace.c                |   2 +-
 builtin/tag.c                    |   1 +
 builtin/update-ref.c             | 106 ++++++++--
 refs.c                           |  78 +++++--
 refs.h                           |  23 +-
 refs/files-backend.c             | 141 +++++++++++--
 refs/refs-internal.h             |  20 ++
 refs/reftable-backend.c          |  49 ++++-
 sequencer.c                      |   9 +-
 t/t0600-reffiles-backend.sh      |  32 +++
 t/t1400-update-ref.sh            | 346 ++++++++++++++++++++++++++++++-
 t/t1416-ref-transaction-hooks.sh |  41 ++++
 walker.c                         |   2 +-
 20 files changed, 817 insertions(+), 105 deletions(-)

Jeff King April 23, 2024, 10:03 p.m. UTC | #1

On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote:

> Changes from v2:
> 1. We no longer have separate commands for symrefs, instead the regular
> commands learn to parse 'ref:<target>' as symref targets. This reduces
> the code in this series. Thanks Patrick for the suggestion.

Hmm. I can see how this makes things a lot simpler, but it introduces an
ambiguity, since you can pass full ref expressions to "update-ref" (like
"ref:foo" to find the "foo" entry in ref^{tree}). I see that you only
kick in the symref "ref:" handling if the regular oid lookup failed, so
there's no backwards-compatibility issue (anything that used to work
will still take precedence, and the new code only runs when the old code
would have reported an error).

But I wonder if it would let somebody cause mischief in a repository
they can push to, but which may get updates from other sources. For
example, imagine a forge like GitLab runs the equivalent of:

  echo "create refs/whatever ref:refs/heads/main" |
  git update-ref --stdin

as part of some system process. Now if I push up "refs/heads/ref" that
contains the path "refs/heads/main" in its tree, that will take
precedence, causing the system process to do something it did not
expect.

I think you'd have to pile on a lot of assumptions to get any kind of
security problem. Something like:

 1. The system has a hidden ref namespace like refs/gitlab that normal
    remote push/fetch users are not allowed to read/write to.

 2. The system tries to make a symlink within that namespace. Say,
    "refs/gitlab/metadata/HEAD" to point to
    "refs/gitlab/metadata/branches/main" or something.

 3. The user pushes up "refs/heads/ref" with a tree that contains
    "refs/gitlab/metadata/branches/main". Now when (2) happens, the
    hidden ref points to user-controlled data.

That's pretty convoluted. But we can avoid it entirely if there's no
ambiguity in the protocol at all.

-Peff

Junio C Hamano April 24, 2024, 1:17 a.m. UTC | #2

Jeff King <peff@peff.net> writes:

> That's pretty convoluted. But we can avoid it entirely if there's no
> ambiguity in the protocol at all.

;-).

karthik nayak April 24, 2024, 4:25 p.m. UTC | #3

Jeff King <peff@peff.net> writes:

> On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote:
>
>> Changes from v2:
>> 1. We no longer have separate commands for symrefs, instead the regular
>> commands learn to parse 'ref:<target>' as symref targets. This reduces
>> the code in this series. Thanks Patrick for the suggestion.
>
> Hmm. I can see how this makes things a lot simpler, but it introduces an
> ambiguity, since you can pass full ref expressions to "update-ref" (like
> "ref:foo" to find the "foo" entry in ref^{tree}). I see that you only
> kick in the symref "ref:" handling if the regular oid lookup failed, so
> there's no backwards-compatibility issue (anything that used to work
> will still take precedence, and the new code only runs when the old code
> would have reported an error).
>
> But I wonder if it would let somebody cause mischief in a repository
> they can push to, but which may get updates from other sources. For
> example, imagine a forge like GitLab runs the equivalent of:
>
>   echo "create refs/whatever ref:refs/heads/main" |
>   git update-ref --stdin
>
> as part of some system process. Now if I push up "refs/heads/ref" that
> contains the path "refs/heads/main" in its tree, that will take
> precedence, causing the system process to do something it did not
> expect.
>
> I think you'd have to pile on a lot of assumptions to get any kind of
> security problem. Something like:
>
>  1. The system has a hidden ref namespace like refs/gitlab that normal
>     remote push/fetch users are not allowed to read/write to.
>
>  2. The system tries to make a symlink within that namespace. Say,
>     "refs/gitlab/metadata/HEAD" to point to
>     "refs/gitlab/metadata/branches/main" or something.
>
>  3. The user pushes up "refs/heads/ref" with a tree that contains
>     "refs/gitlab/metadata/branches/main". Now when (2) happens, the
>     hidden ref points to user-controlled data.
>
> That's pretty convoluted. But we can avoid it entirely if there's no
> ambiguity in the protocol at all.
>
> -Peff

Thanks Peff, that is indeed something I totally missed thinking about.

This also brings light onto the previous versions we were considering:

    symref-update SP <ref> SP <new-target> [SP (<old-target> | <old-oid>)] LF

There is also some ambiguity here which we missed, especially when we
support dangling refs. If we're updating a dangling ref <ref>, and we
provide an old value. Then there is uncertainty around whether the
provided value is actually a <old-target> or if it's an <old-oid>.

For non dangling ref symref, we first parse it as an <old-target> and
since the <old-target> would exist, we can move on.

So I see two ways to go about this,

1. In the symref-update function, we need to parse and see if <ref> is a
regular ref or a symref, if it is symref, we simply set the provided old
value as <old-target>, if not, we set it as <old-oid>. This seems clunky
because we'll be parsing the ref and trying to understand its type in
'update-ref.c', before the actual update.

2. We change the syntax to something like

    symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid
<old-oid>)] LF

this would remove any ambiguity since the user specifies the data type
they're providing.

Overall, I think it is best to discard this version and I will push v4
with the older schematics of introducing new commands.

I'm currently considering going ahead with the [2], but will wait for a
day or two to consider other opinions.

Also on a sidenote, it's worth considering that with the direction of
[2], we could also extrapolate to introduce {verify, update, create,
delete} v2, which support both symrefs and regular refs. But require
explicit types from the user:

    update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
[(oid <old-oid> | ref <old-target>)] NUL
	create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
	delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
	verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL

This is similar to the v3 patches I've currently sent out, in that it
would also allow cross operations between regular refs and symrefs.

Patrick Steinhardt April 25, 2024, 6:40 a.m. UTC | #4

On Wed, Apr 24, 2024 at 09:25:27AM -0700, Karthik Nayak wrote:
> Jeff King <peff@peff.net> writes:
> > On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote:
[snip]
> Also on a sidenote, it's worth considering that with the direction of
> [2], we could also extrapolate to introduce {verify, update, create,
> delete} v2, which support both symrefs and regular refs. But require
> explicit types from the user:
> 
>     update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
> [(oid <old-oid> | ref <old-target>)] NUL
> 	create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
> 	delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
> 	verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
> 
> This is similar to the v3 patches I've currently sent out, in that it
> would also allow cross operations between regular refs and symrefs.

One could put this new syntax behind a feature flag to avoid the "-v2"
suffixes, e.g. `git update-ref --with-symrefs`. But I'm not sure whether
this would be beneficial.

Patrick

Junio C Hamano April 25, 2024, 5:09 p.m. UTC | #5

You'd want something like this squashed into an appropriate step to
avoid breaking "make sparse".

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 175579148f..1cdafc33f3 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -228,7 +228,7 @@ static void parse_cmd_update(struct ref_transaction *transaction,
 	have_old = !parse_next_arg(&next, end, &old_oid,
 				   &old_target, "update", refname,
 				   PARSE_SHA1_OLD | PARSE_REFNAME_TARGETS);
-	have_old = have_old & !old_target.len;
+	have_old = have_old && !old_target.len;
 
 	if (*next != line_termination)
 		die("update %s: extra input: %s", refname, next);

Junio C Hamano April 25, 2024, 6:01 p.m. UTC | #6

Karthik Nayak <karthik.188@gmail.com> writes:

> 2. We change the syntax to something like
>
>     symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid
> <old-oid>)] LF
>
> this would remove any ambiguity since the user specifies the data type
> they're providing.

Yup.  Being explicit helps, especially if you are only dealing with
programs that do not complain "that's too many keystrokes" like
pesky humans ;-).

When the topic's overall title is to add support for symbolic refs
to the update-ref command, a natural expectation is that in a far
enough future everything that can be done with "git symbolic-ref"
can be done with "git update-ref" and we can, if we wanted to,
depreate "git symbolic-ref".  Is that really what is going on here?

IOW, we should add support for operation modes other than "--stdin"
as well, shouldn't we?

Thanks.

karthik nayak April 25, 2024, 9:07 p.m. UTC | #7

Junio C Hamano <gitster@pobox.com> writes:

> You'd want something like this squashed into an appropriate step to
> avoid breaking "make sparse".
>
> diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> index 175579148f..1cdafc33f3 100644
> --- a/builtin/update-ref.c
> +++ b/builtin/update-ref.c
> @@ -228,7 +228,7 @@ static void parse_cmd_update(struct ref_transaction *transaction,
>  	have_old = !parse_next_arg(&next, end, &old_oid,
>  				   &old_target, "update", refname,
>  				   PARSE_SHA1_OLD | PARSE_REFNAME_TARGETS);
> -	have_old = have_old & !old_target.len;
> +	have_old = have_old && !old_target.len;
>
>  	if (*next != line_termination)
>  		die("update %s: extra input: %s", refname, next);

Thanks, will check and squash this in.

karthik nayak April 25, 2024, 9:12 p.m. UTC | #8

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Apr 24, 2024 at 09:25:27AM -0700, Karthik Nayak wrote:
>> Jeff King <peff@peff.net> writes:
>> > On Tue, Apr 23, 2024 at 11:28:10PM +0200, Karthik Nayak wrote:
> [snip]
>> Also on a sidenote, it's worth considering that with the direction of
>> [2], we could also extrapolate to introduce {verify, update, create,
>> delete} v2, which support both symrefs and regular refs. But require
>> explicit types from the user:
>>
>>     update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
>> [(oid <old-oid> | ref <old-target>)] NUL
>> 	create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
>> 	delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
>> 	verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
>>
>> This is similar to the v3 patches I've currently sent out, in that it
>> would also allow cross operations between regular refs and symrefs.
>
> One could put this new syntax behind a feature flag to avoid the "-v2"
> suffixes, e.g. `git update-ref --with-symrefs`. But I'm not sure whether
> this would be beneficial.
>

Yeah, me neither. I mean it doesn't provide any new functionality. The
only usecase we're missing currently (with old + symref-* commands), is
that we have no way to convert a symref to a regular ref while checking
the old_value. This could however simply be solved by introducing a new
`symref-convert` command.

karthik nayak April 25, 2024, 9:14 p.m. UTC | #9

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> 2. We change the syntax to something like
>>
>>     symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid
>> <old-oid>)] LF
>>
>> this would remove any ambiguity since the user specifies the data type
>> they're providing.
>
> Yup.  Being explicit helps, especially if you are only dealing with
> programs that do not complain "that's too many keystrokes" like
> pesky humans ;-).
>
> When the topic's overall title is to add support for symbolic refs
> to the update-ref command, a natural expectation is that in a far
> enough future everything that can be done with "git symbolic-ref"
> can be done with "git update-ref" and we can, if we wanted to,
> depreate "git symbolic-ref".  Is that really what is going on here?
>
> IOW, we should add support for operation modes other than "--stdin"
> as well, shouldn't we?
>
> Thanks.

That's a good point, I was thinking of something like this too, but
didn't want to bloat this series. Would it make sense to add this
functionality in a follow up series with the cleanup that Patrick
mentioned [1] too?

[1]: https://lore.kernel.org/git/ZidXrdi7hXdAnDhy@tanuki/

Junio C Hamano April 25, 2024, 9:55 p.m. UTC | #10

Karthik Nayak <karthik.188@gmail.com> writes:

>> IOW, we should add support for operation modes other than "--stdin"
>> as well, shouldn't we?
>>
>> Thanks.
>
> That's a good point, I was thinking of something like this too, but
> didn't want to bloat this series. Would it make sense to add this
> functionality in a follow up series with the cleanup that Patrick
> mentioned [1] too?

Going multi-step is usually a preferred approach but we have to be
sure that we do not have to retract the way earlier steps gave our
users when we move to later steps.  "Earlier we said that your
commands should look like this, but now you have to give your
commands differently" is absolutely what we want to avoid.

So, "in this iteration, you can use this enhanced feature only via
the "--stdin" mode, but we promise we will make the same available
on the command line" is perfectly fine.  You may need to retitle the
topic to clarify the scope of each iteration, though.

Thansk.

karthik nayak April 26, 2024, 12:48 p.m. UTC | #11

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>>> IOW, we should add support for operation modes other than "--stdin"
>>> as well, shouldn't we?
>>>
>>> Thanks.
>>
>> That's a good point, I was thinking of something like this too, but
>> didn't want to bloat this series. Would it make sense to add this
>> functionality in a follow up series with the cleanup that Patrick
>> mentioned [1] too?
>
> Going multi-step is usually a preferred approach but we have to be
> sure that we do not have to retract the way earlier steps gave our
> users when we move to later steps.  "Earlier we said that your
> commands should look like this, but now you have to give your
> commands differently" is absolutely what we want to avoid.
>
> So, "in this iteration, you can use this enhanced feature only via
> the "--stdin" mode, but we promise we will make the same available
> on the command line" is perfectly fine.  You may need to retitle the
> topic to clarify the scope of each iteration, though.
>
> Thansk.

Yeah, will do that. Possibly also some of the commits too, where we
could mention that this is being added to the `--stdin` mode and can be
extended further on.

Jeff King April 26, 2024, 8:41 p.m. UTC | #12

On Wed, Apr 24, 2024 at 09:25:27AM -0700, Karthik Nayak wrote:

> This also brings light onto the previous versions we were considering:
> 
>     symref-update SP <ref> SP <new-target> [SP (<old-target> | <old-oid>)] LF
> 
> There is also some ambiguity here which we missed, especially when we
> support dangling refs. If we're updating a dangling ref <ref>, and we
> provide an old value. Then there is uncertainty around whether the
> provided value is actually a <old-target> or if it's an <old-oid>.
> 
> For non dangling ref symref, we first parse it as an <old-target> and
> since the <old-target> would exist, we can move on.
> 
> So I see two ways to go about this,
> 
> 1. In the symref-update function, we need to parse and see if <ref> is a
> regular ref or a symref, if it is symref, we simply set the provided old
> value as <old-target>, if not, we set it as <old-oid>. This seems clunky
> because we'll be parsing the ref and trying to understand its type in
> 'update-ref.c', before the actual update.

I think this avoids the "mischief" case I mentioned because it is about
looking at what is in <ref> currently, not spooky action from a
possibly-unrelated "refs/heads/ref". But in general, I think it is a
good thing if we can tell what the caller is asking for based only on
the syntax of the request, without taking into account repository state.
It just makes things easier to reason about.

> 2. We change the syntax to something like
> 
>     symref-update SP <ref> SP <new-ref> [SP (ref <old-target> | oid
> <old-oid>)] LF
> 
> this would remove any ambiguity since the user specifies the data type
> they're providing.

Yeah, I was going to suggest that it could be resolved with any syntax
that would not be a valid oid name. Certainly check-ref-format places
some restrictions there (e.g., no "^") but name resolution relies on
that, too (so foo^{tree} is a valid name). Probably something like
"^foo" is unambiguous, but it's ugly and hard to explain. ;)

But I think your "ref <old-target>" solves that. Resolved names can have
spaces in them, but only after a "^" (e.g., "foo^{/some regex}") or ":"
(e.g., "foo:path with spaces"). So seeing just "ref" by itself, followed
by a space, I think is unambiguous. And it looks pretty.

It gets a little tricky when the field delimiter is also space, and the
item in question is not the final field. See below.

> Also on a sidenote, it's worth considering that with the direction of
> [2], we could also extrapolate to introduce {verify, update, create,
> delete} v2, which support both symrefs and regular refs. But require
> explicit types from the user:
> 
>     update-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
> [(oid <old-oid> | ref <old-target>)] NUL
> 	create-v2 SP <ref> NUL (oid <new-oid> | ref <new-target>) NUL
> 	delete-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
> 	verify-v2 SP <ref> NUL [(oid <old-oid> | ref <old-target>)] NUL
> 
> This is similar to the v3 patches I've currently sent out, in that it
> would also allow cross operations between regular refs and symrefs.

So I _think_ that "<oid> | ref <symref-target>" is unambiguous. In which
case you could just have:

  update SP <ref> NUL (<new-oid> | ref <new-target>) NUL [(<old-oid> | ref <old-target>)]

which is backwards-compatible.

With the NUL separator it's easy to parse, because "ref " with a
trailing space always means a ref, even in the "new" field. But with
spaces instead, it gets weird. If you have:

  update refs/heads/foo refs refs/heads/bar

it can either be creating a symref to "bar" (with no "old" specifier) or
it could be pointing it at the resolved-name "refs", which the old value
coming from "bar".

I guess one option would be to only allow "ref" syntax in "-z" mode, but
that is probably getting to be a bit weird and hard to explain.

-Peff

[v3,0/8] refs: add symref support to 'git-update-ref'

Message

Comments