Message ID | ae5c3b2b783f912a02b26142ecd753bf92530d2f.1610974040.git.ps@pks.im (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RESEND] refs: Always pass old object name to reftx hook | expand |
Patrick Steinhardt <ps@pks.im> writes: > Inputs of the reference-transaction hook currently depends on the > command which is being run. For example if the command `git update-ref > $REF $A $B` is executed, it will receive "$B $A $REF" as input, but if > the command `git update-ref $REF $A` is executed without providing the > old value, then it will receive "0*40 $A $REF" as input. This is due to > the fact that we directly write queued transaction updates into the > hook's standard input, which will not contain the old object value in > case it wasn't provided. In effect, the user says "I do not care if this update races with somebody else and it is perfectly OK if it overwrites their update" by not giving $B. > While this behaviour reflects what is happening as part of the > repository, it doesn't feel like it is useful. The main intent of the > reference-transaction hook is to be able to completely audit all > reference updates, no matter where they come from. As such, it makes a > lot more sense to always provide actual values instead of what the user > wanted. Furthermore, it's impossible for the hook to distinguish whether > this is intended to be a branch creation or a branch update without > doing additional digging with the current format. But shouldn't the transaction hook script be allowed to learn the end-user intention and behave differently? If we replace the missing old object before calling the script, wouldn't it lose information? The above is not an objection posed as two rhetoric questions. I am purely curious why losing information is OK in this case, or why it may not be so OK but should still be acceptable because it is lessor evil than giving 0{40} to the hooks. Even without this change, the current value the hook can learn by looking the ref up itself if it really wanted to, no?
On Mon, Jan 18, 2021 at 02:45:30PM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > Inputs of the reference-transaction hook currently depends on the > > command which is being run. For example if the command `git update-ref > > $REF $A $B` is executed, it will receive "$B $A $REF" as input, but if > > the command `git update-ref $REF $A` is executed without providing the > > old value, then it will receive "0*40 $A $REF" as input. This is due to > > the fact that we directly write queued transaction updates into the > > hook's standard input, which will not contain the old object value in > > case it wasn't provided. > > In effect, the user says "I do not care if this update races with > somebody else and it is perfectly OK if it overwrites their update" > by not giving $B. > > > While this behaviour reflects what is happening as part of the > > repository, it doesn't feel like it is useful. The main intent of the > > reference-transaction hook is to be able to completely audit all > > reference updates, no matter where they come from. As such, it makes a > > lot more sense to always provide actual values instead of what the user > > wanted. Furthermore, it's impossible for the hook to distinguish whether > > this is intended to be a branch creation or a branch update without > > doing additional digging with the current format. > > But shouldn't the transaction hook script be allowed to learn the > end-user intention and behave differently? If we replace the > missing old object before calling the script, wouldn't it lose > information? > > The above is not an objection posed as two rhetoric questions. I am > purely curious why losing information is OK in this case, or why it > may not be so OK but should still be acceptable because it is lessor > evil than giving 0{40} to the hooks. > > Even without this change, the current value the hook can learn by > looking the ref up itself if it really wanted to, no? I think the biggest problem is that right now, you cannot discern the actual intention of the user because the information provided to the hook is ambiguous in the branch creation case: "$ZERO_OID $NEW_OID $REF" could mean the user intends to create a new branch where it shouldn't have existed previously. BUT it could also mean that the user just doesn't care what the reference previously pointed to. The user could now try to derive the intention by manually looking up the current state of the reference. But that does feel kind of awkward to me. To me, having clearly defined semantics ("The script always gets old and new value of the branch regardless of what the user did") is preferable to having ambiguous semantics. Patrick
Patrick Steinhardt <ps@pks.im> writes: >> Even without this change, the current value the hook can learn by >> looking the ref up itself if it really wanted to, no? > > I think the biggest problem is that right now, you cannot discern the > actual intention of the user because the information provided to the > hook is ambiguous in the branch creation case: "$ZERO_OID $NEW_OID $REF" > could mean the user intends to create a new branch where it shouldn't > have existed previously. BUT it could also mean that the user just > doesn't care what the reference previously pointed to. Yes, it can mean both, but when you pretend to be that hook, wouldn't you check if the ref exists? If not, the user is trying to create it, and otherwise, the user does not know or care what the original value is, no? > The user could now try to derive the intention by manually looking up > the current state of the reference. But that does feel kind of awkward > to me. So in short, with respect to the OLD slot, there are three kind of end-user intention that could be conveyed to the hook: (1) the user does not care, so 0{40} appears in the OLD slot here, (2) the user is creating, so 0{40} apears in the OLD slot here, and (3) the user does care, and this is the OID in the OLD slot, And (1) and (2) cannot be separated without looking at the ref (in other words, if the hook really cares, it can find it out). But if you replace 0{40} with the current OID, then you are making it impossible to tell (1) and (3) apart. The hook cannot tell the distinction even if it is willing to go the extra mile. So that sounds like a strict disimprovement to me. If you can invent a way to help the hook to tell all three apart, I am very much interested. It would earn you a bonus point if you can do so without breaking backward compatibility (but I doubt that it is possible). > To me, having clearly defined semantics ("The script always gets old and > new value of the branch regardless of what the user did") is preferable > to having ambiguous semantics. But "The script always gets old that was given by the user and the new value to be stored" is very clearly defined semantics already. On the other hand, "The script gets a non-NULL object name in both cases, either when the user says s/he does not care, or when the user insists that the old value must be that, and it is not just ambiguous but is impossible to tell apart" is worse than just being ambiguous.
On Tue, Jan 19, 2021 at 11:06:15PM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > >> Even without this change, the current value the hook can learn by > >> looking the ref up itself if it really wanted to, no? > > > > I think the biggest problem is that right now, you cannot discern the > > actual intention of the user because the information provided to the > > hook is ambiguous in the branch creation case: "$ZERO_OID $NEW_OID $REF" > > could mean the user intends to create a new branch where it shouldn't > > have existed previously. BUT it could also mean that the user just > > doesn't care what the reference previously pointed to. > > Yes, it can mean both, but when you pretend to be that hook, > wouldn't you check if the ref exists? If not, the user is trying to > create it, and otherwise, the user does not know or care what the > original value is, no? As long as you're aware as the script author, yes. There is one gotcha though: you can verify the state when the reference-transaction hook gets invoked in the "prepared" state, as it means that all references have been locked and thus cannot be changed by any other well-behaved process accessing the git repository. In "committed" or "aborted" that's not true anymore, given that the state has changed already, so any locks have been released and it's impossible to find out what happened now. > > The user could now try to derive the intention by manually looking up > > the current state of the reference. But that does feel kind of awkward > > to me. > > So in short, with respect to the OLD slot, there are three kind of > end-user intention that could be conveyed to the hook: > > (1) the user does not care, so 0{40} appears in the OLD slot here, > (2) the user is creating, so 0{40} apears in the OLD slot here, and > (3) the user does care, and this is the OID in the OLD slot, > > And (1) and (2) cannot be separated without looking at the ref (in > other words, if the hook really cares, it can find it out). > > But if you replace 0{40} with the current OID, then you are making > it impossible to tell (1) and (3) apart. The hook cannot tell the > distinction even if it is willing to go the extra mile. > > So that sounds like a strict disimprovement to me. True. > If you can invent a way to help the hook to tell all three apart, I > am very much interested. It would earn you a bonus point if you can > do so without breaking backward compatibility (but I doubt that it > is possible). I did think about any way to do this, but wasn't yet able to find one. And doing it in a backwards-compatible way is probably going to be impossible. One idea I had is to use something similar to the peeled-format we use in packed refs in case the actual change is different from the user-provided change. E.g. 0{40} <new> <ref> ^<old> or 0{40}^<old> <new> <ref> That can be considered as backwards-incompatible though. > > To me, having clearly defined semantics ("The script always gets old and > > new value of the branch regardless of what the user did") is preferable > > to having ambiguous semantics. > > But "The script always gets old that was given by the user and the > new value to be stored" is very clearly defined semantics already. > > On the other hand, "The script gets a non-NULL object name in both > cases, either when the user says s/he does not care, or when the > user insists that the old value must be that, and it is not just > ambiguous but is impossible to tell apart" is worse than just being > ambiguous. Yup. Whatever we agree on, what is clear is that the documentation needs to be more specific here. Patrick
Patrick Steinhardt <ps@pks.im> writes: >> Yes, it can mean both, but when you pretend to be that hook, >> wouldn't you check if the ref exists? If not, the user is trying to >> create it, and otherwise, the user does not know or care what the >> original value is, no? > > As long as you're aware as the script author, yes. As you said downbelow, I agree that clear documentation may be necessary. > There is one gotcha though: you can verify the state when the > reference-transaction hook gets invoked in the "prepared" state, as it > means that all references have been locked and thus cannot be changed by > any other well-behaved process accessing the git repository. In > "committed" or "aborted" that's not true anymore, given that the state > has changed already, so any locks have been released and it's impossible > to find out what happened now. True, but isn't the situation the same if we replaced the 0{40} old side with (one version of) original value of the ref? > different from the user-provided change. E.g. > > 0{40} <new> <ref> > ^<old> > > or > > 0{40}^<old> <new> <ref> > > That can be considered as backwards-incompatible though. Yes, it is an incompatible change. I thought of somehow annotating the old side, e.g. "<old> <new> <ref>" vs "<OLD> <new> <ref>", to show the distinction between "this is the original value of ref the user wanted to see when updating <ref>" and "the user does not care what value the <ref> gets updated from, but by the way, here is the original value of the ref as Git sees it" [*], but I cannot think of a way to do so without breaking existing readers. Side note: here, I am exploring the approach to replace 0{40} that is given when "do not care" into an actual original object name taken from the current state, like your patch did, but trying to find a way to make non-NULL object name distinguishable between the two cases (i.e. user-supplied vs system-filled). That raises another question. How much trust should the hook place on the value of the <old> given to it? When a non-NULL <old> value is given by the end-user, does the hook get the value as-is, or do we read the current value of the ref and send that as <old>? Does the transaction get rejected if the two are different and such a record is not even given to the hook? > Yup. Whatever we agree on, what is clear is that the documentation needs > to be more specific here. Yes, agreed. Thanks.
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 1f3b57d04d..e7baf0e2a0 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -492,6 +492,12 @@ receives on standard input a line of the format: <old-value> SP <new-value> SP <ref-name> LF +where `<old-value>` is the old object name stored in the ref, +`<new-value>` is the new object name to be stored in the ref and +`<ref-name>` is the full name of the ref. +When creating a new ref, `<old-value>` is 40 `0`. +When deleting an old ref, `<new-value>` is 40 `0`. + The exit status of the hook is ignored for any state except for the "prepared" state. In the "prepared" state, a non-zero exit status will cause the transaction to be aborted. The hook will not be called with diff --git a/refs/files-backend.c b/refs/files-backend.c index 04e85e7002..5b10d3822b 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2452,6 +2452,9 @@ static int lock_ref_for_update(struct files_ref_store *refs, ret = TRANSACTION_GENERIC_ERROR; goto out; } + + if (!(update->flags & REF_HAVE_OLD)) + oidcpy(&update->old_oid, &lock->old_oid); } else { /* * Create a new update for the reference this @@ -2474,6 +2477,9 @@ static int lock_ref_for_update(struct files_ref_store *refs, goto out; } + if (!(update->flags & REF_HAVE_OLD)) + oidcpy(&update->old_oid, &lock->old_oid); + /* * If this update is happening indirectly because of a * symref update, record the old OID in the parent @@ -2484,6 +2490,8 @@ static int lock_ref_for_update(struct files_ref_store *refs, parent_update = parent_update->parent_update) { struct ref_lock *parent_lock = parent_update->backend_data; oidcpy(&parent_lock->old_oid, &lock->old_oid); + if (!(parent_update->flags & REF_HAVE_OLD)) + oidcpy(&parent_update->old_oid, &lock->old_oid); } } diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b912f2505f..08f0feee3d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -1178,6 +1178,8 @@ static int write_with_updates(struct packed_ref_store *refs, oid_to_hex(&update->old_oid)); goto error; } + } else { + oidcpy(&update->old_oid, iter->oid); } /* Now figure out what to use for the new value: */ diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh index f6e741c6c0..111533682a 100755 --- a/t/t1416-ref-transaction-hooks.sh +++ b/t/t1416-ref-transaction-hooks.sh @@ -52,12 +52,12 @@ test_expect_success 'hook gets all queued updates in prepared state' ' fi EOF cat >expect <<-EOF && - $ZERO_OID $POST_OID HEAD - $ZERO_OID $POST_OID refs/heads/master + $PRE_OID $POST_OID HEAD + $PRE_OID $POST_OID refs/heads/master EOF git update-ref HEAD POST <<-EOF && - update HEAD $ZERO_OID $POST_OID - update refs/heads/master $ZERO_OID $POST_OID + update HEAD $PRE_OID $POST_OID + update refs/heads/master $PRE_OID $POST_OID EOF test_cmp expect actual ' @@ -75,8 +75,8 @@ test_expect_success 'hook gets all queued updates in committed state' ' fi EOF cat >expect <<-EOF && - $ZERO_OID $POST_OID HEAD - $ZERO_OID $POST_OID refs/heads/master + $PRE_OID $POST_OID HEAD + $PRE_OID $POST_OID refs/heads/master EOF git update-ref HEAD POST && test_cmp expect actual
Inputs of the reference-transaction hook currently depends on the command which is being run. For example if the command `git update-ref $REF $A $B` is executed, it will receive "$B $A $REF" as input, but if the command `git update-ref $REF $A` is executed without providing the old value, then it will receive "0*40 $A $REF" as input. This is due to the fact that we directly write queued transaction updates into the hook's standard input, which will not contain the old object value in case it wasn't provided. While this behaviour reflects what is happening as part of the repository, it doesn't feel like it is useful. The main intent of the reference-transaction hook is to be able to completely audit all reference updates, no matter where they come from. As such, it makes a lot more sense to always provide actual values instead of what the user wanted. Furthermore, it's impossible for the hook to distinguish whether this is intended to be a branch creation or a branch update without doing additional digging with the current format. Fix the issue by storing the old object value into the queued transaction update operation if it wasn't provided by the caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/githooks.txt | 6 ++++++ refs/files-backend.c | 8 ++++++++ refs/packed-backend.c | 2 ++ t/t1416-ref-transaction-hooks.sh | 12 ++++++------ 4 files changed, 22 insertions(+), 6 deletions(-)