Message ID | 1d1a94426f95d842e0e3ea6a1569c0c45239229c.1591086316.git.ps@pks.im (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | refs: implement reference transaction hooks | expand |
Patrick Steinhardt <ps@pks.im> writes: > The above scenario is the motivation for a set of three new hooks that > reach directly into Git's reference transaction. Each of the following > new hooks (currently) doesn't accept any parameters and receives the set > of queued reference updates via stdin: Do we have something (e.g. performance measurement) to convince ourselves that this won't incur unacceptable levels of overhead in null cases where there is no hook installed in the repository? > +static int run_transaction_hook(struct ref_transaction *transaction, > + const char *hook_name) > +{ > + struct child_process proc = CHILD_PROCESS_INIT; > + struct strbuf buf = STRBUF_INIT; > + const char *argv[2]; > + int code, i; > + > + argv[0] = find_hook(hook_name); > + if (!argv[0]) > + return 0; > + > + argv[1] = NULL; > + > + proc.argv = argv; Let's use proc.args and argv_push() API in newly introduced code instead. > + proc.in = -1; > + proc.stdout_to_stderr = 1; > + proc.trace2_hook_name = hook_name; > + > + code = start_command(&proc); > + if (code) > + return code; > + > + sigchain_push(SIGPIPE, SIG_IGN); > + > + for (i = 0; i < transaction->nr; i++) { > + struct ref_update *update = transaction->updates[i]; > + > + strbuf_reset(&buf); > + strbuf_addf(&buf, "%s %s %s\n", > + oid_to_hex(&update->old_oid), > + oid_to_hex(&update->new_oid), > + update->refname); > + > + if (write_in_full(proc.in, buf.buf, buf.len) < 0) > + break; We leave the loop early when we detect a write failure here... > + } > + > + close(proc.in); > + sigchain_pop(SIGPIPE); > + > + strbuf_release(&buf); > + return finish_command(&proc); ... but the caller does not get notified. Intended? > +} > + > int ref_transaction_prepare(struct ref_transaction *transaction, > struct strbuf *err) > { > struct ref_store *refs = transaction->ref_store; > + int ret; > > switch (transaction->state) { > case REF_TRANSACTION_OPEN: > @@ -2012,7 +2060,17 @@ int ref_transaction_prepare(struct ref_transaction *transaction, > return -1; > } > > - return refs->be->transaction_prepare(refs, transaction, err); > + ret = refs->be->transaction_prepare(refs, transaction, err); > + if (ret) > + return ret; > + > + ret = run_transaction_hook(transaction, "ref-transaction-prepared"); This caller does care about it, no? > + if (ret) { > + ref_transaction_abort(transaction, err); > + die(_("ref updates aborted by hook")); > + } > + > + return 0; > } > > int ref_transaction_abort(struct ref_transaction *transaction, > @@ -2036,6 +2094,8 @@ int ref_transaction_abort(struct ref_transaction *transaction, > break; > } > > + run_transaction_hook(transaction, "ref-transaction-aborted"); And I presume that the callers of "ref_xn_abort()" would, too, but the value returned here does not get folded into 'ret'. > ref_transaction_free(transaction); > return ret; > } > @@ -2064,7 +2124,10 @@ int ref_transaction_commit(struct ref_transaction *transaction, > break; > } > > - return refs->be->transaction_finish(refs, transaction, err); > + ret = refs->be->transaction_finish(refs, transaction, err); > + if (!ret) > + run_transaction_hook(transaction, "ref-transaction-committed"); > + return ret; > } > > int refs_verify_refname_available(struct ref_store *refs, > diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh > new file mode 100755 > index 0000000000..b6df5fc883 > --- /dev/null > +++ b/t/t1416-ref-transaction-hooks.sh > @@ -0,0 +1,88 @@ > +#!/bin/sh > + > +test_description='reference transaction hooks' > + > +. ./test-lib.sh > + > +create_commit () > +{ Style (Documentation/CodingGuidelines): - We prefer a space between the function name and the parentheses, and no space inside the parentheses. The opening "{" should also be on the same line. > + test_tick && > + T=$(git write-tree) && > + sha1=$(echo message | git commit-tree $T) && > + echo $sha1 Calling this helper in a subprocess does not have the intended effect of calling test_tick, which increments the committer timestamp by 1 second to prepare for the next call... > +} > + > +test_expect_success setup ' > + mkdir -p .git/hooks > +' > + > +test_expect_success 'prepared hook allows updating ref' ' > + test_when_finished "rm .git/hooks/ref-transaction-prepared" && > + write_script .git/hooks/ref-transaction-prepared <<-\EOF && > + exit 0 > + EOF > + C=$(create_commit) && ... like here. Instead of creating test commits left and right at each test, how about preparing a pair of them (PRE, POST) upfront in the "setup" step, reset the refs involved to PRE at the very beginning of each test, and use POST to operations that would update the refs? We can use a couple of calls to test_commit helper to do so, without having to bother with the low level porcelain calls if we go that route. > + git update-ref HEAD $C > +' > + > +test_expect_success 'prepared hook aborts updating ref' ' > + test_when_finished "rm .git/hooks/ref-transaction-prepared" && > + write_script .git/hooks/ref-transaction-prepared <<-\EOF && > + exit 1 > + EOF > + C=$(create_commit) && > + test_must_fail git update-ref HEAD $C 2>err && > + grep "ref updates aborted by hook" err Running "make GIT_TEST_GETTEXT_POISON=Yes test" would probably break this line. Use test_i18ngrep instead. > +' > + > +test_expect_success 'prepared hook gets all queued updates' ' > + test_when_finished "rm .git/hooks/ref-transaction-prepared actual" && > + write_script .git/hooks/ref-transaction-prepared <<-\EOF && > + while read -r line; do printf "%s\n" "$line"; done >actual Style (used in the generated script)? > + EOF > + C=$(create_commit) && > + cat >expect <<-EOF && > + $ZERO_OID $C HEAD > + $ZERO_OID $C refs/heads/master > + EOF > + git update-ref HEAD $C <<-EOF && > + update HEAD $ZERO_OID $C > + update refs/heads/master $ZERO_OID $C > + EOF > + test_cmp expect actual OK, good futureproofing for the hash algorithm update ;-).
On Tue, Jun 02, 2020 at 10:25:44AM +0200, Patrick Steinhardt wrote: > The low-level reference transactions used to update references are > currently completely opaque to the user. While certainly desirable in > most usecases, there are some which might want to hook into the > transaction to observe all queued reference updates as well as observing > the abortion or commit of a prepared transaction. > > One such usecase would be to have a set of replicas of a given Git > repository, where we perform Git operations on all of the repositories > at once and expect the outcome to be the same in all of them. While > there exist hooks already for a certain subset of Git commands that > could be used to implement a voting mechanism for this, many others > currently don't have any mechanism for this. > > The above scenario is the motivation for a set of three new hooks that > reach directly into Git's reference transaction. Each of the following > new hooks (currently) doesn't accept any parameters and receives the set > of queued reference updates via stdin: > > - ref-transaction-prepared gets called when all reference updates > have been queued. At this stage, the hook may decide to abort the > transaction prematurely by returning a non-zero status code. > > - ref-transaction-committed gets called when a reference transaction > was transmitted and all queued updates have been persisted. > > - ref-transaction-aborted gets called when a reference transaction > was aborted and all queued updates have been rolled back. The point of reference transactions is that they are atomic, and these hooks must work together to ensure that. This raises the question how these hooks can be updated in an actively used repository. Having multiple hooks means that they can't be updated atomically, and git might invoke the new abort hook after the transaction was prepared with the old hook. Now, if there were a single 'ref-transaction' hook (which gets the phase of the transaction ('prepared', 'committed' or 'aborted') as a parameter), then it could be updated atomically by mv-ing it to place, but even that update can happen in between git invokes 'ref-transaction prepared' and 'ref-transaction aborted'. I suppose this issue could be addressed by a single hook which runs during the whole transaction and some back-and-forth communication through stdin/out between git and the hook. However, this would, I'm afraid, complicate both Git's handling of this hook and the hook as well, so let's take a step back first: is this something we should worry about in the first place? > Given the usecase described above, a voting mechanism can now be > implemented as a "ref-transaction-prepared" hook: as soon as it gets > called, it will take all of stdin and use it to cast a vote to a central > service. When all replicas of the repository agree, the hook will exit > with zero, otherwise it will abort the transaction by returning > non-zero. The most important upside is that this will catch _all_ > commands writing references at once, allowing to implement strong > consistency for reference updates via a single mechanism. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > Documentation/githooks.txt | 51 ++++++++++++++++++ > refs.c | 67 +++++++++++++++++++++++- > t/t1416-ref-transaction-hooks.sh | 88 ++++++++++++++++++++++++++++++++ > 3 files changed, 204 insertions(+), 2 deletions(-) > create mode 100755 t/t1416-ref-transaction-hooks.sh > > diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt > index 81f2a87e88..48f8446943 100644 > --- a/Documentation/githooks.txt > +++ b/Documentation/githooks.txt > @@ -404,6 +404,57 @@ Both standard output and standard error output are forwarded to > `git send-pack` on the other end, so you can simply `echo` messages > for the user. > > +ref-transaction-prepared > +~~~~~~~~~~~~~~~~~~~~~~~~ > + > +This hook is invoked by any Git command that performs reference > +updates. It executes as soon as all reference updates were queued to > +the transaction and locked on disk. This hook executes for every > +reference transaction that is being prepared and may thus get called > +multiple times. > + > +It takes no arguments, but for each ref to be updated it receives on > +standard input a line of the format: > + > + <old-value> SP <new-value> SP <ref-name> LF > + > +If the hook exits with a non-zero status, the transaction is aborted > +and the command exits immediately. The > +<<ref-transaction-aborted,'ref-transaction-aborted'>> hook is not > +executed in that case. > + > +[[ref-transaction-aborted]] > +ref-transaction-aborted > +~~~~~~~~~~~~~~~~~~~~~~~ > + > +This hook is invoked by any Git command that performs reference > +updates. It executes as soon as a reference transaction is aborted and > +after all reference locks were released and any changes made to > +references were rolled back. The hook may get called multiple times or > +never in case no transaction was aborted. > + > +The hook takes no arguments, but for each ref to be updated it Nit: I found it a bit surprising to read about refs "to be updated" in the description of the 'aborted' hook, because by the time this hook is called the update has already been refused. The same applies to the 'committed' hook below as well. > +receives on standard input a line of the format: > + > + <old-value> SP <new-value> SP <ref-name> LF > + > +The hook's exit code is discarded by Git. > + > +ref-transaction-committed > +~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +This hook is invoked by any Git command that performs reference > +updates. It executes as soon as a reference transaction is committed, > +persisting all changes to disk and releasing any locks. The hook may > +get called multiple times or never in case no transaction was aborted. > + > +The hook takes no arguments, but for each ref to be updated it > +receives on standard input a line of the format: > + > + <old-value> SP <new-value> SP <ref-name> LF > + > +The hook's exit code is discarded by Git. > + > push-to-checkout > ~~~~~~~~~~~~~~~~ >
On Tue, Jun 02, 2020 at 08:09:00PM +0200, SZEDER Gábor wrote: > On Tue, Jun 02, 2020 at 10:25:44AM +0200, Patrick Steinhardt wrote: > > The low-level reference transactions used to update references are > > currently completely opaque to the user. While certainly desirable in > > most usecases, there are some which might want to hook into the > > transaction to observe all queued reference updates as well as observing > > the abortion or commit of a prepared transaction. > > > > One such usecase would be to have a set of replicas of a given Git > > repository, where we perform Git operations on all of the repositories > > at once and expect the outcome to be the same in all of them. While > > there exist hooks already for a certain subset of Git commands that > > could be used to implement a voting mechanism for this, many others > > currently don't have any mechanism for this. > > > > The above scenario is the motivation for a set of three new hooks that > > reach directly into Git's reference transaction. Each of the following > > new hooks (currently) doesn't accept any parameters and receives the set > > of queued reference updates via stdin: > > > > - ref-transaction-prepared gets called when all reference updates > > have been queued. At this stage, the hook may decide to abort the > > transaction prematurely by returning a non-zero status code. > > > > - ref-transaction-committed gets called when a reference transaction > > was transmitted and all queued updates have been persisted. > > > > - ref-transaction-aborted gets called when a reference transaction > > was aborted and all queued updates have been rolled back. > > The point of reference transactions is that they are atomic, and these > hooks must work together to ensure that. This raises the question how > these hooks can be updated in an actively used repository. > > Having multiple hooks means that they can't be updated atomically, and > git might invoke the new abort hook after the transaction was prepared > with the old hook. Now, if there were a single 'ref-transaction' hook > (which gets the phase of the transaction ('prepared', 'committed' or > 'aborted') as a parameter), then it could be updated atomically by > mv-ing it to place, but even that update can happen in between git > invokes 'ref-transaction prepared' and 'ref-transaction aborted'. > > I suppose this issue could be addressed by a single hook which runs > during the whole transaction and some back-and-forth communication > through stdin/out between git and the hook. However, this would, I'm > afraid, complicate both Git's handling of this hook and the hook as > well, so let's take a step back first: is this something we should > worry about in the first place? Very good point about which I didn't previously think, thanks a lot for raising it! I agree that using a single long-lived hook would complicate the logic by quite a bit. Given that the ref-transaction mechanism is such a central piece to Git, I'd be wary of introducing such complexity. But merging the current three hooks into a single hook accepting a parameter sounds like a fair compromise to me that would at least allow users to replace them atomically, even though it doesn't mean all stages were run with the same version of the hook. If we want to go further and also ensure the same script's run across all hook invocations, we could also open the hook's file descriptor on first invocation and then use `fexecve` on all subsequent invocations. As long as the user doesn't do inline rewrites of the file, we'd thus always use the same file. I'm not sure how portable that syscall is, though, but it's at least part of POSIX-2008. > > Given the usecase described above, a voting mechanism can now be > > implemented as a "ref-transaction-prepared" hook: as soon as it gets > > called, it will take all of stdin and use it to cast a vote to a central > > service. When all replicas of the repository agree, the hook will exit > > with zero, otherwise it will abort the transaction by returning > > non-zero. The most important upside is that this will catch _all_ > > commands writing references at once, allowing to implement strong > > consistency for reference updates via a single mechanism. > > > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > > --- > > Documentation/githooks.txt | 51 ++++++++++++++++++ > > refs.c | 67 +++++++++++++++++++++++- > > t/t1416-ref-transaction-hooks.sh | 88 ++++++++++++++++++++++++++++++++ > > 3 files changed, 204 insertions(+), 2 deletions(-) > > create mode 100755 t/t1416-ref-transaction-hooks.sh > > > > diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt > > index 81f2a87e88..48f8446943 100644 > > --- a/Documentation/githooks.txt > > +++ b/Documentation/githooks.txt > > @@ -404,6 +404,57 @@ Both standard output and standard error output are forwarded to > > `git send-pack` on the other end, so you can simply `echo` messages > > for the user. > > > > +ref-transaction-prepared > > +~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +This hook is invoked by any Git command that performs reference > > +updates. It executes as soon as all reference updates were queued to > > +the transaction and locked on disk. This hook executes for every > > +reference transaction that is being prepared and may thus get called > > +multiple times. > > + > > +It takes no arguments, but for each ref to be updated it receives on > > +standard input a line of the format: > > + > > + <old-value> SP <new-value> SP <ref-name> LF > > + > > +If the hook exits with a non-zero status, the transaction is aborted > > +and the command exits immediately. The > > +<<ref-transaction-aborted,'ref-transaction-aborted'>> hook is not > > +executed in that case. > > + > > +[[ref-transaction-aborted]] > > +ref-transaction-aborted > > +~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +This hook is invoked by any Git command that performs reference > > +updates. It executes as soon as a reference transaction is aborted and > > +after all reference locks were released and any changes made to > > +references were rolled back. The hook may get called multiple times or > > +never in case no transaction was aborted. > > + > > +The hook takes no arguments, but for each ref to be updated it > > Nit: I found it a bit surprising to read about refs "to be updated" in > the description of the 'aborted' hook, because by the time this hook > is called the update has already been refused. The same applies to > the 'committed' hook below as well. Fair. This should rather read "that would have been updated" and "that were updated", respectively. Patrick
On Tue, Jun 02, 2020 at 10:47:55AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > The above scenario is the motivation for a set of three new hooks that > > reach directly into Git's reference transaction. Each of the following > > new hooks (currently) doesn't accept any parameters and receives the set > > of queued reference updates via stdin: > > Do we have something (e.g. performance measurement) to convince > ourselves that this won't incur unacceptable levels of overhead in > null cases where there is no hook installed in the repository? Not yet, but I'll try to come up with a benchmark in the next iteration. I guess the best way to test is to directly exercise git-update-refs, as it's nearly a direct wrapper around reference transactions. > > + proc.in = -1; > > + proc.stdout_to_stderr = 1; > > + proc.trace2_hook_name = hook_name; > > + > > + code = start_command(&proc); > > + if (code) > > + return code; > > + > > + sigchain_push(SIGPIPE, SIG_IGN); > > + > > + for (i = 0; i < transaction->nr; i++) { > > + struct ref_update *update = transaction->updates[i]; > > + > > + strbuf_reset(&buf); > > + strbuf_addf(&buf, "%s %s %s\n", > > + oid_to_hex(&update->old_oid), > > + oid_to_hex(&update->new_oid), > > + update->refname); > > + > > + if (write_in_full(proc.in, buf.buf, buf.len) < 0) > > + break; > > We leave the loop early when we detect a write failure here... > > > + } > > + > > + close(proc.in); > > + sigchain_pop(SIGPIPE); > > + > > + strbuf_release(&buf); > > + return finish_command(&proc); > > ... but the caller does not get notified. Intended? This is semi-intended. In case the hook doesn't fully consume stdin and exits early, writing to its stdin would fail as we ignore SIGPIPE. We don't want to force the hook to care about consuming all of stdin, though. We could improve error handling here by ignoring EPIPE, but making every other write error fatal. If there's any other abnormal error condition then we certainly don't want the hook to act on incomplete data and pretend everything's fine. > > +} > > + > > int ref_transaction_prepare(struct ref_transaction *transaction, > > struct strbuf *err) > > { > > struct ref_store *refs = transaction->ref_store; > > + int ret; > > > > switch (transaction->state) { > > case REF_TRANSACTION_OPEN: > > @@ -2012,7 +2060,17 @@ int ref_transaction_prepare(struct ref_transaction *transaction, > > return -1; > > } > > > > - return refs->be->transaction_prepare(refs, transaction, err); > > + ret = refs->be->transaction_prepare(refs, transaction, err); > > + if (ret) > > + return ret; > > + > > + ret = run_transaction_hook(transaction, "ref-transaction-prepared"); > > This caller does care about it, no? This caller does as it may abort the transaction, but... > > + if (ret) { > > + ref_transaction_abort(transaction, err); > > + die(_("ref updates aborted by hook")); > > + } > > + > > + return 0; > > } > > > > int ref_transaction_abort(struct ref_transaction *transaction, > > @@ -2036,6 +2094,8 @@ int ref_transaction_abort(struct ref_transaction *transaction, > > break; > > } > > > > + run_transaction_hook(transaction, "ref-transaction-aborted"); > > And I presume that the callers of "ref_xn_abort()" would, too, but > the value returned here does not get folded into 'ret'. ... this one doesn't. The thing is that the reference transaction hook for the "aborted" case can't do anything about an aborted transaction after the fact anyway. That's why I chose to ignore errors here, same for the "committed" case. Patrick
On Wed, Jun 03, 2020 at 01:26:04PM +0200, Patrick Steinhardt wrote: > On Tue, Jun 02, 2020 at 10:47:55AM -0700, Junio C Hamano wrote: > > Patrick Steinhardt <ps@pks.im> writes: > > > > > The above scenario is the motivation for a set of three new hooks that > > > reach directly into Git's reference transaction. Each of the following > > > new hooks (currently) doesn't accept any parameters and receives the set > > > of queued reference updates via stdin: > > > > Do we have something (e.g. performance measurement) to convince > > ourselves that this won't incur unacceptable levels of overhead in > > null cases where there is no hook installed in the repository? > > Not yet, but I'll try to come up with a benchmark in the next iteration. > I guess the best way to test is to directly exercise git-update-refs, as > it's nearly a direct wrapper around reference transactions. > > > > + proc.in = -1; > > > + proc.stdout_to_stderr = 1; > > > + proc.trace2_hook_name = hook_name; > > > + > > > + code = start_command(&proc); > > > + if (code) > > > + return code; > > > + > > > + sigchain_push(SIGPIPE, SIG_IGN); > > > + > > > + for (i = 0; i < transaction->nr; i++) { > > > + struct ref_update *update = transaction->updates[i]; > > > + > > > + strbuf_reset(&buf); > > > + strbuf_addf(&buf, "%s %s %s\n", > > > + oid_to_hex(&update->old_oid), > > > + oid_to_hex(&update->new_oid), > > > + update->refname); > > > + > > > + if (write_in_full(proc.in, buf.buf, buf.len) < 0) > > > + break; > > > > We leave the loop early when we detect a write failure here... > > > > > + } > > > + > > > + close(proc.in); > > > + sigchain_pop(SIGPIPE); > > > + > > > + strbuf_release(&buf); > > > + return finish_command(&proc); > > > > ... but the caller does not get notified. Intended? > > This is semi-intended. In case the hook doesn't fully consume stdin and > exits early, writing to its stdin would fail as we ignore SIGPIPE. We > don't want to force the hook to care about consuming all of stdin, > though. Why? How could the prepared hook properly initialize the voting mechanism for the transaction without reading all the refs to be updated? > We could improve error handling here by ignoring EPIPE, but making every > other write error fatal. If there's any other abnormal error condition > then we certainly don't want the hook to act on incomplete data and > pretend everything's fine. As I read v2 of this patch, a prepared hook can exit(0) early without reading all the refs to be updated, cause EPIPE in the git process invoking the hook, and that process would interpret that as success. I haven't though it through how such a voting mechanism would work, but I have a gut feeling that this can't be good.
On Sun, Jun 07, 2020 at 10:12:33PM +0200, SZEDER Gábor wrote: > On Wed, Jun 03, 2020 at 01:26:04PM +0200, Patrick Steinhardt wrote: > > On Tue, Jun 02, 2020 at 10:47:55AM -0700, Junio C Hamano wrote: > > > Patrick Steinhardt <ps@pks.im> writes: > > > > > > > The above scenario is the motivation for a set of three new hooks that > > > > reach directly into Git's reference transaction. Each of the following > > > > new hooks (currently) doesn't accept any parameters and receives the set > > > > of queued reference updates via stdin: > > > > > > Do we have something (e.g. performance measurement) to convince > > > ourselves that this won't incur unacceptable levels of overhead in > > > null cases where there is no hook installed in the repository? > > > > Not yet, but I'll try to come up with a benchmark in the next iteration. > > I guess the best way to test is to directly exercise git-update-refs, as > > it's nearly a direct wrapper around reference transactions. > > > > > > + proc.in = -1; > > > > + proc.stdout_to_stderr = 1; > > > > + proc.trace2_hook_name = hook_name; > > > > + > > > > + code = start_command(&proc); > > > > + if (code) > > > > + return code; > > > > + > > > > + sigchain_push(SIGPIPE, SIG_IGN); > > > > + > > > > + for (i = 0; i < transaction->nr; i++) { > > > > + struct ref_update *update = transaction->updates[i]; > > > > + > > > > + strbuf_reset(&buf); > > > > + strbuf_addf(&buf, "%s %s %s\n", > > > > + oid_to_hex(&update->old_oid), > > > > + oid_to_hex(&update->new_oid), > > > > + update->refname); > > > > + > > > > + if (write_in_full(proc.in, buf.buf, buf.len) < 0) > > > > + break; > > > > > > We leave the loop early when we detect a write failure here... > > > > > > > + } > > > > + > > > > + close(proc.in); > > > > + sigchain_pop(SIGPIPE); > > > > + > > > > + strbuf_release(&buf); > > > > + return finish_command(&proc); > > > > > > ... but the caller does not get notified. Intended? > > > > This is semi-intended. In case the hook doesn't fully consume stdin and > > exits early, writing to its stdin would fail as we ignore SIGPIPE. We > > don't want to force the hook to care about consuming all of stdin, > > though. > > Why? How could the prepared hook properly initialize the voting > mechanism for the transaction without reading all the refs to be > updated? Because the hook might not want to implement a voting mechanism after all but something entirely different which we're currently not foreseeing as a valid usecase. We don't enforce this anywhere else either, like e.g. for the pre-receive hook. If that one exits early without consuming its stdin then that's totally fine. > > We could improve error handling here by ignoring EPIPE, but making every > > other write error fatal. If there's any other abnormal error condition > > then we certainly don't want the hook to act on incomplete data and > > pretend everything's fine. > > As I read v2 of this patch, a prepared hook can exit(0) early without > reading all the refs to be updated, cause EPIPE in the git process > invoking the hook, and that process would interpret that as success. > I haven't though it through how such a voting mechanism would work, > but I have a gut feeling that this can't be good. As said, I lean towards allowing more flexibility for the hook implementation to also cater for other usecases. But I agree that in a voting implementation, not reading all of stdin is a bad thing and may point to a buggy hook implementation. Aborting the transaction if the hook didn't read all of stdin would be a nice safeguard in that case. With the current implementation of using a single hook for "prepared", "committed" and "aborted", it'd also force the hook implementation to do something in cases it doesn't care about. E.g. #!/bin/sh case "$1" in prepared) VOTE=$(sha1sum <&0) cast $VOTE ;; aborted|committed) cat <&0 >/dev/null ;; esac That being said, I'm not opposed to enforce this and not treat EPIPE differently. Patrick
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 81f2a87e88..48f8446943 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -404,6 +404,57 @@ Both standard output and standard error output are forwarded to `git send-pack` on the other end, so you can simply `echo` messages for the user. +ref-transaction-prepared +~~~~~~~~~~~~~~~~~~~~~~~~ + +This hook is invoked by any Git command that performs reference +updates. It executes as soon as all reference updates were queued to +the transaction and locked on disk. This hook executes for every +reference transaction that is being prepared and may thus get called +multiple times. + +It takes no arguments, but for each ref to be updated it receives on +standard input a line of the format: + + <old-value> SP <new-value> SP <ref-name> LF + +If the hook exits with a non-zero status, the transaction is aborted +and the command exits immediately. The +<<ref-transaction-aborted,'ref-transaction-aborted'>> hook is not +executed in that case. + +[[ref-transaction-aborted]] +ref-transaction-aborted +~~~~~~~~~~~~~~~~~~~~~~~ + +This hook is invoked by any Git command that performs reference +updates. It executes as soon as a reference transaction is aborted and +after all reference locks were released and any changes made to +references were rolled back. The hook may get called multiple times or +never in case no transaction was aborted. + +The hook takes no arguments, but for each ref to be updated it +receives on standard input a line of the format: + + <old-value> SP <new-value> SP <ref-name> LF + +The hook's exit code is discarded by Git. + +ref-transaction-committed +~~~~~~~~~~~~~~~~~~~~~~~~~ + +This hook is invoked by any Git command that performs reference +updates. It executes as soon as a reference transaction is committed, +persisting all changes to disk and releasing any locks. The hook may +get called multiple times or never in case no transaction was aborted. + +The hook takes no arguments, but for each ref to be updated it +receives on standard input a line of the format: + + <old-value> SP <new-value> SP <ref-name> LF + +The hook's exit code is discarded by Git. + push-to-checkout ~~~~~~~~~~~~~~~~ diff --git a/refs.c b/refs.c index 224ff66c7b..e41fa7ea55 100644 --- a/refs.c +++ b/refs.c @@ -9,6 +9,7 @@ #include "iterator.h" #include "refs.h" #include "refs/refs-internal.h" +#include "run-command.h" #include "object-store.h" #include "object.h" #include "tag.h" @@ -16,6 +17,7 @@ #include "worktree.h" #include "argv-array.h" #include "repository.h" +#include "sigchain.h" /* * List of all available backends @@ -1986,10 +1988,56 @@ int ref_update_reject_duplicates(struct string_list *refnames, return 0; } +static int run_transaction_hook(struct ref_transaction *transaction, + const char *hook_name) +{ + struct child_process proc = CHILD_PROCESS_INIT; + struct strbuf buf = STRBUF_INIT; + const char *argv[2]; + int code, i; + + argv[0] = find_hook(hook_name); + if (!argv[0]) + return 0; + + argv[1] = NULL; + + proc.argv = argv; + proc.in = -1; + proc.stdout_to_stderr = 1; + proc.trace2_hook_name = hook_name; + + code = start_command(&proc); + if (code) + return code; + + sigchain_push(SIGPIPE, SIG_IGN); + + for (i = 0; i < transaction->nr; i++) { + struct ref_update *update = transaction->updates[i]; + + strbuf_reset(&buf); + strbuf_addf(&buf, "%s %s %s\n", + oid_to_hex(&update->old_oid), + oid_to_hex(&update->new_oid), + update->refname); + + if (write_in_full(proc.in, buf.buf, buf.len) < 0) + break; + } + + close(proc.in); + sigchain_pop(SIGPIPE); + + strbuf_release(&buf); + return finish_command(&proc); +} + int ref_transaction_prepare(struct ref_transaction *transaction, struct strbuf *err) { struct ref_store *refs = transaction->ref_store; + int ret; switch (transaction->state) { case REF_TRANSACTION_OPEN: @@ -2012,7 +2060,17 @@ int ref_transaction_prepare(struct ref_transaction *transaction, return -1; } - return refs->be->transaction_prepare(refs, transaction, err); + ret = refs->be->transaction_prepare(refs, transaction, err); + if (ret) + return ret; + + ret = run_transaction_hook(transaction, "ref-transaction-prepared"); + if (ret) { + ref_transaction_abort(transaction, err); + die(_("ref updates aborted by hook")); + } + + return 0; } int ref_transaction_abort(struct ref_transaction *transaction, @@ -2036,6 +2094,8 @@ int ref_transaction_abort(struct ref_transaction *transaction, break; } + run_transaction_hook(transaction, "ref-transaction-aborted"); + ref_transaction_free(transaction); return ret; } @@ -2064,7 +2124,10 @@ int ref_transaction_commit(struct ref_transaction *transaction, break; } - return refs->be->transaction_finish(refs, transaction, err); + ret = refs->be->transaction_finish(refs, transaction, err); + if (!ret) + run_transaction_hook(transaction, "ref-transaction-committed"); + return ret; } int refs_verify_refname_available(struct ref_store *refs, diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh new file mode 100755 index 0000000000..b6df5fc883 --- /dev/null +++ b/t/t1416-ref-transaction-hooks.sh @@ -0,0 +1,88 @@ +#!/bin/sh + +test_description='reference transaction hooks' + +. ./test-lib.sh + +create_commit () +{ + test_tick && + T=$(git write-tree) && + sha1=$(echo message | git commit-tree $T) && + echo $sha1 +} + +test_expect_success setup ' + mkdir -p .git/hooks +' + +test_expect_success 'prepared hook allows updating ref' ' + test_when_finished "rm .git/hooks/ref-transaction-prepared" && + write_script .git/hooks/ref-transaction-prepared <<-\EOF && + exit 0 + EOF + C=$(create_commit) && + git update-ref HEAD $C +' + +test_expect_success 'prepared hook aborts updating ref' ' + test_when_finished "rm .git/hooks/ref-transaction-prepared" && + write_script .git/hooks/ref-transaction-prepared <<-\EOF && + exit 1 + EOF + C=$(create_commit) && + test_must_fail git update-ref HEAD $C 2>err && + grep "ref updates aborted by hook" err +' + +test_expect_success 'prepared hook gets all queued updates' ' + test_when_finished "rm .git/hooks/ref-transaction-prepared actual" && + write_script .git/hooks/ref-transaction-prepared <<-\EOF && + while read -r line; do printf "%s\n" "$line"; done >actual + EOF + C=$(create_commit) && + cat >expect <<-EOF && + $ZERO_OID $C HEAD + $ZERO_OID $C refs/heads/master + EOF + git update-ref HEAD $C <<-EOF && + update HEAD $ZERO_OID $C + update refs/heads/master $ZERO_OID $C + EOF + test_cmp expect actual +' + +test_expect_success 'committed hook gets all queued updates' ' + test_when_finished "rm .git/hooks/ref-transaction-committed actual" && + write_script .git/hooks/ref-transaction-committed <<-\EOF && + while read -r line; do printf "%s\n" "$line"; done >actual + EOF + C=$(create_commit) && + cat >expect <<-EOF && + $ZERO_OID $C HEAD + $ZERO_OID $C refs/heads/master + EOF + git update-ref HEAD $C && + test_cmp expect actual +' + +test_expect_success 'aborted hook gets all queued updates' ' + test_when_finished "rm .git/hooks/ref-transaction-aborted actual" && + write_script .git/hooks/ref-transaction-aborted <<-\EOF && + while read -r line; do printf "%s\n" "$line"; done >actual + EOF + C=$(create_commit) && + cat >expect <<-EOF && + $ZERO_OID $C HEAD + $ZERO_OID $C refs/heads/master + EOF + git update-ref --stdin <<-EOF && + start + update HEAD $C $ZERO_OID + update refs/heads/master $C $ZERO_OID + abort + EOF + test_cmp expect actual +' + +test_done
The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for a set of three new hooks that reach directly into Git's reference transaction. Each of the following new hooks (currently) doesn't accept any parameters and receives the set of queued reference updates via stdin: - ref-transaction-prepared gets called when all reference updates have been queued. At this stage, the hook may decide to abort the transaction prematurely by returning a non-zero status code. - ref-transaction-committed gets called when a reference transaction was transmitted and all queued updates have been persisted. - ref-transaction-aborted gets called when a reference transaction was aborted and all queued updates have been rolled back. Given the usecase described above, a voting mechanism can now be implemented as a "ref-transaction-prepared" hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/githooks.txt | 51 ++++++++++++++++++ refs.c | 67 +++++++++++++++++++++++- t/t1416-ref-transaction-hooks.sh | 88 ++++++++++++++++++++++++++++++++ 3 files changed, 204 insertions(+), 2 deletions(-) create mode 100755 t/t1416-ref-transaction-hooks.sh