[v3] refs: implement reference transaction hook

Message ID	1de96b96e3448c8f7e7974f7c082fd08d2d14e96.1592475610.git.ps@pks.im (mailing list archive)
State	New, archived
Headers
  Return-Path: <SRS0=1Nmv=77=vger.kernel.org=git-owner@kernel.org>
  Date: Thu, 18 Jun 2020 12:27:43 +0200
  From: Patrick Steinhardt <ps@pks.im>
  To: git@vger.kernel.org
  Cc: SZEDER Gábor <szeder.dev@gmail.com>, Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>
  Subject: [PATCH v3] refs: implement reference transaction hook
  Message-ID: <1de96b96e3448c8f7e7974f7c082fd08d2d14e96.1592475610.git.ps@pks.im>
  References: <1d1a94426f95d842e0e3ea6a1569c0c45239229c.1591086316.git.ps@pks.im>
  In-Reply-To: <1d1a94426f95d842e0e3ea6a1569c0c45239229c.1591086316.git.ps@pks.im>
  MIME-Version: 1.0
  Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="T4sUOijqQbZv57TR"
  Content-Disposition: inline
  Sender: git-owner@vger.kernel.org
  Precedence: bulk
Series	[v3] refs: implement reference transaction hook

Commit Message

Patrick Steinhardt June 18, 2020, 10:27 a.m. UTC

The low-level reference transactions used to update references are
currently completely opaque to the user. While this is certainly
desirable in most use cases, some users may want to hook into the
transaction to observe all queued reference updates as well as the
abort or commit of a prepared transaction.

One such use case would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
hooks already exist for a certain subset of Git commands that could be
used to implement a voting mechanism for this, many other commands
currently have no such hook.

The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as its single argument the state the transaction was
moved to ("prepared", "committed" or "aborted") and gets all queued
reference updates via its standard input. While the exit code is
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted
prematurely.
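
As a rough illustration of this interface (not part of the patch), a hook
could be structured along the following lines. The function names
`is_zero_oid` and `handle_transaction` and the policy of protecting
`refs/heads/protected/` are made up for this sketch. It also assumes that,
as elsewhere in Git, an all-zero new value denotes a deletion. Only the
state argument, the stdin line format and the exit-code semantics come from
the description above.

```shell
#!/bin/sh
# Sketch of a reference-transaction hook body. Function names and the
# "protected" namespace policy are hypothetical.

# Succeeds if the argument consists solely of zeros (works for any
# hash length).
is_zero_oid () {
	case "$1" in
	*[!0]*) return 1 ;;
	'') return 1 ;;
	*) return 0 ;;
	esac
}

# handle_transaction <state>
# Reads "<old-value> SP <new-value> SP <ref-name>" lines from stdin.
handle_transaction () {
	state=$1
	while read -r old new ref
	do
		case "$state" in
		prepared)
			# Only this state can veto: a non-zero status aborts
			# the transaction.
			case "$ref" in
			refs/heads/protected/*)
				if is_zero_oid "$new"
				then
					echo "refusing to delete $ref" >&2
					return 1
				fi ;;
			esac ;;
		*)
			# "committed" and "aborted": status is ignored, so
			# just log what happened.
			echo "$state: $ref $old -> $new" ;;
		esac
	done
	return 0
}
```

Saved as `.git/hooks/reference-transaction`, such a script would end with
`handle_transaction "$1"` so that the state passed by Git selects the
branch taken.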

Given the use case described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, which makes it possible to implement strong
consistency for reference updates via a single mechanism.
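
The voting idea might be sketched as below. This is an illustration under
stated assumptions, not something the patch ships: `cast_vote` stands in
for a hypothetical client of the central service and is stubbed out here,
and `compute_vote`, `vote_on_transaction` and `EXPECTED_VOTE` are names
invented for the sketch. Sorting the updates before hashing is a design
choice of this sketch, so that replicas queueing the same updates in a
different order still agree.

```shell
#!/bin/sh
# Sketch of a voting hook. "cast_vote" is a hypothetical client for the
# central service; it is NOT part of Git and is stubbed out below.

# Derive a vote from the queued updates read on stdin. cksum keeps the
# sketch POSIX-only; a real deployment would want a cryptographic hash.
compute_vote () {
	LC_ALL=C sort | cksum
}

# Stub: pretend the service reports agreement only when the vote equals
# $EXPECTED_VOTE. A real client would contact the service and wait for
# the other replicas.
cast_vote () {
	test "$1" = "$EXPECTED_VOTE"
}

# vote_on_transaction <state>
# Only the "prepared" state can veto, so only that state casts a vote.
vote_on_transaction () {
	if test "$1" != prepared
	then
		cat >/dev/null	# drain stdin, nothing to decide
		return 0
	fi
	cast_vote "$(compute_vote)"
}
```

A hook built on this would end with `vote_on_transaction "$1"`. A non-zero
status from `cast_vote` in the "prepared" state then aborts the
transaction on the replica that disagrees.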

In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduces two new performance tests for git-update-ref(1). Run against
an empty repository, they produce the following results:

  Test                         origin/master     HEAD
  --------------------------------------------------------------------
  1400.2: update-ref           2.70(2.10+0.71)   2.71(2.10+0.73) +0.4%
  1400.3: update-ref --stdin   0.21(0.09+0.11)   0.21(0.07+0.14) +0.0%

The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging the runtime of git-update-ref(1) over
3000 invocations. p1400.3 instead calls `git update-ref --stdin` three
times and queues a thousand creations, updates and deletes respectively.

As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. For p1400.2, on the other hand, the table above
shows a small impact. But across five runs of the performance tests,
each one run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from
-1.5% to +1.1%. This variance can be explained by the cost of spawning
3000 processes. It shows that the overhead of assembling the hook path
and executing access(3P) once to check if the hook is there is mostly
outweighed by the operating system's overhead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

There are two changes compared to v2:

    1. I've added a lookup cache for the hook that caches both the
       positive and the negative lookup of this hook. It's specific to
       the reference-transaction hook only, and will simply store the
       path to the found hook or a sentinel value in case it wasn't
       found. This way, we avoid one of the two calls to access(3P)
       when executing git-update-ref(1). This improved performance
       into the range where variance between runs drowns out the
       impact of the single access(3P) call.

    2. I've added a second benchmark to p1400: the first one tests
       single invocations of git-update-ref, and the second one uses
       batched invocations via its --stdin switch. The latter doesn't
       show any impact, while the overhead of the former ranged from
       -1.5% to +1.1%.

I did have a look at integrating this work with Emily's work, but I
don't really think it necessary given that benchmarks show that the
overhead of the access(3P) call is drowned out by the OS anyway. If you
feel differently, I may revisit it and perform some benchmarks on top of
her work.

Patrick

 Documentation/githooks.txt       |  29 ++++++++
 refs.c                           |  79 +++++++++++++++++++++-
 t/perf/p1400-update-ref.sh       |  32 +++++++++
 t/t1416-ref-transaction-hooks.sh | 109 +++++++++++++++++++++++++++++++
 4 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100755 t/perf/p1400-update-ref.sh
 create mode 100755 t/t1416-ref-transaction-hooks.sh

Comments

Junio C Hamano June 18, 2020, 10:23 p.m. UTC | #1

Patrick Steinhardt <ps@pks.im> writes:

> +static const char hook_not_found;
> +static const char *hook;

;-)  Nice.

> +static int run_transaction_hook(struct ref_transaction *transaction,
> +				const char *state)
> +{
> +	struct child_process proc = CHILD_PROCESS_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	int saved_errno = 0, ret, i;
> +...
> +	ret = start_command(&proc);
> +	if (ret)
> +		return ret;
> +
> +	sigchain_push(SIGPIPE, SIG_IGN);
> +
> +	for (i = 0; i < transaction->nr; i++) {
> +		struct ref_update *update = transaction->updates[i];
> + ...
> +		if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
> +			if (errno != EPIPE)
> +				saved_errno = errno;
> +			break;
> +		}
> +	}
> +
> +	close(proc.in);
> +	sigchain_pop(SIGPIPE);
> +	strbuf_release(&buf);
> +
> +	ret = finish_command(&proc);
> +	if (ret)
> +		return ret;
> +
> +	return saved_errno;
> +}

OK, the only thing that looked a bit tricky was the "saved_errno"
that is used in an unusual (relative to its name) way.  The use
isn't incorrect per se, but it rubs readers' expectations the wrong
way to use a variable named saved_errno for any purpose other than
the established pattern:

	saved_errno = errno;
	if (some_libcall_that_may_update_errno()) {
		... deal with an error and perform
		... some fallback action
	}
	errno = saved_errno;

that allows the caller to be oblivious to the library call that is
made as a mere implementation detail whose failure does not matter
to the caller.

In any case, the idea of the code in the patch is to make sure we
remember the fact that we failed to write (or caught any other
error, if we added more calls in the future) in a variable, and make
sure we return an error even if we manage to cleanly call
"finish_command()".  For that purpose, in order to avoid overwriting
the "ret" variable with the returned value from finish_command(), a
separate variable is needed, and "saved_errno" was picked for the
name of the variable.

But I do not think it is a good idea to return the errno in one
codepath when the function can also return an error status that is
not an errno, namely the one received from start_command() and
finish_command().  The caller of this function cannot (and probably
does not want to) tell what the failed syscall was and would be
checking if the return value was success (=0) or failure (<0).

So I'd rather simplify the error handling to

 - Remove "saved_errno"; instead initialize ret to 0 at the beginning;

 - Return "ret" even where the earlier part returns a hardcoded 0,
   for consistency;

 - Update "ret" in the loop to -1 (the same error return status that
   is returned by start_command() and finish_command()) when we
   found a write error that we are not ignoring before breaking out
   of the loop.

 - We need to call finish_command() even if we earlier saw an error
   once we successfully called start_command().  So do something
   like this:

	ret |= finish_command(&proc);
	return ret;

   to make sure we retain an earlier error in "ret", we
   unconditionally call finish_command() when the control reaches
   there, and we mark the result a failure when finish_command()
   fails.

if I were writing this function.

Thanks.

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 81f2a87e88..642471109f 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -404,6 +404,35 @@  Both standard output and standard error output are forwarded to
 `git send-pack` on the other end, so you can simply `echo` messages
 for the user.
 
+reference-transaction
+~~~~~~~~~~~~~~~~~~~~~
+
+This hook is invoked by any Git command that performs reference
+updates. It executes whenever a reference transaction is prepared,
+committed or aborted and may thus get called multiple times.
+
+The hook takes exactly one argument, which is the current state the
+given reference transaction is in:
+
+    - "prepared": All reference updates have been queued to the
+      transaction and references were locked on disk.
+
+    - "committed": The reference transaction was committed and all
+      references now have their respective new value.
+
+    - "aborted": The reference transaction was aborted, no changes
+      were performed and the locks have been released.
+
+For each reference update that was added to the transaction, the hook
+receives on standard input a line of the format:
+
+  <old-value> SP <new-value> SP <ref-name> LF
+
+The exit status of the hook is ignored for any state except for the
+"prepared" state. In the "prepared" state, a non-zero exit status will
+cause the transaction to be aborted. The hook will not be called with
+"aborted" state in that case.
+
 push-to-checkout
 ~~~~~~~~~~~~~~~~
 
diff --git a/refs.c b/refs.c
index 224ff66c7b..c4961f4ad9 100644
--- a/refs.c
+++ b/refs.c
@@ -9,6 +9,7 @@ 
 #include "iterator.h"
 #include "refs.h"
 #include "refs/refs-internal.h"
+#include "run-command.h"
 #include "object-store.h"
 #include "object.h"
 #include "tag.h"
@@ -16,6 +17,7 @@ 
 #include "worktree.h"
 #include "argv-array.h"
 #include "repository.h"
+#include "sigchain.h"
 
 /*
  * List of all available backends
@@ -1986,10 +1988,68 @@  int ref_update_reject_duplicates(struct string_list *refnames,
 	return 0;
 }
 
+static const char hook_not_found;
+static const char *hook;
+
+static int run_transaction_hook(struct ref_transaction *transaction,
+				const char *state)
+{
+	struct child_process proc = CHILD_PROCESS_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int saved_errno = 0, ret, i;
+
+	if (hook == &hook_not_found)
+		return 0;
+	if (!hook)
+		hook = find_hook("reference-transaction");
+	if (!hook) {
+		hook = &hook_not_found;
+		return 0;
+	}
+
+	argv_array_pushl(&proc.args, hook, state, NULL);
+	proc.in = -1;
+	proc.stdout_to_stderr = 1;
+	proc.trace2_hook_name = "reference-transaction";
+
+	ret = start_command(&proc);
+	if (ret)
+		return ret;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "%s %s %s\n",
+			    oid_to_hex(&update->old_oid),
+			    oid_to_hex(&update->new_oid),
+			    update->refname);
+
+		if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
+			if (errno != EPIPE)
+				saved_errno = errno;
+			break;
+		}
+	}
+
+	close(proc.in);
+	sigchain_pop(SIGPIPE);
+	strbuf_release(&buf);
+
+	ret = finish_command(&proc);
+	if (ret)
+		return ret;
+
+	return saved_errno;
+}
+
 int ref_transaction_prepare(struct ref_transaction *transaction,
 			    struct strbuf *err)
 {
 	struct ref_store *refs = transaction->ref_store;
+	int ret;
 
 	switch (transaction->state) {
 	case REF_TRANSACTION_OPEN:
@@ -2012,7 +2072,17 @@  int ref_transaction_prepare(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	return refs->be->transaction_prepare(refs, transaction, err);
+	ret = refs->be->transaction_prepare(refs, transaction, err);
+	if (ret)
+		return ret;
+
+	ret = run_transaction_hook(transaction, "prepared");
+	if (ret) {
+		ref_transaction_abort(transaction, err);
+		die(_("ref updates aborted by hook"));
+	}
+
+	return 0;
 }
 
 int ref_transaction_abort(struct ref_transaction *transaction,
@@ -2036,6 +2106,8 @@  int ref_transaction_abort(struct ref_transaction *transaction,
 		break;
 	}
 
+	run_transaction_hook(transaction, "aborted");
+
 	ref_transaction_free(transaction);
 	return ret;
 }
@@ -2064,7 +2136,10 @@  int ref_transaction_commit(struct ref_transaction *transaction,
 		break;
 	}
 
-	return refs->be->transaction_finish(refs, transaction, err);
+	ret = refs->be->transaction_finish(refs, transaction, err);
+	if (!ret)
+		run_transaction_hook(transaction, "committed");
+	return ret;
 }
 
 int refs_verify_refname_available(struct ref_store *refs,
diff --git a/t/perf/p1400-update-ref.sh b/t/perf/p1400-update-ref.sh
new file mode 100755
index 0000000000..d275a81248
--- /dev/null
+++ b/t/perf/p1400-update-ref.sh
@@ -0,0 +1,32 @@ 
+#!/bin/sh
+
+test_description="Tests performance of update-ref"
+
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+test_expect_success "setup" '
+	test_commit PRE &&
+	test_commit POST &&
+	printf "create refs/heads/%d PRE\n" $(test_seq 1000) >create &&
+	printf "update refs/heads/%d POST PRE\n" $(test_seq 1000) >update &&
+	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete
+'
+
+test_perf "update-ref" '
+	for i in $(test_seq 1000)
+	do
+		git update-ref refs/heads/branch PRE &&
+		git update-ref refs/heads/branch POST PRE &&
+		git update-ref -d refs/heads/branch
+	done
+'
+
+test_perf "update-ref --stdin" '
+	git update-ref --stdin <create &&
+	git update-ref --stdin <update &&
+	git update-ref --stdin <delete
+'
+
+test_done
diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
new file mode 100755
index 0000000000..da58d867a5
--- /dev/null
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -0,0 +1,109 @@ 
+#!/bin/sh
+
+test_description='reference transaction hooks'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	mkdir -p .git/hooks &&
+	test_commit PRE &&
+	test_commit POST &&
+	POST_OID=$(git rev-parse POST)
+'
+
+test_expect_success 'hook allows updating ref if successful' '
+	test_when_finished "rm .git/hooks/reference-transaction" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		echo "$*" >>actual
+	EOF
+	cat >expect <<-EOF &&
+		prepared
+		committed
+	EOF
+	git update-ref HEAD POST &&
+	test_cmp expect actual
+'
+
+test_expect_success 'hook aborts updating ref in prepared state' '
+	test_when_finished "rm .git/hooks/reference-transaction" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = prepared
+		then
+			exit 1
+		fi
+	EOF
+	test_must_fail git update-ref HEAD POST 2>err &&
+	test_i18ngrep "ref updates aborted by hook" err
+'
+
+test_expect_success 'hook gets all queued updates in prepared state' '
+	test_when_finished "rm .git/hooks/reference-transaction actual" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = prepared
+		then
+			while read -r line
+			do
+				printf "%s\n" "$line"
+			done >actual
+		fi
+	EOF
+	cat >expect <<-EOF &&
+		$ZERO_OID $POST_OID HEAD
+		$ZERO_OID $POST_OID refs/heads/master
+	EOF
+	git update-ref HEAD POST <<-EOF &&
+		update HEAD $ZERO_OID $POST_OID
+		update refs/heads/master $ZERO_OID $POST_OID
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'hook gets all queued updates in committed state' '
+	test_when_finished "rm .git/hooks/reference-transaction actual" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = committed
+		then
+			while read -r line
+			do
+				printf "%s\n" "$line"
+			done >actual
+		fi
+	EOF
+	cat >expect <<-EOF &&
+		$ZERO_OID $POST_OID HEAD
+		$ZERO_OID $POST_OID refs/heads/master
+	EOF
+	git update-ref HEAD POST &&
+	test_cmp expect actual
+'
+
+test_expect_success 'hook gets all queued updates in aborted state' '
+	test_when_finished "rm .git/hooks/reference-transaction actual" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = aborted
+		then
+			while read -r line
+			do
+				printf "%s\n" "$line"
+			done >actual
+		fi
+	EOF
+	cat >expect <<-EOF &&
+		$ZERO_OID $POST_OID HEAD
+		$ZERO_OID $POST_OID refs/heads/master
+	EOF
+	git update-ref --stdin <<-EOF &&
+		start
+		update HEAD POST $ZERO_OID
+		update refs/heads/master POST $ZERO_OID
+		abort
+	EOF
+	test_cmp expect actual
+'
+
+test_done
