diff mbox series

[v3,6/6] refs: skip hooks when deleting uncovered packed refs

Message ID 279eadc41cda9ceea4c5317d6a4c358c18d50ce9.1642054003.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series refs: excessive hook execution with packed refs | expand

Commit Message

Patrick Steinhardt Jan. 13, 2022, 6:11 a.m. UTC
When deleting refs from the loose-files refs backend, then we need to be
careful to also delete the same ref from the packed refs backend, if it
exists. If we don't, then deleting the loose ref would "uncover" the
packed ref. We thus always have to queue up deletions of refs for both
the loose and the packed refs backend. This is done in two separate
transactions, where the end result is that the reference-transaction
hook is executed twice for the deleted refs.

This behaviour is quite misleading: it's exposing implementation details
of how the files backend works to the user, in contrast to the logical
updates that we'd really want to expose via the hook. Worse yet, whether
the hook gets executed once or twice depends on how well-packed the
repository is: if the ref only exists as a loose ref, then we execute it
once, otherwise if it is also packed then we execute it twice.

Fix this behaviour and don't execute the reference-transaction hook at
all when refs in the packed-refs backend if it's driven by the files
backend. This works as expected even in case the refs to be deleted only
exist in the packed-refs backend because the loose-backend always queues
refs in its own transaction even if they don't exist such that they can
be locked for concurrent creation. And it also does the right thing in
case neither of the backends has the ref because that would cause the
transaction to fail completely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c             | 9 ++++++---
 t/t1416-ref-transaction-hooks.sh | 7 +------
 2 files changed, 7 insertions(+), 9 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Jan. 13, 2022, 1:04 p.m. UTC | #1
On Thu, Jan 13 2022, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> When deleting refs from the loose-files refs backend, then we need to be
> careful to also delete the same ref from the packed refs backend, if it
> exists. If we don't, then deleting the loose ref would "uncover" the
> packed ref. We thus always have to queue up deletions of refs for both
> the loose and the packed refs backend. This is done in two separate
> transactions, where the end result is that the reference-transaction
> hook is executed twice for the deleted refs.

But do we (which would be an issue before this series) delete the loose
and and then the packed one, thus racily exposing the stale ref to any
concurrent repository reader, or do we first update the packed ref to
the valu of the now-locked loose ref to avoid such a race?

> [...]
> Fix this behaviour and don't execute the reference-transaction hook at
> all when refs in the packed-refs backend if it's driven by the files
> backend. This works as expected even in case the refs to be deleted only
> exist in the packed-refs backend because the loose-backend always queues
> refs in its own transaction even if they don't exist such that they can
> be locked for concurrent creation. And it also does the right thing in
> case neither of the backends has the ref because that would cause the
> transaction to fail completely.

I do wonder if the fundimental approach here is the right
one. I.e. changing the hook to only expose "real" updates, as opposed to
leaving it as a lower-level facility to listed in on any sort of ref
updates.

In such a scenario we could imagine adding a third parameter or
otherwise flag the update as "real" to the hook, so a dumber hook
consumer could ignore the more verbose inter-transactional chatter.

I say that because this change does the right thing for the use-case you
have in mind, but if you e.g. imagine a more gentle background-friendly
"gc" such a thing could be implemented by backing off as soon as it sees
an ongoing transaction being started.

With my ae35e16cd43 (reflog expire: don't lock reflogs using previously
seen OID, 2021-08-23) not getting that more chatty data should be be OK
for such a hypothetical hook.

But we might have more avoidable tripping over locks as the gc and ref
transaction race one another to lock various things in the repository.

Or maybe nobody cares in practice, just food for thought.
Patrick Steinhardt Jan. 17, 2022, 7:56 a.m. UTC | #2
On Thu, Jan 13, 2022 at 02:04:38PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Jan 13 2022, Patrick Steinhardt wrote:
> 
> > [[PGP Signed Part:Undecided]]
> > When deleting refs from the loose-files refs backend, then we need to be
> > careful to also delete the same ref from the packed refs backend, if it
> > exists. If we don't, then deleting the loose ref would "uncover" the
> > packed ref. We thus always have to queue up deletions of refs for both
> > the loose and the packed refs backend. This is done in two separate
> > transactions, where the end result is that the reference-transaction
> > hook is executed twice for the deleted refs.
> 
> But do we (which would be an issue before this series) delete the loose
> and and then the packed one, thus racily exposing the stale ref to any
> concurrent repository reader, or do we first update the packed ref to
> the valu of the now-locked loose ref to avoid such a race?

We first commit the packed-refs file so that the stale ref is not
exposed.

> > [...]
> > Fix this behaviour and don't execute the reference-transaction hook at
> > all when refs in the packed-refs backend if it's driven by the files
> > backend. This works as expected even in case the refs to be deleted only
> > exist in the packed-refs backend because the loose-backend always queues
> > refs in its own transaction even if they don't exist such that they can
> > be locked for concurrent creation. And it also does the right thing in
> > case neither of the backends has the ref because that would cause the
> > transaction to fail completely.
> 
> I do wonder if the fundimental approach here is the right
> one. I.e. changing the hook to only expose "real" updates, as opposed to
> leaving it as a lower-level facility to listed in on any sort of ref
> updates.
> 
> In such a scenario we could imagine adding a third parameter or
> otherwise flag the update as "real" to the hook, so a dumber hook
> consumer could ignore the more verbose inter-transactional chatter.
> 
> I say that because this change does the right thing for the use-case you
> have in mind, but if you e.g. imagine a more gentle background-friendly
> "gc" such a thing could be implemented by backing off as soon as it sees
> an ongoing transaction being started.

I've mostly been acting on the original report by Waleed. And I tend to
agree with his report given that we also got a workaround at GitLab
which filters out reference transactions which only consist of force
deletions because they're likely to be pruning refs in the packed
backend which are about to be uncovered. The result is that execution of
the reftx hook is dependent on how well-packed a repository's refs are:
when refs are packed we execute the hook twice, whereas we execute it
once when it's not well-packed. This is surprising behaviour, even
though one can definitely argue that it's just working as intended.

I think ultimately the question boils down to whether we want to treat
the files backend as a single compound backend and whether the reftx
hook should treat it like that. If we treat it as a single backend, then
we shouldn't report a change in refs when pruning about-to-be-uncovered
refs given that it wouldn't have been visible, but it's only internal
cleanup. And neither should we report ref changes when repacking refs
into a single file given that from the backend's perspective nothing is
about to change.

Patrick

> With my ae35e16cd43 (reflog expire: don't lock reflogs using previously
> seen OID, 2021-08-23) not getting that more chatty data should be be OK
> for such a hypothetical hook.
> 
> But we might have more avoidable tripping over locks as the gc and ref
> transaction race one another to lock various things in the repository.
> 
> Or maybe nobody cares in practice, just food for thought.
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 3c0f3027fe..9a20cb8fa8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1261,7 +1261,8 @@  static int files_delete_refs(struct ref_store *ref_store, const char *msg,
 	if (packed_refs_lock(refs->packed_ref_store, 0, &err))
 		goto error;
 
-	transaction = ref_store_transaction_begin(refs->packed_ref_store, 0, &err);
+	transaction = ref_store_transaction_begin(refs->packed_ref_store,
+						  REF_TRANSACTION_SKIP_HOOK, &err);
 	if (!transaction)
 		goto error;
 
@@ -2775,7 +2776,8 @@  static int files_transaction_prepare(struct ref_store *ref_store,
 			 */
 			if (!packed_transaction) {
 				packed_transaction = ref_store_transaction_begin(
-						refs->packed_ref_store, 0, err);
+						refs->packed_ref_store,
+						REF_TRANSACTION_SKIP_HOOK, err);
 				if (!packed_transaction) {
 					ret = TRANSACTION_GENERIC_ERROR;
 					goto cleanup;
@@ -3046,7 +3048,8 @@  static int files_initial_transaction_commit(struct ref_store *ref_store,
 				 &affected_refnames))
 		BUG("initial ref transaction called with existing refs");
 
-	packed_transaction = ref_store_transaction_begin(refs->packed_ref_store, 0, err);
+	packed_transaction = ref_store_transaction_begin(refs->packed_ref_store,
+							 REF_TRANSACTION_SKIP_HOOK, err);
 	if (!packed_transaction) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index f9d3d5213f..4e1e84a91f 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -175,16 +175,11 @@  test_expect_success 'deleting packed ref calls hook once' '
 	git update-ref -d refs/heads/to-be-deleted $POST_OID &&
 
 	# We only expect a single hook invocation, which is the logical
-	# deletion. But currently, we see two interleaving transactions, once
-	# for deleting the loose refs and once for deleting the packed ref.
+	# deletion.
 	cat >expect <<-EOF &&
-		prepared
-		$ZERO_OID $ZERO_OID refs/heads/to-be-deleted
 		prepared
 		$POST_OID $ZERO_OID refs/heads/to-be-deleted
 		committed
-		$ZERO_OID $ZERO_OID refs/heads/to-be-deleted
-		committed
 		$POST_OID $ZERO_OID refs/heads/to-be-deleted
 	EOF