Message ID | 279eadc41cda9ceea4c5317d6a4c358c18d50ce9.1642054003.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | refs: excessive hook execution with packed refs | expand |
On Thu, Jan 13 2022, Patrick Steinhardt wrote: > [[PGP Signed Part:Undecided]] > When deleting refs from the loose-files refs backend, then we need to be > careful to also delete the same ref from the packed refs backend, if it > exists. If we don't, then deleting the loose ref would "uncover" the > packed ref. We thus always have to queue up deletions of refs for both > the loose and the packed refs backend. This is done in two separate > transactions, where the end result is that the reference-transaction > hook is executed twice for the deleted refs. But do we (which would be an issue before this series) delete the loose and and then the packed one, thus racily exposing the stale ref to any concurrent repository reader, or do we first update the packed ref to the valu of the now-locked loose ref to avoid such a race? > [...] > Fix this behaviour and don't execute the reference-transaction hook at > all when refs in the packed-refs backend if it's driven by the files > backend. This works as expected even in case the refs to be deleted only > exist in the packed-refs backend because the loose-backend always queues > refs in its own transaction even if they don't exist such that they can > be locked for concurrent creation. And it also does the right thing in > case neither of the backends has the ref because that would cause the > transaction to fail completely. I do wonder if the fundimental approach here is the right one. I.e. changing the hook to only expose "real" updates, as opposed to leaving it as a lower-level facility to listed in on any sort of ref updates. In such a scenario we could imagine adding a third parameter or otherwise flag the update as "real" to the hook, so a dumber hook consumer could ignore the more verbose inter-transactional chatter. I say that because this change does the right thing for the use-case you have in mind, but if you e.g. imagine a more gentle background-friendly "gc" such a thing could be implemented by backing off as soon as it sees an ongoing transaction being started. With my ae35e16cd43 (reflog expire: don't lock reflogs using previously seen OID, 2021-08-23) not getting that more chatty data should be be OK for such a hypothetical hook. But we might have more avoidable tripping over locks as the gc and ref transaction race one another to lock various things in the repository. Or maybe nobody cares in practice, just food for thought.
On Thu, Jan 13, 2022 at 02:04:38PM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jan 13 2022, Patrick Steinhardt wrote: > > > [[PGP Signed Part:Undecided]] > > When deleting refs from the loose-files refs backend, then we need to be > > careful to also delete the same ref from the packed refs backend, if it > > exists. If we don't, then deleting the loose ref would "uncover" the > > packed ref. We thus always have to queue up deletions of refs for both > > the loose and the packed refs backend. This is done in two separate > > transactions, where the end result is that the reference-transaction > > hook is executed twice for the deleted refs. > > But do we (which would be an issue before this series) delete the loose > and and then the packed one, thus racily exposing the stale ref to any > concurrent repository reader, or do we first update the packed ref to > the valu of the now-locked loose ref to avoid such a race? We first commit the packed-refs file so that the stale ref is not exposed. > > [...] > > Fix this behaviour and don't execute the reference-transaction hook at > > all when refs in the packed-refs backend if it's driven by the files > > backend. This works as expected even in case the refs to be deleted only > > exist in the packed-refs backend because the loose-backend always queues > > refs in its own transaction even if they don't exist such that they can > > be locked for concurrent creation. And it also does the right thing in > > case neither of the backends has the ref because that would cause the > > transaction to fail completely. > > I do wonder if the fundimental approach here is the right > one. I.e. changing the hook to only expose "real" updates, as opposed to > leaving it as a lower-level facility to listed in on any sort of ref > updates. > > In such a scenario we could imagine adding a third parameter or > otherwise flag the update as "real" to the hook, so a dumber hook > consumer could ignore the more verbose inter-transactional chatter. > > I say that because this change does the right thing for the use-case you > have in mind, but if you e.g. imagine a more gentle background-friendly > "gc" such a thing could be implemented by backing off as soon as it sees > an ongoing transaction being started. I've mostly been acting on the original report by Waleed. And I tend to agree with his report given that we also got a workaround at GitLab which filters out reference transactions which only consist of force deletions because they're likely to be pruning refs in the packed backend which are about to be uncovered. The result is that execution of the reftx hook is dependent on how well-packed a repository's refs are: when refs are packed we execute the hook twice, whereas we execute it once when it's not well-packed. This is surprising behaviour, even though one can definitely argue that it's just working as intended. I think ultimately the question boils down to whether we want to treat the files backend as a single compound backend and whether the reftx hook should treat it like that. If we treat it as a single backend, then we shouldn't report a change in refs when pruning about-to-be-uncovered refs given that it wouldn't have been visible, but it's only internal cleanup. And neither should we report ref changes when repacking refs into a single file given that from the backend's perspective nothing is about to change. Patrick > With my ae35e16cd43 (reflog expire: don't lock reflogs using previously > seen OID, 2021-08-23) not getting that more chatty data should be be OK > for such a hypothetical hook. > > But we might have more avoidable tripping over locks as the gc and ref > transaction race one another to lock various things in the repository. > > Or maybe nobody cares in practice, just food for thought.
diff --git a/refs/files-backend.c b/refs/files-backend.c index 3c0f3027fe..9a20cb8fa8 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -1261,7 +1261,8 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg, if (packed_refs_lock(refs->packed_ref_store, 0, &err)) goto error; - transaction = ref_store_transaction_begin(refs->packed_ref_store, 0, &err); + transaction = ref_store_transaction_begin(refs->packed_ref_store, + REF_TRANSACTION_SKIP_HOOK, &err); if (!transaction) goto error; @@ -2775,7 +2776,8 @@ static int files_transaction_prepare(struct ref_store *ref_store, */ if (!packed_transaction) { packed_transaction = ref_store_transaction_begin( - refs->packed_ref_store, 0, err); + refs->packed_ref_store, + REF_TRANSACTION_SKIP_HOOK, err); if (!packed_transaction) { ret = TRANSACTION_GENERIC_ERROR; goto cleanup; @@ -3046,7 +3048,8 @@ static int files_initial_transaction_commit(struct ref_store *ref_store, &affected_refnames)) BUG("initial ref transaction called with existing refs"); - packed_transaction = ref_store_transaction_begin(refs->packed_ref_store, 0, err); + packed_transaction = ref_store_transaction_begin(refs->packed_ref_store, + REF_TRANSACTION_SKIP_HOOK, err); if (!packed_transaction) { ret = TRANSACTION_GENERIC_ERROR; goto cleanup; diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh index f9d3d5213f..4e1e84a91f 100755 --- a/t/t1416-ref-transaction-hooks.sh +++ b/t/t1416-ref-transaction-hooks.sh @@ -175,16 +175,11 @@ test_expect_success 'deleting packed ref calls hook once' ' git update-ref -d refs/heads/to-be-deleted $POST_OID && # We only expect a single hook invocation, which is the logical - # deletion. But currently, we see two interleaving transactions, once - # for deleting the loose refs and once for deleting the packed ref. + # deletion. cat >expect <<-EOF && - prepared - $ZERO_OID $ZERO_OID refs/heads/to-be-deleted prepared $POST_OID $ZERO_OID refs/heads/to-be-deleted committed - $ZERO_OID $ZERO_OID refs/heads/to-be-deleted - committed $POST_OID $ZERO_OID refs/heads/to-be-deleted EOF
When deleting refs from the loose-files refs backend, then we need to be careful to also delete the same ref from the packed refs backend, if it exists. If we don't, then deleting the loose ref would "uncover" the packed ref. We thus always have to queue up deletions of refs for both the loose and the packed refs backend. This is done in two separate transactions, where the end result is that the reference-transaction hook is executed twice for the deleted refs. This behaviour is quite misleading: it's exposing implementation details of how the files backend works to the user, in contrast to the logical updates that we'd really want to expose via the hook. Worse yet, whether the hook gets executed once or twice depends on how well-packed the repository is: if the ref only exists as a loose ref, then we execute it once, otherwise if it is also packed then we execute it twice. Fix this behaviour and don't execute the reference-transaction hook at all when refs in the packed-refs backend if it's driven by the files backend. This works as expected even in case the refs to be deleted only exist in the packed-refs backend because the loose-backend always queues refs in its own transaction even if they don't exist such that they can be locked for concurrent creation. And it also does the right thing in case neither of the backends has the ref because that would cause the transaction to fail completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- refs/files-backend.c | 9 ++++++--- t/t1416-ref-transaction-hooks.sh | 7 +------ 2 files changed, 7 insertions(+), 9 deletions(-)