Message ID | pull.1228.git.1651676435634.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | refs: honor reference-transaction semantics when deleting refs | expand |
On Wed, May 04, 2022 at 03:00:35PM +0000, Michael Heemskerk via GitGitGadget wrote: > From: Michael Heemskerk <mheemskerk@atlassian.com> > > When deleting refs from the loose-files refs backend, files_delete_refs > first rewrites packed-refs if any of the to-be-deleted refs were packed > and then removes loose refs. While making those changes, it invokes the > reference-transaction hook incorrectly; a single transaction is used to > prepare and commit the changes to packed-refs, followed by another > transaction per deleted ref, each with another prepared and committed > reference-transaction hook invocation. > > This behaviour is problematic for a number of reasons. First of all, > deleting a ref through `git branch -d` or `git tag -d` results in a > different sequence of reference-transaction callbacks than deleting the > same ref through `git update-ref`: > > - update-ref of a loose ref: aborted, prepared, committed > - update-ref of a packed ref: prepared, prepared, committed, commited > - branch -d: prepared, committed, aborted, prepared, committed > > The bigger problem is that the packed-refs update is committed before > the prepared reference-transaction invocation is done for the loose > ref. Returning a non-zero exit code from the second > reference-transaction callback leads to an invalid sequence of > reference-transaction callbacks: > > 1. prepared - hook returns 0, so the change is allowed to go through. > 2. committed - git informs the hook that the changes are now final, > but are they really? Any loose refs may still survive if the > subsequent prepared callback is canceled. > 3. aborted - 'fake' invocation from the packed-transaction because the > ref does not exist in packed-refs. > 4. prepared - hook returns 1, so the change should be blocked. > 5. aborted - git informs the hook that it has rolled back the change, > but it really hasn't; packed-refs has already been rewritten and > if the ref only existed as a packed ref, it is now gone. > > The changes to the reference-transaction invocations that were planned > for git 2.36 but have been reverted make the problem more pronounced. > Those changes suppress the 'fake' invocations for the packed-transaction > (invocations 1-3 in the above list). In that case, the prepared callback > in step 4 cannot prevent a ref from being deleted if it only existed in > packed-refs. > > To address the issue, the use a separate transactions to update the > packed and loose refs has been removed from files_delete_refs. Instead, > it now uses a single transaction, queues up the refs-to-be-deleted and > relies on the standard transaction mechanism to invoke the > reference-transaction hooks as expected. > > Signed-off-by: Michael Heemskerk <mheemskerk@atlassian.com> > --- > refs: honor reference-transaction semantics when deleting refs > > When deleting refs from the loose-files refs backend, files_delete_refs > first rewrites packed-refs if any of the to-be-deleted refs were packed > and then removes loose refs. While making those changes, it invokes the > reference-transaction hook incorrectly; a single transaction is used to > prepare and commit the changes to packed-refs, followed by another > transaction per deleted ref, each with another prepared and committed > reference-transaction hook invocation. > > This behaviour is problematic for a number of reasons. First of all, > deleting a ref through git branch -d or git tag -d results in a > different sequence of reference-transaction callbacks than deleting the > same ref through git update-ref: > > * update-ref of a loose ref: aborted, prepared, committed > * update-ref of a packed ref: prepared, prepared, committed, commited > * branch -d: prepared, committed, aborted, prepared, committed > > The bigger problem is that the packed-refs update is committed before > the prepared reference-transaction invocation is done for the loose ref. > Returning a non-zero exit code from the second reference-transaction > callback leads to an invalid sequence of reference-transaction > callbacks: > > 1. prepared - hook returns 0, so the change is allowed to go through. > 2. committed - git informs the hook that the changes are now final, but > are they really? Any loose refs may still survive if the subsequent > prepared callback is canceled. > 3. aborted - 'fake' invocation from the packed-transaction because the > ref does not exist in packed-refs. > 4. prepared - hook returns 1, so the change should be blocked. > 5. aborted - git informs the hook that it has rolled back the change, > but it really hasn't; packed-refs has already been rewritten and if > the ref only existed as a packed ref, it is now gone. > > The changes to the reference-transaction invocations that were planned > for git 2.36 but have been reverted make the problem more pronounced. > Those changes suppress the 'fake' invocations for the packed-transaction > (invocations 1-3 in the above list). In that case, the prepared callback > in step 4 cannot prevent a ref from being deleted if it only existed in > packed-refs. > > To address the issue, the use a separate transactions to update the > packed and loose refs has been removed from files_delete_refs. Instead, > it now uses a single transaction, queues up the refs-to-be-deleted and > relies on the standard transaction mechanism to invoke the > reference-transaction hooks as expected. > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1228%2Fmheemskerk%2Ffiles-delete-ref-reference-transaction-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1228/mheemskerk/files-delete-ref-reference-transaction-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1228 > > refs/files-backend.c | 33 +++++++-------- > t/t1416-ref-transaction-hooks.sh | 70 ++++++++++++++++++++++++++++++++ > t/t5510-fetch.sh | 6 +-- > 3 files changed, 87 insertions(+), 22 deletions(-) > > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 8db7882aacb..5c23277eda7 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -1265,44 +1265,39 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg, > struct files_ref_store *refs = > files_downcast(ref_store, REF_STORE_WRITE, "delete_refs"); > struct strbuf err = STRBUF_INIT; > - int i, result = 0; > + int i; > + struct ref_transaction *transaction; > > if (!refnames->nr) > return 0; > > - if (packed_refs_lock(refs->packed_ref_store, 0, &err)) > - goto error; > - > - if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) { > - packed_refs_unlock(refs->packed_ref_store); > + transaction = ref_store_transaction_begin(&refs->base, &err); > + if (!transaction) > goto error; > - } > - > - packed_refs_unlock(refs->packed_ref_store); > > for (i = 0; i < refnames->nr; i++) { > const char *refname = refnames->items[i].string; > - > - if (refs_delete_ref(&refs->base, msg, refname, NULL, flags)) > - result |= error(_("could not remove reference %s"), refname); > + if (ref_transaction_delete(transaction, refname, NULL, > + flags, msg, &err)) > + goto error; > } > > + if (ref_transaction_commit(transaction, &err)) > + goto error; > + > + ref_transaction_free(transaction); > strbuf_release(&err); > - return result; > + return 0; > > error: > - /* > - * If we failed to rewrite the packed-refs file, then it is > - * unsafe to try to remove loose refs, because doing so might > - * expose an obsolete packed value for a reference that might > - * even point at an object that has been garbage collected. > - */ > if (refnames->nr == 1) > error(_("could not delete reference %s: %s"), > refnames->items[0].string, err.buf); > else > error(_("could not delete references: %s"), err.buf); > > + if (transaction) > + ref_transaction_free(transaction); > strbuf_release(&err); > return -1; > } I really like these changes given that they simplify things, but I wonder whether we can do them. In the preimage we're eagerly removing loose refs: any error encountered when deleting a reference is recorded, but we keep on trying to remove the other refs, as well. With the new behaviour we now create a single transaction for all refs and try to commit it. This also means that we'll abort the transaction when locking any of the refs fails, which is a change in behaviour. The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`: /* * Delete the specified references. If there are any problems, emit * errors but attempt to keep going (i.e., the deletes are not done in * an all-or-nothing transaction). msg and flags are passed through to * ref_transaction_delete(). */ int refs_delete_refs(struct ref_store *refs, const char *msg, struct string_list *refnames, unsigned int flags); There are multiple callsites of this function via `delete_refs()`. Now honestly, most of these callsites look somewhat broken: - `bisect.c` simply does its best to clean up bisect state. This usecase looks fine to me. - `builtin/branch.c` reports the branches as deleted even if `delete_refs()` failed. - `builtin/remote.c` also misreports the deleted branches for the `prune` verb. The `rm` verb looks alright: if deletion of any branch failed then it doesn't prune the remote's config in the end and reports an error. - `builtin/fetch.c` also misreports deleted branches with `--prune`. So most of these commands incorrectly handle the case where only a subset of branches has been deleted. This raises the question whether the interface provided by `refs_delete_refs()` is actually sensible if it's so easy to get wrong. It doesn't even report which branches could be removed and which couldn't. Furthermore, the question is whether new backends like the reftable backend which write all refs into a single slice would actually even be in a position to efficiently retain semantics of this function. I'm torn. There are valid usecases for eagerly deleting refs even if a subset of deletions failed, making this change a tough sell, but most of the callsites don't actually handle this correctly in the first place. > diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh > index 27731722a5b..e3d4fe624f7 100755 > --- a/t/t1416-ref-transaction-hooks.sh > +++ b/t/t1416-ref-transaction-hooks.sh > @@ -133,4 +133,74 @@ test_expect_success 'interleaving hook calls succeed' ' > test_cmp expect target-repo.git/actual > ' > > +test_expect_success 'hook allows deleting loose ref if successful' ' > + test_when_finished "rm actual" && > + git branch to-be-deleted $PRE_OID && > + test_hook reference-transaction <<-\EOF && > + echo "$*" >>actual > + EOF > + cat >expect <<-EOF && > + aborted > + prepared > + committed > + EOF > + git branch -d to-be-deleted && > + test_cmp expect actual && > + test_must_fail git rev-parse refs/heads/to-be-deleted > +' > + > +test_expect_success 'hook allows deleting packed ref if successful' ' > + test_when_finished "rm actual" && > + git branch to-be-deleted $PRE_OID && > + git pack-refs --all --prune && > + test_hook reference-transaction <<-\EOF && > + echo "$*" >>actual > + EOF > + cat >expect <<-EOF && > + prepared > + prepared > + committed > + committed > + EOF > + git branch -d to-be-deleted && > + test_cmp expect actual && > + test_must_fail git rev-parse refs/heads/to-be-deleted > +' > + > +test_expect_success 'hook aborts deleting loose ref in prepared state' ' > + test_when_finished "rm actual" && > + test_when_finished "git branch -d to-be-deleted" && > + git branch to-be-deleted $PRE_OID && > + test_hook reference-transaction <<-\EOF && > + echo "$*" >>actual > + exit 1 > + EOF > + cat >expect <<-EOF && > + aborted > + prepared > + aborted > + EOF It's not really clear why we get the first "aborted" result here. I know that it is because we didn't queue up any refs to the packed backend, and thus we don't even try to prepare the transaction. But it's likely confusing to a reader and might thus warrant a comment. On the other hand this is going away anyway if and when my patch series lands again that stops calling the hook from the nested packed backend. > + test_must_fail git branch -d to-be-deleted && > + test_cmp expect actual && > + git rev-parse refs/heads/to-be-deleted > +' > + > +test_expect_success 'hook aborts deleting packed ref in prepared state' ' > + test_when_finished "rm actual" && > + test_when_finished "git branch -d to-be-deleted" && > + git branch to-be-deleted $PRE_OID && > + git pack-refs --all --prune && > + test_hook reference-transaction <<-\EOF && > + echo "$*" >>actual > + exit 1 > + EOF > + cat >expect <<-EOF && > + prepared > + aborted > + EOF > + test_must_fail git branch -d to-be-deleted && > + test_cmp expect actual && > + git rev-parse refs/heads/to-be-deleted > +' > + From your description one more interesting case is when the packed-refs transaction is committed, but the loose-refs backend is aborted. It's a bit harder to test given that we have no way to indicate what backend the reftx hook is being invoked from though. One thing that worries me is that these patches kind of set the current behaviour of driving the reftx hook via both packed and loose backend into stone. My patch series that got reverted is going to change that behaviour though so that we don't execute the hook from the packed backend, and consequentially we'd have to change all these tests again to match the new behaviour. This makes it a lot harder to argue though that we can safely switch to the new behaviour without breaking any assumptions when we even codified our current assumptions into tests. Taking a step back I wonder whether my previous approach to just hide the hook for the packed backend was the right thing to do though. An alternative would be to instead expose additional information to the hook so that it can decide by itself whether it wants to execute the hook or not. This could e.g. trivially be done by exposing a new "GIT_REFS_BACKEND" environment variable to the reftx hook that gets set to either "packed-refs", "loose-refs" or "reftables" depending on which backend is currently in use. Equipped with this information a hook script can then easily ignore any updates to the packed-refs file by itself without having to change the way it is invoked right now and thus we wouldn't regress any currently existing hooks. > test_done > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > index 4620f0ca7fa..8b09a99c2e8 100755 > --- a/t/t5510-fetch.sh > +++ b/t/t5510-fetch.sh > @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' ' > git clone . prune-fail && > cd prune-fail && > git update-ref refs/remotes/origin/extrabranch main && > - : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds && > - >.git/packed-refs.new && > + : this will prevent --prune from locking refs/remotes/origin/extra for deletion && > + >.git/refs/remotes/origin/extrabranch.lock && > > - test_must_fail git fetch --prune origin > + test_must_fail git fetch --prune origin > outputs 2> errors It would be nice to have an explanation why exactly this change is needed, and why it is fine that the visible behaviour changes here. Patrick > > test_expect_success 'fetch --atomic works with a single branch' ' > > base-commit: 0f828332d5ac36fc63b7d8202652efa152809856 > -- > gitgitgadget
On Mon, May 9, 2022 at 8:23 AM Patrick Steinhardt <ps@pks.im> wrote: > > I really like these changes given that they simplify things, but I > wonder whether we can do them. In the preimage we're eagerly removing > loose refs: any error encountered when deleting a reference is recorded, > but we keep on trying to remove the other refs, as well. With the new > behaviour we now create a single transaction for all refs and try to > commit it. This also means that we'll abort the transaction when locking > any of the refs fails, which is a change in behaviour. > > The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`: > > /* > * Delete the specified references. If there are any problems, emit > * errors but attempt to keep going (i.e., the deletes are not done in > * an all-or-nothing transaction). msg and flags are passed through to > * ref_transaction_delete(). > */ > int refs_delete_refs(struct ref_store *refs, const char *msg, > struct string_list *refnames, unsigned int flags); > > There are multiple callsites of this function via `delete_refs()`. Now > honestly, most of these callsites look somewhat broken: > > - `bisect.c` simply does its best to clean up bisect state. This > usecase looks fine to me. > > - `builtin/branch.c` reports the branches as deleted even if > `delete_refs()` failed. > > - `builtin/remote.c` also misreports the deleted branches for the > `prune` verb. The `rm` verb looks alright: if deletion of any > branch failed then it doesn't prune the remote's config in the end > and reports an error. > > - `builtin/fetch.c` also misreports deleted branches with `--prune`. > > So most of these commands incorrectly handle the case where only a > subset of branches has been deleted. This raises the question whether > the interface provided by `refs_delete_refs()` is actually sensible if > it's so easy to get wrong. It doesn't even report which branches could > be removed and which couldn't. Furthermore, the question is whether new > backends like the reftable backend which write all refs into a single > slice would actually even be in a position to efficiently retain > semantics of this function. > > I'm torn. There are valid usecases for eagerly deleting refs even if a > subset of deletions failed, making this change a tough sell, but most of > the callsites don't actually handle this correctly in the first place. Exactly; this is the reason that I initially suggested fixing the issue by just removing the upfront rewrite of packed-refs. With that rewrite removed, the refs-to-be-deleted are deleted in individual transactions, which may or may not rewrite packed-refs. The downside, as you correctly pointed out, is that we could end up rewriting packed-refs multiple times, which could come at a significant performance penalty for repositories with large packed-refs files. Unfortunately, the current approach of updating packed-refs in one transaction and updating the loose refs in individual transactions doesn't work either. So what are our options? - delete each of the refs in a separate transaction and pay a (potentially significant) performance penalty in repositories with large packed-refs files when deleting many refs. I'll note that this scenario is similar to deleting a set of refs through a non-atomic push. - switch to a single transaction and update refs.h:refs_delete_refs to use an all-or-nothing approach (the approach I've taken in my patch). - improve the reference-transaction mechanism to support the 'batch-of-transactions' mode more efficiently. If I remember correctly, something like that has been suggested before, but I'm not sure if it's actually been built or spiked. In this batch-of-transactions mode, git could try to prepare all refs, and only invoke the hook for the refs that could be successfully prepared. The hook should then be able to reject individual ref updates, and git would then apply only the non-rejected ref updates. While such a change would make many scenarios where multiple refs are being updated more efficient, it's also a much bigger change that's hard to make without breaking the current reference-transaction protocol. Sticking to a transaction per ref and rewriting packed-refs multiple times is the safer option. Deleting the refs in a single transaction is the more performant option, but changes the behavior. A stay/stale lock file could then make it impossible to remove a remote, or to prune /refs/remotes/ refs. My suggestion would be to stick to a transaction per ref and pay the same performance penalty as you'd get when deleting many refs through a non-atomic push. > > +test_expect_success 'hook aborts deleting loose ref in prepared state' ' > > + test_when_finished "rm actual" && > > + test_when_finished "git branch -d to-be-deleted" && > > + git branch to-be-deleted $PRE_OID && > > + test_hook reference-transaction <<-\EOF && > > + echo "$*" >>actual > > + exit 1 > > + EOF > > + cat >expect <<-EOF && > > + aborted > > + prepared > > + aborted > > + EOF > > It's not really clear why we get the first "aborted" result here. I know > that it is because we didn't queue up any refs to the packed backend, > and thus we don't even try to prepare the transaction. But it's likely > confusing to a reader and might thus warrant a comment. On the other > hand this is going away anyway if and when my patch series lands again > that stops calling the hook from the nested packed backend. I can add a comment explaining why we get the 'aborted' callback and also add a comment that that 'aborted' callback is expected to be removed in an upcoming version. > > + test_must_fail git branch -d to-be-deleted && > > + test_cmp expect actual && > > + git rev-parse refs/heads/to-be-deleted > > +' > > + > > +test_expect_success 'hook aborts deleting packed ref in prepared state' ' > > + test_when_finished "rm actual" && > > + test_when_finished "git branch -d to-be-deleted" && > > + git branch to-be-deleted $PRE_OID && > > + git pack-refs --all --prune && > > + test_hook reference-transaction <<-\EOF && > > + echo "$*" >>actual > > + exit 1 > > + EOF > > + cat >expect <<-EOF && > > + prepared > > + aborted > > + EOF > > + test_must_fail git branch -d to-be-deleted && > > + test_cmp expect actual && > > + git rev-parse refs/heads/to-be-deleted > > +' > > + > > From your description one more interesting case is when the packed-refs > transaction is committed, but the loose-refs backend is aborted. It's a > bit harder to test given that we have no way to indicate what backend > the reftx hook is being invoked from though. That is indeed the more interesting case, and exactly the case we saw fail when the packed-refs reference-transactions callbacks had been suppressed in 2.36.0-rc1 and -rc2. But as you said, they're pretty hard to test. In Bitbucket's tests we now verify from the reference-transaction callback that the ref-updates have not been committed (yet) when the prepared or aborted callback is received, and that they _have_ been committed when the committed callback is received. Perhaps a similar approach could be used in git's reference-transaction tests for more thorough testing. > One thing that worries me is that these patches kind of set the current > behaviour of driving the reftx hook via both packed and loose backend > into stone. My patch series that got reverted is going to change that > behaviour though so that we don't execute the hook from the packed > backend, and consequentially we'd have to change all these tests again > to match the new behaviour. This makes it a lot harder to argue though > that we can safely switch to the new behaviour without breaking any > assumptions when we even codified our current assumptions into tests. The counter argument to that is that it's kind of scary if you could remove half of the reference-transaction callbacks without needing to update a test. I'd rather have tests that verify current behavior that you need to update when you intentionally change the behavior, then not have those tests? > Taking a step back I wonder whether my previous approach to just hide > the hook for the packed backend was the right thing to do though. An > alternative would be to instead expose additional information to the > hook so that it can decide by itself whether it wants to execute the > hook or not. This could e.g. trivially be done by exposing a new > "GIT_REFS_BACKEND" environment variable to the reftx hook that gets set > to either "packed-refs", "loose-refs" or "reftables" depending on which > backend is currently in use. Equipped with this information a hook > script can then easily ignore any updates to the packed-refs file by > itself without having to change the way it is invoked right now and thus > we wouldn't regress any currently existing hooks. From the reference-transaction hook writer's perspective, the backend involved is an implementation detail that the hook should not have to care about. Getting separate callbacks for the loose and the packed backends makes it a lot harder to write a good reference-transaction hook, especially when the callbacks differ if a ref is packed or not. IMO, there should really be a "files" backend transaction, that internally takes care of locking individual refs and possibly "packed-refs" in case of a deletion. In addition, not getting "artificial" packed callbacks also saves us a few extra reference-transaction callbacks when deleting refs. Even if those take only a few ms per invocation, when deleting hundreds of refs, it's still something that we'd like to avoid if we can. > > test_done > > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > > index 4620f0ca7fa..8b09a99c2e8 100755 > > --- a/t/t5510-fetch.sh > > +++ b/t/t5510-fetch.sh > > @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' ' > > git clone . prune-fail && > > cd prune-fail && > > git update-ref refs/remotes/origin/extrabranch main && > > - : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds && > > - >.git/packed-refs.new && > > + : this will prevent --prune from locking refs/remotes/origin/extra for deletion && > > + >.git/refs/remotes/origin/extrabranch.lock && > > > > - test_must_fail git fetch --prune origin > > + test_must_fail git fetch --prune origin > outputs 2> errors > > It would be nice to have an explanation why exactly this change is > needed, and why it is fine that the visible behaviour changes here. The test was relying on the fact that the packed-refs file was locked when git fetch --prune is called. My patch replaces that unconditional lock with a transaction. The transaction only takes out the lock if packed-refs actually needs to be updated, and since the ref being pruned only exists as a loose ref, git fetch --prune failed to fail. I've replaced the lock on packed-refs with a lock on the loose ref that should be pruned. Michael
On Mon, May 09, 2022 at 12:18:51PM +0200, Michael Heemskerk wrote: > On Mon, May 9, 2022 at 8:23 AM Patrick Steinhardt <ps@pks.im> wrote: > > > > I really like these changes given that they simplify things, but I > > wonder whether we can do them. In the preimage we're eagerly removing > > loose refs: any error encountered when deleting a reference is recorded, > > but we keep on trying to remove the other refs, as well. With the new > > behaviour we now create a single transaction for all refs and try to > > commit it. This also means that we'll abort the transaction when locking > > any of the refs fails, which is a change in behaviour. > > > > The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`: > > > > /* > > * Delete the specified references. If there are any problems, emit > > * errors but attempt to keep going (i.e., the deletes are not done in > > * an all-or-nothing transaction). msg and flags are passed through to > > * ref_transaction_delete(). > > */ > > int refs_delete_refs(struct ref_store *refs, const char *msg, > > struct string_list *refnames, unsigned int flags); > > > > There are multiple callsites of this function via `delete_refs()`. Now > > honestly, most of these callsites look somewhat broken: > > > > - `bisect.c` simply does its best to clean up bisect state. This > > usecase looks fine to me. > > > > - `builtin/branch.c` reports the branches as deleted even if > > `delete_refs()` failed. > > > > - `builtin/remote.c` also misreports the deleted branches for the > > `prune` verb. The `rm` verb looks alright: if deletion of any > > branch failed then it doesn't prune the remote's config in the end > > and reports an error. > > > > - `builtin/fetch.c` also misreports deleted branches with `--prune`. > > > > So most of these commands incorrectly handle the case where only a > > subset of branches has been deleted. This raises the question whether > > the interface provided by `refs_delete_refs()` is actually sensible if > > it's so easy to get wrong. It doesn't even report which branches could > > be removed and which couldn't. Furthermore, the question is whether new > > backends like the reftable backend which write all refs into a single > > slice would actually even be in a position to efficiently retain > > semantics of this function. > > > > I'm torn. There are valid usecases for eagerly deleting refs even if a > > subset of deletions failed, making this change a tough sell, but most of > > the callsites don't actually handle this correctly in the first place. > > Exactly; this is the reason that I initially suggested fixing the issue by > just removing the upfront rewrite of packed-refs. With that rewrite removed, > the refs-to-be-deleted are deleted in individual transactions, which may or > may not rewrite packed-refs. The downside, as you correctly pointed out, > is that we could end up rewriting packed-refs multiple times, which could > come at a significant performance penalty for repositories with large > packed-refs files. > > Unfortunately, the current approach of updating packed-refs in one > transaction and updating the loose refs in individual transactions doesn't > work either. > > So what are our options? > > - delete each of the refs in a separate transaction and pay a (potentially > significant) performance penalty in repositories with large packed-refs > files when deleting many refs. I'll note that this scenario is similar to > deleting a set of refs through a non-atomic push. By chance I know that on gitlab.com we've had huge performance issues with access patterns like this. In some cases with repos that have millions of refs I have seen that certain actions were completely dominated by rewriting the packed-refs file. So I'd rather avoid going there given that it would be a serious performance regression. > - switch to a single transaction and update refs.h:refs_delete_refs to use > an all-or-nothing approach (the approach I've taken in my patch). This is the easiest approach, but also backwards incompatible as I've layed out above. I personally wouldn't mind, and as said I think that most of the usecases are broken anyway because of implementations which misreport the case where we only partially deleted refs. But it's not a decision we can make without more discussion, I guess. > - improve the reference-transaction mechanism to support the > 'batch-of-transactions' mode more efficiently. If I remember correctly, > something like that has been suggested before, but I'm not sure if it's > actually been built or spiked. In this batch-of-transactions mode, git > could try to prepare all refs, and only invoke the hook for the refs that > could be successfully prepared. The hook should then be able to > reject individual ref updates, and git would then apply only the > non-rejected ref updates. While such a change would make many > scenarios where multiple refs are being updated more efficient, it's also > a much bigger change that's hard to make without breaking the current > reference-transaction protocol. Peff and I had such a discussion in the past, and having transactions with eager semantics would also fix some edge cases we have in other parts of the codebase. You already mentioned git-fetch(1) without `--atomic`, and there are likely others that could benefit. I did have a go at this several times already, but none of the approaches I took resulted in a clean-to-use API. I consider this to be the best solution, but also the hardest one to implement, unfortunately. > Sticking to a transaction per ref and rewriting packed-refs multiple times > is the safer option. Deleting the refs in a single transaction is the more > performant option, but changes the behavior. A stay/stale lock file could > then make it impossible to remove a remote, or to prune /refs/remotes/ > refs. > > My suggestion would be to stick to a transaction per ref and pay the > same performance penalty as you'd get when deleting many refs through > a non-atomic push. I don't think we should do this. The one-transaction-per-ref issue is why I added the `--atomic` flag to git-fetch(1) in the first place, because it has been biting us in repos with millions of refs. Quoting 583bc41923 (fetch: make `--atomic` flag cover pruning of refs, 2022-02-17), which introduces this flag: Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~) Time (mean ± σ): 2.366 s ± 0.021 s [User: 0.858 s, System: 1.508 s] Range (min … max): 2.328 s … 2.407 s 10 runs Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD) Time (mean ± σ): 1.369 s ± 0.017 s [User: 0.715 s, System: 0.641 s] Range (min … max): 1.346 s … 1.400 s 10 runs Summary 'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran 1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)' And this is only with a 100k references. The issue gets a lot worse when you're in the millions of refs, both because you have 10x more refs to prune, but also because the packed-refs file is 10x larger. So performance doesn't scale linearly with the number of refs, but is a product of file size and number of refs. [snip] > > One thing that worries me is that these patches kind of set the current > > behaviour of driving the reftx hook via both packed and loose backend > > into stone. My patch series that got reverted is going to change that > > behaviour though so that we don't execute the hook from the packed > > backend, and consequentially we'd have to change all these tests again > > to match the new behaviour. This makes it a lot harder to argue though > > that we can safely switch to the new behaviour without breaking any > > assumptions when we even codified our current assumptions into tests. > > The counter argument to that is that it's kind of scary if you could remove > half of the reference-transaction callbacks without needing to update a > test. I'd rather have tests that verify current behavior that you need to > update when you intentionally change the behavior, then not have those > tests? Oh, I certainly agree there. And we have stated in the past that the reftx hook is _not_ providing a stable interface with regards to exactly when it is called. We only want to guarantee that it is called on a transaction. > > Taking a step back I wonder whether my previous approach to just hide > > the hook for the packed backend was the right thing to do though. An > > alternative would be to instead expose additional information to the > > hook so that it can decide by itself whether it wants to execute the > > hook or not. This could e.g. trivially be done by exposing a new > > "GIT_REFS_BACKEND" environment variable to the reftx hook that gets set > > to either "packed-refs", "loose-refs" or "reftables" depending on which > > backend is currently in use. Equipped with this information a hook > > script can then easily ignore any updates to the packed-refs file by > > itself without having to change the way it is invoked right now and thus > > we wouldn't regress any currently existing hooks. > > From the reference-transaction hook writer's perspective, the backend > involved is an implementation detail that the hook should not have to > care about. Getting separate callbacks for the loose and the packed > backends makes it a lot harder to write a good reference-transaction > hook, especially when the callbacks differ if a ref is packed or not. > > IMO, there should really be a "files" backend transaction, that internally > takes care of locking individual refs and possibly "packed-refs" in case > of a deletion. In addition, not getting "artificial" packed callbacks also > saves us a few extra reference-transaction callbacks when deleting refs. > Even if those take only a few ms per invocation, when deleting hundreds > of refs, it's still something that we'd like to avoid if we can. True. In this case it's just an alternative way to tackle this specific issue given that one wouldn't typically care about the packed-refs changes. And it's very much an artifact of the files-backend really being two backends in one. In any case I can see arguments for both approaches, and ultimately I agree that it would be best if the files-backend behaved as if it was a single one. > > > test_done > > > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > > > index 4620f0ca7fa..8b09a99c2e8 100755 > > > --- a/t/t5510-fetch.sh > > > +++ b/t/t5510-fetch.sh > > > @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' ' > > > git clone . prune-fail && > > > cd prune-fail && > > > git update-ref refs/remotes/origin/extrabranch main && > > > - : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds && > > > - >.git/packed-refs.new && > > > + : this will prevent --prune from locking refs/remotes/origin/extra for deletion && > > > + >.git/refs/remotes/origin/extrabranch.lock && > > > > > > - test_must_fail git fetch --prune origin > > > + test_must_fail git fetch --prune origin > outputs 2> errors > > > > It would be nice to have an explanation why exactly this change is > > needed, and why it is fine that the visible behaviour changes here. > > The test was relying on the fact that the packed-refs file was locked when > git fetch --prune is called. My patch replaces that unconditional lock with > a transaction. The transaction only takes out the lock if packed-refs > actually needs to be updated, and since the ref being pruned only exists > as a loose ref, git fetch --prune failed to fail. > > I've replaced the lock on packed-refs with a lock on the loose ref that > should be pruned. Ah, thanks. I was missing the fact that the lock for the packed-refs file was taken unconditionally before. Patrick
diff --git a/refs/files-backend.c b/refs/files-backend.c index 8db7882aacb..5c23277eda7 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -1265,44 +1265,39 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg, struct files_ref_store *refs = files_downcast(ref_store, REF_STORE_WRITE, "delete_refs"); struct strbuf err = STRBUF_INIT; - int i, result = 0; + int i; + struct ref_transaction *transaction; if (!refnames->nr) return 0; - if (packed_refs_lock(refs->packed_ref_store, 0, &err)) - goto error; - - if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) { - packed_refs_unlock(refs->packed_ref_store); + transaction = ref_store_transaction_begin(&refs->base, &err); + if (!transaction) goto error; - } - - packed_refs_unlock(refs->packed_ref_store); for (i = 0; i < refnames->nr; i++) { const char *refname = refnames->items[i].string; - - if (refs_delete_ref(&refs->base, msg, refname, NULL, flags)) - result |= error(_("could not remove reference %s"), refname); + if (ref_transaction_delete(transaction, refname, NULL, + flags, msg, &err)) + goto error; } + if (ref_transaction_commit(transaction, &err)) + goto error; + + ref_transaction_free(transaction); strbuf_release(&err); - return result; + return 0; error: - /* - * If we failed to rewrite the packed-refs file, then it is - * unsafe to try to remove loose refs, because doing so might - * expose an obsolete packed value for a reference that might - * even point at an object that has been garbage collected. - */ if (refnames->nr == 1) error(_("could not delete reference %s: %s"), refnames->items[0].string, err.buf); else error(_("could not delete references: %s"), err.buf); + if (transaction) + ref_transaction_free(transaction); strbuf_release(&err); return -1; } diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh index 27731722a5b..e3d4fe624f7 100755 --- a/t/t1416-ref-transaction-hooks.sh +++ b/t/t1416-ref-transaction-hooks.sh @@ -133,4 +133,74 @@ test_expect_success 'interleaving hook calls succeed' ' test_cmp expect target-repo.git/actual ' +test_expect_success 'hook allows deleting loose ref if successful' ' + test_when_finished "rm actual" && + git branch to-be-deleted $PRE_OID && + test_hook reference-transaction <<-\EOF && + echo "$*" >>actual + EOF + cat >expect <<-EOF && + aborted + prepared + committed + EOF + git branch -d to-be-deleted && + test_cmp expect actual && + test_must_fail git rev-parse refs/heads/to-be-deleted +' + +test_expect_success 'hook allows deleting packed ref if successful' ' + test_when_finished "rm actual" && + git branch to-be-deleted $PRE_OID && + git pack-refs --all --prune && + test_hook reference-transaction <<-\EOF && + echo "$*" >>actual + EOF + cat >expect <<-EOF && + prepared + prepared + committed + committed + EOF + git branch -d to-be-deleted && + test_cmp expect actual && + test_must_fail git rev-parse refs/heads/to-be-deleted +' + +test_expect_success 'hook aborts deleting loose ref in prepared state' ' + test_when_finished "rm actual" && + test_when_finished "git branch -d to-be-deleted" && + git branch to-be-deleted $PRE_OID && + test_hook reference-transaction <<-\EOF && + echo "$*" >>actual + exit 1 + EOF + cat >expect <<-EOF && + aborted + prepared + aborted + EOF + test_must_fail git branch -d to-be-deleted && + test_cmp expect actual && + git rev-parse refs/heads/to-be-deleted +' + +test_expect_success 'hook aborts deleting packed ref in prepared state' ' + test_when_finished "rm actual" && + test_when_finished "git branch -d to-be-deleted" && + git branch to-be-deleted $PRE_OID && + git pack-refs --all --prune && + test_hook reference-transaction <<-\EOF && + echo "$*" >>actual + exit 1 + EOF + cat >expect <<-EOF && + prepared + aborted + EOF + test_must_fail git branch -d to-be-deleted && + test_cmp expect actual && + git rev-parse refs/heads/to-be-deleted +' + test_done diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index 4620f0ca7fa..8b09a99c2e8 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' ' git clone . prune-fail && cd prune-fail && git update-ref refs/remotes/origin/extrabranch main && - : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds && - >.git/packed-refs.new && + : this will prevent --prune from locking refs/remotes/origin/extra for deletion && + >.git/refs/remotes/origin/extrabranch.lock && - test_must_fail git fetch --prune origin + test_must_fail git fetch --prune origin > outputs 2> errors ' test_expect_success 'fetch --atomic works with a single branch' '