Message ID | 682f16117b743bec59c533e15ae5a88d39250222.1644565025.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fetch: improve atomicity of `--atomic` flag | expand |
On Fri, Feb 11, 2022 at 9:25 AM Patrick Steinhardt <ps@pks.im> wrote: > > When fetching with the `--prune` flag we will delete any local > references matching the fetch refspec which have disappeared on the > remote. This step is not currently covered by the `--atomic` flag: we > delete branches even though updating of local references has failed, > which means that the fetch is not an all-or-nothing operation. It could perhaps be seen as a regression by some users though, if updating of local references doesn't work anymore when deleting a local reference matching the fetch refspec fails. > Fix this bug by passing in the global transaction into `prune_refs()`: > if one is given, then we'll only queue up deletions and not commit them > right away. > > This change also improves performance when pruning many branches in a > repository with a big packed-refs file: every references is pruned in > its own transaction, which means that we potentially have to rewrite > the packed-refs files for every single reference we're about to prune. Yeah, I wonder if there could be a performance improvement in the previous patch too as it looks like tag backfilling wasn't atomic too. > The following benchmark demonstrates this: it performs a pruning fetch > from a repository with a single reference into a repository with 100k > references, which causes us to prune all but one reference. This is of > course a very artificial setup, but serves to demonstrate the impact of > only having to write the packed-refs file once: > > Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~) > Time (mean ± σ): 2.366 s ± 0.021 s [User: 0.858 s, System: 1.508 s] > Range (min … max): 2.328 s … 2.407 s 10 runs > > Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD) > Time (mean ± σ): 1.369 s ± 0.017 s [User: 0.715 s, System: 0.641 s] > Range (min … max): 1.346 s … 1.400 s 10 runs > > Summary > 'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran > 1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)' Nice!
Patrick Steinhardt <ps@pks.im> writes: > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > index 93a0db3c68..afa6bf9f7d 100755 > --- a/t/t5510-fetch.sh > +++ b/t/t5510-fetch.sh > @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact > cat >expected <<-EOF && > prepared > $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > - committed > - $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > - prepared > $ZERO_OID $head_oid refs/remotes/origin/new-branch > committed > + $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > $ZERO_OID $head_oid refs/remotes/origin/new-branch > EOF I think that there is a comment above this change that needs to be deleted too. Overall, I think that this patch set is a good change, and I have no further comments. (Regarding patch 5, I can't think of a better way to know what refs are duplicates.)
On Fri, Feb 11, 2022 at 12:25 AM Patrick Steinhardt <ps@pks.im> wrote: > > When fetching with the `--prune` flag we will delete any local > references matching the fetch refspec which have disappeared on the > remote. This step is not currently covered by the `--atomic` flag: we > delete branches even though updating of local references has failed, > which means that the fetch is not an all-or-nothing operation. > > Fix this bug by passing in the global transaction into `prune_refs()`: > if one is given, then we'll only queue up deletions and not commit them > right away. > > This change also improves performance when pruning many branches in a > repository with a big packed-refs file: every references is pruned in > its own transaction, which means that we potentially have to rewrite > the packed-refs files for every single reference we're about to prune. > > The following benchmark demonstrates this: it performs a pruning fetch > from a repository with a single reference into a repository with 100k > references, which causes us to prune all but one reference. This is of > course a very artificial setup, but serves to demonstrate the impact of > only having to write the packed-refs file once: > > Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~) > Time (mean ± σ): 2.366 s ± 0.021 s [User: 0.858 s, System: 1.508 s] > Range (min … max): 2.328 s … 2.407 s 10 runs > > Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD) > Time (mean ± σ): 1.369 s ± 0.017 s [User: 0.715 s, System: 0.641 s] > Range (min … max): 1.346 s … 1.400 s 10 runs > > Summary > 'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran > 1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)' Very nice! [...] > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > index 93a0db3c68..afa6bf9f7d 100755 > --- a/t/t5510-fetch.sh > +++ b/t/t5510-fetch.sh > @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact > cat >expected <<-EOF && > prepared > $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > - committed > - $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > - prepared > $ZERO_OID $head_oid refs/remotes/origin/new-branch > committed Up to here this is just what I expected. > + $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > $ZERO_OID $head_oid refs/remotes/origin/new-branch But now scheduled-for-deletion and new-branch are both listed again even with your fixes? Is this some peculiarity of the reference transaction hook that it lists the refs again after the prepared...committed block? (It may well be; I'm not that familiar with this area of the code.)
On Tue, Feb 15, 2022 at 10:12:23AM +0100, Christian Couder wrote: > On Fri, Feb 11, 2022 at 9:25 AM Patrick Steinhardt <ps@pks.im> wrote: > > > > When fetching with the `--prune` flag we will delete any local > > references matching the fetch refspec which have disappeared on the > > remote. This step is not currently covered by the `--atomic` flag: we > > delete branches even though updating of local references has failed, > > which means that the fetch is not an all-or-nothing operation. > > It could perhaps be seen as a regression by some users though, if > updating of local references doesn't work anymore when deleting a > local reference matching the fetch refspec fails. I guess the same comment applies here as for the other patch: the documentation says that we either update all or no refs, and that's not what was happening previous to this patch. > > Fix this bug by passing in the global transaction into `prune_refs()`: > > if one is given, then we'll only queue up deletions and not commit them > > right away. > > > > This change also improves performance when pruning many branches in a > > repository with a big packed-refs file: every references is pruned in > > its own transaction, which means that we potentially have to rewrite > > the packed-refs files for every single reference we're about to prune. > > Yeah, I wonder if there could be a performance improvement in the > previous patch too as it looks like tag backfilling wasn't atomic too. I doubt it would be as measurable as it is here. The reason why we have this speedup is that for every deleted ref, we need to rewrite the complete contents of the packed-refs file only with that single ref removed from it. So for 10k refs, we essentially write the file 10k times. For the backfilling case that doesn't happen though: we only write the new loose refs, and that's not any faster in case we use a single transaction. Sure, we'll likely be able to shed some of the overhead by using a single transaction only, but it will not be as pronounced as it is here. This will be different though as soon as the reftable backend lands: there we'd write all new refs in a single slice, and that's definitely more efficient than writing one slice per backfilled ref. Patrick > > The following benchmark demonstrates this: it performs a pruning fetch > > from a repository with a single reference into a repository with 100k > > references, which causes us to prune all but one reference. This is of > > course a very artificial setup, but serves to demonstrate the impact of > > only having to write the packed-refs file once: > > > > Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~) > > Time (mean ± σ): 2.366 s ± 0.021 s [User: 0.858 s, System: 1.508 s] > > Range (min … max): 2.328 s … 2.407 s 10 runs > > > > Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD) > > Time (mean ± σ): 1.369 s ± 0.017 s [User: 0.715 s, System: 0.641 s] > > Range (min … max): 1.346 s … 1.400 s 10 runs > > > > Summary > > 'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran > > 1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)' > > Nice!
On Wed, Feb 16, 2022 at 05:40:19PM -0800, Elijah Newren wrote: > On Fri, Feb 11, 2022 at 12:25 AM Patrick Steinhardt <ps@pks.im> wrote: > > > > When fetching with the `--prune` flag we will delete any local > > references matching the fetch refspec which have disappeared on the > > remote. This step is not currently covered by the `--atomic` flag: we > > delete branches even though updating of local references has failed, > > which means that the fetch is not an all-or-nothing operation. > > > > Fix this bug by passing in the global transaction into `prune_refs()`: > > if one is given, then we'll only queue up deletions and not commit them > > right away. > > > > This change also improves performance when pruning many branches in a > > repository with a big packed-refs file: every references is pruned in > > its own transaction, which means that we potentially have to rewrite > > the packed-refs files for every single reference we're about to prune. > > > > The following benchmark demonstrates this: it performs a pruning fetch > > from a repository with a single reference into a repository with 100k > > references, which causes us to prune all but one reference. This is of > > course a very artificial setup, but serves to demonstrate the impact of > > only having to write the packed-refs file once: > > > > Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~) > > Time (mean ± σ): 2.366 s ± 0.021 s [User: 0.858 s, System: 1.508 s] > > Range (min … max): 2.328 s … 2.407 s 10 runs > > > > Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD) > > Time (mean ± σ): 1.369 s ± 0.017 s [User: 0.715 s, System: 0.641 s] > > Range (min … max): 1.346 s … 1.400 s 10 runs > > > > Summary > > 'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran > > 1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)' > > Very nice! > > [...] > > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > > index 93a0db3c68..afa6bf9f7d 100755 > > --- a/t/t5510-fetch.sh > > +++ b/t/t5510-fetch.sh > > @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact > > cat >expected <<-EOF && > > prepared > > $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > > - committed > > - $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > > - prepared > > $ZERO_OID $head_oid refs/remotes/origin/new-branch > > committed > > Up to here this is just what I expected. > > > + $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion > > $ZERO_OID $head_oid refs/remotes/origin/new-branch > > But now scheduled-for-deletion and new-branch are both listed again > even with your fixes? Is this some peculiarity of the reference > transaction hook that it lists the refs again after the > prepared...committed block? (It may well be; I'm not that familiar > with this area of the code.) Yes, the reference-transaction hook is executed for each of the "prepared", "committed" and "aborted" phases and gets as input all the refs that have been queued for that phase. Patrick
diff --git a/builtin/fetch.c b/builtin/fetch.c index 348e64cf2c..75e791a4b4 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1333,11 +1333,14 @@ static int fetch_and_consume_refs(struct transport *transport, return ret; } -static int prune_refs(struct refspec *rs, struct ref *ref_map, +static int prune_refs(struct refspec *rs, + struct ref_transaction *transaction, + struct ref *ref_map, const char *raw_url) { int url_len, i, result = 0; struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map); + struct strbuf err = STRBUF_INIT; char *url; int summary_width = transport_summary_width(stale_refs); const char *dangling_msg = dry_run @@ -1358,13 +1361,22 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map, url_len = i - 3; if (!dry_run) { - struct string_list refnames = STRING_LIST_INIT_NODUP; + if (transaction) { + for (ref = stale_refs; ref; ref = ref->next) { + result = ref_transaction_delete(transaction, ref->name, NULL, 0, + "fetch: prune", &err); + if (result) + goto cleanup; + } + } else { + struct string_list refnames = STRING_LIST_INIT_NODUP; - for (ref = stale_refs; ref; ref = ref->next) - string_list_append(&refnames, ref->name); + for (ref = stale_refs; ref; ref = ref->next) + string_list_append(&refnames, ref->name); - result = delete_refs("fetch: prune", &refnames, 0); - string_list_clear(&refnames, 0); + result = delete_refs("fetch: prune", &refnames, 0); + string_list_clear(&refnames, 0); + } } if (verbosity >= 0) { @@ -1383,6 +1395,8 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map, } } +cleanup: + strbuf_release(&err); free(url); free_refs(stale_refs); return result; @@ -1624,10 +1638,10 @@ static int do_fetch(struct transport *transport, * don't care whether --tags was specified. */ if (rs->nr) { - prune_refs(rs, ref_map, transport->url); + prune_refs(rs, transaction, ref_map, transport->url); } else { prune_refs(&transport->remote->fetch, - ref_map, + transaction, ref_map, transport->url); } } diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index 93a0db3c68..afa6bf9f7d 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact cat >expected <<-EOF && prepared $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion - committed - $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion - prepared $ZERO_OID $head_oid refs/remotes/origin/new-branch committed + $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion $ZERO_OID $head_oid refs/remotes/origin/new-branch EOF
When fetching with the `--prune` flag we will delete any local references matching the fetch refspec which have disappeared on the remote. This step is not currently covered by the `--atomic` flag: we delete branches even though updating of local references has failed, which means that the fetch is not an all-or-nothing operation. Fix this bug by passing in the global transaction into `prune_refs()`: if one is given, then we'll only queue up deletions and not commit them right away. This change also improves performance when pruning many branches in a repository with a big packed-refs file: every references is pruned in its own transaction, which means that we potentially have to rewrite the packed-refs files for every single reference we're about to prune. The following benchmark demonstrates this: it performs a pruning fetch from a repository with a single reference into a repository with 100k references, which causes us to prune all but one reference. This is of course a very artificial setup, but serves to demonstrate the impact of only having to write the packed-refs file once: Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~) Time (mean ± σ): 2.366 s ± 0.021 s [User: 0.858 s, System: 1.508 s] Range (min … max): 2.328 s … 2.407 s 10 runs Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD) Time (mean ± σ): 1.369 s ± 0.017 s [User: 0.715 s, System: 0.641 s] Range (min … max): 1.346 s … 1.400 s 10 runs Summary 'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran 1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)' Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/fetch.c | 30 ++++++++++++++++++++++-------- t/t5510-fetch.sh | 4 +--- 2 files changed, 23 insertions(+), 11 deletions(-)