diff mbox series

[6/6] fetch: make `--atomic` flag cover pruning of refs

Message ID 682f16117b743bec59c533e15ae5a88d39250222.1644565025.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series fetch: improve atomicity of `--atomic` flag | expand

Commit Message

Patrick Steinhardt Feb. 11, 2022, 7:47 a.m. UTC
When fetching with the `--prune` flag we will delete any local
references matching the fetch refspec which have disappeared on the
remote. This step is not currently covered by the `--atomic` flag: we
delete branches even though updating of local references has failed,
which means that the fetch is not an all-or-nothing operation.

Fix this bug by passing in the global transaction into `prune_refs()`:
if one is given, then we'll only queue up deletions and not commit them
right away.

This change also improves performance when pruning many branches in a
repository with a big packed-refs file: every references is pruned in
its own transaction, which means that we potentially have to rewrite
the packed-refs files for every single reference we're about to prune.

The following benchmark demonstrates this: it performs a pruning fetch
from a repository with a single reference into a repository with 100k
references, which causes us to prune all but one reference. This is of
course a very artificial setup, but serves to demonstrate the impact of
only having to write the packed-refs file once:

    Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
      Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
      Range (min … max):    2.328 s …  2.407 s    10 runs

    Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
      Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
      Range (min … max):    1.346 s …  1.400 s    10 runs

    Summary
      'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
        1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c  | 30 ++++++++++++++++++++++--------
 t/t5510-fetch.sh |  4 +---
 2 files changed, 23 insertions(+), 11 deletions(-)

Comments

Christian Couder Feb. 15, 2022, 9:12 a.m. UTC | #1
On Fri, Feb 11, 2022 at 9:25 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When fetching with the `--prune` flag we will delete any local
> references matching the fetch refspec which have disappeared on the
> remote. This step is not currently covered by the `--atomic` flag: we
> delete branches even though updating of local references has failed,
> which means that the fetch is not an all-or-nothing operation.

It could perhaps be seen as a regression by some users though, if
updating of local references doesn't work anymore when deleting a
local reference matching the fetch refspec fails.

> Fix this bug by passing in the global transaction into `prune_refs()`:
> if one is given, then we'll only queue up deletions and not commit them
> right away.
>
> This change also improves performance when pruning many branches in a
> repository with a big packed-refs file: every references is pruned in
> its own transaction, which means that we potentially have to rewrite
> the packed-refs files for every single reference we're about to prune.

Yeah, I wonder if there could be a performance improvement in the
previous patch too as it looks like tag backfilling wasn't atomic too.

> The following benchmark demonstrates this: it performs a pruning fetch
> from a repository with a single reference into a repository with 100k
> references, which causes us to prune all but one reference. This is of
> course a very artificial setup, but serves to demonstrate the impact of
> only having to write the packed-refs file once:
>
>     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
>       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
>       Range (min … max):    2.328 s …  2.407 s    10 runs
>
>     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
>       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
>       Range (min … max):    1.346 s …  1.400 s    10 runs
>
>     Summary
>       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
>         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

Nice!
Jonathan Tan Feb. 16, 2022, 11:39 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index 93a0db3c68..afa6bf9f7d 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
>  	cat >expected <<-EOF &&
>  		prepared
>  		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -		committed
> -		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -		prepared
>  		$ZERO_OID $head_oid refs/remotes/origin/new-branch
>  		committed
> +		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
>  		$ZERO_OID $head_oid refs/remotes/origin/new-branch
>  	EOF

I think that there is a comment above this change that needs to be
deleted too.

Overall, I think that this patch set is a good change, and I have no
further comments. (Regarding patch 5, I can't think of a better way to
know what refs are duplicates.)
Elijah Newren Feb. 17, 2022, 1:40 a.m. UTC | #3
On Fri, Feb 11, 2022 at 12:25 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When fetching with the `--prune` flag we will delete any local
> references matching the fetch refspec which have disappeared on the
> remote. This step is not currently covered by the `--atomic` flag: we
> delete branches even though updating of local references has failed,
> which means that the fetch is not an all-or-nothing operation.
>
> Fix this bug by passing in the global transaction into `prune_refs()`:
> if one is given, then we'll only queue up deletions and not commit them
> right away.
>
> This change also improves performance when pruning many branches in a
> repository with a big packed-refs file: every references is pruned in
> its own transaction, which means that we potentially have to rewrite
> the packed-refs files for every single reference we're about to prune.
>
> The following benchmark demonstrates this: it performs a pruning fetch
> from a repository with a single reference into a repository with 100k
> references, which causes us to prune all but one reference. This is of
> course a very artificial setup, but serves to demonstrate the impact of
> only having to write the packed-refs file once:
>
>     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
>       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
>       Range (min … max):    2.328 s …  2.407 s    10 runs
>
>     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
>       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
>       Range (min … max):    1.346 s …  1.400 s    10 runs
>
>     Summary
>       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
>         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

Very nice!

[...]
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index 93a0db3c68..afa6bf9f7d 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
>         cat >expected <<-EOF &&
>                 prepared
>                 $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -               committed
> -               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -               prepared
>                 $ZERO_OID $head_oid refs/remotes/origin/new-branch
>                 committed

Up to here this is just what I expected.

> +               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
>                 $ZERO_OID $head_oid refs/remotes/origin/new-branch

But now scheduled-for-deletion and new-branch are both listed again
even with your fixes?  Is this some peculiarity of the reference
transaction hook that it lists the refs again after the
prepared...committed block?  (It may well be; I'm not that familiar
with this area of the code.)
Patrick Steinhardt Feb. 17, 2022, 12:03 p.m. UTC | #4
On Tue, Feb 15, 2022 at 10:12:23AM +0100, Christian Couder wrote:
> On Fri, Feb 11, 2022 at 9:25 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When fetching with the `--prune` flag we will delete any local
> > references matching the fetch refspec which have disappeared on the
> > remote. This step is not currently covered by the `--atomic` flag: we
> > delete branches even though updating of local references has failed,
> > which means that the fetch is not an all-or-nothing operation.
> 
> It could perhaps be seen as a regression by some users though, if
> updating of local references doesn't work anymore when deleting a
> local reference matching the fetch refspec fails.

I guess the same comment applies here as for the other patch: the
documentation says that we either update all or no refs, and that's not
what was happening previous to this patch.

> > Fix this bug by passing in the global transaction into `prune_refs()`:
> > if one is given, then we'll only queue up deletions and not commit them
> > right away.
> >
> > This change also improves performance when pruning many branches in a
> > repository with a big packed-refs file: every references is pruned in
> > its own transaction, which means that we potentially have to rewrite
> > the packed-refs files for every single reference we're about to prune.
> 
> Yeah, I wonder if there could be a performance improvement in the
> previous patch too as it looks like tag backfilling wasn't atomic too.

I doubt it would be as measurable as it is here. The reason why we have
this speedup is that for every deleted ref, we need to rewrite the
complete contents of the packed-refs file only with that single ref
removed from it. So for 10k refs, we essentially write the file 10k
times.

For the backfilling case that doesn't happen though: we only write the
new loose refs, and that's not any faster in case we use a single
transaction. Sure, we'll likely be able to shed some of the overhead by
using a single transaction only, but it will not be as pronounced as it
is here.

This will be different though as soon as the reftable backend lands:
there we'd write all new refs in a single slice, and that's definitely
more efficient than writing one slice per backfilled ref.

Patrick

> > The following benchmark demonstrates this: it performs a pruning fetch
> > from a repository with a single reference into a repository with 100k
> > references, which causes us to prune all but one reference. This is of
> > course a very artificial setup, but serves to demonstrate the impact of
> > only having to write the packed-refs file once:
> >
> >     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
> >       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
> >       Range (min … max):    2.328 s …  2.407 s    10 runs
> >
> >     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
> >       Range (min … max):    1.346 s …  1.400 s    10 runs
> >
> >     Summary
> >       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
> >         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'
> 
> Nice!
Patrick Steinhardt Feb. 17, 2022, 12:06 p.m. UTC | #5
On Wed, Feb 16, 2022 at 05:40:19PM -0800, Elijah Newren wrote:
> On Fri, Feb 11, 2022 at 12:25 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When fetching with the `--prune` flag we will delete any local
> > references matching the fetch refspec which have disappeared on the
> > remote. This step is not currently covered by the `--atomic` flag: we
> > delete branches even though updating of local references has failed,
> > which means that the fetch is not an all-or-nothing operation.
> >
> > Fix this bug by passing in the global transaction into `prune_refs()`:
> > if one is given, then we'll only queue up deletions and not commit them
> > right away.
> >
> > This change also improves performance when pruning many branches in a
> > repository with a big packed-refs file: every references is pruned in
> > its own transaction, which means that we potentially have to rewrite
> > the packed-refs files for every single reference we're about to prune.
> >
> > The following benchmark demonstrates this: it performs a pruning fetch
> > from a repository with a single reference into a repository with 100k
> > references, which causes us to prune all but one reference. This is of
> > course a very artificial setup, but serves to demonstrate the impact of
> > only having to write the packed-refs file once:
> >
> >     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
> >       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
> >       Range (min … max):    2.328 s …  2.407 s    10 runs
> >
> >     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
> >       Range (min … max):    1.346 s …  1.400 s    10 runs
> >
> >     Summary
> >       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
> >         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'
> 
> Very nice!
> 
> [...]
> > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> > index 93a0db3c68..afa6bf9f7d 100755
> > --- a/t/t5510-fetch.sh
> > +++ b/t/t5510-fetch.sh
> > @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
> >         cat >expected <<-EOF &&
> >                 prepared
> >                 $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> > -               committed
> > -               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> > -               prepared
> >                 $ZERO_OID $head_oid refs/remotes/origin/new-branch
> >                 committed
> 
> Up to here this is just what I expected.
> 
> > +               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> >                 $ZERO_OID $head_oid refs/remotes/origin/new-branch
> 
> But now scheduled-for-deletion and new-branch are both listed again
> even with your fixes?  Is this some peculiarity of the reference
> transaction hook that it lists the refs again after the
> prepared...committed block?  (It may well be; I'm not that familiar
> with this area of the code.)

Yes, the reference-transaction hook is executed for each of the
"prepared", "committed" and "aborted" phases and gets as input all the
refs that have been queued for that phase.

Patrick
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 348e64cf2c..75e791a4b4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1333,11 +1333,14 @@  static int fetch_and_consume_refs(struct transport *transport,
 	return ret;
 }
 
-static int prune_refs(struct refspec *rs, struct ref *ref_map,
+static int prune_refs(struct refspec *rs,
+		      struct ref_transaction *transaction,
+		      struct ref *ref_map,
 		      const char *raw_url)
 {
 	int url_len, i, result = 0;
 	struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
+	struct strbuf err = STRBUF_INIT;
 	char *url;
 	int summary_width = transport_summary_width(stale_refs);
 	const char *dangling_msg = dry_run
@@ -1358,13 +1361,22 @@  static int prune_refs(struct refspec *rs, struct ref *ref_map,
 		url_len = i - 3;
 
 	if (!dry_run) {
-		struct string_list refnames = STRING_LIST_INIT_NODUP;
+		if (transaction) {
+			for (ref = stale_refs; ref; ref = ref->next) {
+				result = ref_transaction_delete(transaction, ref->name, NULL, 0,
+								"fetch: prune", &err);
+				if (result)
+					goto cleanup;
+			}
+		} else {
+			struct string_list refnames = STRING_LIST_INIT_NODUP;
 
-		for (ref = stale_refs; ref; ref = ref->next)
-			string_list_append(&refnames, ref->name);
+			for (ref = stale_refs; ref; ref = ref->next)
+				string_list_append(&refnames, ref->name);
 
-		result = delete_refs("fetch: prune", &refnames, 0);
-		string_list_clear(&refnames, 0);
+			result = delete_refs("fetch: prune", &refnames, 0);
+			string_list_clear(&refnames, 0);
+		}
 	}
 
 	if (verbosity >= 0) {
@@ -1383,6 +1395,8 @@  static int prune_refs(struct refspec *rs, struct ref *ref_map,
 		}
 	}
 
+cleanup:
+	strbuf_release(&err);
 	free(url);
 	free_refs(stale_refs);
 	return result;
@@ -1624,10 +1638,10 @@  static int do_fetch(struct transport *transport,
 		 * don't care whether --tags was specified.
 		 */
 		if (rs->nr) {
-			prune_refs(rs, ref_map, transport->url);
+			prune_refs(rs, transaction, ref_map, transport->url);
 		} else {
 			prune_refs(&transport->remote->fetch,
-				   ref_map,
+				   transaction, ref_map,
 				   transport->url);
 		}
 	}
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 93a0db3c68..afa6bf9f7d 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -349,11 +349,9 @@  test_expect_success 'fetch --atomic --prune executes a single reference transact
 	cat >expected <<-EOF &&
 		prepared
 		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
-		committed
-		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
-		prepared
 		$ZERO_OID $head_oid refs/remotes/origin/new-branch
 		committed
+		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
 		$ZERO_OID $head_oid refs/remotes/origin/new-branch
 	EOF