[v2,1/2] fetch-pack: use commit-graph when computing cutoff

Message ID	6fac914f0fe77df4c3058340642bea2a45a850cd.1643806143.git.ps@pks.im (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Date: Wed, 2 Feb 2022 13:51:48 +0100 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com> Subject: [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff Message-ID: <6fac914f0fe77df4c3058340642bea2a45a850cd.1643806143.git.ps@pks.im> References: <cover.1643364888.git.ps@pks.im> <cover.1643806143.git.ps@pks.im> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="OfSfQT0dWZwQYuEe" Content-Disposition: inline In-Reply-To: <cover.1643806143.git.ps@pks.im> Precedence: bulk
Series	fetch: speed up mirror-fetches with many refs \| expand [v2,0/2] fetch: speed up mirror-fetches with many refs [v2,1/2] fetch-pack: use commit-graph when computing cutoff [v2,2/2] fetch: skip computing output width when not printing anything

Message ID

6fac914f0fe77df4c3058340642bea2a45a850cd.1643806143.git.ps@pks.im (mailing list archive)

State

Superseded

Headers

Date: Wed, 2 Feb 2022 13:51:48 +0100
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff
Message-ID: 
 <6fac914f0fe77df4c3058340642bea2a45a850cd.1643806143.git.ps@pks.im>
References: <cover.1643364888.git.ps@pks.im>
 <cover.1643806143.git.ps@pks.im>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
        protocol="application/pgp-signature"; boundary="OfSfQT0dWZwQYuEe"
Content-Disposition: inline
In-Reply-To: <cover.1643806143.git.ps@pks.im>
Precedence: bulk

Series

fetch: speed up mirror-fetches with many refs | expand

Commit Message

Patrick Steinhardt Feb. 2, 2022, 12:51 p.m. UTC

During packfile negotiation we iterate over all refs announced by the
remote side to check whether their IDs refer to commits already known to
us. If a commit is known to us already, then its date is a potential
cutoff point for commits we have in common with the remote side.

There is potentially a lot of commits announced by the remote depending
on how many refs there are in the remote repository, and for every one
of them we need to search for it in our object database and, if found,
parse the corresponding object to find out whether it is a candidate for
the cutoff date. This can be sped up by trying to look up commits via
the commit-graph first, which is a lot more efficient.

Benchmarks in a repository with about 2,1 million refs and an up-to-date
commit-graph show a 20% speedup when mirror-fetching:

    Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
      Time (mean ± σ):     75.264 s ±  1.115 s    [User: 68.199 s, System: 10.094 s]
      Range (min … max):   74.145 s … 76.862 s    5 runs

    Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
      Time (mean ± σ):     62.350 s ±  0.854 s    [User: 55.412 s, System: 9.976 s]
      Range (min … max):   61.224 s … 63.216 s    5 runs

    Summary
      'git fetch --atomic +refs/*:refs/* (HEAD)' ran
        1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 fetch-pack.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

Comments

Christian Couder Feb. 9, 2022, 6:01 p.m. UTC | #1

On Mon, Feb 7, 2022 at 7:03 AM Patrick Steinhardt <ps@pks.im> wrote:

> Benchmarks in a repository with about 2,1 million refs and an up-to-date
> commit-graph show a 20% speedup when mirror-fetching:
>
>     Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
>       Time (mean ± σ):     75.264 s ±  1.115 s    [User: 68.199 s, System: 10.094 s]
>       Range (min … max):   74.145 s … 76.862 s    5 runs
>
>     Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
>       Time (mean ± σ):     62.350 s ±  0.854 s    [User: 55.412 s, System: 9.976 s]
>       Range (min … max):   61.224 s … 63.216 s    5 runs
>
>     Summary
>       'git fetch --atomic +refs/*:refs/* (HEAD)' ran
>         1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'

The commit message and code make sense to me, but I wonder if there is
a reason why --atomic is used when fetching.

Patrick Steinhardt Feb. 10, 2022, 11:43 a.m. UTC | #2

On Wed, Feb 09, 2022 at 07:01:54PM +0100, Christian Couder wrote:
> On Mon, Feb 7, 2022 at 7:03 AM Patrick Steinhardt <ps@pks.im> wrote:
> 
> > Benchmarks in a repository with about 2,1 million refs and an up-to-date
> > commit-graph show a 20% speedup when mirror-fetching:
> >
> >     Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
> >       Time (mean ± σ):     75.264 s ±  1.115 s    [User: 68.199 s, System: 10.094 s]
> >       Range (min … max):   74.145 s … 76.862 s    5 runs
> >
> >     Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):     62.350 s ±  0.854 s    [User: 55.412 s, System: 9.976 s]
> >       Range (min … max):   61.224 s … 63.216 s    5 runs
> >
> >     Summary
> >       'git fetch --atomic +refs/*:refs/* (HEAD)' ran
> >         1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'
> 
> The commit message and code make sense to me, but I wonder if there is
> a reason why --atomic is used when fetching.

The repository that I was mirror-fetching into needs to update a big
bunch of references, and doing that via `--atomic` is more efficient
than doing it without, and this shows in the benchmark. I did another
benchmarking run without `--atomic`, and it is indeed about 30 seconds
slower for both cases. But interestingly the relative performance
improvement is still roughly the same:

    Benchmark 1: git fetch +refs/*:refs/* (v2.35.0)
      Time (mean ± σ):     115.587 s ±  2.009 s    [User: 109.874 s, System: 11.305 s]
      Range (min … max):   113.584 s … 118.820 s    5 runs

    Benchmark 2: git fetch +refs/*:refs/* (pks-fetch-pack-optim-v1~)
      Time (mean ± σ):     96.859 s ±  0.624 s    [User: 91.948 s, System: 10.980 s]
      Range (min … max):   96.180 s … 97.875 s    5 runs

    Summary
      'git fetch +refs/*:refs/* (pks-fetch-pack-optim-v1~)' ran
        1.19 ± 0.02 times faster than 'git fetch +refs/*:refs/* (v2.35.0)'

I'll update the commit message to just use this new benchmark so that
the `--atomic` flag doesn't cause any questions.

Patrick

diff --git a/fetch-pack.c b/fetch-pack.c
index dd6ec449f2..c5967e228e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -696,26 +696,30 @@  static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 
 	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
 	for (ref = *refs; ref; ref = ref->next) {
-		struct object *o;
+		struct commit *commit;
 
-		if (!has_object_file_with_flags(&ref->old_oid,
+		commit = lookup_commit_in_graph(the_repository, &ref->old_oid);
+		if (!commit) {
+			struct object *o;
+
+			if (!has_object_file_with_flags(&ref->old_oid,
 						OBJECT_INFO_QUICK |
-							OBJECT_INFO_SKIP_FETCH_OBJECT))
-			continue;
-		o = parse_object(the_repository, &ref->old_oid);
-		if (!o)
-			continue;
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
+				continue;
+			o = parse_object(the_repository, &ref->old_oid);
+			if (!o || o->type != OBJ_COMMIT)
+				continue;
+
+			commit = (struct commit *)o;
+		}
 
 		/*
 		 * We already have it -- which may mean that we were
 		 * in sync with the other side at some time after
 		 * that (it is OK if we guess wrong here).
 		 */
-		if (o->type == OBJ_COMMIT) {
-			struct commit *commit = (struct commit *)o;
-			if (!cutoff || cutoff < commit->date)
-				cutoff = commit->date;
-		}
+		if (!cutoff || cutoff < commit->date)
+			cutoff = commit->date;
 	}
 	trace2_region_leave("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);

[v2,1/2] fetch-pack: use commit-graph when computing cutoff

Commit Message

Comments

Patch