[v2,0/5] fetch: more optimizations for mirror fetches


Message ID

cover.1646127015.git.ps@pks.im (mailing list archive)

Headers

Date: Tue, 1 Mar 2022 10:33:33 +0100
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 0/5] fetch: more optimizations for mirror fetches
Message-ID: <cover.1646127015.git.ps@pks.im>
References: <cover.1645619224.git.ps@pks.im>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
        protocol="application/pgp-signature"; boundary="VcXUPscmPKaD3Mwz"
Content-Disposition: inline
In-Reply-To: <cover.1645619224.git.ps@pks.im>
Precedence: bulk

Series

fetch: more optimizations for mirror fetches

Message

Patrick Steinhardt March 1, 2022, 9:33 a.m. UTC

Hi,

this is another patch series with the aim to speed up mirror fetches. It
applies on top of e6ebfd0e8c (The sixth batch, 2022-02-18) with
3824153b23 (Merge branch 'ps/fetch-atomic' into next, 2022-02-18) merged
into it to fix a conflict.

The only change compared to v1 is an update to the benchmarks so that
they're less verbose, as proposed by Derrick. I also had a look at
introducing a new helper `parse_object_probably_commit()`, but I didn't
find the end result to be much of an improvement compared to the ad-hoc
`lookup_commit_in_graph() || parse_object()` dance we do right now.
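
For readers unfamiliar with the pattern, the "fast lookup first, full parse as
fallback" idea can be sketched in isolation. The following is a toy model only:
`toy_lookup_in_graph()`, `toy_parse_object()`, `toy_lookup_want()` and the two
arrays are hypothetical stand-ins invented for illustration and are not Git's
actual functions, signatures or data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model; none of these names exist in Git. */
enum toy_type { TOY_COMMIT, TOY_TAG };

struct toy_object {
	unsigned id;
	enum toy_type type;
};

/* Toy "commit-graph": a cheap index that only knows about commits. */
static struct toy_object toy_graph[] = {
	{ 1, TOY_COMMIT },
	{ 2, TOY_COMMIT },
};

/* Toy "object database": every object, including non-commits. */
static struct toy_object toy_odb[] = {
	{ 1, TOY_COMMIT },
	{ 2, TOY_COMMIT },
	{ 3, TOY_TAG },
};

/* Counts how often the expensive fallback path was taken. */
static int toy_slow_path_calls;

/* Cheap path: succeeds only for objects present in the toy graph. */
static struct toy_object *toy_lookup_in_graph(unsigned id)
{
	for (size_t i = 0; i < sizeof(toy_graph) / sizeof(toy_graph[0]); i++)
		if (toy_graph[i].id == id)
			return &toy_graph[i];
	return NULL;
}

/* Expensive path: handles any object type, returns NULL if unknown. */
static struct toy_object *toy_parse_object(unsigned id)
{
	toy_slow_path_calls++;
	for (size_t i = 0; i < sizeof(toy_odb) / sizeof(toy_odb[0]); i++)
		if (toy_odb[i].id == id)
			return &toy_odb[i];
	return NULL;
}

/* Try the cheap path first and fall back only when it misses. */
static struct toy_object *toy_lookup_want(unsigned id)
{
	struct toy_object *obj = toy_lookup_in_graph(id);
	if (!obj)
		obj = toy_parse_object(id);
	return obj;
}
```

In this model, looking up id 1 or 2 never touches the slow path, while id 3 (a
tag) and unknown ids fall through to it. A dedicated helper would merely wrap
the two calls in `toy_lookup_want()`.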

Thanks!

Patrick

Patrick Steinhardt (5):
  upload-pack: look up "want" lines via commit-graph
  fetch: avoid lookup of commits when not appending to FETCH_HEAD
  refs: add ability for backends to special-case reading of symbolic
    refs
  remote: read symbolic refs via `refs_read_symbolic_ref()`
  refs/files-backend: optimize reading of symbolic refs

 builtin/fetch.c       | 42 +++++++++++++++++++++++++++---------------
 builtin/remote.c      |  8 +++++---
 refs.c                | 17 +++++++++++++++++
 refs.h                |  3 +++
 refs/debug.c          |  1 +
 refs/files-backend.c  | 33 ++++++++++++++++++++++++++++-----
 refs/packed-backend.c |  1 +
 refs/refs-internal.h  | 16 ++++++++++++++++
 remote.c              | 14 +++++++-------
 upload-pack.c         | 20 +++++++++++++++++---
 10 files changed, 122 insertions(+), 33 deletions(-)

Range-diff against v1:
1:  ca5e136cca ! 1:  b5c696bd8e upload-pack: look up "want" lines via commit-graph
    @@ Commit message
         Refactor parsing of both "want" and "want-ref" lines to do so.
     
         The following benchmark is executed in a repository with a huge number
    -    of references. It uses cached request from git-fetch(1) as input and
    -    contains about 876,000 "want" lines:
    +    of references. It uses cached request from git-fetch(1) as input to
    +    git-upload-pack(1) that contains about 876,000 "want" lines:
     
    -        Benchmark 1: git-upload-pack (HEAD~)
    +        Benchmark 1: HEAD~
               Time (mean ± σ):      7.113 s ±  0.028 s    [User: 6.900 s, System: 0.662 s]
               Range (min … max):    7.072 s …  7.168 s    10 runs
     
    -        Benchmark 2: git-upload-pack (HEAD)
    +        Benchmark 2: HEAD
               Time (mean ± σ):      6.622 s ±  0.061 s    [User: 6.452 s, System: 0.650 s]
               Range (min … max):    6.535 s …  6.727 s    10 runs
     
             Summary
    -          'git-upload-pack (HEAD)' ran
    -            1.07 ± 0.01 times faster than 'git-upload-pack (HEAD~)'
    +          'HEAD' ran
    +            1.07 ± 0.01 times faster than 'HEAD~'
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
2:  80f993dddd ! 2:  fbe76b78c3 fetch: avoid lookup of commits when not appending to FETCH_HEAD
    @@ Commit message
     
         Skip this busywork in case we're not writing to FETCH_HEAD. The
         following benchmark performs a mirror-fetch in a repository with about
    -    two million references:
    +    two million references via `git fetch --prune --no-write-fetch-head
    +    +refs/*:refs/*`:
     
    -        Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
    +        Benchmark 1: HEAD~
               Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
               Range (min … max):   73.184 s … 76.845 s    3 runs
     
    -        Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
    +        Benchmark 2: HEAD
               Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
               Range (min … max):   68.864 s … 70.659 s    3 runs
     
             Summary
    -          'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
    -            1.08 ± 0.03 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
    +          'HEAD' ran
    +            1.08 ± 0.03 times faster than 'HEAD~'
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
3:  28cacbdbe2 = 3:  29eb81d37c refs: add ability for backends to special-case reading of symbolic refs
4:  1d24101fe4 = 4:  0489380e00 remote: read symbolic refs via `refs_read_symbolic_ref()`
5:  7213ffdbdd ! 5:  b6eca63d3b refs/files-backend: optimize reading of symbolic refs
    @@ Commit message
         need to skip updating local symbolic references during a fetch, which is
         why the change results in a significant speedup when doing fetches in
         repositories with huge numbers of references. The following benchmark
    -    executes a mirror-fetch in a repository with about 2 million references:
    +    executes a mirror-fetch in a repository with about 2 million references
    +    via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`:
     
    -        Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
    +        Benchmark 1: HEAD~
               Time (mean ± σ):     68.372 s ±  2.344 s    [User: 65.629 s, System: 8.786 s]
               Range (min … max):   65.745 s … 70.246 s    3 runs
     
    -        Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
    +        Benchmark 2: HEAD
               Time (mean ± σ):     60.259 s ±  0.343 s    [User: 61.019 s, System: 7.245 s]
               Range (min … max):   60.003 s … 60.649 s    3 runs
     
             Summary
    -          'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
    -            1.13 ± 0.04 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
    +          'HEAD' ran
    +            1.13 ± 0.04 times faster than 'HEAD~'
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
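
The "times faster" figures in the summaries above appear to be the ratio of the
two mean runtimes. The sketch below reproduces that arithmetic from the means
quoted in the benchmarks; `struct bench_mean` and `speedup()` are hypothetical
names for illustration, and the sketch does not try to reproduce the "±" part
of the summary lines.

```c
#include <assert.h>

/* Mean wall-clock time of one benchmark, in seconds. */
struct bench_mean {
	double seconds;
};

/* How many times faster `fast` ran than `slow`, as a ratio of means. */
static double speedup(struct bench_mean slow, struct bench_mean fast)
{
	assert(slow.seconds > 0.0 && fast.seconds > 0.0);
	return slow.seconds / fast.seconds;
}

/* Means copied from the three benchmarks above (HEAD~ vs. HEAD). */
static const struct bench_mean upload_pack_before = { 7.113 };
static const struct bench_mean upload_pack_after = { 6.622 };
static const struct bench_mean fetch_head_before = { 75.388 };
static const struct bench_mean fetch_head_after = { 69.486 };
static const struct bench_mean symref_before = { 68.372 };
static const struct bench_mean symref_after = { 60.259 };
```

Calling `speedup(upload_pack_before, upload_pack_after)` yields roughly 1.07,
and the two fetch benchmarks yield roughly 1.08 and 1.13, matching the summary
lines.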

Comments

Junio C Hamano March 1, 2022, 10:02 p.m. UTC | #1

Patrick Steinhardt <ps@pks.im> writes:

> this is another patch series with the aim to speed up mirror fetches. It
> applies on top of e6ebfd0e8c (The sixth batch, 2022-02-18) with
> 3824153b23 (Merge branch 'ps/fetch-atomic' into next, 2022-02-18) merged
> into it to fix a conflict.

Thanks for clearly describing a base.  Except that a merge on 'next'
(e.g. 3824153b23) is not what we want a new topic to depend on; use
the tip(s) of individual topic(s), i.e. 583bc419 (fetch: make
`--atomic` flag cover pruning of refs, 2022-02-17), instead.

Derrick Stolee March 2, 2022, 6:54 p.m. UTC | #2

On 3/1/2022 4:33 AM, Patrick Steinhardt wrote:
> Hi,
> 
> this is another patch series with the aim to speed up mirror fetches. It
> applies on top of e6ebfd0e8c (The sixth batch, 2022-02-18) with
> 3824153b23 (Merge branch 'ps/fetch-atomic' into next, 2022-02-18) merged
> into it to fix a conflict.
> 
> The only change compared to v1 is an update to the benchmarks so that
> they're less verbose, as proposed by Derrick. I also had a look at
> introducing a new helper `parse_object_probably_commit()`, but I didn't
> find the end result to be much of an improvement compared to the ad-hoc
> `lookup_commit_in_graph() || parse_object()` dance we do right now.

I'm satisfied that you tried the helper idea. This version
looks good to me.

Thanks,
-Stolee