
[00/10] refs: optimize ref format migrations

Message ID 20241108-pks-refs-optimize-migrations-v1-0-7fd37fa80e35@pks.im
Series: refs: optimize ref format migrations

Message

Patrick Steinhardt Nov. 8, 2024, 9:34 a.m. UTC
Hi,

I have recently learned that ref format migrations can take a
significant amount of time, on the order of minutes, when migrating
millions of refs. This is probably not entirely surprising: the initial
implementation of the logic to migrate ref backends was mostly focussed on
getting the basic feature working, and I hadn't yet invested any time into
optimizing the code path at all. But I was still mildly surprised that
the migration of a couple million refs was taking minutes to finish.

This patch series thus optimizes how we migrate ref formats. This is
mostly done by expanding upon the "initial transaction" semantics that
we already use for git-clone(1). These semantics allow us to assume that
the ref backend is completely empty and that there are no concurrent
writers, and thus we are free to perform certain optimizations that
wouldn't otherwise have been possible. On the one hand, this allows us to
drop needless collision checks. On the other hand, it also allows us to
write regular refs directly into the "packed-refs" file when migrating
from the "reftable" backend to the "files" backend.
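To make the two optimizations concrete, here is a toy model in C. It is not git's code: `TOY_TX_INITIAL`, `toy_commit()`, `struct toy_update`, and `struct toy_stats` are hypothetical names, and the real series threads its "initial" flag through the ref transaction API in its own way. The toy only illustrates why the assumptions pay off: a regular commit compares every update against every existing ref to catch directory/file conflicts (such as "refs/heads/main" versus "refs/heads/main/x") and would then write one loose file per ref, whereas an initial commit on an empty store has nothing to collide with and can stream all refs into a single sorted packed-refs-style file of `<oid> <refname>` lines (header and peeled lines omitted).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag for this sketch; only loosely modelled on the series. */
#define TOY_TX_INITIAL (1u << 0)

struct toy_update {
	const char *refname;
	const char *oid_hex;
};

struct toy_stats {
	size_t collision_checks; /* refname comparisons against existing refs */
	size_t loose_writes;     /* refs that would each become a loose file */
	size_t packed_writes;    /* refs appended to the packed-refs stream */
};

/* Nonzero if one name is a directory prefix of the other (a D/F conflict). */
static int df_conflict(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la < lb)
		return !strncmp(a, b, la) && b[la] == '/';
	if (lb < la)
		return !strncmp(a, b, lb) && a[lb] == '/';
	return 0;
}

static int cmp_update(const void *x, const void *y)
{
	const struct toy_update *a = x, *b = y;

	return strcmp(a->refname, b->refname);
}

/*
 * Regular path: check each update against each existing ref, then count
 * one loose write per ref. Initial path: the caller promises an empty
 * store without concurrent writers, so checks are skipped and all refs
 * are written, sorted, into one packed-refs-style stream.
 */
static int toy_commit(const char **existing, size_t nr_existing,
		      struct toy_update *updates, size_t nr,
		      unsigned int flags, FILE *packed, struct toy_stats *st)
{
	if (flags & TOY_TX_INITIAL) {
		assert(nr_existing == 0);
		qsort(updates, nr, sizeof(*updates), cmp_update);
		for (size_t i = 0; i < nr; i++) {
			fprintf(packed, "%s %s\n", updates[i].oid_hex,
				updates[i].refname);
			st->packed_writes++;
		}
		return 0;
	}

	for (size_t i = 0; i < nr; i++) {
		for (size_t j = 0; j < nr_existing; j++) {
			st->collision_checks++;
			if (df_conflict(updates[i].refname, existing[j]))
				return -1; /* abort before writing anything */
		}
	}
	st->loose_writes += nr;
	return 0;
}
```

In this model the regular path costs one comparison per (update, existing ref) pair plus one file per ref, while the initial path costs a sort and a single sequential write. The toy does not check updates against each other in either mode.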

This leads to some significant speedups. Migrating 1 million refs from
"files" to "reftable":

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = origin/master)
      Time (mean ± σ):      4.580 s ±  0.062 s    [User: 1.818 s, System: 2.746 s]
      Range (min … max):    4.534 s …  4.743 s    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = pks-refs-optimize-migrations)
      Time (mean ± σ):     767.7 ms ±   9.5 ms    [User: 629.2 ms, System: 126.1 ms]
      Range (min … max):   755.8 ms … 786.9 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = pks-refs-optimize-migrations) ran
        5.97 ± 0.11 times faster than migrate files:reftable (refcount = 1000000, revision = origin/master)

And migrating from "reftable" to "files":

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = origin/master)
      Time (mean ± σ):     35.409 s ±  0.302 s    [User: 5.061 s, System: 29.244 s]
      Range (min … max):   35.055 s … 35.898 s    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = pks-refs-optimize-migrations)
      Time (mean ± σ):     855.9 ms ±  61.5 ms    [User: 646.7 ms, System: 187.1 ms]
      Range (min … max):   830.0 ms … 1030.3 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = pks-refs-optimize-migrations) ran
       41.37 ± 2.99 times faster than migrate reftable:files (refcount = 1000000, revision = origin/master)
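The summary lines can be checked by hand. The speedup is the ratio of the two mean times, and the printed ± values agree, to the precision shown, with adding the two relative standard deviations in quadrature; we assume that is how the benchmark tool derives them. The sketch below does that arithmetic; `struct bench`, `speedup()`, and `newton_sqrt()` are hypothetical helpers written for this illustration, with both times expressed in seconds.

```c
#include <assert.h>

/* Mean and standard deviation of one benchmark, in the same unit. */
struct bench {
	double mean;
	double stddev;
};

/* Newton's method square root so the sketch needs no libm;
 * real code would simply call sqrt() from <math.h>. */
static double newton_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/*
 * Returns how many times faster `fast` ran than `slow` and stores the
 * uncertainty of that ratio in *sigma, combining the two relative
 * standard deviations in quadrature.
 */
static double speedup(struct bench slow, struct bench fast, double *sigma)
{
	double ratio, rel_slow, rel_fast;

	assert(slow.mean > 0.0 && fast.mean > 0.0);
	ratio = slow.mean / fast.mean;
	rel_slow = slow.stddev / slow.mean;
	rel_fast = fast.stddev / fast.mean;
	*sigma = ratio * newton_sqrt(rel_slow * rel_slow + rel_fast * rel_fast);
	return ratio;
}
```

For files to reftable, `speedup((struct bench){ 4.580, 0.062 }, (struct bench){ 0.7677, 0.0095 }, &sigma)` gives roughly 5.97 with sigma roughly 0.11; for reftable to files, 35.409 s ± 0.302 s against 0.8559 s ± 0.0615 s gives roughly 41.37 ± 2.99.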

Thanks!

Patrick

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Patrick Steinhardt (10):
      refs: allow passing flags when setting up a transaction
      refs/files: move logic to commit initial transaction
      refs: introduce "initial" transaction flag
      refs/files: support symbolic and root refs in initial transaction
      refs: use "initial" transaction semantics to migrate refs
      refs: skip collision checks in initial transactions
      refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
      reftable/writer: optimize allocations by using a scratch buffer
      reftable/block: rename `block_writer::buf` variable
      reftable/block: optimize allocations by using scratch buffer

 branch.c                  |   2 +-
 builtin/clone.c           |   4 +-
 builtin/fast-import.c     |   4 +-
 builtin/fetch.c           |   4 +-
 builtin/receive-pack.c    |   4 +-
 builtin/replace.c         |   2 +-
 builtin/tag.c             |   2 +-
 builtin/update-ref.c      |   4 +-
 refs.c                    |  70 ++++++-------
 refs.h                    |  45 +++++----
 refs/debug.c              |  13 ---
 refs/files-backend.c      | 244 +++++++++++++++++++++++++---------------------
 refs/packed-backend.c     |   8 --
 refs/refs-internal.h      |   2 +-
 refs/reftable-backend.c   |  14 +--
 reftable/block.c          |  33 +++----
 reftable/block.h          |   9 +-
 reftable/writer.c         |  23 +++--
 reftable/writer.h         |   1 +
 sequencer.c               |   6 +-
 t/helper/test-ref-store.c |   2 +-
 t/t1460-refs-migrate.sh   |   2 +-
 walker.c                  |   2 +-
 23 files changed, 247 insertions(+), 253 deletions(-)
---
base-commit: facbe4f633e4ad31e641f64617bc88074c659959
change-id: 20241108-pks-refs-optimize-migrations-6d0ceee4abb7

Best regards,

Comments

karthik nayak Nov. 11, 2024, 10:57 a.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

I read through the series; apart from a few small nits, the patches
look good and straightforward.

Thanks

Patrick Steinhardt Nov. 11, 2024, 12:53 p.m. UTC | #2
On Mon, Nov 11, 2024 at 05:57:43AM -0500, karthik nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> I read through the series; apart from a few small nits, the patches
> look good and straightforward.

I've queued the single change to the first commit message locally, but I
don't think that alone is sufficient reason to reroll the patch series
yet, so I'll wait for additional feedback.

Thanks for your review!

Patrick