mbox series

[v2,0/3] refs: sync loose refs to disk before committing them

Message ID cover.1636544377.git.ps@pks.im (mailing list archive)
Headers show
Series refs: sync loose refs to disk before committing them | expand

Message

Patrick Steinhardt Nov. 10, 2021, 11:40 a.m. UTC
Hi,

[This is a resend, I initially used the wrong mailing list address.]

this is the second version of this patch series to implement syncing of
loose refs to avoid a class of ref corruption.

This series is implements the same batching as Neeraj's series [1]. It
introduces a new config switch "core.fsyncRefFiles" with the same three
toggles "on", "off" and "batch". Given that it reuses funcitonality
provided by the batched object flushing this series is based on "next".

Please note that I didn't yet add any performance numbers or tests.
Performance tests didn't show any conclusive results on my machine given
that I couldn't observe any noticeable impact at all, and I didn't write
tests yet given that I first wanted to get some feedback on this series.
If we agree that this is the correct way to go forward I'll of course
put in some more time to add them.

Patrick

[1]: <pull.1076.v8.git.git.1633366667.gitgitgadget@gmail.com>

Patrick Steinhardt (3):
  wrapper: handle EINTR in `git_fsync()`
  wrapper: provide function to sync directories
  refs: add configuration to enable flushing of refs

 Documentation/config/core.txt |  7 ++++
 bulk-checkin.c                | 13 ++------
 cache.h                       |  7 ++++
 config.c                      | 10 ++++++
 environment.c                 |  1 +
 git-compat-util.h             |  7 ++++
 refs/files-backend.c          | 62 ++++++++++++++++++++++++++++++++++-
 wrapper.c                     | 30 ++++++++++++++++-
 write-or-die.c                |  6 ++--
 9 files changed, 127 insertions(+), 16 deletions(-)

Comments

Johannes Schindelin Nov. 10, 2021, 2:44 p.m. UTC | #1
Hi Patrick,

On Wed, 10 Nov 2021, Patrick Steinhardt wrote:

> This series is implements the same batching as Neeraj's series [1]. It
> introduces a new config switch "core.fsyncRefFiles" with the same three
> toggles "on", "off" and "batch". Given that it reuses funcitonality
> provided by the batched object flushing this series is based on "next".

Excellent, I am glad that my idea was helpful.

I read through the patches and found them easy to follow. Even 3/3, which
was a bit complex because of the need to support both transacted ref
updates as well as atomic renames/copies, is clear and well-written.

FWIW you could base your patches directly on `ns/batched-fsync`, I would
think.

Thank you,
Dscho
Jeff King Nov. 10, 2021, 8:45 p.m. UTC | #2
On Wed, Nov 10, 2021 at 12:40:50PM +0100, Patrick Steinhardt wrote:

> Please note that I didn't yet add any performance numbers or tests.
> Performance tests didn't show any conclusive results on my machine given
> that I couldn't observe any noticeable impact at all, and I didn't write
> tests yet given that I first wanted to get some feedback on this series.
> If we agree that this is the correct way to go forward I'll of course
> put in some more time to add them.

Here's a mild adjustment of the quick test I showed earlier in [1]. It
uses your version for all cases, but swaps between the three config
options, and it uses --atomic to put everything into a single
transaction:

  $ hyperfine -L v false,true,batch -p ./setup 'git -C dst.git config core.fsyncreffiles {v}; git.compile push --atomic dst.git refs/heads/*'
  Benchmark 1: git -C dst.git config core.fsyncreffiles false; git.compile push --atomic dst.git refs/heads/*
    Time (mean ± σ):      10.5 ms ±   0.2 ms    [User: 7.8 ms, System: 3.9 ms]
    Range (min … max):    10.1 ms …  11.0 ms    108 runs
   
  Benchmark 2: git -C dst.git config core.fsyncreffiles true; git.compile push --atomic dst.git refs/heads/*
    Time (mean ± σ):     402.5 ms ±   6.2 ms    [User: 13.9 ms, System: 10.7 ms]
    Range (min … max):   387.3 ms … 408.9 ms    10 runs
   
  Benchmark 3: git -C dst.git config core.fsyncreffiles batch; git.compile push --atomic dst.git refs/heads/*
    Time (mean ± σ):      21.4 ms ±   0.6 ms    [User: 8.0 ms, System: 5.2 ms]
    Range (min … max):    20.7 ms …  24.9 ms    99 runs
   
  Summary
    'git -C dst.git config core.fsyncreffiles false; git.compile push --atomic dst.git refs/heads/*' ran
      2.05 ± 0.07 times faster than 'git -C dst.git config core.fsyncreffiles batch; git.compile push --atomic dst.git refs/heads/*'
     38.51 ± 1.06 times faster than 'git -C dst.git config core.fsyncreffiles true; git.compile push --atomic dst.git refs/heads/*'

So yes, the "batch" case is much improved. It's still more expensive
than skipping the fsync entirely, but it's within reason. The trick,
though is the --atomic. If I drop that flag, then "batch" takes as long
as "true". And of course that's the default.

I wonder if that means we'd want an extra mode: do the WRITEOUT_ONLY
flush, but _don't_ do a HARDWARE_FLUSH for each transaction. That
doesn't give you durability, but it (in theory) solves the out-of-order
journal problem without paying any performance cost.

I say "in theory" because I'm just assuming that the WRITEOUT thing
works as advertised. I don't have an easy way to test the outcome on a
power failure at the right moment here.

-Peff

[1] https://lore.kernel.org/git/YYTaiIlEKxHRVdCy@coredump.intra.peff.net/
Patrick Steinhardt Nov. 11, 2021, 11:47 a.m. UTC | #3
On Wed, Nov 10, 2021 at 03:45:11PM -0500, Jeff King wrote:
> On Wed, Nov 10, 2021 at 12:40:50PM +0100, Patrick Steinhardt wrote:
> 
> > Please note that I didn't yet add any performance numbers or tests.
> > Performance tests didn't show any conclusive results on my machine given
> > that I couldn't observe any noticeable impact at all, and I didn't write
> > tests yet given that I first wanted to get some feedback on this series.
> > If we agree that this is the correct way to go forward I'll of course
> > put in some more time to add them.
> 
> Here's a mild adjustment of the quick test I showed earlier in [1]. It
> uses your version for all cases, but swaps between the three config
> options, and it uses --atomic to put everything into a single
> transaction:
> 
>   $ hyperfine -L v false,true,batch -p ./setup 'git -C dst.git config core.fsyncreffiles {v}; git.compile push --atomic dst.git refs/heads/*'
>   Benchmark 1: git -C dst.git config core.fsyncreffiles false; git.compile push --atomic dst.git refs/heads/*
>     Time (mean ± σ):      10.5 ms ±   0.2 ms    [User: 7.8 ms, System: 3.9 ms]
>     Range (min … max):    10.1 ms …  11.0 ms    108 runs
>    
>   Benchmark 2: git -C dst.git config core.fsyncreffiles true; git.compile push --atomic dst.git refs/heads/*
>     Time (mean ± σ):     402.5 ms ±   6.2 ms    [User: 13.9 ms, System: 10.7 ms]
>     Range (min … max):   387.3 ms … 408.9 ms    10 runs
>    
>   Benchmark 3: git -C dst.git config core.fsyncreffiles batch; git.compile push --atomic dst.git refs/heads/*
>     Time (mean ± σ):      21.4 ms ±   0.6 ms    [User: 8.0 ms, System: 5.2 ms]
>     Range (min … max):    20.7 ms …  24.9 ms    99 runs
>    
>   Summary
>     'git -C dst.git config core.fsyncreffiles false; git.compile push --atomic dst.git refs/heads/*' ran
>       2.05 ± 0.07 times faster than 'git -C dst.git config core.fsyncreffiles batch; git.compile push --atomic dst.git refs/heads/*'
>      38.51 ± 1.06 times faster than 'git -C dst.git config core.fsyncreffiles true; git.compile push --atomic dst.git refs/heads/*'
> 
> So yes, the "batch" case is much improved. It's still more expensive
> than skipping the fsync entirely, but it's within reason. The trick,
> though is the --atomic. If I drop that flag, then "batch" takes as long
> as "true". And of course that's the default.

Great, thanks for doing these tests.

> I wonder if that means we'd want an extra mode: do the WRITEOUT_ONLY
> flush, but _don't_ do a HARDWARE_FLUSH for each transaction. That
> doesn't give you durability, but it (in theory) solves the out-of-order
> journal problem without paying any performance cost.
> 
> I say "in theory" because I'm just assuming that the WRITEOUT thing
> works as advertised. I don't have an easy way to test the outcome on a
> power failure at the right moment here.
> 
> -Peff
> 
> [1] https://lore.kernel.org/git/YYTaiIlEKxHRVdCy@coredump.intra.peff.net/

Yeah, it's also one of the things I'm currently wondering about. I just
piggy-backed on the preexisting patch series which claim that it at
least ought to work on macOS and Windows, but left out of the equation
were Linux filesystems. I'm hoping that we'll get an answer to this
question via <211111.86pmr7pk9o.gmgdl@evledraar.gmail.com>, where Ævar
put Linus and Christopher Hellwig on Cc.

In any case, if the outcome of a WRITEOUT_ONLY flush would be that we
may not see all renames, but that every refs' contents would be
well-defined, then this is a tradeoff we'd probably want to make at
GitLab.

Patrick