
[0/2,RFC] Implement a bulk-checkin option for core.fsyncObjectFiles

Message ID pull.1076.git.git.1629856292.gitgitgadget@gmail.com (mailing list archive)

Message

Neeraj K. Singh via GitGitGadget Aug. 25, 2021, 1:51 a.m. UTC
Git for Windows has had fsyncing of object files enabled since "409cae91eb
(mingw: change core.fsyncObjectFiles = 1 by default, 2017-09-04)".

There have been requests to make core.fsyncObjectFiles the default
everywhere, but there are concerns about its performance cost (perf results
below). There's a long and gory thread here:
https://lore.kernel.org/git/87a7xcw8sa.fsf@linux-m68k.org/t/.

My change introduces a new 'core.fsyncObjectFiles = 2' setting, which
batches the data-integrity FLUSH command sent to the disk across multiple
loose object files added to the object database.

We take advantage of the bulk-checkin hooks already in the add command and
add similar hooks to the update-index command (which is used internally by
stash). Details are in the last patch of the series.
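
As a rough illustration of the batching idea (not the patch's actual C
implementation, whose details are in the last patch of the series), the
pattern can be sketched in shell: write every file without flushing, issue
one flush for the whole batch, and only then move the files to their final
names. The function name `batched_write`, the `obj<N>` file names, and the
use of plain `sync` as the flush are all assumptions made for this sketch.

```shell
#!/bin/sh
# Illustrative sketch only: batched_write and the obj<N> names are
# hypothetical and do not come from the patch series.

# batched_write DEST CONTENT...
# Writes each CONTENT to its own file, flushes once, then publishes.
batched_write() {
    dest=$1
    shift
    [ "$#" -gt 0 ] || return 0

    # Stage new files in a scratch directory on the same filesystem,
    # so the later rename does not have to copy data.
    staging=$(mktemp -d "$dest/tmp_batch.XXXXXX") || return 1

    # 1. Write every file without flushing it individually.
    i=0
    for content in "$@"; do
        i=$((i + 1))
        printf '%s\n' "$content" > "$staging/obj$i" || return 1
    done

    # 2. Issue one flush for the whole batch. Plain sync is system-wide
    #    and much coarser than a targeted per-batch flush.
    sync

    # 3. Only after the flush, move the files to their final names.
    for f in "$staging"/obj*; do
        mv "$f" "$dest/" || return 1
    done
    rmdir "$staging"
}
```

Calling `batched_write "$dir" first second third` pays for one flush instead
of three, which is the cost the batched setting is meant to amortize.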

Here's a simple performance test script:

    #!/bin/sh
    git clone https://github.com/nodejs/node.git node-repo-cache
    git clone node-repo-cache node-repo
    cd node-repo
    git --version
    
    find . -name "*.c" -exec sh -c 'echo foo1 >> "$1"' -- {} \;
    echo "----GIT stash fsync"
    time git -c core.fsyncObjectFiles=true stash push
    
    find . -name "*.c" -exec sh -c 'echo foo2 >> "$1"' -- {} \;
    echo "----GIT stash fsync_defer"
    time git -c core.fsyncObjectFiles=2 stash push
    
    find . -name "*.c" -exec sh -c 'echo foo3 >> "$1"' -- {} \;
    echo "----GIT stash no_fsync"
    time git -c core.fsyncObjectFiles=false stash push
    
    cd ..
    rm -r -f node-repo


Hardware:

 * Mac - Mac Mini 2018 running macOS 11.5.1, APFS with a 1TB Apple NVMe SSD.
 * Linux - Ubuntu 20.04 - ext4 running on a Hyper-V VM with a fixed VHDX
   backed by a Samsung PM981.
 * Win - Windows NTFS - Same Hyper-V host as Linux.

Operation       | Mac     | Linux | Windows
----------------|---------|-------|----------
git fsync       | 40.6 s  | 7.8 s | 6.9 s
git fsync_defer | 6.5 s   | 2.1 s | 3.8 s
git no_fsync    | 1.7 s   | 1.0 s | 2.6 s

The Windows version of git is slightly different:
https://github.com/git-for-windows/git/pull/3391. I also used a
Windows-specific test script.

I hope I'm CC'ing a reasonable set of people on this patch, based on the
last discussion.

Thanks,
Neeraj Singh
Windows Core File Systems

Neeraj Singh (2):
  object-file: use futimes rather than utime
  core.fsyncobjectfiles: batch disk flushes

 Documentation/config/core.txt |  17 ++++--
 Makefile                      |   4 ++
 builtin/add.c                 |   3 +-
 builtin/update-index.c        |   3 +
 bulk-checkin.c                | 105 +++++++++++++++++++++++++++++++---
 bulk-checkin.h                |   4 +-
 compat/mingw.c                |  42 +++++++++-----
 compat/mingw.h                |   2 +
 config.c                      |   4 +-
 config.mak.uname              |   2 +
 configure.ac                  |   8 +++
 git-compat-util.h             |   7 +++
 object-file.c                 |  23 ++------
 wrapper.c                     |  36 ++++++++++++
 write-or-die.c                |   2 +-
 15 files changed, 213 insertions(+), 49 deletions(-)


base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v1
Pull-Request: https://github.com/git/git/pull/1076

Comments

Neeraj Singh Aug. 25, 2021, 4:58 p.m. UTC | #1
On Tue, Aug 24, 2021 at 6:51 PM Neeraj K. Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Hardware:
>
>  * Mac - Mac Mini 2018 running MacOS 11.5.1, APFS with a 1TB Apple NMVE SSD,
>  * Linux - Ubuntu 20.04 - ext4 running on a Hyper-V VM with a fixed VHDX
>    backed by a Samsung PM981.
>  * Win - Windows NTFS - Same Hyper-V host as Linux. Operation | Mac | Linux
>    | Windows
>
> ---------------- |---------|-------|---------- git fsync | 40.6 s | 7.8 s |
> 6.9s git fsync_defer | 6.5 s | 2.1 s | 3.8s git no_fsync | 1.7 s | 1.0 s |
> 2.6s
>
I just wanted to fix this performance test table so that it is readable.
Operation       | Mac     | Linux | Windows
----------------|---------|-------|----------
git fsync       | 40.6 s  | 7.8 s | 6.9 s
git fsync_defer | 6.5 s   | 2.1 s | 3.8 s
git no_fsync    | 1.7 s   | 1.0 s | 2.6 s

Here's the graphical version:
https://docs.google.com/spreadsheets/d/18HWXSUVAVqqKATsuVvgxDF6ftX_5qG1UGgNjGtwOuu8/edit?usp=sharing
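
For a quick numeric reading of the table above, the relative costs can be
computed directly from the posted timings. The helper name `ratio` is
introduced here only for illustration; the numbers are the ones reported in
the table.

```shell
#!/bin/sh
# ratio SLOW FAST: print how many times longer SLOW took than FAST,
# rounded to one decimal place.
ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# Column 2: per-file fsync time divided by batched (fsync_defer) time.
# Column 3: batched (fsync_defer) time divided by no_fsync time.
printf '%-8s %-18s %s\n' "Platform" "fsync/fsync_defer" "fsync_defer/no_fsync"
printf '%-8s %-18s %s\n' "Mac"     "$(ratio 40.6 6.5)" "$(ratio 6.5 1.7)"
printf '%-8s %-18s %s\n' "Linux"   "$(ratio 7.8 2.1)"  "$(ratio 2.1 1.0)"
printf '%-8s %-18s %s\n' "Windows" "$(ratio 6.9 3.8)"  "$(ratio 3.8 2.6)"
```

On these timings the batched mode is about 6.2x, 3.7x and 1.8x faster than
per-file fsync on Mac, Linux and Windows respectively, while still costing
about 3.8x, 2.1x and 1.5x the time of running with no fsync at all.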