[0/7] core.fsyncmethod: add 'batch' mode for faster fsyncing of multiple objects

Message ID pull.1134.git.1647379859.gitgitgadget@gmail.com (mailing list archive)

Neeraj Singh via GitGitGadget March 15, 2022, 9:30 p.m. UTC
When core.fsync includes loose-object, we issue an fsync after every written
object. For 'git add' or a similar command that adds a lot of files to the
repo, the cost of these fsyncs adds up. One major factor in this cost is
the time it takes for the physical storage controller to flush its caches to
durable media.

This series takes advantage of the writeout-only mode of git_fsync to issue
OS cache writebacks for all of the objects being added to the repository
followed by a single fsync to a dummy file, which should trigger a
filesystem log flush and storage controller cache flush. This mechanism is
known to be safe on common Windows filesystems and expected to be safe on
macOS. Some Linux filesystems, such as XFS, will probably do the right thing
as well. See [1] for previous discussion on the predecessor of this patch
series.
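
To make the idea concrete, here is a minimal standalone sketch of the pattern
described above. It is not the code from this series: writeout_only() and
batch_write() are hypothetical names, the use of sync_file_range() on Linux
with a plain fsync() fallback elsewhere is an assumption, and whether the
final fsync really makes the earlier files durable depends on the filesystem,
as noted above.

```c
#define _GNU_SOURCE /* for sync_file_range() on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: start writeback of fd's dirty pages and wait for
 * it, without asking the storage device to flush its own cache.
 */
static int writeout_only(int fd)
{
#ifdef __linux__
	if (!sync_file_range(fd, 0, 0,
			     SYNC_FILE_RANGE_WAIT_BEFORE |
			     SYNC_FILE_RANGE_WRITE |
			     SYNC_FILE_RANGE_WAIT_AFTER))
		return 0;
#endif
	/* Assumption: with no writeout-only call available, use fsync(). */
	return fsync(fd);
}

/*
 * Write `count` files named <dir>/<prefix>-<i>, issuing only a writeout
 * for each, then fsync one dummy file to request a single hard flush.
 * Returns 0 on success, -1 on error.
 */
static int batch_write(const char *dir, const char *prefix, int count)
{
	static const char payload[] = "example object contents\n";
	const ssize_t len = (ssize_t)(sizeof(payload) - 1);
	char path[4096];
	int i, fd, ret;

	assert(dir && prefix && count >= 0);

	for (i = 0; i < count; i++) {
		snprintf(path, sizeof(path), "%s/%s-%d", dir, prefix, i);
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0)
			return -1;
		ret = write(fd, payload, len) != len ? -1 : writeout_only(fd);
		if (close(fd) < 0 || ret < 0)
			return -1;
	}

	/* A single fsync on a dummy file replaces `count` separate fsyncs. */
	snprintf(path, sizeof(path), "%s/%s.sync", dir, prefix);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	if (close(fd) < 0)
		ret = -1;
	unlink(path); /* the dummy file is only a flush trigger */
	return ret < 0 ? -1 : 0;
}
```

Calling batch_write("/tmp", "obj", 1000) would issue 1000 writeouts but only
one fsync, which is where the savings are expected to come from.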

This series is important on Windows, where loose objects are included in the
fsync set by default in Git for Windows. In this series, I'm also setting
the default mode for Windows to turn on loose object fsyncing with batch
mode, so that we can get CI coverage of the actual Git for Windows
configuration upstream. We still don't actually issue fsyncs for the test
suite since GIT_TEST_FSYNC is set to 0, but we exercise all of the
surrounding batch mode code.

This work is based on 'seen' at 367f447f0f0cf39e9830c865e8373e42a3c45303.
It's dependent on ns/core-fsyncmethod.

[1]
https://lore.kernel.org/git/2c1ddef6057157d85da74a7274e03eacf0374e45.1629856293.git.gitgitgadget@gmail.com/

Neeraj Singh (7):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncmethod: batched disk flushes for loose-objects
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsync: use batch mode and sync loose objects by default on
    Windows
  core.fsyncmethod: tests for batch mode
  core.fsyncmethod: performance tests for add and stash

 Documentation/config/core.txt |  5 ++
 builtin/unpack-objects.c      |  3 ++
 builtin/update-index.c        |  6 +++
 bulk-checkin.c                | 89 +++++++++++++++++++++++++++++++----
 bulk-checkin.h                |  2 +
 cache.h                       | 12 ++++-
 compat/mingw.h                |  3 ++
 config.c                      |  4 +-
 git-compat-util.h             |  2 +
 object-file.c                 |  2 +
 t/lib-unique-files.sh         | 36 ++++++++++++++
 t/perf/p3700-add.sh           | 59 +++++++++++++++++++++++
 t/perf/p3900-stash.sh         | 62 ++++++++++++++++++++++++
 t/perf/perf-lib.sh            |  4 +-
 t/t3700-add.sh                | 22 +++++++++
 t/t3903-stash.sh              | 17 +++++++
 t/t5300-pack-object.sh        | 32 ++++++++-----
 17 files changed, 335 insertions(+), 25 deletions(-)
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 367f447f0f0cf39e9830c865e8373e42a3c45303
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1134%2Fneerajsi-msft%2Fns%2Fbatched-fsync-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1134/neerajsi-msft/ns/batched-fsync-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1134