[v6,0/8] Implement a batched fsync option for core.fsyncObjectFiles

Message ID: pull.1076.v6.git.git.1632527609.gitgitgadget@gmail.com

Neeraj Singh via GitGitGadget, Sept. 24, 2021, 11:53 p.m. UTC
Thanks to everyone for the review so far!

v5 was a bit of a dud, with some issues that I only noticed after
submitting. v6 changes:

 * re-add Windows support
 * fix minor formatting issues
 * reset git author and commit dates which got messed up

Changes since v4, all in response to review feedback from Ævar Arnfjörð
Bjarmason:

 * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
   to add a statement about not fsyncing parent directories.
   
   * I still don't want to make any promises on behalf of the Linux FS developers
     in the documentation. However, according to [v4.1] and my understanding
     of how XFS journals are documented to work, it looks like recent versions
     of Linux running on XFS should be as safe as Windows or macOS in 'batch'
     mode. I don't know about ext4, since it's not clear to me when metadata
     updates are made visible to the journal.
   

 * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
   pointed out, this lets us access the added loose objects immediately,
   rather than only after unplugging the bulk checkin. This is a hard
   requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
   
   * As a preparatory patch, the object-file code no longer renames an object
     when it is writing into a tmp objdir (as determined by the quarantine
     environment variable).
   
   * I added support to the tmp-objdir lib to replace the 'main' writable odb.
   
   * Instead of using a lockfile for the final full fsync, we now use a new dummy
     temp file. Doing that makes the below unpack-objects change easier.
   

 * Add bulk-checkin support to unpack-objects, which is used in fetch and
   push. In addition to making those operations faster, it allows us to
   directly compare performance of packfiles against loose objects. Please
   see [v4.2] for a measurement of 'git push' to a local upstream with
   different numbers of unique new files.

 * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.

 * Remove comment with link to NtFlushBuffersFileEx documentation.

 * Make t/lib-unique-files.sh a bit cleaner. We are still creating unique
   contents, but now this uses test_tick, so it should be deterministic from
   run to run.

 * Ensure there are tests for all of the modified commands. Make the
   unpack-objects tests validate that the unpacked objects are really
   available in the ODB.

References for v4:

[v4.1] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/#t

[v4.2] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117

Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
   accept no value to mean "true" and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for Windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.
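
The parsing rule and the switch-based dispatch from the v3 items above can be
sketched as follows. This is an illustration only: the enum, the function
names, and the simplified set of boolean spellings are assumptions, not the
code in config.c or wrapper.c. In particular, the sketch returns an "invalid"
value for unrecognized input, whereas the real config machinery may handle
that differently.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical names; the series defines its own enum. */
enum fsync_mode_sketch {
	FSYNC_SKETCH_FALSE,
	FSYNC_SKETCH_TRUE,
	FSYNC_SKETCH_BATCH,
	FSYNC_SKETCH_INVALID
};

enum fsync_mode_sketch parse_fsync_mode_sketch(const char *value)
{
	/* A key given with no value means "true". */
	if (!value)
		return FSYNC_SKETCH_TRUE;

	/* 'batch' must be lowercase, so this comparison is case-sensitive. */
	if (!strcmp(value, "batch"))
		return FSYNC_SKETCH_BATCH;

	/* Simplified boolean spellings, compared case-insensitively. */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return FSYNC_SKETCH_TRUE;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return FSYNC_SKETCH_FALSE;

	return FSYNC_SKETCH_INVALID;
}

/* Dispatch on the mode with a switch, one case per mode. */
const char *describe_fsync_mode_sketch(enum fsync_mode_sketch mode)
{
	switch (mode) {
	case FSYNC_SKETCH_FALSE:
		return "no flush per object";
	case FSYNC_SKETCH_TRUE:
		return "flush each object";
	case FSYNC_SKETCH_BATCH:
		return "one flush per batch";
	case FSYNC_SKETCH_INVALID:
		break;
	}
	return "invalid";
}
```

With this sketch, a NULL value and "TRUE" both map to the 'true' mode, "batch"
maps to the batch mode, and "Batch" is rejected because of the lowercase
requirement.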

Neeraj Singh (8):
  object-file.c: do not rename in a temp odb
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       |  29 +++++--
 Makefile                            |   6 ++
 builtin/unpack-objects.c            |   3 +
 builtin/update-index.c              |   6 ++
 bulk-checkin.c                      |  92 +++++++++++++++++++---
 bulk-checkin.h                      |   2 +
 cache.h                             |   8 +-
 compat/mingw.h                      |   3 +
 compat/win32/flush.c                |  28 +++++++
 config.c                            |   7 +-
 config.mak.uname                    |   3 +
 configure.ac                        |   8 ++
 contrib/buildsystems/CMakeLists.txt |   3 +-
 environment.c                       |   6 +-
 git-compat-util.h                   |   7 ++
 object-file.c                       | 118 ++++++++++++++++++++++++----
 object-store.h                      |  22 ++++++
 object.c                            |   2 +-
 repository.c                        |   2 +
 repository.h                        |   1 +
 t/lib-unique-files.sh               |  36 +++++++++
 t/perf/p3700-add.sh                 |  43 ++++++++++
 t/perf/p3900-stash.sh               |  46 +++++++++++
 t/t3700-add.sh                      |  20 +++++
 t/t3903-stash.sh                    |  14 ++++
 t/t5300-pack-object.sh              |  30 ++++---
 tmp-objdir.c                        |  20 ++++-
 tmp-objdir.h                        |   6 ++
 wrapper.c                           |  48 +++++++++++
 write-or-die.c                      |   2 +-
 30 files changed, 570 insertions(+), 51 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v6
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v5:

 1:  95315f35a28 = 1:  e4081f81f6a object-file.c: do not rename in a temp odb
 2:  df6fab94d67 = 2:  ebba65e040c bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 3:  fe19cdfc930 ! 3:  543ea356934 core.fsyncobjectfiles: batched disk flushes
     @@ Makefile: ifdef HAVE_CLOCK_MONOTONIC
       	EXTLIBS += -lrt
       endif
      
     - ## builtin/add.c ##
     -@@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
     - 
     - 	if (chmod_arg && pathspec.nr)
     - 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
     -+
     - 	unplug_bulk_checkin();
     - 
     - finish:
     -
       ## bulk-checkin.c ##
      @@
        */
 -:  ----------- > 4:  bdb99822f8c core.fsyncobjectfiles: add windows support for batch mode
 4:  485b4a767df ! 5:  92e18cedab0 update-index: use the bulk-checkin infrastructure
     @@ Commit message
      
          This change enables bulk-checkin for update-index infrastructure to
          speed up adding new objects to the object database by leveraging the
     -    pack functionality and the new bulk-fsync functionality. This mode
     -    is enabled when passing paths to update-index via the --stdin flag,
     -    as is done by 'git stash'.
     +    pack functionality and the new bulk-fsync functionality.
      
          There is some risk with this change, since under batch fsync, the object
          files will not be available until the update-index is entirely complete.
 5:  889e7668760 = 6:  e3c5a11f225 unpack-objects: use the bulk-checkin infrastructure
 6:  0f2e3b25759 = 7:  385199354fa core.fsyncobjectfiles: tests for batch mode
 7:  6543564376a = 8:  504bcc95c56 core.fsyncobjectfiles: performance tests for add and stash