mbox series

[v4,0/6] Implement a batched fsync option for core.fsyncObjectFiles

Message ID pull.1076.v4.git.git.1632176111.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Implement a batched fsync option for core.fsyncObjectFiles | expand

Message

Philippe Blain via GitGitGadget Sept. 20, 2021, 10:15 p.m. UTC
Thanks to everyone for review so far! Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
   accept no value to mean "true" and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.

Changes since v2:

 * Removed an unused Makefile define (FSYNC_DOESNT_FLUSH) that slipped in
   from an intermediate change.

 * Drop the futimens part of the patch and return to just calling utime, now
   within the new bulk_checkin code. The utime to futimens change seemed to
   be problematic for some platforms (thanks Randall Becker), and is really
   orthogonal to the rest of the patch series.

 * (Optional commit) Enable batch mode by default so that we can shake loose
   any issues relating to deferring the renames until the
   unplug_bulk_checkin.

Changes since v1:

 * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
   to dscho's suggestion, I'm still implementing the Windows version in the
   same patch and I'm not doing autoconf detection since this is a POSIX
   function.

 * Introduce a separate preparatory patch to the bulk-checkin infrastructure
   to separate the 'plugged' variable and rename the 'state' variable, as
   suggested by dscho.

 * Add performance numbers to the commit message of the main bulk fsync
   patch, as suggested by dscho.

 * Add a comment about the non-thread-safety of the bulk-checkin
   infrastructure, as suggested by avarab.

 * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
   by dscho and avarab and others.

 * Add more details to Documentation/config/core.txt about the various
   settings and their intended effects, as suggested by avarab.

 * Switch to the string-list API to hold the rename state, as suggested by
   avarab.

 * Create a separate update-index patch to use bulk-checkin as suggested by
   dscho.

 * Add Windows support in the upstream git. This is done in a way that
   should not conflict with git-for-windows.

 * Add new performance tests that shows the delta based on fsync mode.

NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
on Linux, since sync_file_range does not provide data integrity guarantees.
There is currently no kernel interface suitable to achieve disk flush
batching as is, but he suggested that he might implement a 'syncfs' variant
on top of this patchset. This code is still useful on macOS and Windows, and
the config documentation makes that clear.

Neeraj Singh (6):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       |  26 +++++--
 Makefile                            |   6 ++
 builtin/add.c                       |   3 +-
 builtin/update-index.c              |   3 +
 bulk-checkin.c                      | 103 +++++++++++++++++++++++++---
 bulk-checkin.h                      |   5 +-
 cache.h                             |   8 ++-
 compat/mingw.h                      |   3 +
 compat/win32/flush.c                |  29 ++++++++
 config.c                            |   7 +-
 config.mak.uname                    |   3 +
 configure.ac                        |   8 +++
 contrib/buildsystems/CMakeLists.txt |   3 +-
 environment.c                       |   2 +-
 git-compat-util.h                   |   7 ++
 object-file.c                       |  22 +-----
 t/lib-unique-files.sh               |  34 +++++++++
 t/perf/p3700-add.sh                 |  43 ++++++++++++
 t/perf/p3900-stash.sh               |  46 +++++++++++++
 t/t3700-add.sh                      |  11 +++
 t/t3903-stash.sh                    |  14 ++++
 wrapper.c                           |  48 +++++++++++++
 write-or-die.c                      |   2 +-
 23 files changed, 392 insertions(+), 44 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v4
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v3:

 1:  d5893e28df1 = 1:  d5893e28df1 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 2:  f8b5b709e9e ! 2:  12cad737635 core.fsyncobjectfiles: batched disk flushes
     @@ config.c: static int git_default_core_config(const char *var, const char *value,
       
       	if (!strcmp(var, "core.fsyncobjectfiles")) {
      -		fsync_object_files = git_config_bool(var, value);
     -+		if (!value)
     -+			return config_error_nonbool(var);
     -+		if (!strcasecmp(value, "batch"))
     ++		if (value && !strcmp(value, "batch"))
      +			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
     ++		else if (git_config_bool(var, value))
     ++			fsync_object_files = FSYNC_OBJECT_FILES_ON;
      +		else
     -+			fsync_object_files = git_config_bool(var, value)
     -+				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
     ++			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
       		return 0;
       	}
       
     @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
       
      +int git_fsync(int fd, enum fsync_action action)
      +{
     -+	if (action == FSYNC_WRITEOUT_ONLY) {
     ++	switch (action) {
     ++	case FSYNC_WRITEOUT_ONLY:
     ++
      +#ifdef __APPLE__
      +		/*
     -+		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
     ++		 * on macOS, fsync just causes filesystem cache writeback but does not
      +		 * flush hardware caches.
      +		 */
      +		return fsync(fd);
     @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
      +
      +		errno = ENOSYS;
      +		return -1;
     -+	}
     ++
     ++	case FSYNC_HARDWARE_FLUSH:
      +
      +#ifdef __APPLE__
     -+	return fcntl(fd, F_FULLFSYNC);
     ++		return fcntl(fd, F_FULLFSYNC);
      +#else
     -+	return fsync(fd);
     ++		return fsync(fd);
      +#endif
     ++
     ++	default:
     ++		BUG("unexpected git_fsync(%d) call", action);
     ++	}
     ++
      +}
      +
       static int warn_if_unremovable(const char *op, const char *file, int rc)
 3:  815a862e229 ! 3:  a5b3e21b762 core.fsyncobjectfiles: add windows support for batch mode
     @@ wrapper.c: int git_fsync(int fd, enum fsync_action action)
      +
       		errno = ENOSYS;
       		return -1;
     - 	}
     + 
 4:  6b576038986 = 4:  f7f756f3932 update-index: use the bulk-checkin infrastructure
 -:  ----------- > 5:  afb0028e796 core.fsyncobjectfiles: tests for batch mode
 5:  b7ca3ba9302 ! 6:  3e6b80b5fa2 core.fsyncobjectfiles: performance tests for add and stash
     @@ Commit message
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     - ## t/perf/lib-unique-files.sh (new) ##
     -@@
     -+# Helper to create files with unique contents
     -+
     -+test_create_unique_files_base__=$(date -u)
     -+test_create_unique_files_counter__=0
     -+
     -+# Create multiple files with unique contents. Takes the number of
     -+# directories, the number of files in each directory, and the base
     -+# directory.
     -+#
     -+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
     -+#				    each in the current directory, all
     -+#				    with unique contents.
     -+
     -+test_create_unique_files() {
     -+	test "$#" -ne 3 && BUG "3 param"
     -+
     -+	local dirs=$1
     -+	local files=$2
     -+	local basedir=$3
     -+
     -+	for i in $(test_seq $dirs)
     -+	do
     -+		local dir=$basedir/dir$i
     -+
     -+		mkdir -p "$dir" > /dev/null
     -+		for j in $(test_seq $files)
     -+		do
     -+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
     -+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
     -+		done
     -+	done
     -+}
     -
       ## t/perf/p3700-add.sh (new) ##
      @@
      +#!/bin/sh
     @@ t/perf/p3700-add.sh (new)
      +
      +. ./perf-lib.sh
      +
     -+. $TEST_DIRECTORY/perf/lib-unique-files.sh
     ++. $TEST_DIRECTORY/lib-unique-files.sh
      +
      +test_perf_default_repo
      +test_checkout_worktree
     @@ t/perf/p3700-add.sh (new)
      +# We need to create the files each time we run the perf test, but
      +# we do not want to measure the cost of creating the files, so run
      +# the tet once.
     -+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
     ++if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
      +then
      +	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
      +	GIT_PERF_REPEAT_COUNT=1
     @@ t/perf/p3900-stash.sh (new)
      +
      +. ./perf-lib.sh
      +
     -+. $TEST_DIRECTORY/perf/lib-unique-files.sh
     ++. $TEST_DIRECTORY/lib-unique-files.sh
      +
      +test_perf_default_repo
      +test_checkout_worktree
     @@ t/perf/p3900-stash.sh (new)
      +# We need to create the files each time we run the perf test, but
      +# we do not want to measure the cost of creating the files, so run
      +# the tet once.
     -+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
     ++if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
      +then
      +	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
      +	GIT_PERF_REPEAT_COUNT=1
 6:  55a40fc8fd5 < -:  ----------- core.fsyncobjectfiles: enable batch mode for testing