mbox series

[v2,0/6] Implement a batched fsync option for core.fsyncObjectFiles

Message ID pull.1076.v2.git.git.1630108177.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Implement a batched fsync option for core.fsyncObjectFiles | expand

Message

Philippe Blain via GitGitGadget Aug. 27, 2021, 11:49 p.m. UTC
Thanks to everyone for review so far! I've responded to the previous
feedback and changed the patch series a bit.

Changes since v1:

 * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
   to dscho's suggestion, I'm still implementing the Windows version in the
   same patch and I'm not doing autoconf detection since this is a POSIX
   function.

 * Introduce a separate preparatory patch to the bulk-checkin infrastructure
   to separate the 'plugged' variable and rename the 'state' variable, as
   suggested by dscho.

 * Add performance numbers to the commit message of the main bulk fsync
   patch, as suggested by dscho.

 * Add a comment about the non-thread-safety of the bulk-checkin
   infrastructure, as suggested by avarab.

 * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
   by dscho and avarab and others.

 * Add more details to Documentation/config/core.txt about the various
   settings and their intended effects, as suggested by avarab.

 * Switch to the string-list API to hold the rename state, as suggested by
   avarab.

 * Create a separate update-index patch to use bulk-checkin as suggested by
   dscho.

 * Add Windows support in the upstream git. This is done in a way that
   should not conflict with git-for-windows.

 * Add new performance tests that shows the delta based on fsync mode.

NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
on Linux, since sync_file_range does not provide data integrity guarantees.
There is currently no kernel interface suitable to achieve disk flush
batching as is, but he suggested that he might implement a 'syncfs' variant
on top of this patchset. This code is still useful on macOS and Windows, and
the config documentation makes that clear.

Neeraj Singh (6):
  object-file: use futimens rather than utime
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       | 26 ++++++--
 Makefile                            |  6 ++
 builtin/add.c                       |  3 +-
 builtin/update-index.c              |  3 +
 bulk-checkin.c                      | 92 +++++++++++++++++++++++++----
 bulk-checkin.h                      |  4 +-
 cache.h                             |  8 ++-
 compat/mingw.c                      | 53 +++++++++++------
 compat/mingw.h                      |  5 ++
 compat/win32/flush.c                | 29 +++++++++
 config.c                            |  8 ++-
 config.mak.uname                    |  4 ++
 configure.ac                        |  8 +++
 contrib/buildsystems/CMakeLists.txt |  3 +-
 environment.c                       |  2 +-
 git-compat-util.h                   |  7 +++
 object-file.c                       | 23 ++------
 t/perf/lib-unique-files.sh          | 32 ++++++++++
 t/perf/p3700-add.sh                 | 43 ++++++++++++++
 t/perf/p3900-stash.sh               | 46 +++++++++++++++
 wrapper.c                           | 40 +++++++++++++
 write-or-die.c                      |  2 +-
 22 files changed, 389 insertions(+), 58 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/perf/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v2
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v1:

 1:  2c1ddef6057 ! 1:  fc3d5a7b635 object-file: use futimes rather than utime
     @@ Metadata
      Author: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Commit message ##
     -    object-file: use futimes rather than utime
     +    object-file: use futimens rather than utime
      
     -    Refactor the loose object file creation code and use the futimes(2) API
     -    rather than utime. This should be slightly faster given that we already
     -    have an FD to work with.
     +    Make close_loose_object do all of the steps for syncing and correctly
     +    naming a new loose object so that it can be reimplemented in the
     +    upcoming bulk-fsync mode.
     +
     +    Use futimens, which is available in POSIX.1-2008 to update the file
     +    timestamps. This should be slightly faster than utime, since we have
     +    a file descriptor already available. This change allows us to update
     +    the time before closing, renaming, and potentially fsyincing the file
     +    being refreshed. This code is currently only invoked by git-pack-objects
     +    via force_object_loose.
     +
     +    Implement a futimens shim for the Windows port of Git.
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
       ## compat/mingw.c ##
     +@@ compat/mingw.c: int mingw_chmod(const char *filename, int mode)
     +  * The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
     +  * Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
     +  */
     ++
     ++#define UNIX_EPOCH_FILETIME 116444736000000000LL
     ++
     + static inline long long filetime_to_hnsec(const FILETIME *ft)
     + {
     + 	long long winTime = ((long long)ft->dwHighDateTime << 32) + ft->dwLowDateTime;
     + 	/* Windows to Unix Epoch conversion */
     +-	return winTime - 116444736000000000LL;
     ++	return winTime - UNIX_EPOCH_FILETIME;
     + }
     + 
     + static inline void filetime_to_timespec(const FILETIME *ft, struct timespec *ts)
     +@@ compat/mingw.c: static inline void filetime_to_timespec(const FILETIME *ft, struct timespec *ts)
     + 	ts->tv_nsec = (hnsec % 10000000) * 100;
     + }
     + 
     ++static inline void timespec_to_filetime(const struct timespec *t, FILETIME *ft)
     ++{
     ++	long long winTime = t->tv_sec * 10000000LL + t->tv_nsec / 100 + UNIX_EPOCH_FILETIME;
     ++	ft->dwLowDateTime = winTime;
     ++	ft->dwHighDateTime = winTime >> 32;
     ++}
     ++
     + /**
     +  * Verifies that safe_create_leading_directories() would succeed.
     +  */
      @@ compat/mingw.c: int mingw_fstat(int fd, struct stat *buf)
       	}
       }
       
      -static inline void time_t_to_filetime(time_t t, FILETIME *ft)
     -+static inline void timeval_to_filetime(const struct timeval *t, FILETIME *ft)
     ++int mingw_futimens(int fd, const struct timespec times[2])
       {
      -	long long winTime = t * 10000000LL + 116444736000000000LL;
     -+	long long winTime = t->tv_sec * 10000000LL + t->tv_usec * 10 + 116444736000000000LL;
     - 	ft->dwLowDateTime = winTime;
     - 	ft->dwHighDateTime = winTime >> 32;
     - }
     - 
     --int mingw_utime (const char *file_name, const struct utimbuf *times)
     -+int mingw_futimes(int fd, const struct timeval times[2])
     - {
     - 	FILETIME mft, aft;
     +-	ft->dwLowDateTime = winTime;
     +-	ft->dwHighDateTime = winTime >> 32;
     ++	FILETIME mft, aft;
      +
      +	if (times) {
     -+		timeval_to_filetime(&times[0], &aft);
     -+		timeval_to_filetime(&times[1], &mft);
     ++		timespec_to_filetime(&times[0], &aft);
     ++		timespec_to_filetime(&times[1], &mft);
      +	} else {
      +		GetSystemTimeAsFileTime(&mft);
      +		aft = mft;
     @@ compat/mingw.c: int mingw_fstat(int fd, struct stat *buf)
      +	}
      +
      +	return 0;
     -+}
     -+
     -+int mingw_utime (const char *file_name, const struct utimbuf *times)
     -+{
     + }
     + 
     +-int mingw_utime (const char *file_name, const struct utimbuf *times)
     ++int mingw_utime(const char *file_name, const struct utimbuf *times)
     + {
     +-	FILETIME mft, aft;
       	int fh, rc;
       	DWORD attrs;
       	wchar_t wfilename[MAX_PATH];
     -+	struct timeval tvs[2];
     ++	struct timespec ts[2];
      +
       	if (xutftowcs_path(wfilename, file_name) < 0)
       		return -1;
     @@ compat/mingw.c: int mingw_utime (const char *file_name, const struct utimbuf *ti
      -	} else {
      -		GetSystemTimeAsFileTime(&mft);
      -		aft = mft;
     -+		memset(tvs, 0, sizeof(tvs));
     -+		tvs[0].tv_sec = times->actime;
     -+		tvs[1].tv_sec = times->modtime;
     ++		memset(ts, 0, sizeof(ts));
     ++		ts[0].tv_sec = times->actime;
     ++		ts[1].tv_sec = times->modtime;
       	}
      -	if (!SetFileTime((HANDLE)_get_osfhandle(fh), NULL, &aft, &mft)) {
      -		errno = EINVAL;
     @@ compat/mingw.c: int mingw_utime (const char *file_name, const struct utimbuf *ti
      -	} else
      -		rc = 0;
      +
     -+	rc = mingw_futimes(fh, times ? tvs : NULL);
     ++	rc = mingw_futimens(fh, times ? ts : NULL);
       	close(fh);
       
       revert_attrs:
     @@ compat/mingw.h: int mingw_fstat(int fd, struct stat *buf);
       
       int mingw_utime(const char *file_name, const struct utimbuf *times);
       #define utime mingw_utime
     -+int mingw_futimes(int fd, const struct timeval times[2]);
     -+#define futimes mingw_futimes
     ++int mingw_futimens(int fd, const struct timespec times[2]);
     ++#define futimens mingw_futimens
       size_t mingw_strftime(char *s, size_t max,
       		   const char *format, const struct tm *tm);
       #define strftime mingw_strftime
     @@ object-file.c: static int write_loose_object(const struct object_id *oid, char *
      -		utb.modtime = mtime;
      -		if (utime(tmp_file.buf, &utb) < 0)
      -			warning_errno(_("failed utime() on %s"), tmp_file.buf);
     -+		struct timeval tvs[2] = {0};
     -+		tvs[0].tv_sec = mtime;
     -+		tvs[1].tv_sec = mtime;
     -+		if (futimes(fd, tvs) < 0)
     ++		struct timespec ts[2] = {0};
     ++		ts[0].tv_sec = mtime;
     ++		ts[1].tv_sec = mtime;
     ++		if (futimens(fd, ts) < 0)
      +			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
       	}
       
 -:  ----------- > 2:  49f72800bfb bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 2:  d1e68d4a2af ! 3:  2c1c907b12a core.fsyncobjectfiles: batch disk flushes
     @@ Metadata
      Author: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Commit message ##
     -    core.fsyncobjectfiles: batch disk flushes
     +    core.fsyncobjectfiles: batched disk flushes
      
          When adding many objects to a repo with core.fsyncObjectFiles set to
          true, the cost of fsync'ing each object file can become prohibitive.
      
          One major source of the cost of fsync is the implied flush of the
          hardware writeback cache within the disk drive. Fortunately, Windows,
     -    MacOS, and Linux each offer mechanisms to write data from the filesystem
     +    macOS, and Linux each offer mechanisms to write data from the filesystem
          page cache without initiating a hardware flush.
      
     -    This patch introduces a new 'core.fsyncObjectFiles = 2' option that
     +    This patch introduces a new 'core.fsyncObjectFiles = batch' option that
          takes advantage of the bulk-checkin infrastructure to batch up hardware
          flushes.
      
     @@ Commit message
          1. Create a tmp_obj_XXXX file and write the object data to it.
          2. Issue a pagecache writeback request and wait for it to complete.
          3. Record the tmp name and the final name in the bulk-checkin state for
     -       later name.
     +       later rename.
      
          At the end of the entire transaction we:
          1. Issue a fsync against the lock file to flush the hardware writeback
             cache, which should by now have processed the tmp file writes.
          2. Rename all of the temp files to their final names.
     -    3. When updating the index and/or refs, we will issue another fsync
     -       internal to that operation.
     +    3. When updating the index and/or refs, we assume that Git will issue
     +       another fsync internal to that operation.
      
          On a filesystem with a singular journal that is updated during name
          operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
     @@ Commit message
          sequence is enough to ensure that the user's data is durable by the time
          the git command returns.
      
     -    This change also updates the MacOS code to trigger a real hardware flush
     +    This change also updates the macOS code to trigger a real hardware flush
          via fnctl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
     -    MacOS there was no guarantee of durability since a simple fsync(2) call
     +    macOS there was no guarantee of durability since a simple fsync(2) call
          does not flush any hardware caches.
      
     +    _Performance numbers_:
     +
     +    Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
     +    Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
     +    Windows - Same host as Linux, a preview version of Windows 11.
     +              This number is from a patch later in the series.
     +
     +    Adding 500 files to the repo with 'git add' Times reported in seconds.
     +
     +    core.fsyncObjectFiles | Linux | Mac   | Windows
     +    ----------------------|-------|-------|--------
     +                    false | 0.06  |  0.35 | 0.61
     +                    true  | 1.88  | 11.18 | 2.47
     +                    batch | 0.15  |  0.41 | 1.53
     +
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Documentation/config/core.txt ##
     @@ Documentation/config/core.txt: core.whitespace::
      -data writes properly, but can be useful for filesystems that do not use
      -journalling (traditional UNIX filesystems) or that only journal metadata
      -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
     -+	A boolean value or the number '2', indicating the level of durability
     -+	applied to object files.
     ++	A value indicating the level of effort Git will expend in
     ++	trying to make objects added to the repo durable in the event
     ++	of an unclean system shutdown. This setting currently only
     ++	controls the object store, so updates to any refs or the
     ++	index may not be equally durable.
      ++
     -+This setting controls how much effort Git makes to ensure that data added to
     -+the object store are durable in the case of an unclean system shutdown. If
     -+'false', Git allows data to remain in file system caches according to operating
     -+system policy, whence they may be lost if the system loses power or crashes. A
     -+value of 'true' instructs Git to force objects to stable storage immediately
     -+when they are added to the object store. The number '2' is an experimental
     -+value that also preserves durability but tries to perform hardware flushes in a
     -+batch.
     ++* `false` allows data to remain in file system caches according to
     ++  operating system policy, whence it may be lost if the system loses power
     ++  or crashes.
     ++* `true` triggers a data integrity flush for each object added to the
     ++  object store. This is the safest setting that is likely to ensure durability
     ++  across all operating systems and file systems that honor the 'fsync' system
     ++  call. However, this setting comes with a significant performance cost on
     ++  common hardware.
     ++* `batch` enables an experimental mode that uses interfaces available in some
     ++  operating systems to write object data with a minimal set of FLUSH CACHE
     ++  (or equivalent) commands sent to the storage controller. If the operating
     ++  system interfaces are not available, this mode behaves the same as `true`.
     ++  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
     ++  filesystems and on Windows for repos stored on NTFS or ReFS.
       
       core.preloadIndex::
       	Enable parallel index preload for operations like 'git diff'
      
       ## Makefile ##
     +@@ Makefile: all::
     + #
     + # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
     + #
     ++# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
     ++#
     + # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
     + # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
     + #
      @@ Makefile: ifdef HAVE_CLOCK_MONOTONIC
       	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
       endif
     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
       finish:
       	if (write_locked_index(&the_index, &lock_file,
      
     - ## builtin/update-index.c ##
     -@@
     -  */
     - #define USE_THE_INDEX_COMPATIBILITY_MACROS
     - #include "cache.h"
     -+#include "bulk-checkin.h"
     - #include "config.h"
     - #include "lockfile.h"
     - #include "quote.h"
     -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     - 		struct strbuf unquoted = STRBUF_INIT;
     - 
     - 		setup_work_tree();
     -+		plug_bulk_checkin();
     - 		while (getline_fn(&buf, stdin) != EOF) {
     - 			char *p;
     - 			if (!nul_term_line && buf.buf[0] == '"') {
     -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     - 				chmod_path(set_executable_bit, p);
     - 			free(p);
     - 		}
     -+		unplug_bulk_checkin(&lock_file);
     - 		strbuf_release(&unquoted);
     - 		strbuf_release(&buf);
     - 	}
     -
       ## bulk-checkin.c ##
      @@
        */
     @@ bulk-checkin.c
       #include "repository.h"
       #include "csum-file.h"
       #include "pack.h"
     -@@
     + #include "strbuf.h"
     ++#include "string-list.h"
       #include "packfile.h"
       #include "object-store.h"
       
     -+struct object_rename {
     -+	char *src;
     -+	char *dst;
     -+};
     -+
     -+static struct bulk_rename_state {
     -+	struct object_rename *renames;
     -+	uint32_t alloc_renames;
     -+	uint32_t nr_renames;
     -+} bulk_rename_state;
     -+
     - static struct bulk_checkin_state {
     - 	unsigned plugged:1;
     + static int bulk_checkin_plugged;
       
     -@@ bulk-checkin.c: static struct bulk_checkin_state {
     - 	struct pack_idx_entry **written;
     - 	uint32_t alloc_written;
     - 	uint32_t nr_written;
     --} state;
     ++static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
      +
     -+} bulk_checkin_state;
     - 
     - static void finish_bulk_checkin(struct bulk_checkin_state *state)
     - {
     - 	struct object_id oid;
     - 	struct strbuf packname = STRBUF_INIT;
     - 	int i;
     -+	unsigned old_plugged;
     - 
     - 	if (!state->f)
     - 		return;
     -@@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state)
     - 
     - clear_exit:
     - 	free(state->written);
     -+	old_plugged = state->plugged;
     - 	memset(state, 0, sizeof(*state));
     -+	state->plugged = old_plugged;
     - 
     - 	strbuf_release(&packname);
     - 	/* Make objects we just wrote available to ourselves */
     + static struct bulk_checkin_state {
     + 	char *pack_tmp_name;
     + 	struct hashfile *f;
     +@@ bulk-checkin.c: clear_exit:
       	reprepare_packed_git(the_repository);
       }
       
     -+static void do_sync_and_rename(struct bulk_rename_state *state, struct lock_file *lock_file)
     ++static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
      +{
     -+	if (state->nr_renames) {
     -+		int i;
     ++	if (fsync_state->nr) {
     ++		struct string_list_item *rename;
      +
      +		/*
      +		 * Issue a full hardware flush against the lock file to ensure
     @@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state
      +		 */
      +		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
      +
     -+		for (i = 0; i < state->nr_renames; i++) {
     -+			if (finalize_object_file(state->renames[i].src, state->renames[i].dst))
     -+				die_errno(_("could not rename '%s'"), state->renames[i].src);
     ++		for_each_string_list_item(rename, fsync_state) {
     ++			const char *src = rename->string;
     ++			const char *dst = rename->util;
      +
     -+			free(state->renames[i].src);
     -+			free(state->renames[i].dst);
     ++			if (finalize_object_file(src, dst))
     ++				die_errno(_("could not rename '%s' to '%s'"), src, dst);
      +		}
      +
     -+		free(state->renames);
     -+		memset(state, 0, sizeof(*state));
     ++		string_list_clear(fsync_state, 1);
      +	}
      +}
      +
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
       	return 0;
       }
       
     -+static void add_rename_bulk_checkin(struct bulk_rename_state *state,
     ++static void add_rename_bulk_checkin(struct string_list *fsync_state,
      +				    const char *src, const char *dst)
      +{
     -+	struct object_rename *rename;
     -+
     -+	ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
     -+
     -+	rename = &state->renames[state->nr_renames++];
     -+	rename->src = xstrdup(src);
     -+	rename->dst = xstrdup(dst);
     ++	string_list_insert(fsync_state, src)->util = xstrdup(dst);
      +}
      +
      +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
      +					      const char *filename)
      +{
     -+	if (fsync_object_files) {
     ++	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
      +		/*
      +		 * If we have a plugged bulk checkin, we issue a call that
      +		 * cleans the filesystem page cache but avoids a hardware flush
      +		 * command. Later on we will issue a single hardware flush
      +		 * before renaming files as part of do_sync_and_rename.
      +		 */
     -+		if (bulk_checkin_state.plugged &&
     -+		    fsync_object_files == 2 &&
     ++		if (bulk_checkin_plugged &&
     ++		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
      +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
     -+			add_rename_bulk_checkin(&bulk_rename_state, tmpfile, filename);
     ++			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
      +			if (close(fd))
      +				die_errno(_("error when closing loose object file"));
      +
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
       int index_bulk_checkin(struct object_id *oid,
       		       int fd, size_t size, enum object_type type,
       		       const char *path, unsigned flags)
     - {
     --	int status = deflate_to_pack(&state, oid, fd, size, type,
     -+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
     - 				     path, flags);
     --	if (!state.plugged)
     --		finish_bulk_checkin(&state);
     -+	if (!bulk_checkin_state.plugged)
     -+		finish_bulk_checkin(&bulk_checkin_state);
     - 	return status;
     - }
     - 
     - void plug_bulk_checkin(void)
     - {
     --	state.plugged = 1;
     -+	bulk_checkin_state.plugged = 1;
     +@@ bulk-checkin.c: void plug_bulk_checkin(void)
     + 	bulk_checkin_plugged = 1;
       }
       
      -void unplug_bulk_checkin(void)
      +void unplug_bulk_checkin(struct lock_file *lock_file)
       {
     --	state.plugged = 0;
     --	if (state.f)
     --		finish_bulk_checkin(&state);
     -+	bulk_checkin_state.plugged = 0;
     -+	if (bulk_checkin_state.f)
     -+		finish_bulk_checkin(&bulk_checkin_state);
     + 	assert(bulk_checkin_plugged);
     + 	bulk_checkin_plugged = 0;
     + 	if (bulk_checkin_state.f)
     + 		finish_bulk_checkin(&bulk_checkin_state);
      +
     -+	do_sync_and_rename(&bulk_rename_state, lock_file);
     ++	do_sync_and_rename(&bulk_fsync_state, lock_file);
       }
      
       ## bulk-checkin.h ##
     @@ bulk-checkin.h
       
       #endif
      
     + ## cache.h ##
     +@@ cache.h: void reset_shared_repository(void);
     + extern int read_replace_refs;
     + extern char *git_replace_ref_base;
     + 
     +-extern int fsync_object_files;
     ++enum FSYNC_OBJECT_FILES_MODE {
     ++    FSYNC_OBJECT_FILES_OFF,
     ++    FSYNC_OBJECT_FILES_ON,
     ++    FSYNC_OBJECT_FILES_BATCH
     ++};
     ++
     ++extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
     + extern int core_preload_index;
     + extern int precomposed_unicode;
     + extern int protect_hfs;
     +
       ## config.c ##
      @@ config.c: static int git_default_core_config(const char *var, const char *value, void *cb)
       	}
       
       	if (!strcmp(var, "core.fsyncobjectfiles")) {
      -		fsync_object_files = git_config_bool(var, value);
     -+		int is_bool;
     -+
     -+		fsync_object_files = git_config_bool_or_int(var, value, &is_bool);
     ++		if (!value)
     ++			return config_error_nonbool(var);
     ++		if (!strcasecmp(value, "batch"))
     ++			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
     ++		else
     ++			fsync_object_files = git_config_bool(var, value)
     ++				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
       		return 0;
       	}
       
     @@ configure.ac: AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
       # Define NO_SETITIMER if you don't have setitimer.
       GIT_CHECK_FUNC(setitimer,
      
     + ## environment.c ##
     +@@ environment.c: const char *git_hooks_path;
     + int zlib_compression_level = Z_BEST_SPEED;
     + int core_compression_level;
     + int pack_compression_level = Z_DEFAULT_COMPRESSION;
     +-int fsync_object_files;
     ++enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
     + size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
     + size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
     + size_t delta_base_cache_limit = 96 * 1024 * 1024;
     +
       ## git-compat-util.h ##
      @@ git-compat-util.h: __attribute__((format (printf, 1, 2))) NORETURN
       void BUG(const char *fmt, ...);
 -:  ----------- > 4:  546ad9c82e8 core.fsyncobjectfiles: add windows support for batch mode
 -:  ----------- > 5:  d8843185fe4 update-index: use the bulk-checkin infrastructure
 -:  ----------- > 6:  73b5d41be94 core.fsyncobjectfiles: performance tests for add and stash

Comments

Neeraj Singh Sept. 7, 2021, 7:44 p.m. UTC | #1
On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Thanks to everyone for review so far! I've responded to the previous
> feedback and changed the patch series a bit.
>
> Changes since v1:
>
>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>    to dscho's suggestion, I'm still implementing the Windows version in the
>    same patch and I'm not doing autoconf detection since this is a POSIX
>    function.
>
>  * Introduce a separate preparatory patch to the bulk-checkin infrastructure
>    to separate the 'plugged' variable and rename the 'state' variable, as
>    suggested by dscho.
>
>  * Add performance numbers to the commit message of the main bulk fsync
>    patch, as suggested by dscho.
>
>  * Add a comment about the non-thread-safety of the bulk-checkin
>    infrastructure, as suggested by avarab.
>
>  * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
>    by dscho and avarab and others.
>
>  * Add more details to Documentation/config/core.txt about the various
>    settings and their intended effects, as suggested by avarab.
>
>  * Switch to the string-list API to hold the rename state, as suggested by
>    avarab.
>
>  * Create a separate update-index patch to use bulk-checkin as suggested by
>    dscho.
>
>  * Add Windows support in the upstream git. This is done in a way that
>    should not conflict with git-for-windows.
>
>  * Add new performance tests that shows the delta based on fsync mode.
>
> NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
> on Linux, since sync_file_range does not provide data integrity guarantees.
> There is currently no kernel interface suitable to achieve disk flush
> batching as is, but he suggested that he might implement a 'syncfs' variant
> on top of this patchset. This code is still useful on macOS and Windows, and
> the config documentation makes that clear.
>
> Neeraj Singh (6):
>   object-file: use futimens rather than utime
>   bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>   core.fsyncobjectfiles: batched disk flushes
>   core.fsyncobjectfiles: add windows support for batch mode
>   update-index: use the bulk-checkin infrastructure
>   core.fsyncobjectfiles: performance tests for add and stash
>
>  Documentation/config/core.txt       | 26 ++++++--
>  Makefile                            |  6 ++
>  builtin/add.c                       |  3 +-
>  builtin/update-index.c              |  3 +
>  bulk-checkin.c                      | 92 +++++++++++++++++++++++++----
>  bulk-checkin.h                      |  4 +-
>  cache.h                             |  8 ++-
>  compat/mingw.c                      | 53 +++++++++++------
>  compat/mingw.h                      |  5 ++
>  compat/win32/flush.c                | 29 +++++++++
>  config.c                            |  8 ++-
>  config.mak.uname                    |  4 ++
>  configure.ac                        |  8 +++
>  contrib/buildsystems/CMakeLists.txt |  3 +-
>  environment.c                       |  2 +-
>  git-compat-util.h                   |  7 +++
>  object-file.c                       | 23 ++------
>  t/perf/lib-unique-files.sh          | 32 ++++++++++
>  t/perf/p3700-add.sh                 | 43 ++++++++++++++
>  t/perf/p3900-stash.sh               | 46 +++++++++++++++
>  wrapper.c                           | 40 +++++++++++++
>  write-or-die.c                      |  2 +-
>  22 files changed, 389 insertions(+), 58 deletions(-)
>  create mode 100644 compat/win32/flush.c
>  create mode 100644 t/perf/lib-unique-files.sh
>  create mode 100755 t/perf/p3700-add.sh
>  create mode 100755 t/perf/p3900-stash.sh
>
>
> base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v2
> Pull-Request: https://github.com/git/git/pull/1076

Hello everyone,
I'd like to bump this review up in people's inboxes since Patch V2
hasn't gotten any traction in over a week.

Thanks in advance for taking a look,
- Neeraj Singh
Windows Core Filesystems Team
Ævar Arnfjörð Bjarmason Sept. 7, 2021, 7:50 p.m. UTC | #2
On Tue, Sep 07 2021, Neeraj Singh wrote:

> On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> Thanks to everyone for review so far! I've responded to the previous
>> feedback and changed the patch series a bit.
>>
>> Changes since v1:
>>
>>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>>    to dscho's suggestion, I'm still implementing the Windows version in the
>>    same patch and I'm not doing autoconf detection since this is a POSIX
>>    function.
>>
>>  * Introduce a separate preparatory patch to the bulk-checkin infrastructure
>>    to separate the 'plugged' variable and rename the 'state' variable, as
>>    suggested by dscho.
>>
>>  * Add performance numbers to the commit message of the main bulk fsync
>>    patch, as suggested by dscho.
>>
>>  * Add a comment about the non-thread-safety of the bulk-checkin
>>    infrastructure, as suggested by avarab.
>>
>>  * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
>>    by dscho and avarab and others.
>>
>>  * Add more details to Documentation/config/core.txt about the various
>>    settings and their intended effects, as suggested by avarab.
>>
>>  * Switch to the string-list API to hold the rename state, as suggested by
>>    avarab.
>>
>>  * Create a separate update-index patch to use bulk-checkin as suggested by
>>    dscho.
>>
>>  * Add Windows support in the upstream git. This is done in a way that
>>    should not conflict with git-for-windows.
>>
>>  * Add new performance tests that shows the delta based on fsync mode.
>>
>> NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
>> on Linux, since sync_file_range does not provide data integrity guarantees.
>> There is currently no kernel interface suitable to achieve disk flush
>> batching as is, but he suggested that he might implement a 'syncfs' variant
>> on top of this patchset. This code is still useful on macOS and Windows, and
>> the config documentation makes that clear.
>>
>> Neeraj Singh (6):
>>   object-file: use futimens rather than utime
>>   bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>>   core.fsyncobjectfiles: batched disk flushes
>>   core.fsyncobjectfiles: add windows support for batch mode
>>   update-index: use the bulk-checkin infrastructure
>>   core.fsyncobjectfiles: performance tests for add and stash
>>
>>  Documentation/config/core.txt       | 26 ++++++--
>>  Makefile                            |  6 ++
>>  builtin/add.c                       |  3 +-
>>  builtin/update-index.c              |  3 +
>>  bulk-checkin.c                      | 92 +++++++++++++++++++++++++----
>>  bulk-checkin.h                      |  4 +-
>>  cache.h                             |  8 ++-
>>  compat/mingw.c                      | 53 +++++++++++------
>>  compat/mingw.h                      |  5 ++
>>  compat/win32/flush.c                | 29 +++++++++
>>  config.c                            |  8 ++-
>>  config.mak.uname                    |  4 ++
>>  configure.ac                        |  8 +++
>>  contrib/buildsystems/CMakeLists.txt |  3 +-
>>  environment.c                       |  2 +-
>>  git-compat-util.h                   |  7 +++
>>  object-file.c                       | 23 ++------
>>  t/perf/lib-unique-files.sh          | 32 ++++++++++
>>  t/perf/p3700-add.sh                 | 43 ++++++++++++++
>>  t/perf/p3900-stash.sh               | 46 +++++++++++++++
>>  wrapper.c                           | 40 +++++++++++++
>>  write-or-die.c                      |  2 +-
>>  22 files changed, 389 insertions(+), 58 deletions(-)
>>  create mode 100644 compat/win32/flush.c
>>  create mode 100644 t/perf/lib-unique-files.sh
>>  create mode 100755 t/perf/p3700-add.sh
>>  create mode 100755 t/perf/p3900-stash.sh
>>
>>
>> base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v2
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v2
>> Pull-Request: https://github.com/git/git/pull/1076
>
> Hello everyone,
> I'd like to bump this review up in people's inboxes since Patch V2
> hasn't gotten any traction in over a week.
>
> Thanks in advance for taking a look,
> - Neeraj Singh
> Windows Core Filesystems Team

Thanks, I've been meaning to take a look at this, and also as a
note-to-self: check how this interacts with the fsync()-impacted race I
noted in my just-sent:
https://lore.kernel.org/git/cover-0.3-00000000000-20210907T193600Z-avarab@gmail.com/
Randall S. Becker Sept. 7, 2021, 7:54 p.m. UTC | #3
On September 7, 2021 3:44 PM, Neeraj Singh wrote:
>On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>
>> Thanks to everyone for review so far! I've responded to the previous
>> feedback and changed the patch series a bit.
>>
>> Changes since v1:
>>
>>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>>    to dscho's suggestion, I'm still implementing the Windows version in the
>>    same patch and I'm not doing autoconf detection since this is a POSIX
>>    function.

While POSIX.1-2008, this function is not available on every single POSIX-compliant platform. Please make sure that the code will not cause a breakage on some platforms - the ones I maintain, in particular. Neither futimes nor futimens is available on either NonStop ia64 or x86. The platform only has utime, so this needs to be wrapped with an option in config.mak.uname.

Thanks,
Randall
Neeraj Singh Sept. 8, 2021, 12:54 a.m. UTC | #4
On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
<rsbecker@nexbridge.com> wrote:
>
> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >>
> >> Thanks to everyone for review so far! I've responded to the previous
> >> feedback and changed the patch series a bit.
> >>
> >> Changes since v1:
> >>
> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
> >>    to dscho's suggestion, I'm still implementing the Windows version in the
> >>    same patch and I'm not doing autoconf detection since this is a POSIX
> >>    function.
>
> While POSIX.1-2008, this function is not available on every single POSIX-compliant platform. Please make sure that the code will not cause a breakage on some platforms - the ones I maintain, in particular. Neither futimes nor futimens is available on either NonStop ia64 or x86. The platform only has utime, so this needs to be wrapped with an option in config.mak.uname.
>
> Thanks,
> Randall

Ugh. Fair enough.  How do other contributors feel about me moving back
to utime, but instead just doing the utime over in
builtins/pack-objects.c?  The idea would be to eliminate the mtime
logic entirely from write_loose_object and just do it at the top-level
in loosen_unused_packed_objects.

Thanks,
Neeraj
Neeraj Singh Sept. 8, 2021, 12:55 a.m. UTC | #5
On Tue, Sep 7, 2021 at 12:44 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> Hello everyone,
> I'd like to bump this review up in people's inboxes since Patch V2
> hasn't gotten any traction in over a week.
>
> Thanks in advance for taking a look,
> - Neeraj Singh
> Windows Core Filesystems Team

BTW, I updated the github PR to enable batch mode everywhere, and all
the tests passed, which is good news to me.

Thanks,
Neeraj
Ævar Arnfjörð Bjarmason Sept. 8, 2021, 1:22 a.m. UTC | #6
On Tue, Sep 07 2021, Neeraj Singh wrote:

> On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
> <rsbecker@nexbridge.com> wrote:
>>
>> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
>> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> >>
>> >> Thanks to everyone for review so far! I've responded to the previous
>> >> feedback and changed the patch series a bit.
>> >>
>> >> Changes since v1:
>> >>
>> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>> >>    to dscho's suggestion, I'm still implementing the Windows version in the
>> >>    same patch and I'm not doing autoconf detection since this is a POSIX
>> >>    function.
>>
>> While POSIX.1-2008, this function is not available on every single
>> POSIX-compliant platform. Please make sure that the code will not
>> cause a breakage on some platforms - the ones I maintain, in
>> particular. Neither futimes nor futimens is available on either
>> NonStop ia64 or x86. The platform only has utime, so this needs to
>> be wrapped with an option in config.mak.uname.
>>
>> Thanks,
>> Randall
>
> Ugh. Fair enough.  How do other contributors feel about me moving back
> to utime, but instead just doing the utime over in
> builtins/pack-objects.c?  The idea would be to eliminate the mtime
> logic entirely from write_loose_object and just do it at the top-level
> in loosen_unused_packed_objects.

Aside from where it lives, can't we just have a wrapper that takes both
the filename & fd, and then on some platforms will need to dispatch to a
slower filename-only version, but can hopefully use the new fd-accepting
function?
Junio C Hamano Sept. 8, 2021, 6:44 a.m. UTC | #7
Neeraj Singh <nksingh85@gmail.com> writes:

> BTW, I updated the github PR to enable batch mode everywhere, and all
> the tests passed, which is good news to me.

I doubt that fsyncObjectFiles is something we can reliably test in
CI, either with the new batched thing or with the original "when we
close one, make sure the changes hit the disk platter" approach.  So
I am not sure what conclusion we should draw from such an experiment,
other than "ok, it compiles cleanly."  After all, unless we cause
system crashes, what we thought we have written and close(2) would
be seen by another process that we spawn after that, with or without
sync, no?
Christoph Hellwig Sept. 8, 2021, 6:49 a.m. UTC | #8
On Tue, Sep 07, 2021 at 11:44:52PM -0700, Junio C Hamano wrote:
> I doubt that fsyncObjectFiles is something we can reliably test in
> CI, either with the new batched thing or with the original "when we
> close one, make sure the changes hit the disk platter" approach.  So
> I am not sure what conclusion we should draw from such an experiment,
> other than "ok, it compiles cleanly."  After all, unless we cause
> system crashes, what we thought we have written and close(2) would
> be seen by another process that we spawn after that, with or without
> sync, no?

Basically yes.  XFS on Linux has shutdown ioctls that allow to simulate
that crash by shutting the file system down which really helps debugging
that kind of code.  A bunch of other file systems (ext4, f2fs) have
also picked this up now (grep for {XFS,EXT4,F2FS}_IOC_SHUTDOWN).
Randall S. Becker Sept. 8, 2021, 1:57 p.m. UTC | #9
On September 8, 2021 2:50 AM, Christoph Hellwig wrote:
>To: Junio C Hamano <gitster@pobox.com>
>Cc: Neeraj Singh <nksingh85@gmail.com>; Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com>; Git List <git@vger.kernel.org>;
>Johannes Schindelin <Johannes.Schindelin@gmx.de>; Jeff King <peff@peff.net>; Jeff Hostetler <jeffhost@microsoft.com>; Christoph
>Hellwig <hch@lst.de>; Ævar Arnfjörð Bjarmason <avarab@gmail.com>; Neeraj K. Singh <neerajsi@microsoft.com>
>Subject: Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
>
>On Tue, Sep 07, 2021 at 11:44:52PM -0700, Junio C Hamano wrote:
>> I doubt that fsyncObjectFiles is something we can reliably test in CI,
>> either with the new batched thing or with the original "when we close
>> one, make sure the changes hit the disk platter" approach.  So I am
>> not sure what conclusion we should draw from such an experiment, other
>> than "ok, it compiles cleanly."  After all, unless we cause system
>> crashes, what we thought we have written and close(2) would be seen by
>> another process that we spawn after that, with or without sync, no?
>
>Basically yes.  XFS on Linux has shutdown ioctls that allow to simulate that crash by shutting the file system down which really
helps
>debugging that kind of code.  A bunch of other file systems (ext4, f2fs) have also picked this up now (grep for
>{XFS,EXT4,F2FS}_IOC_SHUTDOWN).

I strongly doubt this concept will work in an MPP architecture, particularly one where "shutting the file system down" is not
possible. I know of at least 3 operating systems where that is a bad plan, and if you did, you would take the test suite down while
you were at it.
-Randall
Randall S. Becker Sept. 8, 2021, 2:04 p.m. UTC | #10
On September 7, 2021 9:23 PM, Ævar Arnfjörð Bjarmason wrote:
>Subject: Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
>
>
>On Tue, Sep 07 2021, Neeraj Singh wrote:
>
>> On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
>> <rsbecker@nexbridge.com> wrote:
>>>
>>> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
>>> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>> >>
>>> >> Thanks to everyone for review so far! I've responded to the
>>> >> previous feedback and changed the patch series a bit.
>>> >>
>>> >> Changes since v1:
>>> >>
>>> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>>> >>    to dscho's suggestion, I'm still implementing the Windows version in the
>>> >>    same patch and I'm not doing autoconf detection since this is a POSIX
>>> >>    function.
>>>
>>> While POSIX.1-2008, this function is not available on every single
>>> POSIX-compliant platform. Please make sure that the code will not
>>> cause a breakage on some platforms - the ones I maintain, in
>>> particular. Neither futimes nor futimens is available on either
>>> NonStop ia64 or x86. The platform only has utime, so this needs to be
>>> wrapped with an option in config.mak.uname.
>>>
>>> Thanks,
>>> Randall
>>
>> Ugh. Fair enough.  How do other contributors feel about me moving back
>> to utime, but instead just doing the utime over in
>> builtins/pack-objects.c?  The idea would be to eliminate the mtime
>> logic entirely from write_loose_object and just do it at the top-level
>> in loosen_unused_packed_objects.
>
>Aside from where it lives, can't we just have a wrapper that takes both the filename & fd, and then on some platforms will need to
>dispatch to a slower filename-only version, but can hopefully use the new fd-accepting function?

I'm not really enamoured with this direction at all. It means that any platform would have to potentially skip a version of git (resulting from the broken build from a wrapper that is not compilable) after the patches were applied, unless the patches for all of those platforms are included. Even adding a Makefile option would be similar. This should be an "Enable if supported" feature, not a default-or-broken, feature. At best, I'd have to monitor for the time where the patch is applied and hope I can figure out the wrapper changes (around my $DAYJOB) in time to make the same release. This seems a bit counter to a "keeping things compatible" philosophy. Maybe there's something I'm missing here.
-Randall
Christoph Hellwig Sept. 8, 2021, 2:13 p.m. UTC | #11
On Wed, Sep 08, 2021 at 09:57:34AM -0400, Randall S. Becker wrote:
> possible. I know of at least 3 operating systems where that is a bad plan, and if you did, you would take the test suite down while
> you were at it.

I've just mentioned a good way to write a test for this feature on a
specific platform.  This is absolutely no judgement if that is a good
plan on other platforms.
Randall S. Becker Sept. 8, 2021, 2:25 p.m. UTC | #12
On September 8, 2021 10:13 AM, Christoph Hellwig wrote:
>Subject: Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
>
>On Wed, Sep 08, 2021 at 09:57:34AM -0400, Randall S. Becker wrote:
>> possible. I know of at least 3 operating systems where that is a bad
>> plan, and if you did, you would take the test suite down while you were at it.
>
>I've just mentioned a good way to write a test for this feature on a specific platform.  This is absolutely no judgement if that is
a good plan
>on other platforms.

Thank you for the clarification. I do appreciate it.
-Randall
Neeraj Singh Sept. 8, 2021, 4:34 p.m. UTC | #13
On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> > BTW, I updated the github PR to enable batch mode everywhere, and all
> > the tests passed, which is good news to me.
>
> I doubt that fsyncObjectFiles is something we can reliably test in
> CI, either with the new batched thing or with the original "when we
> close one, make sure the changes hit the disk platter" approach.  So
> I am not sure what conclusion we should draw from such an experiment,
> other than "ok, it compiles cleanly."  After all, unless we cause
> system crashes, what we thought we have written and close(2) would
> be seen by another process that we spawn after that, with or without
> sync, no?

The main failure mode I was worried about is that some test or other part
of Git is relying on a loose object being immediately available after it is
added to the ODB. With batch mode, the loose objects aren't actually
available until the bulk checkin is unplugged.

I agree that it is not easy to test whether the data is actually going
to durable
storage at the expected time.  FWIW, I did take a disk IO trace on Windows to
verify that we are issuing disk writes and flushes at the right time.
But that's a
one-time test that would be hard to make automated.
Neeraj Singh Sept. 8, 2021, 7:01 p.m. UTC | #14
On Tue, Sep 7, 2021 at 6:23 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Tue, Sep 07 2021, Neeraj Singh wrote:
>
> > On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
> > <rsbecker@nexbridge.com> wrote:
> >>
> >> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
> >> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >> >>
> >> >> Thanks to everyone for review so far! I've responded to the previous
> >> >> feedback and changed the patch series a bit.
> >> >>
> >> >> Changes since v1:
> >> >>
> >> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
> >> >>    to dscho's suggestion, I'm still implementing the Windows version in the
> >> >>    same patch and I'm not doing autoconf detection since this is a POSIX
> >> >>    function.
> >>
> >> While POSIX.1-2008, this function is not available on every single
> >> POSIX-compliant platform. Please make sure that the code will not
> >> cause a breakage on some platforms - the ones I maintain, in
> >> particular. Neither futimes nor futimens is available on either
> >> NonStop ia64 or x86. The platform only has utime, so this needs to
> >> be wrapped with an option in config.mak.uname.
> >>
> >> Thanks,
> >> Randall
> >
> > Ugh. Fair enough.  How do other contributors feel about me moving back
> > to utime, but instead just doing the utime over in
> > builtins/pack-objects.c?  The idea would be to eliminate the mtime
> > logic entirely from write_loose_object and just do it at the top-level
> > in loosen_unused_packed_objects.
>
> Aside from where it lives, can't we just have a wrapper that takes both
> the filename & fd, and then on some platforms will need to dispatch to a
> slower filename-only version, but can hopefully use the new fd-accepting
> function?

I had some concerns around using utime() while a file descriptor is open.
There's some risk of sharing violation on Windows (doesn't matter since we'd
be using futimens), but I was also concerned that there might be some OSes that
update the mtime on close(fd), thus overwriting the effects of utime.
Maybe that's an unwarranted concern, but it's part of why I didn't want to have
different call sequences on different OSes.

I'd be happy to implement your suggestion though and see what happens. But I
also feel that this time update thing is pretty ancillary to the real
goal of my change.
I'm only doing it because it's in the same area. The effects of
getting mtime wrong
would be pretty subtle -- I think we'd just not be deleting some
unpacked unreachable
objects as soon as expected.  Do you have a strong objection to
lifting the time update
logic out?
Junio C Hamano Sept. 8, 2021, 7:12 p.m. UTC | #15
Neeraj Singh <nksingh85@gmail.com> writes:

> On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Neeraj Singh <nksingh85@gmail.com> writes:
>>
>> > BTW, I updated the github PR to enable batch mode everywhere, and all
>> > the tests passed, which is good news to me.
>>
>> I doubt that fsyncObjectFiles is something we can reliably test in
>> CI, either with the new batched thing or with the original "when we
>> close one, make sure the changes hit the disk platter" approach.  So
>> I am not sure what conclusion we should draw from such an experiment,
>> other than "ok, it compiles cleanly."  After all, unless we cause
>> system crashes, what we thought we have written and close(2) would
>> be seen by another process that we spawn after that, with or without
>> sync, no?
>
> The main failure mode I was worried about is that some test or other part
> of Git is relying on a loose object being immediately available after it is
> added to the ODB. With batch mode, the loose objects aren't actually
> available until the bulk checkin is unplugged.

Ah, I see.  If there are two processes that communicate over pipes
to decide whose turn it is (perhaps a producer of data that feeds
fast-import may wait for fast-import to say "I gave this label to
the object you requested" and goes ahead to use that object), and at
the point that the "other" process takes its turn, if the objects
are not "flushed" yet, things can break.  That's a valid concern.
Neeraj Singh Sept. 8, 2021, 7:20 p.m. UTC | #16
On Wed, Sep 8, 2021 at 12:12 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> > On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Neeraj Singh <nksingh85@gmail.com> writes:
> >>
> >> > BTW, I updated the github PR to enable batch mode everywhere, and all
> >> > the tests passed, which is good news to me.
> >>
> >> I doubt that fsyncObjectFiles is something we can reliably test in
> >> CI, either with the new batched thing or with the original "when we
> >> close one, make sure the changes hit the disk platter" approach.  So
> >> I am not sure what conclusion we should draw from such an experiment,
> >> other than "ok, it compiles cleanly."  After all, unless we cause
> >> system crashes, what we thought we have written and close(2) would
> >> be seen by another process that we spawn after that, with or without
> >> sync, no?
> >
> > The main failure mode I was worried about is that some test or other part
> > of Git is relying on a loose object being immediately available after it is
> > added to the ODB. With batch mode, the loose objects aren't actually
> > available until the bulk checkin is unplugged.
>
> Ah, I see.  If there are two processes that communicate over pipes
> to decide whose turn it is (perhaps a producer of data that feeds
> fast-import may wait for fast-import to say "I gave this label to
> the object you requested" and goes ahead to use that object), and at
> the point that the "other" process takes its turn, if the objects
> are not "flushed" yet, things can break.  That's a valid concern.

That's right. This appears to be a possibility in the existing bulk
checkin code that produces packfiles for large objects as well, but
my change makes the situation much more common.
Ævar Arnfjörð Bjarmason Sept. 8, 2021, 7:23 p.m. UTC | #17
On Wed, Sep 08 2021, Neeraj Singh wrote:

> On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Neeraj Singh <nksingh85@gmail.com> writes:
>>
>> > BTW, I updated the github PR to enable batch mode everywhere, and all
>> > the tests passed, which is good news to me.
>>
>> I doubt that fsyncObjectFiles is something we can reliably test in
>> CI, either with the new batched thing or with the original "when we
>> close one, make sure the changes hit the disk platter" approach.  So
>> I am not sure what conclusion we should draw from such an experiment,
>> other than "ok, it compiles cleanly."  After all, unless we cause
>> system crashes, what we thought we have written and close(2) would
>> be seen by another process that we spawn after that, with or without
>> sync, no?
>
> The main failure mode I was worried about is that some test or other part
> of Git is relying on a loose object being immediately available after it is
> added to the ODB. With batch mode, the loose objects aren't actually
> available until the bulk checkin is unplugged.
>
> I agree that it is not easy to test whether the data is actually going
> to durable
> storage at the expected time.  FWIW, I did take a disk IO trace on Windows to
> verify that we are issuing disk writes and flushes at the right time.
> But that's a
> one-time test that would be hard to make automated.

I have some semi-related patches I need to dig up and finish sometime
which add a "git gc" test mode to the test suite, i.e. any time we call
"git gc --auto" it will go ahead and actually run, and some adversarial
options to run always, right away, prune with --expire=now. It found
some false positives, but also some genuine races and bugs at the time.

Similarly, I think a good longer term goal for better fsync() and data
integrity in git is to refactor the various codepaths where we write to
disk (grepping for fsync_or_die() is a good start to find those) to all
live in one place, we could then easily instrument that code to run in a
hostile test mode.

E.g. make anything that expects to write out a "foo" file actually write
out "foo.not-synced-yet" as long as fsync() etc. hasn't been called, or
with signals/timers/atexit() handlers fake up known FS edge cases such
as a write of "foo" only renaming "foo.not-synced-yet" to "foo" 1s after
the last close() call not followed by an fsync, etc.

Anyway, I expect given your occupation that you may have better ideas in
that area, presumably needing to instrument and test behavior under I/O
pressure, deferred syncs etc. is something mature FS's need to deal with
as part of their own regression tests...

1. https://lore.kernel.org/git/cover-v2-0.4-0000000000-20210908T003631Z-avarab@gmail.com/