Message ID: 12cad737635663ed596e52f89f0f4f22f58bfe38.1632176111.git.gitgitgadget@gmail.com
State: Superseded
Series: Implement a batched fsync option for core.fsyncObjectFiles
On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:

> When the new mode is enabled we do the following for new objects:
>
> 1. Create a tmp_obj_XXXX file and write the object data to it.
> 2. Issue a pagecache writeback request and wait for it to complete.
> 3. Record the tmp name and the final name in the bulk-checkin state for
>    later rename.
>
> At the end of the entire transaction we:
> 1. Issue a fsync against the lock file to flush the hardware writeback
>    cache, which should by now have processed the tmp file writes.
> 2. Rename all of the temp files to their final names.
> 3. When updating the index and/or refs, we assume that Git will issue
>    another fsync internal to that operation.

Perhaps note too that:

4. For loose objects, refs etc. we may or may not create directories,
   and most certainly will be updating metadata on the immediate
   directory containing the file, but none of that's fsync()'d.

> On a filesystem with a singular journal that is updated during name
> operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> would expect the fsync to trigger a journal writeout so that this
> sequence is enough to ensure that the user's data is durable by the time
> the git command returns.
>
> This change also updates the macOS code to trigger a real hardware flush
> via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> macOS there was no guarantee of durability since a simple fsync(2) call
> does not flush any hardware caches.

There's no discussion of whether this is or isn't known to also work on
some Linux FS's, and for the OS's where this does work, is it only the
object files themselves that are made durable, or does metadata also
"ride along"?

> _Performance numbers_:
>
> Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
> Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
> Windows - Same host as Linux, a preview version of Windows 11.
>   This number is from a patch later in the series.
> Adding 500 files to the repo with 'git add'. Times reported in seconds.
>
> core.fsyncObjectFiles | Linux | Mac   | Windows
> ----------------------|-------|-------|--------
> false                 | 0.06  | 0.35  | 0.61
> true                  | 1.88  | 11.18 | 2.47
> batch                 | 0.15  | 0.41  | 1.53

Per my https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com
and 6/6 in this series we've got perf tests for add/stash, but it would
be really interesting to see how this is impacted by transfer.unpackLimit
in cases where we may be writing packs or loose objects.

> [...]
> core.fsyncObjectFiles::
> -	This boolean will enable 'fsync()' when writing object files.
> -+
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +	A value indicating the level of effort Git will expend in
> +	trying to make objects added to the repo durable in the event
> +	of an unclean system shutdown. This setting currently only
> +	controls the object store, so updates to any refs or the
> +	index may not be equally durable.

All these mentions of "object" should really clarify that it's "loose
objects", i.e. we always fsync pack files.

> +* `false` allows data to remain in file system caches according to
> +  operating system policy, whence it may be lost if the system loses
> +  power or crashes.

As noted in point #4 of
https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/ while
this direction is overall an improvement over the previously flippant
docs, they at least alluded to the context that the assumption behind
"false" is that you don't really care about loose objects, you care
about loose objects *and* the ref update or whatever.
As I think (this is from memory) we've covered already, this may have
all been based on some old ext3 assumption, but it's probably worth
summarizing that here, i.e. if you've got an FS with globally ordered
operations you can probably skip this, but probably not, etc.

> +* `true` triggers a data integrity flush for each object added to the
> +  object store. This is the safest setting that is likely to ensure
> +  durability across all operating systems and file systems that honor
> +  the 'fsync' system call. However, this setting comes with a
> +  significant performance cost on common hardware.

This is really overpromising things by omitting the fact that even if
we're getting this feature you've hacked up right, we're still not
fsyncing dir entries etc. (also noted above).

So something that describes the narrow scope here, along with "loose
objects" etc....

> +* `batch` enables an experimental mode that uses interfaces available
> +  in some operating systems to write object data with a minimal set of
> +  FLUSH CACHE (or equivalent) commands sent to the storage controller.
> +  If the operating system interfaces are not available, this mode
> +  behaves the same as `true`. This mode is expected to be safe on macOS
> +  for repos stored on HFS+ or APFS filesystems and on Windows for repos
> +  stored on NTFS or ReFS.

Again, even if it's called "core.fsyncObjectFiles", if we're going to say
"safe" we really need to say safe in what sense. Having written and
fsync()'d the file is helping nobody if the metadata never arrives....
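[Editor's note: the cost model behind `batch` (one pagecache writeout per
object, a single hardware flush at unplug time, renames only afterwards)
can be sketched as a toy simulation. The counters and helper names below
are invented stand-ins for the real I/O calls, not Git code.]

```c
/* counters standing in for the real I/O primitives: */
static int writeouts;	/* sync_file_range()-style pagecache writeback */
static int flushes;	/* fsync()/F_FULLFSYNC-style hardware flush */
static int renames;	/* finalize_object_file()-style rename */

/* per-object step: tmp file written, then writeback only (no flush) */
static void add_object(int fd)
{
	(void)fd;	/* a real version would pass the tmp file's fd */
	writeouts++;
}

/* end-of-transaction step: one flush, then rename everything */
static void unplug_batch(int nr_objects)
{
	int i;

	flushes++;	/* a single FLUSH CACHE covers the whole batch */
	for (i = 0; i < nr_objects; i++)
		renames++;	/* renames happen only after data is durable */
}
```

For 500 objects this model issues 500 cheap writeouts but only one
expensive flush, which is why `batch` lands near `false` in the table
above rather than near `true`.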
> +static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
> +{
> +	if (fsync_state->nr) {

I think less indentation here would be nice:

	if (!fsync_state->nr)
		return;
	/* rest of unindented body */

Or better yet do this check in unplug_bulk_checkin(), then here:

	fsync_or_die();
	for_each_string_list_item() { ... }
	string_list_clear(....);

> +		struct string_list_item *rename;
> +
> +		/*
> +		 * Issue a full hardware flush against the lock file to ensure
> +		 * that all objects are durable before any renames occur.
> +		 * The code in fsync_and_close_loose_object_bulk_checkin has
> +		 * already ensured that writeout has occurred, but it has not
> +		 * flushed any writeback cache in the storage hardware.
> +		 */
> +		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
> +
> +		for_each_string_list_item(rename, fsync_state) {
> +			const char *src = rename->string;
> +			const char *dst = rename->util;
> +
> +			if (finalize_object_file(src, dst))
> +				die_errno(_("could not rename '%s' to '%s'"), src, dst);
> +		}
> +
> +		string_list_clear(fsync_state, 1);
> +	}
> +}
> +
>  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
>  {
>  	int i;
> @@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
>  	return 0;
>  }
>
> +static void add_rename_bulk_checkin(struct string_list *fsync_state,
> +				    const char *src, const char *dst)
> +{
> +	string_list_insert(fsync_state, src)->util = xstrdup(dst);
> +}

Just has one caller, why not just inline the string_list_insert()
call...

> +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
> +					      const char *filename, time_t mtime)
> +{
> +	int do_finalize = 1;
> +	int ret = 0;
> +
> +	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {

Let's do positive enum comparisons, and with switch() statements, so the
compiler helps us to see if we've covered them all.
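[Editor's note: the switch-based shape suggested above could look
something like this sketch. `classify()` and its return convention are
invented for illustration, not part of the patch; the point is that a
switch with no default lets the compiler warn about an unhandled enum
value.]

```c
/* lowercase type name, per the style comment elsewhere in this review */
enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Hypothetical helper: decide what to do with a just-written loose
 * object. Returns -1 for "no sync needed", 1 for "defer to the batch",
 * and 0 for "fsync now".
 */
static int classify(enum fsync_object_files_mode mode, int plugged)
{
	switch (mode) {
	case FSYNC_OBJECT_FILES_OFF:
		return -1;
	case FSYNC_OBJECT_FILES_BATCH:
		if (plugged)
			return 1;
		/* fallthrough: no bulk checkin plugged, sync eagerly */
	case FSYNC_OBJECT_FILES_ON:
		return 0;
	}
	return 0;	/* not reached while the switch stays exhaustive */
}
```

Each mode now appears positively in exactly one case, so adding a fourth
mode would trigger a -Wswitch warning instead of silently falling into
the "not OFF" bucket.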
> +		/*
> +		 * If we have a plugged bulk checkin, we issue a call that
> +		 * cleans the filesystem page cache but avoids a hardware flush
> +		 * command. Later on we will issue a single hardware flush
> +		 * before renaming files as part of do_sync_and_rename.
> +		 */
> +		if (bulk_checkin_plugged &&
> +		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
> +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
> +			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
> +			do_finalize = 0;
> +
> +		} else {
> +			fsync_or_die(fd, "loose object file");
> +		}
> +	}

So nothing ever explicitly checks FSYNC_OBJECT_FILES_ON...?

> -extern int fsync_object_files;
> +enum FSYNC_OBJECT_FILES_MODE {
> +	FSYNC_OBJECT_FILES_OFF,
> +	FSYNC_OBJECT_FILES_ON,
> +	FSYNC_OBJECT_FILES_BATCH
> +};

Style: We don't use ALL_CAPS for type names in this codebase, just the
enum labels themselves....

> +extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;

...to the point where I had to rub my eyes to see what was going on here
... :)

> -	fsync_object_files = git_config_bool(var, value);
> +	if (value && !strcmp(value, "batch"))
> +		fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> +	else if (git_config_bool(var, value))
> +		fsync_object_files = FSYNC_OBJECT_FILES_ON;
> +	else
> +		fsync_object_files = FSYNC_OBJECT_FILES_OFF;

Since the point of this setting is safety, let's explicitly check
true/false here, use git_config_maybe_bool(), and perhaps issue a
warning on unknown values, but maybe that would get too verbose...

If we have a future "supersafe" mode, it'll get mapped to "false" on
older versions of git, probably not a good idea...
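[Editor's note: the parsing discipline being asked for above might look
like the following sketch. `parse_maybe_bool()` is a simplified
stand-in for git's real maybe-bool helper, and the "return -1 on
unknown" convention is invented here so a caller could warn() instead of
silently mapping a future keyword to false.]

```c
#include <string.h>
#include <strings.h>	/* strcasecmp(), POSIX */

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* simplified stand-in for a maybe-bool parser: 1 = true, 0 = false, -1 = neither */
static int parse_maybe_bool(const char *value)
{
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

/*
 * Returns 0 and sets *out on success; returns -1 on an unrecognized
 * value so the caller can emit a warning rather than treating a future
 * keyword (e.g. a hypothetical "supersafe") as false.
 */
static int parse_fsync_mode(const char *value,
			    enum fsync_object_files_mode *out)
{
	int b;

	if (!strcmp(value, "batch")) {
		*out = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = parse_maybe_bool(value);
	if (b < 0)
		return -1;
	*out = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```

The key property is that an unknown value is distinguishable from
`false`, which is exactly what the forward-compatibility concern above
is about.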
>  	return 0;
>  }
>
> diff --git a/config.mak.uname b/config.mak.uname
> index 76516aaa9a5..e6d482fbcc6 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
>  	HAVE_CLOCK_MONOTONIC = YesPlease
>  	# -lrt is needed for clock_gettime on glibc <= 2.16
>  	NEEDS_LIBRT = YesPlease
> +	HAVE_SYNC_FILE_RANGE = YesPlease
>  	HAVE_GETDELIM = YesPlease
>  	SANE_TEXT_GREP=-a
>  	FREAD_READS_DIRECTORIES = UnfortunatelyYes
> diff --git a/configure.ac b/configure.ac
> index 031e8d3fee8..c711037d625 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
>  	[AC_MSG_RESULT([no])
>  	HAVE_CLOCK_MONOTONIC=])
>  GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
> +
> +#
> +# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
> +GIT_CHECK_FUNC(sync_file_range,
> +	[HAVE_SYNC_FILE_RANGE=YesPlease],
> +	[HAVE_SYNC_FILE_RANGE])
> +GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
> +
>  #
>  # Define NO_SETITIMER if you don't have setitimer.
>  GIT_CHECK_FUNC(setitimer,
> diff --git a/environment.c b/environment.c
> index d6b22ede7ea..3e23eafff80 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -43,7 +43,7 @@ const char *git_hooks_path;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int core_compression_level;
>  int pack_compression_level = Z_DEFAULT_COMPRESSION;
> -int fsync_object_files;
> +enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>  size_t delta_base_cache_limit = 96 * 1024 * 1024;
> diff --git a/git-compat-util.h b/git-compat-util.h
> index b46605300ab..d14e2436276 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
>  void BUG(const char *fmt, ...);
>  #endif
>
> +enum fsync_action {
> +	FSYNC_WRITEOUT_ONLY,
> +	FSYNC_HARDWARE_FLUSH
> +};
> +
> +int git_fsync(int fd, enum fsync_action action);
> +
>  /*
>   * Preserves errno, prints a message, but gives no warning for ENOENT.
>   * Returns 0 on success, which includes trying to unlink an object that does
> diff --git a/object-file.c b/object-file.c
> index a8be8994814..ea14c3a3483 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  	return 0;
>  }
>
> -/* Finalize a file on disk, and close it. */
> -static void close_loose_object(int fd)
> -{
> -	if (fsync_object_files)
> -		fsync_or_die(fd, "loose object file");
> -	if (close(fd) != 0)
> -		die_errno(_("error when closing loose object file"));
> -}
> -
>  /* Size of directory component, including the ending '/' */
>  static inline int directory_size(const char *filename)
>  {
> @@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  		die(_("confused by unstable object source data for %s"),
>  		    oid_to_hex(oid));
>
> -	close_loose_object(fd);
> -
> -	if (mtime) {
> -		struct utimbuf utb;
> -		utb.actime = mtime;
> -		utb.modtime = mtime;
> -		if (utime(tmp_file.buf, &utb) < 0)
> -			warning_errno(_("failed utime() on %s"), tmp_file.buf);
> -	}
> -
> -	return finalize_object_file(tmp_file.buf, filename.buf);
> +	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
> +							 filename.buf, mtime);
>  }
>
>  static int freshen_loose_object(const struct object_id *oid)
> diff --git a/wrapper.c b/wrapper.c
> index 7c6586af321..bb4f9f043ce 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
>  	return fd;
>  }
>
> +int git_fsync(int fd, enum fsync_action action)
> +{
> +	switch (action) {
> +	case FSYNC_WRITEOUT_ONLY:
> +
> +#ifdef __APPLE__
> +		/*
> +		 * on macOS, fsync just causes filesystem cache writeback but
> +		 * does not flush hardware caches.
> +		 */
> +		return fsync(fd);
> +#endif
> +
> +#ifdef HAVE_SYNC_FILE_RANGE
> +		/*
> +		 * On linux 2.6.17 and above, sync_file_range is the way to
> +		 * issue a writeback without a hardware flush. An offset of 0
> +		 * and size of 0 indicates writeout of the entire file and the
> +		 * wait flags ensure that all dirty data is written to the disk
> +		 * (potentially in a disk-side cache) before we continue.
> +		 */
> +		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
> +					SYNC_FILE_RANGE_WRITE |
> +					SYNC_FILE_RANGE_WAIT_AFTER);
> +#endif
> +
> +		errno = ENOSYS;
> +		return -1;
> +
> +	case FSYNC_HARDWARE_FLUSH:
> +
> +#ifdef __APPLE__
> +		return fcntl(fd, F_FULLFSYNC);
> +#else
> +		return fsync(fd);
> +#endif
> +
> +	default:
> +		BUG("unexpected git_fsync(%d) call", action);
> +	}
> +
> +}
> +
>  static int warn_if_unremovable(const char *op, const char *file, int rc)
>  {
>  	int err;
> diff --git a/write-or-die.c b/write-or-die.c
> index d33e68f6abb..8f53953d4ab 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>
>  void fsync_or_die(int fd, const char *msg)
>  {
> -	while (fsync(fd) < 0) {
> +	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);
>  	}
On Tue, Sep 21, 2021 at 4:41 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>
> > When the new mode is enabled we do the following for new objects:
> >
> > 1. Create a tmp_obj_XXXX file and write the object data to it.
> > 2. Issue a pagecache writeback request and wait for it to complete.
> > 3. Record the tmp name and the final name in the bulk-checkin state for
> >    later rename.
> >
> > At the end of the entire transaction we:
> > 1. Issue a fsync against the lock file to flush the hardware writeback
> >    cache, which should by now have processed the tmp file writes.
> > 2. Rename all of the temp files to their final names.
> > 3. When updating the index and/or refs, we assume that Git will issue
> >    another fsync internal to that operation.
>
> Perhaps note too that:
>
> 4. For loose objects, refs etc. we may or may not create directories,
>    and most certainly will be updating metadata on the immediate
>    directory containing the file, but none of that's fsync()'d.
>
> > On a filesystem with a singular journal that is updated during name
> > operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> > would expect the fsync to trigger a journal writeout so that this
> > sequence is enough to ensure that the user's data is durable by the time
> > the git command returns.
> >
> > This change also updates the macOS code to trigger a real hardware flush
> > via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> > macOS there was no guarantee of durability since a simple fsync(2) call
> > does not flush any hardware caches.
>
> There's no discussion of whether this is or isn't known to also work on
> some Linux FS's, and for these OS's where this does work is this only
> for the object files themselves, or does metadata also "ride along"?
>

I unfortunately can't examine Linux kernel source code, and the details
of metadata consistency behavior across files is not something that
anyone in that group wants to pin down. As far as I can tell, the only
thing that's really guaranteed is fsyncing every single file you write
down, and its parent directory if you're creating a new file (which we
always are). As came up in conversation with Christoph Hellwig elsewhere
on thread, Linux doesn't have any set of syscalls to make batch mode
safe. It does look like XFS would be safe if sync_file_range actually
promised to wait for all pagecache writeback definitively, since it
would do a "log force" to push all the dirty metadata to disk when we do
our final fsync.

I really didn't want to say something definitive about what Linux can or
will do, since I'm not in a position to really know or influence them.
Christoph did say that he would be interested in contributing a variant
to this patch that would be definitively safe on filesystems that honor
syncfs.

> > _Performance numbers_:
> >
> > Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
> > Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
> > Windows - Same host as Linux, a preview version of Windows 11.
> >   This number is from a patch later in the series.
> >
> > Adding 500 files to the repo with 'git add'. Times reported in seconds.
> >
> > core.fsyncObjectFiles | Linux | Mac   | Windows
> > ----------------------|-------|-------|--------
> > false                 | 0.06  | 0.35  | 0.61
> > true                  | 1.88  | 11.18 | 2.47
> > batch                 | 0.15  | 0.41  | 1.53
>
> Per my https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com
> and 6/6 in this series we've got perf tests for add/stash, but it would
> be really interesting to see how this is impacted by
> transfer.unpackLimit in cases where we may be writing packs or loose
> objects.

I'm having trouble understanding how unpackLimit is related to 'git
stash' or 'git add'. From code inspection, it doesn't look like we're
using those settings for adding objects except from across a transport.

Are you proposing that we have a similar setting for adding objects via
'add' using a packfile? I think that would be a good goal, but it might
be a bit tricky, since we've likely done a lot of the work to buffer the
input objects in order to compute their OIDs before we know how many
objects there are to add. If the policy were to "always add to a
packfile", it would be easier.

> > [...]
> > core.fsyncObjectFiles::
> > -	This boolean will enable 'fsync()' when writing object files.
> > -+
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +	A value indicating the level of effort Git will expend in
> > +	trying to make objects added to the repo durable in the event
> > +	of an unclean system shutdown. This setting currently only
> > +	controls the object store, so updates to any refs or the
> > +	index may not be equally durable.
>
> All these mentions of "object" should really clarify that it's "loose
> objects", i.e. we always fsync pack files.
>
> > +* `false` allows data to remain in file system caches according to
> > +  operating system policy, whence it may be lost if the system loses
> > +  power or crashes.
>
> As noted in point #4 of
> https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/ while
> this direction is overall an improvement over the previously flippant
> docs, they at least alluded to the context that the assumption behind
> "false" is that you don't really care about loose objects, you care
> about loose objects *and* the ref update or whatever.
>
> As I think (this is from memory) we've covered already this may have
> been all based on some old ext3 assumption, but it's probably worth
> summarizing that here, i.e. if you've got an FS with global ordered
> operations you can probably skip this, but probably not etc.
>
> > +* `true` triggers a data integrity flush for each object added to the
> > +  object store. This is the safest setting that is likely to ensure
> > +  durability across all operating systems and file systems that honor
> > +  the 'fsync' system call. However, this setting comes with a
> > +  significant performance cost on common hardware.
>
> This is really overpromising things by omitting the fact that even if
> we're getting this feature you've hacked up right, we're still not
> fsyncing dir entries etc. (also noted above).
>
> So something that describes the narrow scope here, along with "loose
> objects" etc....
>
> > +* `batch` enables an experimental mode that uses interfaces available
> > +  in some operating systems to write object data with a minimal set of
> > +  FLUSH CACHE (or equivalent) commands sent to the storage controller.
> > +  If the operating system interfaces are not available, this mode
> > +  behaves the same as `true`. This mode is expected to be safe on macOS
> > +  for repos stored on HFS+ or APFS filesystems and on Windows for repos
> > +  stored on NTFS or ReFS.
>
> Again, even if it's called "core.fsyncObjectFiles" if we're going to say
> "safe" we really need to say safe in what sense. Having written and
> fsync()'d the file is helping nobody if the metadata never arrives....
>

My concern with your feedback here is that this is user-facing
documentation. I'd assume that people who are not intimately familiar
with both their filesystem and Git's internals would just be completely
mystified by a long commentary on the specifics in the Config
documentation. I think over time Git should focus on making this setting
really guarantee durability in a meaningful way across the entire
repository.

> > +static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
> > +{
> > +	if (fsync_state->nr) {
>
> I think less indentation here would be nice:
>
>	if (!fsync_state->nr)
>		return;
>	/* rest of unindented body */
>

Will fix.

> Or better yet do this check in unplug_bulk_checkin(), then here:
>
>	fsync_or_die();
>	for_each_string_list_item() { ... }
>	string_list_clear(....);
>

I'd prefer to put it in the callee for reasons of separation-of-concerns.
I don't want to have the caller and callee partially implement the
contract. The compiler should do a good enough job, since it's only one
caller and will probably get totally inlined.

> > +		struct string_list_item *rename;
> > +
> > +		/*
> > +		 * Issue a full hardware flush against the lock file to ensure
> > +		 * that all objects are durable before any renames occur.
> > +		 * The code in fsync_and_close_loose_object_bulk_checkin has
> > +		 * already ensured that writeout has occurred, but it has not
> > +		 * flushed any writeback cache in the storage hardware.
> > +		 */
> > +		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
> > +
> > +		for_each_string_list_item(rename, fsync_state) {
> > +			const char *src = rename->string;
> > +			const char *dst = rename->util;
> > +
> > +			if (finalize_object_file(src, dst))
> > +				die_errno(_("could not rename '%s' to '%s'"), src, dst);
> > +		}
> > +
> > +		string_list_clear(fsync_state, 1);
> > +	}
> > +}
> > +
> >  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
> >  {
> >  	int i;
> > @@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
> >  	return 0;
> >  }
> >
> > +static void add_rename_bulk_checkin(struct string_list *fsync_state,
> > +				    const char *src, const char *dst)
> > +{
> > +	string_list_insert(fsync_state, src)->util = xstrdup(dst);
> > +}
>
> Just has one caller, why not just inline the string_list_insert()
> call...
>

I thought about doing that before. I'll do it.

> > +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
> > +					      const char *filename, time_t mtime)
> > +{
> > +	int do_finalize = 1;
> > +	int ret = 0;
> > +
> > +	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
>
> Let's do positive enum comparisons, and with switch() statements, so the
> compiler helps us to see if we've covered them all.
>

Ok, will switch to switch.

> > +		/*
> > +		 * If we have a plugged bulk checkin, we issue a call that
> > +		 * cleans the filesystem page cache but avoids a hardware flush
> > +		 * command. Later on we will issue a single hardware flush
> > +		 * before renaming files as part of do_sync_and_rename.
> > +		 */
> > +		if (bulk_checkin_plugged &&
> > +		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
> > +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
> > +			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
> > +			do_finalize = 0;
> > +
> > +		} else {
> > +			fsync_or_die(fd, "loose object file");
> > +		}
> > +	}
>
> So nothing ever explicitly checks FSYNC_OBJECT_FILES_ON...?
>

Yeah, I did it this way to avoid any code duplication, but I can change
to a switch if it doesn't require too much repetition.

> > -extern int fsync_object_files;
> > +enum FSYNC_OBJECT_FILES_MODE {
> > +	FSYNC_OBJECT_FILES_OFF,
> > +	FSYNC_OBJECT_FILES_ON,
> > +	FSYNC_OBJECT_FILES_BATCH
> > +};
>
> Style: We don't use ALL_CAPS for type names in this codebase, just the
> enum labels themselves....
>
> > +extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>
> ...to the point where I had to rub my eyes to see what was going on here
> ... :)
>

Sorry, Windows Developer :). Will fix.

> > -	fsync_object_files = git_config_bool(var, value);
> > +	if (value && !strcmp(value, "batch"))
> > +		fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> > +	else if (git_config_bool(var, value))
> > +		fsync_object_files = FSYNC_OBJECT_FILES_ON;
> > +	else
> > +		fsync_object_files = FSYNC_OBJECT_FILES_OFF;
>
> Since the point of this setting is safety, let's explicitly check
> true/false here, use git_config_maybe_bool(), and perhaps issue a
> warning on unknown values, but maybe that would get too verbose...
>
> If we have a future "supersafe" mode, it'll get mapped to "false" on
> older versions of git, probably not a good idea...
>

I took Junio's suggestion verbatim. I'll try a warning if the value
exists, and is not 'batch' or <maybe bool>.

Thanks for looking at my changes so thoroughly!
-Neeraj
On Tue, Sep 21 2021, Neeraj Singh wrote: > On Tue, Sep 21, 2021 at 4:41 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> >> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote: >> >> > When the new mode is enabled we do the following for new objects: >> > >> > 1. Create a tmp_obj_XXXX file and write the object data to it. >> > 2. Issue a pagecache writeback request and wait for it to complete. >> > 3. Record the tmp name and the final name in the bulk-checkin state for >> > later rename. >> > >> > At the end of the entire transaction we: >> > 1. Issue a fsync against the lock file to flush the hardware writeback >> > cache, which should by now have processed the tmp file writes. >> > 2. Rename all of the temp files to their final names. >> > 3. When updating the index and/or refs, we assume that Git will issue >> > another fsync internal to that operation. >> >> Perhaps note too that: >> >> 4. For loose objects, refs etc. we may or may not create directories, >> and most certainly will be updating metadata on the immediate >> directory containing the file, but none of that's fsync()'d. >> >> > On a filesystem with a singular journal that is updated during name >> > operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we >> > would expect the fsync to trigger a journal writeout so that this >> > sequence is enough to ensure that the user's data is durable by the time >> > the git command returns. >> > >> > This change also updates the macOS code to trigger a real hardware flush >> > via fnctl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on >> > macOS there was no guarantee of durability since a simple fsync(2) call >> > does not flush any hardware caches. >> >> There's no discussion of whether this is or isn't known to also work >> some Linux FS's, and for these OS's where this does work is this only >> for the object files themselves, or does metadata also "ride along"? 
>> > > I unfortunately can't examine Linux kernel source code and the details > of metadata > consistency behavior across files is not something that anyone in that > group wants > to pin down. As far as I can tell, the only thing that's really > guaranteed is fsyncing > every single file you write down and its parent directory if you're > creating a new file > (which we always are). As came up in conversation with Christoph > Hellwig elsewhere > on thread, Linux doesn't have any set of syscalls to make batch mode > safe. It does look > like XFS would be safe if sync_file_ranges actually promised to wait > for all pagecache > writeback definitively, since it would do a "log force" to push all > the dirty metadata to > disk when we do our final fsync. > > I really didn't want to say something definitive about what Linux can > or will do, since I'm > not in a position to really know or influence them. Christoph did say > that he would be > interested in contributing a variant to this patch that would be > definitively safe on filesystems > that honor syncfs. *nod*, it's fine if it's omitted. Just wondering if we knew but weren't saying etc. >> > _Performance numbers_: >> > >> > Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD. >> > Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD. >> > Windows - Same host as Linux, a preview version of Windows 11. >> > This number is from a patch later in the series. >> > >> > Adding 500 files to the repo with 'git add' Times reported in seconds. 
>> > >> > core.fsyncObjectFiles | Linux | Mac | Windows >> > ----------------------|-------|-------|-------- >> > false | 0.06 | 0.35 | 0.61 >> > true | 1.88 | 11.18 | 2.47 >> > batch | 0.15 | 0.41 | 1.53 >> >> Per my https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com >> and 6/6 in this series we've got perf tests for add/stash, but it would >> be really interesting to see how this is impacted by >> transfer.unpackLimit in cases where we may be writing packs or loose >> objects. > > I'm having trouble understanding how unpackLimit is related to 'git stash' > or 'git add'. From code inspection, it doesn't look like we're using > those settings > for adding objects except from across a transport. > > Are you proposing that we have a similar setting for adding objects > via 'add' using > a packfile? I think that would be a good goal, but it might be a bit > tricky since we've > likely done a lot of the work to buffer the input objects in order to > compute their OIDs, > before we know how many objects there are to add. If the policy were > to "always add to > a packfile", it would be easier. No, just that in the documentation that we should be explaining to the reader that this mode that optimizes for loose object writing benefits particular commands, but e.g. on the server-side that we'll probably never write 500 objects, but stream them to one pack. Which might also inform next steps for the commands this does help with, i.e. can we make more things stream to packs? I think having this mode is at worst a good transitory thing to have, but perhaps longer term we'll want to simply write fewer individual loose objects. In any case, pushing to a server with this configured and scaling that by transfer.unpackLimit should nicely demonstrate the pack v.s. loose object scenario at different fsck-settings. >> >> > [...] >> > core.fsyncObjectFiles:: >> > - This boolean will enable 'fsync()' when writing object files. 
>> > -+ >> > -This is a total waste of time and effort on a filesystem that orders >> > -data writes properly, but can be useful for filesystems that do not use >> > -journalling (traditional UNIX filesystems) or that only journal metadata >> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). >> > + A value indicating the level of effort Git will expend in >> > + trying to make objects added to the repo durable in the event >> > + of an unclean system shutdown. This setting currently only >> > + controls the object store, so updates to any refs or the >> > + index may not be equally durable. >> >> All these mentions of "object" should really clarify that it's "loose >> objects", i.e. we always fsync pack files. >> >> > +* `false` allows data to remain in file system caches according to >> > + operating system policy, whence it may be lost if the system loses power >> > + or crashes. >> >> As noted in point #4 of >> https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/ while >> this direction is overall an improvement over the previously flippant >> docs, they at least alluded to the context that the assumption behind >> "false" is that you don't really care about loose objects, you care >> about loose objects *and* the ref update or whatever. >> >> As I think (this is from memory) we've covered already this may have >> been all based on some old ext3 assumption, but it's probably worth >> summarizing that here, i.e. if you've got an FS with global ordered >> operations you can probably skip this, but probably not etc. >> >> > +* `true` triggers a data integrity flush for each object added to the >> > + object store. This is the safest setting that is likely to ensure durability >> > + across all operating systems and file systems that honor the 'fsync' system >> > + call. However, this setting comes with a significant performance cost on >> > + common hardware. 
>>
>> This is really overpromising things by omitting the fact that even if
>> we're getting this feature you've hacked up right, we're still not
>> fsyncing dir entries etc. (also noted above).
>>
>> So something that describes the narrow scope here, along with "loose
>> objects" etc....
>>
>> > +* `batch` enables an experimental mode that uses interfaces available in some
>> > +  operating systems to write object data with a minimal set of FLUSH CACHE
>> > +  (or equivalent) commands sent to the storage controller. If the operating
>> > +  system interfaces are not available, this mode behaves the same as `true`.
>> > +  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
>> > +  filesystems and on Windows for repos stored on NTFS or ReFS.
>>
>> Again, even if it's called "core.fsyncObjectFiles", if we're going to say
>> "safe" we really need to say safe in what sense. Having written and
>> fsync()'d the file is helping nobody if the metadata never arrives....
>>

> My concern with your feedback here is that this is user-facing
> documentation. I'd assume that people who are not intimately familiar
> with both their filesystem and Git's internals would just be completely
> mystified by a long commentary on the specifics in the config
> documentation. I think over time Git should focus on making this setting
> really guarantee durability in a meaningful way across the entire
> repository.

Yeah, this setting though is probably going to be tweaked only by fairly
expert-level users of git. I think it's fine if it just explicitly punts
and says something like "this is what it does, this may or may not work
on your FS" etc.; my main issue with the current docs is that they give
off this vibe of knowing a lot more than they're telling you.
>> > +static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
>> > +{
>> > +	if (fsync_state->nr) {
>>
>> I think less indentation here would be nice:
>>
>>	if (!fsync_state->nr)
>>		return;
>>	/* rest of unindented body */
>>
>

> Will fix.
>

>> Or better yet do this check in unplug_bulk_checkin(), then here:
>>
>>	fsync_or_die();
>>	for_each_string_list_item() { ... }
>>	string_list_clear(....);
>>
>

> I'd prefer to put it in the callee for reasons of separation of
> concerns. I don't want to have the caller and callee partially
> implement the contract. The compiler should do a good enough job,
> since it's only one caller and it will probably get totally inlined.

*nod*

For what it's worth I meant the "inlined" just in terms of avoiding the
indirection for human readers; it won't matter to the machine,
especially since this is all I/O bound...
On Tue, Sep 21, 2021 at 6:23 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 4:41 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> >
> > On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
> > >
> > > -		fsync_object_files = git_config_bool(var, value);
> > > +		if (value && !strcmp(value, "batch"))
> > > +			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> > > +		else if (git_config_bool(var, value))
> > > +			fsync_object_files = FSYNC_OBJECT_FILES_ON;
> > > +		else
> > > +			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
> >
> > Since the point of this setting is safety, let's explicitly check
> > true/false here, use git_config_maybe_bool(), and perhaps issue a
> > warning on unknown values, but maybe that would get too verbose...
> >
> > If we have a future "supersafe" mode, it'll get mapped to "false" on
> > older versions of git, probably not a good idea...
> >
>
> I took Junio's suggestion verbatim. I'll try a warning if the value
> exists and is not 'batch' or <maybe bool>.

An update on this.
I tested out some values:

nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=batch add ./
fsync_object_files: 2
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=0 add ./
fsync_object_files: 0
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=1 add ./
fsync_object_files: 1
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=2 add ./
fsync_object_files: 1
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=barf add ./
fatal: bad boolean config value 'barf' for 'core.fsyncobjectfiles'
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=true add ./
fsync_object_files: 1
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=false add ./
fsync_object_files: 0
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=t add ./
fatal: bad boolean config value 't' for 'core.fsyncobjectfiles'
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=y add ./
fatal: bad boolean config value 'y' for 'core.fsyncobjectfiles'
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=yes add ./
fsync_object_files: 1
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=no add ./
fsync_object_files: 0
nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=nope add ./
fatal: bad boolean config value 'nope' for 'core.fsyncobjectfiles'

So I think the code already works like you are suggesting (thanks
Junio!).
diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..0006d90980d 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,26 @@ core.whitespace::
 	errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls the object store, so updates to any refs or the
+	index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write object data with a minimal set of FLUSH CACHE
+  (or equivalent) commands sent to the storage controller. If the operating
+  system interfaces are not available, this mode behaves the same as `true`.
+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
+  filesystems and on Windows for repos stored on NTFS or ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
 
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 2244311d485..dda4bf093a0 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -678,7 +678,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..ddbab5e5c8c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,15 +3,19 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -62,6 +66,32 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
+{
+	if (fsync_state->nr) {
+		struct string_list_item *rename;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for_each_string_list_item(rename, fsync_state) {
+			const char *src = rename->string;
+			const char *dst = rename->util;
+
+			if (finalize_object_file(src, dst))
+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
+		}
+
+		string_list_clear(fsync_state, 1);
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct string_list *fsync_state,
+				    const char *src, const char *dst)
+{
+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime)
+{
+	int do_finalize = 1;
+	int ret = 0;
+
+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_plugged &&
+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
+			do_finalize = 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	if (mtime) {
+		struct utimbuf utb;
+		utb.actime = mtime;
+		utb.modtime = mtime;
+		if (utime(tmpfile, &utb) < 0)
+			warning_errno(_("failed utime() on %s"), tmpfile);
+	}
+
+	if (do_finalize)
+		ret = finalize_object_file(tmpfile, filename);
+
+	return ret;
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -273,10 +350,12 @@ void plug_bulk_checkin(void)
 	bulk_checkin_plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
 	assert(bulk_checkin_plugged);
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_fsync_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..4a3309c1531 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,14 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/cache.h b/cache.h
index d23de693680..39b3a88181a 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE {
+	FSYNC_OBJECT_FILES_OFF,
+	FSYNC_OBJECT_FILES_ON,
+	FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..1b403e00241 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d6b22ede7ea..3e23eafff80 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2)))
 NORETURN void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+	FSYNC_WRITEOUT_ONLY,
+	FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index a8be8994814..ea14c3a3483 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
-	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
-	}
-
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
+							 filename.buf, mtime);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * on macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}