From patchwork Mon Sep 20 22:15:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12506781 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26076C433F5 for ; Tue, 21 Sep 2021 02:20:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0E81D60F9D for ; Tue, 21 Sep 2021 02:20:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230106AbhIUCVr (ORCPT ); Mon, 20 Sep 2021 22:21:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236462AbhIUBuT (ORCPT ); Mon, 20 Sep 2021 21:50:19 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 211E0C051776 for ; Mon, 20 Sep 2021 15:15:15 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id q26so33701096wrc.7 for ; Mon, 20 Sep 2021 15:15:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=7X2vP95Nrih2/ZYF5MeQkFO3N7Ybz8M6MMt6amHXYUA=; b=iAG9Y6qlP1M3v0XVHmM7/d3UxxtW2TDLbfnVrtgAmQruYz7BmhChA8F0XMzGVZ9e8G 
f4pegY4V++H0vvRH39G8MRIk1w8d/JcGndxnw1wwJ+1IIX2i8jczx3yv+L3DZhefU8BV 5I89oghoJBxESRjMf+iBfVcwmMm1Q1qt5LYT+1BI1bSykO5J/Hmr0PBhELYE8VdT7/f7 Due1GLhelPZvzjmMebwh3BfXwMYWWdpHD/znPdi8udobnNsem5Y+wZC5/AhxoP5VKZ2f EC61pXCLGGNzfuQbfzGmOltxD77svwSvCr+lOjvsnUmN8HQfEkGwAHVCQnkE+ZIv3gSC cwVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=7X2vP95Nrih2/ZYF5MeQkFO3N7Ybz8M6MMt6amHXYUA=; b=lU6KRUhVFhOQDXC+3xjdGHCRZGBWoSjJ+bk0kZ3n7wiDPQE0bdw1cEFOM8O2+CK9+m uJlPFaP2uSuswwTjZzX43odTr7A7yoMRSP7QzTOoMJaF/Ie4EJvGCWazJRqxn3lZUmiw Saf8/IPvIMq7a5Np2TI81OiAAjB69GiUXEfBD/nEnGObaZhos6KGVFQTQljUm2RjwjR9 OZSYLmfjF+x2h7p4yIlKF9Ai42eP8gSxgnNvwS9yRR9TeK+CSL92oieW42XwqbBM9q8K LKp4ofi+IjF2J8Z4XmeyBEpHV3GllNExp/xpe8LcBUtbObYDdkJZqRpx2UkwvMy+GnPX Y1Dg== X-Gm-Message-State: AOAM533ZcTseddmEBs+bOA3ND9MF9BbnL1Cnx0bmGTIuLHeVkyjpCH5V SwJFZXqtedvBVrGjNpol9B1MORi1BJQ= X-Google-Smtp-Source: ABdhPJyGqpYoEy1IjZtxF9PEx0Mq+wSvht3Lek9u/N4Y47tgdCJVjTGDYdBayQ/spkN1IVronKN/BA== X-Received: by 2002:a1c:7906:: with SMTP id l6mr1215086wme.78.1632176113733; Mon, 20 Sep 2021 15:15:13 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l2sm1163232wmi.1.2021.09.20.15.15.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Sep 2021 15:15:13 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Mon, 20 Sep 2021 22:15:06 +0000 Subject: [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. 
Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'. This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.

Signed-off-by: Neeraj Singh
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }

From patchwork Mon Sep 20 22:15:07 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12506779
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A96EC433FE for ; Tue, 21 Sep 2021 02:20:18 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2099A611ED for ; Tue, 21 Sep 2021 02:20:18 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347748AbhIUCVo (ORCPT ); Mon, 20 Sep 2021 22:21:44 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236463AbhIUBuT (ORCPT ); Mon, 20 Sep 2021 21:50:19 -0400
Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC583C051777 for ; Mon, 20 Sep 2021 15:15:15 -0700 (PDT)
Received: by
mail-wr1-x434.google.com with SMTP id u18so31993183wrg.5 for ; Mon, 20 Sep 2021 15:15:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=HWJNmYhGvaqhX9Cma8pwcl0a3zgDaumlv7l13K4ZPss=; b=V9dZvzN1xDE8Mr4MSfVlaCuapNwMa9+SI1Mex+6ZPdDwz2JQawkHon7ITvkwuRzU7F uNlSWqHwJ46nKfX4145Fa+vcrdXdhKqHW2ZH2WmwIKkAjOGWU2sMO1QYTMdEOkKbiinU tm/6FdL7HbO4p8gzZe7V4xoTUHCdNTXyUng2TLCfrCF9wZfdjD1IwMmmsVY0/oiKDqf8 HURz0hWFo7ko/6PAaa25DVfBRfmHDqpFTWh+PN66Io7WgUoCqcuMN2W5+flJt/GOd0z4 UmwLcf4eBaJqPuZVPUUuMqD7hHD7dKnC2PHqivBAylQ647iEl/jjMQNgKpXkz6Uz/PBP OlNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=HWJNmYhGvaqhX9Cma8pwcl0a3zgDaumlv7l13K4ZPss=; b=Tws9mZNJR/nSTuAtvOnYwUSrKKpbBHGLQmJhRnUBStJQi7de23+1y9eZR+QtnyivHK SiyNBVU7Ks3XVnRPQOr47r8Z2A5hfK9XRlEYBFD9Q6P0th4YA9nQTBShNIe1dOOlWblT 1WL9Az8rW1GRSIU4V8FDR//R8xsVhqX9eQ0Q4YZrnLxTd4VJABfv/Sd4umX5nf6HONy3 JFpJgecZjUmypYkTHJPKQmFFwlesAsxkDvC2Ei+oNZirietAT7HQZb77D9JV0atR1hfD 8J9bA0skOX/r8KAOelH1btMXLT92B5KRkA4umOvS42T3DsSK+cqLlbw3M9vt+6jV/CQ1 c/ew== X-Gm-Message-State: AOAM531/7gUNqzVWhe7z5y2RxBznb/i0W+4RtY2kzvQO2vhwnfvYP58Q qqU90fYksyVNLs5zwFzHIYRQ2zgK8ek= X-Google-Smtp-Source: ABdhPJzUSKI+Fb4NLmm8ln/vK4wq+yf61B11QerNrf0w3rGlmCwFQbgH2IEGqFCiT0GrScE2VUh3tw== X-Received: by 2002:adf:f208:: with SMTP id p8mr30696227wro.379.1632176114401; Mon, 20 Sep 2021 15:15:14 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l19sm17494716wrc.16.2021.09.20.15.15.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Sep 2021 15:15:14 -0700 (PDT) Message-Id: <12cad737635663ed596e52f89f0f4f22f58bfe38.1632176111.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Sep 2021 22:15:07 
+0000
Subject: [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
macOS, and Linux each offer mechanisms to write data from the filesystem
page cache without initiating a hardware flush.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
takes advantage of the bulk-checkin infrastructure to batch up hardware
flushes.

When the new mode is enabled we do the following for new objects:

1. Create a tmp_obj_XXXX file and write the object data to it.
2. Issue a pagecache writeback request and wait for it to complete.
3. Record the tmp name and the final name in the bulk-checkin state for
   later rename.

At the end of the entire transaction we:

1. Issue a fsync against the lock file to flush the hardware writeback
   cache, which should by now have processed the tmp file writes.
2. Rename all of the temp files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

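The two numbered sequences above can be sketched in plain POSIX C roughly
as follows. This is only an illustration under stated assumptions, not the
code in this patch: `batch_add`, `batch_commit`, `writeout_only` and
`struct pending_rename` are hypothetical names, and `writeout_only` simply
calls fsync() as a portable stand-in for the platform-specific
writeout-without-flush primitive.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not Git's implementation. */
#define MAX_PENDING 64
#define PATH_LEN 256

struct pending_rename {
	char tmp[PATH_LEN];
	char final[PATH_LEN];
};

static struct pending_rename pending[MAX_PENDING];
static size_t pending_nr;

/*
 * Stand-in for the writeout-only primitive. A plain fsync() keeps the
 * sketch portable, but it is stronger (and slower) than what batch mode
 * actually wants here.
 */
static int writeout_only(int fd)
{
	return fsync(fd);
}

/* Per object: write the temp file, write it out, record the rename. */
static int batch_add(const char *tmp, const char *final,
		     const void *data, size_t len)
{
	int fd;

	if (pending_nr == MAX_PENDING ||
	    strlen(tmp) >= PATH_LEN || strlen(final) >= PATH_LEN)
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0444);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || writeout_only(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0)
		return -1;

	/* The object stays under its temp name until batch_commit(). */
	snprintf(pending[pending_nr].tmp, PATH_LEN, "%s", tmp);
	snprintf(pending[pending_nr].final, PATH_LEN, "%s", final);
	pending_nr++;
	return 0;
}

/* End of transaction: one full flush on lock_fd, then rename everything. */
static int batch_commit(int lock_fd)
{
	size_t i;
	int ret = 0;

	if (pending_nr && fsync(lock_fd) < 0)
		return -1;
	for (i = 0; i < pending_nr; i++)
		if (rename(pending[i].tmp, pending[i].final) < 0)
			ret = -1;
	pending_nr = 0;
	return ret;
}
```

The point of the ordering is that no object becomes visible under its
final name until the single flush has been issued, so a crash before
batch_commit() leaves only stray temp files behind.
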
This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh
---
 Documentation/config/core.txt | 26 ++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  3 +-
 bulk-checkin.c                | 81 ++++++++++++++++++++++++++++++++++-
 bulk-checkin.h                |  5 ++-
 cache.h                       |  8 +++-
 config.c                      |  7 ++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 +++
 object-file.c                 | 22 +---------
 wrapper.c                     | 44 +++++++++++++++++++
 write-or-die.c                |  2 +-
 14 files changed, 189 insertions(+), 33 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..0006d90980d 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,26 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls the object store, so updates to any refs or the
+	index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write object data with a minimal set of FLUSH CACHE
+  (or equivalent) commands sent to the storage controller. If the operating
+  system interfaces are not available, this mode behaves the same as `true`.
+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
+  filesystems and on Windows for repos stored on NTFS or ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 2244311d485..dda4bf093a0 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -678,7 +678,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..ddbab5e5c8c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,15 +3,19 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -62,6 +66,32 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
+{
+	if (fsync_state->nr) {
+		struct string_list_item *rename;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for_each_string_list_item(rename, fsync_state) {
+			const char *src = rename->string;
+			const char *dst = rename->util;
+
+			if (finalize_object_file(src, dst))
+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
+		}
+
+		string_list_clear(fsync_state, 1);
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct string_list *fsync_state,
+				    const char *src, const char *dst)
+{
+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime)
+{
+	int do_finalize = 1;
+	int ret = 0;
+
+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_plugged &&
+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
+			do_finalize = 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	if (mtime) {
+		struct utimbuf utb;
+		utb.actime = mtime;
+		utb.modtime = mtime;
+		if (utime(tmpfile, &utb) < 0)
+			warning_errno(_("failed utime() on %s"), tmpfile);
+	}
+
+	if (do_finalize)
+		ret = finalize_object_file(tmpfile, filename);
+
+	return ret;
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -273,10 +350,12 @@ void plug_bulk_checkin(void)
 	bulk_checkin_plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
 	assert(bulk_checkin_plugged);
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_fsync_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..4a3309c1531 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,14 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/cache.h b/cache.h
index d23de693680..39b3a88181a 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE {
+	FSYNC_OBJECT_FILES_OFF,
+	FSYNC_OBJECT_FILES_ON,
+	FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..1b403e00241 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d6b22ede7ea..3e23eafff80 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+	FSYNC_WRITEOUT_ONLY,
+	FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index a8be8994814..ea14c3a3483 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
-	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
-	}
-
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
+							 filename.buf, mtime);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * on macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}

From patchwork Mon Sep 20 22:15:08 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12506787
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FB3BC433FE for ; Tue, 21 Sep 2021 02:21:59 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 794E06124A for ; Tue, 21 Sep 2021 02:21:59 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347849AbhIUCWZ (ORCPT ); Mon, 20 Sep 2021 22:22:25 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33458 "EHLO
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234362AbhIUBuU (ORCPT ); Mon, 20 Sep 2021 21:50:20 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66FF3C07E5C1 for ; Mon, 20 Sep 2021 15:15:16 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id w29so33732184wra.8 for ; Mon, 20 Sep 2021 15:15:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=4TwjEERCBTdTIk5k03nSoPU6aP7qbN6anXF+SduK7kA=; b=EH1uYVXdPuCHVtvmEDbd9j249I2/KSQnGQA/2Re4CQT6LtsC+49W0f8jh3fGl1DC18 M6QPCPss21LGoac7vJwu8dV1HqoyoBCvQCFi4aBZa+fsTBlsWdK9VACFLA4CY6mTZDIA gpViQmTWxAaLiMco2txIh61uFzuTf1MHiDJYElEkw9e2fKy8x69nDHJtITd6Ge52kpLB 0slmkFKFZo8oMGTPNr9qt4TSyXw6KSlXPw0wM9H59jQqhIVx2CGjhpRYA3/dzVSMTBAj A/g2GHfETA+iiOHODaelaw9GFRmFc6l0/Ls5tpRGmKhXsR3nwiDIch6omHXp8BFTA2V5 2JlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=4TwjEERCBTdTIk5k03nSoPU6aP7qbN6anXF+SduK7kA=; b=YQ4P3fRg6HD9FBLHZPhnx/6XPW9eB0npBVebCki4omZBfwCKNNf1A/7gpQ2TfvCvXZ onsDOXyeI6EreGihhrowpKdXDLtahowa/fkKG+ueCWAXpVWnBRNeiDOa9evbOu9q6wYv ZrcLPKoEkJo6ab4hG6kGz1gUMVhR4lHZGhMZCXAbzBrxs8Q2PQHrqL67SRkPHcCrAswe 6KW4Hr2qyTp+Kmzbvc+NhClGt3BLiSZwdiRt3C6L+aicU8taQdR85a9lR/LSGMYYw8Zy vqDJNrSBCkasRxutTOS8+gnvsTKlx6SD4VcjWg3Ipvm0OucfRtw7RibwJn3ashWv5Ukh jf3Q== X-Gm-Message-State: AOAM531bZHSdeJps9fyKREVqB6Zd0z5q6MgBia/QLzGn21L59u6i5jzV HKoQCxygAK7bXT7tYGwTz7vHRQldPsA= X-Google-Smtp-Source: ABdhPJwgH6u8ml+idY8s1iPety3Y32BXB80hHsEQiI+C6anPFEvEKuzWz6ItfE7H3tabmc8lwA/bMA== X-Received: by 2002:adf:e0cc:: with SMTP id m12mr30801604wri.62.1632176115037; Mon, 20 Sep 2021 15:15:15 -0700 (PDT) 
Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t23sm18071595wrb.71.2021.09.20.15.15.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Sep 2021 15:15:14 -0700 (PDT)
Message-Id:
In-Reply-To:
References:
Date: Mon, 20 Sep 2021 22:15:08 +0000
Subject: [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

This commit adds a win32 implementation of fsync_no_flush, which is
called from git_fsync. The 'NtFlushBuffersFileEx' function being called
has been available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.

The operating system is told to flush data only, without a hardware
flush primitive. A later full fsync will cause the metadata log to be
flushed and then the disk cache to be flushed on NTFS and ReFS. Other
filesystems will treat this as a full flush operation.

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

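The fallback described above (try the cheap writeout, and do a full fsync
when it fails) can be sketched as follows. This is a portable illustration
under stated assumptions rather than the Windows code in this patch:
`writeout_fn`, `writeout_unavailable`, `writeout_available` and
`writeout_or_full_sync` are hypothetical names, and the stubs merely mimic
a platform with and without the primitive.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical hook type: a writeout-only primitive that may be missing. */
typedef int (*writeout_fn)(int fd);

/* Mimics a system without the primitive: fail with ENOSYS. */
static int writeout_unavailable(int fd)
{
	(void)fd;
	errno = ENOSYS;
	return -1;
}

/* Mimics a system where the writeout-only call succeeds. */
static int writeout_available(int fd)
{
	(void)fd;
	return 0;
}

/*
 * Returns 1 if the cheap writeout succeeded (the caller may defer the
 * hardware flush), 0 if we fell back to a full fsync(), -1 on error.
 */
static int writeout_or_full_sync(int fd, writeout_fn try_writeout)
{
	if (try_writeout && try_writeout(fd) >= 0)
		return 1;
	if (fsync(fd) < 0)
		return -1;
	return 0;
}
```

A caller would only queue the object for a deferred rename when the
function returns 1; in the fallback case the data has already been
flushed in full, so the object can be finalized immediately.
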
Signed-off-by: Neeraj Singh
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 29 +++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..c013920ce37
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,29 @@
+#include "../../git-compat-util.h"
+#include
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+	IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+	DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			  HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			  PIO_STATUS_BLOCK IoStatusBlock);
+
+	if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+	}
+
+	/* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
+	memset(&io_status, 0, sizeof(io_status));
+	if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index bb4f9f043ce..1a1e2fba9c9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 

From patchwork Mon Sep 20 22:15:09 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12506783
Return-Path:
X-Spam-Checker-Version: SpamAssassin
Date: Mon, 20 Sep 2021 22:15:09 +0000
Subject: [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
 Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker",
 Bagas Sanjaya, "Neeraj K. Singh", Neeraj Singh
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables the bulk-checkin infrastructure for update-index to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality. This mode is
enabled when passing paths to update-index via the --stdin flag, as is
done by 'git stash'.

There is some risk with this change, since under batch fsync, the object
files will not be available until the update-index is entirely complete.
It is unlikely that any tool depends on seeing the objects earlier, since
a tool invoking update-index and expecting to see objects would have to
snoop the output of --verbose to find out when update-index has actually
processed a given path. Additionally, the index is locked for the
duration of the update.

Signed-off-by: Neeraj Singh
---
 builtin/update-index.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..b0689f2cdf6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1150,6 +1151,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		struct strbuf unquoted = STRBUF_INIT;

 		setup_work_tree();
+		plug_bulk_checkin();
 		while (getline_fn(&buf, stdin) != EOF) {
 			char *p;
 			if (!nul_term_line && buf.buf[0] == '"') {
@@ -1164,6 +1166,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				chmod_path(set_executable_bit, p);
 			free(p);
 		}
+		unplug_bulk_checkin(&lock_file);
 		strbuf_release(&unquoted);
 		strbuf_release(&buf);
 	}

From patchwork Mon Sep 20 22:15:10 2021
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12506777
Date: Mon, 20 Sep 2021 22:15:10 +0000
Subject: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
 Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker",
 Bagas Sanjaya, "Neeraj K. Singh", Neeraj Singh
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

Add test cases to exercise batch mode for 'git add' and 'git stash'.
These tests ensure that the added data winds up in the object database.

I verified the tests by introducing an incorrect rename in
do_sync_and_rename.
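
The check these tests perform can be sketched outside the test harness
roughly as follows. This is only an illustration: the helper names
index_object_ids, tree_object_ids and objects_exist are hypothetical and
are not part of this series, and objects_exist assumes it is run inside
a git repository.

```shell
# Hypothetical helpers illustrating the object-existence check.

# `git ls-files --stage` prints "<mode> <object> <stage><TAB><path>",
# so the object ID is field 2.
index_object_ids () {
	awk '{print $2}'
}

# `git ls-tree -r` prints "<mode> <type> <object><TAB><path>",
# so the object ID is field 3.
tree_object_ids () {
	awk '{print $3}'
}

# Read object IDs on stdin and fail as soon as one of them is
# missing from the object database of the current repository.
objects_exist () {
	while read -r oid
	do
		git cat-file -e "$oid" || return 1
	done
}
```

With these, 'git ls-files --stage fsync-files/ | index_object_ids |
objects_exist' would fail if a staged blob were missing, and 'git ls-tree
-r stash^3 -- fsync-files/ | tree_object_ids | objects_exist' would do
the same for the untracked files recorded by a stash.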
Signed-off-by: Neeraj Singh
---
 t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
 t/t3700-add.sh        | 11 +++++++++++
 t/t3903-stash.sh      | 14 ++++++++++++++
 3 files changed, 59 insertions(+)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a8a25eba61d
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,34 @@
+# Helper to create files with unique contents
+
+test_create_unique_files_base__=$(date -u)
+test_create_unique_files_counter__=0
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
+#				    each in the specified directory, all
+#				    with unique contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+
+	rm -rf $basedir >/dev/null
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir" > /dev/null
+		for j in $(test_seq $files)
+		do
+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__" >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..2122acc3e9e 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.'

 . ./test-lib.sh

+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -33,6 +35,15 @@ test_expect_success \
 	'Test that "git add -- -q" works' \
 	'touch -- -q && git add -- -q'

+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	cat fsynced_files | awk '{print \$2}' | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 873aa56e359..0b4e8bb55b8 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh

 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '

+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&

From patchwork Mon Sep 20 22:15:11 2021
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12506789
Message-Id: <3e6b80b5fa25c5f1dfdbe299e088323c86dc8587.1632176111.git.gitgitgadget@gmail.com>
Date: Mon, 20 Sep 2021 22:15:11 +0000
Subject: [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
 Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker",
 Bagas Sanjaya, "Neeraj K. Singh", Neeraj Singh
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

Add basic performance tests that run "git add" and "git stash" on a
large number of new objects under each core.fsyncObjectFiles setting
(false, true and batch).
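
To try a comparison by hand outside the perf framework, something like
the following sketch could generate the input. It is a standalone
variant of the test_create_unique_files idea; the name make_unique_files
and the unique_base and unique_counter variables are hypothetical, and
the sketch assumes only a POSIX shell.

```shell
# Hypothetical standalone generator; does not depend on test-lib.sh.

# A per-process prefix plus a running counter keeps every file's
# contents distinct, so each file becomes a new object when added.
unique_base=$(date -u +%Y%m%dT%H%M%SZ).$$
unique_counter=0

# make_unique_files <dirs> <files-per-dir> <basedir>
# Recreates <basedir> with <dirs> directories of <files-per-dir> files.
make_unique_files () {
	if test "$#" -ne 3
	then
		echo "usage: make_unique_files <dirs> <files> <basedir>" >&2
		return 1
	fi
	muf_dirs=$1
	muf_files=$2
	muf_base=$3

	rm -rf "$muf_base" || return 1

	muf_i=1
	while test "$muf_i" -le "$muf_dirs"
	do
		mkdir -p "$muf_base/dir$muf_i" || return 1
		muf_j=1
		while test "$muf_j" -le "$muf_files"
		do
			unique_counter=$((unique_counter + 1))
			echo "$unique_base.$unique_counter" \
				>"$muf_base/dir$muf_i/file$muf_j.txt" || return 1
			muf_j=$((muf_j + 1))
		done
		muf_i=$((muf_i + 1))
	done
}
```

After 'make_unique_files 10 50 files', one could time 'git -c
core.fsyncobjectfiles=<mode> add files' for each mode, assuming a git
built with this series.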
Signed-off-by: Neeraj Singh
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash 500 files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done