From patchwork Sun Mar 20 07:15:54 2022
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12786447
Message-Id: <9c2abd12bbbd27261378ffb6478a3e5db8a5063c.1647760560.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:15:54 +0000
Subject: [PATCH v2 1/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
From: Neeraj Singh

This commit prepares for adding batch-fsync to the bulk-checkin
infrastructure.

The bulk-checkin infrastructure is currently used to batch up addition
of large blobs to a packfile. When a blob is larger than
big_file_threshold, we unconditionally add it to a pack.
If bulk checkins are 'plugged', we allow multiple large blobs to be
added to a single pack until we reach the packfile size limit;
otherwise, we simply make a new packfile for each large blob. The
'unplug' call tells us when the series of blob additions is done so
that we can finish the packfiles and make their objects available to
subsequent operations.

Stated another way, bulk-checkin allows callers to define a transaction
that adds multiple objects to the object database, where the object
database can optimize its internal operations within the transaction
boundary.

Batched fsync will fit into bulk-checkin by taking advantage of the
plug/unplug functionality to determine the appropriate time to fsync
and make newly-added objects available in the primary object database.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_objdir'. This also makes the variable easier to
  find in the debugger, since the name is more distinctive.

* Move the 'plugged' data member of 'bulk_checkin_state' into a
  separate static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits.
  While disabling the plugging state only results in suboptimal
  behavior for the current code, it would be fatal for the bulk-fsync
  functionality later in this patch series.
Signed-off-by: Neeraj Singh
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index e988a388b65..93b1dc5138a 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_tmp_packfile(struct strbuf *basename,
 				const char *pack_tmp_name,
@@ -278,21 +278,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }

From patchwork Sun Mar 20 07:15:55 2022
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12786448
Message-Id: <3ed1dcd9b9ba9b34f26b3012eaba8da0269ee842.1647760560.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:15:55 +0000
Subject: [PATCH v2 2/7] core.fsyncmethod: batched disk flushes for loose-objects
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
From: Neeraj Singh

When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin plugging and unplugging functionality,
takes advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction, when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have seen the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the default
   today, but the user now has the option of syncing the index and
   there is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc.), such as NTFS, HFS+, or
XFS, we would expect the fsync to trigger a journal writeout so that
this sequence is enough to ensure that the user's data is durable by
the time the git command returns.

Batch mode is only enabled if core.fsyncObjectFiles is false or unset.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
disabled            | 0.06  |  0.35 | 0.61
fsync               | 1.88  | 11.18 | 2.47
batch               | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh
---
 Documentation/config/core.txt |  7 ++++
 bulk-checkin.c                | 70 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 +
 cache.h                       |  8 +++-
 config.c                      |  2 +
 object-file.c                 |  2 +
 6 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 889522956e4..a3798dfc334 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -628,6 +628,13 @@ core.fsyncMethod::
 * `writeout-only` issues pagecache writeback requests, but depending on the
   filesystem and storage hardware, data added to the repository may not be
   durable in the event of a system crash. This is the default mode on macOS.
+* `batch` enables a mode that uses writeout-only flushes to stage multiple
+  updates in the disk writeback cache and then does a single full fsync of
+  a dummy file to trigger the disk cache flush at the end of the operation.
+  Currently `batch` mode only applies to loose-object files. Other repository
+  data is made durable as if `fsync` was specified. This mode is expected to
+  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
+  and on Windows for repos stored on NTFS or ReFS filesystems.
 
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 93b1dc5138a..a702e0ff203 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -80,6 +86,37 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur. The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+		needs_batch_fsync = 0;
+	}
+
+	if (bulk_fsync_objdir) {
+		tmp_objdir_migrate(bulk_fsync_objdir);
+		bulk_fsync_objdir = NULL;
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -274,6 +311,24 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * as part of do_batch_fsync.
+	 */
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		assert(bulk_fsync_objdir);
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -288,6 +343,19 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * A temporary object directory is used to hold the files
+	 * while they are not fsynced.
+	 */
+	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) {
+		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncMethod=batch"));
+
+		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -297,4 +365,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index 3160bc1e489..d1ae51388c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1040,7 +1040,8 @@ extern int use_fsync;
 
 enum fsync_method {
 	FSYNC_METHOD_FSYNC,
-	FSYNC_METHOD_WRITEOUT_ONLY
+	FSYNC_METHOD_WRITEOUT_ONLY,
+	FSYNC_METHOD_BATCH
 };
 
 extern enum fsync_method fsync_method;
@@ -1767,6 +1768,11 @@ void fsync_or_die(int fd, const char *);
 int fsync_component(enum fsync_component component, int fd);
 void fsync_component_or_die(enum fsync_component component, int fd, const char *msg);
 
+static inline int batch_fsync_enabled(enum fsync_component component)
+{
+	return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH);
+}
+
 ssize_t read_in_full(int fd, void *buf, size_t count);
 ssize_t write_in_full(int fd, const void *buf, size_t count);
 ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
diff --git a/config.c b/config.c
index 261ee7436e0..0b28f90de8b 100644
--- a/config.c
+++ b/config.c
@@ -1688,6 +1688,8 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 			fsync_method = FSYNC_METHOD_FSYNC;
 		else if (!strcmp(value, "writeout-only"))
 			fsync_method = FSYNC_METHOD_WRITEOUT_ONLY;
+		else if (!strcmp(value, "batch"))
+			fsync_method = FSYNC_METHOD_BATCH;
 		else
 			warning(_("ignoring unknown core.fsyncMethod value '%s'"), value);
 
diff --git a/object-file.c b/object-file.c
index 5258d9ed827..bdb0a38328f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1895,6 +1895,8 @@ static void close_loose_object(int fd)
 
 	if (fsync_object_files > 0)
 		fsync_or_die(fd, "loose object file");
+	else if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
+		fsync_loose_object_bulk_checkin(fd);
 	else
 		fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd,
 				       "loose object file");

From patchwork Sun Mar 20 07:15:56 2022
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12786449
Message-Id: <54797dbc52060b7fa913642cd5266f7e159a5bc9.1647760561.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:15:56 +0000
Subject: [PATCH v2 3/7] update-index: use the bulk-checkin infrastructure
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
From: Neeraj Singh

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin for the update-index infrastructure to
speed up adding new objects to the object database by leveraging the
batch fsync functionality.

There is some risk with this change, since under batch fsync, the object
files will be in a tmp-objdir until update-index is complete, so other
processes cannot see them before then. Depending on that visibility is
unlikely, since any tool invoking update-index and expecting to see
objects would have to synchronize with the update-index process after
passing it a file path.

Signed-off-by: Neeraj Singh
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 75d646377cc..38e9d7e88cb 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1110,6 +1111,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1190,6 +1194,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "

From patchwork Sun Mar 20 07:15:57 2022
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12786450
Message-Id: <6662e2dae0f5d65c158fba785d186885f9671073.1647760561.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:15:57 +0000
Subject: [PATCH v2 4/7] unpack-objects: use the bulk-checkin infrastructure
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
From: Neeraj Singh

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.

Signed-off-by: Neeraj Singh
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dbeb0680a58..c55b6616aed 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)

From patchwork Sun Mar 20 07:15:58 2022
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12786452
Message-Id: <03bf591742a48d750d6b8e6c54b5a8fd954561a5.1647760561.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:15:58 +0000
Subject: [PATCH v2 5/7] core.fsync: use batch mode and sync loose objects by default on Windows
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh
From: Neeraj Singh

Git for Windows has defaulted to core.fsyncObjectFiles=true since
September 2017. We turn on syncing of loose object files with batch mode
in upstream Git so that we can get broad coverage of the new code
upstream.

We don't actually do fsyncs in the test suite, since GIT_TEST_FSYNC is
set to 0. However, we do exercise all of the surrounding batch mode code
since GIT_TEST_FSYNC merely makes the maybe_fsync wrapper always appear
to succeed.

Signed-off-by: Neeraj Singh
---
 cache.h           | 4 ++++
 compat/mingw.h    | 3 +++
 config.c          | 2 +-
 git-compat-util.h | 2 ++
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index d1ae51388c9..4d2131e8f4f 100644
--- a/cache.h
+++ b/cache.h
@@ -1031,6 +1031,10 @@ enum fsync_component {
 			      FSYNC_COMPONENT_INDEX | \
 			      FSYNC_COMPONENT_REFERENCE)
 
+#ifndef FSYNC_COMPONENTS_PLATFORM_DEFAULT
+#define FSYNC_COMPONENTS_PLATFORM_DEFAULT FSYNC_COMPONENTS_DEFAULT
+#endif
+
 /*
  * A bitmask indicating which components of the repo should be fsynced.
  */
diff --git a/compat/mingw.h b/compat/mingw.h
index 6074a3d3ced..afe30868c04 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -332,6 +332,9 @@ int mingw_getpagesize(void);
 int win32_fsync_no_flush(int fd);
 #define fsync_no_flush win32_fsync_no_flush
 
+#define FSYNC_COMPONENTS_PLATFORM_DEFAULT (FSYNC_COMPONENTS_DEFAULT | FSYNC_COMPONENT_LOOSE_OBJECT)
+#define FSYNC_METHOD_DEFAULT (FSYNC_METHOD_BATCH)
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/config.c b/config.c
index 0b28f90de8b..c76443dc556 100644
--- a/config.c
+++ b/config.c
@@ -1342,7 +1342,7 @@ static const struct fsync_component_name {
 
 static enum fsync_component parse_fsync_components(const char *var, const char *string)
 {
-	enum fsync_component current = FSYNC_COMPONENTS_DEFAULT;
+	enum fsync_component current = FSYNC_COMPONENTS_PLATFORM_DEFAULT;
 	enum fsync_component positive = 0, negative = 0;
 
 	while (string) {
diff --git a/git-compat-util.h b/git-compat-util.h
index 0892e209a2f..fffe42ce7c1 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1257,11 +1257,13 @@ __attribute__((format (printf, 3, 4))) NORETURN
 void BUG_fl(const char *file, int line, const char *fmt, ...);
 #define BUG(...) BUG_fl(__FILE__, __LINE__, __VA_ARGS__)
 
+#ifndef FSYNC_METHOD_DEFAULT
 #ifdef __APPLE__
 #define FSYNC_METHOD_DEFAULT FSYNC_METHOD_WRITEOUT_ONLY
 #else
 #define FSYNC_METHOD_DEFAULT FSYNC_METHOD_FSYNC
 #endif
+#endif
 
 enum fsync_action {
 	FSYNC_WRITEOUT_ONLY,

From patchwork Sun Mar 20 07:15:59 2022
X-Patchwork-Id: 12786453
Message-Id: <1937746df47eefecfc343e32eb9bf6c0949fb7b9.1647760561.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:15:59 +0000
Subject: [PATCH v2 6/7] core.fsyncmethod: tests for batch mode
From: Neeraj Singh

Add test cases to exercise batch mode for:
 * 'git add'
 * 'git stash'
 * 'git update-index'
 * 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper lib-unique-files.sh. The
goal of this library is to create a tree of files that have different
oids from any other files that may have been created in the current test
repo. This helps us avoid missing validation of an object being added due
to it already being in the repo.

We aren't actually issuing any fsyncs in these tests, since
GIT_TEST_FSYNC is 0, but we still exercise all of the tmp_objdir logic
in bulk-checkin.

Signed-off-by: Neeraj Singh
---
 t/lib-unique-files.sh  | 36 ++++++++++++++++++++++++++++++++++++
 t/t3700-add.sh         | 22 ++++++++++++++++++++++
 t/t3903-stash.sh       | 17 +++++++++++++++++
 t/t5300-pack-object.sh | 32 +++++++++++++++++++++-----------
 4 files changed, 96 insertions(+), 11 deletions(-)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a7de4ca8512
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,36 @@
+# Helper to create files with unique contents
+
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
+#					 each in my_dir, all with unique
+#					 contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+	local counter=0
+	test_tick
+	local basedata=$test_tick
+
+
+	rm -rf $basedir
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir"
+		for j in $(test_seq $files)
+		do
+			counter=$((counter + 1))
+			echo "$basedata.$counter" >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index b1f90ba3250..1f349f52ad3 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -8,6 +8,8 @@ test_description='Test of git add, including the -- option.'
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -34,6 +36,26 @@ test_expect_success \
 	'Test that "git add -- -q" works' \
 	'touch -- -q && git add -- -q'
 
+BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
+
+test_expect_success 'git add: core.fsyncmethod=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git $BATCH_CONFIGURATION add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+test_expect_success 'git update-index: core.fsyncmethod=batch' "
+	test_create_unique_files 2 4 fsync-files2 &&
+	find fsync-files2 ! -type d -print | xargs git $BATCH_CONFIGURATION update-index --add -- &&
+	rm -f fsynced_files2 &&
+	git ls-files --stage fsync-files2/ > fsynced_files2 &&
+	test_line_count = 8 fsynced_files2 &&
+	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 55cd77901a8..5a3996b838f 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 test_expect_success 'usage on cmd and subcommand invalid option' '
 	test_expect_code 129 git stash --invalid-option 2>usage &&
@@ -1462,6 +1463,22 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+
+BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
+
+test_expect_success 'stash with core.fsyncmethod=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git $BATCH_CONFIGURATION stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'git stash succeeds despite directory/file change' '
 	test_create_repo directory_file_switch_v1 &&
 	(
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index a11d61206ad..8e2f73cc68f 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -162,23 +162,25 @@ test_expect_success 'pack-objects with bogus arguments' '
 
 check_unpack () {
 	test_when_finished "rm -rf git2" &&
-	git init --bare git2 &&
-	git -C git2 unpack-objects -n <"$1".pack &&
-	git -C git2 unpack-objects <"$1".pack &&
-	(cd .git && find objects -type f -print) |
-	while read path
-	do
-		cmp git2/$path .git/$path || {
-			echo $path differs.
-			return 1
-		}
-	done
+	git $2 init --bare git2 &&
+	(
+		git $2 -C git2 unpack-objects -n <"$1".pack &&
+		git $2 -C git2 unpack-objects <"$1".pack &&
+		git $2 -C git2 cat-file --batch-check="%(objectname)"
+	) current &&
+	cmp obj-list current
 }
 
 test_expect_success 'unpack without delta' '
 	check_unpack test-1-${packname_1}
 '
 
+BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
+
+test_expect_success 'unpack without delta (core.fsyncmethod=batch)' '
+	check_unpack test-1-${packname_1} "$BATCH_CONFIGURATION"
+'
+
 test_expect_success 'pack with REF_DELTA' '
 	packname_2=$(git pack-objects --progress test-2 stderr) &&
 	check_deltas stderr -gt 0
@@ -188,6 +190,10 @@ test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
+test_expect_success 'unpack with REF_DELTA (core.fsyncmethod=batch)' '
+	check_unpack test-2-${packname_2} "$BATCH_CONFIGURATION"
+'
+
 test_expect_success 'pack with OFS_DELTA' '
 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
 			stderr) &&
@@ -198,6 +204,10 @@ test_expect_success 'unpack with OFS_DELTA' '
 	check_unpack test-3-${packname_3}
 '
 
+test_expect_success 'unpack with OFS_DELTA (core.fsyncmethod=batch)' '
+	check_unpack test-3-${packname_3} "$BATCH_CONFIGURATION"
+'
+
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
 		defined($_ = -s $_) or die for @ARGV;

From patchwork Sun Mar 20 07:16:00 2022
X-Patchwork-Id: 12786451
Message-Id: <624244078c7adc2186941fbfa08cb3afecdece4c.1647760561.git.gitgitgadget@gmail.com>
Date: Sun, 20 Mar 2022 07:16:00 +0000
Subject: [PATCH v2 7/7] core.fsyncmethod: performance tests for add and stash
From: Neeraj Singh

Add basic performance tests for "git add" and "git stash" of a lot of
new objects with various fsync settings. This shows the benefit of batch
mode relative to an ordinary stash command.

Signed-off-by: Neeraj Singh
---
 t/perf/p3700-add.sh   | 59 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 62 +++++++++++++++++++++++++++++++++++++++++++
 t/perf/perf-lib.sh    |  4 +--
 3 files changed, 123 insertions(+), 2 deletions(-)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..2ea78c9449d
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncMethod=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+# Fsync is normally turned off for the test suite.
+GIT_TEST_FSYNC=1
+export GIT_TEST_FSYNC
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for object_fsyncing=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	case $m in
+	false)
+		FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync'
+		;;
+	true)
+		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync'
+		;;
+	batch)
+		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch'
+		;;
+	esac
+
+	test_perf "add $total_files files (object_fsyncing=$m)" "
+		git $FSYNC_CONFIG add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..3526f06cef4
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,62 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncMethod=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+# Fsync is normally turned off for the test suite.
+GIT_TEST_FSYNC=1
+export GIT_TEST_FSYNC
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for object_fsyncing=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	case $m in
+	false)
+		FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync'
+		;;
+	true)
+		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync'
+		;;
+	batch)
+		FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch'
+		;;
+	esac
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (object_fsyncing=$m)" "
+		git $FSYNC_CONFIG stash push -u -- files
+	"
+done
+
+test_done
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index 932105cd12c..d270d1d962a 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -98,8 +98,8 @@ test_perf_create_repo_from () {
 	mkdir -p "$repo/.git"
 	(
 		cd "$source" &&
-		{ cp -Rl "$objects_dir" "$repo/.git/" 2>/dev/null ||
-			cp -R "$objects_dir" "$repo/.git/"; } &&
+		{ cp -Rl "$objects_dir" "$repo/.git/" ||
+			cp -R "$objects_dir" "$repo/.git/" 2>/dev/null;} &&
 
 		# common_dir must come first here, since we want source_git to
 		# take precedence and overwrite any overlapping files