From patchwork Fri Sep 24 20:12:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516763 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29D39C433EF for ; Fri, 24 Sep 2021 20:12:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0C75261038 for ; Fri, 24 Sep 2021 20:12:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345677AbhIXUNu (ORCPT ); Fri, 24 Sep 2021 16:13:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348375AbhIXUNs (ORCPT ); Fri, 24 Sep 2021 16:13:48 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E174C061571 for ; Fri, 24 Sep 2021 13:12:14 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id d6so30675123wrc.11 for ; Fri, 24 Sep 2021 13:12:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=qaojmrOCfEqu729QPdI/H8/1KHs81q8gLQkfiU1S9D8=; b=kec9OKZ1P9bSBgBN12zaB+P4FoMMLpADKxCOhg0LYQQye3+qDWwEmCZlrjJA8cSteN /+BdeAMzj2cBgbOXepy0iIyujQYWefe6A6Ew24tbl3/UGoNn1WfOVazugeormhJ66PUD OBHE6AiQfgVD4T1rvnsTMyMHSIoI+7IzMm6Tjo3R7YMRDkoAQEhBgO8Iavmoe1PWPO6h VR9NBf7vhU0o+2ZYsUnkKXXTwB8gA5xCRaYQuj6JDcwP9EKTFE1Oi+VAv5RLmg6B73nE OlM5Wfkx9KIPOxSw+lDHdR4WcT5LRs2OXXAKVBtbTrNiqE80Ejtx0vkGns/MuTXLyiXn PeVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=qaojmrOCfEqu729QPdI/H8/1KHs81q8gLQkfiU1S9D8=; b=HoDdKGr2GceLwnM5f4eSK9cPu11TrH1N/Hx1Y7QWChGabalwaqzwfpoKQ+5e98vOKk eDDuJodZi192jtCdsnctzRcnycXhaN2PT2oMpwv+tZdsEU7kUyY8eo3u/S7VS4thlTMF durB4QNYajud9RquoCWNlQZIB2FyWGXJ3LkaatodwdtF2/T8qcOaLlKrUd25/k5wJQKv PHwBT9H6oYNCs7kGa/XiPOOb+QDEPDG8XQCsbihAdjzmdiFLXYFCtCawNAgGXcKKXbli QNZma0Bd76H6ii3cEHERd5JbD5sNyjvdKE0WqMXrtMiX6gAdKwWuysgpU5QWz5yFi61+ zQVw== X-Gm-Message-State: AOAM531QdN8ybjWy8x2l6oLKb6q8l1q/GhEkRc8pAPyWgF/jjf84k9z3 Rz0qw7WNXcuL3RMTQJbqkSEZDluSkfc= X-Google-Smtp-Source: ABdhPJzsXg0FF1zpNaMcv2TfU9KGg2oUskcpLagWANcZCQGaFCwhRW7cB9UG46jmwnHRIi9/Hh42Eg== X-Received: by 2002:adf:fac7:: with SMTP id a7mr13531891wrs.341.1632514332962; Fri, 24 Sep 2021 13:12:12 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r27sm8042201wrr.70.2021.09.24.13.12.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 13:12:12 -0700 (PDT) Message-Id: <95315f35a283feabe301b24d2d465a8ae141b139.1632514331.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 24 Sep 2021 20:12:05 +0000 Subject: [PATCH v5 1/7] object-file.c: do not rename in a temp odb Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH being set, create object files with their final names. This avoids an extra rename beyond what is needed to merge the temporary ODB in tmp_objdir_migrate. 
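As a rough illustration of creating a file directly under its final name, here is a minimal sketch. It is not code from this patch: `create_with_final_name` and `demo_exclusive_create` are hypothetical names, and the scratch path under /tmp is an assumption. The flags and the 0444 mode mirror what the diff below passes to open().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` directly under its final name.
 * O_EXCL makes the call fail with EEXIST instead of reusing a file
 * that already exists.
 */
static int create_with_final_name(const char *path)
{
	return open(path, O_CREAT | O_EXCL | O_RDWR, 0444);
}

/* Returns 0 if the first create succeeds and the second fails with EEXIST. */
static int demo_exclusive_create(void)
{
	char path[64];
	int first, second, second_errno;

	/* Assumed scratch location; a real caller would use the object path. */
	snprintf(path, sizeof(path), "/tmp/final_name_demo_%ld", (long)getpid());
	unlink(path); /* ignore errors: the file normally does not exist */

	first = create_with_final_name(path);
	second = create_with_final_name(path);
	second_errno = errno;

	if (first >= 0)
		close(first);
	if (second >= 0)
		close(second);
	unlink(path);
	return (first >= 0 && second < 0 && second_errno == EEXIST) ? 0 : -1;
}
```

Because the second attempt fails rather than overwriting the file, a writer that skips the temporary name cannot silently clobber an existing object.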
Creating an object file with the expected final name should be okay since the git process writing to the temporary object store is the only writer, and it only invokes write_loose_object/create_object_file after checking that the object doesn't exist. Signed-off-by: Neeraj Singh --- environment.c | 4 ++++ object-file.c | 51 ++++++++++++++++++++++++++++++++++---------------- object-store.h | 6 ++++++ repository.c | 2 ++ repository.h | 1 + 5 files changed, 48 insertions(+), 16 deletions(-) diff --git a/environment.c b/environment.c index d6b22ede7ea..d9ba68402e9 100644 --- a/environment.c +++ b/environment.c @@ -177,6 +177,10 @@ void setup_git_env(const char *git_dir) args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT); args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT); args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT); + if (getenv(GIT_QUARANTINE_ENVIRONMENT)) { + args.object_dir_is_temp = 1; + } + repo_set_gitdir(the_repository, git_dir, &args); strvec_clear(&to_free); diff --git a/object-file.c b/object-file.c index a8be8994814..ab593515cec 100644 --- a/object-file.c +++ b/object-file.c @@ -1800,12 +1800,17 @@ static void write_object_file_prepare(const struct git_hash_algo *algo, } /* - * Move the just written object into its final resting place. + * Move the just written object into its final resting place, + * unless it is already there, as indicated by an empty string for + * tmpfile. */ int finalize_object_file(const char *tmpfile, const char *filename) { int ret = 0; + if (!*tmpfile) + goto out; + if (object_creation_mode == OBJECT_CREATION_USES_RENAMES) goto try_rename; else if (link(tmpfile, filename)) @@ -1878,21 +1883,37 @@ static inline int directory_size(const char *filename) } /* - * This creates a temporary file in the same directory as the final - * 'filename' + * This creates a loose object file for the specified object id. 
+ * If we're working in a temporary object directory, the file is + * created with its final filename, otherwise it is created with + * a temporary name and renamed by finalize_object_file. + * If no rename is required, an empty string is returned in tmp. * * We want to avoid cross-directory filename renames, because those * can have problems on various filesystems (FAT, NFS, Coda). */ -static int create_tmpfile(struct strbuf *tmp, const char *filename) +static int create_objfile(const struct object_id *oid, struct strbuf *tmp, + struct strbuf *filename) { - int fd, dirlen = directory_size(filename); + int fd, dirlen, is_retrying = 0; + const char *object_name; + static const int object_mode = 0444; + loose_object_path(the_repository, filename, oid); + dirlen = directory_size(filename->buf); + +retry_create: strbuf_reset(tmp); - strbuf_add(tmp, filename, dirlen); - strbuf_addstr(tmp, "tmp_obj_XXXXXX"); - fd = git_mkstemp_mode(tmp->buf, 0444); - if (fd < 0 && dirlen && errno == ENOENT) { + if (!the_repository->objects->odb->is_temp) { + strbuf_add(tmp, filename->buf, dirlen); + object_name = "tmp_obj_XXXXXX"; + strbuf_addstr(tmp, object_name); + fd = git_mkstemp_mode(tmp->buf, object_mode); + } else { + fd = open(filename->buf, O_CREAT | O_EXCL | O_RDWR, object_mode); + } + + if (fd < 0 && dirlen && errno == ENOENT && !is_retrying) { /* * Make sure the directory exists; note that the contents * of the buffer are undefined after mkstemp returns an @@ -1900,15 +1921,15 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename) * scratch. 
*/ strbuf_reset(tmp); - strbuf_add(tmp, filename, dirlen - 1); + strbuf_add(tmp, filename->buf, dirlen - 1); if (mkdir(tmp->buf, 0777) && errno != EEXIST) return -1; if (adjust_shared_perm(tmp->buf)) return -1; /* Try again */ - strbuf_addstr(tmp, "/tmp_obj_XXXXXX"); - fd = git_mkstemp_mode(tmp->buf, 0444); + is_retrying = 1; + goto retry_create; } return fd; } @@ -1925,14 +1946,12 @@ static int write_loose_object(const struct object_id *oid, char *hdr, static struct strbuf tmp_file = STRBUF_INIT; static struct strbuf filename = STRBUF_INIT; - loose_object_path(the_repository, &filename, oid); - - fd = create_tmpfile(&tmp_file, filename.buf); + fd = create_objfile(oid, &tmp_file, &filename); if (fd < 0) { if (errno == EACCES) return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory()); else - return error_errno(_("unable to create temporary file")); + return error_errno(_("unable to create object file")); } /* Set it up */ diff --git a/object-store.h b/object-store.h index b4dc6668aa2..f8c883a5730 100644 --- a/object-store.h +++ b/object-store.h @@ -26,6 +26,12 @@ struct object_directory { uint32_t loose_objects_subdir_seen[8]; /* 256 bits */ struct oidtree *loose_objects_cache; + /* + * This is a temporary object store, so there is no need to + * create new objects via rename. + */ + int is_temp; + /* * Path to the alternative object store. If this is a relative path, * it is relative to the current working directory. 
diff --git a/repository.c b/repository.c index b2bf44c6faf..a16de04dfa8 100644 --- a/repository.c +++ b/repository.c @@ -80,6 +80,8 @@ void repo_set_gitdir(struct repository *repo, expand_base_dir(&repo->objects->odb->path, o->object_dir, repo->commondir, "objects"); + repo->objects->odb->is_temp = o->object_dir_is_temp; + free(repo->objects->alternate_db); repo->objects->alternate_db = xstrdup_or_null(o->alternate_db); expand_base_dir(&repo->graft_file, o->graft_file, diff --git a/repository.h b/repository.h index 3740c93bc0f..d3711367a6f 100644 --- a/repository.h +++ b/repository.h @@ -162,6 +162,7 @@ struct set_gitdir_args { const char *graft_file; const char *index_file; const char *alternate_db; + int object_dir_is_temp; }; void repo_set_gitdir(struct repository *repo, const char *root, From patchwork Fri Sep 24 20:12:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516765 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71CBFC433F5 for ; Fri, 24 Sep 2021 20:12:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58D8461050 for ; Fri, 24 Sep 2021 20:12:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348395AbhIXUNy (ORCPT ); Fri, 24 Sep 2021 16:13:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348384AbhIXUNt (ORCPT ); Fri, 24 Sep 2021 16:13:49 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5330C061571 for ; Fri, 24 Sep 2021 
13:12:15 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id t18so30887790wrb.0 for ; Fri, 24 Sep 2021 13:12:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=7X2vP95Nrih2/ZYF5MeQkFO3N7Ybz8M6MMt6amHXYUA=; b=CHz9kopmiJRA22IMCnAv1XOPq9NhJIEYwfqb2P6WrLKKlFafAspAkcH5SC6/zoY2xf sFZ6UV7LwAd5BxvXLGROJDh2KAjwZte4Nvned5fpDuKyazHkqJRUUv9hAtc5+iKtYtfX V/bohwn4K4K3ysmo0iinrKjt+YaosbPFAwK38/pj22jV4cTKe3Y50vwxb6Tr6rTBqQOJ lJ9jCESaG5ovBSWwksZgE8VeX72vmQ2tNMNtcfT87uIvESMi6UWNxf6AEIPgRIFuVK5v XFH6FOEHq730OdEu0n8S1SknOlRPM57yPu/yNaUpy1RXxvQ5x3Ki7/YIL1PHPS/NAX5o /MEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=7X2vP95Nrih2/ZYF5MeQkFO3N7Ybz8M6MMt6amHXYUA=; b=PIvyK+1A2nnjxcBIuRFKK42eOQgogzb1DkfkVo4zOPVdMpJg1VL3+2Y1hl/hgpbTaD X4HCys9IsRKHX4h+XHztvXtb4Mq9oWKFCzFNkWNEnDvnJbwfWI8Sat/xPHUhYH/V/zj7 Hws26bselQ6F7GKZbJjzzJPFgC79YuFPsjsDxdawfx1dmpFbmLJsG7quegqo/PQpDfaB S7BjXOId17C1S+oiuWN8Gq/HPdLOD7IXeuUijS4S0B1TtUC3iK0Ebm33kuohRNfFr4LJ S3StHXWYFxr7/V7MBtaGlB16pnA1AF5MCX55CWvuNFcwh/TQyarBr8D5hTkv2gpZTUrg y2CA== X-Gm-Message-State: AOAM533AdKCi9VL2dGTYMGQGMYxCazHCd8Y0V3OBEu/+AB+ao1wx2JE8 dAmnph4EWN4lObTRaohmXEwEUoMpCu0= X-Google-Smtp-Source: ABdhPJx/fGUBRbzJ9gTSh7qVTa7D6aIEaaFKcBM9mlepQntd3L7gnv6p/iJHoaAfPD6jE+AJ8JqvzQ== X-Received: by 2002:a5d:4579:: with SMTP id a25mr13840802wrc.222.1632514334553; Fri, 24 Sep 2021 13:12:14 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t22sm5577974wmj.30.2021.09.24.13.12.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 13:12:14 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Fri, 24 Sep 2021 20:12:06 +0000 Subject: [PATCH v5 2/7] bulk-checkin: rename 
 'state' variable and separate 'plugged' boolean
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
 Christoph Hellwig, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason,
 "Randall S. Becker", Bagas Sanjaya, "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'. This also makes the variable easier to
  find in the debugger, since the name is more distinctive.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.

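The effect of moving the flag out of the struct can be shown with a small stand-alone sketch. The types and functions here are simplified, hypothetical stand-ins for the ones in bulk-checkin.c, not the real code. The only behavior assumed from the series is that finishing a checkin zeroes the whole state struct.

```c
#include <string.h>

/* Old layout (simplified): the flag lives inside the state struct. */
struct state_with_flag {
	unsigned plugged:1;
	unsigned nr_written;
};

/* New layout (simplified): the flag is a separate static variable. */
struct state_without_flag {
	unsigned nr_written;
};

static int plugged_flag;

static void finish_old(struct state_with_flag *s)
{
	memset(s, 0, sizeof(*s)); /* also clears s->plugged */
}

static void finish_new(struct state_without_flag *s)
{
	memset(s, 0, sizeof(*s)); /* plugged_flag is not touched */
}

/* Value of the embedded flag after a finish in the middle of a transaction. */
static int old_layout_plugged_after_finish(void)
{
	struct state_with_flag s;

	memset(&s, 0, sizeof(s));
	s.plugged = 1;
	s.nr_written = 3;
	finish_old(&s);
	return s.plugged;
}

/* Value of the separate flag after the same sequence. */
static int new_layout_plugged_after_finish(void)
{
	struct state_without_flag s;

	memset(&s, 0, sizeof(s));
	plugged_flag = 1;
	s.nr_written = 3;
	finish_new(&s);
	return plugged_flag;
}
```

With the old layout the flag reads as 0 after the finish, so later objects in the same transaction are handled as if no plug were active. With the new layout it stays 1 until it is explicitly cleared.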
Signed-off-by: Neeraj Singh --- bulk-checkin.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index b023d9959aa..f117d62c908 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -10,9 +10,9 @@ #include "packfile.h" #include "object-store.h" -static struct bulk_checkin_state { - unsigned plugged:1; +static int bulk_checkin_plugged; +static struct bulk_checkin_state { char *pack_tmp_name; struct hashfile *f; off_t offset; @@ -21,7 +21,7 @@ static struct bulk_checkin_state { struct pack_idx_entry **written; uint32_t alloc_written; uint32_t nr_written; -} state; +} bulk_checkin_state; static void finish_bulk_checkin(struct bulk_checkin_state *state) { @@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags) { - int status = deflate_to_pack(&state, oid, fd, size, type, + int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type, path, flags); - if (!state.plugged) - finish_bulk_checkin(&state); + if (!bulk_checkin_plugged) + finish_bulk_checkin(&bulk_checkin_state); return status; } void plug_bulk_checkin(void) { - state.plugged = 1; + assert(!bulk_checkin_plugged); + bulk_checkin_plugged = 1; } void unplug_bulk_checkin(void) { - state.plugged = 0; - if (state.f) - finish_bulk_checkin(&state); + assert(bulk_checkin_plugged); + bulk_checkin_plugged = 0; + if (bulk_checkin_state.f) + finish_bulk_checkin(&bulk_checkin_state); } From patchwork Fri Sep 24 20:12:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516769 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE618C433EF for ; Fri, 24 Sep 2021 
20:12:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BB54061038 for ; Fri, 24 Sep 2021 20:12:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348404AbhIXUOA (ORCPT ); Fri, 24 Sep 2021 16:14:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33028 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344407AbhIXUNu (ORCPT ); Fri, 24 Sep 2021 16:13:50 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB839C061571 for ; Fri, 24 Sep 2021 13:12:16 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id i24so14867623wrc.9 for ; Fri, 24 Sep 2021 13:12:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=s58h2yRCtzX2y/fLvrOOSPav3rMBGL4p/lrwppDnCzI=; b=hp2OCD7o0jHk/IcMonc2LN6WkMmUPIa9BgIOffOjo/66dBU3U46moe6DC2z/KKlzMy R8KEoXBiJitq/Sm3GHTY6FyovbjlS8sWC36mYet+Xy7EJyRahUZFtlnCwEGmakmInUR2 RzWMjq09iO1VEz5mlZlYJJQ45EvuY88sx3XwAJqN0WuFLCzl2r8FYjI3wmxbAWQpeRJM azCO8/CHhsSVzornwiu23kRpTFbcAyS7D6UCRU85AM4+rIqbl0D2/0ExVmljeRqrUJgj 2USTlqI8g8qDyHbQYW+fshcxwmJME5+6CWFosBWaKNNVrOwI5kNGfKWXBas25bSnr5Hv sm/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=s58h2yRCtzX2y/fLvrOOSPav3rMBGL4p/lrwppDnCzI=; b=AbzhjRrJoVu5rPjT+KlHKOJu9KJjQvqbfMyULzfS9+a1QYfpjM4VbieNG4H7gkAd5b xcjVU145QcArXuXESavLEI8AQE2C7F6cMbdJPCW+jQwNlJJryTjkztFgaccw+EXsnrt+ MMhc4Gn7e3mM484WWE4rjG20wwmbayQ1fqw3crs9/6AMbKXzOi/Z0bIUFDxu1Ib3Kpmr c0M22WzUgmgP3wpPBqXrs4+Bye2LEEwwVTBy7P+FcVMoV9lHonL4dTEItdNVzUsT8soD 
        XIk9bGTCbBtCayOxsYunXcNF4kyEnlWEbdCZf6MKQ2HMN7KUw4YGkiKgYX5JShse0nwQ
        NxHA==
X-Gm-Message-State: AOAM530cYRXZ+pe8i7RGDoc9dJy/lxrSID/Hc1cwkhrERq3fOZ6NnqEb
        U+oOS1rqbDfKxjQEMQ8pcw1rwRFXipQ=
X-Google-Smtp-Source: ABdhPJxZi1MLDnhA6yobqStHuEmrTXq8vroJsSGTlIIW7GNBbVe12Poa3HHWpga51c9lfv7weRFB5A==
X-Received: by 2002:a1c:ed13:: with SMTP id l19mr3934508wmh.48.1632514335144;
        Fri, 24 Sep 2021 13:12:15 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id r25sm9223333wra.76.2021.09.24.13.12.14
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 24 Sep 2021 13:12:14 -0700 (PDT)
Message-Id:
In-Reply-To:
References:
Date: Fri, 24 Sep 2021 20:12:07 +0000
Subject: [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
 Christoph Hellwig, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason,
 "Randall S. Becker", Bagas Sanjaya, "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which issues a pagecache writeback request reliably after version 5.2.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.

When the new mode is enabled we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

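The per-object writeback request in step 2 could look roughly like the sketch below on Linux. This is an illustration under stated assumptions, not the helper this series adds: `writeout_only` and `demo_writeout` are hypothetical names, the scratch path under /tmp is made up, and the fallback to a plain fsync() on systems without sync_file_range() is an assumption of the sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for sync_file_range() on glibc */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the OS to write the file's dirty pages to the
 * storage device and wait for that writeback, without asking the device
 * to flush its own write cache. Returns 0 on success, -1 on error.
 */
static int writeout_only(int fd)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
	/* offset 0 with nbytes 0 covers the file from start to end */
	return sync_file_range(fd, 0, 0,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	/* Assumed fallback: a full fsync, which may also flush hardware. */
	return fsync(fd);
#endif
}

/* Write a small scratch file and request writeback; returns 0 on success. */
static int demo_writeout(void)
{
	static const char data[] = "loose object payload\n";
	char path[64];
	int fd, ret = -1;

	snprintf(path, sizeof(path), "/tmp/writeout_demo_%ld", (long)getpid());
	fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, data, sizeof(data) - 1) == (ssize_t)(sizeof(data) - 1))
		ret = writeout_only(fd);
	close(fd);
	unlink(path);
	return ret;
}
```

On its own this step makes no durability promise, because the data may still sit in the drive's volatile cache. It only ensures the pages have been handed to the device before a later hardware flush.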
At the end of the entire transaction, when unplugging bulk checkin, we:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the case today,
   but may be a good extension to those components.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc.), such as NTFS, HFS+, or XFS,
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
          This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'
Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  | 0.35  | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  | 0.41  | 1.53

Signed-off-by: Neeraj Singh
---
 Documentation/config/core.txt | 29 ++++++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  1 +
 bulk-checkin.c                | 70 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 +
 cache.h                       |  8 +++-
 config.c                      |  7 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 67 ++++++++++++++++++++++++++++++++-
 object-store.h                | 16 ++++++++
 object.c                      |  2 +-
 tmp-objdir.c                  | 20 +++++++++-
 tmp-objdir.h                  |  6 +++
 wrapper.c                     | 44 ++++++++++++++++++++++
 write-or-die.c                |  2 +-
 18 files changed, 285 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..200b4d9f06e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,29 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls loose objects in the object store, so updates to any
+	refs or the index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each loose object added to the
+  object store.
This is the safest setting that is likely to ensure durability + across all operating systems and file systems that honor the 'fsync' system + call. However, this setting comes with a significant performance cost on + common hardware. Git does not currently fsync parent directories for + newly-added files, so some filesystems may still allow data to be lost on + system crash. +* `batch` enables an experimental mode that uses interfaces available in some + operating systems to write loose object data with a minimal set of FLUSH + CACHE (or equivalent) commands sent to the storage controller. If the + operating system interfaces are not available, this mode behaves the same as + `true`. This mode is expected to be as safe as `true` on macOS for repos + stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or + ReFS. core.preloadIndex:: Enable parallel index preload for operations like 'git diff' diff --git a/Makefile b/Makefile index 429c276058d..326c7607e0f 100644 --- a/Makefile +++ b/Makefile @@ -406,6 +406,8 @@ all:: # # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC. # +# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range. +# # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version # before 2.17) for clock_gettime and CLOCK_MONOTONIC. 
# @@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC endif +ifdef HAVE_SYNC_FILE_RANGE + BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE +endif + ifdef NEEDS_LIBRT EXTLIBS += -lrt endif diff --git a/builtin/add.c b/builtin/add.c index 2244311d485..9d9897cf037 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -678,6 +678,7 @@ int cmd_add(int argc, const char **argv, const char *prefix) if (chmod_arg && pathspec.nr) exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only); + unplug_bulk_checkin(); finish: diff --git a/bulk-checkin.c b/bulk-checkin.c index f117d62c908..957a6238684 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -3,14 +3,20 @@ */ #include "cache.h" #include "bulk-checkin.h" +#include "lockfile.h" #include "repository.h" #include "csum-file.h" #include "pack.h" #include "strbuf.h" +#include "string-list.h" +#include "tmp-objdir.h" #include "packfile.h" #include "object-store.h" static int bulk_checkin_plugged; +static int needs_batch_fsync; + +static struct tmp_objdir *bulk_fsync_objdir; static struct bulk_checkin_state { char *pack_tmp_name; @@ -62,6 +68,34 @@ clear_exit: reprepare_packed_git(the_repository); } +/* + * Cleanup after batch-mode fsync_object_files. + */ +static void do_batch_fsync(void) +{ + /* + * Issue a full hardware flush against a temporary file to ensure + * that all objects are durable before any renames occur. The code in + * fsync_loose_object_bulk_checkin has already issued a writeout + * request, but it has not flushed any writeback cache in the storage + * hardware. 
+ */ + + if (needs_batch_fsync) { + struct strbuf temp_path = STRBUF_INIT; + struct tempfile *temp; + + strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory()); + temp = xmks_tempfile(temp_path.buf); + fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp)); + delete_tempfile(&temp); + strbuf_release(&temp_path); + } + + if (bulk_fsync_objdir) + tmp_objdir_migrate(bulk_fsync_objdir); +} + static int already_written(struct bulk_checkin_state *state, struct object_id *oid) { int i; @@ -256,6 +290,26 @@ static int deflate_to_pack(struct bulk_checkin_state *state, return 0; } +void fsync_loose_object_bulk_checkin(int fd) +{ + assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH); + + /* + * If we have a plugged bulk checkin, we issue a call that + * cleans the filesystem page cache but avoids a hardware flush + * command. Later on we will issue a single hardware flush + * before as part of do_batch_fsync. + */ + if (bulk_checkin_plugged && + git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) { + assert(the_repository->objects->odb->is_temp); + if (!needs_batch_fsync) + needs_batch_fsync = 1; + } else { + fsync_or_die(fd, "loose object file"); + } +} + int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags) @@ -270,6 +324,20 @@ int index_bulk_checkin(struct object_id *oid, void plug_bulk_checkin(void) { assert(!bulk_checkin_plugged); + + /* + * Create a temporary object directory if the current + * object directory is not already temporary. 
+ */ + if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH && + !the_repository->objects->odb->is_temp) { + bulk_fsync_objdir = tmp_objdir_create(); + if (!bulk_fsync_objdir) + die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch")); + + tmp_objdir_replace_main_odb(bulk_fsync_objdir); + } + bulk_checkin_plugged = 1; } @@ -279,4 +347,6 @@ void unplug_bulk_checkin(void) bulk_checkin_plugged = 0; if (bulk_checkin_state.f) finish_bulk_checkin(&bulk_checkin_state); + + do_batch_fsync(); } diff --git a/bulk-checkin.h b/bulk-checkin.h index b26f3dc3b74..08f292379b6 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -6,6 +6,8 @@ #include "cache.h" +void fsync_loose_object_bulk_checkin(int fd); + int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags); diff --git a/cache.h b/cache.h index d23de693680..d1897fe9d92 100644 --- a/cache.h +++ b/cache.h @@ -985,7 +985,13 @@ void reset_shared_repository(void); extern int read_replace_refs; extern char *git_replace_ref_base; -extern int fsync_object_files; +enum fsync_object_files_mode { + FSYNC_OBJECT_FILES_OFF, + FSYNC_OBJECT_FILES_ON, + FSYNC_OBJECT_FILES_BATCH +}; + +extern enum fsync_object_files_mode fsync_object_files; extern int core_preload_index; extern int precomposed_unicode; extern int protect_hfs; diff --git a/config.c b/config.c index cb4a8058bff..1b403e00241 100644 --- a/config.c +++ b/config.c @@ -1509,7 +1509,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb) } if (!strcmp(var, "core.fsyncobjectfiles")) { - fsync_object_files = git_config_bool(var, value); + if (value && !strcmp(value, "batch")) + fsync_object_files = FSYNC_OBJECT_FILES_BATCH; + else if (git_config_bool(var, value)) + fsync_object_files = FSYNC_OBJECT_FILES_ON; + else + fsync_object_files = FSYNC_OBJECT_FILES_OFF; return 0; } diff --git a/config.mak.uname b/config.mak.uname index 76516aaa9a5..e6d482fbcc6 100644 --- 
a/config.mak.uname +++ b/config.mak.uname @@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux) HAVE_CLOCK_MONOTONIC = YesPlease # -lrt is needed for clock_gettime on glibc <= 2.16 NEEDS_LIBRT = YesPlease + HAVE_SYNC_FILE_RANGE = YesPlease HAVE_GETDELIM = YesPlease SANE_TEXT_GREP=-a FREAD_READS_DIRECTORIES = UnfortunatelyYes diff --git a/configure.ac b/configure.ac index 031e8d3fee8..c711037d625 100644 --- a/configure.ac +++ b/configure.ac @@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC], [AC_MSG_RESULT([no]) HAVE_CLOCK_MONOTONIC=]) GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC]) + +# +# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available. +GIT_CHECK_FUNC(sync_file_range, + [HAVE_SYNC_FILE_RANGE=YesPlease], + [HAVE_SYNC_FILE_RANGE]) +GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE]) + # # Define NO_SETITIMER if you don't have setitimer. GIT_CHECK_FUNC(setitimer, diff --git a/environment.c b/environment.c index d9ba68402e9..f318d59e585 100644 --- a/environment.c +++ b/environment.c @@ -43,7 +43,7 @@ const char *git_hooks_path; int zlib_compression_level = Z_BEST_SPEED; int core_compression_level; int pack_compression_level = Z_DEFAULT_COMPRESSION; -int fsync_object_files; +enum fsync_object_files_mode fsync_object_files; size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE; size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT; size_t delta_base_cache_limit = 96 * 1024 * 1024; diff --git a/git-compat-util.h b/git-compat-util.h index b46605300ab..d14e2436276 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN void BUG(const char *fmt, ...); #endif +enum fsync_action { + FSYNC_WRITEOUT_ONLY, + FSYNC_HARDWARE_FLUSH +}; + +int git_fsync(int fd, enum fsync_action action); + /* * Preserves errno, prints a message, but gives no warning for ENOENT. 
* Returns 0 on success, which includes trying to unlink an object that does diff --git a/object-file.c b/object-file.c index ab593515cec..ec22560dd66 100644 --- a/object-file.c +++ b/object-file.c @@ -750,6 +750,60 @@ void add_to_alternates_memory(const char *reference) '\n', NULL, 0); } +struct object_directory *set_temporary_main_odb(const char *dir) +{ + struct object_directory *main_odb, *new_odb, *old_next; + + /* + * Make sure alternates are initialized, or else our entry may be + * overwritten when they are. + */ + prepare_alt_odb(the_repository); + + /* Copy the existing object directory and make it an alternate. */ + main_odb = the_repository->objects->odb; + new_odb = xmalloc(sizeof(*new_odb)); + *new_odb = *main_odb; + *the_repository->objects->odb_tail = new_odb; + the_repository->objects->odb_tail = &(new_odb->next); + new_odb->next = NULL; + + /* + * Reinitialize the main odb with the specified path, being careful + * to keep the next pointer value. + */ + old_next = main_odb->next; + memset(main_odb, 0, sizeof(*main_odb)); + main_odb->next = old_next; + main_odb->is_temp = 1; + main_odb->path = xstrdup(dir); + return new_odb; +} + +void restore_main_odb(struct object_directory *odb) +{ + struct object_directory **prev, *main_odb; + + /* Unlink the saved previous main ODB from the list. */ + prev = &the_repository->objects->odb->next; + assert(*prev); + while (*prev != odb) { + prev = &(*prev)->next; + } + *prev = odb->next; + if (*prev == NULL) + the_repository->objects->odb_tail = prev; + + /* + * Restore the data from the old main odb, being careful to + * keep the next pointer value + */ + main_odb = the_repository->objects->odb; + SWAP(*main_odb, *odb); + main_odb->next = odb->next; + free_object_directory(odb); +} + /* * Compute the exact path an alternate is at and returns it. 
In case of * error NULL is returned and the human readable error is added to `err` @@ -1867,8 +1921,19 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf, /* Finalize a file on disk, and close it. */ static void close_loose_object(int fd) { - if (fsync_object_files) + switch (fsync_object_files) { + case FSYNC_OBJECT_FILES_OFF: + break; + case FSYNC_OBJECT_FILES_ON: fsync_or_die(fd, "loose object file"); + break; + case FSYNC_OBJECT_FILES_BATCH: + fsync_loose_object_bulk_checkin(fd); + break; + default: + BUG("Invalid fsync_object_files mode."); + } + if (close(fd) != 0) die_errno(_("error when closing loose object file")); } diff --git a/object-store.h b/object-store.h index f8c883a5730..9bea14e7f3b 100644 --- a/object-store.h +++ b/object-store.h @@ -62,6 +62,19 @@ void add_to_alternates_file(const char *dir); */ void add_to_alternates_memory(const char *dir); +/* + * Replace the current main object directory with the specified temporary + * object directory. We make a copy of the former main object directory, + * add it as an in-memory alternate, and return the copy so that it can + * be restored via restore_main_odb. + */ +struct object_directory *set_temporary_main_odb(const char *dir); + +/* + * Restore a previous ODB replaced by set_temporary_main_odb. + */ +void restore_main_odb(struct object_directory *odb); + /* * Populate and return the loose object cache array corresponding to the * given object ID. @@ -72,6 +85,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb, /* Empty the loose object cache for the specified object directory. 
*/ void odb_clear_loose_cache(struct object_directory *odb); +/* Clear and free the specified object directory */ +void free_object_directory(struct object_directory *odb); + struct packed_git { struct hashmap_entry packmap_ent; struct packed_git *next; diff --git a/object.c b/object.c index 4e85955a941..98635bc4043 100644 --- a/object.c +++ b/object.c @@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void) return o; } -static void free_object_directory(struct object_directory *odb) +void free_object_directory(struct object_directory *odb) { free(odb->path); odb_clear_loose_cache(odb); diff --git a/tmp-objdir.c b/tmp-objdir.c index b8d880e3626..f027c49db4c 100644 --- a/tmp-objdir.c +++ b/tmp-objdir.c @@ -11,6 +11,7 @@ struct tmp_objdir { struct strbuf path; struct strvec env; + struct object_directory *prev_main_odb; }; /* @@ -50,8 +51,12 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal) * freeing memory; it may cause a deadlock if the signal * arrived while libc's allocator lock is held. 
*/ - if (!on_signal) + if (!on_signal) { + if (t->prev_main_odb) + restore_main_odb(t->prev_main_odb); tmp_objdir_free(t); + } + return err; } @@ -132,6 +137,7 @@ struct tmp_objdir *tmp_objdir_create(void) t = xmalloc(sizeof(*t)); strbuf_init(&t->path, 0); strvec_init(&t->env); + t->prev_main_odb = NULL; strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory()); @@ -269,6 +275,11 @@ int tmp_objdir_migrate(struct tmp_objdir *t) if (!t) return 0; + if (t->prev_main_odb) { + restore_main_odb(t->prev_main_odb); + t->prev_main_odb = NULL; + } + strbuf_addbuf(&src, &t->path); strbuf_addstr(&dst, get_object_directory()); @@ -292,3 +303,10 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t) { add_to_alternates_memory(t->path.buf); } + +void tmp_objdir_replace_main_odb(struct tmp_objdir *t) +{ + if (t->prev_main_odb) + BUG("the main object database is already replaced"); + t->prev_main_odb = set_temporary_main_odb(t->path.buf); +} diff --git a/tmp-objdir.h b/tmp-objdir.h index b1e45b4c75d..4b898add05b 100644 --- a/tmp-objdir.h +++ b/tmp-objdir.h @@ -51,4 +51,10 @@ int tmp_objdir_destroy(struct tmp_objdir *); */ void tmp_objdir_add_as_alternate(const struct tmp_objdir *); +/* + * Replaces the main object store in the current process with the temporary + * object directory and makes the former main object store an alternate. + */ +void tmp_objdir_replace_main_odb(struct tmp_objdir *); + #endif /* TMP_OBJDIR_H */ diff --git a/wrapper.c b/wrapper.c index 7c6586af321..bb4f9f043ce 100644 --- a/wrapper.c +++ b/wrapper.c @@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode) return fd; } +int git_fsync(int fd, enum fsync_action action) +{ + switch (action) { + case FSYNC_WRITEOUT_ONLY: + +#ifdef __APPLE__ + /* + * on macOS, fsync just causes filesystem cache writeback but does not + * flush hardware caches. 
+ */ + return fsync(fd); +#endif + +#ifdef HAVE_SYNC_FILE_RANGE + /* + * On Linux 2.6.17 and above, sync_file_range is the way to issue + * a writeback without a hardware flush. An offset of 0 and size of 0 + * indicate writeout of the entire file, and the wait flags ensure that all + * dirty data is written to the disk (potentially in a disk-side cache) + * before we continue. + */ + + return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE | + SYNC_FILE_RANGE_WRITE | + SYNC_FILE_RANGE_WAIT_AFTER); +#endif + + errno = ENOSYS; + return -1; + + case FSYNC_HARDWARE_FLUSH: + +#ifdef __APPLE__ + return fcntl(fd, F_FULLFSYNC); +#else + return fsync(fd); +#endif + + default: + BUG("unexpected git_fsync(%d) call", action); + } + +} + static int warn_if_unremovable(const char *op, const char *file, int rc) { int err; diff --git a/write-or-die.c b/write-or-die.c index d33e68f6abb..8f53953d4ab 100644 --- a/write-or-die.c +++ b/write-or-die.c @@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...) 
void fsync_or_die(int fd, const char *msg) { - while (fsync(fd) < 0) { + while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) { if (errno != EINTR) die_errno("fsync error on '%s'", msg); } From patchwork Fri Sep 24 20:12:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516771 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3B6CC433F5 for ; Fri, 24 Sep 2021 20:12:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A6A2B61038 for ; Fri, 24 Sep 2021 20:12:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348407AbhIXUOD (ORCPT ); Fri, 24 Sep 2021 16:14:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33032 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345750AbhIXUNu (ORCPT ); Fri, 24 Sep 2021 16:13:50 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0EDAFC0614ED for ; Fri, 24 Sep 2021 13:12:17 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id d6so30675438wrc.11 for ; Fri, 24 Sep 2021 13:12:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=c1quqjzj3unJsCwpkyV0syS23SZ3/Jt3uBM3z5xQXaE=; b=CuZlD8Zf4bvztKYJcIMKrJ9x+KmfIb92c64NCkTXMNnSkxzp04E+iMMhHbw/PuyLN2 0poJo/ELibq0+cTJSvVC+ZHsLm+dXhiRnX0Zfu9VjNjjNzrswynBOSF0JguHsHsMydj5 MIexohUzBZvP/LPfEUJlz6RXM4YFRX51XU2NvbCPtehVDXcI2k+zDv5tQtigpbus0t8p 
sSOCHEJcCaOB6Y9f69HkMuEPbSWxUvGYVOPaZO2rZh4EGNLvDCG1v1kXn5spjLtfLskt XNdssjasa0CNxpo/FCcmq+Z3N55DKrx2mYMVjjyCHijeNEQJwWipic+kHfgPRt73EZkx rLKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=c1quqjzj3unJsCwpkyV0syS23SZ3/Jt3uBM3z5xQXaE=; b=I232Mx6PBs2SAGS7TNZfgziSyUiJhoqg7O8bYFKGA9Csfs/DgWEQdAgQ62boTLIfdd HqdKKuJ2CFvyWaoBOXqCJtWGQRiG67AntyQVEYPlwjPLaydRo7stgmOOKeJbRJgapRZl WG4FdsSeRY9NcrWsr27CK9fkMK5GeHWxX1Syupwi4T8qggWwxefkq1V5nzigrMdJZj3z 7y0+3VczaWj4sUpHce/6+jlGwjVsQfISiU6vNH4UPMmgDKwPoH4ryeQqk8uL8v/PRg0y jqzV0DMWsc0G0u8kY/BM1lAg1ovYVXFgDB6klOketHiW7dx2e2eo2fdCSzFcsDutjRUD b/+w== X-Gm-Message-State: AOAM530pnhTlX1F6L91ya5Ti0OhxdPkZQmnd3n3MUU663A+gCyjqJDUN 0MNw3TvjoeHW9Qh7zkT1yEGMzQAhvAw= X-Google-Smtp-Source: ABdhPJx0u60Eig1bKYNn5q43lPEKDZpW8Hr8BNqhqwsIuus5EAEHyQzF22NFjaPnVnstsS4zv+K0EA== X-Received: by 2002:a7b:c191:: with SMTP id y17mr4020064wmi.122.1632514335636; Fri, 24 Sep 2021 13:12:15 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u25sm10599860wmm.5.2021.09.24.13.12.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 13:12:15 -0700 (PDT) Message-Id: <485b4a767dfa54729c40b32b7fea033aedc870d1.1632514331.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 24 Sep 2021 20:12:08 +0000 Subject: [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. 
Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh The update-index functionality is used internally by 'git stash push' to set up the internal stashed commit. This change enables bulk-checkin in update-index to speed up adding new objects to the object database by leveraging the pack functionality and the new bulk-fsync functionality. This mode is enabled when passing paths to update-index via the --stdin flag, as is done by 'git stash'. There is some risk with this change, since under batch fsync the object files will not be available until update-index has completed entirely. A caller that depends on seeing them earlier is unlikely, since any tool invoking update-index and expecting to see objects would have to synchronize with the update-index process after passing it a file path. Signed-off-by: Neeraj Singh --- builtin/update-index.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/builtin/update-index.c b/builtin/update-index.c index 187203e8bb5..dc7368bb1ee 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -5,6 +5,7 @@ */ #define USE_THE_INDEX_COMPATIBILITY_MACROS #include "cache.h" +#include "bulk-checkin.h" #include "config.h" #include "lockfile.h" #include "quote.h" @@ -1088,6 +1089,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) the_index.updated_skipworktree = 1; + /* we might be adding many objects to the object database */ + plug_bulk_checkin(); + /* * Custom copy of parse_options() because we want to handle * filename arguments as they come. 
@@ -1168,6 +1172,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_release(&buf); } + /* by now we must have added all of the new objects */ + unplug_bulk_checkin(); if (split_index > 0) { if (git_config_get_split_index() == 0) warning(_("core.splitIndex is set to false; " From patchwork Fri Sep 24 20:12:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516773 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 791C0C433F5 for ; Fri, 24 Sep 2021 20:12:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6103761038 for ; Fri, 24 Sep 2021 20:12:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348421AbhIXUOY (ORCPT ); Fri, 24 Sep 2021 16:14:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348399AbhIXUNv (ORCPT ); Fri, 24 Sep 2021 16:13:51 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88345C061613 for ; Fri, 24 Sep 2021 13:12:17 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id i23so30894810wrb.2 for ; Fri, 24 Sep 2021 13:12:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Wx/NRBAYoIBhOKGldB61QZt0lf65wQv8k/Ky8embUfk=; b=dM+xicnb9pPddczEcg4c4VC36EkigoiQk+gth5qeQ4MxE3QZD0BBv9RzM67PxEwLgJ 
W+2zAWewlWikZ6/OI3JwhbYs+tKZWFGaAMj3Ns+s+tIxMRMv80eW59W+hrPFzsUYydxv 3H4PYx1TOe8CySGoxYKQ1HHoyhC2ogZCeU+64qMdXIaeTzguWJA+Mgi+a5jdIwAJwy5b xSXUYpMJjF/aETXo2ZyUDJ2XIrB/MLa+oXs+/NLk9Amn+Zt/hSxS6ODXSzTc0jiMMidI D0X+69cbAH3LWLpDRQ9cF8sZ17uTKJD7PVSi/R53VJfxEx29rhMCxHbyOFxDzvvWiAQe qBrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Wx/NRBAYoIBhOKGldB61QZt0lf65wQv8k/Ky8embUfk=; b=vkRpPpXiYL0QwtX/sOogMyFpb14JOVQTUfc1VQrpFgVi6Vbbrn8hSNna/0mjSQp+wr ELHW9JG/YH4APXqJ6K238ONOcKXFW4a8ysWi//ksK2bWk55OLriuL3yaBVKnGDl18a0X sKPGwReS9IIgPLsyq6diMQG2F0PEkEv5VfbF5z3kJDXgcv7Kz68DSqk0dhOEoULv8FuG bWVJomF4IRj+G1l67GQGJYdOteKqQh4IofIwL7AspDiRi/1hVWm3UQ6t2GNaKbQpnQ7p 4o2YfWidzPK+HACi8yLl+aciwNQ7aCbkOqpdko+LwCbQXfh7TyF0zo46aapzqZfv1Zsp z/UA== X-Gm-Message-State: AOAM530Owa5jYnnz+w1mQ8CRecw+Pb3BO4caAJYpYxggK63FQ3y+Gd60 689aVHAr8/gdlFSd6rmmBXWBy4NBUP4= X-Google-Smtp-Source: ABdhPJxBDoyLKte2l4FgOnnI4IU8WmmPgVdY25lz7cYks9Q6iMH0jbO7DtFGdQpxtNy1QRbhPjI+Ig== X-Received: by 2002:a7b:ce94:: with SMTP id q20mr4085564wmj.83.1632514336238; Fri, 24 Sep 2021 13:12:16 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a77sm9267800wme.28.2021.09.24.13.12.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 13:12:16 -0700 (PDT) Message-Id: <889e76687601e3a1242e57c430a1b7f64ea1d77b.1632514331.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 24 Sep 2021 20:12:09 +0000 Subject: [PATCH v5 5/7] unpack-objects: use the bulk-checkin infrastructure Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. 
Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh The unpack-objects functionality is used by fetch, push, and fast-import to turn the transferred data into object database entries when there are fewer objects than the 'unpacklimit' setting. By enabling bulk-checkin when unpacking objects, we can take advantage of batched fsyncs. Signed-off-by: Neeraj Singh --- builtin/unpack-objects.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 4a9466295ba..51eb4f7b531 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -1,5 +1,6 @@ #include "builtin.h" #include "cache.h" +#include "bulk-checkin.h" #include "config.h" #include "object-store.h" #include "object.h" @@ -503,10 +504,12 @@ static void unpack_all(void) if (!quiet) progress = start_progress(_("Unpacking objects"), nr_objects); CALLOC_ARRAY(obj_list, nr_objects); + plug_bulk_checkin(); for (i = 0; i < nr_objects; i++) { unpack_one(i); display_progress(progress, i + 1); } + unplug_bulk_checkin(); stop_progress(&progress); if (delta_list) From patchwork Fri Sep 24 20:12:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516775 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAC92C433EF for ; Fri, 24 Sep 2021 20:12:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B9D2F61050 for ; Fri, 24 Sep 2021 20:12:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348399AbhIXUO0 (ORCPT ); Fri, 24 Sep 2021 16:14:26 -0400 Received: from lindbergh.monkeyblade.net 
([23.128.96.19]:33042 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348377AbhIXUNv (ORCPT ); Fri, 24 Sep 2021 16:13:51 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 287CAC061571 for ; Fri, 24 Sep 2021 13:12:18 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id u18so30854282wrg.5 for ; Fri, 24 Sep 2021 13:12:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=8AgoE4x6pFlEAgdAVwM5FjKoJLl1pgqUb7ScpI1FJL0=; b=jrOTJeaRVywR4gicMGMvRyxnOONOfgWoG+cgdj6bpUP2V63mUiQaHOvGhA3NhuE2AD /5FIHfNhJknsmsORhuLvDbLTVwJU5DASf0Y17WLOEXM4/NWHaHUx0L5l5R8GkQ/L2Y88 Oox0uW7kanjMRBrQ3FJoNJofVKiyuxgv7QX+vOFzXJvw8KYh8oULoeivYlfFagyzHI/A N8rioPXuIfjDIYK0hMlel4jSl2rOLuEq8OfAgs+wTjQiP1JoZgzFuqKPW8AJLKU+lfI1 Z2CSqf6dQYVhk6JDgJg7lR+NSlK/d9FweZ2EhcmFCF7IFNEz1hxj48C0qY9HaIISqN+B ktRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=8AgoE4x6pFlEAgdAVwM5FjKoJLl1pgqUb7ScpI1FJL0=; b=ZQNx8AUQG6nLFBuN6QNkAZIfoo7+eMiO64tr+6d7t6m307rdyoPrYCpTqbFKo48cpA b9DZ5NobENyKpqIadZypCz6ciGPb34Zz7XIDtCq3wfuwRgSWhsZoaIPVtQSHrgkft3+P jS1k1ylVDktTWPFXAYq/8KGjaXQWbcToPo2/7+UpSF+EMzmvzmfjDZw4yRaC8L3zZWDZ 86FrzzTcFEQbMFS/3ZWKhpW8lNIEDXAsZ+Y1yxknhlRyG78QiZGMnWY4yUbwyWAT8H+n +WHvatytgwgd3kqAyG0dh8dtq+lAtSuUjLuK4ZqJ/anTcabnLkz5VGC2ur+Frrsfpjl2 7diA== X-Gm-Message-State: AOAM533rAFWk1Ccub3qYvEIys1u7QIj3uSe0XWHqiksGA6v6KsxMkYrr d7Uz9ChW8o9trgOyRCOTaTimCNTnUuM= X-Google-Smtp-Source: ABdhPJyTnv5Nf+W/bEWnTz+ul/vpipmBQ0DSBbN0kEDMbBglrbRxlWyTGexDJ7rfk+hKRbgHE+Yw+A== X-Received: by 2002:adf:dd42:: with SMTP id u2mr11910604wrm.39.1632514336790; Fri, 24 Sep 2021 
13:12:16 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q7sm9149217wru.56.2021.09.24.13.12.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 13:12:16 -0700 (PDT) Message-Id: <0f2e3b25759160a31c11836b72b1f3783bf1e372.1632514331.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 24 Sep 2021 20:12:10 +0000 Subject: [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh Add test cases to exercise batch mode for: * 'git add' * 'git stash' * 'git update-index' * 'git unpack-objects' These tests ensure that the added data winds up in the object database. In this change we introduce a new test helper lib-unique-files.sh. The goal of this library is to create a tree of files that have different oids from any other files that may have been created in the current test repo. This helps us avoid missing validation of an object being added due to it already being in the repo. Signed-off-by: Neeraj Singh --- t/lib-unique-files.sh | 36 ++++++++++++++++++++++++++++++++++++ t/t3700-add.sh | 20 ++++++++++++++++++++ t/t3903-stash.sh | 14 ++++++++++++++ t/t5300-pack-object.sh | 30 +++++++++++++++++++----------- 4 files changed, 89 insertions(+), 11 deletions(-) create mode 100644 t/lib-unique-files.sh diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh new file mode 100644 index 00000000000..a7de4ca8512 --- /dev/null +++ b/t/lib-unique-files.sh @@ -0,0 +1,36 @@ +# Helper to create files with unique contents + + +# Create multiple files with unique contents. 
Takes the number of +# directories, the number of files in each directory, and the base +# directory. +# +# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files +# each in my_dir, all with unique +# contents. + +test_create_unique_files() { + test "$#" -ne 3 && BUG "3 param" + + local dirs=$1 + local files=$2 + local basedir=$3 + local counter=0 + test_tick + local basedata=$test_tick + + + rm -rf $basedir + + for i in $(test_seq $dirs) + do + local dir=$basedir/dir$i + + mkdir -p "$dir" + for j in $(test_seq $files) + do + counter=$((counter + 1)) + echo "$basedata.$counter" >"$dir/file$j.txt" + done + done +} diff --git a/t/t3700-add.sh b/t/t3700-add.sh index 4086e1ebbc9..36049a53ff7 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.' . ./test-lib.sh +. $TEST_DIRECTORY/lib-unique-files.sh + # Test the file mode "$1" of the file "$2" in the index. test_mode_in_index () { case "$(git ls-files -s "$2")" in @@ -33,6 +35,24 @@ test_expect_success \ 'Test that "git add -- -q" works' \ 'touch -- -q && git add -- -q' +test_expect_success 'git add: core.fsyncobjectfiles=batch' " + test_create_unique_files 2 4 fsync-files && + git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ && + rm -f fsynced_files && + git ls-files --stage fsync-files/ > fsynced_files && + test_line_count = 8 fsynced_files && + awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e +" + +test_expect_success 'git update-index: core.fsyncobjectfiles=batch' " + test_create_unique_files 2 4 fsync-files2 && + find fsync-files2 ! 
-type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- && + rm -f fsynced_files2 && + git ls-files --stage fsync-files2/ > fsynced_files2 && + test_line_count = 8 fsynced_files2 && + awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e +" + test_expect_success \ 'git add: Test that executable bit is not used if core.filemode=0' \ 'git config core.filemode 0 && diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh index 873aa56e359..2fc819e5584 100755 --- a/t/t3903-stash.sh +++ b/t/t3903-stash.sh @@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. $TEST_DIRECTORY/lib-unique-files.sh diff_cmp () { for i in "$1" "$2" @@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' ' git rev-parse --verify refs/stash:A.t ' +test_expect_success 'stash with core.fsyncobjectfiles=batch' " + test_create_unique_files 2 4 fsync-files && + git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ && + rm -f fsynced_files && + + # The files were untracked, so use the third parent, + # which contains the untracked files + git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files && + test_line_count = 8 fsynced_files && + awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e +" + + test_expect_success 'stash -c stash.useBuiltin=false warning ' ' expected="stash.useBuiltin support has been removed" && diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index e13a8842075..38663dc1393 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -162,23 +162,23 @@ test_expect_success 'pack-objects with bogus arguments' ' check_unpack () { test_when_finished "rm -rf git2" && - git init --bare git2 && - git -C git2 unpack-objects -n <"$1".pack && - git -C git2 unpack-objects <"$1".pack && - (cd .git && find objects -type f -print) | - while read path - do - cmp git2/$path .git/$path || { - echo $path differs. 
- return 1 - } - done + git $2 init --bare git2 && + ( + git $2 -C git2 unpack-objects -n <"$1".pack && + git $2 -C git2 unpack-objects <"$1".pack && + git $2 -C git2 cat-file --batch-check="%(objectname)" + ) <obj-list >current && + cmp obj-list current } test_expect_success 'unpack without delta' ' check_unpack test-1-${packname_1} ' +test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' ' + check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch" +' + test_expect_success 'pack with REF_DELTA' ' packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) && check_deltas stderr -gt 0 @@ -188,6 +188,10 @@ test_expect_success 'unpack with REF_DELTA' ' check_unpack test-2-${packname_2} ' +test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' ' + check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch" +' + test_expect_success 'pack with OFS_DELTA' ' packname_3=$(git pack-objects --progress --delta-base-offset test-3 \ <obj-list 2>stderr) && @@ -198,6 +202,10 @@ test_expect_success 'unpack with OFS_DELTA' ' check_unpack test-3-${packname_3} ' +test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' ' + check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch" +' + test_expect_success 'compare delta flavors' ' perl -e '\'' defined($_ = -s $_) or die for @ARGV; From patchwork Fri Sep 24 20:12:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12516777 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E769BC433FE for ; Fri, 24 Sep 2021 20:12:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D1C6361050 for ; Fri, 24 Sep 2021 20:12:54 +0000 (UTC) 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346191AbhIXUO1 (ORCPT ); Fri, 24 Sep 2021 16:14:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348389AbhIXUNw (ORCPT ); Fri, 24 Sep 2021 16:13:52 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D1B5C06161E for ; Fri, 24 Sep 2021 13:12:18 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id c21so1123960wrb.13 for ; Fri, 24 Sep 2021 13:12:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=THp3oyYMQZ0IyvYbWDVtqI55HStotyepThNgwaAMuqY=; b=OIJ1R/Jm67Mlk9N3HCbJ/F/TnBOnwW4ZXqB2HLl2ns4kwGmdpQ7U8meAyd+u6wN4QI nN2xn1xBVPU2hpSMnkoqcva5MVComt2kIXymt7nYO7/UAyYAvjsYk6S/zqL02mScEtY1 PAPLW5PK5bdOdEiBabKjgI2cqqe5N6QL5sqaYK9H7UyWr+UMGrwxzS89WBcI41dncTvh kDMpVt5rT5tlnjYnagRrzRpNRIRyA189Z5Hg7O9PCBFmN4o7/HSD3l/npSSc+KCMPVZE JjSjErINNozc6oTuXYmLrkKEGjENof7H9id117u0jZNHOJUxZZflZpUr6KEErrXzQKY/ hoew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=THp3oyYMQZ0IyvYbWDVtqI55HStotyepThNgwaAMuqY=; b=xzyttnUmQghiJMSTXX8KzganUYnBPw6/C10hRi7V4tFRP2jHBCPXrx6RY+G/hBk1rt Y4DjQW3DezKYO8NqnUT2Y5kIQAFnvE7PTg+hDT/dVxck56A08j14nLreLeQmObtO/db6 8bVGlnBb22UYvVYzPPGPtYYy9iRK3jmPj2kj8IixZiAyRWM4h56ZBDcAAr4uGooWTCqO lTYqvAHdAgmMVh4kKMC8o4QQJTGIXCV4GHI+3GNQ1z1x8vuAUvx/I4W2a+ygW6lQ2L3B F+abE8Xl9XZloxf8+nvtNpxIi5XLGzaeJbLkGuJWysciDuHNoTP9UoFxw7qZoEK0N+8o bfEg== X-Gm-Message-State: AOAM531Dj+gU94yrLWcTkgOJkuJSpZYrIS8A+bASheXbKvjyhDZ/y3MC hrZN3EwJLgyiRpE37rp0Dn6+8yIGia0= X-Google-Smtp-Source: 
ABdhPJxC/6A85AXjIikb9dP80qTM9F0cHmrRxExUhHO8RucYLJHvZV4p9e4nsYEzlGP5z0yuwn7fBA== X-Received: by 2002:a05:6000:186d:: with SMTP id d13mr13767455wri.169.1632514337267; Fri, 24 Sep 2021 13:12:17 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id w5sm9000152wra.87.2021.09.24.13.12.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Sep 2021 13:12:17 -0700 (PDT) Message-Id: <6543564376a7b06809d51dedbbf4571c359ace3b.1632514331.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 24 Sep 2021 20:12:11 +0000 Subject: [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Neeraj-Personal , Johannes Schindelin , Jeff King , Jeff Hostetler , Christoph Hellwig , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , "Randall S. Becker" , Bagas Sanjaya , "Neeraj K. Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh Add a basic performance test for "git add" and "git stash" of a lot of new objects with various fsync settings. Signed-off-by: Neeraj Singh --- t/perf/p3700-add.sh | 43 ++++++++++++++++++++++++++++++++++++++++ t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 89 insertions(+) create mode 100755 t/perf/p3700-add.sh create mode 100755 t/perf/p3900-stash.sh diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh new file mode 100755 index 00000000000..e93c08a2e70 --- /dev/null +++ b/t/perf/p3700-add.sh @@ -0,0 +1,43 @@ +#!/bin/sh +# +# This test measures the performance of adding new files to the object database +# and index. The test was originally added to measure the effect of the +# core.fsyncObjectFiles=batch mode, which is why we are testing different values +# of that setting explicitly and creating a lot of unique objects. + +test_description="Tests performance of add" + +. ./perf-lib.sh + +. 
$TEST_DIRECTORY/lib-unique-files.sh + +test_perf_default_repo +test_checkout_worktree + +dir_count=10 +files_per_dir=50 +total_files=$((dir_count * files_per_dir)) + +# We need to create the files each time we run the perf test, but +# we do not want to measure the cost of creating the files, so run +# the test once. +if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1 +then + echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2 + GIT_PERF_REPEAT_COUNT=1 +fi + +for m in false true batch +do + test_expect_success "create the files for core.fsyncObjectFiles=$m" ' + git reset --hard && + # create files across directories + test_create_unique_files $dir_count $files_per_dir files + ' + + test_perf "add $total_files files (core.fsyncObjectFiles=$m)" " + git -c core.fsyncobjectfiles=$m add files + " +done + +test_done diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh new file mode 100755 index 00000000000..c9fcd0c03eb --- /dev/null +++ b/t/perf/p3900-stash.sh @@ -0,0 +1,46 @@ +#!/bin/sh +# +# This test measures the performance of adding new files to the object database +# and index. The test was originally added to measure the effect of the +# core.fsyncObjectFiles=batch mode, which is why we are testing different values +# of that setting explicitly and creating a lot of unique objects. + +test_description="Tests performance of stash" + +. ./perf-lib.sh + +. $TEST_DIRECTORY/lib-unique-files.sh + +test_perf_default_repo +test_checkout_worktree + +dir_count=10 +files_per_dir=50 +total_files=$((dir_count * files_per_dir)) + +# We need to create the files each time we run the perf test, but +# we do not want to measure the cost of creating the files, so run +# the test once. 
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1 +then + echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2 + GIT_PERF_REPEAT_COUNT=1 +fi + +for m in false true batch +do + test_expect_success "create the files for core.fsyncObjectFiles=$m" ' + git reset --hard && + # create files across directories + test_create_unique_files $dir_count $files_per_dir files + ' + + # We only stash files in the 'files' subdirectory since + # the perf test infrastructure creates files in the + # current working directory that need to be preserved + test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" " + git -c core.fsyncobjectfiles=$m stash push -u -- files + " +done + +test_done
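For readers who want to try the unique-files idea from patch 6/7 outside Git's test harness, the following is a minimal standalone sketch, not part of the series. The function name create_unique_files is hypothetical. The harness helpers test_tick, test_seq and BUG are replaced here by date, plain while loops and an error message, and 'local' is dropped for POSIX sh portability; all of these are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical standalone variant of test_create_unique_files.
# Assumption: it runs outside Git's test-lib.sh, so test_tick, test_seq
# and BUG are not available and are replaced with plain shell.
# Note: without 'local', the variables below are global to the script.
create_unique_files () {
	if test "$#" -ne 3
	then
		echo "usage: create_unique_files <dirs> <files> <basedir>" >&2
		return 1
	fi

	dirs=$1
	files=$2
	basedir=$3
	counter=0
	# Seed taken from the clock and the shell PID. Unlike test_tick,
	# it can repeat if the function is called twice within the same
	# second from the same shell.
	basedata="$(date +%s).$$"

	rm -rf "$basedir"

	i=1
	while test "$i" -le "$dirs"
	do
		dir=$basedir/dir$i
		mkdir -p "$dir"
		j=1
		while test "$j" -le "$files"
		do
			# Each file gets the seed plus a running counter,
			# so no two files in one call share contents.
			counter=$((counter + 1))
			echo "$basedata.$counter" >"$dir/file$j.txt"
			j=$((j + 1))
		done
		i=$((i + 1))
	done
}

# Usage: 2 directories with 3 files each under ./demo-unique
create_unique_files 2 3 demo-unique
```

Inside the real test suite the helper from t/lib-unique-files.sh should be preferred, since test_tick gives it a seed that advances on every call.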