From patchwork Fri Jan 6 16:31:53 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13091525
Date: Fri, 06 Jan 2023 16:31:53 +0000
Subject: [PATCH v5 1/4] hashfile: allow skipping the hash function
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The hashfile API is useful for generating files that include a trailing
hash of the file's contents up to that point. Using such a hash is
helpful for verifying the file for corruption-at-rest, such as a faulty
drive causing flipped bits.

Git's index file includes this trailing hash, so it uses a 'struct
hashfile' to handle the I/O to the file. This was very convenient to
allow using the hashfile methods during these operations.

However, hashing the file contents during write comes at a performance
penalty.
It's slower to hash the bytes on their way to the disk than without
that step. This problem is made worse by the replacement of
hardware-accelerated SHA1 computations with the software-based sha1dc
computation.

This write cost is significant, and the checksum capability is likely
not worth that cost for such a short-lived file. The index is rewritten
frequently and the only time the checksum is checked is during 'git
fsck'. Thus, it would be helpful to allow a user to opt out of the hash
computation.

We first need to allow Git to opt out of the hash computation in the
hashfile API. The buffered writes of the API are still helpful, so it
makes sense to make the change here.

Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the
update_fn and final_fn members of the_hash_algo are skipped. When
finalizing the hashfile, the trailing hash is replaced with the null
hash.

This use of a trailing null hash would be desirable in either case,
since we do not want to special case a file format to have a different
length depending on whether it was hashed or not. When the final bytes
of a file are all zero, we can infer that it was written without
hashing, and thus that verification is not available as a check for file
consistency. This also means that we could easily toggle hashing for any
file format we desire.

A version of this patch has existed in the microsoft/git fork since
2017 [1] (the linked commit was rebased in 2018, but the original dates
back to January 2017). Here, the change to make the index use this fast
path is delayed until a later change.
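
The behaviour described above can be sketched in isolation. The block
below is a toy model only, not the csum-file.c code: 'struct
toy_hashfile', the toy_* functions, the 4-byte trailer and the FNV-1a
checksum are all assumptions chosen to keep the example small. It shows
that the payload and the file length are the same either way, and that
only the trailer differs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAWSZ 4   /* hypothetical trailer size; Git uses the_hash_algo->rawsz */
#define TOY_CAP   256 /* hypothetical fixed capacity of the in-memory "file" */

/* Hypothetical stand-in for 'struct hashfile'; it writes to memory, not to a fd. */
struct toy_hashfile {
	unsigned char out[TOY_CAP];
	size_t len;
	uint32_t ctx;  /* FNV-1a state, standing in for the real hash context */
	int skip_hash; /* non-zero: behave as a plain buffered writer */
};

static void toy_init(struct toy_hashfile *f, int skip_hash)
{
	f->len = 0;
	f->ctx = 2166136261u; /* FNV-1a 32-bit offset basis */
	f->skip_hash = skip_hash;
}

/* Append bytes; the checksum is only updated when hashing is enabled. */
static int toy_write(struct toy_hashfile *f, const void *buf, size_t count)
{
	const unsigned char *p = buf;
	size_t i;

	/* Always keep room for the trailer. */
	if (f->len > TOY_CAP - TOY_RAWSZ || count > TOY_CAP - TOY_RAWSZ - f->len)
		return -1;

	if (!f->skip_hash) {
		for (i = 0; i < count; i++) {
			f->ctx ^= p[i];
			f->ctx *= 16777619u; /* FNV-1a 32-bit prime */
		}
	}
	memcpy(f->out + f->len, p, count);
	f->len += count;
	return 0;
}

/* Write the trailer (call once): all zero when skipping, the checksum otherwise. */
static size_t toy_finalize(struct toy_hashfile *f)
{
	unsigned char *trailer = f->out + f->len;

	if (f->skip_hash) {
		memset(trailer, 0, TOY_RAWSZ);
	} else {
		trailer[0] = (unsigned char)(f->ctx >> 24);
		trailer[1] = (unsigned char)(f->ctx >> 16);
		trailer[2] = (unsigned char)(f->ctx >> 8);
		trailer[3] = (unsigned char)f->ctx;
	}
	f->len += TOY_RAWSZ;
	return f->len;
}
```

Keeping the trailer at a fixed size in both modes is what lets a reader
treat the two outputs as the same file format.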

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

Co-authored-by: Kevin Willford
Signed-off-by: Kevin Willford
Signed-off-by: Derrick Stolee
---
 csum-file.c | 14 +++++++++++---
 csum-file.h |  7 +++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 59ef3398ca2..cce13c0f047 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -45,7 +45,8 @@ void hashflush(struct hashfile *f)
 	unsigned offset = f->offset;
 
 	if (offset) {
-		the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+		if (!f->skip_hash)
+			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -64,7 +65,12 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	int fd;
 
 	hashflush(f);
-	the_hash_algo->final_fn(f->buffer, &f->ctx);
+
+	if (f->skip_hash)
+		hashclr(f->buffer);
+	else
+		the_hash_algo->final_fn(f->buffer, &f->ctx);
+
 	if (result)
 		hashcpy(result, f->buffer);
 	if (flags & CSUM_HASH_IN_STREAM)
@@ -108,7 +114,8 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * the hashfile's buffer. In this block,
 			 * f->offset is necessarily zero.
 			 */
-			the_hash_algo->update_fn(&f->ctx, buf, nr);
+			if (!f->skip_hash)
+				the_hash_algo->update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -153,6 +160,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->tp = tp;
 	f->name = name;
 	f->do_crc = 0;
+	f->skip_hash = 0;
 	the_hash_algo->init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
diff --git a/csum-file.h b/csum-file.h
index 0d29f528fbc..793a59da12b 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -20,6 +20,13 @@ struct hashfile {
 	size_t buffer_len;
 	unsigned char *buffer;
 	unsigned char *check_buffer;
+
+	/**
+	 * If non-zero, skip_hash indicates that we should
+	 * not actually compute the hash for this hashfile and
+	 * instead only use it as a buffered write.
+	 */
+	int skip_hash;
 };
 
 /* Checkpoint */

From patchwork Fri Jan 6 16:31:54 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13091526
Date: Fri, 06 Jan 2023 16:31:54 +0000
Subject: [PATCH v5 2/4] read-cache: add index.skipHash config option
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.

One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.

This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple
of reasons:

1. Writes block users. Writing the index takes place in many
   user-blocking foreground operations. The speed improvement directly
   impacts their use. Other file formats are typically written in the
   background (commit-graph, multi-pack-index) or are super-critical to
   correctness (pack-files).

2. Index files are short lived. It is rare that a user leaves an index
   for a long time with many staged changes. Outside of staged changes,
   the index can be completely destroyed and rewritten with minimal
   impact to the user.

Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

We load this config from the repository config given by istate->repo,
with a fallback to the_repository if it is not set.

While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.

The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but the check was limited
to 'git fsck' in a33fc72fe91 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.

Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.
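
The reader-side rule described above (treat an all-zero trailer as "not
hashed" and skip the comparison) can be sketched on its own. This is an
illustration under stated assumptions, not the verify_hdr() code: the
toy_* names and the enum are hypothetical, and the caller is assumed to
have already computed the hash of the content that precedes the trailer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum toy_trailer_result {
	TOY_TRAILER_OK,      /* trailer matches the computed hash */
	TOY_TRAILER_SKIPPED, /* trailer is all zero: written without hashing */
	TOY_TRAILER_BAD      /* mismatch, or the buffer is too short */
};

/* Return 1 if the rawsz bytes at p are all zero. */
static int toy_is_null_hash(const unsigned char *p, size_t rawsz)
{
	size_t i;

	for (i = 0; i < rawsz; i++)
		if (p[i])
			return 0;
	return 1;
}

/*
 * Check the last rawsz bytes of data against 'computed', the hash the
 * caller calculated over the first size - rawsz bytes.
 */
static enum toy_trailer_result toy_verify_trailer(const unsigned char *data,
						  size_t size,
						  const unsigned char *computed,
						  size_t rawsz)
{
	const unsigned char *start;

	if (size < rawsz)
		return TOY_TRAILER_BAD;
	start = data + size - rawsz;

	/* A null trailer means there is nothing to verify. */
	if (toy_is_null_hash(start, rawsz))
		return TOY_TRAILER_SKIPPED;

	return memcmp(start, computed, rawsz) ? TOY_TRAILER_BAD : TOY_TRAILER_OK;
}
```

Checking for the null trailer before comparing is what lets the skipped
case avoid the cost of hashing the whole file on the read side as well,
provided the caller defers computing the hash until after that check.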

As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.

Test this new config option, both at a command-line level and within a
submodule. The confirmation is currently limited to confirm that 'git
fsck' does not complain about the index. Future updates will make this
test more robust.

It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.

Signed-off-by: Derrick Stolee
---
 Documentation/config/index.txt | 11 +++++++++++
 read-cache.c                   | 13 ++++++++++++-
 t/t1600-index.sh               | 14 ++++++++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 75f3a2d1054..23c7985eb40 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -30,3 +30,14 @@ index.version::
 	Specify the version with which new index files should be
 	initialized. This does not affect existing repositories.
 	If `feature.manyFiles` is enabled, then the default is 4.
+
+index.skipHash::
+	When enabled, do not compute the trailing hash for the index file.
+	This accelerates Git commands that manipulate the index, such as
+	`git add`, `git commit`, or `git status`. Instead of storing the
+	checksum, write a trailing set of bytes with value zero, indicating
+	that the computation was skipped.
++
+If you enable `index.skipHash`, then Git clients older than 2.13.0 will
+refuse to parse the index and Git clients older than 2.40.0 will report an
+error during `git fsck`.
diff --git a/read-cache.c b/read-cache.c
index 46f5e497b14..d73a81e41ae 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int hdr_version;
+	unsigned char *start, *end;
+	struct object_id oid;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error(_("bad signature 0x%08x"), hdr->hdr_signature);
@@ -1827,10 +1829,16 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	if (!verify_index_checksum)
 		return 0;
 
+	end = (unsigned char *)hdr + size;
+	start = end - the_hash_algo->rawsz;
+	oidread(&oid, start);
+	if (oideq(&oid, null_oid()))
+		return 0;
+
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
 	the_hash_algo->final_fn(hash, &c);
-	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
+	if (!hasheq(hash, start))
 		return error(_("bad index file sha1 signature"));
 	return 0;
 }
@@ -2915,9 +2923,12 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
+	struct repository *r = istate->repo ? istate->repo : the_repository;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
+	repo_config_get_bool(r, "index.skiphash", &f->skip_hash);
+
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 010989f90e6..98c5a83db73 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -65,6 +65,20 @@ test_expect_success 'out of bounds index.version issues warning' '
 	)
 '
 
+test_expect_success 'index.skipHash config option' '
+	rm -f .git/index &&
+	git -c index.skipHash=true add a &&
+	git fsck &&
+
+	test_commit start &&
+	git -c protocol.file.allow=always submodule add ./ sub &&
+	git config index.skipHash false &&
+	git -C sub config index.skipHash true &&
+	>sub/file &&
+	git -C sub add a &&
+	git -C sub fsck
+'
+
 test_index_version () {
 	INDEX_VERSION_CONFIG=$1 &&
 	FEATURE_MANY_FILES=$2 &&

From patchwork Fri Jan 6 16:31:55 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13091527
Date: Fri, 06 Jan 2023 16:31:55 +0000
Subject: [PATCH v5 3/4] test-lib-functions: add helper for trailing hash
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

It can be helpful to check that a file format with a trailing hash has a
specific hash in the final bytes of a written file. This is made more
apparent by recent changes that allow skipping the hash algorithm and
writing a null hash at the end of the file instead.

Add a new test_trailing_hash helper and use it in t1600 to verify that
index.skipHash=true really does skip the hash computation, since 'git
fsck' does not actually verify the hash. This confirms that when the
config is disabled explicitly in a super project but enabled in a
submodule, then the use of repo_config_get_bool() loads config from the
correct repository in the case of 'git add'. There are other cases where
istate->repo is NULL and thus this config is loaded instead from
the_repository, but that's due to many different code paths initializing
index_state structs in their own way.

Keep the 'git fsck' call to ensure that any potential future change to
check the index hash does not cause an error in this case.
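
For readers who want the same check outside the shell test suite, the
idea behind the helper (take the last rawsz bytes and render them as
hex) can be sketched in C. This is an illustration only: the function
name toy_trailing_hash_hex and its signature are hypothetical and do not
exist in Git, and it works on an in-memory buffer rather than a file.

```c
#include <stddef.h>

/*
 * Render the last rawsz bytes of data as lowercase hex into out, which
 * must have room for 2 * rawsz + 1 bytes. Returns 0 on success and -1
 * if the buffer is shorter than rawsz.
 */
static int toy_trailing_hash_hex(const unsigned char *data, size_t size,
				 size_t rawsz, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	if (size < rawsz)
		return -1;

	for (i = 0; i < rawsz; i++) {
		unsigned char b = data[size - rawsz + i];

		out[2 * i] = digits[b >> 4];
		out[2 * i + 1] = digits[b & 0x0f];
	}
	out[2 * rawsz] = '\0';
	return 0;
}
```

A file written with hashing skipped would produce a string of 2 * rawsz
'0' characters here, which is the comparison the test relies on.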

Signed-off-by: Derrick Stolee
---
 t/t1600-index.sh        | 5 +++++
 t/test-lib-functions.sh | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 98c5a83db73..2f792bb8ffa 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -68,6 +68,9 @@ test_expect_success 'out of bounds index.version issues warning' '
 test_expect_success 'index.skipHash config option' '
 	rm -f .git/index &&
 	git -c index.skipHash=true add a &&
+	test_trailing_hash .git/index >hash &&
+	echo $(test_oid zero) >expect &&
+	test_cmp expect hash &&
 	git fsck &&
 
 	test_commit start &&
@@ -76,6 +79,8 @@ test_expect_success 'index.skipHash config option' '
 	git -C sub config index.skipHash true &&
 	>sub/file &&
 	git -C sub add a &&
+	test_trailing_hash .git/modules/sub/index >hash &&
+	test_cmp expect hash &&
 	git -C sub fsck
 '
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 796093a7b32..60308843f8f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1875,3 +1875,11 @@ test_cmp_config_output () {
 	sort config-actual >sorted-actual &&
 	test_cmp sorted-expect sorted-actual
 }
+
+# Given a filename, extract its trailing hash as a hex string
+test_trailing_hash () {
+	local file="$1" &&
+	tail -c $(test_oid rawsz) "$file" |
+		test-tool hexdump |
+		sed "s/ //g"
+}

From patchwork Fri Jan 6 16:31:56 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13091528
Message-Id: <1beedcb5ba112b05f6ef9cd306378b140ca73f8c.1673022717.git.gitgitgadget@gmail.com>
Date: Fri, 06 Jan 2023 16:31:56 +0000
Subject: [PATCH v5 4/4] features: feature.manyFiles implies fast index writes
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt in to such behavior:

 1. The feature.manyFiles config option enables some options that are
    helpful for repositories with many files at HEAD.

 2. 'scalar register' and 'scalar reconfigure' set config options that
    optimize for large repositories.

In both of these cases, set index.skipHash=true to gain this speedup.
Add tests that demonstrate the proper way that index.skipHash=false can
override feature.manyFiles=true.
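
The precedence described above (feature.manyFiles supplies a default,
and an explicit index.skipHash value wins over it) can be sketched as a
small pure function. This is a simplified model rather than the
repo-settings.c code: toy_resolve_skip_hash and the -1 "unset"
convention are assumptions made for the illustration.

```c
/*
 * many_files:         non-zero if feature.manyFiles is enabled.
 * explicit_skip_hash: -1 if index.skipHash is unset, otherwise the
 *                     configured boolean (0 or non-zero).
 * Returns the effective skip-hash setting (0 or 1).
 */
static int toy_resolve_skip_hash(int many_files, int explicit_skip_hash)
{
	int skip_hash = 0; /* built-in default: compute the hash */

	/* The feature flag only changes the default. */
	if (many_files)
		skip_hash = 1;

	/* An explicit setting overrides whatever the default was. */
	if (explicit_skip_hash >= 0)
		skip_hash = explicit_skip_hash ? 1 : 0;

	return skip_hash;
}
```

Under this model, feature.manyFiles=true with index.skipHash=false
resolves to 0, which is the combination where a real hash is expected
in the index trailer.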

Signed-off-by: Derrick Stolee
---
 Documentation/config/feature.txt |  5 +++++
 read-cache.c                     |  3 ++-
 repo-settings.c                  |  2 ++
 repository.h                     |  1 +
 scalar.c                         |  1 +
 t/t1600-index.sh                 | 11 +++++++++++
 6 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 95975e50912..e52bc6b8584 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -23,6 +23,11 @@ feature.manyFiles::
 	working directory. With many files, commands such as `git status` and
 	`git checkout` may be slow and these new defaults improve performance:
 +
+* `index.skipHash=true` speeds up index writes by not computing a trailing
+  checksum. Note that this will cause Git versions earlier than 2.13.0 to
+  refuse to parse the index and Git versions earlier than 2.40.0 will report
+  a corrupted index during `git fsck`.
++
 * `index.version=4` enables path-prefix compression in the index.
 +
 * `core.untrackedCache=true` enables the untracked cache. This setting assumes
diff --git a/read-cache.c b/read-cache.c
index d73a81e41ae..feefa0f68ba 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2927,7 +2927,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
-	repo_config_get_bool(r, "index.skiphash", &f->skip_hash);
+	prepare_repo_settings(r);
+	f->skip_hash = r->settings.index_skip_hash;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
diff --git a/repo-settings.c b/repo-settings.c
index 3021921c53d..3dbd3f0e2ec 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -47,6 +47,7 @@ void prepare_repo_settings(struct repository *r)
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
+		r->settings.index_skip_hash = 1;
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
@@ -61,6 +62,7 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
diff --git a/repository.h b/repository.h
index 6c461c5b9de..e8c67ffe165 100644
--- a/repository.h
+++ b/repository.h
@@ -42,6 +42,7 @@ struct repo_settings {
 	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
 
 	int index_version;
+	int index_skip_hash;
 	enum untracked_cache_setting core_untracked_cache;
 
 	int pack_use_sparse;
diff --git a/scalar.c b/scalar.c
index 6c52243cdf1..b49bb8c24ec 100644
--- a/scalar.c
+++ b/scalar.c
@@ -143,6 +143,7 @@ static int set_recommended_config(int reconfigure)
 		{ "credential.validate", "false", 1 }, /* GCM4W-only */
 		{ "gc.auto", "0", 1 },
 		{ "gui.GCWarning", "false", 1 },
+		{ "index.skipHash", "false", 1 },
 		{ "index.threads", "true", 1 },
 		{ "index.version", "4", 1 },
 		{ "merge.stat", "false", 1 },
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 2f792bb8ffa..0ebbae13058 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -73,6 +73,17 @@ test_expect_success 'index.skipHash config option' '
 	test_cmp expect hash &&
 	git fsck &&
 
+	rm -f .git/index &&
+	git -c feature.manyFiles=true add a &&
+	test_trailing_hash .git/index >hash &&
+	cmp expect hash &&
+
+	rm -f .git/index &&
+	git -c feature.manyFiles=true \
+		-c index.skipHash=false add a &&
+	test_trailing_hash .git/index >hash &&
+	! cmp expect hash &&
+
 	test_commit start &&
 	git -c protocol.file.allow=always submodule add ./ sub &&
 	git config index.skipHash false &&