From patchwork Thu Dec 15 15:06:57 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13074272
Date: Thu, 15 Dec 2022 15:06:57 +0000
Subject: [PATCH v3 1/4] hashfile: allow skipping the hash function
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The hashfile API is useful for generating files that include a trailing
hash of the file's contents up to that point. Using such a hash is
helpful for verifying the file for corruption-at-rest, such as a faulty
drive causing flipped bits.

Git's index file includes this trailing hash, so it uses a 'struct
hashfile' to handle the I/O to the file. This was very convenient to
allow using the hashfile methods during these operations.

However, hashing the file contents during write comes at a performance
penalty. It's slower to hash the bytes on their way to the disk than
without that step. This problem is made worse by the replacement of
hardware-accelerated SHA1 computations with the software-based sha1dc
computation.

This write cost is significant, and the checksum capability is likely
not worth that cost for such a short-lived file. The index is rewritten
frequently and the only time the checksum is checked is during 'git
fsck'. Thus, it would be helpful to allow a user to opt out of the hash
computation.

We first need to allow Git to opt out of the hash computation in the
hashfile API. The buffered writes of the API are still helpful, so it
makes sense to make the change here.

Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the
update_fn and final_fn members of the_hash_algo are skipped. When
finalizing the hashfile, the trailing hash is replaced with the null
hash.

This use of a trailing null hash would be desirable in either case,
since we do not want to special-case a file format to have a different
length depending on whether it was hashed or not. When the final bytes
of a file are all zero, we can infer that it was written without
hashing, and thus that verification is not available as a check for file
consistency. This also means that we could easily toggle hashing for any
file format we desire.

A version of this patch has existed in the microsoft/git fork since
2017 [1] (the linked commit was rebased in 2018, but the original dates
back to January 2017). Here, the change to make the index use this fast
path is delayed until a later change.

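As a rough, self-contained illustration of the 'skip_hash' idea (this is
not Git's csum-file.c), the sketch below uses a hypothetical
'struct toy_hashfile' with a 4-byte FNV-1a checksum standing in for the
real hash algorithm. All names and sizes here are assumptions made for
the example. It shows that the written file has the same length either
way, and that the trailer is all zeroes when hashing is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAWSZ 4       /* hypothetical trailing-checksum size */
#define TOY_CAPACITY 256  /* hypothetical fixed output size */

/* Toy stand-in for 'struct hashfile'; not Git's real layout. */
struct toy_hashfile {
	unsigned char out[TOY_CAPACITY]; /* bytes "written to disk" */
	size_t len;
	uint32_t ctx;   /* running checksum state */
	int skip_hash;  /* when nonzero, the checksum is never updated */
};

static void toy_init(struct toy_hashfile *f, int skip_hash)
{
	memset(f, 0, sizeof(*f));
	f->ctx = 2166136261u; /* FNV-1a offset basis */
	f->skip_hash = skip_hash;
}

/* Append bytes; update the checksum only when hashing is enabled. */
static void toy_write(struct toy_hashfile *f, const void *buf, size_t n)
{
	const unsigned char *p = buf;
	size_t i;

	assert(f->len + n + TOY_RAWSZ <= TOY_CAPACITY);
	if (!f->skip_hash)
		for (i = 0; i < n; i++)
			f->ctx = (f->ctx ^ p[i]) * 16777619u;
	memcpy(f->out + f->len, p, n);
	f->len += n;
}

/* Append the trailer: the checksum, or all zeroes when skipped. */
static size_t toy_finalize(struct toy_hashfile *f)
{
	unsigned char trailer[TOY_RAWSZ] = { 0 };
	int i;

	if (!f->skip_hash)
		for (i = 0; i < TOY_RAWSZ; i++)
			trailer[i] = (unsigned char)(f->ctx >> (24 - 8 * i));
	memcpy(f->out + f->len, trailer, TOY_RAWSZ);
	f->len += TOY_RAWSZ;
	return f->len;
}
```

Writing the same payload through two such writers, one with skip_hash
set, yields outputs of identical length whose payload bytes match; only
the trailer differs.
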
[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

Co-authored-by: Kevin Willford
Signed-off-by: Kevin Willford
Signed-off-by: Derrick Stolee
---
 csum-file.c | 14 +++++++++++---
 csum-file.h |  7 +++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 59ef3398ca2..cce13c0f047 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -45,7 +45,8 @@ void hashflush(struct hashfile *f)
 	unsigned offset = f->offset;
 
 	if (offset) {
-		the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+		if (!f->skip_hash)
+			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -64,7 +65,12 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	int fd;
 
 	hashflush(f);
-	the_hash_algo->final_fn(f->buffer, &f->ctx);
+
+	if (f->skip_hash)
+		hashclr(f->buffer);
+	else
+		the_hash_algo->final_fn(f->buffer, &f->ctx);
+
 	if (result)
 		hashcpy(result, f->buffer);
 	if (flags & CSUM_HASH_IN_STREAM)
@@ -108,7 +114,8 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * the hashfile's buffer. In this block,
 			 * f->offset is necessarily zero.
 			 */
-			the_hash_algo->update_fn(&f->ctx, buf, nr);
+			if (!f->skip_hash)
+				the_hash_algo->update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -153,6 +160,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->tp = tp;
 	f->name = name;
 	f->do_crc = 0;
+	f->skip_hash = 0;
 	the_hash_algo->init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
diff --git a/csum-file.h b/csum-file.h
index 0d29f528fbc..29468067f81 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -20,6 +20,13 @@ struct hashfile {
 	size_t buffer_len;
 	unsigned char *buffer;
 	unsigned char *check_buffer;
+
+	/**
+	 * If set to 1, skip_hash indicates that we should
+	 * not actually compute the hash for this hashfile and
+	 * instead only use it as a buffered write.
+	 */
+	unsigned int skip_hash;
 };
 
 /* Checkpoint */

From patchwork Thu Dec 15 15:06:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13074273
Message-Id: <00738c81a1212970910da6f29fe3ecef87c2ec3a.1671116820.git.gitgitgadget@gmail.com>
Date: Thu, 15 Dec 2022 15:06:58 +0000
Subject: [PATCH v3 2/4] read-cache: add index.skipHash config option
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.

One such critical path is the writing of the index.
This operation is so critical that the sparse index was created
specifically to reduce the size of the index to make these writes (and
reads) faster.

This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple of
reasons:

1. Writes block users. Writing the index takes place in many
   user-blocking foreground operations. The speed improvement directly
   impacts their use. Other file formats are typically written in the
   background (commit-graph, multi-pack-index) or are super-critical to
   correctness (pack-files).

2. Index files are short-lived. It is rare that a user leaves an index
   for a long time with many staged changes. Outside of staged changes,
   the index can be completely destroyed and rewritten with minimal
   impact to the user.

Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

While older Git versions will not recognize the null hash as a special
case, the file still conforms to the index format's structure. Using
this null hash will still allow Git operations to function across older
versions.

The one exception is 'git fsck', which checks the hash of the index
file. This used to be a check on every index read, but was split out to
just 'git fsck' in a33fc72fe91 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.

Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.

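The "skip verification when the trailing hash is all zeroes" rule can be
sketched in isolation as follows. This is only an illustration under
assumed names: 'trailing_hash_is_null' is a hypothetical helper, whereas
the patch itself uses oidread(), oideq() and null_oid() inside
verify_hdr().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if the last 'rawsz' bytes of 'data' are all zero, which is
 * what a file written with the hash skipped would end with.
 */
static int trailing_hash_is_null(const unsigned char *data, size_t size,
				 size_t rawsz)
{
	const unsigned char *start;
	size_t i;

	assert(rawsz > 0 && size >= rawsz);
	start = data + size - rawsz;
	for (i = 0; i < rawsz; i++)
		if (start[i])
			return 0;
	return 1;
}
```

A verifier built on this would return success early when the helper
reports a null trailer and only otherwise recompute and compare the
checksum.
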
As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.

It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.

Signed-off-by: Derrick Stolee
---
 Documentation/config/index.txt | 11 +++++++++++
 read-cache.c                   | 12 +++++++++++-
 t/t1600-index.sh               |  6 ++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 75f3a2d1054..23c7985eb40 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -30,3 +30,14 @@ index.version::
 	Specify the version with which new index files should be
 	initialized. This does not affect existing repositories.
 	If `feature.manyFiles` is enabled, then the default is 4.
+
+index.skipHash::
+	When enabled, do not compute the trailing hash for the index file.
+	This accelerates Git commands that manipulate the index, such as
+	`git add`, `git commit`, or `git status`. Instead of storing the
+	checksum, write a trailing set of bytes with value zero, indicating
+	that the computation was skipped.
++
+If you enable `index.skipHash`, then Git clients older than 2.13.0 will
+refuse to parse the index and Git clients older than 2.40.0 will report an
+error during `git fsck`.
diff --git a/read-cache.c b/read-cache.c
index 46f5e497b14..3f7de8b2e20 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int hdr_version;
+	unsigned char *start, *end;
+	struct object_id oid;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error(_("bad signature 0x%08x"), hdr->hdr_signature);
@@ -1827,10 +1829,16 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	if (!verify_index_checksum)
 		return 0;
 
+	end = (unsigned char *)hdr + size;
+	start = end - the_hash_algo->rawsz;
+	oidread(&oid, start);
+	if (oideq(&oid, null_oid()))
+		return 0;
+
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
 	the_hash_algo->final_fn(hash, &c);
-	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
+	if (!hasheq(hash, end - the_hash_algo->rawsz))
 		return error(_("bad index file sha1 signature"));
 	return 0;
 }
@@ -2918,6 +2926,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
+	git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash);
+
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 010989f90e6..45feb0fc5d8 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -65,6 +65,12 @@ test_expect_success 'out of bounds index.version issues warning' '
 	)
 '
 
+test_expect_success 'index.skipHash config option' '
+	rm -f .git/index &&
+	git -c index.skipHash=true add a &&
+	git fsck
+'
+
 test_index_version () {
 	INDEX_VERSION_CONFIG=$1 &&
 	FEATURE_MANY_FILES=$2 &&

From patchwork Thu Dec 15 15:06:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13074274
Message-Id: <86370af1351a275d5583c76c07b536754e4b6afc.1671116820.git.gitgitgadget@gmail.com>
Date: Thu, 15 Dec 2022 15:06:59 +0000
Subject: [PATCH v3 3/4] test-lib-functions: add helper for trailing hash
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

It can be helpful to check that a file format with a trailing hash has a
specific hash in the final bytes of a written file. This is made more
apparent by recent changes that allow skipping the hash algorithm and
writing a null hash at the end of the file instead.

Add a new test_trailing_hash helper and use it in t1600 to verify that
index.skipHash=true really does skip the hash computation, since 'git
fsck' does not actually verify the hash.
Keep the 'git fsck' call to ensure that any potential future change to
check the index hash does not cause an error in this case.

Signed-off-by: Derrick Stolee
---
 t/t1600-index.sh        | 3 +++
 t/test-lib-functions.sh | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 45feb0fc5d8..55914bc3506 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -68,6 +68,9 @@ test_expect_success 'out of bounds index.version issues warning' '
 test_expect_success 'index.skipHash config option' '
 	rm -f .git/index &&
 	git -c index.skipHash=true add a &&
+	test_trailing_hash .git/index >hash &&
+	echo $(test_oid zero) >expect &&
+	test_cmp expect hash &&
 	git fsck
 '
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 796093a7b32..60308843f8f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1875,3 +1875,11 @@ test_cmp_config_output () {
 	sort config-actual >sorted-actual &&
 	test_cmp sorted-expect sorted-actual
 }
+
+# Given a filename, extract its trailing hash as a hex string
+test_trailing_hash () {
+	local file="$1" &&
+	tail -c $(test_oid rawsz) "$file" |
+		test-tool hexdump |
+		sed "s/ //g"
+}

From patchwork Thu Dec 15 15:07:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13074275
Message-Id: <6490bd445ebea41223eec38784426ad17a0711b6.1671116820.git.gitgitgadget@gmail.com>
Date: Thu, 15 Dec 2022 15:07:00 +0000
Subject: [PATCH v3 4/4] features: feature.manyFiles implies fast index writes
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Jacob Keller, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt in to such behavior:

 1. The feature.manyFiles config option enables some options that are
    helpful for repositories with many files at HEAD.

 2. 'scalar register' and 'scalar reconfigure' set config options that
    optimize for large repositories.

In both of these cases, set index.skipHash=true to gain this speedup.
Add tests that demonstrate that feature.manyFiles=true implies a skipped
hash and that an explicit index.skipHash=false overrides
feature.manyFiles=true.

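The intended precedence can be sketched on its own, assuming a
hypothetical tri-state 'enum cfg_bool' and a hypothetical
'resolve_index_skip_hash' helper (neither exists in Git; the real logic
lives in prepare_repo_settings()): feature.manyFiles only supplies a
default, and an explicit index.skipHash value wins whenever it is set.

```c
#include <assert.h>

/* Hypothetical tri-state for a boolean config key. */
enum cfg_bool { CFG_UNSET = -1, CFG_FALSE = 0, CFG_TRUE = 1 };

/*
 * Resolve the effective skip-hash setting: start from the built-in
 * default (hash the index), let feature.manyFiles raise the default,
 * then let an explicit index.skipHash value override both.
 */
static int resolve_index_skip_hash(enum cfg_bool many_files,
				   enum cfg_bool skip_hash)
{
	int value = 0;

	assert(skip_hash >= CFG_UNSET && skip_hash <= CFG_TRUE);
	if (many_files == CFG_TRUE)
		value = 1;
	if (skip_hash != CFG_UNSET)
		value = (skip_hash == CFG_TRUE);
	return value;
}
```

With this ordering, feature.manyFiles=true alone yields a skipped hash,
while feature.manyFiles=true combined with index.skipHash=false keeps
the checksum.
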
Signed-off-by: Derrick Stolee
---
 Documentation/config/feature.txt |  5 +++++
 read-cache.c                     |  5 ++++-
 repo-settings.c                  |  2 ++
 repository.h                     |  1 +
 scalar.c                         |  1 +
 t/t1600-index.sh                 | 13 ++++++++++++-
 6 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 95975e50912..e52bc6b8584 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -23,6 +23,11 @@ feature.manyFiles::
 	working directory. With many files, commands such as `git status` and
 	`git checkout` may be slow and these new defaults improve performance:
 +
+* `index.skipHash=true` speeds up index writes by not computing a trailing
+  checksum. Note that this will cause Git versions earlier than 2.13.0 to
+  refuse to parse the index and Git versions earlier than 2.40.0 will report
+  a corrupted index during `git fsck`.
++
 * `index.version=4` enables path-prefix compression in the index.
 +
 * `core.untrackedCache=true` enables the untracked cache. This setting assumes
diff --git a/read-cache.c b/read-cache.c
index 3f7de8b2e20..1844953fba7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2926,7 +2926,10 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
-	git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash);
+	if (istate->repo) {
+		prepare_repo_settings(istate->repo);
+		f->skip_hash = istate->repo->settings.index_skip_hash;
+	}
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
diff --git a/repo-settings.c b/repo-settings.c
index 3021921c53d..3dbd3f0e2ec 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -47,6 +47,7 @@ void prepare_repo_settings(struct repository *r)
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
+		r->settings.index_skip_hash = 1;
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
@@ -61,6 +62,7 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
diff --git a/repository.h b/repository.h
index 6c461c5b9de..e8c67ffe165 100644
--- a/repository.h
+++ b/repository.h
@@ -42,6 +42,7 @@ struct repo_settings {
 	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
 
 	int index_version;
+	int index_skip_hash;
 	enum untracked_cache_setting core_untracked_cache;
 
 	int pack_use_sparse;
diff --git a/scalar.c b/scalar.c
index 6c52243cdf1..b49bb8c24ec 100644
--- a/scalar.c
+++ b/scalar.c
@@ -143,6 +143,7 @@ static int set_recommended_config(int reconfigure)
 		{ "credential.validate", "false", 1 }, /* GCM4W-only */
 		{ "gc.auto", "0", 1 },
 		{ "gui.GCWarning", "false", 1 },
+		{ "index.skipHash", "false", 1 },
 		{ "index.threads", "true", 1 },
 		{ "index.version", "4", 1 },
 		{ "merge.stat", "false", 1 },
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 55914bc3506..103743a1c7d 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -71,7 +71,18 @@ test_expect_success 'index.skipHash config option' '
 	test_trailing_hash .git/index >hash &&
 	echo $(test_oid zero) >expect &&
 	test_cmp expect hash &&
-	git fsck
+	git fsck &&
+
+	rm -f .git/index &&
+	git -c feature.manyFiles=true add a &&
+	test_trailing_hash .git/index >hash &&
+	test_cmp expect hash &&
+
+	rm -f .git/index &&
+	git -c feature.manyFiles=true \
+	    -c index.skipHash=false add a &&
+	test_trailing_hash .git/index >hash &&
+	! cmp expect hash
 '
 
 test_index_version () {