From patchwork Mon May 17 12:24:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12261763
Date: Mon, 17 May 2021 12:24:49 +0000
Subject: [PATCH v2 1/4] hashfile: use write_in_full()
Fcc: Sent
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com, git@jeffhostetler.com, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The flush() logic in csum-file.c was introduced originally by c38138c
(git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26)
and a portion of the
logic performs a similar function to write_in_full() in wrapper.c. The
history of write_in_full() is full of moves and renames, but it was
originally introduced by 7230e6d (Add write_or_die(), a helper
function, 2006-08-21).

The point of these sections of code is to flush a write buffer using
xwrite() and report errors in the case of disk space issues or other
generic input/output errors. The logic in flush() can interpret the
output of write_in_full() to provide the correct error messages to
users.

The hashfile API has additional logic to update the progress indicator
between calls to xwrite(). This was introduced by 2a128d6 (add
throughput display to git-push, 2007-10-30). Since the hashfile's
buffer is only 8KB, these extra progress updates are probably not
necessary. Instead, update the progress only when write_in_full()
completes.

Signed-off-by: Derrick Stolee
---
 csum-file.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 7510950fa3e9..3c26389d4914 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -25,21 +25,14 @@ static void flush(struct hashfile *f, const void *buf, unsigned int count)
 			die("sha1 file '%s' validation error", f->name);
 	}
 
-	for (;;) {
-		int ret = xwrite(f->fd, buf, count);
-		if (ret > 0) {
-			f->total += ret;
-			display_throughput(f->tp, f->total);
-			buf = (char *) buf + ret;
-			count -= ret;
-			if (count)
-				continue;
-			return;
-		}
-		if (!ret)
+	if (write_in_full(f->fd, buf, count) < 0) {
+		if (errno == ENOSPC)
 			die("sha1 file '%s' write error. Out of diskspace", f->name);
 		die_errno("sha1 file '%s' write error", f->name);
 	}
+
+	f->total += count;
+	display_throughput(f->tp, f->total);
 }
 
 void hashflush(struct hashfile *f)

From patchwork Mon May 17 12:24:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12261767
Message-Id: <9dc602f6c4221e2259778842ec3d1eda57508333.1621254292.git.gitgitgadget@gmail.com>
Date: Mon, 17 May 2021 12:24:50 +0000
Subject: [PATCH v2 2/4] csum-file.h: increase hashfile buffer size
Fcc: Sent
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com, git@jeffhostetler.com, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The hashfile API has used a hard-coded buffer size of 8KB ever since it
was introduced in c38138c (git-pack-objects: write the pack files with
a SHA1 csum, 2005-06-26). It performs a similar function to the hashing
buffers in read-cache.c, but that code was updated from 8KB to 128KB in
f279894 (read-cache: make the index write buffer size 128K,
2021-02-18). The justification there was that do_write_index() improves
from 1.02s to 0.72s.

There is a buffer, check_buffer, that is used to verify the check_fd
file descriptor. When this buffer increases to 128K to fit the data
being flushed, it causes the stack to overflow the limits placed in the
test suite. By moving this to a static buffer, we stop using stack data
for this purpose, but we lose some thread-safety. This change makes it
unsafe to write to multiple hashfiles across different threads.

By adding a new trace2 region in the chunk-format API, we can see that
the writing portion of 'git multi-pack-index write' drops from ~1.49s
to ~1.47s on a Linux machine. These effects may be more pronounced or
diminished on other filesystems. The end-to-end timing is too noisy to
show a definitive change either way.
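
For readers who want to see the static-buffer trade-off in isolation,
here is a minimal, self-contained sketch. It is not the code from this
patch: sketch_read_in_full() and sketch_verify_buffer() are hypothetical
stand-ins for Git's read_in_full() and the verification helper, and the
sketch returns -1 where Git would die().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BUFFER_SIZE (128 * 1024)

/* Read until count bytes arrive, EOF is hit, or an error occurs. */
static ssize_t sketch_read_in_full(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	while (total < count) {
		ssize_t ret = read(fd, p + total, count - total);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (!ret)
			break;	/* EOF: caller sees a short count */
		total += (size_t)ret;
	}
	return (ssize_t)total;
}

/*
 * Return 0 if the next count bytes read from fd equal buf, -1 otherwise.
 *
 * The scratch buffer is static, so it costs no stack space, but every
 * caller shares it: two threads verifying at once would corrupt each
 * other's data.
 */
static int sketch_verify_buffer(int fd, const void *buf, size_t count)
{
	static unsigned char check_buffer[SKETCH_BUFFER_SIZE];
	ssize_t ret;

	/* Assumed contract: callers never verify more than one buffer. */
	assert(count <= sizeof(check_buffer));

	ret = sketch_read_in_full(fd, check_buffer, count);
	if (ret < 0 || (size_t)ret != count)
		return -1;	/* read error or truncated file */
	return memcmp(buf, check_buffer, count) ? -1 : 0;
}
```

Declaring check_buffer without "static" would put 128KB on the stack of
every call, which is the overflow the message above describes.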
Signed-off-by: Derrick Stolee
---
 chunk-format.c | 12 ++++++++----
 csum-file.c    | 28 +++++++++++++++++-----------
 csum-file.h    |  4 +++-
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/chunk-format.c b/chunk-format.c
index da191e59a29d..1c3dca62e205 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -58,9 +58,11 @@ void add_chunk(struct chunkfile *cf,
 
 int write_chunkfile(struct chunkfile *cf, void *data)
 {
-	int i;
+	int i, result = 0;
 	uint64_t cur_offset = hashfile_total(cf->f);
 
+	trace2_region_enter("chunkfile", "write", the_repository);
+
 	/* Add the table of contents to the current offset */
 	cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE;
 
@@ -77,10 +79,10 @@ int write_chunkfile(struct chunkfile *cf, void *data)
 
 	for (i = 0; i < cf->chunks_nr; i++) {
 		off_t start_offset = hashfile_total(cf->f);
-		int result = cf->chunks[i].write_fn(cf->f, data);
+		result = cf->chunks[i].write_fn(cf->f, data);
 
 		if (result)
-			return result;
+			goto cleanup;
 
 		if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size)
 			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
@@ -88,7 +90,9 @@ int write_chunkfile(struct chunkfile *cf, void *data)
 			    hashfile_total(cf->f) - start_offset);
 	}
 
-	return 0;
+cleanup:
+	trace2_region_leave("chunkfile", "write", the_repository);
+	return result;
 }
 
 int read_table_of_contents(struct chunkfile *cf,
diff --git a/csum-file.c b/csum-file.c
index 3c26389d4914..bd9939c49efa 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -11,19 +11,25 @@
 #include "progress.h"
 #include "csum-file.h"
 
+static void verify_buffer_or_die(struct hashfile *f,
+				 const void *buf,
+				 unsigned int count)
+{
+	static unsigned char check_buffer[WRITE_BUFFER_SIZE];
+	ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
+
+	if (ret < 0)
+		die_errno("%s: sha1 file read error", f->name);
+	if (ret != count)
+		die("%s: sha1 file truncated", f->name);
+	if (memcmp(buf, check_buffer, count))
+		die("sha1 file '%s' validation error", f->name);
+}
+
 static void flush(struct hashfile *f, const void *buf, unsigned int count)
 {
-	if (0 <= f->check_fd && count) {
-		unsigned char check_buffer[8192];
-		ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
-
-		if (ret < 0)
-			die_errno("%s: sha1 file read error", f->name);
-		if (ret != count)
-			die("%s: sha1 file truncated", f->name);
-		if (memcmp(buf, check_buffer, count))
-			die("sha1 file '%s' validation error", f->name);
-	}
+	if (0 <= f->check_fd && count)
+		verify_buffer_or_die(f, buf, count);
 
 	if (write_in_full(f->fd, buf, count) < 0) {
 		if (errno == ENOSPC)
diff --git a/csum-file.h b/csum-file.h
index e54d53d1d0b3..bc88eb86fc28 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -5,6 +5,8 @@
 
 struct progress;
 
+#define WRITE_BUFFER_SIZE (128 * 1024)
+
 /* A SHA1-protected file */
 struct hashfile {
 	int fd;
@@ -16,7 +18,7 @@ struct hashfile {
 	const char *name;
 	int do_crc;
 	uint32_t crc32;
-	unsigned char buffer[8192];
+	unsigned char buffer[WRITE_BUFFER_SIZE];
 };
 
 /* Checkpoint */

From patchwork Mon May 17 12:24:51 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12261765
Date: Mon, 17 May 2021 12:24:51 +0000
Subject: [PATCH v2 3/4] read-cache: use hashfile instead of git_hash_ctx
Fcc: Sent
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com, git@jeffhostetler.com, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The do_write_index() method in read-cache.c has its own hashing logic
and buffering mechanism. Specifically, the ce_write() method was
introduced by 4990aadc (Speed up index file writing by chunking it
nicely, 2005-04-20) and similar mechanisms were introduced a few months
later in c38138cd (git-pack-objects: write the pack files with a SHA1
csum, 2005-06-26). Given that timing, in the early days of the Git
codebase, I suspect that these roughly equivalent code paths were never
unified simply because the opportunity got lost in the shuffle. The
hashfile API has since been used extensively in other file formats,
such as pack-indexes, multi-pack-indexes, and commit-graphs. Therefore,
it seems prudent to unify the index writing code to use the same
mechanism.

I discovered this disparity while trying to create a new index format
that uses the chunk-format API. That API uses a hashfile as its base, so
it is incompatible with the custom code in read-cache.c.

This rewrite is rather straightforward. It replaces all writes to the
temporary file with writes to the hashfile struct. This takes care of
many of the direct interactions with the_hash_algo.
There are still some remaining: the extension headers are hashed for use
in the End of Index Entries (EOIE) extension. This use of the
git_hash_ctx is left as-is. There are multiple reasons to not use a
hashfile here, including the fact that the data is not actually written
to a file; it only feeds a hash computation. These hashes do not block
our adoption of the chunk-format API in a future change to the index,
so leave it as-is.

The internals of the algorithms are mostly identical. Previously, the
hashfile API used a smaller 8KB buffer instead of the 128KB buffer from
read-cache.c. The previous change already unified these sizes.

There is one subtle point: we do not pass CSUM_FSYNC to the
finalize_hashfile() method, which differs from most consumers of the
hashfile API. The extra fsync() call indicated by this flag causes a
significant performance degradation that is noticeable for quick
commands that write the index, such as "git add". Other consumers can
absorb this cost with their more complicated data structure
organization, and, further, writing structures such as pack-files and
commit-graphs is rarely in the critical path for common user
interactions.

Some static methods become orphaned in this diff, so I marked them as
MAYBE_UNUSED. The diff is much harder to read if they are deleted during
this change. Instead, they will be deleted in the following change.

In addition to the test suite passing, I computed indexes using the
previous binaries and the binaries compiled after this change, and found
the index data to be exactly equal. Finally, I did extensive performance
testing of "git update-index --force-write" on repos of various sizes,
including one with over 2 million paths at HEAD. These tests
demonstrated less than 1% difference in behavior, so the performance
should be considered identical.
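
To make the write path described above concrete, here is a rough,
self-contained sketch of the pattern: buffer the data, fold each
flushed buffer into a running hash, write it out in full, and append
the hash as a trailer at the end without calling fsync(). Everything
named toy_* is hypothetical and is not Git's API. FNV-1a stands in for
the_hash_algo, the trailer is only 4 bytes, and errors are returned
rather than passed to die().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BUFFER_SIZE (128 * 1024)

struct toy_hashfile {
	int fd;
	uint32_t hash;		/* FNV-1a state, a stand-in for git_hash_ctx */
	uint64_t total;		/* bytes already handed to write() */
	size_t offset;		/* bytes waiting in buffer */
	unsigned char buffer[TOY_BUFFER_SIZE];
};

/* Keep calling write() until all of buf is written or an error occurs. */
static int toy_write_in_full(int fd, const void *buf, size_t count)
{
	const char *p = buf;

	while (count) {
		ssize_t ret = write(fd, p, count);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (!ret) {
			errno = ENOSPC;	/* no progress: report out of space */
			return -1;
		}
		p += ret;
		count -= (size_t)ret;
	}
	return 0;
}

static void toy_init(struct toy_hashfile *f, int fd)
{
	f->fd = fd;
	f->hash = 2166136261u;	/* FNV-1a offset basis */
	f->total = 0;
	f->offset = 0;
}

/* Fold the pending bytes into the hash, then write them out. */
static int toy_flush(struct toy_hashfile *f)
{
	size_t i;

	for (i = 0; i < f->offset; i++)
		f->hash = (f->hash ^ f->buffer[i]) * 16777619u;
	if (toy_write_in_full(f->fd, f->buffer, f->offset) < 0)
		return -1;
	f->total += f->offset;
	f->offset = 0;
	return 0;
}

/* Copy data into the buffer, flushing each time the buffer fills up. */
static int toy_hashwrite(struct toy_hashfile *f, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len) {
		size_t room = sizeof(f->buffer) - f->offset;
		size_t n = len < room ? len : room;

		assert(f->offset < sizeof(f->buffer));
		memcpy(f->buffer + f->offset, p, n);
		f->offset += n;
		p += n;
		len -= n;
		if (f->offset == sizeof(f->buffer) && toy_flush(f) < 0)
			return -1;
	}
	return 0;
}

/* Flush, then append the big-endian hash as a trailer. No fsync(). */
static int toy_finalize(struct toy_hashfile *f, uint32_t *result)
{
	unsigned char trailer[4];

	if (toy_flush(f) < 0)
		return -1;
	trailer[0] = (unsigned char)(f->hash >> 24);
	trailer[1] = (unsigned char)(f->hash >> 16);
	trailer[2] = (unsigned char)(f->hash >> 8);
	trailer[3] = (unsigned char)f->hash;
	*result = f->hash;
	return toy_write_in_full(f->fd, trailer, sizeof(trailer));
}
```

Because the byte count is tracked by the writer itself, a caller can
ask for the current logical offset (total plus offset here) instead of
combining lseek() with a separate buffer length.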
Signed-off-by: Derrick Stolee --- read-cache.c | 137 +++++++++++++++++++++++++-------------------------- 1 file changed, 66 insertions(+), 71 deletions(-) diff --git a/read-cache.c b/read-cache.c index fbf3a4ce7d5d..1c0bda81e7e7 100644 --- a/read-cache.c +++ b/read-cache.c @@ -26,6 +26,7 @@ #include "thread-utils.h" #include "progress.h" #include "sparse-index.h" +#include "csum-file.h" /* Mask for the name length in ce_flags in the on-disk index */ @@ -2519,6 +2520,7 @@ int repo_index_has_changes(struct repository *repo, static unsigned char write_buffer[WRITE_BUFFER_SIZE]; static unsigned long write_buffer_len; +MAYBE_UNUSED static int ce_write_flush(git_hash_ctx *context, int fd) { unsigned int buffered = write_buffer_len; @@ -2531,6 +2533,7 @@ static int ce_write_flush(git_hash_ctx *context, int fd) return 0; } +MAYBE_UNUSED static int ce_write(git_hash_ctx *context, int fd, void *data, unsigned int len) { while (len) { @@ -2553,19 +2556,24 @@ static int ce_write(git_hash_ctx *context, int fd, void *data, unsigned int len) return 0; } -static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx *eoie_context, - int fd, unsigned int ext, unsigned int sz) +static int write_index_ext_header(struct hashfile *f, + git_hash_ctx *eoie_f, + unsigned int ext, + unsigned int sz) { - ext = htonl(ext); - sz = htonl(sz); - if (eoie_context) { - the_hash_algo->update_fn(eoie_context, &ext, 4); - the_hash_algo->update_fn(eoie_context, &sz, 4); + hashwrite_be32(f, ext); + hashwrite_be32(f, sz); + + if (eoie_f) { + ext = htonl(ext); + sz = htonl(sz); + the_hash_algo->update_fn(eoie_f, &ext, sizeof(ext)); + the_hash_algo->update_fn(eoie_f, &sz, sizeof(sz)); } - return ((ce_write(context, fd, &ext, 4) < 0) || - (ce_write(context, fd, &sz, 4) < 0)) ? 
-1 : 0; + return 0; } +MAYBE_UNUSED static int ce_flush(git_hash_ctx *context, int fd, unsigned char *hash) { unsigned int left = write_buffer_len; @@ -2667,11 +2675,10 @@ static void copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, } } -static int ce_write_entry(git_hash_ctx *c, int fd, struct cache_entry *ce, +static int ce_write_entry(struct hashfile *f, struct cache_entry *ce, struct strbuf *previous_name, struct ondisk_cache_entry *ondisk) { int size; - int result; unsigned int saved_namelen; int stripped_name = 0; static unsigned char padding[8] = { 0x00 }; @@ -2687,11 +2694,9 @@ static int ce_write_entry(git_hash_ctx *c, int fd, struct cache_entry *ce, if (!previous_name) { int len = ce_namelen(ce); copy_cache_entry_to_ondisk(ondisk, ce); - result = ce_write(c, fd, ondisk, size); - if (!result) - result = ce_write(c, fd, ce->name, len); - if (!result) - result = ce_write(c, fd, padding, align_padding_size(size, len)); + hashwrite(f, ondisk, size); + hashwrite(f, ce->name, len); + hashwrite(f, padding, align_padding_size(size, len)); } else { int common, to_remove, prefix_size; unsigned char to_remove_vi[16]; @@ -2705,13 +2710,10 @@ static int ce_write_entry(git_hash_ctx *c, int fd, struct cache_entry *ce, prefix_size = encode_varint(to_remove, to_remove_vi); copy_cache_entry_to_ondisk(ondisk, ce); - result = ce_write(c, fd, ondisk, size); - if (!result) - result = ce_write(c, fd, to_remove_vi, prefix_size); - if (!result) - result = ce_write(c, fd, ce->name + common, ce_namelen(ce) - common); - if (!result) - result = ce_write(c, fd, padding, 1); + hashwrite(f, ondisk, size); + hashwrite(f, to_remove_vi, prefix_size); + hashwrite(f, ce->name + common, ce_namelen(ce) - common); + hashwrite(f, padding, 1); strbuf_splice(previous_name, common, to_remove, ce->name + common, ce_namelen(ce) - common); @@ -2721,7 +2723,7 @@ static int ce_write_entry(git_hash_ctx *c, int fd, struct cache_entry *ce, ce->ce_flags &= ~CE_STRIP_NAME; } - return result; + 
return 0; } /* @@ -2833,8 +2835,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, int strip_extensions) { uint64_t start = getnanotime(); - int newfd = tempfile->fd; - git_hash_ctx c, eoie_c; + struct hashfile *f; + git_hash_ctx *eoie_c = NULL; struct cache_header hdr; int i, err = 0, removed, extended, hdr_version; struct cache_entry **cache = istate->cache; @@ -2848,6 +2850,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct index_entry_offset_table *ieot = NULL; int nr, nr_threads; + f = hashfd(tempfile->fd, tempfile->filename.buf); + for (i = removed = extended = 0; i < entries; i++) { if (cache[i]->ce_flags & CE_REMOVE) removed++; @@ -2876,9 +2880,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, hdr.hdr_version = htonl(hdr_version); hdr.hdr_entries = htonl(entries - removed); - the_hash_algo->init_fn(&c); - if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) - return -1; + hashwrite(f, &hdr, sizeof(hdr)); if (!HAVE_THREADS || git_config_get_index_threads(&nr_threads)) nr_threads = 1; @@ -2913,12 +2915,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, } } - offset = lseek(newfd, 0, SEEK_CUR); - if (offset < 0) { - free(ieot); - return -1; - } - offset += write_buffer_len; + offset = hashfile_total(f); + nr = 0; previous_name = (hdr_version == 4) ? 
&previous_name_buf : NULL; @@ -2953,14 +2951,10 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (previous_name) previous_name->buf[0] = 0; nr = 0; - offset = lseek(newfd, 0, SEEK_CUR); - if (offset < 0) { - free(ieot); - return -1; - } - offset += write_buffer_len; + + offset = hashfile_total(f); } - if (ce_write_entry(&c, newfd, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0) + if (ce_write_entry(f, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0) err = -1; if (err) @@ -2979,14 +2973,16 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, return err; } - /* Write extension data here */ - offset = lseek(newfd, 0, SEEK_CUR); - if (offset < 0) { - free(ieot); - return -1; + offset = hashfile_total(f); + + /* + * The extension headers must be hashed on their own for the + * EOIE extension. Create a hashfile here to compute that hash. + */ + if (offset && record_eoie()) { + CALLOC_ARRAY(eoie_c, 1); + the_hash_algo->init_fn(eoie_c); } - offset += write_buffer_len; - the_hash_algo->init_fn(&eoie_c); /* * Lets write out CACHE_EXT_INDEXENTRYOFFSETTABLE first so that we @@ -2999,8 +2995,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct strbuf sb = STRBUF_INIT; write_ieot_extension(&sb, ieot); - err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_INDEXENTRYOFFSETTABLE, sb.len) < 0 - || ce_write(&c, newfd, sb.buf, sb.len) < 0; + err = write_index_ext_header(f, eoie_c, CACHE_EXT_INDEXENTRYOFFSETTABLE, sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); free(ieot); if (err) @@ -3012,9 +3008,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct strbuf sb = STRBUF_INIT; err = write_link_extension(&sb, istate) < 0 || - write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_LINK, - sb.len) < 0 || - ce_write(&c, newfd, sb.buf, sb.len) < 0; + write_index_ext_header(f, eoie_c, 
CACHE_EXT_LINK, + sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); if (err) return -1; @@ -3023,8 +3019,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct strbuf sb = STRBUF_INIT; cache_tree_write(&sb, istate->cache_tree); - err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_TREE, sb.len) < 0 - || ce_write(&c, newfd, sb.buf, sb.len) < 0; + err = write_index_ext_header(f, eoie_c, CACHE_EXT_TREE, sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); if (err) return -1; @@ -3033,9 +3029,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct strbuf sb = STRBUF_INIT; resolve_undo_write(&sb, istate->resolve_undo); - err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_RESOLVE_UNDO, - sb.len) < 0 - || ce_write(&c, newfd, sb.buf, sb.len) < 0; + err = write_index_ext_header(f, eoie_c, CACHE_EXT_RESOLVE_UNDO, + sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); if (err) return -1; @@ -3044,9 +3040,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct strbuf sb = STRBUF_INIT; write_untracked_extension(&sb, istate->untracked); - err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_UNTRACKED, - sb.len) < 0 || - ce_write(&c, newfd, sb.buf, sb.len) < 0; + err = write_index_ext_header(f, eoie_c, CACHE_EXT_UNTRACKED, + sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); if (err) return -1; @@ -3055,14 +3051,14 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct strbuf sb = STRBUF_INIT; write_fsmonitor_extension(&sb, istate); - err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0 - || ce_write(&c, newfd, sb.buf, sb.len) < 0; + err = write_index_ext_header(f, eoie_c, CACHE_EXT_FSMONITOR, sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); if (err) return -1; } if (istate->sparse_index) { - if 
(write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_SPARSE_DIRECTORIES, 0) < 0) + if (write_index_ext_header(f, eoie_c, CACHE_EXT_SPARSE_DIRECTORIES, 0) < 0) return -1; } @@ -3072,19 +3068,18 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, * read. Write it out regardless of the strip_extensions parameter as we need it * when loading the shared index. */ - if (offset && record_eoie()) { + if (eoie_c) { struct strbuf sb = STRBUF_INIT; - write_eoie_extension(&sb, &eoie_c, offset); - err = write_index_ext_header(&c, NULL, newfd, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0 - || ce_write(&c, newfd, sb.buf, sb.len) < 0; + write_eoie_extension(&sb, eoie_c, offset); + err = write_index_ext_header(f, NULL, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0; + hashwrite(f, sb.buf, sb.len); strbuf_release(&sb); if (err) return -1; } - if (ce_flush(&c, newfd, istate->oid.hash)) - return -1; + finalize_hashfile(f, istate->oid.hash, CSUM_HASH_IN_STREAM); if (close_tempfile_gently(tempfile)) { error(_("could not close '%s'"), get_tempfile_path(tempfile)); return -1; From patchwork Mon May 17 12:24:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12261769 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FF1EC433B4 for ; Mon, 17 May 2021 12:25:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6E0DC61285 for ; 
Mon, 17 May 2021 12:25:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237005AbhEQM0T (ORCPT ); Mon, 17 May 2021 08:26:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236993AbhEQM0N (ORCPT ); Mon, 17 May 2021 08:26:13 -0400 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 281F0C06175F for ; Mon, 17 May 2021 05:24:57 -0700 (PDT) Received: by mail-wm1-x32f.google.com with SMTP id k5-20020a05600c4785b0290174b7945d7eso3057530wmo.2 for ; Mon, 17 May 2021 05:24:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Jm8DwKKeRwrgmlPJOYqnlTtvUko/WwlTnENBi71vOBs=; b=fiifZaShk11BhD/62LOhBKg6kaJ+t2kedz5bPf5QfsuL4uPU8ACIX2hhRJZQCcds0r wKbHUilf1ssJmT8DeGDOqtyHDXakdVATBEkcvmWpYemtEIPYyqHMdIAxEBzCDSDUW4/o MYyzFKpJF6L4aWQL0N1gaL7MCmwK13xZLT2cEQdI/I1ZmKydlm4v8mPCDGZmnClmXKPa nkAvPvG7MTGL5WTJ8DeXD2eF5L8n7Z7hdI+TXT6DYVm3nWozgnTb6Dt6N+Q/SQzoqAg5 TUWEc1wVbrT6Wqa3fSigi8yJwuGOjJmFoOTtMQlx16M/eclyqxznSFwuHuEfEYyaGEC+ hgNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Jm8DwKKeRwrgmlPJOYqnlTtvUko/WwlTnENBi71vOBs=; b=NGlCLMkdRHMcG7Y/z65gIZAmn2W40/AgfaMmim8lh17gHgXS+sokVSDyStCRADqt3t s3AwUA/PQs7cnigbBVgIsw4E/Fr5XRscOdmDPatz2hiUBFSzKU0wyhjCGD+HUAuM1a1+ kFIqDL4f+7f8S08tT88GIx4ZDu1KXdZZ8W7xq7nV3ruCzkCXPJbEL6EtIfbswuFQ9WDt ozrjQTIW3fQ5+XLcbD+FrQilA8myNXBOUsYGqkIGYu0bB52EzaJVN9iFdIPiUjb0Pf8+ PopbXkNwfgihnACCuZWJNYawoUqj38SK3zJUgvbeEyf2fTMIZDgPChy8qerLy952+E7T jFqQ== X-Gm-Message-State: 
	AOAM533Ncdq4qKtyFoNyBhfU41QfHP4BjMZSEaFs7UxXa+ovgHlVz8So
	IkK8zaeeTD9BuxjOG2BkPJU1kSGWzSQ=
X-Google-Smtp-Source: 
 ABdhPJzs4zr+NazYTd8NwuvmKLQ/6bmddIAo70UeyVzsC6/c8rr/B+xrfqGzvhPW48N0LWrQCRhNbA==
X-Received: by 2002:a1c:b002:: with SMTP id
	z2mr63889147wme.26.1621254295981;
	Mon, 17 May 2021 05:24:55 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
	by smtp.gmail.com with ESMTPSA id 1sm9815701wmj.23.2021.05.17.05.24.55
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 17 May 2021 05:24:55 -0700 (PDT)
Message-Id: <4b3814eb4c80617d3b180dd576348a3b2f26b35e.1621254292.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Mon, 17 May 2021 12:24:52 +0000
Subject: [PATCH v2 4/4] read-cache: delete unused hashing methods
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com,
	git@jeffhostetler.com, Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

These methods were marked as MAYBE_UNUSED in the previous change to
avoid a complicated diff. Delete them entirely, since we now use the
hashfile API instead of this custom hashing code.

Signed-off-by: Derrick Stolee
---
 read-cache.c | 64 ----------------------------------------------------
 1 file changed, 64 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 1c0bda81e7e7..aa6751c6a092 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2516,46 +2516,6 @@ int repo_index_has_changes(struct repository *repo,
 	}
 }
 
-#define WRITE_BUFFER_SIZE (128 * 1024)
-static unsigned char write_buffer[WRITE_BUFFER_SIZE];
-static unsigned long write_buffer_len;
-
-MAYBE_UNUSED
-static int ce_write_flush(git_hash_ctx *context, int fd)
-{
-	unsigned int buffered = write_buffer_len;
-	if (buffered) {
-		the_hash_algo->update_fn(context, write_buffer, buffered);
-		if (write_in_full(fd, write_buffer, buffered) < 0)
-			return -1;
-		write_buffer_len = 0;
-	}
-	return 0;
-}
-
-MAYBE_UNUSED
-static int ce_write(git_hash_ctx *context, int fd, void *data, unsigned int len)
-{
-	while (len) {
-		unsigned int buffered = write_buffer_len;
-		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
-		if (partial > len)
-			partial = len;
-		memcpy(write_buffer + buffered, data, partial);
-		buffered += partial;
-		if (buffered == WRITE_BUFFER_SIZE) {
-			write_buffer_len = buffered;
-			if (ce_write_flush(context, fd))
-				return -1;
-			buffered = 0;
-		}
-		write_buffer_len = buffered;
-		len -= partial;
-		data = (char *) data + partial;
-	}
-	return 0;
-}
-
 static int write_index_ext_header(struct hashfile *f,
 				  git_hash_ctx *eoie_f,
 				  unsigned int ext,
@@ -2573,30 +2533,6 @@ static int write_index_ext_header(struct hashfile *f,
 	return 0;
 }
 
-MAYBE_UNUSED
-static int ce_flush(git_hash_ctx *context, int fd, unsigned char *hash)
-{
-	unsigned int left = write_buffer_len;
-
-	if (left) {
-		write_buffer_len = 0;
-		the_hash_algo->update_fn(context, write_buffer, left);
-	}
-
-	/* Flush first if not enough space for hash signature */
-	if (left + the_hash_algo->rawsz > WRITE_BUFFER_SIZE) {
-		if (write_in_full(fd, write_buffer, left) < 0)
-			return -1;
-		left = 0;
-	}
-
-	/* Append the hash signature at the end */
-	the_hash_algo->final_fn(write_buffer + left, context);
-	hashcpy(hash, write_buffer + left);
-	left += the_hash_algo->rawsz;
-	return (write_in_full(fd, write_buffer, left) < 0) ? -1 : 0;
-}
-
 static void ce_smudge_racily_clean_entry(struct index_state *istate,
 					 struct cache_entry *ce)
 {
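
For readers who want to see the deleted scheme in isolation: ce_write(), ce_write_flush() and ce_flush() together buffered the outgoing bytes, fed each full buffer to the hash before writing it, and appended the final hash as a trailer. The sketch below illustrates that pattern only; it is not code from git. The sketch_* names, the in-memory sink, the deliberately tiny 8-byte buffer and the FNV-1a checksum (a toy stand-in for the_hash_algo) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUFFER_SIZE 8	/* tiny on purpose so flushes are easy to trigger */
#define SKETCH_SINK_SIZE 256

/* Hypothetical in-memory "file", standing in for the fd the old code wrote to. */
struct sketch_sink {
	unsigned char data[SKETCH_SINK_SIZE];
	size_t len;
};

/* Hypothetical writer state; the deleted code kept this in file-scope statics. */
struct sketch_writer {
	unsigned char buffer[SKETCH_BUFFER_SIZE];
	size_t buffered;
	uint32_t hash;		/* running FNV-1a state, a toy stand-in for git_hash_ctx */
	struct sketch_sink *sink;
};

static int sink_write(struct sketch_sink *sink, const void *p, size_t n)
{
	if (n > SKETCH_SINK_SIZE - sink->len)
		return -1;	/* sink full: report failure like a short write */
	memcpy(sink->data + sink->len, p, n);
	sink->len += n;
	return 0;
}

static void hash_update(uint32_t *h, const unsigned char *p, size_t n)
{
	while (n--) {
		*h ^= *p++;
		*h *= 16777619u;	/* FNV-1a 32-bit prime */
	}
}

void sketch_init(struct sketch_writer *w, struct sketch_sink *sink)
{
	w->buffered = 0;
	w->hash = 2166136261u;	/* FNV-1a 32-bit offset basis */
	w->sink = sink;
	sink->len = 0;
}

/* Hash the buffered bytes, then write them out (compare ce_write_flush()). */
int sketch_flush_buffer(struct sketch_writer *w)
{
	if (w->buffered) {
		hash_update(&w->hash, w->buffer, w->buffered);
		if (sink_write(w->sink, w->buffer, w->buffered) < 0)
			return -1;
		w->buffered = 0;
	}
	return 0;
}

/* Copy data into the buffer, flushing whenever it fills (compare ce_write()). */
int sketch_write(struct sketch_writer *w, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len) {
		size_t partial = SKETCH_BUFFER_SIZE - w->buffered;
		if (partial > len)
			partial = len;
		memcpy(w->buffer + w->buffered, p, partial);
		w->buffered += partial;
		assert(w->buffered <= SKETCH_BUFFER_SIZE);
		if (w->buffered == SKETCH_BUFFER_SIZE && sketch_flush_buffer(w) < 0)
			return -1;
		len -= partial;
		p += partial;
	}
	return 0;
}

/*
 * Flush what is left and append the checksum as a big-endian trailer
 * (compare ce_flush()). Simplified: the trailer is written separately
 * instead of being packed into the same buffer as the remaining data.
 */
int sketch_finalize(struct sketch_writer *w, uint32_t *out)
{
	unsigned char trailer[4];

	if (sketch_flush_buffer(w) < 0)
		return -1;
	trailer[0] = (w->hash >> 24) & 0xff;
	trailer[1] = (w->hash >> 16) & 0xff;
	trailer[2] = (w->hash >> 8) & 0xff;
	trailer[3] = w->hash & 0xff;
	*out = w->hash;
	return sink_write(w->sink, trailer, sizeof(trailer));
}
```

A caller would run sketch_init(), any number of sketch_write() calls, then sketch_finalize(). Because the checksum is updated as a stream, splitting the same bytes across several sketch_write() calls produces the same trailer, which is the property that lets a caller emit a file piecemeal.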