From patchwork Thu Dec 3 16:16:40 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949101
Message-Id: <191b0afba825fceb2721effeb6783961bf42b59e.1607012215.git.gitgitgadget@gmail.com>
From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:40 +0000
Subject: [PATCH 01/15] commit-graph: anonymize data in chunk_write_fn
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee

In preparation for creating an API around file formats using chunks and
tables of contents, prepare the commit-graph write code to use prototypes
that will match this new
API.

Specifically, convert chunk_write_fn to take a "void *data" parameter
instead of the commit-graph-specific "struct write_commit_graph_context"
pointer.

Signed-off-by: Derrick Stolee
---
 commit-graph.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6f62a07313..6b5bb8b6b8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -980,8 +980,10 @@ struct write_commit_graph_context {
 };
 
 static int write_graph_chunk_fanout(struct hashfile *f,
-				    struct write_commit_graph_context *ctx)
+				    void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	int i, count = 0;
 	struct commit **list = ctx->commits.list;
 
@@ -1006,8 +1008,10 @@ static int write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static int write_graph_chunk_oids(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	int count;
 	for (count = 0; count < ctx->commits.nr; count++, list++) {
@@ -1025,8 +1029,10 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static int write_graph_chunk_data(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t num_extra_edges = 0;
@@ -1127,8 +1133,10 @@ static int write_graph_chunk_data(struct hashfile *f,
 }
 
 static int write_graph_chunk_extra_edges(struct hashfile *f,
-					 struct write_commit_graph_context *ctx)
+					 void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	struct commit_list *parent;
@@ -1181,8 +1189,10 @@ static int write_graph_chunk_extra_edges(struct hashfile *f,
 }
 
 static int write_graph_chunk_bloom_indexes(struct hashfile *f,
-					   struct write_commit_graph_context *ctx)
+					   void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t cur_pos = 0;
@@ -1216,8 +1226,10 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx)
 }
 
 static int write_graph_chunk_bloom_data(struct hashfile *f,
-					struct write_commit_graph_context *ctx)
+					void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 
@@ -1670,8 +1682,10 @@ static int write_graph_chunk_base_1(struct hashfile *f,
 }
 
 static int write_graph_chunk_base(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
+	struct write_commit_graph_context *ctx =
+		(struct write_commit_graph_context *)data;
 	int num = write_graph_chunk_base_1(f, ctx->new_base_graph);
 
 	if (num != ctx->num_commit_graphs_after - 1) {
@@ -1683,7 +1697,7 @@ static int write_graph_chunk_base(struct hashfile *f,
 }
 
 typedef int (*chunk_write_fn)(struct hashfile *f,
-			      struct write_commit_graph_context *ctx);
+			      void *data);
 
 struct chunk_info {
 	uint32_t id;

From patchwork Thu Dec 3 16:16:41 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949097
From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:41 +0000
Subject: [PATCH 02/15] chunk-format: add API for writing table of contents
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee

The commit-graph and multi-pack-index formats share a concept of "chunks"
that are described by a table of contents near the beginning of the file.
The table of contents consists of rows of 12 bytes. Each row starts with a
4-byte ID that signals the type of data stored in the chunk. The row then
continues with an 8-byte offset describing the position in the file where
that data starts. The table of contents lists the chunks in position order,
so the length of a chunk can be determined by subtracting its start
position from the start position of the next chunk. The table of contents
always ends with a row whose ID is 0x00000000, which marks the end of the
last "real" chunk. Typically, that row's offset points to the trailing
hash of the file.
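The table-of-contents layout described above can be sketched in plain C. This is a minimal in-memory illustration under stated assumptions, not Git's implementation: Git writes the rows through hashwrite_be32() and hashwrite_be64() on a struct hashfile, while the helpers below (put_be32, put_be64, get_be64, toc_write, toc_chunk_length) and the flat byte buffer are hypothetical. It shows how each offset is derived from the header length plus the table size, and how a reader recovers a chunk's length by subtracting consecutive offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each table-of-contents row: a 4-byte chunk ID plus an 8-byte offset. */
#define TOC_ROW_WIDTH 12

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Hypothetical helper: store a 64-bit value as two big-endian halves. */
static void put_be64(unsigned char *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/* Hypothetical helper: read back a big-endian 64-bit value. */
static uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Write 'nr' rows plus the terminating ID-0 row into 'buf', which must
 * hold (nr + 1) * TOC_ROW_WIDTH bytes. 'header_len' is the number of
 * bytes that precede the table in the file. Returns the bytes written.
 */
static size_t toc_write(unsigned char *buf, uint64_t header_len,
			const uint32_t *ids, const uint64_t *sizes, int nr)
{
	/* The first chunk starts right after the header and the table. */
	uint64_t offset = header_len + (uint64_t)(nr + 1) * TOC_ROW_WIDTH;
	unsigned char *p = buf;

	for (int i = 0; i < nr; i++, p += TOC_ROW_WIDTH) {
		put_be32(p, ids[i]);
		put_be64(p + 4, offset);
		offset += sizes[i];
	}

	/* Terminating row: ID 0, offset just past the last chunk. */
	put_be32(p, 0);
	put_be64(p + 4, offset);
	return (size_t)(nr + 1) * TOC_ROW_WIDTH;
}

/* A chunk's length is the next row's offset minus its own offset. */
static uint64_t toc_chunk_length(const unsigned char *toc, int i)
{
	uint64_t start = get_be64(toc + (size_t)i * TOC_ROW_WIDTH + 4);
	uint64_t next = get_be64(toc + (size_t)(i + 1) * TOC_ROW_WIDTH + 4);

	assert(next >= start); /* rows are listed in position order */
	return next - start;
}
```

For example, with an 8-byte header (the value commit-graph.c passes as cur_offset) and two chunks of 1024 and 40 bytes, the first row's offset is 8 + 3 * 12 = 44 and the terminating row's offset is 44 + 1024 + 40 = 1108.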

Convert the chunk-writing loop in commit-graph.c to use the new
write_table_of_contents() method in chunk-format.c. The most subtle part
of this conversion is the use of 'cur_offset' to allow the caller to
specify how many bytes were written in the file's header before the table
of contents. This may differ between formats.

Signed-off-by: Derrick Stolee
---
 Makefile       |  1 +
 chunk-format.c | 26 ++++++++++++++++++++++++++
 chunk-format.h | 36 ++++++++++++++++++++++++++++++++++++
 commit-graph.c | 23 ++---------------------
 4 files changed, 65 insertions(+), 21 deletions(-)
 create mode 100644 chunk-format.c
 create mode 100644 chunk-format.h

diff --git a/Makefile b/Makefile
index d3a531d3c6..cdbcadac14 100644
--- a/Makefile
+++ b/Makefile
@@ -854,6 +854,7 @@ LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += chdir-notify.o
 LIB_OBJS += checkout.o
+LIB_OBJS += chunk-format.o
 LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
diff --git a/chunk-format.c b/chunk-format.c
new file mode 100644
index 0000000000..771b6d98d0
--- /dev/null
+++ b/chunk-format.c
@@ -0,0 +1,26 @@
+#include "git-compat-util.h"
+#include "chunk-format.h"
+#include "csum-file.h"
+#define CHUNK_LOOKUP_WIDTH 12
+
+void write_table_of_contents(struct hashfile *f,
+			     uint64_t cur_offset,
+			     struct chunk_info *chunks,
+			     int nr)
+{
+	int i;
+
+	/* Add the table of contents to the current offset */
+	cur_offset += (nr + 1) * CHUNK_LOOKUP_WIDTH;
+
+	for (i = 0; i < nr; i++) {
+		hashwrite_be32(f, chunks[i].id);
+		hashwrite_be64(f, cur_offset);
+
+		cur_offset += chunks[i].size;
+	}
+
+	/* Trailing entry marks the end of the chunks */
+	hashwrite_be32(f, 0);
+	hashwrite_be64(f, cur_offset);
+}
diff --git a/chunk-format.h b/chunk-format.h
new file mode 100644
index 0000000000..4b9cbeb372
--- /dev/null
+++ b/chunk-format.h
@@ -0,0 +1,36 @@
+#ifndef CHUNK_FORMAT_H
+#define CHUNK_FORMAT_H
+
+#include "git-compat-util.h"
+
+struct hashfile;
+
+typedef int (*chunk_write_fn)(struct hashfile *f,
+			      void *data);
+
+/*
+ * When writing a chunk-based file format, collect the chunks in
+ * an array of chunk_info structs. The size stores the _expected_
+ * amount of data that will be written by write_fn.
+ */
+struct chunk_info {
+	uint32_t id;
+	uint64_t size;
+	chunk_write_fn write_fn;
+};
+
+/*
+ * Write the chunk data into the supplied hashfile.
+ *
+ * * 'cur_offset' indicates the number of bytes written to the hashfile
+ *   before the table of contents starts.
+ *
+ * * 'nr' is the number of chunks with non-zero IDs, so 'nr + 1'
+ *   chunks are written in total.
+ */
+void write_table_of_contents(struct hashfile *f,
+			     uint64_t cur_offset,
+			     struct chunk_info *chunks,
+			     int nr);
+
+#endif
diff --git a/commit-graph.c b/commit-graph.c
index 6b5bb8b6b8..5494fda1d3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -19,6 +19,7 @@
 #include "shallow.h"
 #include "json-writer.h"
 #include "trace2.h"
+#include "chunk-format.h"
 
 void git_test_write_commit_graph_or_die(void)
 {
@@ -1696,15 +1697,6 @@ static int write_graph_chunk_base(struct hashfile *f,
 	return 0;
 }
 
-typedef int (*chunk_write_fn)(struct hashfile *f,
-			      void *data);
-
-struct chunk_info {
-	uint32_t id;
-	uint64_t size;
-	chunk_write_fn write_fn;
-};
-
 static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
@@ -1715,7 +1707,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
 	int num_chunks = 3;
-	uint64_t chunk_offset;
 	struct object_id file_hash;
 
 	if (ctx->split) {
@@ -1805,17 +1796,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, ctx->num_commit_graphs_after - 1);
 
-	chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	for (i = 0; i <= num_chunks; i++) {
-		uint32_t chunk_write[3];
-
-		chunk_write[0] = htonl(chunks[i].id);
-		chunk_write[1] = htonl(chunk_offset >> 32);
-		chunk_write[2] = htonl(chunk_offset & 0xffffffff);
-		hashwrite(f, chunk_write, 12);
-
-		chunk_offset += chunks[i].size;
-	}
+	write_table_of_contents(f, /* cur_offset */ 8, chunks, num_chunks);
 
 	if (ctx->report_progress) {
 		strbuf_addf(&progress_title,

From patchwork Thu Dec 3 16:16:42 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949103
Message-Id: <5a882fc773d7ab07e0ee71b5f588cc8c68d8b5cd.1607012215.git.gitgitgadget@gmail.com>
From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:42 +0000
Subject: [PATCH 03/15] midx: rename pack_info to write_midx_context
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee

In an effort to streamline our chunk-based file formats, align some of
the code structure in write_midx_internal() to be similar to the
patterns in write_commit_graph_file().

Specifically, let's create a "struct write_midx_context" that can be
used as a data parameter to abstract function types.

This change only renames "struct pack_list" to "struct
write_midx_context" and the names of instances from "packs" to "ctx". In
future changes, we will expand the data inside "struct
write_midx_context" and align our chunk-writing method with the
chunk-format API.

Signed-off-by: Derrick Stolee
---
 midx.c | 130 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/midx.c b/midx.c
index da03c1449a..ded4d394bb 100644
--- a/midx.c
+++ b/midx.c
@@ -451,7 +451,7 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
-struct pack_list {
+struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
 	uint32_t alloc;
@@ -463,37 +463,37 @@ struct pack_list {
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 			     const char *file_name, void *data)
 {
-	struct pack_list *packs = (struct pack_list *)data;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
 
 	if (ends_with(file_name, ".idx")) {
-		display_progress(packs->progress, ++packs->pack_paths_checked);
-		if (packs->m && midx_contains_pack(packs->m, file_name))
+		display_progress(ctx->progress, ++ctx->pack_paths_checked);
+		if (ctx->m && midx_contains_pack(ctx->m, file_name))
 			return;
 
-		ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc);
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-		packs->info[packs->nr].p = add_packed_git(full_path,
-							  full_path_len,
-							  0);
+		ctx->info[ctx->nr].p = add_packed_git(full_path,
+						      full_path_len,
+						      0);
 
-		if (!packs->info[packs->nr].p) {
+		if (!ctx->info[ctx->nr].p) {
 			warning(_("failed to add packfile '%s'"),
 				full_path);
 			return;
 		}
 
-		if (open_pack_index(packs->info[packs->nr].p)) {
+		if (open_pack_index(ctx->info[ctx->nr].p)) {
 			warning(_("failed to open pack-index '%s'"),
 				full_path);
-			close_pack(packs->info[packs->nr].p);
-			FREE_AND_NULL(packs->info[packs->nr].p);
+			close_pack(ctx->info[ctx->nr].p);
+			FREE_AND_NULL(ctx->info[ctx->nr].p);
 			return;
 		}
 
-		packs->info[packs->nr].pack_name = xstrdup(file_name);
-		packs->info[packs->nr].orig_pack_int_id = packs->nr;
-		packs->info[packs->nr].expired = 0;
-		packs->nr++;
+		ctx->info[ctx->nr].pack_name = xstrdup(file_name);
+		ctx->info[ctx->nr].orig_pack_int_id = ctx->nr;
+		ctx->info[ctx->nr].expired = 0;
+		ctx->nr++;
 	}
 }
 
@@ -801,7 +801,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
-	struct pack_list packs;
+	struct write_midx_context ctx = { 0 };
 	uint32_t *pack_perm = NULL;
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
@@ -820,40 +820,40 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			  midx_name);
 
 	if (m)
-		packs.m = m;
+		ctx.m = m;
 	else
-		packs.m = load_multi_pack_index(object_dir, 1);
-
-	packs.nr = 0;
-	packs.alloc = packs.m ? packs.m->num_packs : 16;
-	packs.info = NULL;
-	ALLOC_ARRAY(packs.info, packs.alloc);
-
-	if (packs.m) {
-		for (i = 0; i < packs.m->num_packs; i++) {
-			ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc);
-
-			packs.info[packs.nr].orig_pack_int_id = i;
-			packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]);
-			packs.info[packs.nr].p = NULL;
-			packs.info[packs.nr].expired = 0;
-			packs.nr++;
+		ctx.m = load_multi_pack_index(object_dir, 1);
+
+	ctx.nr = 0;
+	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.info = NULL;
+	ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+	if (ctx.m) {
+		for (i = 0; i < ctx.m->num_packs; i++) {
+			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
+
+			ctx.info[ctx.nr].orig_pack_int_id = i;
+			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
+			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].expired = 0;
+			ctx.nr++;
 		}
 	}
 
-	packs.pack_paths_checked = 0;
+	ctx.pack_paths_checked = 0;
 	if (flags & MIDX_PROGRESS)
-		packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
+		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
 	else
-		packs.progress = NULL;
+		ctx.progress = NULL;
 
-	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
-	stop_progress(&packs.progress);
+	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+	stop_progress(&ctx.progress);
 
-	if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
+	entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries);
 
 	for (i = 0; i < nr_entries; i++) {
 		if (entries[i].offset > 0x7fffffff)
@@ -862,19 +862,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		large_offsets_needed = 1;
 	}
 
-	QSORT(packs.info, packs.nr, pack_info_compare);
+	QSORT(ctx.info, ctx.nr, pack_info_compare);
 
 	if (packs_to_drop && packs_to_drop->nr) {
 		int drop_index = 0;
 		int missing_drops = 0;
 
-		for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) {
-			int cmp = strcmp(packs.info[i].pack_name,
+		for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+			int cmp = strcmp(ctx.info[i].pack_name,
 					 packs_to_drop->items[drop_index].string);
 
 			if (!cmp) {
 				drop_index++;
-				packs.info[i].expired = 1;
+				ctx.info[i].expired = 1;
 			} else if (cmp > 0) {
 				error(_("did not see pack-file %s to drop"),
 				      packs_to_drop->items[drop_index].string);
@@ -882,7 +882,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				missing_drops++;
 				i--;
 			} else {
-				packs.info[i].expired = 0;
+				ctx.info[i].expired = 0;
 			}
 		}
 
@@ -898,19 +898,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	 *
 	 * pack_perm[old_id] = new_id
 	 */
-	ALLOC_ARRAY(pack_perm, packs.nr);
-	for (i = 0; i < packs.nr; i++) {
-		if (packs.info[i].expired) {
+	ALLOC_ARRAY(pack_perm, ctx.nr);
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].expired) {
 			dropped_packs++;
-			pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED;
+			pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
 		} else {
-			pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs;
+			pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
 		}
 	}
 
-	for (i = 0; i < packs.nr; i++) {
-		if (!packs.info[i].expired)
-			pack_name_concat_len += strlen(packs.info[i].pack_name) + 1;
+	for (i = 0; i < ctx.nr; i++) {
+		if (!ctx.info[i].expired)
+			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}
 
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
@@ -921,19 +921,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
 	FREE_AND_NULL(midx_name);
 
-	if (packs.m)
-		close_midx(packs.m);
+	if (ctx.m)
+		close_midx(ctx.m);
 
 	cur_chunk = 0;
 	num_chunks = large_offsets_needed ? 5 : 4;
 
-	if (packs.nr - dropped_packs == 0) {
+	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
 		goto cleanup;
 	}
 
-	written = write_midx_header(f, num_chunks, packs.nr - dropped_packs);
+	written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);
 
 	chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
 	chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
@@ -990,7 +990,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 		switch (chunk_ids[i]) {
 			case MIDX_CHUNKID_PACKNAMES:
-				written += write_midx_pack_names(f, packs.info, packs.nr);
+				written += write_midx_pack_names(f, ctx.info, ctx.nr);
 				break;
 
 			case MIDX_CHUNKID_OIDFANOUT:
@@ -1027,15 +1027,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	commit_lock_file(&lk);
 
 cleanup:
-	for (i = 0; i < packs.nr; i++) {
-		if (packs.info[i].p) {
-			close_pack(packs.info[i].p);
-			free(packs.info[i].p);
+	for (i = 0; i < ctx.nr; i++) {
+		if (ctx.info[i].p) {
+			close_pack(ctx.info[i].p);
+			free(ctx.info[i].p);
 		}
-		free(packs.info[i].pack_name);
+		free(ctx.info[i].pack_name);
 	}
 
-	free(packs.info);
+	free(ctx.info);
 	free(entries);
 	free(pack_perm);
 	free(midx_name);

From patchwork Thu Dec 3 16:16:43 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949099
From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:43 +0000
Subject: [PATCH 04/15] midx: use context in write_midx_pack_names()
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee

In an effort to align write_midx_internal() with the chunk-format API,
start converting chunk-writing methods to match chunk_write_fn. The first
case is to convert write_midx_pack_names() to take "void *data". We
already have the necessary data in "struct write_midx_context", so this
conversion is rather mechanical.
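The "void *data" convention that these patches converge on can be sketched in isolation. This is a hypothetical, self-contained illustration, not code from the series: struct sink stands in for Git's struct hashfile, and struct names_ctx, write_names_chunk(), struct chunk and write_chunks() are invented names that only mirror the roles of struct write_midx_context, write_midx_pack_names(), struct chunk_info and the chunk-writing loop. Each writer receives the same opaque pointer, recovers its concrete context type, and the caller checks that it produced exactly the size declared for its chunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Git's struct hashfile: an in-memory buffer. */
struct sink {
	unsigned char buf[256];
	size_t len;
};

static void sink_write(struct sink *s, const void *data, size_t n)
{
	assert(s->len + n <= sizeof(s->buf)); /* sketch: no growth */
	memcpy(s->buf + s->len, data, n);
	s->len += n;
}

/* Same shape as chunk_write_fn, but taking the hypothetical sink. */
typedef int (*chunk_write_fn)(struct sink *f, void *data);

/* Hypothetical per-format context, playing the role of write_midx_context. */
struct names_ctx {
	const char **names;
	uint32_t nr;
};

/* Write each name followed by its NUL terminator; 0 means success. */
static int write_names_chunk(struct sink *f, void *data)
{
	struct names_ctx *ctx = data; /* recover the concrete type */

	for (uint32_t i = 0; i < ctx->nr; i++)
		sink_write(f, ctx->names[i], strlen(ctx->names[i]) + 1);
	return 0;
}

/* Mirrors the fields of struct chunk_info from chunk-format.h. */
struct chunk {
	uint32_t id;
	uint64_t size; /* bytes write_fn is expected to produce */
	chunk_write_fn write_fn;
};

/*
 * Call every writer with the same opaque context and verify that each
 * one wrote exactly the size declared for it. Returns 0 or -1.
 */
static int write_chunks(struct sink *f, const struct chunk *chunks, int nr,
			void *data)
{
	for (int i = 0; i < nr; i++) {
		size_t before = f->len;

		if (chunks[i].write_fn(f, data))
			return -1;
		if (f->len - before != chunks[i].size)
			return -1;
	}
	return 0;
}
```

In C the conversion from "void *" to a typed pointer is implicit, so the explicit cast used in the patches is a style choice rather than a requirement. Note also that, at this point in the series, write_midx_pack_names() still returns size_t rather than chunk_write_fn's int.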
Signed-off-by: Derrick Stolee
---
 midx.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/midx.c b/midx.c
index ded4d394bb..6ab655ddda 100644
--- a/midx.c
+++ b/midx.c
@@ -643,27 +643,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	return deduplicated_entries;
 }
 
-static size_t write_midx_pack_names(struct hashfile *f,
-				    struct pack_info *info,
-				    uint32_t num_packs)
+static size_t write_midx_pack_names(struct hashfile *f, void *data)
 {
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	uint32_t i;
 	unsigned char padding[MIDX_CHUNK_ALIGNMENT];
 	size_t written = 0;
 
-	for (i = 0; i < num_packs; i++) {
+	for (i = 0; i < ctx->nr; i++) {
 		size_t writelen;
 
-		if (info[i].expired)
+		if (ctx->info[i].expired)
 			continue;
 
-		if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0)
+		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
 			BUG("incorrect pack-file order: %s before %s",
-			    info[i - 1].pack_name,
-			    info[i].pack_name);
+			    ctx->info[i - 1].pack_name,
+			    ctx->info[i].pack_name);
 
-		writelen = strlen(info[i].pack_name) + 1;
-		hashwrite(f, info[i].pack_name, writelen);
+		writelen = strlen(ctx->info[i].pack_name) + 1;
+		hashwrite(f, ctx->info[i].pack_name, writelen);
 		written += writelen;
 	}
 
@@ -990,7 +989,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 		switch (chunk_ids[i]) {
 			case MIDX_CHUNKID_PACKNAMES:
-				written += write_midx_pack_names(f, ctx.info, ctx.nr);
+				written += write_midx_pack_names(f, &ctx);
 				break;
 
 			case MIDX_CHUNKID_OIDFANOUT:

From patchwork Thu Dec 3 16:16:44 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949105
Message-Id: <491667de2baef422e801df1e2c7d3173462a96ff.1607012215.git.gitgitgadget@gmail.com>
Date: Thu, 03 Dec 2020 16:16:44 +0000
Subject: [PATCH 05/15] midx: add entries to write_midx_context
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "struct pack_midx_entry *entries" list and its count
into the context. Update write_midx_oid_fanout() and
write_midx_oid_lookup() to take the context directly, as these are easy
conversions with this new data.
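For illustration, here is a standalone sketch of a writer that reads the sorted entries and their count from the context, in the spirit of write_midx_oid_fanout(). It assumes the multi-pack-index fanout convention, where slot b holds the number of objects whose first object-ID byte is at most b. It fills an in-memory array instead of writing big-endian values to a hashfile. struct demo_entry, struct demo_context and demo_fill_fanout() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified entry: only the object ID bytes are kept. */
struct demo_entry {
	unsigned char oid[20];
};

/* Hypothetical context pairing the sorted entries with their count,
 * analogous to the entries/entries_nr fields added here. */
struct demo_context {
	struct demo_entry *entries;
	uint32_t entries_nr;
};

/* fanout[b] = number of entries whose first OID byte is <= b. */
static int demo_fill_fanout(uint32_t fanout[256], void *data)
{
	struct demo_context *ctx = data;
	uint32_t count = 0;
	int b;

	for (b = 0; b < 256; b++) {
		/* Advance past every entry whose first byte is <= b. */
		while (count < ctx->entries_nr &&
		       ctx->entries[count].oid[0] <= b) {
			/* Entries must be sorted, at least by first byte. */
			assert(count == 0 ||
			       ctx->entries[count - 1].oid[0] <=
			       ctx->entries[count].oid[0]);
			count++;
		}
		fanout[b] = count;
	}
	return 0;
}
```

Because the array and its length travel together in one struct, the writer needs only the single data pointer to find both.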
Only the callers of write_midx_object_offsets() and
write_midx_large_offsets() are updated here, since those methods need
additional data in the context before they can match chunk_write_fn.

Signed-off-by: Derrick Stolee
---
 midx.c | 49 ++++++++++++++++++++++++++-----------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/midx.c b/midx.c
index 6ab655ddda..2af4452165 100644
--- a/midx.c
+++ b/midx.c
@@ -458,6 +458,9 @@ struct write_midx_context {
 	struct multi_pack_index *m;
 	struct progress *progress;
 	unsigned pack_paths_checked;
+
+	struct pack_midx_entry *entries;
+	uint32_t entries_nr;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -678,11 +681,11 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data)
 }
 
 static size_t write_midx_oid_fanout(struct hashfile *f,
-				    struct pack_midx_entry *objects,
-				    uint32_t nr_objects)
+				    void *data)
 {
-	struct pack_midx_entry *list = objects;
-	struct pack_midx_entry *last = objects + nr_objects;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
 	uint32_t count = 0;
 	uint32_t i;
 
@@ -706,18 +709,19 @@ static size_t write_midx_oid_fanout(struct hashfile *f,
 	return MIDX_CHUNK_FANOUT_SIZE;
 }
 
-static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
-				    struct pack_midx_entry *objects,
-				    uint32_t nr_objects)
+static size_t write_midx_oid_lookup(struct hashfile *f,
+				    void *data)
 {
-	struct pack_midx_entry *list = objects;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
+	unsigned char hash_len = the_hash_algo->rawsz;
+	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i;
 	size_t written = 0;
 
-	for (i = 0; i < nr_objects; i++) {
+	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
 
-		if (i < nr_objects - 1) {
+		if (i < ctx->entries_nr - 1) {
 			struct pack_midx_entry *next = list;
 			if (oidcmp(&obj->oid, &next->oid) >= 0)
 				BUG("OIDs not in order: %s >= %s",
@@ -805,8 +809,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
-	uint32_t nr_entries, num_large_offsets = 0;
-	struct pack_midx_entry *entries = NULL;
+	uint32_t num_large_offsets = 0;
 	struct progress *progress = NULL;
 	int large_offsets_needed = 0;
 	int pack_name_concat_len = 0;
@@ -852,12 +855,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries);
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
 
-	for (i = 0; i < nr_entries; i++) {
-		if (entries[i].offset > 0x7fffffff)
+	for (i = 0; i < ctx.entries_nr; i++) {
+		if (ctx.entries[i].offset > 0x7fffffff)
 			num_large_offsets++;
-		if (entries[i].offset > 0xffffffff)
+		if (ctx.entries[i].offset > 0xffffffff)
 			large_offsets_needed = 1;
 	}
 
@@ -947,10 +950,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	cur_chunk++;
 	chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz;
+	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz;
 
 	cur_chunk++;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH;
+	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
 	if (large_offsets_needed) {
 		chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS;
 
@@ -993,19 +996,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;
 
 			case MIDX_CHUNKID_OIDFANOUT:
-				written += write_midx_oid_fanout(f, entries, nr_entries);
+				written += write_midx_oid_fanout(f, &ctx);
 				break;
 
 			case MIDX_CHUNKID_OIDLOOKUP:
-				written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries);
+				written += write_midx_oid_lookup(f, &ctx);
 				break;
 
 			case MIDX_CHUNKID_OBJECTOFFSETS:
-				written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries);
+				written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr);
 				break;
 
 			case MIDX_CHUNKID_LARGEOFFSETS:
-				written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries);
+				written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr);
 				break;
 
 			default:
@@ -1035,7 +1038,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	}
 
 	free(ctx.info);
-	free(entries);
+	free(ctx.entries);
 	free(pack_perm);
 	free(midx_name);
 	return result;

From patchwork Thu Dec 3 16:16:45 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949109
Message-Id: <0a2f81e7cfd39d69f6388bbfc8b157ba28321a21.1607012215.git.gitgitgadget@gmail.com>
Date: Thu, 03 Dec 2020 16:16:45 +0000
Subject: [PATCH 06/15] midx: add pack_perm to write_midx_context
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "uint32_t *pack_perm" array and the
large_offsets_needed bit into the context.

Update write_midx_object_offsets() to match chunk_write_fn.

Signed-off-by: Derrick Stolee
---
 midx.c | 40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index 2af4452165..f7c3a54a33 100644
--- a/midx.c
+++ b/midx.c
@@ -461,6 +461,9 @@ struct write_midx_context {
 
 	struct pack_midx_entry *entries;
 	uint32_t entries_nr;
+
+	uint32_t *pack_perm;
+	unsigned large_offsets_needed:1;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -736,27 +739,27 @@ static size_t write_midx_oid_lookup(struct hashfile *f,
 	return written;
 }
 
-static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed,
-					uint32_t *perm,
-					struct pack_midx_entry *objects, uint32_t nr_objects)
+static size_t write_midx_object_offsets(struct hashfile *f,
+					void *data)
 {
-	struct pack_midx_entry *list = objects;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
+	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i, nr_large_offset = 0;
 	size_t written = 0;
 
-	for (i = 0; i < nr_objects; i++) {
+	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
 
-		if (perm[obj->pack_int_id] == PACK_EXPIRED)
+		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
 			BUG("object %s is in an expired pack with int-id %d",
 			    oid_to_hex(&obj->oid),
 			    obj->pack_int_id);
 
-		hashwrite_be32(f, perm[obj->pack_int_id]);
+		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
 
-		if (large_offset_needed && obj->offset >> 31)
+		if (ctx->large_offsets_needed && obj->offset >> 31)
 			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
-		else if (!large_offset_needed && obj->offset >> 32)
+		else if (!ctx->large_offsets_needed && obj->offset >> 32)
 			BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
 			    oid_to_hex(&obj->oid),
 			    obj->offset);
@@ -805,13 +808,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
-	uint32_t *pack_perm = NULL;
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
 	uint32_t num_large_offsets = 0;
 	struct progress *progress = NULL;
-	int large_offsets_needed = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -857,11 +858,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
 
+	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
 		if (ctx.entries[i].offset > 0x7fffffff)
 			num_large_offsets++;
 		if (ctx.entries[i].offset > 0xffffffff)
-			large_offsets_needed = 1;
+			ctx.large_offsets_needed = 1;
 	}
 
 	QSORT(ctx.info, ctx.nr, pack_info_compare);
@@ -900,13 +902,13 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	 *
 	 * pack_perm[old_id] = new_id
 	 */
-	ALLOC_ARRAY(pack_perm, ctx.nr);
+	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].expired) {
 			dropped_packs++;
-			pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
 		} else {
-			pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
 		}
 	}
 
@@ -927,7 +929,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		close_midx(ctx.m);
 
 	cur_chunk = 0;
-	num_chunks = large_offsets_needed ? 5 : 4;
+	num_chunks = ctx.large_offsets_needed ? 5 : 4;
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -954,7 +956,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	cur_chunk++;
 	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
-	if (large_offsets_needed) {
+	if (ctx.large_offsets_needed) {
 		chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS;
 
 		cur_chunk++;
@@ -1004,7 +1006,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;
 
 			case MIDX_CHUNKID_OBJECTOFFSETS:
-				written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr);
+				written += write_midx_object_offsets(f, &ctx);
 				break;
 
 			case MIDX_CHUNKID_LARGEOFFSETS:
@@ -1039,7 +1041,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	free(ctx.info);
 	free(ctx.entries);
-	free(pack_perm);
+	free(ctx.pack_perm);
 	free(midx_name);
 	return result;
 }

From patchwork Thu Dec 3 16:16:46 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949107
Message-Id: <84f7bc46f93a756da2148ebbc175f0cb02ef3c1a.1607012215.git.gitgitgadget@gmail.com>
Date: Thu, 03 Dec 2020 16:16:46 +0000
Subject: [PATCH 07/15] midx: add num_large_offsets to write_midx_context
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "uint32_t num_large_offsets" into the context. With
this new data, write_midx_large_offsets() now matches the chunk_write_fn
type.
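Here is a standalone sketch of how the counts gathered in write_midx_internal() feed the offset chunks. It mirrors the counting loop shown in the diff and the 32-bit encoding used by write_midx_object_offsets(). DEMO_LARGE_OFFSET_NEEDED is assumed to equal MIDX_LARGE_OFFSET_NEEDED (0x80000000). All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LARGE_OFFSET_NEEDED 0x80000000u /* assumed flag bit */

/* Hypothetical context carrying the same three pieces of state. */
struct demo_context {
	const uint64_t *offsets; /* pack offsets of the sorted entries */
	uint32_t entries_nr;
	unsigned large_offsets_needed:1;
	uint32_t num_large_offsets;
};

/* Mirrors the counting loop in write_midx_internal(). */
static void demo_count_large_offsets(struct demo_context *ctx)
{
	uint32_t i;

	ctx->large_offsets_needed = 0;
	ctx->num_large_offsets = 0;
	for (i = 0; i < ctx->entries_nr; i++) {
		if (ctx->offsets[i] > 0x7fffffff)
			ctx->num_large_offsets++;
		if (ctx->offsets[i] > 0xffffffff)
			ctx->large_offsets_needed = 1;
	}
}

/* Returns the 32-bit word the object-offsets chunk would hold for
 * entry i; *nr_large is the next slot in the large-offsets chunk. */
static uint32_t demo_offset_word(const struct demo_context *ctx, uint32_t i,
				 uint32_t *nr_large)
{
	uint64_t offset = ctx->offsets[i];

	if (ctx->large_offsets_needed && (offset >> 31))
		return DEMO_LARGE_OFFSET_NEEDED | (*nr_large)++;
	/* Without a large-offsets chunk, the offset must fit in 32 bits. */
	assert(!(offset >> 32));
	return (uint32_t)offset;
}
```

num_large_offsets counts every offset above 0x7fffffff. That is exactly the set of entries this encoding routes to the large-offsets chunk once large_offsets_needed is set, so the count also sizes that chunk.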
Signed-off-by: Derrick Stolee
---
 midx.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/midx.c b/midx.c
index f7c3a54a33..d7da358a3f 100644
--- a/midx.c
+++ b/midx.c
@@ -464,6 +464,7 @@ struct write_midx_context {
 
 	uint32_t *pack_perm;
 	unsigned large_offsets_needed:1;
+	uint32_t num_large_offsets;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -772,11 +773,14 @@ static size_t write_midx_object_offsets(struct hashfile *f,
 	return written;
 }
 
-static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset,
-				       struct pack_midx_entry *objects, uint32_t nr_objects)
+static size_t write_midx_large_offsets(struct hashfile *f,
+				       void *data)
 {
-	struct pack_midx_entry *list = objects, *end = objects + nr_objects;
+	struct write_midx_context *ctx = (struct write_midx_context *)data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
 	size_t written = 0;
+	uint32_t nr_large_offset = ctx->num_large_offsets;
 
 	while (nr_large_offset) {
 		struct pack_midx_entry *obj;
@@ -811,7 +815,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
-	uint32_t num_large_offsets = 0;
 	struct progress *progress = NULL;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
@@ -861,7 +864,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
 		if (ctx.entries[i].offset > 0x7fffffff)
-			num_large_offsets++;
+			ctx.num_large_offsets++;
 		if (ctx.entries[i].offset > 0xffffffff)
 			ctx.large_offsets_needed = 1;
 	}
@@ -961,7 +964,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 		cur_chunk++;
 		chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] +
-			num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
+			ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
 	}
 
 	chunk_ids[cur_chunk] = 0;
@@ -1010,7 +1013,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;
 
 			case MIDX_CHUNKID_LARGEOFFSETS:
-				written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr);
+				written += write_midx_large_offsets(f, &ctx);
 				break;
 
 			default:

From patchwork Thu Dec 3 16:16:47 2020
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949113
-0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Pfqu9XMbjGTvOL3pjpfj+P8GC32X0HDYjERDMUqH4/I=; b=TXai4lDzpuWuB+NbKr5dq1h66IpDYBNILjNESSYzBkB2ltt+89OKDcaxUuwHKnb3sq t1yz3QUYyr0Ox6ZmSvRx1rujN+gConsxp3WHA104IK4w+kxHmlgLY/ur63wae0PDOiut P6yh11aL90/xQvPuqxpUBcOg6OkhJiNvKjriZzPH+RFI38Wq4bKweI6F/ASnQhYftC2B U1gxy2mVy4+IQlGZaYlkkNPlclsoBouJECjSymoLsQ05vTW5/Q/vtm4liDvJ/Um1hfip ffEGE4ISLOXENY/oAtEncDgjRiDeWwvsRT1hCdbfkdRss5UeX6SvPwrK3hBrFvejveuD bn2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Pfqu9XMbjGTvOL3pjpfj+P8GC32X0HDYjERDMUqH4/I=; b=FfNUNngJghEJCbYSQcM4AJS1QsPTi6YbQ4Qq/tq89TJklaF6OAxQyZdTqBJvK7bW2Q 9P8oBQa+F07EczbLukDIIo4BPHFwxMPVBcnhazSAetBlmE+ezlOEO18W8uKnA8ZjjeJ7 JcRssmv0t9O8PjMEHmi1m+bHaiaQF7GJx/Lp5cE4KQrJ8tfPzKiA3tgKkcjsxF0zMyFK ZCwPk5wsTsdJ6MMlShFBX2gl72V8R2nKAoV2EkuGIKxi3RbC0rT82uLZNHtv0apK5qU5 E8MRaycfPZ/6PmAbBvr1W7mNyoniPF9T2f3VyzCD+jSOJkoaczXbyBXYlM7+KVSjx+4C fiJw== X-Gm-Message-State: AOAM532Td3RDuXO07LDeNDGnNoiTReb5valydTcwIcjPm+WSUP7AzB52 6Yl73OgKTT3gdes0l8yIL9fKUa9o83I= X-Google-Smtp-Source: ABdhPJxOBJHNyEEyiZd3vRvlPelLOiD+3BL+JR0dfArSdfxbHNIaZfudMaFwDdX2rdOjEwWbhPL4Xg== X-Received: by 2002:a7b:c2e8:: with SMTP id e8mr4094045wmk.103.1607012224534; Thu, 03 Dec 2020 08:17:04 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c2sm2227715wrv.41.2020.12.03.08.17.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Dec 2020 08:17:04 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 03 Dec 2020 16:16:47 +0000 Subject: [PATCH 08/15] midx: convert chunk write methods to return int Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: szeder.dev@gmail.com, me@ttaylorr.com, 
Derrick Stolee, Derrick Stolee
From: Derrick Stolee

Historically, the chunk-writing methods in midx.c have returned the
amount of data written so the writer method could compare this with the
table of contents. This presents some interesting issues:

1. If a chunk-writing method has a bug that miscalculates the written
   bytes, then we can satisfy the table of contents without actually
   writing the right amount of data to the hashfile. The commit-graph
   writing code checks the hashfile struct directly for a more robust
   verification.

2. There is no way for a chunk-writing method to fail gracefully.
   Returning an int presents an opportunity to fail without a die().

3. The current pattern doesn't match the chunk_write_fn type exactly, so
   we cannot share code with commit-graph.c.

For these reasons, convert the midx chunk-writer methods to return an
'int'. Since none of them fail at the moment, they all return 0.

Signed-off-by: Derrick Stolee
---
 midx.c | 63 +++++++++++++++++++++++++---------------------------------
 1 file changed, 27 insertions(+), 36 deletions(-)

diff --git a/midx.c b/midx.c
index d7da358a3f..5eb1b01946 100644
--- a/midx.c
+++ b/midx.c
@@ -650,7 +650,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	return deduplicated_entries;
 }

-static size_t write_midx_pack_names(struct hashfile *f, void *data)
+static int write_midx_pack_names(struct hashfile *f, void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	uint32_t i;
@@ -678,14 +678,13 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data)
 	if (i < MIDX_CHUNK_ALIGNMENT) {
 		memset(padding, 0, sizeof(padding));
 		hashwrite(f, padding, i);
-		written += i;
 	}

-	return written;
+	return 0;
 }

-static size_t write_midx_oid_fanout(struct hashfile *f,
-				    void *data)
+static int write_midx_oid_fanout(struct hashfile *f,
+				 void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	struct pack_midx_entry *list = ctx->entries;
@@ -710,17 +709,16 @@ static size_t write_midx_oid_fanout(struct hashfile *f,
 		list = next;
 	}

-	return MIDX_CHUNK_FANOUT_SIZE;
+	return 0;
 }

-static size_t write_midx_oid_lookup(struct hashfile *f,
-				    void *data)
+static int write_midx_oid_lookup(struct hashfile *f,
+				 void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	unsigned char hash_len = the_hash_algo->rawsz;
 	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i;
-	size_t written = 0;

 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
@@ -734,19 +732,17 @@ static size_t write_midx_oid_lookup(struct hashfile *f,
 		}

 		hashwrite(f, obj->oid.hash, (int)hash_len);
-		written += hash_len;
 	}

-	return written;
+	return 0;
 }

-static size_t write_midx_object_offsets(struct hashfile *f,
-					void *data)
+static int write_midx_object_offsets(struct hashfile *f,
+				     void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i, nr_large_offset = 0;
-	size_t written = 0;

 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
@@ -766,20 +762,17 @@ static size_t write_midx_object_offsets(struct hashfile *f,
 			    obj->offset);
 		else
 			hashwrite_be32(f, (uint32_t)obj->offset);
-
-		written += MIDX_CHUNK_OFFSET_WIDTH;
 	}

-	return written;
+	return 0;
 }

-static size_t write_midx_large_offsets(struct hashfile *f,
-				       void *data)
+static int write_midx_large_offsets(struct hashfile *f,
+				    void *data)
 {
 	struct write_midx_context *ctx = (struct write_midx_context *)data;
 	struct pack_midx_entry *list = ctx->entries;
 	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
-	size_t written = 0;
 	uint32_t nr_large_offset = ctx->num_large_offsets;

 	while (nr_large_offset) {
@@ -795,12 +788,12 @@ static size_t write_midx_large_offsets(struct hashfile *f,
 		if (!(offset >> 31))
 			continue;

-		written += hashwrite_be64(f, offset);
+		hashwrite_be64(f, offset);

 		nr_large_offset--;
 	}

-	return written;
+	return 0;
 }

 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
@@ -812,7 +805,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
-	uint64_t written = 0;
+	uint64_t header_size = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
 	struct progress *progress = NULL;
@@ -940,10 +933,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		goto cleanup;
 	}

-	written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);
+	header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);

 	chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
-	chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;

 	cur_chunk++;
 	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
@@ -981,39 +974,37 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *

 		hashwrite_be32(f, chunk_ids[i]);
 		hashwrite_be64(f, chunk_offsets[i]);
-
-		written += MIDX_CHUNKLOOKUP_WIDTH;
 	}

 	if (flags & MIDX_PROGRESS)
 		progress = start_delayed_progress(_("Writing chunks to multi-pack-index"),
 					  num_chunks);
 	for (i = 0; i < num_chunks; i++) {
-		if (written != chunk_offsets[i])
+		if (f->total + f->offset != chunk_offsets[i])
 			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
 			    chunk_offsets[i],
-			    written,
+			    f->total + f->offset,
 			    chunk_ids[i]);

 		switch (chunk_ids[i]) {
 		case MIDX_CHUNKID_PACKNAMES:
-			written += write_midx_pack_names(f, &ctx);
+			write_midx_pack_names(f, &ctx);
 			break;

 		case MIDX_CHUNKID_OIDFANOUT:
-			written += write_midx_oid_fanout(f, &ctx);
+			write_midx_oid_fanout(f, &ctx);
 			break;

 		case MIDX_CHUNKID_OIDLOOKUP:
-			written += write_midx_oid_lookup(f, &ctx);
+			write_midx_oid_lookup(f, &ctx);
 			break;

 		case MIDX_CHUNKID_OBJECTOFFSETS:
-			written += write_midx_object_offsets(f, &ctx);
+			write_midx_object_offsets(f, &ctx);
 			break;

 		case MIDX_CHUNKID_LARGEOFFSETS:
-			written += write_midx_large_offsets(f, &ctx);
+			write_midx_large_offsets(f, &ctx);
 			break;

 		default:
@@ -1025,9 +1016,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	}
 	stop_progress(&progress);

-	if (written != chunk_offsets[num_chunks])
+	if (f->total + f->offset != chunk_offsets[num_chunks])
 		BUG("incorrect final offset %"PRIu64" != %"PRIu64,
-		    written,
+		    f->total + f->offset,
 		    chunk_offsets[num_chunks]);

 	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
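
As a rough illustration of why checking the hashfile position is more robust than trusting a returned byte count, here is a minimal sketch. It assumes, as the diff's use of `f->total + f->offset` suggests, that a hashfile's `total` counts bytes already flushed and `offset` counts bytes still in its buffer; `struct toy_hashfile`, `toy_write()`, `write_padding_chunk()`, and `check_chunk()` are hypothetical stand-ins, not Git's csum-file API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's struct hashfile: 'total' counts bytes
 * already flushed, 'offset' counts bytes still sitting in the buffer.
 */
struct toy_hashfile {
	uint64_t total;
	unsigned int offset;
	unsigned char buffer[8];
};

static void toy_flush(struct toy_hashfile *f)
{
	/* A real hashfile would hash and write the buffer here. */
	f->total += f->offset;
	f->offset = 0;
}

static void toy_write(struct toy_hashfile *f, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len) {
		size_t room = sizeof(f->buffer) - f->offset;
		size_t n = len < room ? len : room;

		memcpy(f->buffer + f->offset, p, n);
		f->offset += n;
		p += n;
		len -= n;
		assert(f->offset <= sizeof(f->buffer));
		if (f->offset == sizeof(f->buffer))
			toy_flush(f);
	}
}

/* Logical write position: flushed bytes plus buffered bytes. */
static uint64_t toy_position(const struct toy_hashfile *f)
{
	return f->total + f->offset;
}

/*
 * A chunk writer in the new style: it returns 0 on success or a
 * negative value on failure, and no longer reports a byte count.
 */
static int write_padding_chunk(struct toy_hashfile *f, void *data)
{
	size_t len = *(size_t *)data;
	unsigned char zeros[32] = { 0 };

	if (len > sizeof(zeros))
		return -1; /* fail gracefully instead of calling die() */
	toy_write(f, zeros, len);
	return 0;
}

/*
 * Returns 0 if the writer succeeded and the position moved by exactly
 * 'expected' bytes, -1 if the writer failed, -2 on a size mismatch.
 */
static int check_chunk(struct toy_hashfile *f, size_t len, uint64_t expected)
{
	uint64_t start = toy_position(f);

	if (write_padding_chunk(f, &len))
		return -1;
	return toy_position(f) - start == expected ? 0 : -2;
}
```

With an 8-byte buffer, writing 20 bytes flushes twice and leaves 4 bytes buffered, yet `toy_position()` still reports 20; that is why the check adds the buffered `offset` to the flushed `total` instead of looking at `total` alone.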

From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:48 +0000
Subject: [PATCH 09/15] midx: drop chunk progress during write

Most expensive operations in write_midx_internal() use the context
struct's progress member, and these indicate the progress of the
expensive operations within the chunk-writing methods. However, there is
a competing progress struct that counts the progress over all chunks.
This is not very helpful compared to the others, so drop it.

This also removes a barrier to combining the chunk-writing code with
chunk-format.c.

Signed-off-by: Derrick Stolee
---
 midx.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/midx.c b/midx.c
index 5eb1b01946..ce6d4339bd 100644
--- a/midx.c
+++ b/midx.c
@@ -808,7 +808,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	uint64_t header_size = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
-	struct progress *progress = NULL;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -976,9 +975,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		hashwrite_be64(f, chunk_offsets[i]);
 	}

-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Writing chunks to multi-pack-index"),
-					  num_chunks);
 	for (i = 0; i < num_chunks; i++) {
 		if (f->total + f->offset != chunk_offsets[i])
 			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
@@ -1011,10 +1007,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			BUG("trying to write unknown chunk id %"PRIx32,
 			    chunk_ids[i]);
 		}
-
-		display_progress(progress, i + 1);
 	}
-	stop_progress(&progress);

 	if (f->total + f->offset != chunk_offsets[num_chunks])
 		BUG("incorrect final offset %"PRIu64" != %"PRIu64,

From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:49 +0000
Subject: [PATCH 10/15] midx: use chunk-format API in write_midx_internal()

The chunk-format API allows automatically writing the table of contents
for a chunk-based file format when using an array of "struct
chunk_info"s. Update write_midx_internal() to use this strategy, which
also simplifies the chunk-writing loop. This loop will be replaced with
a chunk-format API call in an upcoming change.

Signed-off-by: Derrick Stolee
---
 midx.c | 96 +++++++++++++---------------------------------------------
 1 file changed, 21 insertions(+), 75 deletions(-)

diff --git a/midx.c b/midx.c
index ce6d4339bd..0548266bea 100644
--- a/midx.c
+++ b/midx.c
@@ -11,6 +11,7 @@
 #include "trace2.h"
 #include "run-command.h"
 #include "repository.h"
+#include "chunk-format.h"

 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -799,15 +800,14 @@ static int write_midx_large_offsets(struct hashfile *f,
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop, unsigned flags)
 {
-	unsigned char cur_chunk, num_chunks = 0;
+	unsigned char num_chunks = 0;
 	char *midx_name;
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
 	uint64_t header_size = 0;
-	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
-	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
+	struct chunk_info chunks[MIDX_MAX_CHUNKS];
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -923,7 +923,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m)
 		close_midx(ctx.m);

-	cur_chunk = 0;
 	num_chunks = ctx.large_offsets_needed ? 5 : 4;

 	if (ctx.nr - dropped_packs == 0) {
@@ -934,85 +933,32 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *

 	header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs);

-	chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
-	chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
+	chunks[0].id = MIDX_CHUNKID_PACKNAMES;
+	chunks[0].size = pack_name_concat_len;
+	chunks[0].write_fn = write_midx_pack_names;

-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len;
+	chunks[1].id = MIDX_CHUNKID_OIDFANOUT;
+	chunks[1].size = MIDX_CHUNK_FANOUT_SIZE;
+	chunks[1].write_fn = write_midx_oid_fanout;

-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE;
+	chunks[2].id = MIDX_CHUNKID_OIDLOOKUP;
+	chunks[2].size = ctx.entries_nr * the_hash_algo->rawsz;
+	chunks[2].write_fn = write_midx_oid_lookup;

-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz;
+	chunks[3].id = MIDX_CHUNKID_OBJECTOFFSETS;
+	chunks[3].size = ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
+	chunks[3].write_fn = write_midx_object_offsets;

-	cur_chunk++;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
 	if (ctx.large_offsets_needed) {
-		chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS;
-
-		cur_chunk++;
-		chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] +
-					   ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
+		chunks[4].id = MIDX_CHUNKID_LARGEOFFSETS;
+		chunks[4].size = ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
+		chunks[4].write_fn = write_midx_large_offsets;
 	}

-	chunk_ids[cur_chunk] = 0;
-
-	for (i = 0; i <= num_chunks; i++) {
-		if (i && chunk_offsets[i] < chunk_offsets[i - 1])
-			BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64,
-			    chunk_offsets[i - 1],
-			    chunk_offsets[i]);
-
-		if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT)
-			BUG("chunk offset %"PRIu64" is not properly aligned",
-			    chunk_offsets[i]);
-
-		hashwrite_be32(f, chunk_ids[i]);
-		hashwrite_be64(f, chunk_offsets[i]);
-	}
-
-	for (i = 0; i < num_chunks; i++) {
-		if (f->total + f->offset != chunk_offsets[i])
-			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
-			    chunk_offsets[i],
-			    f->total + f->offset,
-			    chunk_ids[i]);
-
-		switch (chunk_ids[i]) {
-		case MIDX_CHUNKID_PACKNAMES:
-			write_midx_pack_names(f, &ctx);
-			break;
-
-		case MIDX_CHUNKID_OIDFANOUT:
-			write_midx_oid_fanout(f, &ctx);
-			break;
-
-		case MIDX_CHUNKID_OIDLOOKUP:
-			write_midx_oid_lookup(f, &ctx);
-			break;
-
-		case MIDX_CHUNKID_OBJECTOFFSETS:
-			write_midx_object_offsets(f, &ctx);
-			break;
-
-		case MIDX_CHUNKID_LARGEOFFSETS:
-			write_midx_large_offsets(f, &ctx);
-			break;
-
-		default:
-			BUG("trying to write unknown chunk id %"PRIx32,
-			    chunk_ids[i]);
-		}
-	}
+	write_table_of_contents(f, header_size, chunks, num_chunks);

-	if (f->total + f->offset != chunk_offsets[num_chunks])
-		BUG("incorrect final offset %"PRIu64" != %"PRIu64,
-		    f->total + f->offset,
-		    chunk_offsets[num_chunks]);
+	for (i = 0; i < num_chunks; i++)
+		chunks[i].write_fn(f, &ctx);

 	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	commit_lock_file(&lk);

From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:50 +0000
Subject: [PATCH 11/15] midx: use 64-bit multiplication for chunk sizes

When calculating the sizes of certain chunks, we should always use
64-bit multiplication. This allows us to predict the chunk sizes
correctly without risk of overflow.
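
A minimal sketch of the failure mode being avoided, assuming both operands are 32-bit unsigned values (whether the real midx expressions would wrap also depends on how the width macros are typed). The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Both operands are 32-bit, so the product is computed in 32 bits and
 * wraps around before it is widened to the 64-bit return type.
 */
static uint64_t chunk_size_32bit(uint32_t nr, uint32_t width)
{
	return nr * width;
}

/*
 * Casting one operand first makes the multiplication itself 64-bit,
 * which is the approach this patch takes.
 */
static uint64_t chunk_size_64bit(uint32_t nr, uint32_t width)
{
	return (uint64_t)nr * width;
}
```

With 2^29 entries of 8 bytes each, the 32-bit product wraps to 0, while the widened product is 4294967296 (4 GiB); for small inputs the two agree, which is why such a bug tends to hide until a very large repository hits it.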

Signed-off-by: Derrick Stolee
---
 midx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 0548266bea..47f5f60fcd 100644
--- a/midx.c
+++ b/midx.c
@@ -946,12 +946,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	chunks[2].write_fn = write_midx_oid_lookup;

 	chunks[3].id = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunks[3].size = ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
+	chunks[3].size = (uint64_t)ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH;
 	chunks[3].write_fn = write_midx_object_offsets;

 	if (ctx.large_offsets_needed) {
 		chunks[4].id = MIDX_CHUNKID_LARGEOFFSETS;
-		chunks[4].size = ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
+		chunks[4].size = (uint64_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
 		chunks[4].write_fn = write_midx_large_offsets;
 	}


From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:51 +0000
Message-Id: <03f3255c8f4a953065b2ff8e61816f83534c23ed.1607012215.git.gitgitgadget@gmail.com>
Subject: [PATCH 12/15] chunk-format: create write_chunks()

The commit-graph and multi-pack-index files both use a chunk-based file
format. They have already unified on using write_table_of_contents(),
and we can go further by unifying their chunk-writing loops.

This brings back concepts that the commit-graph code already has but
that the multi-pack-index code dropped during refactoring, including:

* Check the hashfile for how much data was written by each write_fn.

* Allow write_fn() to report an error that results in a failure
  without using die() in the low-level commands.

This simplifies the code in commit-graph.c and midx.c while laying the
foundation for future formats using similar ideas.
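
The unified loop can be sketched in isolation. This version swaps struct hashfile for a hypothetical byte-counting `struct sink` and returns a distinct error code where Git calls BUG(), so both properties listed above (size checking and graceful failure) are easy to exercise; `toy_chunk_info`, `toy_write_chunks()`, and the writer functions are illustrative names, not the chunk-format API.

```c
#include <stdint.h>

/* Hypothetical sink that only counts bytes, standing in for a hashfile. */
struct sink {
	uint64_t pos;
};

typedef int (*toy_chunk_write_fn)(struct sink *s, void *data);

struct toy_chunk_info {
	uint32_t id;
	uint64_t size;
	toy_chunk_write_fn write_fn;
};

#define TOY_SIZE_MISMATCH (-100)

/*
 * Call each write_fn in order. A nonzero result from a writer is passed
 * straight back to the caller; a writer that produces the wrong number
 * of bytes yields TOY_SIZE_MISMATCH (Git uses BUG() for this case).
 */
static int toy_write_chunks(struct sink *s, const struct toy_chunk_info *chunks,
			    int nr, void *data)
{
	int i;

	for (i = 0; i < nr; i++) {
		uint64_t start = s->pos;
		int result = chunks[i].write_fn(s, data);

		if (result)
			return result;
		if (s->pos != start + chunks[i].size)
			return TOY_SIZE_MISMATCH;
	}
	return 0;
}

/* Example writers: two that emit fixed sizes and one that fails. */
static int write_16(struct sink *s, void *data)
{
	(void)data;
	s->pos += 16;
	return 0;
}

static int write_4(struct sink *s, void *data)
{
	(void)data;
	s->pos += 4;
	return 0;
}

static int write_fails(struct sink *s, void *data)
{
	(void)s;
	(void)data;
	return -1;
}
```

Because the loop measures the sink position around each call, a writer cannot "satisfy" the declared size by returning a wrong count, and a failing writer stops the loop without touching later chunks.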

Signed-off-by: Derrick Stolee
---
 chunk-format.c | 23 +++++++++++++++++++++++
 chunk-format.h | 13 +++++++++++++
 commit-graph.c | 13 ++-----------
 midx.c         |  3 +--
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/chunk-format.c b/chunk-format.c
index 771b6d98d0..a6643a4fc8 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -24,3 +24,26 @@ void write_table_of_contents(struct hashfile *f,
 	hashwrite_be32(f, 0);
 	hashwrite_be64(f, cur_offset);
 }
+
+int write_chunks(struct hashfile *f,
+		 struct chunk_info *chunks,
+		 int nr,
+		 void *data)
+{
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		uint64_t start_offset = f->total + f->offset;
+		int result = chunks[i].write_fn(f, data);
+
+		if (result)
+			return result;
+
+		if (f->total + f->offset != start_offset + chunks[i].size)
+			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
+			    chunks[i].size, chunks[i].id,
+			    f->total + f->offset - start_offset);
+	}
+
+	return 0;
+}
diff --git a/chunk-format.h b/chunk-format.h
index 4b9cbeb372..a2c7ddb23b 100644
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -33,4 +33,17 @@ void write_table_of_contents(struct hashfile *f,
 			     struct chunk_info *chunks,
 			     int nr);

+/*
+ * Write the data for the given chunk list using the provided
+ * write_fn values. The given 'data' parameter is passed to those
+ * methods.
+ *
+ * The data that is written by each write_fn is checked to be of
+ * the expected size, and a BUG() is thrown if not specified correctly.
+ */
+int write_chunks(struct hashfile *f,
+		 struct chunk_info *chunks,
+		 int nr,
+		 void *data);
+
 #endif
diff --git a/commit-graph.c b/commit-graph.c
index 5494fda1d3..10dcef9d6b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1809,17 +1809,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 			num_chunks * ctx->commits.nr);
 	}

-	for (i = 0; i < num_chunks; i++) {
-		uint64_t start_offset = f->total + f->offset;
-
-		if (chunks[i].write_fn(f, ctx))
-			return -1;
-
-		if (f->total + f->offset != start_offset + chunks[i].size)
-			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
-			    chunks[i].size, chunks[i].id,
-			    f->total + f->offset - start_offset);
-	}
+	if (write_chunks(f, chunks, num_chunks, ctx))
+		return -1;

 	stop_progress(&ctx->progress);
 	strbuf_release(&progress_title);
diff --git a/midx.c b/midx.c
index 47f5f60fcd..67ac232a81 100644
--- a/midx.c
+++ b/midx.c
@@ -957,8 +957,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *

 	write_table_of_contents(f, header_size, chunks, num_chunks);

-	for (i = 0; i < num_chunks; i++)
-		chunks[i].write_fn(f, &ctx);
+	result = write_chunks(f, chunks, num_chunks, &ctx);

 	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	commit_lock_file(&lk);

From: Derrick Stolee
Date: Thu, 03 Dec 2020 16:16:52 +0000
Message-Id: <6801e231f7414444a272f2ea87dcc6f60f29e25a.1607012215.git.gitgitgadget@gmail.com>
Subject: [PATCH 13/15] chunk-format: create chunk reading API

Now that the chunk-format API has a consistent mechanism for writing
file formats based on chunks, let's extend it to also parse chunk-based
files during read. Similar to the write scenario, the caller supplies
some context information, such as a memory location, the offset of the
table of contents, and information about what to do with each chunk.
The table-of-contents parsing skips any unspecified chunks and leaves
the specifics of each chunk to a function pointer.

This implementation handles some of the important error cases, such as
chunk offsets that escape the size of the file. However, we drop the
check for duplicate chunks and leave that to the given functions. It
may be helpful to allow multiple instances of the same chunk ID for
some formats.
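
The parsing rules described above can be sketched on an in-memory buffer. Each table-of-contents entry here is 12 bytes (a big-endian 32-bit id followed by a big-endian 64-bit offset), matching CHUNK_LOOKUP_WIDTH in the diff; `read_be32()`, `read_be64()`, `put_entry()`, `record_size()`, the `toy_*` names, and the explicit `hash_len` parameter are hypothetical stand-ins for Git's get_be32()/get_be64() and the_hash_algo->rawsz. Like the patch, it skips ids it does not recognize and rejects decreasing offsets or offsets that run into the trailing checksum; it does not check that the table itself lies inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LOOKUP_WIDTH 12 /* 4-byte id + 8-byte offset */

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Writes one table-of-contents entry in the same big-endian layout. */
static void put_entry(unsigned char *p, uint32_t id, uint64_t offset)
{
	int i;

	for (i = 0; i < 4; i++)
		p[i] = (unsigned char)(id >> (24 - 8 * i));
	for (i = 0; i < 8; i++)
		p[4 + i] = (unsigned char)(offset >> (56 - 8 * i));
}

typedef int (*toy_read_fn)(const unsigned char *start, size_t size, void *data);

struct toy_read_chunk_info {
	uint32_t id;
	toy_read_fn read_fn;
};

/* Example read_fn: stores the chunk size into *(size_t *)data. */
static int record_size(const unsigned char *start, size_t size, void *data)
{
	(void)start;
	*(size_t *)data = size;
	return 0;
}

/*
 * Walk 'toc_length' entries starting at 'toc_offset'. Returns 0 on success,
 * 1 on a malformed table, or the first nonzero value returned by a read_fn.
 */
static int toy_read_toc(const unsigned char *mfile, size_t mfile_size,
			uint64_t toc_offset, int toc_length, size_t hash_len,
			const struct toy_read_chunk_info *chunks, int nr,
			void *data)
{
	const unsigned char *toc = mfile + toc_offset;

	assert(mfile_size >= hash_len); /* avoid unsigned underflow below */
	while (toc_length--) {
		uint32_t id = read_be32(toc);
		uint64_t offset = read_be64(toc + 4);
		uint64_t next;
		int i;

		if (!id)
			return 1; /* terminating id appeared too early */

		/* The next entry's offset marks where this chunk ends. */
		toc += TOY_LOOKUP_WIDTH;
		next = read_be64(toc + 4);
		if (next < offset || next > mfile_size - hash_len)
			return 1; /* improper chunk offsets */

		for (i = 0; i < nr; i++) {
			if (chunks[i].id == id) {
				int result = chunks[i].read_fn(mfile + offset,
							       next - offset, data);
				if (result)
					return result;
				break;
			}
		}
		/* An id with no matching entry is simply skipped. */
	}
	return read_be32(toc) ? 1 : 0; /* the final entry must have id 0 */
}
```

For example, a 52-byte buffer with a 36-byte table (two chunks plus the zero terminator), chunk data at offsets 36 and 40, and a 4-byte trailing checksum parses successfully and reports a 4-byte first chunk; moving the terminator's offset to 60 makes the second chunk overrun the checksum, and the parse returns 1.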
The new location of these error checks changes the error strings, and
therefore also the tests that check for corruption in the table of contents.

Signed-off-by: Derrick Stolee
--- chunk-format.c | 56 ++++++++++ chunk-format.h | 20 ++++ commit-graph.c | 200 ++++++++++++++++++++---------------- midx.c | 114 ++++++++++++-------- t/t5318-commit-graph.sh | 2 +- t/t5319-multi-pack-index.sh | 6 +- 6 files changed, 261 insertions(+), 137 deletions(-) diff --git a/chunk-format.c b/chunk-format.c index a6643a4fc8..d888ef6ec7 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -1,6 +1,7 @@ #include "git-compat-util.h" #include "chunk-format.h" #include "csum-file.h" +#include "cache.h" #define CHUNK_LOOKUP_WIDTH 12 void write_table_of_contents(struct hashfile *f, @@ -47,3 +48,58 @@ int write_chunks(struct hashfile *f, return 0; } + +int read_table_of_contents(const unsigned char *mfile, + size_t mfile_size, + uint64_t toc_offset, + int toc_length, + struct read_chunk_info *chunks, + int nr, + void *data) +{ + uint32_t chunk_id; + const unsigned char *table_of_contents = mfile + toc_offset; + + while (toc_length--) { + int i; + uint64_t chunk_offset, next_chunk_offset; + + chunk_id = get_be32(table_of_contents); + chunk_offset = get_be64(table_of_contents + 4); + + if (!chunk_id) { + error(_("terminating chunk id appears earlier than expected")); + return 1; + } + + table_of_contents += CHUNK_LOOKUP_WIDTH; + next_chunk_offset = get_be64(table_of_contents + 4); + + if (next_chunk_offset < chunk_offset || + next_chunk_offset > mfile_size - the_hash_algo->rawsz) { + error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""), + chunk_offset, next_chunk_offset); + return 1; + } + for (i = 0; i < nr; i++) { + if (chunks[i].id == chunk_id) { + int result = chunks[i].read_fn( + mfile + chunk_offset, + next_chunk_offset - chunk_offset, + data); + + if (result) + return result; + break; + } + } + } + + chunk_id = get_be32(table_of_contents); + if (chunk_id) { + error(_("final chunk has non-zero
id %"PRIx32""), chunk_id); + return 1; + } + + return 0; +} diff --git a/chunk-format.h b/chunk-format.h index a2c7ddb23b..7049800f73 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -46,4 +46,24 @@ int write_chunks(struct hashfile *f, int nr, void *data); +/* + * When reading a table of contents, we find the chunk with matching 'id' + * then call its read_fn to populate the necessary 'data' based on the + * chunk start and size. + */ +typedef int (*chunk_read_fn)(const unsigned char *chunk_start, + size_t chunk_size, void *data); +struct read_chunk_info { + uint32_t id; + chunk_read_fn read_fn; +}; + +int read_table_of_contents(const unsigned char *mfile, + size_t mfile_size, + uint64_t toc_offset, + int toc_length, + struct read_chunk_info *chunks, + int nr, + void *data); + #endif diff --git a/commit-graph.c b/commit-graph.c index 10dcef9d6b..0a3ba147df 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -289,15 +289,114 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } +static int graph_read_oid_fanout(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + g->chunk_oid_fanout = (uint32_t*)chunk_start; + return 0; +} + +static int graph_read_oid_lookup(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + g->chunk_oid_lookup = chunk_start; + g->num_commits = chunk_size / g->hash_len; + return 0; +} + +static int graph_read_data(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + g->chunk_commit_data = chunk_start; + return 0; +} + +static int graph_read_extra_edges(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + g->chunk_extra_edges = chunk_start; + return 0; +} + +static int graph_read_base_graphs(const unsigned char *chunk_start, + 
size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + g->chunk_base_graphs = chunk_start; + return 0; +} + +static int graph_read_bloom_indices(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + g->chunk_bloom_indexes = chunk_start; + return 0; +} + +static int graph_read_bloom_data(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = (struct commit_graph *)data; + uint32_t hash_version; + g->chunk_bloom_data = chunk_start; + hash_version = get_be32(chunk_start); + + if (hash_version != 1) + return 0; + + g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); + g->bloom_filter_settings->hash_version = hash_version; + g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4); + g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8); + g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES; + + return 0; +} + +static struct read_chunk_info read_chunks[] = { + [0] = { + GRAPH_CHUNKID_OIDFANOUT, + graph_read_oid_fanout + }, + [1] = { + GRAPH_CHUNKID_OIDLOOKUP, + graph_read_oid_lookup + }, + [2] = { + GRAPH_CHUNKID_DATA, + graph_read_data + }, + [3] = { + GRAPH_CHUNKID_EXTRAEDGES, + graph_read_extra_edges + }, + [4] = { + GRAPH_CHUNKID_BASE, + graph_read_base_graphs + }, + [5] = { + GRAPH_CHUNKID_BLOOMINDEXES, + graph_read_bloom_indices + }, + [6] = { + GRAPH_CHUNKID_BLOOMDATA, + graph_read_bloom_data + } +}; + struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size) { - const unsigned char *data, *chunk_lookup; - uint32_t i; + const unsigned char *data; struct commit_graph *graph; - uint64_t next_chunk_offset; uint32_t graph_signature; unsigned char graph_version, hash_version; + int chunks_nr = MAX_NUM_CHUNKS; if (!graph_map) return NULL; @@ -346,95 +445,14 @@ struct commit_graph *parse_commit_graph(struct 
repository *r, return NULL; } - chunk_lookup = data + 8; - next_chunk_offset = get_be64(chunk_lookup + 4); - for (i = 0; i < graph->num_chunks; i++) { - uint32_t chunk_id; - uint64_t chunk_offset = next_chunk_offset; - int chunk_repeated = 0; - - chunk_id = get_be32(chunk_lookup + 0); - - chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; - next_chunk_offset = get_be64(chunk_lookup + 4); - - if (chunk_offset > graph_size - the_hash_algo->rawsz) { - error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), - (uint32_t)chunk_offset); - goto free_and_return; - } - - switch (chunk_id) { - case GRAPH_CHUNKID_OIDFANOUT: - if (graph->chunk_oid_fanout) - chunk_repeated = 1; - else - graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset); - break; - - case GRAPH_CHUNKID_OIDLOOKUP: - if (graph->chunk_oid_lookup) - chunk_repeated = 1; - else { - graph->chunk_oid_lookup = data + chunk_offset; - graph->num_commits = (next_chunk_offset - chunk_offset) - / graph->hash_len; - } - break; - - case GRAPH_CHUNKID_DATA: - if (graph->chunk_commit_data) - chunk_repeated = 1; - else - graph->chunk_commit_data = data + chunk_offset; - break; + /* limit the chunk-format list if we are ignoring Bloom filters */ + if (!r->settings.commit_graph_read_changed_paths) + chunks_nr = 5; - case GRAPH_CHUNKID_EXTRAEDGES: - if (graph->chunk_extra_edges) - chunk_repeated = 1; - else - graph->chunk_extra_edges = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BASE: - if (graph->chunk_base_graphs) - chunk_repeated = 1; - else - graph->chunk_base_graphs = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BLOOMINDEXES: - if (graph->chunk_bloom_indexes) - chunk_repeated = 1; - else if (r->settings.commit_graph_read_changed_paths) - graph->chunk_bloom_indexes = data + chunk_offset; - break; - - case GRAPH_CHUNKID_BLOOMDATA: - if (graph->chunk_bloom_data) - chunk_repeated = 1; - else if (r->settings.commit_graph_read_changed_paths) { - uint32_t hash_version; - 
graph->chunk_bloom_data = data + chunk_offset; - hash_version = get_be32(data + chunk_offset); - - if (hash_version != 1) - break; - - graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); - graph->bloom_filter_settings->hash_version = hash_version; - graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4); - graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); - graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES; - } - break; - } - - if (chunk_repeated) { - error(_("commit-graph chunk id %08x appears multiple times"), chunk_id); - goto free_and_return; - } - } + if (read_table_of_contents( + graph->data, graph_size, GRAPH_HEADER_SIZE, graph->num_chunks, + read_chunks, chunks_nr, graph)) + goto free_and_return; if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) { init_bloom_filters(); diff --git a/midx.c b/midx.c index 67ac232a81..786b3b51c3 100644 --- a/midx.c +++ b/midx.c @@ -54,6 +54,74 @@ static char *get_midx_filename(const char *object_dir) return xstrfmt("%s/pack/multi-pack-index", object_dir); } +static int midx_read_pack_names(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct multi_pack_index *m = (struct multi_pack_index *)data; + m->chunk_pack_names = chunk_start; + return 0; +} + +static int midx_read_oid_fanout(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct multi_pack_index *m = (struct multi_pack_index *)data; + m->chunk_oid_fanout = (uint32_t *)chunk_start; + + if (chunk_size != 4 * 256) { + error(_("multi-pack-index OID fanout is of the wrong size")); + return 1; + } + return 0; +} + +static int midx_read_oid_lookup(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct multi_pack_index *m = (struct multi_pack_index *)data; + m->chunk_oid_lookup = chunk_start; + return 0; +} + +static int midx_read_offsets(const unsigned char *chunk_start, + size_t chunk_size, 
void *data) +{ + struct multi_pack_index *m = (struct multi_pack_index *)data; + m->chunk_object_offsets = chunk_start; + return 0; +} + +static int midx_read_large_offsets(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct multi_pack_index *m = (struct multi_pack_index *)data; + m->chunk_large_offsets = chunk_start; + return 0; +} + +static struct read_chunk_info read_chunks[] = { + [0] = { + MIDX_CHUNKID_PACKNAMES, + midx_read_pack_names + }, + [1] = { + MIDX_CHUNKID_OIDFANOUT, + midx_read_oid_fanout + }, + [2] = { + MIDX_CHUNKID_OIDLOOKUP, + midx_read_oid_lookup + }, + [3] = { + MIDX_CHUNKID_OBJECTOFFSETS, + midx_read_offsets + }, + [4] = { + MIDX_CHUNKID_LARGEOFFSETS, + midx_read_large_offsets + } +}; + struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local) { struct multi_pack_index *m = NULL; @@ -114,48 +182,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); - for (i = 0; i < m->num_chunks; i++) { - uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE + - MIDX_CHUNKLOOKUP_WIDTH * i); - uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 + - MIDX_CHUNKLOOKUP_WIDTH * i); - - if (chunk_offset >= m->data_len) - die(_("invalid chunk offset (too large)")); - - switch (chunk_id) { - case MIDX_CHUNKID_PACKNAMES: - m->chunk_pack_names = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_OIDFANOUT: - m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset); - break; - - case MIDX_CHUNKID_OIDLOOKUP: - m->chunk_oid_lookup = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_OBJECTOFFSETS: - m->chunk_object_offsets = m->data + chunk_offset; - break; - - case MIDX_CHUNKID_LARGEOFFSETS: - m->chunk_large_offsets = m->data + chunk_offset; - break; - - case 0: - die(_("terminating multi-pack-index chunk id appears earlier than expected")); - break; - - default: - /* - * Do nothing on unrecognized chunks, 
allowing future - * extensions to add optional chunks. - */ - break; - } - } + if (read_table_of_contents(m->data, midx_size, MIDX_HEADER_SIZE, + m->num_chunks, read_chunks, + MIDX_MAX_CHUNKS, m)) + goto cleanup_fail; if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2ed0c1544d..65879af6c0 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -563,7 +563,7 @@ test_expect_success 'detect bad hash version' ' test_expect_success 'detect low chunk count' ' corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \ - "missing the .* chunk" + "final chunk has non-zero id" ' test_expect_success 'detect missing OID fanout chunk' ' diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index ace469c95c..a02d612f4d 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' ' test_expect_success 'verify truncated chunk count' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \ - "missing required" + "final chunk has non-zero id" ' test_expect_success 'verify extended chunk count' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \ - "terminating multi-pack-index chunk id appears earlier than expected" + "terminating chunk id appears earlier than expected" ' test_expect_success 'verify missing required chunk' ' @@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' ' test_expect_success 'verify invalid chunk offset' ' corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \ - "invalid chunk offset (too large)" + "improper chunk offset(s)" ' test_expect_success 'verify packnames out of order' ' From patchwork Thu Dec 3 16:16:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 11949123 Return-Path: 
Message-Id: <106dd51f75699fbf4fc1e46687124995f5ef0278.1607012215.git.gitgitgadget@gmail.com>
Date: Thu, 03 Dec 2020 16:16:53 +0000
Subject: [PATCH 14/15] commit-graph: restore duplicate chunk checks
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee , Derrick Stolee
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

The previous change introduced read_table_of_contents() in the
chunk-format API, but dropped the duplicate chunk check from the
commit-graph parsing logic. This was done to keep flexibility in the
chunk-format API.

One way to restore this check is to have each chunk_read_fn method check
if it has run before. This is somewhat repetitive.
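The per-callback check can be kept short with a small shared helper. The sketch below is illustrative only: `claim_chunk()` and `struct parsed_graph` are hypothetical stand-ins (the patch itself uses `report_duplicate()` inside each `graph_read_*()` function of `struct commit_graph`), and it relies on the same convention as the patch, namely that a still-NULL chunk pointer means the chunk has not been seen yet.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: record where a chunk starts in '*slot', or report
 * a duplicate if an earlier chunk with the same ID already filled it.
 * Returns 0 on first sight, 1 on a repeat.
 */
static int claim_chunk(const unsigned char **slot,
		       const unsigned char *chunk_start)
{
	if (*slot) {
		fprintf(stderr, "warning: duplicate chunk detected\n");
		return 1;
	}
	*slot = chunk_start;
	return 0;
}

/* Hypothetical parsed state standing in for struct commit_graph. */
struct parsed_graph {
	const unsigned char *chunk_oid_lookup;
	size_t num_commits;
	size_t hash_len;
};

/* A read callback in the chunk_read_fn shape that uses the helper. */
static int read_oid_lookup(const unsigned char *chunk_start,
			   size_t chunk_size, void *data)
{
	struct parsed_graph *g = data;

	if (claim_chunk(&g->chunk_oid_lookup, chunk_start))
		return 1;
	g->num_commits = chunk_size / g->hash_len;
	return 0;
}
```

Each callback still has to opt in, so the check remains per-chunk; the helper only removes the copy-pasted `if`/`return` pair.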
If we determine that the chunk-format API would be better off with a hard requirement that chunks are never repeated, then this could be replaced with a check in chunk-format.c. For now, only restore the duplicate checks that previously existed in the commit-graph parsing logic. Signed-off-by: Derrick Stolee Signed-off-by: Derrick Stolee --- commit-graph.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 0a3ba147df..c0102fceba 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -289,10 +289,20 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } +static int report_duplicate(void) +{ + warning(_("duplicate chunk detected")); + return 1; +} + static int graph_read_oid_fanout(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = (struct commit_graph *)data; + + if (g->chunk_oid_fanout) + return report_duplicate(); + g->chunk_oid_fanout = (uint32_t*)chunk_start; return 0; } @@ -301,6 +311,10 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = (struct commit_graph *)data; + + if (g->chunk_oid_lookup) + return report_duplicate(); + g->chunk_oid_lookup = chunk_start; g->num_commits = chunk_size / g->hash_len; return 0; @@ -310,6 +324,10 @@ static int graph_read_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = (struct commit_graph *)data; + + if (g->chunk_commit_data) + return report_duplicate(); + g->chunk_commit_data = chunk_start; return 0; } @@ -318,6 +336,10 @@ static int graph_read_extra_edges(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = (struct commit_graph *)data; + + if (g->chunk_extra_edges) + return report_duplicate(); + g->chunk_extra_edges = chunk_start; return 0; } @@ -326,6 +348,10 @@ static int graph_read_base_graphs(const unsigned char *chunk_start, size_t 
chunk_size, void *data) { struct commit_graph *g = (struct commit_graph *)data; + + if (g->chunk_base_graphs) + return report_duplicate(); + g->chunk_base_graphs = chunk_start; return 0; } @@ -334,6 +360,10 @@ static int graph_read_bloom_indices(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = (struct commit_graph *)data; + + if (g->chunk_bloom_indexes) + return report_duplicate(); + g->chunk_bloom_indexes = chunk_start; return 0; } @@ -343,6 +373,10 @@ static int graph_read_bloom_data(const unsigned char *chunk_start, { struct commit_graph *g = (struct commit_graph *)data; uint32_t hash_version; + + if (g->chunk_bloom_data) + return report_duplicate(); + g->chunk_bloom_data = chunk_start; hash_version = get_be32(chunk_start);
From patchwork Thu Dec 3 16:16:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 11949121
Message-Id: <2ce1c2a54261494df31808660792fef800dc9665.1607012215.git.gitgitgadget@gmail.com>
Date: Thu, 03 Dec 2020 16:16:54 +0000
Subject: [PATCH 15/15] chunk-format: add technical docs
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com, Derrick Stolee , Derrick Stolee
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

The chunk-based file format is now an API in the code, but we should
also take time to document it as a file format. Specifically, it matches
the CHUNK LOOKUP sections of the commit-graph and multi-pack-index files,
but there are some commonalities that should be grouped in this document.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/chunk-format.txt | 54 +++++++++++++++++++
 .../technical/commit-graph-format.txt    |  3 ++
 Documentation/technical/pack-format.txt  |  3 ++
 3 files changed, 60 insertions(+)
 create mode 100644 Documentation/technical/chunk-format.txt

diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt
new file mode 100644
index 0000000000..3db3792dea
--- /dev/null
+++ b/Documentation/technical/chunk-format.txt
@@ -0,0 +1,54 @@
+Chunk-based file formats
+========================
+
+Some file formats in Git use a common concept of "chunks" to describe
+sections of the file. This allows structured access to a large file by
+scanning a small "table of contents" for the remaining data. This common
+format is used by the `commit-graph` and `multi-pack-index` files. See
+link:technical/pack-format.html[the `multi-pack-index` format] and
+link:technical/commit-graph-format.html[the `commit-graph` format] for
+how they use the chunks to describe structured data.
+
+A chunk-based file format begins with some header information specific to
+that format. That header should include enough information to identify
+the file type, format version, and number of chunks in the file. From this
+information, a reader can determine the start of the chunk-based region.
+
+The chunk-based region starts with a table of contents describing where
+each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
+where C is the number of chunks. Consider the following table:
+
+  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
+  |--------------------|------------------------|
+  | ID[0]              | OFFSET[0]              |
+  | ...                | ...                    |
+  | ID[C-1]            | OFFSET[C-1]            |
+  | 0x00000000         | OFFSET[C]              |
+
+Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
+Each integer is stored in network byte order (big-endian).
+
+The chunk identifier `ID[i]` is a label for the data stored within this
+file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
+size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
+and `OFFSET[i]`. This requires that the chunk data appears contiguously
+in the same order as the table of contents.
+
+The final entry in the table of contents must have an ID of four zero
+bytes. This confirms that the table of contents is ending and provides
+the offset for the end of the chunk-based data.
+
+Note: The chunk-based format expects that the file contains _at least_ a
+trailing hash after `OFFSET[C]`.
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these methods provides extra checks that assist
+developers when creating new file formats, including:
+
+ 1. Writing and reading the table of contents.
+
+ 2. Verifying that the data written in a chunk matches the expected size
+    that was recorded in the table of contents.
+
+ 3. Checking that a table of contents describes offsets properly within
+    the file boundaries.
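To make the table-of-contents layout concrete, here is a minimal C sketch that fills in the rows for chunks written back to back right after the table. It is not the `chunk-format.h` API; `write_toc()`, `store_be32()` and `store_be64()` are hypothetical names, and the sketch assumes the header size (the table's own file offset) and every chunk size are known up front.

```c
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_WIDTH 12 /* 4-byte chunk ID + 8-byte offset */

/* Hypothetical helpers: encode integers in network (big-endian) order. */
static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static void store_be64(unsigned char *p, uint64_t v)
{
	store_be32(p, (uint32_t)(v >> 32));
	store_be32(p + 4, (uint32_t)v);
}

/*
 * Fill 'out' with a table of contents for 'nr' chunks whose data will be
 * written contiguously, in table order, immediately after the table.
 * 'toc_offset' is where the table starts in the file (the header size).
 * 'out' must hold (nr + 1) * TOC_ROW_WIDTH bytes.
 * Returns OFFSET[C], the file offset where the chunk data ends.
 */
static uint64_t write_toc(unsigned char *out, uint64_t toc_offset,
			  const uint32_t *ids, const uint64_t *sizes,
			  size_t nr)
{
	/* The first chunk starts right after the C+1 table rows. */
	uint64_t offset = toc_offset + (uint64_t)(nr + 1) * TOC_ROW_WIDTH;
	size_t i;

	for (i = 0; i < nr; i++) {
		store_be32(out + i * TOC_ROW_WIDTH, ids[i]);
		store_be64(out + i * TOC_ROW_WIDTH + 4, offset);
		offset += sizes[i];
	}

	/* Terminating row: four zero bytes, then the end-of-data offset. */
	store_be32(out + nr * TOC_ROW_WIDTH, 0);
	store_be64(out + nr * TOC_ROW_WIDTH + 4, offset);
	return offset;
}
```

With an 8-byte header and chunks of 1024 and 40 bytes, the three rows occupy bytes 8 to 43, so `OFFSET[0]` is 44, `OFFSET[1]` is 1068, and the terminating row records 1108 as the end of the chunk data; the trailing hash would start there.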
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b3b58880b9..b92442780e 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -65,6 +65,9 @@ CHUNK LOOKUP: the length using the next chunk position if necessary.) Each chunk ID appears at most once. + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified. diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index f96b2e605f..2fb1e60d29 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -301,6 +301,9 @@ CHUNK LOOKUP: (Chunks are provided in file-order, so you can infer the length using the next chunk position if necessary.) + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified.