From patchwork Wed Jan 27 15:01:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12050501
Message-Id: <243dcec9436853ff8d1bf2580e76ab909b7cb324.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:40 +0000
Subject: [PATCH v2 01/17] commit-graph: anonymize data in chunk_write_fn
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

In preparation for creating an API around file formats using chunks and tables of contents, prepare the
commit-graph write code to use prototypes that will match this new API. Specifically, convert chunk_write_fn to take a "void *data" parameter instead of the commit-graph-specific "struct write_commit_graph_context" pointer. Signed-off-by: Derrick Stolee --- commit-graph.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f3bde2ad95a..fae7d1b6393 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1040,8 +1040,9 @@ struct write_commit_graph_context { }; static int write_graph_chunk_fanout(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i, count = 0; struct commit **list = ctx->commits.list; @@ -1066,8 +1067,9 @@ static int write_graph_chunk_fanout(struct hashfile *f, } static int write_graph_chunk_oids(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; int count; for (count = 0; count < ctx->commits.nr; count++, list++) { @@ -1085,8 +1087,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table) } static int write_graph_chunk_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; uint32_t num_extra_edges = 0; @@ -1187,8 +1190,9 @@ static int write_graph_chunk_data(struct hashfile *f, } static int write_graph_chunk_generation_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i, num_generation_data_overflows = 0; for (i = 0; i < ctx->commits.nr; i++) { @@ -1208,8 +1212,9 @@ static int write_graph_chunk_generation_data(struct hashfile *f, } static int write_graph_chunk_generation_data_overflow(struct hashfile 
*f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int i; for (i = 0; i < ctx->commits.nr; i++) { struct commit *c = ctx->commits.list[i]; @@ -1226,8 +1231,9 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f, } static int write_graph_chunk_extra_edges(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; struct commit_list *parent; @@ -1280,8 +1286,9 @@ static int write_graph_chunk_extra_edges(struct hashfile *f, } static int write_graph_chunk_bloom_indexes(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; uint32_t cur_pos = 0; @@ -1315,8 +1322,9 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) } static int write_graph_chunk_bloom_data(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; struct commit **list = ctx->commits.list; struct commit **last = ctx->commits.list + ctx->commits.nr; @@ -1737,8 +1745,9 @@ static int write_graph_chunk_base_1(struct hashfile *f, } static int write_graph_chunk_base(struct hashfile *f, - struct write_commit_graph_context *ctx) + void *data) { + struct write_commit_graph_context *ctx = data; int num = write_graph_chunk_base_1(f, ctx->new_base_graph); if (num != ctx->num_commit_graphs_after - 1) { @@ -1750,7 +1759,7 @@ static int write_graph_chunk_base(struct hashfile *f, } typedef int (*chunk_write_fn)(struct hashfile *f, - struct write_commit_graph_context *ctx); + void *data); struct chunk_info { uint32_t id; From patchwork Wed Jan 27 15:01:41 2021 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12050319
Message-Id: <814512f216719d89f41822753d5c71df5e49385d.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:41 +0000
Subject: [PATCH v2 02/17] chunk-format: create chunk format write API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

In anticipation of combining the logic from the commit-graph and multi-pack-index file formats, create a new chunk-format API.
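Both formats place a table of contents in front of their chunk data: one 12-byte entry per chunk (a 4-byte id followed by an 8-byte offset), closed by an entry with id 0 whose offset marks where the last chunk ends. The following is only a rough sketch of that layout, not the code from this series; the names `toc_chunk`, `build_toc` and `TOC_ENTRY_WIDTH` are hypothetical, and the output goes to a plain byte buffer instead of a hashfile.

```c
#include <stddef.h>
#include <stdint.h>

#define TOC_ENTRY_WIDTH 12 /* 4-byte chunk id + 8-byte offset */

/* Hypothetical description of one chunk: its id and expected size. */
struct toc_chunk {
	uint32_t id;
	uint64_t size;
};

/* Store a 32-bit value in big-endian (network) byte order. */
static void put_u32_be(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Store a 64-bit value in big-endian byte order. */
static void put_u64_be(unsigned char *out, uint64_t v)
{
	put_u32_be(out, (uint32_t)(v >> 32));
	put_u32_be(out + 4, (uint32_t)(v & 0xffffffff));
}

/*
 * Fill 'out' with nr + 1 table-of-contents entries. 'header_len' is the
 * number of bytes that precede the table in the file. 'out' must have
 * room for (nr + 1) * TOC_ENTRY_WIDTH bytes. Returns the offset at
 * which the last chunk ends.
 */
static uint64_t build_toc(unsigned char *out, const struct toc_chunk *chunks,
			  size_t nr, uint64_t header_len)
{
	/* The first chunk starts right after the table itself. */
	uint64_t offset = header_len + (uint64_t)(nr + 1) * TOC_ENTRY_WIDTH;
	size_t i;

	for (i = 0; i < nr; i++) {
		put_u32_be(out, chunks[i].id);
		put_u64_be(out + 4, offset);
		out += TOC_ENTRY_WIDTH;
		offset += chunks[i].size;
	}

	/* Trailing entry: id 0 and the end offset of the chunk data. */
	put_u32_be(out, 0);
	put_u64_be(out + 4, offset);
	return offset;
}
```

Because every offset is derived from the declared sizes before any chunk is written, a writer that emits a different number of bytes than it declared would leave the table pointing at the wrong place, so a chunk writer built this way has reason to check the written size against the declared one.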
Use a 'struct chunkfile' pointer to keep track of data that has been registered for writes. This struct is anonymous outside of chunk-format.c to ensure no user attempts to interfere with the data.

The next change will use this API in commit-graph.c, but the general approach is:

 1. initialize the chunkfile with init_chunkfile(f).
 2. add chunks in the intended writing order with add_chunk().
 3. write any header information to the hashfile f.
 4. write the chunkfile data using write_chunkfile().
 5. free the chunkfile struct using free_chunkfile().

Helped-by: Taylor Blau
Signed-off-by: Derrick Stolee
---
 Makefile       |  1 +
 chunk-format.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++
 chunk-format.h | 20 +++++++++++
 3 files changed, 112 insertions(+)
 create mode 100644 chunk-format.c
 create mode 100644 chunk-format.h

diff --git a/Makefile b/Makefile
index 7b64106930a..50a7663841e 100644
--- a/Makefile
+++ b/Makefile
@@ -854,6 +854,7 @@ LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += chdir-notify.o
 LIB_OBJS += checkout.o
+LIB_OBJS += chunk-format.o
 LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
diff --git a/chunk-format.c b/chunk-format.c
new file mode 100644
index 00000000000..ab914c55856
--- /dev/null
+++ b/chunk-format.c
@@ -0,0 +1,91 @@
+#include "cache.h"
+#include "chunk-format.h"
+#include "csum-file.h"
+#define CHUNK_LOOKUP_WIDTH 12
+
+/*
+ * When writing a chunk-based file format, collect the chunks in
+ * an array of chunk_info structs. The size stores the _expected_
+ * amount of data that will be written by write_fn.
+ */ +struct chunk_info { + uint32_t id; + uint64_t size; + chunk_write_fn write_fn; +}; + +struct chunkfile { + struct hashfile *f; + + struct chunk_info *chunks; + size_t chunks_nr; + size_t chunks_alloc; +}; + +struct chunkfile *init_chunkfile(struct hashfile *f) +{ + struct chunkfile *cf = xcalloc(1, sizeof(*cf)); + cf->f = f; + return cf; +} + +void free_chunkfile(struct chunkfile *cf) +{ + if (!cf) + return; + free(cf->chunks); + free(cf); +} + +int get_num_chunks(struct chunkfile *cf) +{ + return cf->chunks_nr; +} + +void add_chunk(struct chunkfile *cf, + uint64_t id, + chunk_write_fn fn, + size_t size) +{ + ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc); + + cf->chunks[cf->chunks_nr].id = id; + cf->chunks[cf->chunks_nr].write_fn = fn; + cf->chunks[cf->chunks_nr].size = size; + cf->chunks_nr++; +} + +int write_chunkfile(struct chunkfile *cf, void *data) +{ + int i; + size_t cur_offset = cf->f->offset + cf->f->total; + + /* Add the table of contents to the current offset */ + cur_offset += (cf->chunks_nr + 1) * CHUNK_LOOKUP_WIDTH; + + for (i = 0; i < cf->chunks_nr; i++) { + hashwrite_be32(cf->f, cf->chunks[i].id); + hashwrite_be64(cf->f, cur_offset); + + cur_offset += cf->chunks[i].size; + } + + /* Trailing entry marks the end of the chunks */ + hashwrite_be32(cf->f, 0); + hashwrite_be64(cf->f, cur_offset); + + for (i = 0; i < cf->chunks_nr; i++) { + uint64_t start_offset = cf->f->total + cf->f->offset; + int result = cf->chunks[i].write_fn(cf->f, data); + + if (result) + return result; + + if (cf->f->total + cf->f->offset - start_offset != cf->chunks[i].size) + BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", + cf->chunks[i].size, cf->chunks[i].id, + cf->f->total + cf->f->offset - start_offset); + } + + return 0; +} diff --git a/chunk-format.h b/chunk-format.h new file mode 100644 index 00000000000..bfaed672813 --- /dev/null +++ b/chunk-format.h @@ -0,0 +1,20 @@ +#ifndef CHUNK_FORMAT_H +#define 
CHUNK_FORMAT_H + +#include "git-compat-util.h" + +struct hashfile; +struct chunkfile; + +struct chunkfile *init_chunkfile(struct hashfile *f); +void free_chunkfile(struct chunkfile *cf); +int get_num_chunks(struct chunkfile *cf); +typedef int (*chunk_write_fn)(struct hashfile *f, + void *data); +void add_chunk(struct chunkfile *cf, + uint64_t id, + chunk_write_fn fn, + size_t size); +int write_chunkfile(struct chunkfile *cf, void *data); + +#endif From patchwork Wed Jan 27 15:01:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 057EEC433E6 for ; Wed, 27 Jan 2021 15:07:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C7EC207FC for ; Wed, 27 Jan 2021 15:07:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235561AbhA0PGx (ORCPT ); Wed, 27 Jan 2021 10:06:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235456AbhA0PCz (ORCPT ); Wed, 27 Jan 2021 10:02:55 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57CA2C061353 for ; Wed, 27 Jan 2021 07:02:03 -0800 (PST) Received: by mail-wr1-x42b.google.com with 
SMTP id g10so2262039wrx.1 for ; Wed, 27 Jan 2021 07:02:03 -0800 (PST)
Message-Id: <70af6e3083f4f5e2b921c1c9817c790c8b5f66ce.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:42 +0000
Subject: [PATCH v2 03/17]
commit-graph: use chunk-format write API Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The commit-graph write logic is ready to make use of the chunk-format write API. Each chunk write method is already in the correct prototype. We only need to use the 'struct chunkfile' pointer and the correct API calls. Signed-off-by: Derrick Stolee --- commit-graph.c | 118 ++++++++++++++++--------------------------------- 1 file changed, 37 insertions(+), 81 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index fae7d1b6393..ba33777dcb8 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -19,6 +19,7 @@ #include "shallow.h" #include "json-writer.h" #include "trace2.h" +#include "chunk-format.h" void git_test_write_commit_graph_or_die(void) { @@ -1758,27 +1759,17 @@ static int write_graph_chunk_base(struct hashfile *f, return 0; } -typedef int (*chunk_write_fn)(struct hashfile *f, - void *data); - -struct chunk_info { - uint32_t id; - uint64_t size; - chunk_write_fn write_fn; -}; - static int write_commit_graph_file(struct write_commit_graph_context *ctx) { uint32_t i; int fd; struct hashfile *f; struct lock_file lk = LOCK_INIT; - struct chunk_info chunks[MAX_NUM_CHUNKS + 1]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; - uint64_t chunk_offset; struct object_id file_hash; + struct chunkfile *cf; if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -1824,76 +1815,50 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); } - chunks[0].id = GRAPH_CHUNKID_OIDFANOUT; - chunks[0].size = GRAPH_FANOUT_SIZE; - chunks[0].write_fn = write_graph_chunk_fanout; - chunks[1].id = 
GRAPH_CHUNKID_OIDLOOKUP; - chunks[1].size = hashsz * ctx->commits.nr; - chunks[1].write_fn = write_graph_chunk_oids; - chunks[2].id = GRAPH_CHUNKID_DATA; - chunks[2].size = (hashsz + 16) * ctx->commits.nr; - chunks[2].write_fn = write_graph_chunk_data; + cf = init_chunkfile(f); + + add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, + write_graph_chunk_fanout, GRAPH_FANOUT_SIZE); + add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, + write_graph_chunk_oids, hashsz * ctx->commits.nr); + add_chunk(cf, GRAPH_CHUNKID_DATA, + write_graph_chunk_data, (hashsz + 16) * ctx->commits.nr); if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0)) ctx->write_generation_data = 0; - if (ctx->write_generation_data) { - chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA; - chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; - chunks[num_chunks].write_fn = write_graph_chunk_generation_data; - num_chunks++; - } - if (ctx->num_generation_data_overflows) { - chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW; - chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows; - chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow; - num_chunks++; - } - if (ctx->num_extra_edges) { - chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; - chunks[num_chunks].size = 4 * ctx->num_extra_edges; - chunks[num_chunks].write_fn = write_graph_chunk_extra_edges; - num_chunks++; - } + if (ctx->write_generation_data) + add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA, + write_graph_chunk_generation_data, + sizeof(uint32_t) * ctx->commits.nr); + if (ctx->num_generation_data_overflows) + add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, + write_graph_chunk_generation_data_overflow, + sizeof(timestamp_t) * ctx->num_generation_data_overflows); + if (ctx->num_extra_edges) + add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, + write_graph_chunk_extra_edges, + 4 * ctx->num_extra_edges); if (ctx->changed_paths) { - chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES; - chunks[num_chunks].size 
= sizeof(uint32_t) * ctx->commits.nr; - chunks[num_chunks].write_fn = write_graph_chunk_bloom_indexes; - num_chunks++; - chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA; - chunks[num_chunks].size = sizeof(uint32_t) * 3 - + ctx->total_bloom_filter_data_size; - chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; - num_chunks++; - } - if (ctx->num_commit_graphs_after > 1) { - chunks[num_chunks].id = GRAPH_CHUNKID_BASE; - chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1); - chunks[num_chunks].write_fn = write_graph_chunk_base; - num_chunks++; - } - - chunks[num_chunks].id = 0; - chunks[num_chunks].size = 0; + add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, + write_graph_chunk_bloom_indexes, + sizeof(uint32_t) * ctx->commits.nr); + add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, + write_graph_chunk_bloom_data, + sizeof(uint32_t) * 3 + + ctx->total_bloom_filter_data_size); + } + if (ctx->num_commit_graphs_after > 1) + add_chunk(cf, GRAPH_CHUNKID_BASE, + write_graph_chunk_base, + hashsz * (ctx->num_commit_graphs_after - 1)); hashwrite_be32(f, GRAPH_SIGNATURE); hashwrite_u8(f, GRAPH_VERSION); hashwrite_u8(f, oid_version()); - hashwrite_u8(f, num_chunks); + hashwrite_u8(f, get_num_chunks(cf)); hashwrite_u8(f, ctx->num_commit_graphs_after - 1); - chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; - for (i = 0; i <= num_chunks; i++) { - uint32_t chunk_write[3]; - - chunk_write[0] = htonl(chunks[i].id); - chunk_write[1] = htonl(chunk_offset >> 32); - chunk_write[2] = htonl(chunk_offset & 0xffffffff); - hashwrite(f, chunk_write, 12); - - chunk_offset += chunks[i].size; - } - if (ctx->report_progress) { strbuf_addf(&progress_title, Q_("Writing out commit graph in %d pass", @@ -1905,17 +1870,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) num_chunks * ctx->commits.nr); } - for (i = 0; i < num_chunks; i++) { - uint64_t start_offset = f->total + f->offset; - - if (chunks[i].write_fn(f, ctx)) - return -1; - - if (f->total + 
f->offset != start_offset + chunks[i].size) - BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", - chunks[i].size, chunks[i].id, - f->total + f->offset - start_offset); - } + write_chunkfile(cf, ctx); stop_progress(&ctx->progress); strbuf_release(&progress_title); @@ -1932,6 +1887,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) close_commit_graph(ctx->r->objects); finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC); + free_chunkfile(cf); if (ctx->split) { FILE *chainf = fdopen_lock_file(&lk, "w"); From patchwork Wed Jan 27 15:01:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050499 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A992BC433E6 for ; Wed, 27 Jan 2021 15:59:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 63D3020825 for ; Wed, 27 Jan 2021 15:59:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236792AbhA0P6d (ORCPT ); Wed, 27 Jan 2021 10:58:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235421AbhA0PCi (ORCPT ); Wed, 27 Jan 2021 10:02:38 -0500 Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) 
by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71B3CC061354 for ; Wed, 27 Jan 2021 07:02:04 -0800 (PST)
Message-Id:
<0cac7890bed7774296e9984f9891c3e765a9a11d.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:43 +0000
Subject: [PATCH v2 04/17] midx: rename pack_info to write_midx_context
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

In an effort to streamline our chunk-based file formats, align some of the code structure in write_midx_internal() to be similar to the patterns in write_commit_graph_file().

Specifically, let's create a "struct write_midx_context" that can be used as a data parameter to abstract function types.

This change only renames "struct pack_list" to "struct write_midx_context" and the names of instances from "packs" to "ctx". In future changes, we will expand the data inside "struct write_midx_context" and align our chunk-writing method with the chunk-format API.
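To illustrate why a single context struct helps here, the following is a small sketch of the "data parameter to abstract function types" pattern. It is not code from midx.c: `struct sink`, `write_fn`, `struct demo_midx_context`, `write_pack_count`, `write_pack_names` and `run_writers` are hypothetical names, and the byte counts are a toy model rather than the real multi-pack-index layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the output stream; not Git's struct hashfile. */
struct sink {
	size_t written;
};

/* Generic writer type: the context arrives as an untyped pointer. */
typedef int (*write_fn)(struct sink *out, void *data);

/* Hypothetical format-specific context (toy model only). */
struct demo_midx_context {
	uint32_t nr;       /* number of packs */
	uint32_t name_len; /* bytes reserved per pack name */
};

/* Pretend to write a 4-byte pack count. */
static int write_pack_count(struct sink *out, void *data)
{
	struct demo_midx_context *ctx = data; /* void * converts implicitly in C */

	(void)ctx; /* the count itself always takes four bytes */
	out->written += sizeof(uint32_t);
	return 0;
}

/* Pretend to write one fixed-width name per pack. */
static int write_pack_names(struct sink *out, void *data)
{
	struct demo_midx_context *ctx = data;

	out->written += (size_t)ctx->nr * ctx->name_len;
	return 0;
}

/*
 * Generic driver: it calls each writer in order and stops at the first
 * failure. It never looks inside 'data', so the same driver can serve
 * any format whose writers agree on what 'data' points to.
 */
static int run_writers(struct sink *out, const write_fn *fns, size_t nr,
		       void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int result = fns[i](out, data);
		if (result)
			return result;
	}
	return 0;
}
```

With everything a writer needs gathered in one struct, each writer takes the same two arguments, so a caller can keep the writers in an array and hand them to a shared driver.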
Signed-off-by: Derrick Stolee --- midx.c | 130 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 65 insertions(+), 65 deletions(-) diff --git a/midx.c b/midx.c index 79c282b070d..561f65a63a5 100644 --- a/midx.c +++ b/midx.c @@ -451,7 +451,7 @@ static int pack_info_compare(const void *_a, const void *_b) return strcmp(a->pack_name, b->pack_name); } -struct pack_list { +struct write_midx_context { struct pack_info *info; uint32_t nr; uint32_t alloc; @@ -463,37 +463,37 @@ struct pack_list { static void add_pack_to_midx(const char *full_path, size_t full_path_len, const char *file_name, void *data) { - struct pack_list *packs = (struct pack_list *)data; + struct write_midx_context *ctx = data; if (ends_with(file_name, ".idx")) { - display_progress(packs->progress, ++packs->pack_paths_checked); - if (packs->m && midx_contains_pack(packs->m, file_name)) + display_progress(ctx->progress, ++ctx->pack_paths_checked); + if (ctx->m && midx_contains_pack(ctx->m, file_name)) return; - ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc); + ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); - packs->info[packs->nr].p = add_packed_git(full_path, - full_path_len, - 0); + ctx->info[ctx->nr].p = add_packed_git(full_path, + full_path_len, + 0); - if (!packs->info[packs->nr].p) { + if (!ctx->info[ctx->nr].p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(packs->info[packs->nr].p)) { + if (open_pack_index(ctx->info[ctx->nr].p)) { warning(_("failed to open pack-index '%s'"), full_path); - close_pack(packs->info[packs->nr].p); - FREE_AND_NULL(packs->info[packs->nr].p); + close_pack(ctx->info[ctx->nr].p); + FREE_AND_NULL(ctx->info[ctx->nr].p); return; } - packs->info[packs->nr].pack_name = xstrdup(file_name); - packs->info[packs->nr].orig_pack_int_id = packs->nr; - packs->info[packs->nr].expired = 0; - packs->nr++; + ctx->info[ctx->nr].pack_name = xstrdup(file_name); + ctx->info[ctx->nr].orig_pack_int_id = ctx->nr; + 
ctx->info[ctx->nr].expired = 0; + ctx->nr++; } } @@ -801,7 +801,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint32_t i; struct hashfile *f = NULL; struct lock_file lk; - struct pack_list packs; + struct write_midx_context ctx = { 0 }; uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; @@ -820,40 +820,40 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * midx_name); if (m) - packs.m = m; + ctx.m = m; else - packs.m = load_multi_pack_index(object_dir, 1); - - packs.nr = 0; - packs.alloc = packs.m ? packs.m->num_packs : 16; - packs.info = NULL; - ALLOC_ARRAY(packs.info, packs.alloc); - - if (packs.m) { - for (i = 0; i < packs.m->num_packs; i++) { - ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc); - - packs.info[packs.nr].orig_pack_int_id = i; - packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]); - packs.info[packs.nr].p = NULL; - packs.info[packs.nr].expired = 0; - packs.nr++; + ctx.m = load_multi_pack_index(object_dir, 1); + + ctx.nr = 0; + ctx.alloc = ctx.m ? 
ctx.m->num_packs : 16; + ctx.info = NULL; + ALLOC_ARRAY(ctx.info, ctx.alloc); + + if (ctx.m) { + for (i = 0; i < ctx.m->num_packs; i++) { + ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); + + ctx.info[ctx.nr].orig_pack_int_id = i; + ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]); + ctx.info[ctx.nr].p = NULL; + ctx.info[ctx.nr].expired = 0; + ctx.nr++; } } - packs.pack_paths_checked = 0; + ctx.pack_paths_checked = 0; if (flags & MIDX_PROGRESS) - packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0); + ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0); else - packs.progress = NULL; + ctx.progress = NULL; - for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); - stop_progress(&packs.progress); + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); + stop_progress(&ctx.progress); - if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop) + if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries); + entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); for (i = 0; i < nr_entries; i++) { if (entries[i].offset > 0x7fffffff) @@ -862,19 +862,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * large_offsets_needed = 1; } - QSORT(packs.info, packs.nr, pack_info_compare); + QSORT(ctx.info, ctx.nr, pack_info_compare); if (packs_to_drop && packs_to_drop->nr) { int drop_index = 0; int missing_drops = 0; - for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) { - int cmp = strcmp(packs.info[i].pack_name, + for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) { + int cmp = strcmp(ctx.info[i].pack_name, packs_to_drop->items[drop_index].string); if (!cmp) { drop_index++; - packs.info[i].expired = 1; + ctx.info[i].expired = 1; } else if (cmp > 0) { error(_("did not see pack-file %s to drop"), 
packs_to_drop->items[drop_index].string); @@ -882,7 +882,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * missing_drops++; i--; } else { - packs.info[i].expired = 0; + ctx.info[i].expired = 0; } } @@ -898,19 +898,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, packs.nr); - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].expired) { + ALLOC_ARRAY(pack_perm, ctx.nr); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].expired) { dropped_packs++; - pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED; + pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs; + pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } - for (i = 0; i < packs.nr; i++) { - if (!packs.info[i].expired) - pack_name_concat_len += strlen(packs.info[i].pack_name) + 1; + for (i = 0; i < ctx.nr; i++) { + if (!ctx.info[i].expired) + pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; } if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) @@ -921,19 +921,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - if (packs.m) - close_midx(packs.m); + if (ctx.m) + close_midx(ctx.m); cur_chunk = 0; num_chunks = large_offsets_needed ? 
5 : 4; - if (packs.nr - dropped_packs == 0) { + if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - written = write_midx_header(f, num_chunks, packs.nr - dropped_packs); + written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; @@ -990,7 +990,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, packs.info, packs.nr); + written += write_midx_pack_names(f, ctx.info, ctx.nr); break; case MIDX_CHUNKID_OIDFANOUT: @@ -1027,15 +1027,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * commit_lock_file(&lk); cleanup: - for (i = 0; i < packs.nr; i++) { - if (packs.info[i].p) { - close_pack(packs.info[i].p); - free(packs.info[i].p); + for (i = 0; i < ctx.nr; i++) { + if (ctx.info[i].p) { + close_pack(ctx.info[i].p); + free(ctx.info[i].p); } - free(packs.info[i].pack_name); + free(ctx.info[i].pack_name); } - free(packs.info); + free(ctx.info); free(entries); free(pack_perm); free(midx_name); From patchwork Wed Jan 27 15:01:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050313 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B04DC433DB for ; Wed, 27 
Jan 2021 15:07:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4818C2074D for ; Wed, 27 Jan 2021 15:07:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235779AbhA0PGh (ORCPT ); Wed, 27 Jan 2021 10:06:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235442AbhA0PCz (ORCPT ); Wed, 27 Jan 2021 10:02:55 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4356BC0613ED for ; Wed, 27 Jan 2021 07:02:05 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id 7so2265450wrz.0 for ; Wed, 27 Jan 2021 07:02:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=xBEf4iQ2asSfr1nhf3N8+WaJ6o77jtBdSEhmotRLPtI=; b=qeP312DuVvDr+jT8P9wn9hMBOleXTgzGoCQ6PF7d1mi48NrKOSZvrfITD9pjkHyU4x 7RwKfUjmkuViqhuMGeWJiI/aSkHXHVREacs83FPB+qqDsH/vSvSqVFHHCMs5q8qClT1b UL0qND3L+SyOrWi8pwLRV7mr5d2Egh+BNbRA2xmnuFqHR6U0YNqSQUTDEma8u7uCI2kv ATXfCawA18MTrx9PJYQKviZ0WWtU7zNyEiy2yHcZTzIzJmqYo9lZSa49ap+5DELCbJWb Z4hOH+bN5wRna8sU6hoDD5hnC5KjZBO3yTpYrGPHpSMuZtYUGVOtoA9n/HGc0XEn0ElR gGKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=xBEf4iQ2asSfr1nhf3N8+WaJ6o77jtBdSEhmotRLPtI=; b=JXmKiTWlcY0ejAEo8En8XuaVD2EqYgGf7TUOuUHNsY4sZJpKx15t+odAb2GE0n6KlF 7xedOf1YTtBEUmSwEXoS+WTU0U0m9FTf46ngXauz1MVOV8GHPtbPIsVjNux3LOQAWHLj J/rgC453C4ou34fuuhO66n+j3/K9goSt3DM5zRRPzqmF76brvOkr22133/Nkb/8aRz2e YHN/xGRB9dYqJNBWGT5zx5ZtKSOtTa8QnzwbphWbxA+AcwcUI/PVuKJ+FW7PztFVdt3w 
PWjKRWJSRhR36fEbxW3zMNMwIOYtR+CHOs/BsaEgwea2FlbIvqAjw2TFXX9ETgyj7UoX rSmA== X-Gm-Message-State: AOAM5301059faWQmQieB7bcp7B33dSaUU3uYARLGiTlw4EaBwkonYVmK 6p85ZItCwhg4F2x8f3hDR9QGVdfDBWU= X-Google-Smtp-Source: ABdhPJy+/Ig04OSm8w2SAhWbP2/rma5j6yOSCApDpWT05rYQ2SE6IoG6Uo5uJjyDEXfG0Ru4+j4RXg== X-Received: by 2002:adf:e348:: with SMTP id n8mr11938519wrj.148.1611759723813; Wed, 27 Jan 2021 07:02:03 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h125sm2901258wmh.16.2021.01.27.07.02.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:03 -0800 (PST) Message-Id: <4a4e90b129aefd6a186561d5e814dc7695bae7cb.1611759716.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 27 Jan 2021 15:01:44 +0000 Subject: [PATCH v2 05/17] midx: use context in write_midx_pack_names() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, start converting chunk-writing methods to match chunk_write_fn. The first case is to convert write_midx_pack_names() to take "void *data". We already have the necessary data in "struct write_midx_context", so this conversion is rather mechanical.
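To make the target shape concrete, here is a minimal sketch (not the code in this patch) of a writer that takes only an output handle and an opaque "void *data", skips expired packs, and returns the number of bytes written. The in-memory demo_sink replaces "struct hashfile", the demo_chunk_write_fn typedef is an assumed shape for chunk_write_fn, and all demo_* names are hypothetical. Alignment padding is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory sink, standing in for "struct hashfile". */
struct demo_sink {
	char buf[256];
	size_t len;
};

struct demo_pack_info {
	const char *pack_name;
	unsigned expired : 1;
};

struct demo_context {
	struct demo_pack_info *info;
	size_t nr;
};

/* Assumed shape of the writer type: output handle plus opaque data. */
typedef size_t (*demo_chunk_write_fn)(struct demo_sink *f, void *data);

static void demo_write(struct demo_sink *f, const void *p, size_t n)
{
	assert(f->len + n <= sizeof(f->buf));
	memcpy(f->buf + f->len, p, n);
	f->len += n;
}

/* Writes each unexpired name including its NUL; returns bytes written. */
static size_t demo_write_pack_names(struct demo_sink *f, void *data)
{
	struct demo_context *ctx = data;
	size_t i, written = 0;

	for (i = 0; i < ctx->nr; i++) {
		size_t writelen;

		if (ctx->info[i].expired)
			continue;
		writelen = strlen(ctx->info[i].pack_name) + 1;
		demo_write(f, ctx->info[i].pack_name, writelen);
		written += writelen;
	}
	return written;
}
```

With every writer reduced to this two-argument form, a caller can store the writers in a table and invoke them uniformly instead of switching on a chunk id.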
Signed-off-by: Derrick Stolee --- midx.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/midx.c b/midx.c index 561f65a63a5..88452b04433 100644 --- a/midx.c +++ b/midx.c @@ -643,27 +643,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, - struct pack_info *info, - uint32_t num_packs) +static size_t write_midx_pack_names(struct hashfile *f, void *data) { + struct write_midx_context *ctx = data; uint32_t i; unsigned char padding[MIDX_CHUNK_ALIGNMENT]; size_t written = 0; - for (i = 0; i < num_packs; i++) { + for (i = 0; i < ctx->nr; i++) { size_t writelen; - if (info[i].expired) + if (ctx->info[i].expired) continue; - if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0) + if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0) BUG("incorrect pack-file order: %s before %s", - info[i - 1].pack_name, - info[i].pack_name); + ctx->info[i - 1].pack_name, + ctx->info[i].pack_name); - writelen = strlen(info[i].pack_name) + 1; - hashwrite(f, info[i].pack_name, writelen); + writelen = strlen(ctx->info[i].pack_name) + 1; + hashwrite(f, ctx->info[i].pack_name, writelen); written += writelen; } @@ -990,7 +989,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, ctx.info, ctx.nr); + written += write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: From patchwork Wed Jan 27 15:01:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050495 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B805C433E0 for ; Wed, 27 Jan 2021 15:57:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 27563207C4 for ; Wed, 27 Jan 2021 15:57:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236355AbhA0P5Q (ORCPT ); Wed, 27 Jan 2021 10:57:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235478AbhA0PDQ (ORCPT ); Wed, 27 Jan 2021 10:03:16 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A438FC061355 for ; Wed, 27 Jan 2021 07:02:06 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id c128so1953750wme.2 for ; Wed, 27 Jan 2021 07:02:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=z+FgrS4l2ZoIlJaYhz8gp4+eXO+7nQ97g0QFJMohY60=; b=ZWGp/UXCpuhpIAc9YIE2k9PO5zH3CHc817FYaHnQwjLwlmQYax/KSq4nizUjY3P/pR Z1C7XCcenFPHAyL8wYH06/9jFR2F3oDtX8yJgqzZY/KJ+DsfKnnUktGgswX5sLRZp4/M xfALqrF/UCHwLon40NIwSDXPV7c7mUlb+1OfutHYpgI9d4bpxbj7qEWdrR7HhG/4/THa cXMR0JE7AhL6CBolw5lpnqrR9OvGA9Y8SoXpEZ2uq2uxn0EkBQbHDbx8UC09hy6aYg7f Hu/ldbgHh/p3vAYq7I/g+POBRpH0rruUxC/nppUsr9Y6u1nuJNYgmnbxU/qaAeXqWxbV c3ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date 
:subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=z+FgrS4l2ZoIlJaYhz8gp4+eXO+7nQ97g0QFJMohY60=; b=aUhRAHep6uDk5dqQiS5emjBb5yiWCOgrjjZVYsq8pPJ/D9evk5Scdzswo3dto4aG9W GMYjHmt8B6tgHIPR4u/I/9IQdXlN/3HmS6zQHveKXmWTS3ii+toNlDjNoiTtNcLLBs+/ Rzj6UyKGwe48NFn/VhEe2Dt+0m2de/c7uWeNf+7H10+QWA7apSfWJUDESKdumCceeJ8g nHbAX7/MNXXxFUsYHzAArR9CftSg6j5rl0/Ie8AIoiiql/bXd4VpyRmZMytI1TFGBnH0 BrMqq79ahqpPeMX8XJR8ZhJ9oKwPD35QvYxqYne+KXbfuOJlr7h0kyFTCO9f975g7F1m oCZQ== X-Gm-Message-State: AOAM530RxSNl5Q47xTBifiXfR37j5RBLkivUheeYgHtEvBfDae7FBEx3 gJ4oibVnlxFqbe6Za+KaK5RTFpsEHHk= X-Google-Smtp-Source: ABdhPJzMcguITWly7spm2QfHEK5zb5kqXXtWj5vboYdDg1Bcy4aFbM8YSR4UuWiJFAGip4UwYv0Now== X-Received: by 2002:a1c:f604:: with SMTP id w4mr4665638wmc.39.1611759725156; Wed, 27 Jan 2021 07:02:05 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id g187sm2888415wmf.1.2021.01.27.07.02.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:04 -0800 (PST) Message-Id: <30ad423997b71645c928b1b6f3cbec71e712d31c.1611759716.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 27 Jan 2021 15:01:45 +0000 Subject: [PATCH v2 06/17] midx: add entries to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "struct pack_midx_entry *entries" list and its count into the context. Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the context directly, as these are easy conversions with this new data. 
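For readers unfamiliar with fanout tables, the sketch below shows one common way to build a 256-entry cumulative count from a sorted entry list reached through the context. It is an assumption about the general technique, not a copy of write_midx_oid_fanout(); demo_entry reduces each object ID to its first byte, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: only the first byte of the object ID is kept. */
struct demo_entry {
	unsigned char first_byte;
};

struct demo_context {
	struct demo_entry *entries;
	uint32_t entries_nr;
};

/*
 * Sets fanout[b] to the number of entries whose first byte is <= b.
 * The entries must already be sorted by first byte.
 */
static void demo_fill_fanout(void *data, uint32_t fanout[256])
{
	struct demo_context *ctx = data;
	struct demo_entry *list = ctx->entries;
	struct demo_entry *last = ctx->entries + ctx->entries_nr;
	uint32_t count = 0;
	uint32_t i;

	for (i = 0; i < 256; i++) {
		while (list < last && list->first_byte == i) {
			count++;
			list++;
		}
		fanout[i] = count;
	}
	/* Every entry is consumed only if the input was sorted. */
	assert(list == last);
}
```

Keeping the list and its length together in the context is what lets the function drop its separate "objects" and "nr_objects" parameters.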
Only the callers of write_midx_object_offsets() and write_midx_large_offsets() are updated here, since additional data is needed in the context before those methods can match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 49 ++++++++++++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 23 deletions(-) diff --git a/midx.c b/midx.c index 88452b04433..4520ef82b91 100644 --- a/midx.c +++ b/midx.c @@ -458,6 +458,9 @@ struct write_midx_context { struct multi_pack_index *m; struct progress *progress; unsigned pack_paths_checked; + + struct pack_midx_entry *entries; + uint32_t entries_nr; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -678,11 +681,11 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) } static size_t write_midx_oid_fanout(struct hashfile *f, - struct pack_midx_entry *objects, - uint32_t nr_objects) + void *data) { - struct pack_midx_entry *list = objects; - struct pack_midx_entry *last = objects + nr_objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *last = ctx->entries + ctx->entries_nr; uint32_t count = 0; uint32_t i; @@ -706,18 +709,19 @@ static size_t write_midx_oid_fanout(struct hashfile *f, return MIDX_CHUNK_FANOUT_SIZE; } -static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, - struct pack_midx_entry *objects, - uint32_t nr_objects) +static size_t write_midx_oid_lookup(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = data; + unsigned char hash_len = the_hash_algo->rawsz; + struct pack_midx_entry *list = ctx->entries; uint32_t i; size_t written = 0; - for (i = 0; i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (i < nr_objects - 1) { + if (i < ctx->entries_nr - 1) { struct pack_midx_entry *next = list; if (oidcmp(&obj->oid, &next->oid) >= 0) BUG("OIDs not in 
order: %s >= %s", @@ -805,8 +809,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t nr_entries, num_large_offsets = 0; - struct pack_midx_entry *entries = NULL; + uint32_t num_large_offsets = 0; struct progress *progress = NULL; int large_offsets_needed = 0; int pack_name_concat_len = 0; @@ -852,12 +855,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) goto cleanup; - entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &nr_entries); + ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); - for (i = 0; i < nr_entries; i++) { - if (entries[i].offset > 0x7fffffff) + for (i = 0; i < ctx.entries_nr; i++) { + if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; - if (entries[i].offset > 0xffffffff) + if (ctx.entries[i].offset > 0xffffffff) large_offsets_needed = 1; } @@ -947,10 +950,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; if (large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; @@ -993,19 +996,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, entries, nr_entries); + written += write_midx_oid_fanout(f, &ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += 
write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries); + written += write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries); + written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries); + written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); break; default: @@ -1035,7 +1038,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } free(ctx.info); - free(entries); + free(ctx.entries); free(pack_perm); free(midx_name); return result; From patchwork Wed Jan 27 15:01:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050317 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 908E6C433E0 for ; Wed, 27 Jan 2021 15:07:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3DF822074D for ; Wed, 27 Jan 2021 15:07:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235582AbhA0PHH (ORCPT ); Wed, 27 Jan 2021 10:07:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51132 "EHLO lindbergh.monkeyblade.net" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235473AbhA0PDC (ORCPT ); Wed, 27 Jan 2021 10:03:02 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9398BC061356 for ; Wed, 27 Jan 2021 07:02:07 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id u14so1944750wmq.4 for ; Wed, 27 Jan 2021 07:02:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=AneWd0085FrvtuS+ErolP0xonuFSYCHl+u0lEosnw9Y=; b=i7ziFPOiNEzvUQM3Q0SSTGRTItbLq8Fesb7iMnyT35j9ifC8FmQI8t6sxGafGe0Fi9 bnHSIbo9I4DAKibFpUxD3kBpNVkd3ytkf9zk4wJW+rMak2HJUdqI7jBWyFAlzI/XoFnz 65bmcVJDF5BNob19So9nrnGifHjx2gaxkXcA1hfO0h9PXi/2iNoixGPyaSgU5uET+H4Y LpvaIPc0Otdrrv5DZQBbFPkGf0evk/kpxJi6gHocDqhDHkf9aZN8lRX54QyFatFXRH+L wM6oA9zpmrWg9aJvSDcX7ZkXv3E4pIC4OoptWlYFul/AArV6+Ks/Pf73oHPE9UqqGFdu xJag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=AneWd0085FrvtuS+ErolP0xonuFSYCHl+u0lEosnw9Y=; b=i/5jadgXWxU4lMo8k8qNRLhMVQ04HoYShEoZdSGm/tC50iWSpePR7FS2HCfikCfjoI S4ReRPFU40MQ8/A9AwQ6p+oSy6KZw4m/+pho07RWUl8u4ZKx/aR3VvTzus++7ViUNDo3 HLcfOVHGCaaT6NEMST7dEkDj3vpTRq1bKcyUvy5b6KUIhXAFHsg6DjHtNVR8jYA2oPMg dgnQNCpodOGRxhbDNR+ZII3L0nVnDeUrO8FdviIokuBt368B3CzdeMEhgEqld/5WJWBN bbTUAiHHacif25sSgNK2iMXseq3qC9P+0Gvm+kV8JtLGUnYUzybhfHedt96+WeEAV5kb mE+w== X-Gm-Message-State: AOAM533mX7R1RZR1pPNvVZGcoABdbRHrIVBRSYM7JBc4FeSyTQZNCtzb k4mcQX0P+AqpsHBY/fnXxSEyo5ahYjo= X-Google-Smtp-Source: ABdhPJyGPOYbNpXl6VgPN/8gv7JmCvaQt3vltCFGCc69N9GGXFKtgwbfOpWxYMT1Xc53RcbdCumfRQ== X-Received: by 2002:a1c:4e:: with SMTP id 75mr4635957wma.150.1611759726138; Wed, 27 Jan 2021 07:02:06 -0800 (PST) Received: from [127.0.0.1] 
([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h16sm3258985wrq.29.2021.01.27.07.02.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:05 -0800 (PST) Message-Id: <2f1c496f3ab537820861fb8563b65f2c2fe15136.1611759716.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 27 Jan 2021 15:01:46 +0000 Subject: [PATCH v2 07/17] midx: add pack_perm to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t *pack_perm" and large_offsets_needed bit into the context. Update write_midx_object_offsets() to match chunk_write_fn. Signed-off-by: Derrick Stolee --- midx.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index 4520ef82b91..cd994e333ec 100644 --- a/midx.c +++ b/midx.c @@ -461,6 +461,9 @@ struct write_midx_context { struct pack_midx_entry *entries; uint32_t entries_nr; + + uint32_t *pack_perm; + unsigned large_offsets_needed:1; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -736,27 +739,27 @@ static size_t write_midx_oid_lookup(struct hashfile *f, return written; } -static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed, - uint32_t *perm, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_object_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; size_t written = 0; - for (i = 0; 
i < nr_objects; i++) { + for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; - if (perm[obj->pack_int_id] == PACK_EXPIRED) + if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED) BUG("object %s is in an expired pack with int-id %d", oid_to_hex(&obj->oid), obj->pack_int_id); - hashwrite_be32(f, perm[obj->pack_int_id]); + hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]); - if (large_offset_needed && obj->offset >> 31) + if (ctx->large_offsets_needed && obj->offset >> 31) hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++); - else if (!large_offset_needed && obj->offset >> 32) + else if (!ctx->large_offsets_needed && obj->offset >> 32) BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!", oid_to_hex(&obj->oid), obj->offset); @@ -805,13 +808,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint32_t *pack_perm = NULL; uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; uint32_t num_large_offsets = 0; struct progress *progress = NULL; - int large_offsets_needed = 0; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -857,11 +858,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr); + ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) - large_offsets_needed = 1; + ctx.large_offsets_needed = 1; } QSORT(ctx.info, ctx.nr, pack_info_compare); @@ -900,13 +902,13 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * * * pack_perm[old_id] = new_id */ - ALLOC_ARRAY(pack_perm, ctx.nr); + ALLOC_ARRAY(ctx.pack_perm, ctx.nr); for (i = 0; i < ctx.nr; 
i++) { if (ctx.info[i].expired) { dropped_packs++; - pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED; } else { - pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; + ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs; } } @@ -927,7 +929,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * close_midx(ctx.m); cur_chunk = 0; - num_chunks = large_offsets_needed ? 5 : 4; + num_chunks = ctx.large_offsets_needed ? 5 : 4; if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); @@ -954,7 +956,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (large_offsets_needed) { + if (ctx.large_offsets_needed) { chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; cur_chunk++; @@ -1004,7 +1006,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, ctx.entries, ctx.entries_nr); + written += write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: @@ -1039,7 +1041,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * free(ctx.info); free(ctx.entries); - free(pack_perm); + free(ctx.pack_perm); free(midx_name); return result; } From patchwork Wed Jan 27 15:01:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050497 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, 
HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC936C433E6 for ; Wed, 27 Jan 2021 15:58:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 63445207D0 for ; Wed, 27 Jan 2021 15:58:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236843AbhA0P5m (ORCPT ); Wed, 27 Jan 2021 10:57:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235315AbhA0PDH (ORCPT ); Wed, 27 Jan 2021 10:03:07 -0500 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 96AF0C0612F2 for ; Wed, 27 Jan 2021 07:02:08 -0800 (PST) Received: by mail-wm1-x32e.google.com with SMTP id m187so1806148wme.2 for ; Wed, 27 Jan 2021 07:02:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=CiliLAMun0gIKzobYrydAgAbTcUC3B/7iQngvKucSDE=; b=judR14PePSF11al1UrFhuObgrWEqAzI8GG4sn+4dFGs+PSS4dRPFrp7JAcFC/WQDxE 9gtb3b7uo5XVp3kcJLT0IoKy0yTUu8MBmXu+t0PlGoc9tOy1gHmY36whRW758eQSrqJI 6ZEXYPW3bQiOsj0CR2WGkk7yGEsjTeM82s0ZU3L2h6G/72uY+GL9NNecz//54IyVzdhi JsB7nchzNIb/bcHxyODhqu/+D7HjgOWM767osUGRER+sGxyRTf/W2QAXK55ho76ir6+C m3U47tMO0nx1NfLrCbiuY0v1FCO8T96Fo5HN7atc+f0q2GfTUJzlTEDVb4Ib6PSzP2bX WPWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=CiliLAMun0gIKzobYrydAgAbTcUC3B/7iQngvKucSDE=; 
b=CVfSX9bdm0EdAuTUiTRnIS6+XTwVEgcCQ2SV2PZpjj+b/MmN5tjXoxPKqzlh3WY81Y r+OfIKBm6b526JH04WhY4XQggf4291a73+upcrZoFCJiOZZhiXIdracIgXHaLZ9jzdAe LB5YVSzAMrOKLBp6ACCm4wjZTis7GPiC/PADm1B+R4bzSKZbN4a25p3wPRycGQEjJh4R MWhHH8HFbPhFFzQiUhQ5BmjRT5M7MZvedO8WebfbBCoDXK79zBe84Ra1LzcW/GjAshU+ TZqZLTIvKG4aVcv4wfmx0UuZ+mmQhg+hZoWGmacm7m3v+o14P40nogHqtQP4UiOY1PcV 7yhQ== X-Gm-Message-State: AOAM530XhSGARthESTB3M+47vW5NqTDlUmBLQW0LLkgTnzRtzFcRZc2Z hiLMgsElLjEZ1h3E2omefVKdBbqIXOM= X-Google-Smtp-Source: ABdhPJx9i+1YEuWxUnP6bHm+T4ARxLSxAImMouUiUiomHuk1PL/pSXiZuDJ7u6l0LFWaW37gvVR1Jg== X-Received: by 2002:a1c:2003:: with SMTP id g3mr4493728wmg.90.1611759727137; Wed, 27 Jan 2021 07:02:07 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u16sm3077255wrn.68.2021.01.27.07.02.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:06 -0800 (PST) Message-Id: In-Reply-To: References: Date: Wed, 27 Jan 2021 15:01:47 +0000 Subject: [PATCH v2 08/17] midx: add num_large_offsets to write_midx_context Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t num_large_offsets" into the context. With this new data, write_midx_large_offsets() now matches the chunk_write_fn type. 
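A small sketch of the bookkeeping this moves into the context, under stated assumptions: offsets above 0x7fffffff are counted, and the large-offsets chunk is only needed once some offset exceeds 0xffffffff, mirroring the loop in write_midx_internal(). The demo_* names are hypothetical, a plain offset array replaces the entry list, and the 8-byte width is an assumed value for MIDX_CHUNK_LARGE_OFFSET_WIDTH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of MIDX_CHUNK_LARGE_OFFSET_WIDTH (one 64-bit offset). */
#define DEMO_LARGE_OFFSET_WIDTH 8

struct demo_context {
	const uint64_t *offsets; /* stand-in for entries[i].offset */
	uint32_t entries_nr;
	unsigned large_offsets_needed : 1;
	uint32_t num_large_offsets;
};

/* Records how many offsets need 32+ bits and whether any exceed 32 bits. */
static void demo_scan_offsets(struct demo_context *ctx)
{
	uint32_t i;

	assert(ctx->offsets || !ctx->entries_nr);
	ctx->large_offsets_needed = 0;
	ctx->num_large_offsets = 0;
	for (i = 0; i < ctx->entries_nr; i++) {
		if (ctx->offsets[i] > 0x7fffffff)
			ctx->num_large_offsets++;
		if (ctx->offsets[i] > 0xffffffff)
			ctx->large_offsets_needed = 1;
	}
}

/* Size of the large-offsets chunk, or 0 when the chunk is omitted. */
static uint64_t demo_large_offsets_chunk_size(void *data)
{
	struct demo_context *ctx = data;

	if (!ctx->large_offsets_needed)
		return 0;
	return (uint64_t)ctx->num_large_offsets * DEMO_LARGE_OFFSET_WIDTH;
}
```

Since both values now live in the context, a writer that receives only "void *data" can size and fill the chunk without extra parameters.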
Signed-off-by: Derrick Stolee --- midx.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/midx.c b/midx.c index cd994e333ec..5be081f229a 100644 --- a/midx.c +++ b/midx.c @@ -464,6 +464,7 @@ struct write_midx_context { uint32_t *pack_perm; unsigned large_offsets_needed:1; + uint32_t num_large_offsets; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -772,11 +773,14 @@ static size_t write_midx_object_offsets(struct hashfile *f, return written; } -static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset, - struct pack_midx_entry *objects, uint32_t nr_objects) +static size_t write_midx_large_offsets(struct hashfile *f, + void *data) { - struct pack_midx_entry *list = objects, *end = objects + nr_objects; + struct write_midx_context *ctx = data; + struct pack_midx_entry *list = ctx->entries; + struct pack_midx_entry *end = ctx->entries + ctx->entries_nr; size_t written = 0; + uint32_t nr_large_offset = ctx->num_large_offsets; while (nr_large_offset) { struct pack_midx_entry *obj; @@ -811,7 +815,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t num_large_offsets = 0; struct progress *progress = NULL; int pack_name_concat_len = 0; int dropped_packs = 0; @@ -861,7 +864,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { if (ctx.entries[i].offset > 0x7fffffff) - num_large_offsets++; + ctx.num_large_offsets++; if (ctx.entries[i].offset > 0xffffffff) ctx.large_offsets_needed = 1; } @@ -961,7 +964,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * cur_chunk++; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + - num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; + ctx.num_large_offsets * 
MIDX_CHUNK_LARGE_OFFSET_WIDTH; } chunk_ids[cur_chunk] = 0; @@ -1010,7 +1013,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, num_large_offsets, ctx.entries, ctx.entries_nr); + written += write_midx_large_offsets(f, &ctx); break; default: From patchwork Wed Jan 27 15:01:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48800C433E0 for ; Wed, 27 Jan 2021 15:36:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DB851207D0 for ; Wed, 27 Jan 2021 15:36:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236190AbhA0PgQ (ORCPT ); Wed, 27 Jan 2021 10:36:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51866 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235646AbhA0PEH (ORCPT ); Wed, 27 Jan 2021 10:04:07 -0500 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C0CCC06121C for ; Wed, 27 Jan 2021 07:02:09 -0800 (PST) Received: by mail-wr1-x436.google.com with SMTP id v15so2256759wrx.4 for ; Wed, 27 Jan 2021 07:02:09 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=DNrUlJKzkd69wTnW/ETeSDpBd+f1N66q0Q1nMjyjNWg=; b=agyAsuuCFgKTv2xKRR1EClaRhKFTng9Nf6sbW+J1L0D9HXOZGiObFmqhQaiyO2kdHZ oG5eajXM39j7hS7jqeNcfrtiZCWNZ1QFbZSoHPce/m4dUmVTbUcHkVJzfaaqMV7JNF+p QO+7bsryUASGqoGrzNYMUusZkzY+DZZ5GkKINZip1WFuPPScYK4rCXOkVV1WzYsYoFDc IP7KAZzbxmWmEtyAA8LfiqEQRvHEw/kLBL/6d4oe+uFPHq6ww151ArgInx1VtOBymWGW Ong77Ynu6Nwnh/8fQvlFh6OdXO3PJkwG0sSp/gkvAgqg53ZSfJjHviuF+QA33xOaiQEP 4vIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=DNrUlJKzkd69wTnW/ETeSDpBd+f1N66q0Q1nMjyjNWg=; b=KHyYwFK8yI+gX0AmfP2quT4NecmPJHGuhM5BW2IAVFhHsuk3slGG/xPzR7TkPx7qOH JXmXZSTpqvY3s3KNbldgoUvl9VrMtlOXnqFmxS9DGZo0FGWakzBGXHkgWIur0BMic3Fj kEB3I6seedq2CvC+1rkdz+keL9FxM145T7VOAL8G7IemTZCWralovEwbx6VmgPOkgILm W+mheNgViE0lTcD97xYV448iWM5UNpK+ouKdo0M8vqpj1nDtVPZWdYF+nd6efbZivTzE mN/dMX7jotvfUFehsKENuVGY/N341nzZTrg84k8AQRXLDn3/7FVEBIHROba53n2PXRRM NlfQ== X-Gm-Message-State: AOAM533E+ryaRF4t9wg/TKDoj2UnxH0qSrf95L13wtz2vTRv2Whi9LdQ imu4S5Q+qsZ3D5LGRdG5Ze14tmncJyw= X-Google-Smtp-Source: ABdhPJx6sCW2A/wQYY7y7JOpO41ec94+kRXcWY2/SL/FO0DcQkvVRglbkRogkFZO/7pZcsxGxxZe1A== X-Received: by 2002:a5d:6b89:: with SMTP id n9mr11316343wrx.323.1611759728048; Wed, 27 Jan 2021 07:02:08 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d13sm3191526wrx.93.2021.01.27.07.02.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:07 -0800 (PST) Message-Id: In-Reply-To: References: Date: Wed, 27 Jan 2021 15:01:48 +0000 Subject: [PATCH v2 09/17] midx: return success/failure in chunk write methods Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, 
l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

Historically, the chunk-writing methods in midx.c have returned the
amount of data written so the writer method could compare this with the
table of contents. This presents some interesting issues:

1. If a chunk writing method has a bug that miscalculates the written
   bytes, then we can satisfy the table of contents without actually
   writing the right amount of data to the hashfile. The commit-graph
   writing code checks the hashfile struct directly for a more robust
   verification.

2. There is no way for a chunk writing method to gracefully fail.
   Returning an int presents an opportunity to fail without a die().

3. The current pattern doesn't match the chunk_write_fn type exactly,
   so we cannot share code with commit-graph.c.

For these reasons, convert the midx chunk writer methods to return an
'int'. Since none of them fail at the moment, they all return 0.
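To illustrate the first point, here is a small sketch of checking the stream position instead of trusting a returned byte count. It is not git code: 'struct demo_sink' is a hypothetical stand-in that only mimics the 'total' and 'offset' fields the patch reads from the hashfile, the 8192-byte flush threshold is an arbitrary assumption, and the two writers are made up for the demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct hashfile': flushed plus buffered bytes. */
struct demo_sink {
	uint64_t total;		/* bytes already flushed */
	uint64_t offset;	/* bytes still sitting in the buffer */
};

void demo_sink_write(struct demo_sink *out, uint64_t len)
{
	out->offset += len;
	if (out->offset >= 8192) {	/* pretend to flush a full buffer */
		out->total += out->offset;
		out->offset = 0;
	}
}

typedef int (*demo_write_fn)(struct demo_sink *out, void *data);

/* Writes exactly the number of bytes it was asked to write. */
int write_good(struct demo_sink *out, void *data)
{
	demo_sink_write(out, *(uint64_t *)data);
	return 0;
}

/* Buggy writer: drops the last byte, yet still reports success. */
int write_short(struct demo_sink *out, void *data)
{
	demo_sink_write(out, *(uint64_t *)data - 1);
	return 0;
}

/*
 * Run one writer and compare the stream position before and after.
 * Returns 0 on success, -1 if the writer failed or wrote the wrong amount.
 */
int write_and_verify(struct demo_sink *out, demo_write_fn fn,
		     uint64_t expected_size)
{
	uint64_t start;

	assert(out && fn);
	start = out->total + out->offset;

	if (fn(out, &expected_size))
		return -1;
	if (out->total + out->offset - start != expected_size)
		return -1;
	return 0;
}
```

With this shape, write_short() is caught even though it returns 0, because the check looks at what actually reached the sink rather than at what the writer claims.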
Signed-off-by: Derrick Stolee --- midx.c | 63 +++++++++++++++++++++++++--------------------------------- 1 file changed, 27 insertions(+), 36 deletions(-) diff --git a/midx.c b/midx.c index 5be081f229a..e23a5fc4903 100644 --- a/midx.c +++ b/midx.c @@ -650,7 +650,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, return deduplicated_entries; } -static size_t write_midx_pack_names(struct hashfile *f, void *data) +static int write_midx_pack_names(struct hashfile *f, void *data) { struct write_midx_context *ctx = data; uint32_t i; @@ -678,14 +678,13 @@ static size_t write_midx_pack_names(struct hashfile *f, void *data) if (i < MIDX_CHUNK_ALIGNMENT) { memset(padding, 0, sizeof(padding)); hashwrite(f, padding, i); - written += i; } - return written; + return 0; } -static size_t write_midx_oid_fanout(struct hashfile *f, - void *data) +static int write_midx_oid_fanout(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; @@ -710,17 +709,16 @@ static size_t write_midx_oid_fanout(struct hashfile *f, list = next; } - return MIDX_CHUNK_FANOUT_SIZE; + return 0; } -static size_t write_midx_oid_lookup(struct hashfile *f, - void *data) +static int write_midx_oid_lookup(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; unsigned char hash_len = the_hash_algo->rawsz; struct pack_midx_entry *list = ctx->entries; uint32_t i; - size_t written = 0; for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; @@ -734,19 +732,17 @@ static size_t write_midx_oid_lookup(struct hashfile *f, } hashwrite(f, obj->oid.hash, (int)hash_len); - written += hash_len; } - return written; + return 0; } -static size_t write_midx_object_offsets(struct hashfile *f, - void *data) +static int write_midx_object_offsets(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; uint32_t i, nr_large_offset = 0; - 
size_t written = 0; for (i = 0; i < ctx->entries_nr; i++) { struct pack_midx_entry *obj = list++; @@ -766,20 +762,17 @@ static size_t write_midx_object_offsets(struct hashfile *f, obj->offset); else hashwrite_be32(f, (uint32_t)obj->offset); - - written += MIDX_CHUNK_OFFSET_WIDTH; } - return written; + return 0; } -static size_t write_midx_large_offsets(struct hashfile *f, - void *data) +static int write_midx_large_offsets(struct hashfile *f, + void *data) { struct write_midx_context *ctx = data; struct pack_midx_entry *list = ctx->entries; struct pack_midx_entry *end = ctx->entries + ctx->entries_nr; - size_t written = 0; uint32_t nr_large_offset = ctx->num_large_offsets; while (nr_large_offset) { @@ -795,12 +788,12 @@ static size_t write_midx_large_offsets(struct hashfile *f, if (!(offset >> 31)) continue; - written += hashwrite_be64(f, offset); + hashwrite_be64(f, offset); nr_large_offset--; } - return written; + return 0; } static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, @@ -812,7 +805,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint64_t written = 0; + uint64_t header_size = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; struct progress *progress = NULL; @@ -940,10 +933,10 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * goto cleanup; } - written = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); + header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; - chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; + chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; @@ -981,39 +974,37 @@ static int write_midx_internal(const char *object_dir, 
struct multi_pack_index * hashwrite_be32(f, chunk_ids[i]); hashwrite_be64(f, chunk_offsets[i]); - - written += MIDX_CHUNKLOOKUP_WIDTH; } if (flags & MIDX_PROGRESS) progress = start_delayed_progress(_("Writing chunks to multi-pack-index"), num_chunks); for (i = 0; i < num_chunks; i++) { - if (written != chunk_offsets[i]) + if (f->total + f->offset != chunk_offsets[i]) BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, chunk_offsets[i], - written, + f->total + f->offset, chunk_ids[i]); switch (chunk_ids[i]) { case MIDX_CHUNKID_PACKNAMES: - written += write_midx_pack_names(f, &ctx); + write_midx_pack_names(f, &ctx); break; case MIDX_CHUNKID_OIDFANOUT: - written += write_midx_oid_fanout(f, &ctx); + write_midx_oid_fanout(f, &ctx); break; case MIDX_CHUNKID_OIDLOOKUP: - written += write_midx_oid_lookup(f, &ctx); + write_midx_oid_lookup(f, &ctx); break; case MIDX_CHUNKID_OBJECTOFFSETS: - written += write_midx_object_offsets(f, &ctx); + write_midx_object_offsets(f, &ctx); break; case MIDX_CHUNKID_LARGEOFFSETS: - written += write_midx_large_offsets(f, &ctx); + write_midx_large_offsets(f, &ctx); break; default: @@ -1025,9 +1016,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * } stop_progress(&progress); - if (written != chunk_offsets[num_chunks]) + if (f->total + f->offset != chunk_offsets[num_chunks]) BUG("incorrect final offset %"PRIu64" != %"PRIu64, - written, + f->total + f->offset, chunk_offsets[num_chunks]); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); From patchwork Wed Jan 27 15:01:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EAABC433E6 for ; Wed, 27 Jan 2021 15:36:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C343A207D2 for ; Wed, 27 Jan 2021 15:36:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236224AbhA0Pgb (ORCPT ); Wed, 27 Jan 2021 10:36:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235573AbhA0PDn (ORCPT ); Wed, 27 Jan 2021 10:03:43 -0500 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F1B2C06121D for ; Wed, 27 Jan 2021 07:02:10 -0800 (PST) Received: by mail-wm1-x334.google.com with SMTP id c128so1953956wme.2 for ; Wed, 27 Jan 2021 07:02:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=+DRAQNjk0bjEC6cdHtenVpAcIZ+yrXA2o9jF08e5BFs=; b=tZ/ocKqAOCkGcXjweB3UoNcuwQiuI+5Ql7UCvgEDsQAyx9puJeZPJFZFLmXF7blPn3 em9YmKZoX459M2n0VqFW/6sZKxiuzTQ06vJUEH3Uj/k9dLwsLAeFtwwO8oWrrZDK/an1 51jScuWpelmtWPnxuk1ZsvQZCKHGYk6WXh25MtEarQhRJjLa1TyzU7MbHgWNup9H20Ut WKFf7BVTXWJJ1qZCc9RS2YK2Dc+F3TS0wpolTjltK8y28k/p+OVcwT/HW9JZsMlwfP9f HrAdKsEFzNN2YMlLkzdvIns3aHdFBx7GyGzMln4ERV56VQ3Wbucwewme5TaXJw8skUDi MQaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date 
:subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=+DRAQNjk0bjEC6cdHtenVpAcIZ+yrXA2o9jF08e5BFs=; b=VpwN4/FhqIkNdwLPOV8K4ITEbPhdjRyz9qIXvh5uXtgZp4KA7HUr+BHXVqxdCecyRy WJpGt0lkvta3U6mLAfwJMGU8GoGWeCeebm/LyrSx7yYAPEdZT01QnX3/a+qoWNet/JwS 1Y6sgYB72AvaKfr3yhezl5G1exvA7THD1a6F8oxtUPwmrwyPDyC1hqDm85mswqXi7OdG opldU+ecQyW7zQ+Y+W+ZLEGuLkWtbaLyTb+foh6y80l2Klje0yj3H6iay0t123xpE/z7 r8wKLBvAOJlISl9ZBq1YzZtCoKOw03Zrg3GWLCdXPNs6zvA+hbWQ2zfM5fDyCipL37rB U6Xg==
X-Gm-Message-State: AOAM533bqQ5Fa7XIxK70XV1xDgao0A0oyH7UKsUqPkPm7GJNDLQj48xK d6gaPIrJWd3K4jh/aAY2PJAZ9lZauJk=
X-Google-Smtp-Source: ABdhPJxlQpzJRwYR1pKSw/IV6N9ml9VFhK70dHQ1vyQwMBh79oOKGkdxMgJ0wOV88xqs5cDSeUq00w==
X-Received: by 2002:a05:600c:24b:: with SMTP id 11mr4402960wmj.17.1611759729043; Wed, 27 Jan 2021 07:02:09 -0800 (PST)
Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r16sm3318766wrx.36.2021.01.27.07.02.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:08 -0800 (PST)
Message-Id: <78744d3b7016520cfc946e858eb1f6233003bffc.1611759716.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Wed, 27 Jan 2021 15:01:49 +0000
Subject: [PATCH v2 10/17] midx: drop chunk progress during write
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

Most expensive operations in write_midx_internal() use the context
struct's progress member, and these indicate the progress of the
expensive operations within the chunk writing methods. However, there
is a competing progress struct that counts the progress over all
chunks. This is not very helpful compared to the others, so drop it.

This also reduces our barriers to combining the chunk writing code with
chunk-format.c.
Signed-off-by: Derrick Stolee --- midx.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/midx.c b/midx.c index e23a5fc4903..6ee262aab79 100644 --- a/midx.c +++ b/midx.c @@ -808,7 +808,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * uint64_t header_size = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - struct progress *progress = NULL; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -976,9 +975,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * hashwrite_be64(f, chunk_offsets[i]); } - if (flags & MIDX_PROGRESS) - progress = start_delayed_progress(_("Writing chunks to multi-pack-index"), - num_chunks); for (i = 0; i < num_chunks; i++) { if (f->total + f->offset != chunk_offsets[i]) BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, @@ -1011,10 +1007,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * BUG("trying to write unknown chunk id %"PRIx32, chunk_ids[i]); } - - display_progress(progress, i + 1); } - stop_progress(&progress); if (f->total + f->offset != chunk_offsets[num_chunks]) BUG("incorrect final offset %"PRIu64" != %"PRIu64, From patchwork Wed Jan 27 15:01:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with 
ESMTP id 127B5C433E0 for ; Wed, 27 Jan 2021 15:36:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AF719207D0 for ; Wed, 27 Jan 2021 15:36:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236333AbhA0Pfy (ORCPT ); Wed, 27 Jan 2021 10:35:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235659AbhA0PEJ (ORCPT ); Wed, 27 Jan 2021 10:04:09 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 776BAC06121E for ; Wed, 27 Jan 2021 07:02:11 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id d16so2209880wro.11 for ; Wed, 27 Jan 2021 07:02:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=1ac7aq3Rq9lLGFvGJOTpKlrnF07cuk7BfPOPt8yCS/Q=; b=Y0ZL/9MlTcRk3DfmocSNMaS+rCq5fDaN8V7XW5yLIMeGBV09mV9plp+e2gHtjlLL+s pVlgO7kQxHvTNj9ZljQcM5d30zzk6zYsJobZ70sNB6oQJAxIzgFKb7W/K83IXE1Ijfih /3JFpx4cCYFucem8Z870ImUlG1g3qncw8JpO0+bcVsDceVxV9TjMrrtGm24D/zpiX/KV KlDEVOxSlc9iva4m4nBcq+1OGqWQPNEhQVeVJURiMluqWwHdWj4TqhTi506pVwGCfS4L +K+8kiHOVx89YpQpGLuTewextdpO8HDC7IsiFyy2XYeAWYDPG1GEb4eHIL7fHN/YuAgY jD6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=1ac7aq3Rq9lLGFvGJOTpKlrnF07cuk7BfPOPt8yCS/Q=; b=DNhKOjFbbL9yJpr4Ep8AsAHLUIqBqfn88C9SD1hwatrI1KvyEDAmu/961yupRzViRT AprdbVfORZw80xMdaiB2GFjeTqPTrjfPIFR9kIJMtifWX51tEFlxKaKxZqUnnIJIIqU8 LsYDVKsHxR+kBgMeKnycsboGeCtr6dyJ87tEmhUPZVNO3MJwwJxT1xTaqEIs2qUMdXBi 
TRwk1cKlsWOgc6TYTQWV6t6HM0hdoKcfp8ulaWo5pgD6H/TAgx5x+U1JAB87V1mVDHGx EpokvGzb8mDD6QYMGBMstwjXm0dct689al7O8mxrZCPZ3tiDKy0ZYRTqVx9kjMvrb4Vu ekcg== X-Gm-Message-State: AOAM531akcLRW7dl1NJiHCrY5FxMDx9PKxNoJhjM8bcwpyJL19/86aFe iyqx7GNzde1jf1fO+l74EmrBsRj4j4M= X-Google-Smtp-Source: ABdhPJyVV27g69FwEuMYM1f/Uh7bykXwr2sHRLg8eLVBpMx70WrrSTcIlGMZal9cyhA2Sy8h74lW3A== X-Received: by 2002:a5d:6842:: with SMTP id o2mr11851285wrw.310.1611759730005; Wed, 27 Jan 2021 07:02:10 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a184sm2938830wme.35.2021.01.27.07.02.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:09 -0800 (PST) Message-Id: <07dc0cf8c683676d304ac16fde2338f49e5cc483.1611759716.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Wed, 27 Jan 2021 15:01:50 +0000 Subject: [PATCH v2 11/17] midx: use chunk-format API in write_midx_internal() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The chunk-format API allows writing the table of contents and all chunks using the anonymous 'struct chunkfile' type. We only need to convert our local chunk logic to this API for the multi-pack-index writes to share that logic with the commit-graph file writes. 
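For readers who have not seen the earlier patches, the shape of the write side can be modelled roughly like this. It is a toy, not chunk-format.c: every toy_* name is hypothetical, the sink only tracks a position, the fixed-size chunk array is an assumption, and the 12-byte row width is taken from the 4-byte id plus 8-byte offset that the old midx.c loop wrote for each table-of-contents row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHUNKS 8
#define TOY_LOOKUP_WIDTH 12	/* 4-byte chunk id + 8-byte offset */

/* Hypothetical stand-in for the output stream: tracks the position only. */
struct toy_sink {
	uint64_t pos;
};

typedef int (*toy_write_fn)(struct toy_sink *out, void *data);

struct toy_chunk {
	uint32_t id;
	uint64_t size;
	toy_write_fn write_fn;
};

struct toy_chunkfile {
	struct toy_sink *out;
	struct toy_chunk chunks[TOY_MAX_CHUNKS];
	size_t nr;
};

struct toy_context {
	uint32_t entries_nr;
};

/* Register a chunk: its id, its writer and the size it promises to write. */
void toy_add_chunk(struct toy_chunkfile *cf, uint32_t id,
		   toy_write_fn fn, uint64_t size)
{
	assert(cf->nr < TOY_MAX_CHUNKS);
	cf->chunks[cf->nr].id = id;
	cf->chunks[cf->nr].size = size;
	cf->chunks[cf->nr].write_fn = fn;
	cf->nr++;
}

/* Emit the table of contents, then each chunk, checking sizes as we go. */
int toy_write_chunkfile(struct toy_chunkfile *cf, void *data)
{
	/* The first chunk starts right after the TOC and its final row. */
	uint64_t offset = cf->out->pos + (cf->nr + 1) * TOY_LOOKUP_WIDTH;
	size_t i;

	for (i = 0; i < cf->nr; i++) {
		cf->out->pos += TOY_LOOKUP_WIDTH;	/* row: (id, offset) */
		offset += cf->chunks[i].size;
	}
	cf->out->pos += TOY_LOOKUP_WIDTH;		/* row: (0, offset) */

	for (i = 0; i < cf->nr; i++) {
		uint64_t start = cf->out->pos;

		if (cf->chunks[i].write_fn(cf->out, data))
			return -1;
		if (cf->out->pos - start != cf->chunks[i].size)
			return -1;
	}
	return cf->out->pos == offset ? 0 : -1;
}

int toy_write_fixed(struct toy_sink *out, void *data)
{
	(void)data;
	out->pos += 16;
	return 0;
}

int toy_write_entries(struct toy_sink *out, void *data)
{
	struct toy_context *ctx = data;

	out->pos += (uint64_t)ctx->entries_nr * 4;
	return 0;
}

/* Driver: a 12-byte header, then two chunks of 16 and 5 * 4 bytes. */
uint64_t toy_demo(void)
{
	struct toy_sink out = { 12 };
	struct toy_chunkfile cf = { .out = &out };
	struct toy_context ctx = { 5 };

	toy_add_chunk(&cf, 0x41414141, toy_write_fixed, 16);
	toy_add_chunk(&cf, 0x42424242, toy_write_entries,
		      (uint64_t)ctx.entries_nr * 4);
	if (toy_write_chunkfile(&cf, &ctx))
		return 0;
	return out.pos;
}
```

The caller only lists chunks and sizes; the offset arithmetic and the consistency checks that write_midx_internal() used to carry inline live in one place.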
Signed-off-by: Derrick Stolee --- midx.c | 104 +++++++++++---------------------------------------------- 1 file changed, 19 insertions(+), 85 deletions(-) diff --git a/midx.c b/midx.c index 6ee262aab79..3585e04a706 100644 --- a/midx.c +++ b/midx.c @@ -11,6 +11,7 @@ #include "trace2.h" #include "run-command.h" #include "repository.h" +#include "chunk-format.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -799,18 +800,15 @@ static int write_midx_large_offsets(struct hashfile *f, static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, struct string_list *packs_to_drop, unsigned flags) { - unsigned char cur_chunk, num_chunks = 0; char *midx_name; uint32_t i; struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; - uint64_t header_size = 0; - uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; - uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; + struct chunkfile *cf; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) @@ -923,98 +921,34 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * if (ctx.m) close_midx(ctx.m); - cur_chunk = 0; - num_chunks = ctx.large_offsets_needed ? 
5 : 4; - if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); result = 1; goto cleanup; } - header_size = write_midx_header(f, num_chunks, ctx.nr - dropped_packs); - - chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; - chunk_offsets[cur_chunk] = header_size + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE; - - cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * the_hash_algo->rawsz; - - cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH; - if (ctx.large_offsets_needed) { - chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; - - cur_chunk++; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + - ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; - } - - chunk_ids[cur_chunk] = 0; - - for (i = 0; i <= num_chunks; i++) { - if (i && chunk_offsets[i] < chunk_offsets[i - 1]) - BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64, - chunk_offsets[i - 1], - chunk_offsets[i]); - - if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT) - BUG("chunk offset %"PRIu64" is not properly aligned", - chunk_offsets[i]); - - hashwrite_be32(f, chunk_ids[i]); - hashwrite_be64(f, chunk_offsets[i]); - } - - for (i = 0; i < num_chunks; i++) { - if (f->total + f->offset != chunk_offsets[i]) - BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, - chunk_offsets[i], - f->total + f->offset, - chunk_ids[i]); + cf = init_chunkfile(f); - switch (chunk_ids[i]) { - case MIDX_CHUNKID_PACKNAMES: - write_midx_pack_names(f, &ctx); - break; + add_chunk(cf, MIDX_CHUNKID_PACKNAMES, + write_midx_pack_names, 
pack_name_concat_len); + add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, + write_midx_oid_fanout, MIDX_CHUNK_FANOUT_SIZE); + add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, + write_midx_oid_lookup, ctx.entries_nr * the_hash_algo->rawsz); + add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, + write_midx_object_offsets, + ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH); - case MIDX_CHUNKID_OIDFANOUT: - write_midx_oid_fanout(f, &ctx); - break; - - case MIDX_CHUNKID_OIDLOOKUP: - write_midx_oid_lookup(f, &ctx); - break; - - case MIDX_CHUNKID_OBJECTOFFSETS: - write_midx_object_offsets(f, &ctx); - break; - - case MIDX_CHUNKID_LARGEOFFSETS: - write_midx_large_offsets(f, &ctx); - break; - - default: - BUG("trying to write unknown chunk id %"PRIx32, - chunk_ids[i]); - } - } + if (ctx.large_offsets_needed) + add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, + write_midx_large_offsets, + ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH); - if (f->total + f->offset != chunk_offsets[num_chunks]) - BUG("incorrect final offset %"PRIu64" != %"PRIu64, - f->total + f->offset, - chunk_offsets[num_chunks]); + write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); + write_chunkfile(cf, &ctx); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); + free_chunkfile(cf); commit_lock_file(&lk); cleanup: From patchwork Wed Jan 27 15:01:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) 
with ESMTP id 5E346C433DB for ; Wed, 27 Jan 2021 15:36:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1FEFB207D0 for ; Wed, 27 Jan 2021 15:36:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235955AbhA0PgE (ORCPT ); Wed, 27 Jan 2021 10:36:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50910 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235610AbhA0PD6 (ORCPT ); Wed, 27 Jan 2021 10:03:58 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98EB1C06121F for ; Wed, 27 Jan 2021 07:02:12 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id c128so1954065wme.2 for ; Wed, 27 Jan 2021 07:02:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ryEQU73ySFs9i5qWnwYa2tmiNxlV09ruJyI3FKs6Bew=; b=oln+4TEg04MmPKi5DE+YIgN814TUPXYwPG9XG6bovGCH++7i9f/rktbseYH8sSCTIk VceC6UShTGp7df8omSj8bYUMXCVPcTz90dnNJivyFBNq9qUYNw0XTv5xgXa3iuu4Lv/u OawHUuKdB1tIGW2AfZ9OoCqSFCB2n5Ig5CbswztI05EYUvuivXTF/fnavBef9BIq3eus eRRZe6T44/1/tE2IrzBnnUdx+H4bTdSApBDkdGVvXCqUc1HTgNnw85Lz3iYbXa4XAM7u n+4BU9bl4pGRTeFqUVgC3p3888GoKPCQsuWkUH6Uquc2F0HSmuM0R9tNsutuF/hXGrvt oddQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ryEQU73ySFs9i5qWnwYa2tmiNxlV09ruJyI3FKs6Bew=; b=MghuBPd7DNaKSIeiKRH5/6pG4nrTn6a7qYKRYrTxL3Luqx/SmyOQgAfc9jojg2KDGN hRCH6jS2gdEWRgJw1yHS46YfUH/1scgIIIaihCh4T0Mm8XGIQDeWexvBd731L0p0mZql 8FNDoBwRdvgPc9fhgLdeV6FdDwYev4oIC8L7u4rjeUAQL7ttdLTEs/k2C6gNL5I5O5NJ 
ALSSCylms5+qBDzG6p8ZDPdaiKSTuJJ8O/rlF8bxOSzb9Q5MUz9gflBPCalggzvFefRq I6On+vnZ3S63VtOC7+24OhBeyiNBaw2U793hfkT18TEmubL0jTjF8Bnosr26EmAbw+XC fxHQ==
X-Gm-Message-State: AOAM533PYx1iew9pKg59eE24yzeX3Z0gNzlYMJBcGVDpdKZ6pSkWUCMF ZgN/OHpeyZ/VHY8jY8cLTC4J0ZjkaZ0=
X-Google-Smtp-Source: ABdhPJxopWWJAXlZyuVjEWJvEpGFMfvhmeIHL3VvQhtRtHSrwjuIoJhwNl8H+7DCFybxZnq275Bk9Q==
X-Received: by 2002:a1c:bd8b:: with SMTP id n133mr4726798wmf.9.1611759731069; Wed, 27 Jan 2021 07:02:11 -0800 (PST)
Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s25sm2712891wmj.24.2021.01.27.07.02.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Jan 2021 07:02:10 -0800 (PST)
Message-Id:
In-Reply-To:
References:
Date: Wed, 27 Jan 2021 15:01:51 +0000
Subject: [PATCH v2 12/17] chunk-format: create read chunk API
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

Add the capability to read the table of contents, then pair the chunks
with necessary logic using chunk_read_fn pointers. Callers will be added
in future changes, but the typical outline will be:

 1. initialize a 'struct chunkfile' with init_chunkfile(NULL).
 2. call read_table_of_contents().
 3. for each chunk to parse,
    a. call pair_chunk() to assign a pointer with the chunk position, or
    b. call read_chunk() to run a callback on the chunk start and size.
 4. call free_chunkfile() to clear the 'struct chunkfile' data.

We are re-using the anonymous 'struct chunkfile' data, as it is internal
to the chunk-format API. This gives it essentially two modes: write and
read. If the same struct instance were used for both reads and writes,
then there would be failures.
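The read outline above can be sketched with a self-contained toy. None of this is the chunk-format.c code: all toy_* names are hypothetical, the chunk array is fixed-size, and unlike the patch the toy file has no trailing hash, so the bounds check compares against the full file size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LOOKUP_WIDTH 12	/* 4-byte chunk id + 8-byte offset */
#define TOY_MAX_CHUNKS 8
#define TOY_CHUNK_NOT_FOUND (-2)

static uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t toy_get_be64(const unsigned char *p)
{
	return ((uint64_t)toy_get_be32(p) << 32) | toy_get_be32(p + 4);
}

/* Store the low 'bytes' bytes of 'v' in big-endian order. */
static void toy_put_be(unsigned char *p, uint64_t v, int bytes)
{
	while (bytes--) {
		p[bytes] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

struct toy_chunk {
	uint32_t id;
	const unsigned char *start;
	uint64_t size;
};

struct toy_toc {
	struct toy_chunk chunks[TOY_MAX_CHUNKS];
	int nr;
};

/* Parse 'toc_length' rows plus the terminating row; 0 on success. */
int toy_read_toc(struct toy_toc *toc, const unsigned char *file,
		 size_t file_size, uint64_t toc_offset, int toc_length)
{
	const unsigned char *row;

	if (toc_length < 0 || toc_length > TOY_MAX_CHUNKS)
		return -1;
	if (toc_offset > file_size ||
	    (uint64_t)(toc_length + 1) * TOY_LOOKUP_WIDTH > file_size - toc_offset)
		return -1;

	row = file + toc_offset;
	toc->nr = 0;
	while (toc_length--) {
		uint32_t id = toy_get_be32(row);
		uint64_t offset = toy_get_be64(row + 4);
		uint64_t next;

		if (!id)
			return -1;	/* terminator appeared too early */
		row += TOY_LOOKUP_WIDTH;
		next = toy_get_be64(row + 4);
		if (next < offset || next > file_size)
			return -1;
		toc->chunks[toc->nr].id = id;
		toc->chunks[toc->nr].start = file + offset;
		toc->chunks[toc->nr].size = next - offset;
		toc->nr++;
	}
	return toy_get_be32(row) ? -1 : 0;	/* last row must have id 0 */
}

/* Point '*p' at the start of the chunk with the given id, if present. */
int toy_pair_chunk(const struct toy_toc *toc, uint32_t id,
		   const unsigned char **p)
{
	int i;

	for (i = 0; i < toc->nr; i++) {
		if (toc->chunks[i].id == id) {
			*p = toc->chunks[i].start;
			return 0;
		}
	}
	return TOY_CHUNK_NOT_FOUND;
}

/* Driver: 4-byte header, 3 TOC rows, chunks at 40 (8 bytes) and 48 (4). */
int toy_read_demo(void)
{
	unsigned char file[52] = { 'T', 'O', 'Y', 0 };
	struct toy_toc toc;
	const unsigned char *a = NULL, *b = NULL, *c = NULL;

	toy_put_be(file + 4, 0x41414141, 4);
	toy_put_be(file + 8, 40, 8);
	toy_put_be(file + 16, 0x42424242, 4);
	toy_put_be(file + 20, 48, 8);
	toy_put_be(file + 28, 0, 4);
	toy_put_be(file + 32, 52, 8);

	if (toy_read_toc(&toc, file, sizeof(file), 4, 2))
		return 1;
	if (toy_pair_chunk(&toc, 0x41414141, &a) || a != file + 40)
		return 2;
	if (toy_pair_chunk(&toc, 0x42424242, &b) || b != file + 48)
		return 3;
	if (toc.chunks[1].size != 4)
		return 4;
	if (toy_pair_chunk(&toc, 0x43434343, &c) != TOY_CHUNK_NOT_FOUND)
		return 5;
	return 0;
}
```

Each chunk's size falls out of the next row's offset, which is why the table needs the terminating row with id 0.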

Signed-off-by: Derrick Stolee
---
 chunk-format.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 chunk-format.h | 33 +++++++++++++++++++++
 2 files changed, 113 insertions(+)

diff --git a/chunk-format.c b/chunk-format.c
index ab914c55856..74501084cf8 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -12,6 +12,8 @@ struct chunk_info {
 	uint32_t id;
 	uint64_t size;
 	chunk_write_fn write_fn;
+
+	const void *start;
 };
 
 struct chunkfile {
@@ -89,3 +91,81 @@ int write_chunkfile(struct chunkfile *cf, void *data)
 
 	return 0;
 }
+
+int read_table_of_contents(struct chunkfile *cf,
+			   const unsigned char *mfile,
+			   size_t mfile_size,
+			   uint64_t toc_offset,
+			   int toc_length)
+{
+	uint32_t chunk_id;
+	const unsigned char *table_of_contents = mfile + toc_offset;
+
+	ALLOC_GROW(cf->chunks, toc_length, cf->chunks_alloc);
+
+	while (toc_length--) {
+		uint64_t chunk_offset, next_chunk_offset;
+
+		chunk_id = get_be32(table_of_contents);
+		chunk_offset = get_be64(table_of_contents + 4);
+
+		if (!chunk_id) {
+			error(_("terminating chunk id appears earlier than expected"));
+			return 1;
+		}
+
+		table_of_contents += CHUNK_LOOKUP_WIDTH;
+		next_chunk_offset = get_be64(table_of_contents + 4);
+
+		if (next_chunk_offset < chunk_offset ||
+		    next_chunk_offset > mfile_size - the_hash_algo->rawsz) {
+			error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""),
+			      chunk_offset, next_chunk_offset);
+			return -1;
+		}
+
+		cf->chunks[cf->chunks_nr].id = chunk_id;
+		cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
+		cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset;
+		cf->chunks_nr++;
+	}
+
+	chunk_id = get_be32(table_of_contents);
+	if (chunk_id) {
+		error(_("final chunk has non-zero id %"PRIx32""), chunk_id);
+		return -1;
+	}
+
+	return 0;
+}
+
+int pair_chunk(struct chunkfile *cf,
+	       uint32_t chunk_id,
+	       const unsigned char **p)
+{
+	int i;
+
+	for (i = 0; i < cf->chunks_nr; i++) {
+		if (cf->chunks[i].id == chunk_id) {
+			*p = cf->chunks[i].start;
+			return 0;
+		}
+	}
+
+	return CHUNK_NOT_FOUND;
+}
+
+int read_chunk(struct chunkfile *cf,
+	       uint32_t chunk_id,
+	       chunk_read_fn fn,
+	       void *data)
+{
+	int i;
+
+	for (i = 0; i < cf->chunks_nr; i++) {
+		if (cf->chunks[i].id == chunk_id)
+			return fn(cf->chunks[i].start, cf->chunks[i].size, data);
+	}
+
+	return CHUNK_NOT_FOUND;
+}
diff --git a/chunk-format.h b/chunk-format.h
index bfaed672813..b62c9bf8ba1 100644
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -17,4 +17,37 @@ void add_chunk(struct chunkfile *cf,
 	       size_t size);
 int write_chunkfile(struct chunkfile *cf, void *data);
 
+int read_table_of_contents(struct chunkfile *cf,
+			   const unsigned char *mfile,
+			   size_t mfile_size,
+			   uint64_t toc_offset,
+			   int toc_length);
+
+#define CHUNK_NOT_FOUND (-2)
+
+/*
+ * Find 'chunk_id' in the given chunkfile and assign the
+ * given pointer to the position in the mmap'd file where
+ * that chunk begins.
+ *
+ * Returns CHUNK_NOT_FOUND if the chunk does not exist.
+ */
+int pair_chunk(struct chunkfile *cf,
+	       uint32_t chunk_id,
+	       const unsigned char **p);
+
+typedef int (*chunk_read_fn)(const unsigned char *chunk_start,
+			     size_t chunk_size, void *data);
+/*
+ * Find 'chunk_id' in the given chunkfile and call the
+ * given chunk_read_fn method with the information for
+ * that chunk.
+ *
+ * Returns CHUNK_NOT_FOUND if the chunk does not exist.
+ */
+int read_chunk(struct chunkfile *cf,
+	       uint32_t chunk_id,
+	       chunk_read_fn fn,
+	       void *data);
+
 #endif

From patchwork Wed Jan 27 15:01:52 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12050403
Message-Id: <8744d2785965773a2b561e9e1e91170530d052b1.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:52 +0000
Subject: [PATCH v2 13/17] commit-graph: use chunk-format read API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
From: Derrick Stolee

Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). While the current
implementation loses the duplicate-chunk detection, that will be added
in a future change.

Signed-off-by: Derrick Stolee
---
 commit-graph.c          | 154 ++++++++++++++--------------------------
 t/t5318-commit-graph.sh |   2 +-
 2 files changed, 53 insertions(+), 103 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index ba33777dcb8..8aa4881d85d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -299,15 +299,43 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 
+static int graph_read_oid_lookup(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	g->chunk_oid_lookup = chunk_start;
+	g->num_commits = chunk_size / g->hash_len;
+	return 0;
+}
+
+static int graph_read_bloom_data(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	uint32_t hash_version;
+	g->chunk_bloom_data = chunk_start;
+	hash_version = get_be32(chunk_start);
+
+	if (hash_version != 1)
+		return 0;
+
+	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
+	g->bloom_filter_settings->hash_version = hash_version;
+	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
+	g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8);
+	g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
+
+	return 0;
+}
+
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
 {
-	const unsigned char *data, *chunk_lookup;
-	uint32_t i;
+	const unsigned char *data;
 	struct commit_graph *graph;
-	uint64_t next_chunk_offset;
 	uint32_t graph_signature;
 	unsigned char graph_version, hash_version;
+	struct chunkfile *cf = NULL;
 
 	if (!graph_map)
 		return NULL;
@@ -356,108 +384,28 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}
 
-	chunk_lookup = data + 8;
-	next_chunk_offset = get_be64(chunk_lookup + 4);
-	for (i = 0; i < graph->num_chunks; i++) {
-		uint32_t chunk_id;
-		uint64_t chunk_offset = next_chunk_offset;
-		int chunk_repeated = 0;
-
-		chunk_id = get_be32(chunk_lookup + 0);
-
-		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
-		next_chunk_offset = get_be64(chunk_lookup + 4);
-
-		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
-			error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
-			      (uint32_t)chunk_offset);
-			goto free_and_return;
-		}
-
-		switch (chunk_id) {
-		case GRAPH_CHUNKID_OIDFANOUT:
-			if (graph->chunk_oid_fanout)
-				chunk_repeated = 1;
-			else
-				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
-			break;
-
-		case GRAPH_CHUNKID_OIDLOOKUP:
-			if (graph->chunk_oid_lookup)
-				chunk_repeated = 1;
-			else {
-				graph->chunk_oid_lookup = data + chunk_offset;
-				graph->num_commits = (next_chunk_offset - chunk_offset)
-						     / graph->hash_len;
-			}
-			break;
+	cf = init_chunkfile(NULL);
 
-		case GRAPH_CHUNKID_DATA:
-			if (graph->chunk_commit_data)
-				chunk_repeated = 1;
-			else
-				graph->chunk_commit_data = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_GENERATION_DATA:
-			if (graph->chunk_generation_data)
-				chunk_repeated = 1;
-			else
-				graph->chunk_generation_data = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW:
-			if (graph->chunk_generation_data_overflow)
-				chunk_repeated = 1;
-			else
-				graph->chunk_generation_data_overflow = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_EXTRAEDGES:
-			if (graph->chunk_extra_edges)
-				chunk_repeated = 1;
-			else
-				graph->chunk_extra_edges = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_BASE:
-			if (graph->chunk_base_graphs)
-				chunk_repeated = 1;
-			else
-				graph->chunk_base_graphs = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_BLOOMINDEXES:
-			if (graph->chunk_bloom_indexes)
-				chunk_repeated = 1;
-			else if (r->settings.commit_graph_read_changed_paths)
-				graph->chunk_bloom_indexes = data + chunk_offset;
-			break;
-
-		case GRAPH_CHUNKID_BLOOMDATA:
-			if (graph->chunk_bloom_data)
-				chunk_repeated = 1;
-			else if (r->settings.commit_graph_read_changed_paths) {
-				uint32_t hash_version;
-				graph->chunk_bloom_data = data + chunk_offset;
-				hash_version = get_be32(data + chunk_offset);
-
-				if (hash_version != 1)
-					break;
+	if (read_table_of_contents(cf, graph->data, graph_size,
+				   GRAPH_HEADER_SIZE, graph->num_chunks))
+		goto free_and_return;
 
-				graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
-				graph->bloom_filter_settings->hash_version = hash_version;
-				graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
-				graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
-				graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
-			}
-			break;
-		}
+	pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
+		   (const unsigned char **)&graph->chunk_oid_fanout);
+	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
+	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
+	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
+	pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+		   &graph->chunk_generation_data);
+	pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
+		   &graph->chunk_generation_data_overflow);
 
-		if (chunk_repeated) {
-			error(_("commit-graph chunk id %08x appears multiple times"), chunk_id);
-			goto free_and_return;
-		}
+	if (r->settings.commit_graph_read_changed_paths) {
+		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+			   &graph->chunk_bloom_indexes);
+		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
+			   graph_read_bloom_data, graph);
 	}
 
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
@@ -474,9 +422,11 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	if (verify_commit_graph_lite(graph))
 		goto free_and_return;
 
+	free_chunkfile(cf);
 	return graph;
 
 free_and_return:
+	free_chunkfile(cf);
 	free(graph->bloom_filter_settings);
 	free(graph);
 	return NULL;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index fa27df579a5..c7da741284e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -564,7 +564,7 @@ test_expect_success 'detect bad hash version' '
 
 test_expect_success 'detect low chunk count' '
 	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
-		"missing the .* chunk"
+		"final chunk has non-zero id"
 '
 
 test_expect_success 'detect missing OID fanout chunk' '

From patchwork Wed Jan 27 15:01:53 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12050401
Message-Id: <750c03253c95cf9fdbcf41bb65058956920ee83e.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:53 +0000
Subject: [PATCH v2 14/17] midx: use chunk-format read API
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
From: Derrick Stolee

Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). In particular, we
can use the return value of pair_chunk() to generate an error when a
required chunk is missing.

Signed-off-by: Derrick Stolee
---
 midx.c                      | 71 +++++++++++++------------------------
 t/t5319-multi-pack-index.sh |  6 ++--
 2 files changed, 28 insertions(+), 49 deletions(-)

diff --git a/midx.c b/midx.c
index 3585e04a706..e94dcd34b7f 100644
--- a/midx.c
+++ b/midx.c
@@ -54,6 +54,19 @@ static char *get_midx_filename(const char *object_dir)
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+static int midx_read_oid_fanout(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = data;
+	m->chunk_oid_fanout = (uint32_t *)chunk_start;
+
+	if (chunk_size != 4 * 256) {
+		error(_("multi-pack-index OID fanout is of the wrong size"));
+		return 1;
+	}
+	return 0;
+}
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -65,6 +78,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	char *midx_name = get_midx_filename(object_dir);
 	uint32_t i;
 	const char *cur_pack_name;
+	struct chunkfile *cf = NULL;
 
 	fd = git_open(midx_name);
 
@@ -114,58 +128,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
 
-	for (i = 0; i < m->num_chunks; i++) {
-		uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
-					     MIDX_CHUNKLOOKUP_WIDTH * i);
-		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
-						 MIDX_CHUNKLOOKUP_WIDTH * i);
-
-		if (chunk_offset >= m->data_len)
-			die(_("invalid chunk offset (too large)"));
-
-		switch (chunk_id) {
-			case MIDX_CHUNKID_PACKNAMES:
-				m->chunk_pack_names = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_OIDFANOUT:
-				m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset);
-				break;
-
-			case MIDX_CHUNKID_OIDLOOKUP:
-				m->chunk_oid_lookup = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_OBJECTOFFSETS:
-				m->chunk_object_offsets = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_LARGEOFFSETS:
-				m->chunk_large_offsets = m->data + chunk_offset;
-				break;
-
-			case 0:
-				die(_("terminating multi-pack-index chunk id appears earlier than expected"));
-				break;
-
-			default:
-				/*
-				 * Do nothing on unrecognized chunks, allowing future
-				 * extensions to add optional chunks.
-				 */
-				break;
-		}
-	}
+	cf = init_chunkfile(NULL);
 
-	if (!m->chunk_pack_names)
+	if (read_table_of_contents(cf, m->data, midx_size,
+				   MIDX_HEADER_SIZE, m->num_chunks))
+		goto cleanup_fail;
+
+	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required pack-name chunk"));
-	if (!m->chunk_oid_fanout)
+	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID fanout chunk"));
-	if (!m->chunk_oid_lookup)
+	if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID lookup chunk"));
-	if (!m->chunk_object_offsets)
+	if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required object offsets chunk"));
 
+	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
+
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
 	m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 297de502a94..ad4e878b65b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' '
 
 test_expect_success 'verify truncated chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
-		"missing required"
+		"final chunk has non-zero id"
 '
 
 test_expect_success 'verify extended chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
-		"terminating multi-pack-index chunk id appears earlier than expected"
+		"terminating chunk id appears earlier than expected"
 '
 
 test_expect_success 'verify missing required chunk' '
@@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' '
 
 test_expect_success 'verify invalid chunk offset' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
-		"invalid chunk offset (too large)"
+		"improper chunk offset(s)"
 '
 
 test_expect_success 'verify packnames out of order' '

From patchwork Wed Jan 27 15:01:54 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12050397
Message-Id: <83d292532a0fa3f3a0ad343421be4a99a03471d0.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:54 +0000
Subject: [PATCH v2 15/17] midx: use 64-bit multiplication for chunk sizes
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
From: Derrick Stolee

When calculating the sizes of certain chunks, we should always use
64-bit multiplication. This allows us to properly predict the chunk
sizes without risk of overflow.

Signed-off-by: Derrick Stolee
---
 midx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index e94dcd34b7f..a365dac6bbc 100644
--- a/midx.c
+++ b/midx.c
@@ -913,7 +913,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT,
 		  write_midx_oid_fanout, MIDX_CHUNK_FANOUT_SIZE);
 	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
-		  write_midx_oid_lookup, ctx.entries_nr * the_hash_algo->rawsz);
+		  write_midx_oid_lookup, (uint64_t)ctx.entries_nr * the_hash_algo->rawsz);
 	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
 		  write_midx_object_offsets,
 		  ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH);
@@ -921,7 +921,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.large_offsets_needed)
 		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
 			  write_midx_large_offsets,
-			  ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
+			  (uint64_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
 
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);

From patchwork Wed Jan 27 15:01:55 2021
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12050405
Message-Id: <669eeec707ab92a3e5983ad12baddc2c15012d43.1611759716.git.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:55 +0000
Subject: [PATCH v2 16/17] chunk-format: restore duplicate chunk checks
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek, Derrick Stolee
From: Derrick Stolee

Before refactoring into the chunk-format API, the commit-graph parsing
logic included checks for duplicate chunks. It is unlikely that we would
desire a chunk-based file format that allows duplicate chunk IDs in the
table of contents, so add duplicate checks into read_table_of_contents().
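
The duplicate check amounts to a linear scan of the IDs accepted so far
before each new row is stored. A standalone sketch of that idea follows;
the toy_* helpers are hypothetical names used only for illustration and
are not part of chunk-format.c. The scan is quadratic in the number of
chunks, which is assumed to be acceptable here because a table of
contents holds only a handful of rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if `id` already occurs in ids[0..nr-1], else 0. */
int toy_id_seen(const uint32_t *ids, size_t nr, uint32_t id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (ids[i] == id)
			return 1;
	return 0;
}

/*
 * Accept IDs one at a time, as a table-of-contents reader would.
 * Returns -1 on the first duplicate, else the number of IDs stored
 * in `out` (which must have room for `in_nr` entries).
 */
int toy_collect_ids(const uint32_t *in, size_t in_nr, uint32_t *out)
{
	size_t nr = 0, i;

	for (i = 0; i < in_nr; i++) {
		if (toy_id_seen(out, nr, in[i]))
			return -1;	/* duplicate chunk ID */
		out[nr++] = in[i];
	}
	return (int)nr;
}
```

For example, the sequence { 0x4f494446, 0x4f49444c, 0x43444154 } (ASCII
"OIDF", "OIDL", "CDAT") is accepted, while repeating 0x4f494446 as the
third entry is rejected.
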
Signed-off-by: Derrick Stolee --- chunk-format.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index 74501084cf8..1ee875df423 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -14,6 +14,7 @@ struct chunk_info { chunk_write_fn write_fn; const void *start; + unsigned found:1; }; struct chunkfile { @@ -98,6 +99,7 @@ int read_table_of_contents(struct chunkfile *cf, uint64_t toc_offset, int toc_length) { + int i; uint32_t chunk_id; const unsigned char *table_of_contents = mfile + toc_offset; @@ -124,6 +126,14 @@ int read_table_of_contents(struct chunkfile *cf, return -1; } + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) { + error(_("duplicate chunk ID %"PRIx32" found"), + chunk_id); + return -1; + } + } + cf->chunks[cf->chunks_nr].id = chunk_id; cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset; From patchwork Wed Jan 27 15:01:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 12050399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FF33C433DB for ; Wed, 27 Jan 2021 15:35:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38878207D0 for ; Wed, 27 Jan 2021 15:35:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235687AbhA0Pez 
(ORCPT ); Wed, 27 Jan 2021 10:34:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235308AbhA0PFP (ORCPT ); Wed, 27 Jan 2021 10:05:15 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD599C061221 for ; Wed, 27 Jan 2021 07:02:17 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id h9so2227351wrr.9 for ; Wed, 27 Jan 2021 07:02:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=/31wU3H+Rpd5f02JkLU2apACd46gCGYTclTqM21pbrU=; b=RGKngy0xof48CDrWck0RiLoH8QLRTj9BkrR+HiH5rVb2jNbbsoQL2Y/nWKlpG2/uV3 aa03OoOfRql3W9pecV9qxd3z5uq4P1xb4/MLSbgmNNSSi4pvgnG327Feb1ruzR3WBa3b Kp+e8LEdvSy6wqyEEP3x4q8x/+lV5FbuXurGgLUr7KDmkaxot10KrB8kIaHkbZymboDP gXc8rwi9anrGFnTlPn9X0Mp2JrU6PeTN0jjN7N8mi+4I5qizXdg3EVBQrBSTG2XuO8RP VcUOo7wHX7kWImtYwhqJvLARwKkcLSPa2imaAWy7pObR5uyoEgttSx5q1SZUDaZDFF7E EsRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=/31wU3H+Rpd5f02JkLU2apACd46gCGYTclTqM21pbrU=; b=QajttKdzFceHo3h9Ggozh5bxqNn/ZT+V+e6aRvqwNbicvyVQfO1u+WyET6sLIBN9xM spIbTJzdirnoiqV56/y0qgUzUWxtgoYef4c/eQoevVHI4eOsHPDwuycOIqNCGzfp93z5 7AVGXo0sF1x45dL3U/iiQ8IbR+9X9iHNsvOFTGyljesfLJDCtFZwwxaKZv4NI9CjwjNE iflhEw51coEpFXekMbpJfeeN1s/8HtwQQnjqLKs/pqD3V6NHVC7/PqEk3BiNnZtzxDPY 5r69LmjbHGDUcjds5b68+WRwId+L5/sXKnhKQWFfcOieMCVBOUfQ6Afea1EhURksS2BB yEAA== X-Gm-Message-State: AOAM530I2GK0g9g7+aUWxvTE9tPzHMh4DutHDL7rEvP83+s8Sti220zo tK7u5C+nnOcpFeksHXpVfqcJuG1l7MU= X-Google-Smtp-Source: ABdhPJzPc7tCV6IT0sOPcvzSru6jbSJyf4D1j2miJSIxLSqD0hnUfgdap6h+kmoNBVLUdg/hUxPifw== X-Received: by 
2002:a5d:6092:: with SMTP id w18mr11529833wrt.75.1611759736310; Wed, 27 Jan 2021 07:02:16 -0800 (PST)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id c20sm2646925wmb.38.2021.01.27.07.02.15
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 27 Jan 2021 07:02:15 -0800 (PST)
Message-Id: <8f3985ab5df3e4abc6de6db7f71f1adcbc16b4a8.1611759716.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Wed, 27 Jan 2021 15:01:56 +0000
Subject: [PATCH v2 17/17] chunk-format: add technical docs
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Chris Torek , Derrick Stolee , Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The chunk-based file format is now an API in the code, but we should
also take time to document it as a file format. Specifically, it matches
the CHUNK LOOKUP sections of the commit-graph and multi-pack-index
files, but there are some commonalities that should be grouped in this
document.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/chunk-format.txt      | 54 +++++++++++++++++++
 .../technical/commit-graph-format.txt         |  3 ++
 Documentation/technical/pack-format.txt       |  3 ++
 3 files changed, 60 insertions(+)
 create mode 100644 Documentation/technical/chunk-format.txt

diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt
new file mode 100644
index 00000000000..3db3792dea2
--- /dev/null
+++ b/Documentation/technical/chunk-format.txt
@@ -0,0 +1,54 @@
+Chunk-based file formats
+========================
+
+Some file formats in Git use a common concept of "chunks" to describe
+sections of the file. This allows structured access to a large file by
+scanning a small "table of contents" for the remaining data. This common
+format is used by the `commit-graph` and `multi-pack-index` files.
+See link:technical/pack-format.html[the `multi-pack-index` format] and
+link:technical/commit-graph-format.html[the `commit-graph` format] for
+how they use the chunks to describe structured data.
+
+A chunk-based file format begins with some header information custom to
+that format. That header should include enough information to identify
+the file type, format version, and number of chunks in the file. From this
+information, a reader can determine the start of the chunk-based region.
+
+The chunk-based region starts with a table of contents describing where
+each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
+where C is the number of chunks. Consider the following table:
+
+  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
+  |--------------------|------------------------|
+  | ID[0]              | OFFSET[0]              |
+  | ...                | ...                    |
+  | ID[C-1]            | OFFSET[C-1]            |
+  | 0x00000000         | OFFSET[C]              |
+
+Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
+Each integer is stored in network byte order.
+
+The chunk identifier `ID[i]` is a label for the data stored within this
+file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
+size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
+and `OFFSET[i]`. This requires that the chunk data appears contiguously
+in the same order as the table of contents.
+
+The chunk ID of the final entry in the table of contents must be four
+zero bytes. This confirms that the table of contents is ending, and the
+entry's offset marks the end of the chunk-based data.
+
+Note: The chunk-based format expects that the file contains _at least_ a
+trailing hash after `OFFSET[C]`.
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these functions provides extra checks that assist
+developers when creating new file formats, including:
+
+ 1. Writing and reading the table of contents.
+
+ 2. Verifying that the data written in a chunk matches the expected size
+    that was recorded in the table of contents.
+
+ 3. Checking that a table of contents describes offsets properly within
+    the file boundaries.
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index b6658eff188..87971c27dd7 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -61,6 +61,9 @@ CHUNK LOOKUP:
       the length using the next chunk position if necessary.) Each chunk
       ID appears at most once.
 
+  The CHUNK LOOKUP matches the table of contents from
+  link:technical/chunk-format.html[the chunk-based file format].
+
   The remaining data in the body is described one chunk at a time, and
   these chunks may be given in any order. Chunks are required unless
   otherwise specified.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index f96b2e605f3..2fb1e60d29e 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -301,6 +301,9 @@ CHUNK LOOKUP:
     (Chunks are provided in file-order, so you can infer the length
     using the next chunk position if necessary.)
 
+  The CHUNK LOOKUP matches the table of contents from
+  link:technical/chunk-format.html[the chunk-based file format].
+
   The remaining data in the body is described one chunk at a time, and
   these chunks may be given in any order. Chunks are required unless
   otherwise specified.
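
As an illustration of the table of contents documented in chunk-format.txt,
here is a minimal standalone C sketch of a reader. It is not the API declared
in `chunk-format.h`: `struct toc_entry_sketch`, `read_toc_sketch()` and the
two big-endian helpers are hypothetical names made up for this example. The
sketch assumes the whole file is already in memory and that the caller has
taken the table-of-contents offset and the chunk count from the
format-specific header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12 /* 4-byte chunk ID followed by an 8-byte offset */

/* Hypothetical type for this sketch only; not part of chunk-format.h. */
struct toc_entry_sketch {
	uint32_t id;    /* chunk identifier */
	uint64_t start; /* offset of the first byte of the chunk */
	uint64_t size;  /* next row's offset minus this row's offset */
};

/* Read a 32-bit integer stored in network byte order. */
static uint32_t be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit integer stored in network byte order. */
static uint64_t be64_sketch(const unsigned char *p)
{
	return ((uint64_t)be32_sketch(p) << 32) | be32_sketch(p + 4);
}

/*
 * Parse nr_chunks rows plus the terminating row, starting at
 * file[toc_offset], and fill out[0..nr_chunks-1].
 * Returns 0 on success and -1 if the table is malformed.
 */
int read_toc_sketch(const unsigned char *file, size_t file_size,
		    size_t toc_offset, size_t nr_chunks,
		    struct toc_entry_sketch *out)
{
	const unsigned char *toc;
	size_t i;

	assert(file && out);

	/* The table needs room for nr_chunks + 1 rows inside the file. */
	if (toc_offset > file_size ||
	    (file_size - toc_offset) / TOC_ROW_SIZE <= nr_chunks)
		return -1;
	toc = file + toc_offset;

	for (i = 0; i < nr_chunks; i++) {
		const unsigned char *row = toc + i * TOC_ROW_SIZE;
		uint32_t id = be32_sketch(row);
		uint64_t start = be64_sketch(row + 4);
		/* The chunk ends where the next row's offset begins. */
		uint64_t next = be64_sketch(row + TOC_ROW_SIZE + 4);

		if (!id)
			return -1; /* terminating row appeared too early */
		if (next < start || next > file_size)
			return -1; /* offsets out of order or past the end */

		out[i].id = id;
		out[i].start = start;
		out[i].size = next - start;
	}

	/* The final row must carry a chunk ID of four zero bytes. */
	if (be32_sketch(toc + nr_chunks * TOC_ROW_SIZE))
		return -1;
	return 0;
}
```

The size of each chunk falls out of two adjacent rows, which is why the
terminating row is read even though it labels no data. Checks specific to a
given format, such as required chunks or the trailing hash, are deliberately
left out of this sketch.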