From patchwork Wed Sep 29 01:55:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12524393 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7441BC433F5 for ; Wed, 29 Sep 2021 01:55:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4EB9460EFD for ; Wed, 29 Sep 2021 01:55:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243610AbhI2B4o (ORCPT ); Tue, 28 Sep 2021 21:56:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbhI2B4n (ORCPT ); Tue, 28 Sep 2021 21:56:43 -0400 Received: from mail-io1-xd34.google.com (mail-io1-xd34.google.com [IPv6:2607:f8b0:4864:20::d34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A59BEC06161C for ; Tue, 28 Sep 2021 18:55:03 -0700 (PDT) Received: by mail-io1-xd34.google.com with SMTP id n71so1196902iod.0 for ; Tue, 28 Sep 2021 18:55:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=+fTmKYl0jFkOhMPvYVM4JfTdMaXHOAwfHND5bviBg7s=; b=HhKMPNukMSYj444BW+IIcW7FOdTdByhAinJlgxGD6pwZSrOy4ofh6gKrOjtokEitqu cSSoeEvdMhdTLksLqFl02AuO9mfiNIrs/Je6teUY/+0NGtRVtJO+K2m6+zeahU9C/4gw 1M8lPLayLcAptrTr+cbEfbG8HVq55Py4SnOgcwR9pMukXmNiYr7L5fd7sYeuYuKXemVg lSQ5pcgSZ6FqhPD7OqlSACNJiDoZilOyx0Ta13P4cok9YkEXWwXEGRbKDTB8hp/OAf/y 8HUlfxqo56eqRWTqyxDG2qX3MF1WGfoC80dXbXSbA6zq1Q8HwICvcvfsSfgtXEgvthj0 u7uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=+fTmKYl0jFkOhMPvYVM4JfTdMaXHOAwfHND5bviBg7s=; b=d71+puP7sFUyM//V/HFqeh887nbkCSNPYzSKzJYpY69/14C9dBxeRvqkZrGTWuqv/S P3Y2HSx06LdQrQz4sA6utd+2aVGucCbVuodnXUJHXHhQYSPHbFNfw58ORTC69tRowih/ 0wQOv1gJs/nWUU8rmbeGI60hOdl8bnG98//2SS+9vRysJ2axYB4zs54cDfemgVEbyeiA bSgYT9ombbJi82h7HfjGShWjExCZQVaeSb+QKJovyuWyMBcO1IXq3+Z3wvTeJ1gzv7iz 5Nm/Mvtve39DW0for/zFo/ZXpymBHG66Fin0OKaU725bwqast6cUV3TZIcTS8IPAenir vcxw== X-Gm-Message-State: AOAM532Wfmmo/+m5tqxHJzw0FYntANrBQv4XZHVMwdJV/RAiqhzM0+5I L52hoXcHfSh/ylbqNt1bFt5S3mxkoeakGQ== X-Google-Smtp-Source: ABdhPJznKNMkTNpH7wLf/JKgLwzkIXbI/gQfqHOF5qSX8S1logauKnWnzybrWIxc5KTgGau64G2YQg== X-Received: by 2002:a05:6638:16d4:: with SMTP id g20mr7213729jat.22.1632880502942; Tue, 28 Sep 2021 18:55:02 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id z4sm474811ilp.64.2021.09.28.18.55.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Sep 2021 18:55:02 -0700 (PDT) Date: Tue, 28 Sep 2021 21:55:01 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com Subject: [PATCH v3 1/9] midx: expose `write_midx_file_only()` publicly Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Expose a variant of the write_midx_file() function which ignores packs that aren't included in an explicit "allow" list. This will be used in an upcoming patch to power a new `--stdin-packs` mode of `git multi-pack-index write` for callers that only want to include certain packs in a MIDX (and ignore any packs which may have happened to enter the repository independently, e.g., from pushes). Those patches will provide test coverage for this new function. 
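For illustration only (this is not the code in the patch), the "allow" list idea can be sketched in standalone C. The names `name_list`, `name_list_sort()`, `name_list_has()`, `should_include_pack()` and `filter_packs()` are hypothetical stand-ins for this sketch; the patch itself uses git's `struct string_list` and `string_list_has_string()`. The sketch assumes the allow list is sorted before lookups, since it searches by bisection, and it treats a NULL list as "no filtering", which mirrors how `write_midx_file()` passes NULL for `packs_to_include`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sorted list of pack index basenames. */
struct name_list {
	const char **names;
	size_t nr;
};

/* Compare two array elements, each of which is a pointer to a C string. */
static int cmp_names(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;
	return strcmp(*x, *y);
}

/* Sort the list once so that later lookups can use binary search. */
static void name_list_sort(struct name_list *list)
{
	qsort(list->names, list->nr, sizeof(*list->names), cmp_names);
}

/* Binary search; only valid after name_list_sort(). */
static int name_list_has(const struct name_list *list, const char *name)
{
	return bsearch(&name, list->names, list->nr,
		       sizeof(*list->names), cmp_names) != NULL;
}

/* A NULL allow list means "include every pack". */
static int should_include_pack(const struct name_list *to_include,
			       const char *idx_name)
{
	if (!to_include)
		return 1;
	return name_list_has(to_include, idx_name);
}

/*
 * Copy the candidates that pass the allow list into out (which must have
 * room for nr entries) and return how many were kept.
 */
static size_t filter_packs(const struct name_list *to_include,
			   const char **candidates, size_t nr,
			   const char **out)
{
	size_t i, kept = 0;

	assert(candidates && out);
	for (i = 0; i < nr; i++)
		if (should_include_pack(to_include, candidates[i]))
			out[kept++] = candidates[i];
	return kept;
}
```

A caller would sort the allow list once and then test each `.idx` name found in the pack directory against it; names that are absent from the list are skipped, which is the behaviour the commit message describes for packs that entered the repository independently.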
Signed-off-by: Taylor Blau --- midx.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-------- midx.h | 9 +++++++++ 2 files changed, 58 insertions(+), 8 deletions(-) diff --git a/midx.c b/midx.c index f96fb2efee..7ac97e66e0 100644 --- a/midx.c +++ b/midx.c @@ -460,6 +460,8 @@ struct write_midx_context { uint32_t num_large_offsets; int preferred_pack_idx; + + struct string_list *to_include; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -469,8 +471,26 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, if (ends_with(file_name, ".idx")) { display_progress(ctx->progress, ++ctx->pack_paths_checked); + /* + * Note that at most one of ctx->m and ctx->to_include are set, + * so we are testing midx_contains_pack() and + * string_list_has_string() independently (guarded by the + * appropriate NULL checks). + * + * We could support passing to_include while reusing an existing + * MIDX, but don't currently since the reuse process drags + * forward all packs from an existing MIDX (without checking + * whether or not they appear in the to_include list). + * + * If we added support for that, these next two conditionals + * should be performed independently (likely checking + * to_include before the existing MIDX). 
+ */ if (ctx->m && midx_contains_pack(ctx->m, file_name)) return; + else if (ctx->to_include && + !string_list_has_string(ctx->to_include, file_name)) + return; ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); @@ -1043,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, } static int write_midx_internal(const char *object_dir, + struct string_list *packs_to_include, struct string_list *packs_to_drop, const char *preferred_pack_name, unsigned flags) @@ -1067,10 +1088,17 @@ static int write_midx_internal(const char *object_dir, die_errno(_("unable to create leading directories of %s"), midx_name); - for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) { - if (!strcmp(object_dir, cur->object_dir)) { - ctx.m = cur; - break; + if (!packs_to_include) { + /* + * Only reference an existing MIDX when not filtering which + * packs to include, since all packs and objects are copied + * blindly from an existing MIDX if one is present. + */ + for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) { + if (!strcmp(object_dir, cur->object_dir)) { + ctx.m = cur; + break; + } } } @@ -1121,10 +1149,13 @@ static int write_midx_internal(const char *object_dir, else ctx.progress = NULL; + ctx.to_include = packs_to_include; + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); stop_progress(&ctx.progress); - if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) { + if ((ctx.m && ctx.nr == ctx.m->num_packs) && + !(packs_to_include || packs_to_drop)) { struct bitmap_index *bitmap_git; int bitmap_exists; int want_bitmap = flags & MIDX_WRITE_BITMAP; @@ -1365,7 +1396,17 @@ int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags) { - return write_midx_internal(object_dir, NULL, preferred_pack_name, flags); + return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name, + flags); +} + +int write_midx_file_only(const char *object_dir, + struct string_list 
*packs_to_include, + const char *preferred_pack_name, + unsigned flags) +{ + return write_midx_internal(object_dir, packs_to_include, NULL, + preferred_pack_name, flags); } struct clear_midx_data { @@ -1645,7 +1686,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla free(count); if (packs_to_drop.nr) { - result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags); + result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, flags); m = NULL; } @@ -1836,7 +1877,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, goto cleanup; } - result = write_midx_internal(object_dir, NULL, NULL, flags); + result = write_midx_internal(object_dir, NULL, NULL, NULL, flags); m = NULL; cleanup: diff --git a/midx.h b/midx.h index aa3da557bb..3545e327ea 100644 --- a/midx.h +++ b/midx.h @@ -2,6 +2,7 @@ #define MIDX_H #include "repository.h" +#include "string-list.h" struct object_id; struct pack_entry; @@ -62,6 +63,14 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags); +/* + * Variant of write_midx_file which writes a MIDX containing only the packs + * specified in packs_to_include. 
+ */ +int write_midx_file_only(const char *object_dir, + struct string_list *packs_to_include, + const char *preferred_pack_name, + unsigned flags); void clear_midx_file(struct repository *r); int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags); int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags); From patchwork Wed Sep 29 01:55:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12524395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B3B0C433EF for ; Wed, 29 Sep 2021 01:55:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1AA2E60EFD for ; Wed, 29 Sep 2021 01:55:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243646AbhI2B4r (ORCPT ); Tue, 28 Sep 2021 21:56:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbhI2B4q (ORCPT ); Tue, 28 Sep 2021 21:56:46 -0400 Received: from mail-io1-xd29.google.com (mail-io1-xd29.google.com [IPv6:2607:f8b0:4864:20::d29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CBD1C06161C for ; Tue, 28 Sep 2021 18:55:06 -0700 (PDT) Received: by mail-io1-xd29.google.com with SMTP id d18so1081423iof.13 for ; Tue, 28 Sep 2021 18:55:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=33mnRJBc90bs/Pe9PpeTX4YZQHUfSNkoVR0bzvGx7X0=; b=KCN570zU+ATJO6b7x69sMBr/vx+UCQuLGaCZH8CYfW9/G65s5wW6KiLkKX4Kzh0NcK 
hPOq42j8J2lkSiCul5rj4sxoyF353U/bFHtXOHTx+7jlUox6kah6hiLsInBMohRk7F1r xi2z0fpKMxeX3wM6vnVtJCCPnnXwHRoerTNorhjSPduNL40buc+oWgWsaxrjAx7TJjaj Zkg24VhUcX9Fy1Ib3nNFLfpSSL9HYc3QDoVBZ169D/wTn7FIBxucMb5ITkIru93qvoFQ e9WxYRRskFoIz1il+fw07LNzlgWXHimR9Wk379beNZXxBMiC3Om1Bm+DEZfhiWTfruTf M4IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=33mnRJBc90bs/Pe9PpeTX4YZQHUfSNkoVR0bzvGx7X0=; b=Z2PwhwFpq2X5slCxSQmd2Qwp2ItjmOd7Sxp7UgSpbVmRSGklis/KsrdNZFk4iS8YcG l9VxPqzflG9Jj0Il8kiN6U3Z0J6uyXmcJRIUNH3VZ2m7iQwFkm9rsDRgfRcM9m5jwrQE WgNDBXpD88OcDRGxmU8wfRfn/jv76idLVNFJOoNl+opZGO9cvdIUD+5eNspdvKUpPvXu 4iRT4nm6Uq/8wMwtWLoDl57gYtGbMsJW1YsowosMeHtc+ILU2uTaZ8i4zppp/n7aj8av Oxg3daOcFOSlOzrW2V6JE+1E7l82PSgAEEFbQgVmrO3KOX77x+DR74fxKEm4eoCQc2+j +ZSA== X-Gm-Message-State: AOAM530zWrdstjOBVdZblLTYxBAEm3pcIApz4rxdD1odV465s9uuYXIT V0V0WbrJOdn89v4/if5xRf7vfrQPsiwtZA== X-Google-Smtp-Source: ABdhPJy7gwEoxzEONyJ8JmdfXmCM2RkGYqbzF6JHGTmm28bO3VTHATtkI14v8DaAKvLgC8hUKQkURg== X-Received: by 2002:a6b:d209:: with SMTP id q9mr6222375iob.206.1632880505804; Tue, 28 Sep 2021 18:55:05 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id e13sm521935iod.36.2021.09.28.18.55.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Sep 2021 18:55:05 -0700 (PDT) Date: Tue, 28 Sep 2021 21:55:04 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com Subject: [PATCH v3 2/9] builtin/multi-pack-index.c: support `--stdin-packs` mode Message-ID: <986ef14f2af3b137b2a67806f3bf96585dbc27f3.1632880469.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To power a new `--write-midx` mode, `git repack` will want to write a multi-pack index containing a certain set of packs in the repository. This new option will be used by `git repack` to write a MIDX which contains only the packs which will survive after the repack (that is, it will exclude any packs which are about to be deleted). This patch effectively exposes the function implemented in the previous commit via the `git multi-pack-index` builtin. An alternative approach would have been to call that function from the `git repack` builtin directly, but this introduces awkward problems around closing and reopening the object store, so the MIDX will be written out-of-process. Signed-off-by: Taylor Blau --- Documentation/git-multi-pack-index.txt | 4 ++++ builtin/multi-pack-index.c | 27 ++++++++++++++++++++++++++ t/t5319-multi-pack-index.sh | 15 ++++++++++++++ 3 files changed, 46 insertions(+) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index a9df3dbd32..009c989ef8 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -45,6 +45,10 @@ write:: --[no-]bitmap:: Control whether or not a multi-pack bitmap is written. 
+ + --stdin-packs:: + Write a multi-pack index containing only the set of + line-delimited pack index basenames provided over stdin. -- verify:: diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 66de6efd41..03aaf8e7fb 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -47,6 +47,7 @@ static struct opts_multi_pack_index { const char *preferred_pack; unsigned long batch_size; unsigned flags; + int stdin_packs; } opts; static struct option common_opts[] = { @@ -61,6 +62,16 @@ static struct option *add_common_options(struct option *prev) return parse_options_concat(common_opts, prev); } +static void read_packs_from_stdin(struct string_list *to) +{ + struct strbuf buf = STRBUF_INIT; + while (strbuf_getline(&buf, stdin) != EOF) + string_list_append(to, buf.buf); + string_list_sort(to); + + strbuf_release(&buf); +} + static int cmd_multi_pack_index_write(int argc, const char **argv) { struct option *options; @@ -70,6 +81,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) N_("pack for reuse when computing a multi-pack bitmap")), OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"), MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX), + OPT_BOOL(0, "stdin-packs", &opts.stdin_packs, + N_("write multi-pack index containing only given indexes")), OPT_END(), }; @@ -86,6 +99,20 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) FREE_AND_NULL(options); + if (opts.stdin_packs) { + struct string_list packs = STRING_LIST_INIT_DUP; + int ret; + + read_packs_from_stdin(&packs); + + ret = write_midx_file_only(opts.object_dir, &packs, + opts.preferred_pack, opts.flags); + + string_list_clear(&packs, 0); + + return ret; + + } return write_midx_file(opts.object_dir, opts.preferred_pack, opts.flags); } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index bb04f0f23b..385f0a3efd 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -168,6 +168,21 @@ 
test_expect_success 'write midx with two packs' ' compare_results_with_midx "two packs" +test_expect_success 'write midx with --stdin-packs' ' + rm -fr $objdir/pack/multi-pack-index && + + idx="$(find $objdir/pack -name "test-2-*.idx")" && + basename "$idx" >in && + + git multi-pack-index write --stdin-packs packs && + + test_cmp packs in +' + +compare_results_with_midx "mixed mode (one pack + extra)" + test_expect_success 'write progress off for redirected stderr' ' git multi-pack-index --object-dir=$objdir write 2>err && test_line_count = 0 err From patchwork Wed Sep 29 01:55:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12524397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2ACA4C433F5 for ; Wed, 29 Sep 2021 01:55:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0A2CA61352 for ; Wed, 29 Sep 2021 01:55:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243651AbhI2B4u (ORCPT ); Tue, 28 Sep 2021 21:56:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbhI2B4t (ORCPT ); Tue, 28 Sep 2021 21:56:49 -0400 Received: from mail-io1-xd33.google.com (mail-io1-xd33.google.com [IPv6:2607:f8b0:4864:20::d33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A8ABC06161C for ; Tue, 28 Sep 2021 18:55:09 -0700 (PDT) Received: by mail-io1-xd33.google.com with SMTP id y197so1087191iof.11 for ; Tue, 28 Sep 2021 18:55:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; 
h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=xomgPOqHvDExerKweyuwzFQWM/+HeysQdGLPTmXaLRM=; b=ny2yffJYha0bwPXmYOKI4my13ejitK5fA6HIxgjGM038Iu2WAm58Qw+5aMYCQgD4z4 6gnc9UmWDHzITVi9HdD8//Letmx6QmYi4/7CySaSLzhntTw1T9v+4xrUKPUQYffPstTS EhRutis1i8mdmD+C3ukz9TA+mtjMS33UyMnCzn1k5pSW86c4fON0NWwAlE5DBaMLXGJt jmFWMn+FAFshC7QMfIZZOsiBzcaNUtaNVMY4EMCBXT8NP2DTSGijaB+kpPW1nbFn8s2/ lIt+lkcQfVaqBteotYrSojpUfDPbkuIQDdc+hFKZaP0wpzPV8Zha2A0QytkSJmuSE0ai Jqcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=xomgPOqHvDExerKweyuwzFQWM/+HeysQdGLPTmXaLRM=; b=6GlNFsnCoYMAfSpnTI+lZVQFyj/921C5fnxCa6lmrmMxYiNrFnWCCx/0zwGfv/Kfcg +RBrJ9OvU+/52HlPpfVTLzxwOhdamAHIWDpoWn26ZZHcqtMCxN33ANVY0A5JWQ5KexAC 48+DwLawYecKycKUirFOf0NBPTGeVQhDrer1D1oHze8JHJ7llrgiBkIEqUAwQKdJD9up pU/Msfiwenl7wiDC6SOPYmBhTwts+YxIgHJj+rPbj/mWSfqxl19U8zdThz3Gx8r6iiwj /9kQyy7w4nPefjhDC6C1zT5EpzaasHbgczjdg+9wIUXoucbeQ1Z94wJ6kWfkQVTQSbsu IA+w== X-Gm-Message-State: AOAM531N9+x0376K0srUw8pIwm3A1qNIfl0lRkKZEpUWu6ysFfV+7IPk NtN5rGwC7XNq6Y6aJE33lgMUUYT6h4u2gg== X-Google-Smtp-Source: ABdhPJzM9RPMkEvQJ0EUhkdr+QyTOnC69K3sCQGcQG0I5fxU34jKuOv4yVCZr5vHrFhSUwugq5k1XQ== X-Received: by 2002:a02:a38f:: with SMTP id y15mr7246961jak.26.1632880508471; Tue, 28 Sep 2021 18:55:08 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id r13sm487922ilh.80.2021.09.28.18.55.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Sep 2021 18:55:08 -0700 (PDT) Date: Tue, 28 Sep 2021 21:55:07 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com Subject: [PATCH v3 3/9] midx: preliminary support for `--refs-snapshot` Message-ID: <4e3769a4f39f78788be975adcb5b3b57143df2c7.1632880469.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To figure out which commits we can write a bitmap for, the multi-pack index/bitmap code does a reachability traversal, marking any commit which can be found in the MIDX as eligible to receive a bitmap. This approach will cause a problem when multi-pack bitmaps are able to be generated from `git repack`, since the reference tips can change during the repack. Even though we ignore commits that don't exist in the MIDX (when doing a scan of the ref tips), it's possible that a commit in the MIDX reaches something that isn't. This can happen when a multi-pack index contains some pack which refers to loose objects (e.g., if a pack was pushed after starting the repack but before generating the MIDX which depends on an object which is stored as loose in the repository, and by definition isn't included in the multi-pack index). By taking a snapshot of the references before we start repacking, we can close that race window. In the above scenario (where we have a packed object pointing at a loose one), we'll either (a) take a snapshot of the references before seeing the packed one, or (b) take it after, at which point we can guarantee that the loose object will be packed and included in the MIDX. This patch does just that. 
It writes a temporary "reference snapshot", which is a list of OIDs that are at the ref tips before writing a multi-pack bitmap. References that are "preferred" (i.e., are a suffix of at least one value of the 'pack.preferBitmapTips' configuration) are marked with a special '+'. The format is simple: one line per commit at each tip, with an optional '+' at the beginning (for preferred references, as described above). When provided, the reference snapshot is used to drive bitmap selection instead of the MIDX code doing its own traversal. When it isn't provided, the usual traversal takes place instead. Signed-off-by: Taylor Blau --- Documentation/git-multi-pack-index.txt | 15 +++++ builtin/multi-pack-index.c | 11 +++- builtin/repack.c | 2 +- midx.c | 61 ++++++++++++++++--- midx.h | 6 +- t/t5326-multi-pack-bitmaps.sh | 82 ++++++++++++++++++++++++++ 6 files changed, 164 insertions(+), 13 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 009c989ef8..27f83932e4 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -49,6 +49,21 @@ write:: --stdin-packs:: Write a multi-pack index containing only the set of line-delimited pack index basenames provided over stdin. + + --refs-snapshot=:: + With `--bitmap`, optionally specify a file which + contains a "refs snapshot" taken prior to repacking. ++ +A reference snapshot is composed of line-delimited OIDs corresponding to +the reference tips, usually taken by `git repack` prior to generating a +new pack. A line may optionally start with a `+` character to indicate +that the reference which corresponds to that OID is "preferred" (see +linkgit:git-config[1]'s `pack.preferBitmapTips`.) ++ +The file given at `` is expected to be readable, and can contain +duplicates. (If a given OID is given more than once, it is marked as +preferred if at least one instance of it begins with the special `+` +marker). 
-- verify:: diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 03aaf8e7fb..93869d58c5 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -7,7 +7,8 @@ #include "object-store.h" #define BUILTIN_MIDX_WRITE_USAGE \ - N_("git multi-pack-index [] write [--preferred-pack=]") + N_("git multi-pack-index [] write [--preferred-pack=]" \ + "[--refs-snapshot=]") #define BUILTIN_MIDX_VERIFY_USAGE \ N_("git multi-pack-index [] verify") @@ -45,6 +46,7 @@ static char const * const builtin_multi_pack_index_usage[] = { static struct opts_multi_pack_index { const char *object_dir; const char *preferred_pack; + const char *refs_snapshot; unsigned long batch_size; unsigned flags; int stdin_packs; @@ -83,6 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX), OPT_BOOL(0, "stdin-packs", &opts.stdin_packs, N_("write multi-pack index containing only given indexes")), + OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot, + N_("refs snapshot for selecting bitmap commits")), OPT_END(), }; @@ -106,7 +110,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) read_packs_from_stdin(&packs); ret = write_midx_file_only(opts.object_dir, &packs, - opts.preferred_pack, opts.flags); + opts.preferred_pack, + opts.refs_snapshot, opts.flags); string_list_clear(&packs, 0); @@ -114,7 +119,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) } return write_midx_file(opts.object_dir, opts.preferred_pack, - opts.flags); + opts.refs_snapshot, opts.flags); } static int cmd_multi_pack_index_verify(int argc, const char **argv) diff --git a/builtin/repack.c b/builtin/repack.c index c1a209013b..dba83eede2 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -733,7 +733,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) unsigned flags = 0; if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) flags |= MIDX_WRITE_BITMAP | 
MIDX_WRITE_REV_INDEX; - write_midx_file(get_object_directory(), NULL, flags); + write_midx_file(get_object_directory(), NULL, NULL, flags); } string_list_clear(&names, 0); diff --git a/midx.c b/midx.c index 7ac97e66e0..984dea2dba 100644 --- a/midx.c +++ b/midx.c @@ -968,7 +968,43 @@ static void bitmap_show_commit(struct commit *commit, void *_data) data->commits[data->commits_nr++] = commit; } +static int read_refs_snapshot(const char *refs_snapshot, + struct rev_info *revs) +{ + struct strbuf buf = STRBUF_INIT; + struct object_id oid; + FILE *f = xfopen(refs_snapshot, "r"); + + while (strbuf_getline(&buf, f) != EOF) { + struct object *object; + int preferred = 0; + char *hex = buf.buf; + const char *end = NULL; + + if (buf.len && *buf.buf == '+') { + preferred = 1; + hex = &buf.buf[1]; + } + + if (parse_oid_hex(hex, &oid, &end) < 0) + die(_("could not parse line: %s"), buf.buf); + if (*end) + die(_("malformed line: %s"), buf.buf); + + object = parse_object_or_die(&oid, NULL); + if (preferred) + object->flags |= NEEDS_BITMAP; + + add_pending_object(revs, object, ""); + } + + fclose(f); + strbuf_release(&buf); + return 0; +} + static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p, + const char *refs_snapshot, struct write_midx_context *ctx) { struct rev_info revs; @@ -977,8 +1013,12 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr cb.ctx = ctx; repo_init_revisions(the_repository, &revs, NULL); - setup_revisions(0, NULL, &revs, NULL); - for_each_ref(add_ref_to_pending, &revs); + if (refs_snapshot) { + read_refs_snapshot(refs_snapshot, &revs); + } else { + setup_revisions(0, NULL, &revs, NULL); + for_each_ref(add_ref_to_pending, &revs); + } /* * Skipping promisor objects here is intentional, since it only excludes @@ -1007,6 +1047,7 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, struct 
write_midx_context *ctx, + const char *refs_snapshot, unsigned flags) { struct packing_data pdata; @@ -1018,7 +1059,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, prepare_midx_packing_data(&pdata, ctx); - commits = find_commits_for_midx_bitmap(&commits_nr, ctx); + commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); /* * Build the MIDX-order index based on pdata.objects (which is already @@ -1066,6 +1107,7 @@ static int write_midx_internal(const char *object_dir, struct string_list *packs_to_include, struct string_list *packs_to_drop, const char *preferred_pack_name, + const char *refs_snapshot, unsigned flags) { char *midx_name; @@ -1359,7 +1401,8 @@ static int write_midx_internal(const char *object_dir, if (flags & MIDX_WRITE_REV_INDEX) write_midx_reverse_index(midx_name, midx_hash, &ctx); if (flags & MIDX_WRITE_BITMAP) { - if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) { + if (write_midx_bitmap(midx_name, midx_hash, &ctx, + refs_snapshot, flags) < 0) { error(_("could not write multi-pack bitmap")); result = 1; goto cleanup; @@ -1394,19 +1437,21 @@ static int write_midx_internal(const char *object_dir, int write_midx_file(const char *object_dir, const char *preferred_pack_name, + const char *refs_snapshot, unsigned flags) { return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name, - flags); + refs_snapshot, flags); } int write_midx_file_only(const char *object_dir, struct string_list *packs_to_include, const char *preferred_pack_name, + const char *refs_snapshot, unsigned flags) { return write_midx_internal(object_dir, packs_to_include, NULL, - preferred_pack_name, flags); + preferred_pack_name, refs_snapshot, flags); } struct clear_midx_data { @@ -1686,7 +1731,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla free(count); if (packs_to_drop.nr) { - result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, flags); + result = 
write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags); m = NULL; } @@ -1877,7 +1922,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, goto cleanup; } - result = write_midx_internal(object_dir, NULL, NULL, NULL, flags); + result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags); m = NULL; cleanup: diff --git a/midx.h b/midx.h index 3545e327ea..11ff094a8c 100644 --- a/midx.h +++ b/midx.h @@ -62,14 +62,18 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); -int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags); /* * Variant of write_midx_file which writes a MIDX containing only the packs * specified in packs_to_include. */ +int write_midx_file(const char *object_dir, + const char *preferred_pack_name, + const char *refs_snapshot, + unsigned flags); int write_midx_file_only(const char *object_dir, struct string_list *packs_to_include, const char *preferred_pack_name, + const char *refs_snapshot, unsigned flags); void clear_midx_file(struct repository *r); int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags); diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 4ad7c2c969..069dab3e17 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -283,4 +283,86 @@ test_expect_success 'pack.preferBitmapTips' ' ) ' +test_expect_success 'writing a bitmap with --refs-snapshot' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit one && + test_commit two && + + git rev-parse one >snapshot && + + git repack -ad && + + # First, write a MIDX which sees both refs/tags/one and + # refs/tags/two (causing both of those commits to receive + # bitmaps). 
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		grep "$(git rev-parse one)" bitmaps &&
+		grep "$(git rev-parse two)" bitmaps &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		# Then again, but with a refs snapshot which only sees
+		# refs/tags/one.
+		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		grep "$(git rev-parse one)" bitmaps &&
+		! grep "$(git rev-parse two)" bitmaps
+	)
+'
+
+test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		(
+			grep -vf before commits.raw &&
+			# mark missing commits as preferred
+			sed "s/^/+/" before
+		) >snapshot &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
 test_done

From patchwork Wed Sep 29 01:55:10 2021
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12524399
Date: Tue, 28 Sep 2021 21:55:10 -0400
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com
Subject: [PATCH v3 4/9] builtin/repack.c: keep track of existing packs unconditionally
Message-ID: <1b3dd331cac3a94747838bfeb6be14c7c5a240ed.1632880469.git.me@ttaylorr.com>
X-Mailing-List: git@vger.kernel.org

In order to be able to write a multi-pack index during repacking, `git
repack` must keep track of which packs it wants to write into the MIDX.
This set is the union of existing packs which will not be deleted, new
pack(s) generated as a result of the repack, and .keep packs.

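For illustration only, the union described above could be sketched in
standalone C as follows. This is not code from the series: `struct name_set`,
`name_set_insert()` and `collect_midx_packs()` are hypothetical names, and
the patches themselves use Git's `string_list` API instead. Duplicates are
dropped on insert, which is an assumption of the sketch rather than a
requirement stated in the message.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PACKS 64 /* arbitrary capacity chosen for this sketch */

/* Hypothetical sorted, duplicate-free set of borrowed pack names. */
struct name_set {
	const char *items[MAX_PACKS];
	size_t nr;
};

/* Insert `name` in sorted position; returns 1 if added, 0 if present or full. */
static int name_set_insert(struct name_set *set, const char *name)
{
	size_t pos = 0, i;

	while (pos < set->nr && strcmp(set->items[pos], name) < 0)
		pos++;
	if (pos < set->nr && !strcmp(set->items[pos], name))
		return 0; /* already in the set */
	if (set->nr == MAX_PACKS)
		return 0; /* full; a real implementation would grow */
	for (i = set->nr; i > pos; i--)
		set->items[i] = set->items[i - 1];
	set->items[pos] = name;
	set->nr++;
	return 1;
}

/* Add every name from an array to the set. */
static void name_set_add_all(struct name_set *set,
			     const char *const *names, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		name_set_insert(set, names[i]);
}

/* MIDX packs = surviving existing packs + newly written packs + .keep packs. */
static void collect_midx_packs(struct name_set *out,
			       const char *const *surviving, size_t surviving_nr,
			       const char *const *fresh, size_t fresh_nr,
			       const char *const *kept, size_t kept_nr)
{
	out->nr = 0;
	name_set_add_all(out, surviving, surviving_nr);
	name_set_add_all(out, fresh, fresh_nr);
	name_set_add_all(out, kept, kept_nr);
}
```

A caller would pass the three lists and then feed `out->items` to whatever
writes the MIDX; the sorted order is only a convenience of this sketch.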
Prior to this patch, `git repack` populated the list of existing packs
only when repacking all-into-one (i.e., with `-A` or `-a`), but we will
soon need this list when writing a MIDX without repacking a-i-o.

Populate the list of existing packs unconditionally, and guard the
removal of packs in that list so that it happens only when repacking
a-i-o.

Additionally, keep track of filenames of kept packs separately, since
this, too, will be used in an upcoming patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 56 +++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index dba83eede2..39f11df675 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -94,12 +94,14 @@ static void remove_pack_on_signal(int signo)
 }

 /*
- * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep file. These packs are not to
- * be kept if we are going to pack everything into one file.
+ * Adds all packs hex strings to either fname_list or fname_kept_list
+ * based on whether each pack has a corresponding .keep file or not.
+ * Packs without a .keep file are not to be kept if we are going to
+ * pack everything into one file.
  */
-static void get_non_kept_pack_filenames(struct string_list *fname_list,
-					const struct string_list *extra_keep)
+static void collect_pack_filenames(struct string_list *fname_list,
+				   struct string_list *fname_kept_list,
+				   const struct string_list *extra_keep)
 {
 	DIR *dir;
 	struct dirent *e;
@@ -112,21 +114,20 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 		size_t len;
 		int i;

+		if (!strip_suffix(e->d_name, ".pack", &len))
+			continue;
+
 		for (i = 0; i < extra_keep->nr; i++)
 			if (!fspathcmp(e->d_name, extra_keep->items[i].string))
 				break;
-		if (extra_keep->nr > 0 && i < extra_keep->nr)
-			continue;
-
-		if (!strip_suffix(e->d_name, ".pack", &len))
-			continue;

 		fname = xmemdupz(e->d_name, len);

-		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
-			string_list_append_nodup(fname_list, fname);
+		if ((extra_keep->nr > 0 && i < extra_keep->nr) ||
+		    (file_exists(mkpath("%s/%s.keep", packdir, fname))))
+			string_list_append_nodup(fname_kept_list, fname);
 		else
-			free(fname);
+			string_list_append_nodup(fname_list, fname);
 	}
 	closedir(dir);
 }
@@ -440,6 +441,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct string_list names = STRING_LIST_INIT_DUP;
 	struct string_list rollback = STRING_LIST_INIT_NODUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct string_list existing_kept_packs = STRING_LIST_INIT_DUP;
 	struct pack_geometry *geometry = NULL;
 	struct strbuf line = STRBUF_INIT;
 	int i, ext, ret;
@@ -572,9 +574,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (use_delta_islands)
 		strvec_push(&cmd.args, "--delta-islands");

+	collect_pack_filenames(&existing_packs, &existing_kept_packs,
+			       &keep_pack_list);
+
 	if (pack_everything & ALL_INTO_ONE) {
-		get_non_kept_pack_filenames(&existing_packs, &keep_pack_list);
-
 		repack_promisor_objects(&po_args, &names);

 		if (existing_packs.nr && delete_redundant) {
@@ -683,17 +686,19 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	reprepare_packed_git(the_repository);

 	if (delete_redundant) {
-		const int hexsz = the_hash_algo->hexsz;
 		int opts = 0;
-		string_list_sort(&names);
-		for_each_string_list_item(item, &existing_packs) {
-			char *sha1;
-			size_t len = strlen(item->string);
-			if (len < hexsz)
-				continue;
-			sha1 = item->string + len - hexsz;
-			if (!string_list_has_string(&names, sha1))
-				remove_redundant_pack(packdir, item->string);
+		if (pack_everything & ALL_INTO_ONE) {
+			const int hexsz = the_hash_algo->hexsz;
+			string_list_sort(&names);
+			for_each_string_list_item(item, &existing_packs) {
+				char *sha1;
+				size_t len = strlen(item->string);
+				if (len < hexsz)
+					continue;
+				sha1 = item->string + len - hexsz;
+				if (!string_list_has_string(&names, sha1))
+					remove_redundant_pack(packdir, item->string);
+			}
 		}

 		if (geometry) {
@@ -739,6 +744,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
 	string_list_clear(&existing_packs, 0);
+	string_list_clear(&existing_kept_packs, 0);
 	clear_pack_geometry(geometry);
 	strbuf_release(&line);

From patchwork Wed Sep 29 01:55:12 2021
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12524401
Date: Tue, 28 Sep 2021 21:55:12 -0400
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com
Subject: [PATCH v3 5/9] builtin/repack.c: rename variables that deal with non-kept packs
Message-ID: <15831a201a99534063b81f9bba81ed88f766f46e.1632880469.git.me@ttaylorr.com>
X-Mailing-List: git@vger.kernel.org

The new variable `existing_kept_packs` (and corresponding parameter
`fname_kept_list`) added by the previous patch makes it seem like
`existing_packs` and `fname_list` are each supersets of the other two,
respectively.

In reality, each pair is disjoint: one stores the packs without .keep
files, and the other stores the packs with .keep files. Rename each to
more clearly reflect this.

Suggested-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 39f11df675..5539ec7e89 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -94,12 +94,12 @@ static void remove_pack_on_signal(int signo)
 }

 /*
- * Adds all packs hex strings to either fname_list or fname_kept_list
- * based on whether each pack has a corresponding .keep file or not.
- * Packs without a .keep file are not to be kept if we are going to
- * pack everything into one file.
+ * Adds all packs hex strings to either fname_nonkept_list or
+ * fname_kept_list based on whether each pack has a corresponding
+ * .keep file or not. Packs without a .keep file are not to be kept
+ * if we are going to pack everything into one file.
  */
-static void collect_pack_filenames(struct string_list *fname_list,
+static void collect_pack_filenames(struct string_list *fname_nonkept_list,
 				   struct string_list *fname_kept_list,
 				   const struct string_list *extra_keep)
 {
@@ -127,7 +127,7 @@ static void collect_pack_filenames(struct string_list *fname_list,
 		    (file_exists(mkpath("%s/%s.keep", packdir, fname))))
 			string_list_append_nodup(fname_kept_list, fname);
 		else
-			string_list_append_nodup(fname_list, fname);
+			string_list_append_nodup(fname_nonkept_list, fname);
 	}
 	closedir(dir);
 }
@@ -440,7 +440,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct string_list_item *item;
 	struct string_list names = STRING_LIST_INIT_DUP;
 	struct string_list rollback = STRING_LIST_INIT_NODUP;
-	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct string_list existing_nonkept_packs = STRING_LIST_INIT_DUP;
 	struct string_list existing_kept_packs = STRING_LIST_INIT_DUP;
 	struct pack_geometry *geometry = NULL;
 	struct strbuf line = STRBUF_INIT;
@@ -574,13 +574,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (use_delta_islands)
 		strvec_push(&cmd.args, "--delta-islands");

-	collect_pack_filenames(&existing_packs, &existing_kept_packs,
+	collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
 			       &keep_pack_list);

 	if (pack_everything & ALL_INTO_ONE) {
 		repack_promisor_objects(&po_args, &names);

-		if (existing_packs.nr && delete_redundant) {
+		if (existing_nonkept_packs.nr && delete_redundant) {
 			for_each_string_list_item(item, &names) {
 				strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
 					     packtmp_name, item->string);
@@ -690,7 +690,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (pack_everything & ALL_INTO_ONE) {
 			const int hexsz = the_hash_algo->hexsz;
 			string_list_sort(&names);
-			for_each_string_list_item(item, &existing_packs) {
+			for_each_string_list_item(item, &existing_nonkept_packs) {
 				char *sha1;
 				size_t len = strlen(item->string);
 				if (len < hexsz)
@@ -743,7 +743,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)

 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
-	string_list_clear(&existing_packs, 0);
+	string_list_clear(&existing_nonkept_packs, 0);
 	string_list_clear(&existing_kept_packs, 0);
 	clear_pack_geometry(geometry);
 	strbuf_release(&line);

From patchwork Wed Sep 29 01:55:15 2021
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12524403
Date: Tue, 28 Sep 2021 21:55:15 -0400
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com
Subject: [PATCH v3 6/9] builtin/repack.c: extract showing progress to a variable
Message-ID: <1a40161baf13224228d9c755db8ce2bdeb06917a.1632880469.git.me@ttaylorr.com>
X-Mailing-List: git@vger.kernel.org

We only ask whether stderr is a tty before calling
'prune_packed_objects()', but the subsequent patch will add another use.

Extract this check into a variable so that both can use it without
having to call 'isatty()' twice.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5539ec7e89..475677b297 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -446,6 +446,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct strbuf line = STRBUF_INIT;
 	int i, ext, ret;
 	FILE *out;
+	int show_progress = isatty(2);

 	/* variables to be filled by option parsing */
 	int pack_everything = 0;
@@ -719,7 +720,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			}
 			strbuf_release(&buf);
 		}
-		if (!po_args.quiet && isatty(2))
+		if (!po_args.quiet && show_progress)
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);

From patchwork Wed Sep 29 01:55:18 2021
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 12524405
Date: Tue, 28 Sep 2021 21:55:18 -0400
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com
Subject: [PATCH v3 7/9] builtin/repack.c: support writing a MIDX while repacking
Message-ID: <6854f0751de56eb6117b985fc20e13e2a944a989.1632880469.git.me@ttaylorr.com>
X-Mailing-List: git@vger.kernel.org

Teach `git repack` a new `--write-midx` option for callers that wish to
persist a multi-pack index in their repository while repacking.

There are two existing alternatives to this new flag, but they don't
cover our particular use-case. These alternatives are:

  - Call 'git multi-pack-index write' after running 'git repack', or

  - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
    'git repack'.

The former works, but introduces a gap in bitmap coverage between
repacking and writing a new MIDX (since the repack may have deleted a
pack included in the existing MIDX, invalidating it altogether).

Setting the 'GIT_TEST_' environment variable is obviously unsupported.
In fact, even if it were supported officially, it still wouldn't work,
because it generates the MIDX *after* redundant packs have been dropped,
leading to the same issue as above.

Introduce a new option which eliminates this race by teaching `git
repack` to generate the MIDX at the critical point: after the new packs
have been written and moved into place, but before the redundant packs
have been removed.

This option is compatible with `git repack`'s '--write-bitmap-index'
option (it changes the interpretation to be: "write a bitmap
corresponding to the MIDX after one has been generated").

There is a little bit of additional noise in the patch below to avoid
repeating ourselves when selecting which packs to delete. Instead of a
single loop as before (where we iterate over 'existing_packs', decide if
a pack is worth deleting, and if so, delete it), we have two loops (the
first where we decide which ones are worth deleting, and the second
where we actually do the deleting). This makes it so we have a single
check we can make consistently when (1) telling the MIDX which packs we
want to exclude, and (2) actually unlinking the redundant packs.

There is also a tiny change to short-circuit the body of
write_midx_included_packs() when no packs remain in the case of an empty
repository. The MIDX code does not handle this, so avoid trying to
generate a MIDX covering zero packs in the first place.

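The two-loop structure described above can be sketched in isolation. The
following standalone C fragment is only an illustration under simplifying
assumptions: `struct pack_entry`, `mark_redundant()`, `count_midx_packs()`
and `count_deletable()` are hypothetical names, the `redundant` field stands
in for `item->util`, and a linear scan replaces the sorted `string_list`
lookup used in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string_list_item: a pack name plus a mark. */
struct pack_entry {
	const char *name; /* e.g. "pack-<hash>" */
	int redundant;    /* plays the role of item->util in the patch */
};

/* Returns 1 if `hash` is one of the `nr` newly written pack hashes. */
static int has_name(const char *const *names, size_t nr, const char *hash)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(names[i], hash))
			return 1;
	return 0;
}

/*
 * First loop: decide. Mark every existing pack whose trailing hash
 * (the last `hexsz` characters) is not among the new packs. Names
 * shorter than `hexsz` are skipped and keep their current mark.
 */
static void mark_redundant(struct pack_entry *packs, size_t nr,
			   const char *const *new_names, size_t new_nr,
			   size_t hexsz)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(packs[i].name);

		if (len < hexsz)
			continue;
		packs[i].redundant = !has_name(new_names, new_nr,
					       packs[i].name + len - hexsz);
	}
}

/* Consumer 1: packs a MIDX may still reference are the unmarked ones. */
static size_t count_midx_packs(const struct pack_entry *packs, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (!packs[i].redundant)
			n++;
	return n;
}

/* Consumer 2 (second loop): packs that would be unlinked are the marked ones. */
static size_t count_deletable(const struct pack_entry *packs, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (packs[i].redundant)
			n++;
	return n;
}
```

Because both consumers read the same mark, the set handed to the MIDX writer
and the set of unlinked packs cannot disagree, which is the consistency the
message asks for.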
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-repack.txt |  14 +++-
 builtin/repack.c             | 138 ++++++++++++++++++++++++++++++-----
 t/lib-midx.sh                |   8 ++
 t/t7700-repack.sh            |  96 ++++++++++++++++++++++++
 t/t7703-repack-geometric.sh  |   2 +-
 5 files changed, 234 insertions(+), 24 deletions(-)
 create mode 100644 t/lib-midx.sh

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 24c00c9384..0f2d235ca5 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 --------
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]

 DESCRIPTION
 -----------
@@ -128,10 +128,11 @@ depth is 4095.
 -b::
 --write-bitmap-index::
 	Write a reachability bitmap index as part of the repack. This
-	only makes sense when used with `-a` or `-A`, as the bitmaps
+	only makes sense when used with `-a`, `-A` or `-m`, as the bitmaps
 	must be able to refer to all reachable objects. This option
-	overrides the setting of `repack.writeBitmaps`.  This option
-	has no effect if multiple packfiles are created.
+	overrides the setting of `repack.writeBitmaps`. This option
+	has no effect if multiple packfiles are created, unless writing a
+	MIDX (in which case a multi-pack bitmap is created).

 --pack-kept-objects::
 	Include objects in `.keep` files when repacking.  Note that we
@@ -190,6 +191,11 @@ to change in the future. This option (implying a drastically different
 repack mode) is not guaranteed to work with all other combinations of
 option to `git repack`.

+-m::
+--write-midx::
+	Write a multi-pack index (see linkgit:git-multi-pack-index[1])
+	containing the non-redundant packs.
+
 CONFIGURATION
 -------------

diff --git a/builtin/repack.c b/builtin/repack.c
index 475677b297..abb30f89e8 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -434,6 +434,76 @@ static void clear_pack_geometry(struct pack_geometry *geometry)
 	geometry->split = 0;
 }

+static void midx_included_packs(struct string_list *include,
+				struct string_list *existing_nonkept_packs,
+				struct string_list *existing_kept_packs,
+				struct string_list *names,
+				struct pack_geometry *geometry)
+{
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, existing_kept_packs)
+		string_list_insert(include, xstrfmt("%s.idx", item->string));
+	for_each_string_list_item(item, names)
+		string_list_insert(include, xstrfmt("pack-%s.idx", item->string));
+	if (geometry) {
+		struct strbuf buf = STRBUF_INIT;
+		uint32_t i;
+		for (i = geometry->split; i < geometry->pack_nr; i++) {
+			struct packed_git *p = geometry->pack[i];
+
+			strbuf_addstr(&buf, pack_basename(p));
+			strbuf_strip_suffix(&buf, ".pack");
+			strbuf_addstr(&buf, ".idx");
+
+			string_list_insert(include, strbuf_detach(&buf, NULL));
+		}
+	} else {
+		for_each_string_list_item(item, existing_nonkept_packs) {
+			if (item->util)
+				continue;
+			string_list_insert(include, xstrfmt("%s.idx", item->string));
+		}
+	}
+}
+
+static int write_midx_included_packs(struct string_list *include,
+				     int show_progress, int write_bitmaps)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct string_list_item *item;
+	FILE *in;
+	int ret;
+
+	if (!include->nr)
+		return 0;
+
+	cmd.in = -1;
+	cmd.git_cmd = 1;
+
+	strvec_push(&cmd.args, "multi-pack-index");
+	strvec_pushl(&cmd.args, "write", "--stdin-packs", NULL);
+
+	if (show_progress)
+		strvec_push(&cmd.args, "--progress");
+	else
+		strvec_push(&cmd.args, "--no-progress");
+
+	if (write_bitmaps)
+		strvec_push(&cmd.args, "--bitmap");
+
+	ret = start_command(&cmd);
+	if (ret)
+		return ret;
+
+	in = xfdopen(cmd.in, "w");
+	for_each_string_list_item(item, include)
+		fprintf(in, "%s\n", item->string);
+	fclose(in);
+
+	return finish_command(&cmd);
+}
+
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
@@ -457,6 +527,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int no_update_server_info = 0;
 	struct pack_objects_args po_args = {NULL};
 	int geometric_factor = 0;
+	int write_midx = 0;

 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -499,6 +570,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("do not repack this pack")),
 		OPT_INTEGER('g', "geometric", &geometric_factor,
 			    N_("find a geometric progression with factor <N>")),
+		OPT_BOOL('m', "write-midx", &write_midx,
+			   N_("write a multi-pack index of the resulting packs")),
 		OPT_END()
 	};

@@ -515,8 +588,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		die(_("--keep-unreachable and -A are incompatible"));

 	if (write_bitmaps < 0) {
-		if (!(pack_everything & ALL_INTO_ONE) ||
-		    !is_bare_repository())
+		if (!write_midx &&
+		    (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
 			write_bitmaps = 0;
 	} else if (write_bitmaps &&
 		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
@@ -526,7 +599,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;

-	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
+	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
 		die(_(incremental_bitmap_conflict_error));

 	if (geometric_factor) {
@@ -568,10 +641,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	}
 	if (has_promisor_remote())
 		strvec_push(&cmd.args, "--exclude-promisor-objects");
-	if (write_bitmaps > 0)
-		strvec_push(&cmd.args, "--write-bitmap-index");
-	else if (write_bitmaps < 0)
-		strvec_push(&cmd.args, "--write-bitmap-index-quiet");
+	if (!write_midx) {
+		if (write_bitmaps > 0)
+			strvec_push(&cmd.args, "--write-bitmap-index");
+		else if (write_bitmaps < 0)
+			strvec_push(&cmd.args, "--write-bitmap-index-quiet");
+	}
 	if (use_delta_islands)
 		strvec_push(&cmd.args, "--delta-islands");

@@ -684,22 +759,47 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	}
 	/* End of pack replacement. */

+	if (delete_redundant && pack_everything & ALL_INTO_ONE) {
+		const int hexsz = the_hash_algo->hexsz;
+		string_list_sort(&names);
+		for_each_string_list_item(item, &existing_nonkept_packs) {
+			char *sha1;
+			size_t len = strlen(item->string);
+			if (len < hexsz)
+				continue;
+			sha1 = item->string + len - hexsz;
+			/*
+			 * Mark this pack for deletion, which ensures that this
+			 * pack won't be included in a MIDX (if `--write-midx`
+			 * was given) and that we will actually delete this pack
+			 * (if `-d` was given).
+			 */
+			item->util = (void*)(intptr_t)!string_list_has_string(&names, sha1);
+		}
+	}
+
+	if (write_midx) {
+		struct string_list include = STRING_LIST_INIT_NODUP;
+		midx_included_packs(&include, &existing_nonkept_packs,
+				    &existing_kept_packs, &names, geometry);
+
+		ret = write_midx_included_packs(&include,
+						show_progress, write_bitmaps > 0);
+
+		string_list_clear(&include, 0);
+
+		if (ret)
+			return ret;
+	}
+
 	reprepare_packed_git(the_repository);

 	if (delete_redundant) {
 		int opts = 0;
-		if (pack_everything & ALL_INTO_ONE) {
-			const int hexsz = the_hash_algo->hexsz;
-			string_list_sort(&names);
-			for_each_string_list_item(item, &existing_nonkept_packs) {
-				char *sha1;
-				size_t len = strlen(item->string);
-				if (len < hexsz)
-					continue;
-				sha1 = item->string + len - hexsz;
-				if (!string_list_has_string(&names, sha1))
-					remove_redundant_pack(packdir, item->string);
-			}
+		for_each_string_list_item(item, &existing_nonkept_packs) {
+			if (!item->util)
+				continue;
+			remove_redundant_pack(packdir, item->string);
 		}

 		if (geometry) {
diff --git a/t/lib-midx.sh b/t/lib-midx.sh
new file mode 100644
index 0000000000..1261994744
--- /dev/null
+++ b/t/lib-midx.sh
@@ -0,0 +1,8 @@
+# test_midx_consistent <objdir>
+test_midx_consistent () {
+	ls $1/pack/pack-*.idx | xargs -n 1 basename | sort >expect &&
+	test-tool read-midx $1 | grep ^pack-.*\.idx$ | sort >actual &&
+
+	test_cmp expect actual &&
+	git multi-pack-index --object-dir=$1 verify
+}
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 98eda3bfeb..6792531dfd 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -3,6 +3,8 @@
 test_description='git repack works correctly'

 . ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+. "${TEST_DIRECTORY}/lib-midx.sh"

 commit_and_pack () {
 	test_commit "$@" 1>&2 &&
@@ -234,4 +236,98 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	test_must_be_empty actual
 '

+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+test_expect_success 'setup for --write-midx tests' '
+	git init midx &&
+	(
+		cd midx &&
+		git config core.multiPackIndex true &&
+
+		test_commit base
+	)
+'
+
+test_expect_success '--write-midx unchanged' '
+	(
+		cd midx &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack &&
+		test_path_is_missing $midx &&
+		test_path_is_missing $midx-*.bitmap &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-*.bitmap &&
+		test_midx_consistent $objdir
+	)
+'
+
+test_expect_success '--write-midx with a new pack' '
+	(
+		cd midx &&
+		test_commit loose &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-*.bitmap &&
+		test_midx_consistent $objdir
+	)
+'
+
+test_expect_success '--write-midx with -b' '
+	(
+		cd midx &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -mb &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-*.bitmap &&
+		test_midx_consistent $objdir
+	)
+'
+
+test_expect_success '--write-midx with -d' '
+	(
+		cd midx &&
+		test_commit repack &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad --write-midx &&
+
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-*.bitmap &&
+		test_midx_consistent $objdir
+	)
+'
+
+test_expect_success 'cleans up MIDX when appropriate' ' + ( + cd midx && + + test_commit repack-2 && + GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx && + + checksum=$(midx_checksum $objdir) && + test_path_is_file $midx && + test_path_is_file $midx-$checksum.bitmap && + test_path_is_file $midx-$checksum.rev && + + test_commit repack-3 && + GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx && + + test_path_is_file $midx && + test_path_is_missing $midx-$checksum.bitmap && + test_path_is_missing $midx-$checksum.rev && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + test_path_is_file $midx-$(midx_checksum $objdir).rev && + + test_commit repack-4 && + GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb && + + find $objdir/pack -type f -name "multi-pack-index*" >files && + test_must_be_empty files + ) +' + test_done diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh index 5ccaa440e0..67049f7637 100755 --- a/t/t7703-repack-geometric.sh +++ b/t/t7703-repack-geometric.sh @@ -15,7 +15,7 @@ test_expect_success '--geometric with no packs' ' ( cd geometric && - git repack --geometric 2 >out && + git repack --write-midx --geometric 2 >out && test_i18ngrep "Nothing new to pack" out ) ' From patchwork Wed Sep 29 01:55:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12524407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02C5AC433FE for ; Wed, 29 Sep 2021 01:55:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D68EB60EFD for ; Wed, 29 Sep 2021 01:55:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243675AbhI2B5D (ORCPT ); Tue, 28 Sep 2021 
21:57:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243660AbhI2B5C (ORCPT ); Tue, 28 Sep 2021 21:57:02 -0400 Received: from mail-io1-xd35.google.com (mail-io1-xd35.google.com [IPv6:2607:f8b0:4864:20::d35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8DBAEC06161C for ; Tue, 28 Sep 2021 18:55:22 -0700 (PDT) Received: by mail-io1-xd35.google.com with SMTP id d18so1082034iof.13 for ; Tue, 28 Sep 2021 18:55:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=FJ7sX9cvQ2kw392nHDHtd+ACZNHTdlrFq2cEi8Bt7Qc=; b=h7+GG9ZtBTySFVIumtYoqvHsdMyZyRt6jvSq17ifY8m4b2qZWulYP1/MLnF6/WAQWh 8JLmDzR8dz9sN256N6Ou6ivniT5YufSrkdDyw6eZn/x4/XTaH+WMgzZCx3CRTaZegKUT Ie8ejXwGWzMLiwPBpeNXmepl+437UD5rvUgXxScGCxOOf4rOAB0j3Sb6EOiy5nbGr7Fz KC56MneZHtZvLtHCzPlNHUMEciGaZcUBCXDPGiO66oDb3hY8KPNSzVBLnR11eG8E/sfi WFT1CjLuXZxPqe680S50+4GFUMhRak8hdpMRJch+pmiaacF4rKk2NWvUZl3X7cYu6Yb/ iPjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FJ7sX9cvQ2kw392nHDHtd+ACZNHTdlrFq2cEi8Bt7Qc=; b=A80tqxUtUNJ+FOIiH6Qylpb6SX3zO2GMV9xuFDUdARifDj4cGdOOmquosezPvt0tCW w57ZhvVE/TKA+kH9aFtk4SmnJmF0hcp/C8I7tj6r82zmVoLQQWS5FnQ1rFES03pqOO5H IXhJJi0ICmbWIKuwuAqcGz3GRfenNkyiQLLaQF23ReLeYXbd6ajyb7ov5ee4c3xKO88e 2FedTX9V53s5EZkahqTW36/DrAO0kxW0+cjGGmU+m3wEklcdhjrYEadUt4af1SR8MpAC ydeYLEYyYpQJyekdYkKFB3bnqvCB4Ds7CiciL/INDEXYshhythacY9Gg4FFvKC7Buq2C fYLA== X-Gm-Message-State: AOAM531cvxUH9kL0O7StAFMpU7UA68q72r3eQJKP61QEb59BKjkC0Ccw gWq2wcGKX/jw39vzn99bQqfgRZAXI/Hm2g== X-Google-Smtp-Source: ABdhPJwNLuZxA7xtrgfXXStySUpIVpp1I3iwDlisuXMUK/sw96ezD+XWADl4B6vTnxPRd0EVy5GmVw== X-Received: by 
2002:a5e:d618:: with SMTP id w24mr6135314iom.178.1632880521840; Tue, 28 Sep 2021 18:55:21 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id q17sm562435iod.51.2021.09.28.18.55.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Sep 2021 18:55:21 -0700 (PDT) Date: Tue, 28 Sep 2021 21:55:20 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com Subject: [PATCH v3 8/9] builtin/repack.c: make largest pack preferred Message-ID: <3596c76daf095dc997c3d322ed96875efe9348a7.1632880469.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When repacking into a geometric series and writing a multi-pack bitmap, it is beneficial to have the largest resulting pack be the preferred object source in the bitmap's MIDX, since selecting the large packs can lead to fewer broken delta chains and better compression. Teach 'git repack' to identify this pack and pass it to the MIDX write machinery in order to mark it as preferred. Signed-off-by: Taylor Blau --- Documentation/git-repack.txt | 4 ++++ builtin/repack.c | 27 ++++++++++++++++++++++++++- pack-bitmap.c | 2 +- pack-bitmap.h | 1 + t/helper/test-read-midx.c | 25 ++++++++++++++++++++++++- t/t7703-repack-geometric.sh | 22 ++++++++++++++++++++++ 6 files changed, 78 insertions(+), 3 deletions(-) diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 0f2d235ca5..7183fb498f 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -190,6 +190,10 @@ this "roll-up", without respect to their reachability. This is subject to change in the future. This option (implying a drastically different repack mode) is not guaranteed to work with all other combinations of option to `git repack`. 
++ +When writing a multi-pack bitmap, `git repack` selects the largest resulting +pack as the preferred pack for object selection by the MIDX (see +linkgit:git-multi-pack-index[1]). -m:: --write-midx:: diff --git a/builtin/repack.c b/builtin/repack.c index abb30f89e8..1577f0d59f 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -423,6 +423,25 @@ static void split_pack_geometry(struct pack_geometry *geometry, int factor) geometry->split = split; } +static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry) +{ + if (!geometry) { + /* + * No geometry means either an all-into-one repack (in which + * case there is only one pack left and it is the largest) or an + * incremental one. + * + * If repacking incrementally, then we could check the size of + * all packs to determine which should be preferred, but leave + * this for later. + */ + return NULL; + } + if (geometry->split == geometry->pack_nr) + return NULL; + return geometry->pack[geometry->pack_nr - 1]; +} + static void clear_pack_geometry(struct pack_geometry *geometry) { if (!geometry) @@ -468,10 +487,12 @@ static void midx_included_packs(struct string_list *include, } static int write_midx_included_packs(struct string_list *include, + struct pack_geometry *geometry, int show_progress, int write_bitmaps) { struct child_process cmd = CHILD_PROCESS_INIT; struct string_list_item *item; + struct packed_git *largest = get_largest_active_pack(geometry); FILE *in; int ret; @@ -492,6 +513,10 @@ static int write_midx_included_packs(struct string_list *include, if (write_bitmaps) strvec_push(&cmd.args, "--bitmap"); + if (largest) + strvec_pushf(&cmd.args, "--preferred-pack=%s", + pack_basename(largest)); + ret = start_command(&cmd); if (ret) return ret; @@ -783,7 +808,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) midx_included_packs(&include, &existing_nonkept_packs, &existing_kept_packs, &names, geometry); - ret = write_midx_included_packs(&include, + ret = 
write_midx_included_packs(&include, geometry, show_progress, write_bitmaps > 0); string_list_clear(&include, 0); diff --git a/pack-bitmap.c b/pack-bitmap.c index 8504110a4d..67be9be9a6 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1418,7 +1418,7 @@ static int try_partial_reuse(struct packed_git *pack, return 0; } -static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git) +uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git) { struct multi_pack_index *m = bitmap_git->midx; if (!m) diff --git a/pack-bitmap.h b/pack-bitmap.h index 469090bad2..7d407c5a4c 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -55,6 +55,7 @@ int test_bitmap_commits(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, struct list_objects_filter_options *filter, int filter_provided_objects); +uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); int reuse_partial_packfile_from_bitmap(struct bitmap_index *, struct packed_git **packfile, uint32_t *entries, diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index cb0d27049a..0038559129 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -3,6 +3,7 @@ #include "midx.h" #include "repository.h" #include "object-store.h" +#include "pack-bitmap.h" static int read_midx_file(const char *object_dir, int show_objects) { @@ -72,14 +73,36 @@ static int read_midx_checksum(const char *object_dir) return 0; } +static int read_midx_preferred_pack(const char *object_dir) +{ + struct multi_pack_index *midx = NULL; + struct bitmap_index *bitmap = NULL; + + setup_git_directory(); + + midx = load_multi_pack_index(object_dir, 1); + if (!midx) + return 1; + + bitmap = prepare_bitmap_git(the_repository); + if (!(bitmap && bitmap_is_midx(bitmap))) + return 1; + + + printf("%s\n", midx->pack_names[midx_preferred_pack(bitmap)]); + return 0; +} + int cmd__read_midx(int argc, const char **argv) { if (!(argc == 2 || argc == 3)) - usage("read-midx 
[--show-objects|--checksum] "); + usage("read-midx [--show-objects|--checksum|--preferred-pack] "); if (!strcmp(argv[1], "--show-objects")) return read_midx_file(argv[2], 1); else if (!strcmp(argv[1], "--checksum")) return read_midx_checksum(argv[2]); + else if (!strcmp(argv[1], "--preferred-pack")) + return read_midx_preferred_pack(argv[2]); return read_midx_file(argv[1], 0); } diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh index 67049f7637..bdbbcbf1ec 100755 --- a/t/t7703-repack-geometric.sh +++ b/t/t7703-repack-geometric.sh @@ -180,4 +180,26 @@ test_expect_success '--geometric ignores kept packs' ' ) ' +test_expect_success '--geometric chooses largest MIDX preferred pack' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + # These packs already form a geometric progression. + test_commit_bulk --start=1 1 && # 3 objects + test_commit_bulk --start=2 2 && # 6 objects + ls $objdir/pack/pack-*.idx >before && + test_commit_bulk --start=4 4 && # 12 objects + ls $objdir/pack/pack-*.idx >after && + + git repack --geometric 2 -dbm && + + comm -3 before after | xargs -n 1 basename >expect && + test-tool read-midx --preferred-pack $objdir >actual && + + test_cmp expect actual + ) +' + test_done From patchwork Wed Sep 29 01:55:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12524409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59C5EC433F5 for ; Wed, 29 Sep 2021 01:55:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 411CA61352 for ; Wed, 29 Sep 2021 01:55:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id 
S243702AbhI2B5K (ORCPT ); Tue, 28 Sep 2021 21:57:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243696AbhI2B5I (ORCPT ); Tue, 28 Sep 2021 21:57:08 -0400 Received: from mail-il1-x132.google.com (mail-il1-x132.google.com [IPv6:2607:f8b0:4864:20::132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D1EFC061766 for ; Tue, 28 Sep 2021 18:55:25 -0700 (PDT) Received: by mail-il1-x132.google.com with SMTP id a11so1096687ilk.9 for ; Tue, 28 Sep 2021 18:55:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=GVAEaDvJESveSFFWdD9hElxuP8CI9v2+PMwEro7P404=; b=HtYDIAkT/qox0aKRxzMmX7588P96IJ6Th9y9AK5cgBaEZ+fHEkut7w28dy391z1YSu dFxLKZHuCAI/gID+m6Qdh3Xdu904ZYiUpJ7F84HyrG/xwwkxB4c1hEVTOtxrTc2iwpU/ ZA7ll7N8ldDPRtEhCYH4kykSnPG9URlfhVqOWjTsHUnB5NExHK+tdMPYq5LS5Zg8HLT5 6TK+uomO4nFkESC1cjpet+b6hq1ZRrAE/VpqAsFcnY8dNQro+eUPbnj28n5BrsNqq2A7 V3OyF6jZSpMsxCIG4N1v7eZwNu8rgbfS54SeitlVDSTspFLJGs9WPkT7lCTFlkchMBzs ZbOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=GVAEaDvJESveSFFWdD9hElxuP8CI9v2+PMwEro7P404=; b=quME+dtbX6SPfY6/G9nebT3hQsadlEm/8D2BX2jLicUd5t8wwi//Xcsj2TZk5oFmu8 xLjKd3CfBJSgESQuC5629VPbnQL4bGemtih8gxu309YAcr/yVuSvAbjh3YAeX3mPggG9 4eLLHPb4LS/nwvGsbx4GnsKwRz++34M/WtUPZkkAHtlvm+tiITTprJeL1TTsd54/55Sv k+bQ2e1/xP6zI/4ylqYvZop3UMFycBUoiv2NzjVDje+X0a4kUm4mrG+Y/zEqbheaWgH4 CSO7CWiqOhGFx2SRfGX+Qpa5RU5wVaOGnAP7BXaOwBmJ3RnuCKFSQCgW4eBhcjlqGi1e CWTw== X-Gm-Message-State: AOAM531uzHyJ0TqP8R4BVvwgAmNWHbGV1Ku99VuQOoePuqXUAnAYvYYw sIC5KHYaX/XNOoCVUqsKvc2cJ9A1rmOGlA== X-Google-Smtp-Source: 
ABdhPJwIOGwWP4hKbemijyq77uq+LFLNlXYtCxjhqvsbYqhmJqpGJDpKoi3Y6HD+tlZMa1DuzEZOyQ== X-Received: by 2002:a05:6e02:1e0c:: with SMTP id g12mr6325930ila.155.1632880524451; Tue, 28 Sep 2021 18:55:24 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id b14sm504760ilc.63.2021.09.28.18.55.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Sep 2021 18:55:24 -0700 (PDT) Date: Tue, 28 Sep 2021 21:55:23 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com, jonathantanmy@google.com, steadmon@google.com Subject: [PATCH v3 9/9] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To prevent the race described in an earlier patch, generate and pass a reference snapshot to the multi-pack bitmap code, if we are writing one from `git repack`. This patch is mostly limited to creating a temporary file and then calling for_each_ref(), except that we try to minimize duplicates, since doing so can drastically reduce the size in network-of-forks style repositories. In the kernel's fork network (the repository containing all objects from the kernel and all its forks), deduplicating the references drops the snapshot size from 934 MB to just 12 MB. But since we're handling duplicates in this way, we have to make sure that we visit preferred references (those listed in pack.preferBitmapTips) before non-preferred ones (to avoid recording an object which is pointed at by a preferred tip as non-preferred). We accomplish this by doing separate passes over the references: first visiting each prefix in pack.preferBitmapTips, and then visiting the rest of the references.
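The ordering argument above can be illustrated with a small standalone sketch. It is not the code from this patch: `struct ref`, `struct seen_set`, `seen_insert()` and `snapshot_refs()` are hypothetical stand-ins (the patch itself uses an oidset, for_each_ref_in() and for_each_ref()), the sketch takes a single preferred prefix instead of a list, and it skips the commit-type check. It only shows why the preferred pass has to run first when duplicates are dropped.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical stand-in for an oidset: a small list of already-seen IDs. */
struct seen_set {
	const char *ids[MAX_SEEN];
	size_t nr;
};

/* Hypothetical ref record: a ref name and the hex ID it points at. */
struct ref {
	const char *name;
	const char *id;
};

/* Returns 1 if id was already present, 0 if it was newly inserted. */
static int seen_insert(struct seen_set *set, const char *id)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (!strcmp(set->ids[i], id))
			return 1;
	if (set->nr < MAX_SEEN)
		set->ids[set->nr++] = id;
	return 0;
}

/*
 * Write one line per distinct ID into out, "+"-prefixed when the ID was
 * first reached through a ref under preferred_prefix. Pass 0 visits only
 * preferred refs; pass 1 visits every ref and skips IDs already written.
 * out must have room for at least one byte.
 */
static void snapshot_refs(const struct ref *refs, size_t nr,
			  const char *preferred_prefix,
			  char *out, size_t outsz)
{
	struct seen_set seen = { { NULL }, 0 };
	size_t i, len = 0;
	int pass;

	out[0] = '\0';
	for (pass = 0; pass < 2; pass++) {
		int preferred = (pass == 0);

		for (i = 0; i < nr; i++) {
			if (preferred &&
			    (!preferred_prefix ||
			     strncmp(refs[i].name, preferred_prefix,
				     strlen(preferred_prefix))))
				continue; /* not a preferred ref */
			if (seen_insert(&seen, refs[i].id))
				continue; /* already written */
			if (len < outsz)
				len += snprintf(out + len, outsz - len, "%s%s\n",
						preferred ? "+" : "", refs[i].id);
		}
	}
}
```

If the two passes were swapped, an ID shared by a preferred and a non-preferred ref would be written without the "+" marker and then skipped as a duplicate in the preferred pass, which is the case the commit message warns about.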
Signed-off-by: Taylor Blau --- builtin/repack.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/builtin/repack.c b/builtin/repack.c index 1577f0d59f..5cc0dff77c 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -15,6 +15,8 @@ #include "promisor-remote.h" #include "shallow.h" #include "pack.h" +#include "pack-bitmap.h" +#include "refs.h" static int delta_base_offset = 1; static int pack_kept_objects = -1; @@ -453,6 +455,65 @@ static void clear_pack_geometry(struct pack_geometry *geometry) geometry->split = 0; } +struct midx_snapshot_ref_data { + struct tempfile *f; + struct oidset seen; + int preferred; +}; + +static int midx_snapshot_ref_one(const char *refname, + const struct object_id *oid, + int flag, void *_data) +{ + struct midx_snapshot_ref_data *data = _data; + struct object_id peeled; + + if (!peel_iterated_oid(oid, &peeled)) + oid = &peeled; + + if (oidset_insert(&data->seen, oid)) + return 0; /* already seen */ + + if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT) + return 0; + + fprintf(data->f->fp, "%s%s\n", data->preferred ? 
"+" : "", + oid_to_hex(oid)); + + return 0; +} + +static void midx_snapshot_refs(struct tempfile *f) +{ + struct midx_snapshot_ref_data data; + const struct string_list *preferred = bitmap_preferred_tips(the_repository); + + data.f = f; + oidset_init(&data.seen, 0); + + if (!fdopen_tempfile(f, "w")) + die(_("could not open tempfile %s for writing"), + get_tempfile_path(f)); + + if (preferred) { + struct string_list_item *item; + + data.preferred = 1; + for_each_string_list_item(item, preferred) + for_each_ref_in(item->string, midx_snapshot_ref_one, &data); + data.preferred = 0; + } + + for_each_ref(midx_snapshot_ref_one, &data); + + if (close_tempfile_gently(f)) { + int save_errno = errno; + delete_tempfile(&f); + errno = save_errno; + die_errno(_("could not close refs snapshot tempfile")); + } +} + static void midx_included_packs(struct string_list *include, struct string_list *existing_nonkept_packs, struct string_list *existing_kept_packs, @@ -488,6 +549,7 @@ static void midx_included_packs(struct string_list *include, static int write_midx_included_packs(struct string_list *include, struct pack_geometry *geometry, + const char *refs_snapshot, int show_progress, int write_bitmaps) { struct child_process cmd = CHILD_PROCESS_INIT; @@ -517,6 +579,9 @@ static int write_midx_included_packs(struct string_list *include, strvec_pushf(&cmd.args, "--preferred-pack=%s", pack_basename(largest)); + if (refs_snapshot) + strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot); + ret = start_command(&cmd); if (ret) return ret; @@ -539,6 +604,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) struct string_list existing_kept_packs = STRING_LIST_INIT_DUP; struct pack_geometry *geometry = NULL; struct strbuf line = STRBUF_INIT; + struct tempfile *refs_snapshot = NULL; int i, ext, ret; FILE *out; int show_progress = isatty(2); @@ -627,6 +693,18 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (write_bitmaps && !(pack_everything & 
ALL_INTO_ONE) && !write_midx) die(_(incremental_bitmap_conflict_error)); + if (write_midx && write_bitmaps) { + struct strbuf path = STRBUF_INIT; + + strbuf_addf(&path, "%s/%s_XXXXXX", get_object_directory(), + "bitmap-ref-tips"); + + refs_snapshot = xmks_tempfile(path.buf); + midx_snapshot_refs(refs_snapshot); + + strbuf_release(&path); + } + if (geometric_factor) { if (pack_everything) die(_("--geometric is incompatible with -A, -a")); @@ -809,6 +887,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) &existing_kept_packs, &names, geometry); ret = write_midx_included_packs(&include, geometry, + refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL, show_progress, write_bitmaps > 0); string_list_clear(&include, 0);
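
As written by midx_snapshot_ref_one() above, the snapshot is a plain text file with one hex object ID per line, and a leading "+" on lines that came from a preferred tip. The following is a minimal sketch of how one such line could be split apart. `parse_snapshot_line()` is a hypothetical helper invented for illustration, not the reader used by `git multi-pack-index`, and it only checks that the ID consists of hex digits, not that it has the length of a real object ID.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: split one snapshot line into its "preferred" flag
 * and its hex object ID. Returns 0 on success and -1 if the line is empty,
 * too long for the hex buffer, or contains non-hex characters.
 */
static int parse_snapshot_line(const char *line, int *preferred,
			       char *hex, size_t hexsz)
{
	size_t len, i;

	/* A leading '+' marks a preferred tip. */
	*preferred = (*line == '+');
	if (*preferred)
		line++;

	/* The ID runs up to the newline (or the end of the string). */
	len = strcspn(line, "\n");
	if (!len || len >= hexsz)
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;

	memcpy(hex, line, len);
	hex[len] = '\0';
	return 0;
}
```

A caller would read the file line by line and pass each line through this helper, treating a -1 return as a malformed snapshot.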