From patchwork Thu Oct 8 15:04:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Jean-No=C3=ABl_Avila_via_GitGitGadget?=
X-Patchwork-Id: 11823153
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-14.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,
	MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2B530C4363A
	for ; Thu, 8 Oct 2020 15:04:46 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CB9A620708
	for ; Thu, 8 Oct 2020 15:04:45 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="WeK7703U"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1730786AbgJHPEo (ORCPT );
	Thu, 8 Oct 2020 11:04:44 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33864
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1730685AbgJHPEo (ORCPT );
	Thu, 8 Oct 2020 11:04:44 -0400
Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 92ED8C061755
	for ; Thu, 8 Oct 2020 08:04:43 -0700 (PDT)
Received: by mail-wr1-x442.google.com with SMTP id j2so6989063wrx.7
	for ; Thu, 08 Oct 2020 08:04:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=message-id:in-reply-to:references:from:date:subject:fcc
	:content-transfer-encoding:mime-version:to:cc;
	bh=Ztt5wnGLz1vFJFZbnZUPfP+QTr0XxJhBMhV50fj/N1o=;
	b=WeK7703UDMh5z17TchAuAWdG9sIMhwPCYsCaJ3FEDEk0YG0rVdOE49llOHTTw0uMeu
	oVU2MmDI7743XUfhzpIPV91duGd4EzS4fGt51Qxv1CoGSdsvAVPMoEJAuW/QzwsWFOE8
	SJQZMdsY+RDitx6y7WwqlimIvr/rxwjR/753ntDUHS17NrGlYRAFghMq7Hsu783UH/L0
	vvB3fwbKBdBdlZOc/Prrpp5mlnKCagoRfpzpwpSN9y5aiwufkWqN1delA18raGKZ0+vg
	eExVl/UZMzWNvu/dFbise1CLvCCpj62cxkDRYgk3KWfnYHgIHCjz4HbqHvTevcIH2UBF
	aMcw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:message-id:in-reply-to:references:from:date
	:subject:fcc:content-transfer-encoding:mime-version:to:cc;
	bh=Ztt5wnGLz1vFJFZbnZUPfP+QTr0XxJhBMhV50fj/N1o=;
	b=IJl1h7/EORf0e1tkhgbRBucVbqW8G0fitvtij/FJCPGViqSFs+xf9DB+EZsk+QGeFe
	NzAtR2v4iNui4j9nOYT3lsAB4Tf10hqtNwBUEoISADvS/q6ZATKuJRiqTuDANxAVOU4K
	empfqwS7eyCyHXtSWgOggCcv71+is7VfQaiSC21sTwAsSJp01w/8bA2+haP5w2viqdRM
	U3b8vrqG1UkHwOm8iKYkVoatn+nb3TZZjcZNNWePiS4MVQsUs3n+aKqHNA4lN6AYwc7G
	PgSJ3blQDP8A4iUiWh9EiQxmdzuplBXb8ESgl2sHoFXX+zxbr/LtB/k2nwZl7VxQHYnN
	xGWA==
X-Gm-Message-State: AOAM532dMdMbnxV4FHz7HoyZNzjNZsMhzqSlGIlLFpl9bkz5njnhulV1
	8UIxj0efA/DEeIe+YDJy/yMiw7yb5X0=
X-Google-Smtp-Source: ABdhPJwmYd91zKUthMQirowyalhC0w7UCw9Zep/jPGwfPgrcoBd2WQJY6QFyAMhSeZmcUJHR6fP2JQ==
X-Received: by 2002:adf:dcc7:: with SMTP id x7mr10048375wrm.203.1602169481426;
	Thu, 08 Oct 2020 08:04:41 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
	by smtp.gmail.com with ESMTPSA id x1sm217813wrl.41.2020.10.08.08.04.39
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Thu, 08 Oct 2020 08:04:40 -0700 (PDT)
Message-Id:
In-Reply-To:
References:
From: "Derrick Stolee via GitGitGadget"
Date: Thu, 08 Oct 2020 15:04:39 +0000
Subject: [PATCH v3] commit-graph: ignore duplicates when merging layers
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Taylor Blau , Derrick Stolee , Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

From: Derrick Stolee

Thomas reported [1] that a "git fetch" command was
failing with an error saying "unexpected duplicate commit id". The root
cause is that they had fetch.writeCommitGraph enabled, which generates
commit-graph chains, and this instance was merging two layers that both
contained the same commit ID.

[1] https://lore.kernel.org/git/55f8f00c-a61c-67d4-889e-a9501c596c39@virtuell-zuhause.de/

The initial assumption is that Git would not write a commit ID into a
commit-graph layer if it already exists in a lower commit-graph layer.
Somehow, this specific case did get into that situation, leading to this
error.

While unexpected, this isn't actually invalid (as long as the two layers
agree on the metadata for the commit). When we parse a commit that does
not have a graph_pos in the commit_graph_data_slab, we use binary search
in the commit-graph layers to find the commit and set graph_pos. That
position is never used again in this case. However, when we parse a
commit from the commit-graph file, we load its parents from the
commit-graph and assign graph_pos at that point. If those parents were
already parsed from the commit-graph, then nothing needs to be done.
Otherwise, this graph_pos is a valid position in the commit-graph so we
can parse the parents, when necessary.

Thus, this die() is too aggressive. The easiest thing to do would be to
ignore the duplicates.

If we only ignore the duplicates, then we will produce a commit-graph
that has identical commit IDs listed in adjacent positions. This excess
data will never be removed from the commit-graph, which could cascade
into significantly bloated file sizes.

Thankfully, we can collapse the list to erase the duplicate commit
pointers. This allows us to get the end result we want without extra
memory costs and with minimal CPU time.

Since the root cause for producing commit-graph layers with these
duplicate commits is currently unknown, it is difficult to create a test
for this scenario. For now, we must rely on testing the example data
graciously provided in [1].
My local test successfully merged layers, and 'git commit-graph verify'
passed.

Reported-by: Thomas Braun
Helped-by: Taylor Blau
Co-authored-by: Jeff King
Signed-off-by: Derrick Stolee
---
    commit-graph: ignore duplicates when merging layers

    This wasn't quite as simple as what Peff had posted, since we really
    don't want to keep duplicate commits around in the new merged layer.

    I still don't have a grasp on how this happened in the first place, but
    will keep looking.

    Thanks, -Stolee

    APOLOGIES: v2 accidentally only changed the commit message, not the
    patch contents. Please ignore v2 and go straight to v3.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-747%2Fderrickstolee%2Fcommit-graph-dup-commits-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-747/derrickstolee/commit-graph-dup-commits-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/747

Range-diff vs v2:

 1:  85f4e578b8 ! 1:  9e760f07ac commit-graph: ignore duplicates when merging layers
     @@ Commit message
      
       ## commit-graph.c ##
      @@ commit-graph.c: static int commit_compare(const void *_a, const void *_b)
     + 
       static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
       {
     - 	uint32_t i;
     -+	struct packed_commit_list deduped_commits = { NULL, 0, 0 };
     +-	uint32_t i;
     ++	uint32_t i, dedup_i = 0;
       
       	if (ctx->report_progress)
       		ctx->progress = start_delayed_progress(
     @@ commit-graph.c: static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
     - 					ctx->commits.nr);
     - 
     - 	QSORT(ctx->commits.list, ctx->commits.nr, commit_compare);
     -+	deduped_commits.alloc = ctx->commits.nr;
     -+	ALLOC_ARRAY(deduped_commits.list, deduped_commits.alloc);
     - 
     - 	ctx->num_extra_edges = 0;
     - 	for (i = 0; i < ctx->commits.nr; i++) {
     -@@ commit-graph.c: static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
       
       		if (i && oideq(&ctx->commits.list[i - 1]->object.oid,
       			  &ctx->commits.list[i]->object.oid)) {
      @@ commit-graph.c: static void sort_and_scan_merged_commits(struct write_commit_gra
       		} else {
       			unsigned int num_parents;
       
     -+			deduped_commits.list[deduped_commits.nr] = ctx->commits.list[i];
     -+			deduped_commits.nr++;
     ++			ctx->commits.list[dedup_i] = ctx->commits.list[i];
     ++			dedup_i++;
      +
       			num_parents = commit_list_count(ctx->commits.list[i]->parents);
       			if (num_parents > 2)
      @@ commit-graph.c: static void sort_and_scan_merged_commits(struct write_commit_gra
       		}
       	}
       
     -+	free(ctx->commits.list);
     -+	ctx->commits.list = deduped_commits.list;
     -+	ctx->commits.nr = deduped_commits.nr;
     -+	ctx->commits.alloc = deduped_commits.alloc;
     ++	ctx->commits.nr = dedup_i;
      +
       	stop_progress(&ctx->progress);
       }

 commit-graph.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

base-commit: d98273ba77e1ab9ec755576bc86c716a97bf59d7

diff --git a/commit-graph.c b/commit-graph.c
index cb042bdba8..0280dcb2ce 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2008,7 +2008,7 @@ static int commit_compare(const void *_a, const void *_b)
 
 static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
 {
-	uint32_t i;
+	uint32_t i, dedup_i = 0;
 
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
@@ -2023,17 +2023,27 @@ static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
 
 		if (i && oideq(&ctx->commits.list[i - 1]->object.oid,
 			  &ctx->commits.list[i]->object.oid)) {
-			die(_("unexpected duplicate commit id %s"),
-			    oid_to_hex(&ctx->commits.list[i]->object.oid));
+			/*
+			 * Silently ignore duplicates. These were likely
+			 * created due to a commit appearing in multiple
+			 * layers of the chain, which is unexpected but
+			 * not invalid. We should make sure there is a
+			 * unique copy in the new layer.
+			 */
 		} else {
 			unsigned int num_parents;
 
+			ctx->commits.list[dedup_i] = ctx->commits.list[i];
+			dedup_i++;
+
 			num_parents = commit_list_count(ctx->commits.list[i]->parents);
 			if (num_parents > 2)
 				ctx->num_extra_edges += num_parents - 1;
 		}
 	}
 
+	ctx->commits.nr = dedup_i;
+
 	stop_progress(&ctx->progress);
 }

From patchwork Fri Oct 9 20:53:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Jean-No=C3=ABl_Avila_via_GitGitGadget?=
X-Patchwork-Id: 11829669
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
	SPF_HELO_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D1A56C433DF
	for ; Fri, 9 Oct 2020 20:53:59 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 7025622261
	for ; Fri, 9 Oct 2020 20:53:59 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="N3KKtjBW"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2390988AbgJIUx6 (ORCPT );
	Fri, 9 Oct 2020 16:53:58 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56922
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S2390960AbgJIUx5 (ORCPT );
	Fri, 9 Oct 2020 16:53:57 -0400
Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14AC3C0613D5
	for ; Fri, 9 Oct 2020 13:53:57 -0700 (PDT)
Received: by mail-wm1-x342.google.com with SMTP id a72so490329wme.5
	for ; Fri, 09 Oct 2020 13:53:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=message-id:in-reply-to:references:from:date:subject:fcc
	:content-transfer-encoding:mime-version:to:cc;
	bh=BeZky0ifivjf2ThY4p57yacIS4TlpNjuiehrhMNG82Q=;
	b=N3KKtjBWF5nt5eFL78DvsD8kS30s3Y7gigIEatG3R4kPNOM3n09fOFQfJ1synY5gvH
	BYPam+vkTkS2taOZeyOjiIRK+82++FUg5c2wKxmgf79UqCqJkAEpUODn5HLNBMipCJvx
	M3od7QkfE9cSZA2EWa5/nq2P9L7UT8fdBuEQJThsK2RZMu2NbJp22gklRFTM3qBjoE6d
	cFCbcxLqEbwYxxCzEauk1kHWbAMrqI2F0lPr2RbpqCmTFXrG4dtsO9GGErXX/WogXwfX
	tHB1mj8vyWFb5arzTjWQcLEU65Q6rLOwWbLUgLRT0viCWL1xiLziEniL9LWaC7a5XFBI
	um/Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:message-id:in-reply-to:references:from:date
	:subject:fcc:content-transfer-encoding:mime-version:to:cc;
	bh=BeZky0ifivjf2ThY4p57yacIS4TlpNjuiehrhMNG82Q=;
	b=KAArOc/zODo+uRbRmyigP+Aa8uTZeIksSzlN3E80j24M2Nwz9PHWZbXLhmLH0LUckD
	0xwmfP2vyGCRQns8v9RC61VqaPI51UZp4jOsYXzdduSamApB5NCWyOINXQIEQCH57NG+
	ddHPhcox8irQ7sSLehEQN2NwtJYr90W2J+Kt8TOp7n/Xi3C9V7UM/qWvajhk7Mi7UBHR
	yZmGkxlbUTP4Kmdfik6gKul1ZsiuWEJck3JcZTeyr2/m2+g3M1xJ8IL51xW+xTftuew8
	hHS7duw9MS7qfngKUPtsW2zenvRai307SUbkZvFcKz27Banmtf1aoaDkFCQHImrAPXF/
	/vug==
X-Gm-Message-State: AOAM531q0chQbUFdNUerlx7z5rLGpqM7/rEoL9S8QU20bWZ6l9+f00O0
	2kiwl2T8mddrMm4JUgplSdZ10s5FoYw=
X-Google-Smtp-Source: ABdhPJyvYsEiDobfjEDVdED5XHYWEwOWLxYUHowwdG1VDoiyvr9f45VC4rLxyACsvqdUo+MQ4TLpoA==
X-Received: by 2002:a1c:bd43:: with SMTP id n64mr15252044wmf.113.1602276835582;
	Fri, 09 Oct 2020 13:53:55 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
	by smtp.gmail.com with ESMTPSA id 30sm3746327wrs.84.2020.10.09.13.53.54
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Fri, 09 Oct 2020 13:53:55 -0700 (PDT)
Message-Id: <4439e8ae8fdc9abf28df29d3038a1483d9084cf2.1602276832.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
From: "Derrick Stolee via GitGitGadget"
Date: Fri, 09 Oct 2020
	20:53:52 +0000
Subject: [PATCH v4 2/2] commit-graph: don't write commit-graph when disabled
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Taylor Blau , Derrick Stolee , Jeff King , Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
X-Original-From: Derrick Stolee

From: Derrick Stolee

The core.commitGraph config setting can be set to 'false' to prevent
parsing commits from the commit-graph file(s). This causes an issue when
trying to write with "--split", which needs to distinguish between
commits that are in the existing commit-graph layers and commits that
are not. The existing mechanism uses parse_commit() and then checks
whether there is a 'graph_pos' that shows the commit was parsed from
the commit-graph file.

When core.commitGraph=false, we do not parse the commits from the
commit-graph and 'graph_pos' indicates that no commits are in the
existing file. The --split logic moves forward creating a new layer on
top that holds all reachable commits, then possibly merges down into
those layers, resulting in duplicate commits. The previous change makes
that merging process more robust to such a situation in case it happens
in the written commit-graph data.

The easy answer here is to avoid writing a commit-graph if reading the
commit-graph is disabled, since the resulting commit-graph would not be
read by subsequent Git processes. This is more natural than forcing
core.commitGraph to be true for the 'write' process.

Reported-by: Thomas Braun
Helped-by: Jeff King
Helped-by: Taylor Blau
Signed-off-by: Derrick Stolee
---
 Documentation/git-commit-graph.txt | 4 +++-
 commit-graph.c                     | 5 +++++
 t/t5324-split-commit-graph.sh      | 3 ++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index de6b6de230..e1f48c95b3 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -39,7 +39,9 @@ COMMANDS
 --------
 'write'::
 
-Write a commit-graph file based on the commits found in packfiles.
+Write a commit-graph file based on the commits found in packfiles. If
+the config option `core.commitGraph` is disabled, then this command will
+output a warning, then return success without writing a commit-graph file.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
diff --git a/commit-graph.c b/commit-graph.c
index 0280dcb2ce..6f62a07313 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2160,6 +2160,11 @@ int write_commit_graph(struct object_directory *odb,
 	int replace = 0;
 	struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
 
+	prepare_repo_settings(the_repository);
+	if (!the_repository->settings.core_commit_graph) {
+		warning(_("attempting to write a commit-graph, but 'core.commitGraph' is disabled"));
+		return 0;
+	}
 	if (!commit_graph_compatible(the_repository))
 		return 0;
 
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index a314ce0368..4d3842b83b 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -442,8 +442,9 @@ test_expect_success '--split=replace with partial Bloom data' '
 
 test_expect_success 'prevent regression for duplicate commits across layers' '
 	git init dup &&
-	git -C dup config core.commitGraph false &&
 	git -C dup commit --allow-empty -m one &&
+	git -C dup -c core.commitGraph=false commit-graph write --split=no-merge --reachable 2>err &&
+	test_i18ngrep "attempting to write a commit-graph" err &&
 	git -C dup commit-graph write --split=no-merge --reachable &&
 	git -C dup commit --allow-empty -m two &&
 	git -C dup commit-graph write --split=no-merge --reachable &&
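
For readers following the series: the in-place collapse that the first
patch adds to sort_and_scan_merged_commits() can be sketched outside of
Git roughly as below. This is only an illustration under stated
assumptions, not Git's code: 'struct demo_commit' and
collapse_sorted_duplicates() are hypothetical names, and object IDs are
plain hex strings here instead of 'struct object_id' compared with
oideq(). The list is assumed to be sorted by ID already, as it is after
QSORT() in the real function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a commit; only the object ID matters for
 * this sketch. Git's real 'struct commit' is much richer.
 */
struct demo_commit {
	char oid[41]; /* hex object ID, NUL-terminated */
};

/*
 * Collapse adjacent duplicates in a list that is already sorted by oid.
 * Surviving pointers are copied toward the front of the same array, so
 * no second allocation is needed. Returns the number of unique entries;
 * the caller would store that as the new list length.
 */
static size_t collapse_sorted_duplicates(struct demo_commit **list, size_t nr)
{
	size_t i, dedup_i = 0;

	for (i = 0; i < nr; i++) {
		/* Same ID as the previous entry: skip the duplicate. */
		if (i && !strcmp(list[i - 1]->oid, list[i]->oid))
			continue;

		/* dedup_i never exceeds i, so list[i - 1] is still intact. */
		list[dedup_i++] = list[i];
	}
	return dedup_i;
}
```

Because the write index never passes the read index, each entry is
compared against its original neighbor before anything overwrites it,
which is why the single pass needs no scratch buffer.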