From patchwork Fri Oct 13 12:37:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13420876 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF09415E8A for ; Fri, 13 Oct 2023 12:37:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="ZyY67RQj"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="YIzmFzlT" Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC3CBA9 for ; Fri, 13 Oct 2023 05:37:43 -0700 (PDT) Received: from compute7.internal (compute7.nyi.internal [10.202.2.48]) by mailout.west.internal (Postfix) with ESMTP id 5A11F32009FB; Fri, 13 Oct 2023 08:37:40 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute7.internal (MEProxy); Fri, 13 Oct 2023 08:37:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm2; t=1697200659; x=1697287059; bh=Dw yElQdLXsOdeluKZPDvx6w7ahP3Gfnt8sPcWslcKuw=; b=ZyY67RQj1ifiybXtet hHuX4l5PYXbJILR10Tv4eUnz7Ip7PaBlKqLi7HSZW8Bk0H8dtqmdhoSHxo6tCNbZ 4LqO3nspn+bMZqshp1vrNKeWzXUkmPwgbtkpyC4r4IhDZQ2vCy74elXI8z/o8/1b vU6379fKVrjevm1kJu+a11QjA6gnw2glbzFJCN6dn9P8C536ThbEpnDDgvjrt9zP TN1G8+SGp5Kl41a6U7RtSlKAZN/hpmf9lrNhPBTMgjfOHH4X9q/pHXySJ34B4jTc LbdJt6IRVgLcS0F71rgJv4SnW5V0cAVyQ+vrblZHhPRAAcotfiH8sF6k5szomMk3 IwZA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm2; t=1697200659; x=1697287059; bh=DwyElQdLXsOde luKZPDvx6w7ahP3Gfnt8sPcWslcKuw=; b=YIzmFzlTzXbQDRtV42Jv8JVQs5Ndc SqP3WBXmzg59pniCfriMMWrnqaE1Tbbww0yT9mZRnvRLKYvBMggHNtUSGv05oKel mjraFaN0pua4mRdLjPlhJI5uA6D4Jporm2NVw6/0uaHMK7lzC2oazqf1dsXA/t7V mJPfBpxtO8Hmp7Z4+x3vIW3R9aIvPTzxu0IlShXRd9n+dpSEQ2M0olXRfJKos5Og WoafX58bhdVhrm6B6Fa/8esA2abLm1ig5x9z2wMhwDfTD8+wFjuMBBaKiSpE90O2 y6cZIpnoWrGuWo6lR2ySbownjMJ/2R+Vd2GUB67XDiKaSgzEUkph4XGLA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrieefgdefudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvvefukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefrrghtrhhi tghkucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrh hnpeeukedtvedtffevleejtefgheehieegkeeluddvfeefgeehgfeltddtheejleffteen ucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehpshesph hkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 13 Oct 2023 08:37:38 -0400 (EDT) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 830fe830 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Fri, 13 Oct 2023 12:37:36 +0000 (UTC) Date: Fri, 13 Oct 2023 14:37:35 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Karthik Nayak , Junio C Hamano , Taylor Blau Subject: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. Signed-off-by: Patrick Steinhardt --- commit.c | 7 ++++++- t/t5318-commit-graph.sh | 23 +++++++++++++++++++++++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/commit.c b/commit.c index b3223478bc..109e9217e3 100644 --- a/commit.c +++ b/commit.c @@ -572,8 +572,13 @@ int repo_parse_commit_internal(struct repository *r, return -1; if (item->object.parsed) return 0; - if (use_commit_graph && parse_commit_in_graph(r, item)) + if (use_commit_graph && parse_commit_in_graph(r, item)) { + if (!has_object(r, &item->object.oid, 0)) + return quiet_on_missing ? -1 : + error(_("commit %s exists in commit-graph but not in the object database"), + oid_to_hex(&item->object.oid)); return 0; + } if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0) return quiet_on_missing ? -1 : diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index ba65f17dd9..25f8e9e2d3 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -821,4 +821,27 @@ test_expect_success 'overflow during generation version upgrade' ' ) ' +test_expect_success 'commit exists in commit-graph but not in object database' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + test_commit A && + test_commit B && + test_commit C && + git commit-graph write --reachable && + + # Corrupt the repository by deleting the intermittent commit + # object. Commands should notice that this object is absent and + # thus that the repository is corrupt even if the commit graph + # exists. + oid=$(git rev-parse B) && + rm .git/objects/"$(test_oid_to_path "$oid")" && + + test_must_fail git rev-parse HEAD~2 2>error && + grep "error: commit $oid exists in commit-graph but not in the object database" error + ) +' + test_done