From patchwork Fri Oct 25 06:43:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850117 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8596A18CBFF for ; Fri, 25 Oct 2024 06:43:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729838625; cv=none; b=EJmeCrqtQUBv8wYc/KfQ+AlcTt+VW8NZSvX/kyvdocFg0NKJw88Woy92yMDsejPasqA7rMInSTn4BdyKkiUFOD1EQTRsKTMEm/2hUVtSE1KcR+GqgvpYg5yMSyfewYNqTDdUi2SI0OP4q796PpUADZULe9chS1A3ahvEpxepMKU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729838625; c=relaxed/simple; bh=hMhxGCHIMwT3dqGHqhiJ/U7wqkZnzxJUq6qRk/MgySY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Z4NmL9kp8RtXi+VwI8EcctaeIYrCsng4k5HExPQ5fVYjo6BfwiD+lpdzLF+M6I/kSMDimkq4kLjnxCwUUQTUFlHe77aZlAqNfRd1/Un58KR1Xzy+fJJiGyk3SKcX/Uyu1Yhm4qRyMW9lU4Wa2z227C2kUxCdiSl/Ac7R74eyI84= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=B+RELIaQ; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="B+RELIaQ" Received: (qmail 346 invoked by uid 109); 25 Oct 2024 06:43:42 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=hMhxGCHIMwT3dqGHqhiJ/U7wqkZnzxJUq6qRk/MgySY=; b=B+RELIaQwH4oWERhIXc5ckh6PgVBHeCxMMRrgpOr8mtekcfQfYoUHgm3pdkv8u2+aECzMggIZOm9zRFgz1FKry9CaBUSBEmGa6rVNNvesaxCIywR4JaJtPkR2OjVyQHls1u5uNITXlmMNO7yBgQdVpawGTInD9ZMZFWaXJLXOe/iMLXO6o3gKFQOCke+8Qz+nym9FJ2e8viTp0gMfSYJicQt9d8zcyiw0Dl4QoJrM1Jhi/GthbrzVvKh7MSWMmuIsrLfmZPqiJzZ5vaIs+U5isrsdnplf2vBO2tEcIVSPPOproYFHGM9j0SirLaKSkVaDy34pJ7CYiQ9BFrKT5+alg== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 06:43:41 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12661 invoked by uid 111); 25 Oct 2024 06:43:41 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 02:43:41 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 02:43:40 -0400 From: Jeff King To: Taylor Blau Cc: Derrick Stolee , fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 01/11] midx: avoid duplicate packed_git entries Message-ID: <20241025064340.GA2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> When we scan a pack directory to load the idx entries we find into the packed_git list, we skip any of them that are contained in a midx. We then load them later lazily if we actually need to access the corresponding pack, referencing them both from the midx struct and the packed_git list. The lazy-load in the midx code checks to see if the midx already mentions the pack, but doesn't otherwise check the packed_git list. This makes sense, since we should have added any pack to both lists. But there's a loophole! If we call close_object_store(), that frees the midx entirely, but _not_ the packed_git structs, which we must keep around for Reasons[1]. If we then try to look up more objects, we'll auto-load the midx again, which won't realize that we've already loaded those packs, and will create duplicate entries in the packed_git list. This is possibly inefficient, because it means we may open and map the pack redundantly. But it can also lead to weird user-visible behavior. The case I found is in "git repack", which closes and reopens the midx after repacking and then calls update_server_info(). We end up writing the duplicate entries into objects/info/packs. We could obviously de-dup them while writing that file, but it seems like a violation of more core assumptions that we end up with these duplicate structs at all. We can avoid the duplicates reasonably efficiently by checking their names in the pack_map hash. This annoyingly does require a little more than a straight hash lookup due to the naming conventions, but it should only happen when we are about to actually open a pack. I don't think one extra malloc will be noticeable there. [1] I'm not entirely sure of all the details, except that we generally assume the packed_git structs never go away. We noted this restriction in the comment added by 6f1e9394e2 (object: fix leaking packfiles when closing object store, 2024-08-08), but it's somewhat vague. At any rate, if you try freeing the structs in close_object_store(), you can observe segfaults all over the test suite. So it might be fixable, but it's not trivial. Signed-off-by: Jeff King --- +cc Stolee here as the original midx author. I can't think of a good reason we'd want to avoid this dup-detection here. midx.c | 20 +++++++++++++++++--- t/t5200-update-server-info.sh | 8 ++++++++ 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/midx.c b/midx.c index 67e0d64004..479812cb9b 100644 --- a/midx.c +++ b/midx.c @@ -445,6 +445,7 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; + struct strbuf key = STRBUF_INIT; struct packed_git *p; pack_int_id = midx_for_pack(&m, pack_int_id); @@ -455,16 +456,29 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir, m->pack_names[pack_int_id]); - p = add_packed_git(pack_name.buf, pack_name.len, m->local); + /* pack_map holds the ".pack" name, but we have the .idx */ + strbuf_addbuf(&key, &pack_name); + strbuf_strip_suffix(&key, ".idx"); + strbuf_addstr(&key, ".pack"); + p = hashmap_get_entry_from_hash(&r->objects->pack_map, + strhash(key.buf), key.buf, + struct packed_git, packmap_ent); + if (!p) { + p = add_packed_git(pack_name.buf, pack_name.len, m->local); + if (p) { + install_packed_git(r, p); + list_add_tail(&p->mru, &r->objects->packed_git_mru); + } + } + strbuf_release(&pack_name); + strbuf_release(&key); if (!p) return 1; p->multi_pack_index = 1; m->packs[pack_int_id] = p; - install_packed_git(r, p); - list_add_tail(&p->mru, &r->objects->packed_git_mru); return 0; } diff --git a/t/t5200-update-server-info.sh b/t/t5200-update-server-info.sh index ed9dfd624c..cc51c73986 100755 --- a/t/t5200-update-server-info.sh +++ b/t/t5200-update-server-info.sh @@ -39,4 +39,12 @@ test_expect_success 'info/refs updates when changes are made' ' ! test_cmp a b ' +test_expect_success 'midx does not create duplicate pack entries' ' + git repack -d --write-midx && + git repack -d && + grep ^P .git/objects/info/packs >packs && + uniq -d dups && + test_must_be_empty dups +' + test_done From patchwork Fri Oct 25 06:44:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850118 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0C3118C018 for ; Fri, 25 Oct 2024 06:44:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729838662; cv=none; b=rYm+omiNC0ORIlRLSCv+0RWT4LPTd4bwjhD7ISr7eCakxUItYn4gHqdd28h6YMdUumx1A9mJ7zi7YlAruUVjPr/MJBgVFj8pLj8KSHky+GWvfRUipQMKLgqYGKxXBI4S8zOPJp0c8w2RVVT84CylUchvIIChKpKQXUCdFFZofsE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729838662; c=relaxed/simple; bh=o2UyIdAUBub57pZkWLND8v82A42J//dvEbDU212UCvo=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=H3xgpNWzv9DBhocmjX1ryNduTjysDylHFli9xP8fHkpKvbo6ut+shk+OWbxiZ6D5RuLnm4zlWyA6u4at7f/7XIdcFMyfn/qt84AARoBQB07wPaihoMZUwKbD5J8mKyD8LmgDvoB/6Z4PvSrId3KukJeciREEQqxdFsxhihHVFTg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=cGDbBSYg; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="cGDbBSYg" Received: (qmail 366 invoked by uid 109); 25 Oct 2024 06:44:19 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=o2UyIdAUBub57pZkWLND8v82A42J//dvEbDU212UCvo=; b=cGDbBSYg2V5xr/xmNe/1yfhyMu9n61eb6mM90nqNgHmGXRt4OPHqkNyNNI/wQLBMzdje6B1eg970uDE1DG22wfyspkS09soPn0hddmPX34gGxt7dDdztyKD3TLRO0cOGpqPvRodm3/+qWIBKrjoe1G5oOgGOT1yq+RZ2SiYnBk0EZkka0mkIopDf0t93qNfFqb38notDXlLPXneVKRjDy6FmMr8O13sM88vRcD0TUy8aI6h/P2eImvUsTgU8rYRE3DkE4UNWNIWpiZrLdvip1/5IKrgrjUSCPn7bZVYgsyxtiO9T86GAJIUGpZw5L6CCcwHaiF2EVsytINkGchG10A== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 06:44:19 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12672 invoked by uid 111); 25 Oct 2024 06:44:18 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 02:44:18 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 02:44:17 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 02/11] t5550: count fetches in "previously-fetched .idx" test Message-ID: <20241025064417.GB2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> We have a test in t5550 that looks at index fetching over dumb http. It creates two branches, each of which is completely stored in its own pack, then fetches the branches independently. What should (and does) happen is that the first fetch grabs both .idx files and one .pack file, and then the fetch of the second branch re-uses the previously downloaded .idx files (fetching none) and grabs the now-required .pack file. Since the next few patches will be touching this area of the code, let's beef up the test a little by checking that we're downloading the expected items at each step. Signed-off-by: Jeff King --- t/t5550-http-fetch-dumb.sh | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 58189c9f7d..3968b82260 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -307,6 +307,14 @@ test_expect_success 'fetch notices corrupt idx' ' ) ' +# usage: count_fetches +count_fetches () { + # ignore grep exit code; it may return non-zero if we are expecting no + # matches + grep "GET .*objects/pack/pack-[a-z0-9]*.$2" "$3" >trace.count + test_line_count = "$1" trace.count +} + test_expect_success 'fetch can handle previously-fetched .idx files' ' git checkout --orphan branch1 && echo base >file && @@ -321,8 +329,14 @@ test_expect_success 'fetch can handle previously-fetched .idx files' ' git push "$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git branch2 && git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git repack -d && git --bare init clone_packed_branches.git && - git --git-dir=clone_packed_branches.git fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch1:branch1 && - git --git-dir=clone_packed_branches.git fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2 + GIT_TRACE_CURL=$PWD/one.trace git --git-dir=clone_packed_branches.git \ + fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch1:branch1 && + count_fetches 2 idx one.trace && + count_fetches 1 pack one.trace && + GIT_TRACE_CURL=$PWD/two.trace git --git-dir=clone_packed_branches.git \ + fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2 && + count_fetches 0 idx two.trace && + count_fetches 1 pack two.trace ' test_expect_success 'did not use upload-pack service' ' From patchwork Fri Oct 25 06:58:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850130 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A53661B0F03 for ; Fri, 25 Oct 2024 06:58:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839491; cv=none; b=n8UiQ5MJLlTBJ5SvwR9qHVHRkEyCjqJJYNp+ejoT1jtxJHpVA9em2U83OKssEYXSxd0EPvsYOmElVVH7tcvSYNF6Q+F6Ql/GfDzF8hXUBwnY01uJ5d8PjijTdAzFxLG2dIhjWoNrw3kPfCs08HBb+2VOg+s1vK8T1PZcO6bG22g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839491; c=relaxed/simple; bh=O7r3qvrXpR3keAF3q9i5XGknj11Jas45RtWCVeZdwtA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=RU+gml4ZK+CIJSrbr5MY3ovGIhE9tAK40AIcdghF4szxdjipIzWQKX5KBZri7B6dsXcS6haqTudIIJG1Im9LE1dR7PPf8lxWadwzLM8gvK9tkeQCxk6s41KRM9FrBhV3Xx4fzqdC/rfZSramLyc4jpPVEzzpDXaSpZbFQRb3NGA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=NDBnP53L; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="NDBnP53L" Received: (qmail 484 invoked by uid 109); 25 Oct 2024 06:58:07 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=O7r3qvrXpR3keAF3q9i5XGknj11Jas45RtWCVeZdwtA=; b=NDBnP53LrZEcfJrdF/iTl1RxYpXuXufQhqelOHtB9mnV3Af+yzaSjqCnN0ASMyTTTZBnnjDWVNpu5F1erZCdYZ0A9Qh/MVXLuiYjmuekPcrlX2fqEv0VGkSVNfzDjbpFJ+y1MI4jgzcKmHFx7EPcC2Fyq6a3eUrsfWAaeF7LssdoxUN7fVrLLwNIeO7DaVmG/9BSdKR352oAqjLXPDD5q5p8AKARwhQ9r9OHcCtHy0JY1+hmGz8ln6wSUGDkFPs7MDkTaclpKasew1B/oL5Mgt0lMZgU5/tSTZdxpuZ9kQVl+pCW2VbbJALbqjWvBP8EY9oIfSWuoKvYSlyVMa1vzQ== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 06:58:07 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12790 invoked by uid 111); 25 Oct 2024 06:58:06 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 02:58:06 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 02:58:06 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 03/11] dumb-http: store downloaded pack idx as tempfile Message-ID: <20241025065806.GC2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> This patch fixes a regression in b1b8dfde69 (finalize_object_file(): implement collision check, 2024-09-26) where fetching a v1 pack idx file over the dumb-http protocol would cause the fetch to fail. The core of the issue is that dumb-http stores the idx we fetch from the remote at the same path that will eventually hold the idx we generate from "index-pack --stdin". The sequence is something like this: 0. We realize we need some object X, which we don't have locally, and nor does the other side have it as a loose object. 1. We download the list of remote packs from objects/info/packs. 2. For each entry in that file, we download each pack index and store it locally in .git/objects/pack/pack-$hash.idx (the $hash is not something we can verify yet and is given to us by the remote). 3. We check each pack index we got to see if it has object X. When we find a match, we download the matching .pack file from the remote to a tempfile. We feed that to "index-pack --stdin", which reindexes the pack, rather than trusting that it has what the other side claims it does. In most cases, this will end up generating the exact same (byte-for-byte) pack index which we'll store at the same pack-$hash.idx path, because the index generation and $hash id are computed based on what's in the packfile. But: a. The other side might have used other options to generate the index. For instance we use index v2 by default, but long ago it was v1 (and you can still ask for v1 explicitly). b. The other side might even use a different mechanism to determine $hash. E.g., long ago it was based on the sorted list of objects in the packfile, but we switched to using the pack checksum in 1190a1acf8 (pack-objects: name pack files after trailer hash, 2013-12-05). The regression we saw in the real world was (3a). A recent client fetching from a server with a v1 index downloaded that index, then complained about trying to overwrite it with its own v2 index. This collision is otherwise harmless; we know we want to replace the remote version with our local one, but the collision check doesn't realize that. There are a few options to fix it: - we could teach index-pack a command-line option to ignore only pack idx collisions, and use it when the dumb-http code invokes index-pack. This would be an awkward thing to expose users to and would involve a lot of boilerplate to get the option down to the collision code. - we could delete the remote .idx file right before running index-pack. It should be redundant at that point (since we've just downloaded the matching pack). But it feels risky to delete something from our own .git/objects based on what the other side has said. I'm not entirely positive that a malicious server couldn't lie about which pack-$hash.idx it has and get us to delete something precious. - we can stop co-mingling the downloaded idx files in our local objects directory. This is a slightly bigger change but I think fixes the root of the problem more directly. This patch implements the third option. The big design questions are: where do we store the downloaded files, and how do we manage their lifetimes? There are some additional quirks to the dumb-http system we should consider. Remember that in step 2 we downloaded every pack index, but in step 3 we may only download some of the matching packs. What happens to those other idx files now? They sit in the .git/objects/pack directory, possibly waiting to be used at a later date. That may save bandwidth for a subsequent fetch, but it also creates a lot of weird corner cases: - our local object directory now has semi-untrusted .idx files sitting around, without their matching .pack - in case 3b, we noted that we might not generate the same hash as the other side. In that case even if we download the matching pack, our index-pack invocation will store it in a different pack-$hash.idx file. And the unmatched .idx will sit there forever. - if the server repacks, it may delete the old packs. Now we have these orphaned .idx files sitting around locally that will never be used (nor deleted). - if we repack locally we may delete our local version of the server's pack index and not realize we have it. So we'll download it again, even though we have all of the objects it mentions. I think the right solution here is probably some more complex cache management system: download the remote .idx files to their own storage directory, mark them as "seen" when we get their matching pack (to avoid re-downloading even if we repack), and then delete them when the server's objects/info/refs no longer mentions them. But since the dumb http protocol is so ancient and so inferior to the smart http protocol, I don't think it's worth spending a lot of time creating such a system. For this patch I'm just downloading the idx files to .git/objects/tmp_pack_*, and marking them as tempfiles to be deleted when we exit (and due to the name, any we miss due to a crash, etc, should eventually be removed by "git gc" runs based on timestamps). That is slightly worse for one case: if we download an idx but not the matching pack, we won't retain that idx for subsequent runs. But the flip side is that we're making other cases better (we never hold on to useless idx files forever). I suspect that worse case does not even come up often, since it implies that the packs are generated to match distinct parts of history (i.e., in practice even in a repo with many packs you're going to end up grabbing all of those packs to do a clone). If somebody really cares about that, I think the right path forward is a managed cache directory as above, and this patch is providing the first step in that direction anyway (by moving things out of the objects/pack/ directory). There are two test changes. One demonstrates the broken v1 index case (it double-checks the resulting clone with fsck to be careful, but prior to this patch it actually fails at the clone step). The other tweaks the expectation for a test that covers the "slightly worse" case to accommodate the extra index download. The code changes are fairly simple. We stop using finalize_object_file() to copy the remote's index file into place, and leave it as a tempfile. We give the tempfile a real ".idx" name, since the packfile code expects that, and thus we make sure it is out of the usual packs/ directory (so we'd never mistake it for a real local .idx). We also have to change parse_pack_index(), which creates a temporary packed_git to access our index (we need this because all of the pack idx code assumes we have that struct). It reads the index data from the tempfile, but prior to this patch would speculatively write the finalized name into the packed_git struct using the pack-$hash we expect to use. I was mildly surprised that this worked at all, since we call verify_pack_index() on the packed_git which mentions the final name before moving the file into place! But it works because parse_pack_index() leaves the mmap-ed data in the struct, so the lazy-open in verify_pack_index() never triggers, and we read from the tempfile, ignoring the filename in the struct completely. Hacky, but it works. After this patch, parse_pack_index() now uses the index filename we pass in to derive a matching .pack name. This is OK to change because there are only two callers, both in the dumb http code (and the other passes in an existing pack-$hash.idx name, so the derived name is going to be pack-$hash.pack, which is what we were using anyway). I'll follow up with some more cleanups in that area, but this patch is sufficient to fix the regression. Reported-by: fox Signed-off-by: Jeff King --- http.c | 25 ++++++++++++++++++++----- packfile.c | 11 ++++++++++- t/t5550-http-fetch-dumb.sh | 12 +++++++++++- 3 files changed, 41 insertions(+), 7 deletions(-) diff --git a/http.c b/http.c index d59e59f66b..9642cad2e3 100644 --- a/http.c +++ b/http.c @@ -19,6 +19,7 @@ #include "string-list.h" #include "object-file.h" #include "object-store-ll.h" +#include "tempfile.h" static struct trace_key trace_curl = TRACE_KEY_INIT(CURL); static int trace_curl_data = 1; @@ -2388,8 +2389,24 @@ static char *fetch_pack_index(unsigned char *hash, const char *base_url) strbuf_addf(&buf, "objects/pack/pack-%s.idx", hash_to_hex(hash)); url = strbuf_detach(&buf, NULL); - strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(hash)); - tmp = strbuf_detach(&buf, NULL); + /* + * Don't put this into packs/, since it's just temporary and we don't + * want to confuse it with our local .idx files. We'll generate our + * own index if we choose to download the matching packfile. + * + * It's tempting to use xmks_tempfile() here, but it's important that + * the file not exist, otherwise http_get_file() complains. So we + * create a filename that should be unique, and then just register it + * as a tempfile so that it will get cleaned up on exit. + * + * In theory we could hold on to the tempfile and delete these as soon + * as we download the matching pack, but it would take a bit of + * refactoring. Leaving them until the process ends is probably OK. + */ + tmp = xstrfmt("%s/tmp_pack_%s.idx", + repo_get_object_directory(the_repository), + hash_to_hex(hash)); + register_tempfile(tmp); if (http_get_file(url, tmp, NULL) != HTTP_OK) { error("Unable to get pack index %s", url); @@ -2427,10 +2444,8 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head, } ret = verify_pack_index(new_pack); - if (!ret) { + if (!ret) close_pack_index(new_pack); - ret = finalize_object_file(tmp_idx, sha1_pack_index_name(sha1)); - } free(tmp_idx); if (ret) return -1; diff --git a/packfile.c b/packfile.c index df4ba67719..16d3bcf7f7 100644 --- a/packfile.c +++ b/packfile.c @@ -237,13 +237,22 @@ static struct packed_git *alloc_packed_git(int extra) return p; } +static char *pack_path_from_idx(const char *idx_path) +{ + size_t len; + if (!strip_suffix(idx_path, ".idx", &len)) + BUG("idx path does not end in .idx: %s", idx_path); + return xstrfmt("%.*s.pack", (int)len, idx_path); +} + struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path) { - const char *path = sha1_pack_name(sha1); + char *path = pack_path_from_idx(idx_path); size_t alloc = st_add(strlen(path), 1); struct packed_git *p = alloc_packed_git(alloc); memcpy(p->pack_name, path, alloc); /* includes NUL */ + free(path); hashcpy(p->hash, sha1, the_repository->hash_algo); if (check_packed_git_idx(idx_path, p)) { free(p); diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 3968b82260..706540ab74 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -335,7 +335,7 @@ test_expect_success 'fetch can handle previously-fetched .idx files' ' count_fetches 1 pack one.trace && GIT_TRACE_CURL=$PWD/two.trace git --git-dir=clone_packed_branches.git \ fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2 && - count_fetches 0 idx two.trace && + count_fetches 1 idx two.trace && count_fetches 1 pack two.trace ' @@ -521,4 +521,14 @@ test_expect_success 'fetching via http alternates works' ' git -c http.followredirects=true clone "$HTTPD_URL/dumb/alt-child.git" ' +test_expect_success 'dumb http can fetch index v1' ' + server=$HTTPD_DOCUMENT_ROOT_PATH/idx-v1.git && + git init --bare "$server" && + git -C "$server" --work-tree=. commit --allow-empty -m foo && + git -C "$server" -c pack.indexVersion=1 gc && + + git clone "$HTTPD_URL/dumb/idx-v1.git" && + git -C idx-v1 fsck +' + test_done From patchwork Fri Oct 25 07:00:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850131 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D06731AF0DE for ; Fri, 25 Oct 2024 07:00:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839614; cv=none; b=vFLXNH2uvQ+QNZw0fqJBpAuSBAtaVdBASeWs1gwcf9QbbBLTW4s4QUlkdwXz0Rrm/ikpTlHll8jJ468U6q42ogYh9nr+rQgZhnkOs3VdHhkzesJPnbpy5GCwLoNvDy61S9Vq0EfPydw8v4HI1mM14onzizvxK10ArLNABfarLqY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839614; c=relaxed/simple; bh=VQjlTCqxlJxLPvpUAtD57p3Xs8gclA30Bg6ooHtuN+w=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=MXOp6U1lSB65Wi+42mVlA/cUZksqL9Hv4UzENnQaDnBCVg2lktMPYQgG4szOoU0RJaYon7SrlMfVLxzndTdpuNbt9CWtxIeHtVr2DTQmNA7n9Mun0lDXthIaK9KLv63yif7CRTvQTygYl81lMknSmzmMjdmitYGhxEve5RV1r6U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=Zji/KSDw; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="Zji/KSDw" Received: (qmail 501 invoked by uid 109); 25 Oct 2024 07:00:10 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=VQjlTCqxlJxLPvpUAtD57p3Xs8gclA30Bg6ooHtuN+w=; b=Zji/KSDwKYNyLbtWbXpPzBZ9g4idY4+uDwvX7J8oEZPxUMiIvcTpojzRZVtVZUGCxhy7/y9L5XY+Xcn+no2CfvJ+wO3cTZH56qqTJK1mszJMiqOqluU5tfkT+n29j54j2DZa5Eqtjc+atPgWNA6DmC5OhLSbnIx+O+Bqr7l4TJOhFxUli1FvjrOrduJga73W8fZO1aIibVIpzixJ/0Lw6+zkPFM8jq23CTPJQSjQr2gaPpkQ/SvEzgwpUE7ABu47yErUwi8chSDU2WZFa9aXFpiw0E9hoH6NYgErGr6Og2pWC78mKLUc985E46QsRMEZeXyxMcu88chWHh2KMuSTYw== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:00:10 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12834 invoked by uid 111); 25 Oct 2024 07:00:10 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:00:10 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:00:09 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 04/11] packfile: drop has_pack_index() Message-ID: <20241025070009.GD2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> The has_pack_index() function has several oddities that may make it surprising if you are trying to find out if we have a pack with some $hash: - it is not looking for a valid pack that we found while searching object directories. It just looks for any pack-$hash.idx file in the pack directory. - it only looks in the local directory, not any alternates - it takes a bare "unsigned char" hash, which we try to avoid these days The only caller it has is in the dumb http code; it wants to know if we already have the pack idx in question. This can happen if we downloaded the pack (and generated its index) during a previous fetch. Before the previous patch ("dumb-http: store downloaded pack idx as tempfile"), it could also happen if we downloaded the .idx from the remote but didn't get the matching .pack. But since that patch, we don't hold on to those .idx files. So there's no need to look for the .idx file in the filesystem; we can just scan through the packed_git list to see if we have it. That lets us simplify the dumb http code a bit, as we know that if we have the .idx we have the matching .pack already. And it lets us get rid of this odd function that is unlikely to be needed again. Signed-off-by: Jeff King --- http.c | 15 ++++++++------- packfile.c | 8 -------- packfile.h | 2 -- 3 files changed, 8 insertions(+), 17 deletions(-) diff --git a/http.c b/http.c index 9642cad2e3..03802b9049 100644 --- a/http.c +++ b/http.c @@ -2420,15 +2420,17 @@ static char *fetch_pack_index(unsigned char *hash, const char *base_url) static int fetch_and_setup_pack_index(struct packed_git **packs_head, unsigned char *sha1, const char *base_url) { - struct packed_git *new_pack; + struct packed_git *new_pack, *p; char *tmp_idx = NULL; int ret; - if (has_pack_index(sha1)) { - new_pack = parse_pack_index(sha1, sha1_pack_index_name(sha1)); - if (!new_pack) - return -1; /* parse_pack_index() already issued error message */ - goto add_pack; + /* + * If we already have the pack locally, no need to fetch its index or + * even add it to list; we already have all of its objects. + */ + for (p = get_all_packs(the_repository); p; p = p->next) { + if (hasheq(p->hash, sha1, the_repository->hash_algo)) + return 0; } tmp_idx = fetch_pack_index(sha1, base_url); @@ -2450,7 +2452,6 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head, if (ret) return -1; -add_pack: new_pack->next = *packs_head; *packs_head = new_pack; return 0; diff --git a/packfile.c b/packfile.c index 16d3bcf7f7..0ead2290d4 100644 --- a/packfile.c +++ b/packfile.c @@ -2160,14 +2160,6 @@ int has_object_kept_pack(const struct object_id *oid, unsigned flags) return find_kept_pack_entry(the_repository, oid, flags, &e); } -int has_pack_index(const unsigned char *sha1) -{ - struct stat st; - if (stat(sha1_pack_index_name(sha1), &st)) - return 0; - return 1; -} - int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data, enum for_each_object_flags flags) diff --git a/packfile.h b/packfile.h index 0f78658229..b4df3546a3 100644 --- a/packfile.h +++ b/packfile.h @@ -193,8 +193,6 @@ int find_kept_pack_entry(struct repository *r, const struct object_id *oid, unsi int has_object_pack(const struct object_id *oid); int has_object_kept_pack(const struct object_id *oid, unsigned flags); -int has_pack_index(const unsigned char *sha1); - /* * Return 1 if an object in a promisor packfile is or refers to the given * object, 0 otherwise. From patchwork Fri Oct 25 07:00:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850132 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F07A18D647 for ; Fri, 25 Oct 2024 07:00:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839659; cv=none; b=QaDkMGLf/WqOby/Nlomov/7fUWhLNZyBkJSuzVJ/7I6s096/nojOU7iA4k+TnSOZXHiaMGYhdekJtMJYOdgOvQlrxozH0F2iDFWlY/HcgUrPFmvXh/cdke1DYN1Xwj9EwC9dkrgO5OQ6BBeGsKAQbMmS6xLJ98/hbM4FmPyukNw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839659; c=relaxed/simple; bh=xQ7NKb3oJjUwBu6QKUbTJlAdSzABMkVijmtYX8CqTBQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=C4tFcA2TUs9rH+hM/PqRvB4KawMYluytL4HUNG/HUjOVlyOkq7Ye7Xtdv9tlfR4ElZkfZpceUHiPdLcMOvOFQLxiHiWni9xmOnS4EuQsEvQBHpTsP7Lxhl572rdoZsUkK0jGbnTr7rZyKYvJcY/etP8FPYxj1az8/IlzEaEwTxg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=FITC21nJ; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="FITC21nJ" Received: (qmail 513 invoked by uid 109); 25 Oct 2024 07:00:56 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=xQ7NKb3oJjUwBu6QKUbTJlAdSzABMkVijmtYX8CqTBQ=; b=FITC21nJtVLzSLUr2VvlUi3mEICnRubq05ENVVehhUHDjWehdCo0aFwwBFt4vbICC7g9NZf+U9PdWzZ/6Htt/CJjJ8a1tISsSJDx5KR4LLalmIWJL+yspHnYmV91o9uJEiprbipb53fo72pjoDexKq2XDMWJkm5hvqnaJnFNPDwNY1QWUn9LzoFCx0TzBkl0+ZZS+Qr+Bbxwck4M8psOkLDSuxAOzZjiXC+dk12jbXuAyMfv7HVXSgdP+c4rRSiHQXNpsOQZwQ1VjLvvrRaGz11ycTG9ZwN/E2kkm2GYbcwguMlMq7XmzC26lpYyd/ngVhmY75kNDaiQ2iIzdxdAXg== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:00:56 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12841 invoked by uid 111); 25 Oct 2024 07:00:56 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:00:56 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:00:55 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 05/11] packfile: drop sha1_pack_name() Message-ID: <20241025070055.GE2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> The sha1_pack_name() function has a few ugly bits: - it writes into a static strbuf (and not even a ring buffer of them), which can lead to subtle invalidation problems - it uses the term "sha1", but it's really using the_hash_algo, which could be sha256 There's only one caller of it left. And in fact that caller is better off using the underlying odb_pack_name() function itself, since it's just copying the result into its own strbuf anyway. Converting that caller lets us get rid of this now-obselete function. Signed-off-by: Jeff King --- http.c | 3 ++- packfile.c | 6 ------ packfile.h | 7 ------- 3 files changed, 2 insertions(+), 14 deletions(-) diff --git a/http.c b/http.c index 03802b9049..510332ab04 100644 --- a/http.c +++ b/http.c @@ -2579,7 +2579,8 @@ struct http_pack_request *new_direct_http_pack_request( preq->url = url; - strbuf_addf(&preq->tmpfile, "%s.temp", sha1_pack_name(packed_git_hash)); + odb_pack_name(&preq->tmpfile, packed_git_hash, "pack"); + strbuf_addstr(&preq->tmpfile, ".temp"); preq->packfile = fopen(preq->tmpfile.buf, "a"); if (!preq->packfile) { error("Unable to open local file %s for pack", diff --git a/packfile.c b/packfile.c index 0ead2290d4..48d650161f 100644 --- a/packfile.c +++ b/packfile.c @@ -35,12 +35,6 @@ char *odb_pack_name(struct strbuf *buf, return buf->buf; } -char *sha1_pack_name(const unsigned char *sha1) -{ - static struct strbuf buf = STRBUF_INIT; - return odb_pack_name(&buf, sha1, "pack"); -} - char *sha1_pack_index_name(const unsigned char *sha1) { static struct strbuf buf = STRBUF_INIT; diff --git a/packfile.h b/packfile.h index b4df3546a3..2bbcc58571 100644 --- a/packfile.h +++ b/packfile.h @@ -31,13 +31,6 @@ struct pack_entry { */ char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *ext); -/* - * Return the name of the (local) packfile with the specified sha1 in - * its name. The return value is a pointer to memory that is - * overwritten each time this function is called. - */ -char *sha1_pack_name(const unsigned char *sha1); - /* * Return the name of the (local) pack index file with the specified * sha1 in its name. The return value is a pointer to memory that is From patchwork Fri Oct 25 07:01:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850133 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC87318C03C for ; Fri, 25 Oct 2024 07:01:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839692; cv=none; b=AlOACApGqiQYvZwn+jbk/pJ6zoDfs7cAio13YHgZQqDGlVALjUZgVMPRsxJ67K0nbZx+rysH9ySFvah51Pt3WE7TLeeJU3Bcz+gWbbi6v84tbq/cZzSlwltal1sAV2Zb4+Am+DW+4umvg8xdfocFEn/jqqerCChO8P6fsGSopLs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839692; c=relaxed/simple; bh=IGUcww4EptP2drMe9x7JEyDTWUxZ2QY5brhP0sQ7bX4=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=simGGLECFT9Z6sSC/OmMsMn7pQB6dpH0hdg0J/nfyrG9oGvt+SgPoHI7rgsdOc6G/lBcYg/g/uHjesp0J7sCh6Hr9Y0/JokecuZunHKTrSM5M+jTPWu1LokM3eGTWJ8/YKunZ1kKpG03IwH1ShqiDOF2pHyJO+PpsphYCx+v+Ws= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=Icds5HHH; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="Icds5HHH" Received: (qmail 527 invoked by uid 109); 25 Oct 2024 07:01:26 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=IGUcww4EptP2drMe9x7JEyDTWUxZ2QY5brhP0sQ7bX4=; b=Icds5HHHFJrLBJBAdZb5L4rfWU1Lgoa8REdc1xjxEzzxCj2tOJwzv/NddOJbXBp5+rcSp0JiAMwZ1zodZrTcfjJYo1VyOievlENeu+kcYS4Yz8lgIsBqL+/rkJ/mrroIFjTwaIgvAk8xgVvr9itsaBC87A/HQgSVNeIJPQizydwOSM1GPp45EnNBQhJ2ak4RZdNigUPiQUYorfSxdPG4n1XtusnJXnPTQJgqIhtabfQksE96NJcrVX5/Exz63R5k/TyqFanB7HwtkUjh2CTG6QZrReAjuc8mNxRAhZ7+6Je7doI3cIpLZ3sG9QlqrTOK9Ft5kq/T0DxsGfCEcX9/iQ== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:01:26 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12847 invoked by uid 111); 25 Oct 2024 07:01:26 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:01:26 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:01:25 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 06/11] packfile: drop sha1_pack_index_name() Message-ID: <20241025070125.GF2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> Like sha1_pack_name() that we dropped in the previous commit, this function uses an error-prone static strbuf and the somewhat misleading name "sha1". The only caller left is in pack-redundant.c. While this command is marked for potential removal in our BreakingChanges document, we still have it for now. But it's simple enough to convert it to use its own strbuf with the underlying odb_pack_name() function, letting us drop the otherwise obsolete function. Note that odb_pack_name() does its own strbuf_reset(), so it's safe to use directly within a loop like this. Signed-off-by: Jeff King --- builtin/pack-redundant.c | 5 ++++- packfile.c | 6 ------ packfile.h | 7 ------- 3 files changed, 4 insertions(+), 14 deletions(-) diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c index 5809613002..d2c1c4e5ec 100644 --- a/builtin/pack-redundant.c +++ b/builtin/pack-redundant.c @@ -13,6 +13,7 @@ #include "packfile.h" #include "object-store-ll.h" +#include "strbuf.h" #define BLKSIZE 512 @@ -591,6 +592,7 @@ static void load_all(void) int cmd_pack_redundant(int argc, const char **argv, const char *prefix UNUSED, struct repository *repo UNUSED) { int i; int i_still_use_this = 0; struct pack_list *min = NULL, *red, *pl; struct llist *ignore; + struct strbuf idx_name = STRBUF_INIT; char buf[GIT_MAX_HEXSZ + 2]; /* hex hash + \n + \0 */ if (argc == 2 && !strcmp(argv[1], "-h")) @@ -688,7 +690,7 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix UNUSED, s pl = red = pack_list_difference(local_packs, min); while (pl) { printf("%s\n%s\n", - sha1_pack_index_name(pl->pack->hash), + odb_pack_name(&idx_name, pl->pack->hash, "idx"), pl->pack->pack_name); pl = pl->next; } @@ -699,5 +701,6 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix UNUSED, s pack_list_free(red); pack_list_free(min); llist_free(ignore); + strbuf_release(&idx_name); return 0; } diff --git a/packfile.c b/packfile.c index 48d650161f..0d4fd2737a 100644 --- a/packfile.c +++ b/packfile.c @@ -35,12 +35,6 @@ char *odb_pack_name(struct strbuf *buf, return buf->buf; } -char *sha1_pack_index_name(const unsigned char *sha1) -{ - static struct strbuf buf = STRBUF_INIT; - return odb_pack_name(&buf, sha1, "idx"); -} - static unsigned int pack_used_ctr; static unsigned int pack_mmap_calls; static unsigned int peak_pack_open_windows; diff --git a/packfile.h b/packfile.h index 2bbcc58571..6812abaae0 100644 --- a/packfile.h +++ b/packfile.h @@ -31,13 +31,6 @@ struct pack_entry { */ char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *ext); -/* - * Return the name of the (local) pack index file with the specified - * sha1 in its name. The return value is a pointer to memory that is - * overwritten each time this function is called. - */ -char *sha1_pack_index_name(const unsigned char *sha1); - /* * Return the basename of the packfile, omitting any containing directory * (e.g., "pack-1234abcd[...].pack"). From patchwork Fri Oct 25 07:02:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850134 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BEA2218C03C for ; Fri, 25 Oct 2024 07:02:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839746; cv=none; b=srnOzORgBw3FtK7B3cv9invRaJfLzlZQgXmTyESluD4/bmbnjgH1QmHZBDHEimyomIq2EykSR3sh4sfXYC7uQ3oktvxFa+51IDS5Ap7YgnG773Q63JC5OgLAufkvj/xULP4sMb4iN8Gz61Rw/gf7xMu3X0/uaV7tUXnydMonONI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839746; c=relaxed/simple; bh=YzYWuNJQY5StW769ixlHdyPJS5IJh1tdJeehnmrGeq8=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=JaPb0k7BGCcIUJ5+BIohBww+OENN1nRe47v0CK/T1NoTCz+J3f6yq1iTFQoUdvuHSx6Rfm9JavDCmnib9s3X+nXU/vlHsxvxcZYFeLfbq5Y4CvpXEteCWvQGvZ+j/dy+HxoRZouP+qVfIsAzeOGMCw7RNl4oKZijMSsBlIqA+Ms= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=WrZWt/Y9; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="WrZWt/Y9" Received: (qmail 541 invoked by uid 109); 25 Oct 2024 07:02:23 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=YzYWuNJQY5StW769ixlHdyPJS5IJh1tdJeehnmrGeq8=; b=WrZWt/Y961v+i775p/a8qI6IpmgcH/jfBYJVQpTE8adiGGsEzDc3mcyHVF1ckbrMFJY/+mekBBwTKSk226q1Q32fE6JFb6hjWWIoUfDc0kdwFT7coWQo0C9GMTVC7S/gju4dFbV5lTnPdtxRZAkb6ufUoIa8hZam58WrflQWezKUWkzL7Gd4+0ZPb3pf/nx0Mr+r82uMZ9m/xnfk4ifpxbfPzbQ7ulCqyp3OZaIt03Hk2cZGVF97VuqzKKD2XVZVgbhEvHLgdJtLrfU7o8GipOwekpM/2iy9ejsQfbGF/YGVcNH5wenua0W82e82xaAYdIRJ/yxB0trgsPYuNiCW1w== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:02:23 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12855 invoked by uid 111); 25 Oct 2024 07:02:23 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:02:23 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:02:22 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 07/11] packfile: warn people away from parse_packed_git() Message-ID: <20241025070222.GG2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> With a name like parse_packed_git(), you might think it's the right way to access a local pack index and its associated objects. But not so! It's a one-off used by the dumb-http code to access pack idx files we've downloaded from the remote, but whose packs we might not have. There's only one caller left for this function, and ideally we'd drop it completely and just inline it there. But that would require exposing other internals from packfile.[ch], like alloc_packed_git() and check_packed_git_idx(). So let's leave it be for now, and just warn people that it's probably not what they're looking for. Perhaps in the long run if we eventually drop dumb-http support, we can remove the function entirely then. Signed-off-by: Jeff King --- packfile.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/packfile.h b/packfile.h index 6812abaae0..ad393be9f1 100644 --- a/packfile.h +++ b/packfile.h @@ -37,6 +37,15 @@ char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *e */ const char *pack_basename(struct packed_git *p); +/* + * Parse the pack idx file found at idx_path and create a packed_git struct + * which can be used with find_pack_entry_one(). + * + * You probably don't want to use this function! It skips most of the normal + * sanity checks (including whether we even have the matching .pack file), + * and does not add the resulting packed_git struct to the internal list of + * packs. You probably want add_packed_git() instead. + */ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); typedef void each_file_in_pack_dir_fn(const char *full_path, size_t full_path_len, @@ -193,7 +202,7 @@ int is_promisor_object(const struct object_id *oid); * * This function should not be used directly. It is exposed here only so that we * have a convenient entry-point for fuzz testing. For real uses, you should - * probably use open_pack_index() or parse_pack_index() instead. + * probably use open_pack_index() instead. */ int load_idx(const char *path, const unsigned int hashsz, void *idx_map, size_t idx_size, struct packed_git *p); From patchwork Fri Oct 25 07:03:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850136 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A028E18C03C for ; Fri, 25 Oct 2024 07:03:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839826; cv=none; b=NvoGM9qnoYFoAWnedHjvtkd0oYzn4dAImsj27rPncm4NdJ7CFd4N2lO+or75l6UpuMFDZT196LDzXT/u208P4FjwV+Z00rlVE+EbuxwGI78EcO/klKK/J03Y/MCKMNjzinzN+/ZPXTHvIFBIAmNlhUk72iH2fKDF8myshP1jJbc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839826; c=relaxed/simple; bh=x51T6HYxO7B25wj82pfZK7lcgjg8rQfanLKBc1VA558=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ZpBojRKQnDWc+2hMR70FuUuzpcRJiqskW3soFFyNC5DbFa3c8jnhgDgF2iN2EwX01m4iB03mYRlUx2xLBKcoWC4fRIjg7YgqfVSl7E509lqL+RfGEjazjYy8z8Kn9VV2NDTKS3WAeY3turgUb6CB2UyusUCcTC5oTsnVl/12aLg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=SNhu2Z44; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="SNhu2Z44" Received: (qmail 552 invoked by uid 109); 25 Oct 2024 07:03:43 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=x51T6HYxO7B25wj82pfZK7lcgjg8rQfanLKBc1VA558=; b=SNhu2Z44fROv4UvCpNRPb9ewHdFDjNuFhUPS/qQtb7+PcEyYz5BZxJlMVOWORbP7sviAaxce3oJeSkEtbE1Esa29sTmBwNi74riVBhZy+n2Zi3UgcT6rkZJFZM6ZxQ5enxoxRg0/Ua2/FyzsstiQzGgzNnSu81PIet501x0WV3kT0xEIQN/qYtRQ0w1IxtTGraJnRb348we/rKCRl9yLcDSHfmbMJptFkCBqhaC21iX7N0JYIX+mfmRRYjoT89+T80pKuK2dCJvl9nRaylfu58TNsvyOkeoPLKLHEe2fTUi5xxRQtFN2NnMhv+lDGQRCBWR9xXQMjcS6Cy9hsNa2dQ== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:03:43 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12867 invoked by uid 111); 25 Oct 2024 07:03:43 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:03:43 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:03:42 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 08/11] http-walker: use object_id instead of bare hash Message-ID: <20241025070342.GH2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> We long ago switched most code to using object_id structs instead of bare "unsigned char *" hashes. This gives us more type safety from the compiler, and generally makes it easier to understand what we expect in each parameter. But the dumb-http code has lagged behind. And indeed, the whole "walker" subsystem interface has the same problem, though http-walker is the only user left. So let's update the walker interface to pass object_id structs (which we already have anyway at all call sites!), and likewise use those within the http-walker methods that it calls. This cleans up the dumb-http code a bit, but will also let us fix a few more commonly used helper functions. Signed-off-by: Jeff King --- http-walker.c | 25 +++++++++++++------------ walker.c | 4 ++-- walker.h | 4 ++-- 3 files changed, 17 insertions(+), 16 deletions(-) diff --git a/http-walker.c b/http-walker.c index fb2d86d5e7..36dd1f33c0 100644 --- a/http-walker.c +++ b/http-walker.c @@ -147,14 +147,14 @@ static int fill_active_slot(void *data UNUSED) return 0; } -static void prefetch(struct walker *walker, unsigned char *sha1) +static void prefetch(struct walker *walker, const struct object_id *oid) { struct object_request *newreq; struct walker_data *data = walker->data; newreq = xmalloc(sizeof(*newreq)); newreq->walker = walker; - oidread(&newreq->oid, sha1, the_repository->hash_algo); + oidcpy(&newreq->oid, oid); newreq->repo = data->alt; newreq->state = WAITING; newreq->req = NULL; @@ -422,7 +422,8 @@ static int fetch_indices(struct walker *walker, struct alt_base *repo) return ret; } -static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigned char *sha1) +static int http_fetch_pack(struct walker *walker, struct alt_base *repo, + const struct object_id *oid) { struct packed_git *target; int ret; @@ -431,7 +432,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne if (fetch_indices(walker, repo)) return -1; - target = find_sha1_pack(sha1, repo->packs); + target = find_sha1_pack(oid->hash, repo->packs); if (!target) return -1; close_pack_index(target); @@ -440,7 +441,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne fprintf(stderr, "Getting pack %s\n", hash_to_hex(target->hash)); fprintf(stderr, " which contains %s\n", - hash_to_hex(sha1)); + oid_to_hex(oid)); } preq = new_http_pack_request(target->hash, repo->base); @@ -477,17 +478,17 @@ static void abort_object_request(struct object_request *obj_req) release_object_request(obj_req); } -static int fetch_object(struct walker *walker, unsigned char *hash) +static int fetch_object(struct walker *walker, const struct object_id *oid) { - char *hex = hash_to_hex(hash); + char *hex = oid_to_hex(oid); int ret = 0; struct object_request *obj_req = NULL; struct http_object_request *req; struct list_head *pos, *head = &object_queue_head; list_for_each(pos, head) { obj_req = list_entry(pos, struct object_request, node); - if (hasheq(obj_req->oid.hash, hash, the_repository->hash_algo)) + if (oideq(&obj_req->oid, oid)) break; } if (!obj_req) @@ -548,20 +549,20 @@ static int fetch_object(struct walker *walker, unsigned char *hash) return ret; } -static int fetch(struct walker *walker, unsigned char *hash) +static int fetch(struct walker *walker, const struct object_id *oid) { struct walker_data *data = walker->data; struct alt_base *altbase = data->alt; - if (!fetch_object(walker, hash)) + if (!fetch_object(walker, oid)) return 0; while (altbase) { - if (!http_fetch_pack(walker, altbase, hash)) + if (!http_fetch_pack(walker, altbase, oid)) return 0; fetch_alternates(walker, data->alt->base); altbase = altbase->next; } - return error("Unable to find %s under %s", hash_to_hex(hash), + return error("Unable to find %s under %s", oid_to_hex(oid), data->alt->base); } diff --git a/walker.c b/walker.c index 807a7a3881..5ea7e5b392 100644 --- a/walker.c +++ b/walker.c @@ -157,7 +157,7 @@ static int process(struct walker *walker, struct object *obj) else { if (obj->flags & COMPLETE) return 0; - walker->prefetch(walker, obj->oid.hash); + walker->prefetch(walker, &obj->oid); } object_list_insert(obj, process_queue_end); @@ -186,7 +186,7 @@ static int loop(struct walker *walker) * the queue because we needed to fetch it first. */ if (! (obj->flags & TO_SCAN)) { - if (walker->fetch(walker, obj->oid.hash)) { + if (walker->fetch(walker, &obj->oid)) { stop_progress(&progress); report_missing(obj); return -1; diff --git a/walker.h b/walker.h index d40b016bab..25aaa3631c 100644 --- a/walker.h +++ b/walker.h @@ -6,8 +6,8 @@ struct walker { void *data; int (*fetch_ref)(struct walker *, struct ref *ref); - void (*prefetch)(struct walker *, unsigned char *sha1); - int (*fetch)(struct walker *, unsigned char *sha1); + void (*prefetch)(struct walker *, const struct object_id *oid); + int (*fetch)(struct walker *, const struct object_id *oid); void (*cleanup)(struct walker *); int get_verbosely; int get_progress; From patchwork Fri Oct 25 07:05:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850137 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C22EE67A0D for ; Fri, 25 Oct 2024 07:05:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839908; cv=none; b=EsYnnnkX93F44DHhDvIkPS8nsm63JFCp4nc0l4XPW/HGHpvNslZSl9Oqh23B9vwxBo11Jxf/OBON9T/u3cyc8hbGrsF8BxDoOzyHe13g56dTMUDpMEdRPOZ2rkFNRUTzVnjhNavpdj+rALzN27etEpWX78w9ZYs5aqdwt+a/b00= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839908; c=relaxed/simple; bh=6zxf/GTRXiAvW8ycrg7/fZ5H9l/mjpXa/ODqMDj2Gik=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=p6NUu1KGvjL4y9i23Mz7/jWBU/CvINdYYndepZwlfav6sB/NJkEwr/yNERVAqHFeRJs4gPA2iXUKasKKe58hYZU1ZwSrilseNKzBiuJy8eej+h48I9Y5tLo+4ik1eYH9lAMg1NkRPHx4kw1n4mLAFNKogWvhairGAs2DvGtExHU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=Vt0Wmdbq; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="Vt0Wmdbq" Received: (qmail 571 invoked by uid 109); 25 Oct 2024 07:05:04 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=6zxf/GTRXiAvW8ycrg7/fZ5H9l/mjpXa/ODqMDj2Gik=; b=Vt0WmdbqFHm37ElhrIP9VHXdC892873DY14JIB9cbTSgG6AXgvrxIpWppB4UoZSl7Na5jMnOzRPSi4YmIZz6TiQp3KscGxfiTIFnCMuNEzgZkyWfwajCnpGyBrlWS6Na3SDF3MfdHj9s11+bXf4tQnjEIJpg/xVoeHIg74JNGYSAUOQ0k7G67mVZKxEeQWdfUmg7KzcLQYOMwVhXt52rawmbGzqNW55CB7qf2vzR/iiz/UvX/kxluGAZXckexrM6BMEGr8cu9f/O3opBXlcJj9U0pmzsXHB5lC9BdTWNO+IGDm4plE/qig/u1VaVdJA/4l+g9HXGQPWQwkgj7JVh/A== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:05:04 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12940 invoked by uid 111); 25 Oct 2024 07:05:04 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:05:04 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:05:03 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 09/11] packfile: convert find_sha1_pack() to use object_id Message-ID: <20241025070503.GI2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> The find_sha1_pack() function has a few problems: - it's badly named, since it works with any object hash - it takes the hash as a bare pointer rather than an object_id struct We can fix both of these easily, as all callers actually have a real object_id anyway. I also found the existence of this function somewhat confusing, as it is about looking in an arbitrary set of linked packed_git structs. It's good for things like dumb-http which are looking in downloaded remote packs, and not our local packs. But despite the name, it is not a good way to find the pack which contains a local object (it skips the use of the midx, the pack mru list, and so on). So let's also add an explanatory comment above the declaration that may point people in the right direction. I suspect the calls in fast-import.c, which use the packed_git list from the repository struct, could actually just be using find_pack_entry(). But since we'd need to keep it anyway for dumb-http, I didn't dig further there. If we eventually drop dumb-http support, then it might be worth examining them to see if we can get rid of the function entirely. Signed-off-by: Jeff King --- builtin/fast-import.c | 6 ++---- http-push.c | 4 ++-- http-walker.c | 2 +- packfile.c | 6 +++--- packfile.h | 9 +++++++-- 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 1e7ab67f6e..76d5c20f14 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -966,8 +966,7 @@ static int store_object( if (e->idx.offset) { duplicate_count_by_type[type]++; return 1; - } else if (find_sha1_pack(oid.hash, - get_all_packs(the_repository))) { + } else if (find_oid_pack(&oid, get_all_packs(the_repository))) { e->type = type; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! */ @@ -1167,8 +1166,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) duplicate_count_by_type[OBJ_BLOB]++; truncate_pack(&checkpoint); - } else if (find_sha1_pack(oid.hash, - get_all_packs(the_repository))) { + } else if (find_oid_pack(&oid, get_all_packs(the_repository))) { e->type = OBJ_BLOB; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! */ diff --git a/http-push.c b/http-push.c index aad89f2eab..4d24e6b8d4 100644 --- a/http-push.c +++ b/http-push.c @@ -309,7 +309,7 @@ static void start_fetch_packed(struct transfer_request *request) struct transfer_request *check_request = request_queue_head; struct http_pack_request *preq; - target = find_sha1_pack(request->obj->oid.hash, repo->packs); + target = find_oid_pack(&request->obj->oid, repo->packs); if (!target) { fprintf(stderr, "Unable to fetch %s, will not be able to update server info refs\n", oid_to_hex(&request->obj->oid)); repo->can_update_info_refs = 0; @@ -681,7 +681,7 @@ static int add_send_request(struct object *obj, struct remote_lock *lock) get_remote_object_list(obj->oid.hash[0]); if (obj->flags & (REMOTE | PUSHING)) return 0; - target = find_sha1_pack(obj->oid.hash, repo->packs); + target = find_oid_pack(&obj->oid, repo->packs); if (target) { obj->flags |= REMOTE; return 0; diff --git a/http-walker.c b/http-walker.c index 36dd1f33c0..43cde0ebe5 100644 --- a/http-walker.c +++ b/http-walker.c @@ -432,7 +432,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, if (fetch_indices(walker, repo)) return -1; - target = find_sha1_pack(oid->hash, repo->packs); + target = find_oid_pack(oid, repo->packs); if (!target) return -1; close_pack_index(target); diff --git a/packfile.c b/packfile.c index 0d4fd2737a..c51eab15a5 100644 --- a/packfile.c +++ b/packfile.c @@ -2010,13 +2010,13 @@ int is_pack_valid(struct packed_git *p) return !open_packed_git(p); } -struct packed_git *find_sha1_pack(const unsigned char *sha1, - struct packed_git *packs) +struct packed_git *find_oid_pack(const struct object_id *oid, + struct packed_git *packs) { struct packed_git *p; for (p = packs; p; p = p->next) { - if (find_pack_entry_one(sha1, p)) + if (find_pack_entry_one(oid->hash, p)) return p; } return NULL; diff --git a/packfile.h b/packfile.h index ad393be9f1..3baffa940c 100644 --- a/packfile.h +++ b/packfile.h @@ -79,8 +79,13 @@ struct packed_git *get_all_packs(struct repository *r); */ unsigned long repo_approximate_object_count(struct repository *r); -struct packed_git *find_sha1_pack(const unsigned char *sha1, - struct packed_git *packs); +/* + * Find the pack within the "packs" list whose index contains the object "oid". + * For general object lookups, you probably don't want this; use + * find_pack_entry() instead. + */ +struct packed_git *find_oid_pack(const struct object_id *oid, + struct packed_git *packs); void pack_report(void); From patchwork Fri Oct 25 07:06:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850138 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DD3EF1AF0CD for ; Fri, 25 Oct 2024 07:06:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839970; cv=none; b=CEir8wkXNiQa2fCGNZ66qZ2R05H/gBgzMAh8uX9XKBPAHzV+7X6imk8EHvof1whZf9A1BN2jX5r3ZiLbo1HB7eTkLdKgtHpOZzjhv4wvD16wFucn4dxt4UmaQoYHymjwaqEz/qvq10vSO3Wo/aJKWSsdD5f1nVZVkhV2mOd24AU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729839970; c=relaxed/simple; bh=eJT1jyDZR/MRlz3PheptSkUjERDORmD07nUiXDNXu1Q=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=t9+yc+nAGp2V4OlYDSk8kB9vzKI4TGvQPyIH9zhPASu+OL8N+2uGUQZuIj9OHLGsL17vbUGkCPB1dfB8UMXcb7iShGI+St3whGyfh31+uhpH7Iafy07uIUz2HFkIYQ5kUgfRpPA3inUVtNVbqLPR9MSXizh6yZD/IUOHo2o7oog= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=beruCH7v; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="beruCH7v" Received: (qmail 583 invoked by uid 109); 25 Oct 2024 07:06:07 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=eJT1jyDZR/MRlz3PheptSkUjERDORmD07nUiXDNXu1Q=; b=beruCH7vvt25Pqvv8Aaeeka6Ci+FGXPOSPpGtL9GuqNC4ow/3nWvlshYHPrvA/qLMpwY89N9VXVche8Cafu53KjCqrDSlet8ibeFZiAor85r9a4Nxyu5IiM5iFiA3JcETKS58OstI764ByGXZOJbZ1oZizKwpUJwRing+o/Q0uVz6wFDP9DhtKFSEQhEZ4GMInIEp7In5nr8Onlk1HGOfRJmkGnVxJw9unv/4vT2dXyTuPQK9XW6XiXGgg5H/CHqa4XsWezQNlM/Y4OpBpW3HU3oLLAd+hrmIiXy8zQW84FuK0GA7j8UPh/x6vkV1NUZlGj7OePjYDvivmKGHHo0Tg== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:06:06 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 12947 invoked by uid 111); 25 Oct 2024 07:06:06 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:06:06 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:06:06 -0400 From: Jeff King To: Taylor Blau Cc: fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 10/11] packfile: use object_id in find_pack_entry_one() Message-ID: <20241025070606.GJ2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. Signed-off-by: Jeff King --- builtin/pack-objects.c | 4 ++-- connected.c | 4 ++-- midx.c | 2 +- pack-bitmap.c | 4 ++-- packfile.c | 16 ++++++++-------- packfile.h | 4 ++-- t/helper/test-find-pack.c | 2 +- 7 files changed, 18 insertions(+), 18 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0fc0680b40..0800714267 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1556,7 +1556,7 @@ static int want_object_in_pack_one(struct packed_git *p, if (p == *found_pack) offset = *found_offset; else - offset = find_pack_entry_one(oid->hash, p); + offset = find_pack_entry_one(oid, p); if (offset) { if (!*found_pack) { @@ -3984,7 +3984,7 @@ static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid) while (p) { if ((!p->pack_local || p->pack_keep || p->pack_keep_in_core) && - find_pack_entry_one(oid->hash, p)) { + find_pack_entry_one(oid, p)) { last_found = p; return 1; } diff --git a/connected.c b/connected.c index 87cc4b57a1..a9e2e13995 100644 --- a/connected.c +++ b/connected.c @@ -78,7 +78,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data, for (p = get_all_packs(the_repository); p; p = p->next) { if (!p->pack_promisor) continue; - if (find_pack_entry_one(oid->hash, p)) + if (find_pack_entry_one(oid, p)) goto promisor_pack_found; } /* @@ -144,7 +144,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data, * are sure the ref is good and not sending it to * rev-list for verification. */ - if (new_pack && find_pack_entry_one(oid->hash, new_pack)) + if (new_pack && find_pack_entry_one(oid, new_pack)) continue; if (fprintf(rev_list_in, "%s\n", oid_to_hex(oid)) < 0) diff --git a/midx.c b/midx.c index 479812cb9b..e82d4f2e65 100644 --- a/midx.c +++ b/midx.c @@ -987,7 +987,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag } m_offset = e.offset; - p_offset = find_pack_entry_one(oid.hash, e.p); + p_offset = find_pack_entry_one(&oid, e.p); if (m_offset != p_offset) midx_report(_("incorrect object offset for oid[%d] = %s: %"PRIx64" != %"PRIx64), diff --git a/pack-bitmap.c b/pack-bitmap.c index 32b222a7af..4fa9dfc771 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -935,7 +935,7 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git, const struct object_id *oid) { uint32_t pos; - off_t offset = find_pack_entry_one(oid->hash, bitmap_git->pack); + off_t offset = find_pack_entry_one(oid, bitmap_git->pack); if (!offset) return -1; @@ -1609,7 +1609,7 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git, if (bsearch_midx(&object->oid, bitmap_git->midx, NULL)) return 1; } else { - if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0) + if (find_pack_entry_one(&object->oid, bitmap_git->pack) > 0) return 1; } } diff --git a/packfile.c b/packfile.c index c51eab15a5..005ca670b4 100644 --- a/packfile.c +++ b/packfile.c @@ -1239,7 +1239,9 @@ off_t get_delta_base(struct packed_git *p, *curpos += used; } else if (type == OBJ_REF_DELTA) { /* The base entry _must_ be in the same pack */ - base_offset = find_pack_entry_one(base_info, p); + struct object_id oid; + hashcpy(oid.hash, base_info, the_repository->hash_algo); + base_offset = find_pack_entry_one(&oid, p); *curpos += the_hash_algo->rawsz; } else die("I am totally screwed"); @@ -1971,20 +1973,18 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n) } } -off_t find_pack_entry_one(const unsigned char *sha1, - struct packed_git *p) +off_t find_pack_entry_one(const struct object_id *oid, + struct packed_git *p) { const unsigned char *index = p->index_data; - struct object_id oid; uint32_t result; if (!index) { if (open_pack_index(p)) return 0; } - hashcpy(oid.hash, sha1, the_repository->hash_algo); - if (bsearch_pack(&oid, p, &result)) + if (bsearch_pack(oid, p, &result)) return nth_packed_object_offset(p, result); return 0; } @@ -2016,7 +2016,7 @@ struct packed_git *find_oid_pack(const struct object_id *oid, struct packed_git *p; for (p = packs; p; p = p->next) { - if (find_pack_entry_one(oid->hash, p)) + if (find_pack_entry_one(oid, p)) return p; } return NULL; @@ -2033,7 +2033,7 @@ static int fill_pack_entry(const struct object_id *oid, oidset_contains(&p->bad_objects, oid)) return 0; - offset = find_pack_entry_one(oid->hash, p); + offset = find_pack_entry_one(oid, p); if (!offset) return 0; diff --git a/packfile.h b/packfile.h index 3baffa940c..08f88a7ff5 100644 --- a/packfile.h +++ b/packfile.h @@ -154,10 +154,10 @@ int nth_packed_object_id(struct object_id *, struct packed_git *, uint32_t n); off_t nth_packed_object_offset(const struct packed_git *, uint32_t n); /* - * If the object named sha1 is present in the specified packfile, + * If the object named by oid is present in the specified packfile, * return its offset within the packfile; otherwise, return 0. */ -off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *); +off_t find_pack_entry_one(const struct object_id *oid, struct packed_git *); int is_pack_valid(struct packed_git *); void *unpack_entry(struct repository *r, struct packed_git *, off_t, enum object_type *, unsigned long *); diff --git a/t/helper/test-find-pack.c b/t/helper/test-find-pack.c index 14b2b0c12c..85a69a4e55 100644 --- a/t/helper/test-find-pack.c +++ b/t/helper/test-find-pack.c @@ -40,7 +40,7 @@ int cmd__find_pack(int argc, const char **argv) die("cannot parse %s as an object name", argv[0]); for (p = get_all_packs(the_repository); p; p = p->next) - if (find_pack_entry_one(oid.hash, p)) { + if (find_pack_entry_one(&oid, p)) { printf("%s\n", p->pack_name); actual_count++; } From patchwork Fri Oct 25 07:08:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13850144 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0EFF967A0D for ; Fri, 25 Oct 2024 07:08:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729840094; cv=none; b=lOaHY5Pl7vBTYYs61vGQ6wAz6GdUqG2blotDEqcD8pfMa4gM44b1m5SROjPWd+olPxqTo5CI6ubfWzXND5Eo1brBb4AKYd/GaTcIYmr0f3cTzg2ZP/QxhlQ0q+6CrH71kldoaOuLC8LkaNPr4GEqc8DTe8cTNgJlxbAh0oJGb+g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729840094; c=relaxed/simple; bh=AVFplhlld4wvjTvchPzO6bvAR8bgYe7EqFZJDzfkygQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=nKFgm5z0CLY7KQBOtKUW//o+fvoYKfLdUikPzkFKWYM3qBcW12TX+FlMuAPc4rkFKWsfujoij1e6DM9U++hmtAsAR7H+KoQ9HBTczZ5SopJLVYIm8IFjF/u0eM7Vr/SqLUVBrBcUMholizKFyg+uwIABYkzvu3+sVab9LMBYzm0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=WOqytyNM; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="WOqytyNM" Received: (qmail 614 invoked by uid 109); 25 Oct 2024 07:08:11 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=AVFplhlld4wvjTvchPzO6bvAR8bgYe7EqFZJDzfkygQ=; b=WOqytyNMVQS7PKKXzLBB9s1u096KhhvsOWNz3ZMxblOWDFKKh8jabRAeJcfvopyVoQJS3eAIdV8/MQrXQfK2TjpEeb+LW/tlNc+0COMVfZMmgrh1JXkneIqfYhtnXbvMKdFR84+hFouwBfybAK/Xb6kfqxFmW0dxQD+xK/N8f2MSsJjuGdZcW2T5IU0uHiLejNEBje+AYTJT9LFF9WIYVMlS3NqBSZlN1J7gvlbe+8cQy9LqFXqA+PS+UhBxKzrqpE1aj3WhmWUTxQSVnCuqcEhdEmlpjpSn3FPLTNTABUAmu5px1LpHDBprv/H9p7ycdlQoCiIbK8b0ZpTWm2PbZw== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 25 Oct 2024 07:08:10 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 13004 invoked by uid 111); 25 Oct 2024 07:08:10 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 25 Oct 2024 03:08:10 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 25 Oct 2024 03:08:10 -0400 From: Jeff King To: Taylor Blau Cc: Patrick Steinhardt , fox , Eric Sunshine , git@vger.kernel.org Subject: [PATCH 11/11] packfile: use oidread() instead of hashcpy() to fill object_id Message-ID: <20241025070810.GK2110355@coredump.intra.peff.net> References: <20241025064148.GA2110169@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20241025064148.GA2110169@coredump.intra.peff.net> When chasing a REF_DELTA, we need to pull the raw hash bytes out of the mmap'd packfile into an object_id struct. We do that with a raw hashcpy() of the appropriate length (that happens directly now, though before the previous commit it happened inside find_pack_entry_one(), also using a hashcpy). But I think this creates a potentially dangerous situation due to d4d364b2c7 (hash: convert `oidcmp()` and `oideq()` to compare whole hash, 2024-06-14). When using sha1, we'll have uninitialized bytes in the latter part of the object_id.hash buffer, which could fool oideq(), etc. We should use oidread() instead, which correctly zero-pads the extra bytes, as of c98d762ed9 (global: ensure that object IDs are always padded, 2024-06-14). As far as I can see, this has not been a problem in practice because the object_id we feed to find_pack_entry_one() is never used with oideq(), etc. It is being compared to the bytes mmap'd from a pack idx file, which of course do not have the extra padding bytes themselves. So there's no bug here, but this just puzzled me while looking at the code. We should do the more obviously safe thing, both for future-proofing and to avoid confusing readers. Signed-off-by: Jeff King --- +cc Patrick for any wisdom here. I'd guess that the conversions from c98d762ed9 were found by using ASan, valgrind, or similar with the new oideq() implementation. And so this case, because it actually is safe, was not flagged. packfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packfile.c b/packfile.c index 005ca670b4..9560f0a33c 100644 --- a/packfile.c +++ b/packfile.c @@ -1240,7 +1240,7 @@ off_t get_delta_base(struct packed_git *p, } else if (type == OBJ_REF_DELTA) { /* The base entry _must_ be in the same pack */ struct object_id oid; - hashcpy(oid.hash, base_info, the_repository->hash_algo); + oidread(&oid, base_info, the_repository->hash_algo); base_offset = find_pack_entry_one(&oid, p); *curpos += the_hash_algo->rawsz; } else