From patchwork Mon Nov 6 22:56:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13447579 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10C3D2EB14 for ; Mon, 6 Nov 2023 22:56:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="y2oFkT9z" Received: from mail-ua1-x92d.google.com (mail-ua1-x92d.google.com [IPv6:2607:f8b0:4864:20::92d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE3CED73 for ; Mon, 6 Nov 2023 14:56:32 -0800 (PST) Received: by mail-ua1-x92d.google.com with SMTP id a1e0cc1a2514c-7ba962d534eso2047688241.3 for ; Mon, 06 Nov 2023 14:56:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1699311391; x=1699916191; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=SmbAJOPvOxlonyTOSHSXWfRz8/PAZTQqB3RHCadHKOs=; b=y2oFkT9zUeB18YsJtBIpyN7Ce7gxiRvipLPWpYY6pymE3ArH1Gi4I+Nb2fTvUJMaTx 2IYoEBFBQmhs3EmbfSPgeVCCL7vUELX6GUlWXUwTvrYfhUXro1Im5dT5HV+ewTj88AgE vgl+5oLLtCl+e2s1Dzl+LQPzX4b8AFWL/wY77U2FWb8VRJk/ftx/Vf5nmHTtKwGN/IQW NfX7ST/DQd0M/IaIiWgxwHxvC9kUrz8gh2Pkcr7rt9htYP4EUnDXvkxVOSzhsWhI6Vht kveAEqf8Rgea9pDs/092X0lFZ2f9cOBoD7o4XttRdhexHq1G2MiXpxB7nAtNvLWEtuqy +Niw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699311391; x=1699916191; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=SmbAJOPvOxlonyTOSHSXWfRz8/PAZTQqB3RHCadHKOs=; b=cqAIC+aF492eDqI4jE6JTYUfVxW1NtZ4naZHZjsbpxut23qgoQf558LJ9Pd42aqm3r 6PmmIjM6OAMUGcltl7phuOrt4OOd56I+ijPxZt26Aa20frGYJsk887FNMe20GbdSGLYb JE5MgcAkUCWwMk/6n6mprNSnjb43Od1oM7AzjDLHl5/5hYxDiJA2QgvxhQKn5CP5M/u+ 38agK7yn1xqrZV0pBqXzd/rUkgvUXdZvC7HzmMjO/ZfmkEHHbL58fZeHXI/H8VKAZW/I A5TtHLPHSuIyL3vyt+8knMykUzLQ1hZwgeM2VTIqqoDimFllAyd59/Ue11HZSC6vsuiF 27+A== X-Gm-Message-State: AOJu0YxAljjDX831oszneeKP6o6BuRaA9GUEDJsNoqeEmw53ZgpaHo3W 22srpibyTaVOug7FJ8T0NT1T2cI+aCIx/QNDsOxHhw== X-Google-Smtp-Source: AGHT+IHvgRpHln9tXufuMWruxGhKS6YoHcAI9Iut21ONo2pLtIOy2jeIML/9jZa0Q6G52WYoIE6+tw== X-Received: by 2002:a05:6102:c93:b0:45f:3b30:9ca9 with SMTP id f19-20020a0561020c9300b0045f3b309ca9mr4408178vst.15.1699311391562; Mon, 06 Nov 2023 14:56:31 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id gb7-20020a05622a598700b00418189b689csm3861374qtb.10.2023.11.06.14.56.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 14:56:31 -0800 (PST) Date: Mon, 6 Nov 2023 17:56:30 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Junio C Hamano Subject: [PATCH 1/2] list-objects: drop --unpacked non-commit objects from results Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In git-rev-list(1), we describe the `--unpacked` option as: Only useful with `--objects`; print the object IDs that are not in packs. This is true of commits, which we discard via get_commit_action(), but not of the objects they reach. So if we ask for an --objects traversal with --unpacked, we may get arbitrarily many objects which are indeed packed. I am nearly certain this behavior dates back to the introduction of `--unpacked` via 12d2a18780 ("git rev-list --unpacked" shows only unpacked commits, 2005-07-03), but I couldn't get that revision of Git to compile for me. At least as early as v2.0.0 this has been subtly broken: $ git.compile --version git version 2.0.0 $ git.compile rev-list --objects --all --unpacked 72791fe96c93f9ec5c311b8bc966ab349b3b5bbe 05713d991c18bbeef7e154f99660005311b5004d v1.0 153ed8b7719c6f5a68ce7ffc43133e95a6ac0fdb 8e4020bb5a8d8c873b25de15933e75cc0fc275df one 9200b628cf9dc883a85a7abc8d6e6730baee589c two 3e6b46e1b7e3b91acce99f6a823104c28aae0b58 unpacked.t There, only the first, third, and sixth entries are loose, with the remaining set of objects belonging to at least one pack. The implications for this are relatively benign: bare 'git repack' invocations which invoke pack-objects with --unpacked are impacted, and at worst we'll store a few extra objects that should have been excluded. Arguably changing this behavior is a backwards-incompatible change, since it alters the set of objects emitted from rev-list queries with `--objects` and `--unpacked`. But I argue that this change is still sensible, since the existing implementation deviates from clearly-written documentation. The fix here is straightforward: avoid showing any non-commit objects which are contained in packs by discarding them within list-objects.c, before they are shown to the user. Note that similar treatment for `list-objects.c::show_commit()` is not needed, since that case is already handled by `revision.c::get_commit_action()`. Co-authored-by: Jeff King Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- list-objects.c | 3 +++ t/t6000-rev-list-misc.sh | 13 +++++++++++++ 2 files changed, 16 insertions(+) diff --git a/list-objects.c b/list-objects.c index c25c72b32c..c8a5fb998e 100644 --- a/list-objects.c +++ b/list-objects.c @@ -39,6 +39,9 @@ static void show_object(struct traversal_context *ctx, { if (!ctx->show_object) return; + if (ctx->revs->unpacked && has_object_pack(&object->oid)) + return; + ctx->show_object(object, name, ctx->show_data); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 12def7bcbf..6289a2e8b0 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -169,4 +169,17 @@ test_expect_success 'rev-list --count --objects' ' test_line_count = $count actual ' +test_expect_success 'rev-list --unpacked' ' + git repack -ad && + test_commit unpacked && + + git rev-list --objects --no-object-names unpacked^.. >expect.raw && + sort expect.raw >expect && + + git rev-list --all --objects --unpacked --no-object-names >actual.raw && + sort actual.raw >actual && + + test_cmp expect actual +' + test_done From patchwork Mon Nov 6 22:56:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13447580 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A3582FE06 for ; Mon, 6 Nov 2023 22:56:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="XYXG103Y" Received: from mail-qv1-xf32.google.com (mail-qv1-xf32.google.com [IPv6:2607:f8b0:4864:20::f32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3AE7FD6E for ; Mon, 6 Nov 2023 14:56:35 -0800 (PST) Received: by mail-qv1-xf32.google.com with SMTP id 6a1803df08f44-66d0ea3e5b8so32948116d6.0 for ; Mon, 06 Nov 2023 14:56:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1699311394; x=1699916194; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=jR12zIc/T0ZwMYnH0WA7WIxwACKUGUCgecLVT4TLNqg=; b=XYXG103YosjH5cRaynDjd26xvi7Sx5aagad2fs1w9lFOk6BPQa9l3vpkBT9UD3Ra0U GP0cCgB+pZVcl3Fj/G2fTPTejH+B7B+Y5ifpjA0uai4I77neo73asiZ/78PvbpahZ6ii wplW99g4FdFQIaje9QXfyCJvd48P6QdPvQ5AtJIFr85YL+8Z5tWBziYZuq1PdqcUl+oq 8AyAfEaMS3tLVKkEGBdany5i1SppguU69Ns08Ny739clgrg46/HA2Cc3iMWu9wYAfP1l L/h0Q/0tYmxZwceDYfIGYnM7aoVf6o291Y2l5q+mPBdgzgz73mt517EcUTg85z5pJJtE z5Cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699311394; x=1699916194; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=jR12zIc/T0ZwMYnH0WA7WIxwACKUGUCgecLVT4TLNqg=; b=anAMvxD8rpogqBfr8v3nbZfdjFadf/yEhfQKKFkh6ImokP941TzYnuITfcQrDAymBh /8YtPGHBaw+NU38ipCESITchqWjYB/mIt3Wj8zgtjopleYFd5Osc73/fPP74L1mYUOOB TlZnnGukw42cqxbCzH05YVkXgzKPefneQfzKWQfmdG37bNz9uIN2E/gHsD/BSCIkUwXI gPGJNOuT3lKfwRIhiVWx9mSe6RLW0hByW3ixAHn33tGxPKUideOGZN1Z0wSzk7qIduBh ukab00LhUH4KYUrTSB+Jd7wexa/13WAmZylk/LQi2YaOmcGznWY9rQlCk5iA2bQu0963 reHA== X-Gm-Message-State: AOJu0YzhkgmIdW9sLKOc8kjzP6aFfS3ZgVTZTFY2Wbjqet+2KSiIswVI Gsj2Dlfxp64YSSQonCIQGqp9JvDuWnP38iqOD1g/pw== X-Google-Smtp-Source: AGHT+IFSNVlanpLxKwgFYIQ3P54V53N9F81oiXF5lVGcejiMTMbSLRlU5BCk744ZKEgVvOCftLLWRQ== X-Received: by 2002:a05:6214:2507:b0:66d:36fb:474d with SMTP id gf7-20020a056214250700b0066d36fb474dmr39381863qvb.1.1699311394179; Mon, 06 Nov 2023 14:56:34 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id x3-20020a0ceb83000000b0067095b0c473sm3868574qvo.11.2023.11.06.14.56.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 14:56:33 -0800 (PST) Date: Mon, 6 Nov 2023 17:56:33 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Junio C Hamano Subject: [PATCH 2/2] pack-bitmap: drop --unpacked non-commit objects from results Message-ID: <7492dc699052a392d2fb394e1dcfabebac82ded0.1699311386.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When performing revision queries with `--objects` and `--use-bitmap-index`, the output may incorrectly contain objects which are packed, even when the `--unpacked` option is given. This affects traversals, but also other querying operations, like `--count`, `--disk-usage`, etc. Like in the previous commit, the fix is to exclude those objects from the result set before they are shown to the user (or, in this case, before the bitmap containing the result of the traversal is enumerated and its objects listed). This is performed by a new function in pack-bitmap.c, called `filter_packed_objects_from_bitmap()`. Note that we do not have to inspect individual bits in the result bitmap, since we know that the first N (where N is the number of objects in the bitmap's pack/MIDX) bits correspond to objects which packed by definition. In other words, for an object to have a bitmap position (not in the extended index), it must appear in either the bitmap's pack or one of the packs in its MIDX. This presents an appealing optimization to us, which is that we can simply memset() the corresponding number of `eword_t`'s to zero, provided that we handle any objects which spill into the next word (but don't occupy all 64 bits of the word itself). We only have to handle objects in the bitmap's extended index. These objects may (or may not) appear in one or more pack(s). Since these objects are known to not appear in either the bitmap's MIDX or pack, they may be stored as loose, appear in other pack(s), or both. Before returning a bitmap containing the result of the traversal back to the caller, drop any bits from the extended index which appear in one or more packs. This implements the correct behavior for rev-list operations which use the bitmap index to compute their result. Co-authored-by: Jeff King Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 27 +++++++++++++++++++++++++++ t/t6113-rev-list-bitmap-filters.sh | 13 +++++++++++++ t/t6115-rev-list-du.sh | 7 +++++++ 3 files changed, 47 insertions(+) diff --git a/pack-bitmap.c b/pack-bitmap.c index ca8319b87c..0260890341 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1666,6 +1666,30 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter) return !filter_bitmap(NULL, NULL, NULL, filter); } + +static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git, + struct bitmap *result) +{ + struct eindex *eindex = &bitmap_git->ext_index; + uint32_t objects_nr; + size_t i, pos; + + objects_nr = bitmap_num_objects(bitmap_git); + pos = objects_nr / BITS_IN_EWORD; + + if (pos > result->word_alloc) + pos = result->word_alloc; + + memset(result->words, 0x00, sizeof(eword_t) * pos); + for (i = pos * BITS_IN_EWORD; i < objects_nr; i++) + bitmap_unset(result, i); + + for (i = 0; i < eindex->count; ++i) { + if (has_object_pack(&eindex->objects[i]->oid)) + bitmap_unset(result, objects_nr + i); + } +} + struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects) { @@ -1788,6 +1812,9 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, wants_bitmap, &revs->filter); + if (revs->unpacked) + filter_packed_objects_from_bitmap(bitmap_git, wants_bitmap); + bitmap_git->result = wants_bitmap; bitmap_git->haves = haves_bitmap; diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh index 4d8e09167e..86c70521f1 100755 --- a/t/t6113-rev-list-bitmap-filters.sh +++ b/t/t6113-rev-list-bitmap-filters.sh @@ -141,4 +141,17 @@ test_expect_success 'combine filter with --filter-provided-objects' ' done expect.raw && + sort expect.raw >expect && + + git rev-list --use-bitmap-index --objects --all --unpacked >actual.raw && + sort actual.raw >actual && + + test_cmp expect actual +' + test_done diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh index d59111dede..c0cfda62fa 100755 --- a/t/t6115-rev-list-du.sh +++ b/t/t6115-rev-list-du.sh @@ -48,6 +48,13 @@ check_du HEAD check_du --objects HEAD check_du --objects HEAD^..HEAD +test_expect_success 'setup for --unpacked tests' ' + git repack -adb && + test_commit unpacked +' + +check_du --all --objects --unpacked + # As mentioned above, don't use hardcode sizes as actual size, but use the # output from git cat-file. test_expect_success 'rev-list --disk-usage=human' '