diff mbox series

pack-bitmap.c: typofix in `find_boundary_objects()`

Message ID cf49115db4e8dcd406a17c946659c2eef3ec6045.1732229420.git.me@ttaylorr.com (mailing list archive)
State New
Headers show
Series pack-bitmap.c: typofix in `find_boundary_objects()` | expand

Commit Message

Taylor Blau Nov. 21, 2024, 10:50 p.m. UTC
In the boundary-based bitmap traversal, we use the given 'rev_info'
structure to first do a commit-only walk in order to determine the
boundary between interesting and uninteresting objects.

That walk only looks at commit objects, regardless of the state of
revs->blob_objects, revs->tree_objects, and so on. In order to do this,
we store the state of these variables in temporary fields before
setting them back to zero, performing the traversal, and then setting
them back.

But there is a typo here that dates back to b0afdce5da (pack-bitmap.c:
use commit boundary during bitmap traversal, 2023-05-08), where we
incorrectly store the value of the "tags" field as "revs->blob_objects".

This could lead to problems later on if, say, the caller wants tag
objects but *not* blob objects. In the pre-image behavior, we'd set
revs->tag_objects back to the old value of revs->blob_objects, thus
emitting fewer objects than expected back to the caller.

Fix that by correctly assigning the value of 'revs->tag_objects' to the
'tmp_tags' field.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Noticed while I was looking for an example of this pattern somewhere in
the codebase and was surprised when I found this typo ;-).

 pack-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 4083a6f05206077a50af7658bedc17a94c54607d
--
2.47.0.237.gc601277f4c4

Comments

Junio C Hamano Nov. 22, 2024, 12:39 a.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> Fix that by correctly assigning the value of 'revs->tag_objects' to the
> 'tmp_tags' field.

Makes sense.  This breakage would make no difference in practice
right now, as {type}_objects members of the rev_info structure have
always been all flipped on together since their inception at
3c90f03d (Prepare git-rev-list for tracking tag objects too,
2005-06-29), so the original value of the tag_objects member is
always the same as that of the blob_objects member.

A possible alternative "fix" for this typo could be to unify these
{type}_objects members into a single .non_commit_objects member in
the rev_info structure; given that we (as far as I remember) never
utilized the "feature" that, say, we can ask only for trees but not
blobs for the past ~20 years, nobody knows if the apparent "support"
for that feature is subtly broken in multiple ways (and one of them
you just fixed with this patch) and the "feature" may not be worth
keeping.

But neverhteless, this is a correct thing to do unless we decide to
rip out the support for individual types.  Will queue.

Thanks.



>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> Noticed while I was looking for an example of this pattern somewhere in
> the codebase and was surprised when I found this typo ;-).
>
>  pack-bitmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 4fa9dfc771..683f467051 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1270,7 +1270,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
>
>  	tmp_blobs = revs->blob_objects;
>  	tmp_trees = revs->tree_objects;
> -	tmp_tags = revs->blob_objects;
> +	tmp_tags = revs->tag_objects;
>  	revs->blob_objects = 0;
>  	revs->tree_objects = 0;
>  	revs->tag_objects = 0;
>
> base-commit: 4083a6f05206077a50af7658bedc17a94c54607d
> --
> 2.47.0.237.gc601277f4c4
Jeff King Nov. 22, 2024, 3:41 p.m. UTC | #2
On Fri, Nov 22, 2024 at 09:39:07AM +0900, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > Fix that by correctly assigning the value of 'revs->tag_objects' to the
> > 'tmp_tags' field.
> 
> Makes sense.  This breakage would make no difference in practice
> right now, as {type}_objects members of the rev_info structure have
> always been all flipped on together since their inception at
> 3c90f03d (Prepare git-rev-list for tracking tag objects too,
> 2005-06-29), so the original value of the tag_objects member is
> always the same as that of the blob_objects member.

Yep, I concur.

> A possible alternative "fix" for this typo could be to unify these
> {type}_objects members into a single .non_commit_objects member in
> the rev_info structure; given that we (as far as I remember) never
> utilized the "feature" that, say, we can ask only for trees but not
> blobs for the past ~20 years, nobody knows if the apparent "support"
> for that feature is subtly broken in multiple ways (and one of them
> you just fixed with this patch) and the "feature" may not be worth
> keeping.

I have been tempted to do that, too. But FWIW, I do remember
implementing something that set some but not all of the fields in a
series. Digging in my old branches, I think it may have been for a
reachability check for remote git-archive, where we want to dig only as
deep as the object we are looking for (so if we are finding out if a
tree is reachable, we do not need to ask about blobs, and looking for a
tag needs neither trees nor blobs).

If no such code made it into the project in those 20 years, it may not
worth worrying about too much. But I wanted to point it out as a
plausible use case.

> But neverhteless, this is a correct thing to do unless we decide to
> rip out the support for individual types.  Will queue.

Agreed.

-Peff
Jeff King Nov. 22, 2024, 3:44 p.m. UTC | #3
On Fri, Nov 22, 2024 at 10:41:16AM -0500, Jeff King wrote:

> > A possible alternative "fix" for this typo could be to unify these
> > {type}_objects members into a single .non_commit_objects member in
> > the rev_info structure; given that we (as far as I remember) never
> > utilized the "feature" that, say, we can ask only for trees but not
> > blobs for the past ~20 years, nobody knows if the apparent "support"
> > for that feature is subtly broken in multiple ways (and one of them
> > you just fixed with this patch) and the "feature" may not be worth
> > keeping.
> 
> I have been tempted to do that, too. But FWIW, I do remember
> implementing something that set some but not all of the fields in a
> series. Digging in my old branches, I think it may have been for a
> reachability check for remote git-archive, where we want to dig only as
> deep as the object we are looking for (so if we are finding out if a
> tree is reachable, we do not need to ask about blobs, and looking for a
> tag needs neither trees nor blobs).
> 
> If no such code made it into the project in those 20 years, it may not
> worth worrying about too much. But I wanted to point it out as a
> plausible use case.

There are also two curious cases added by f18b512bbb (bundle: create
filtered bundles, 2022-03-09) that set tree/blob, but not "tag". Not
sure if that is a bug or not.

-Peff
Junio C Hamano Nov. 25, 2024, 2:29 a.m. UTC | #4
Jeff King <peff@peff.net> writes:

> There are also two curious cases added by f18b512bbb (bundle: create
> filtered bundles, 2022-03-09) that set tree/blob, but not "tag". Not
> sure if that is a bug or not.

I do not think it is a good idea to single out bundles, but I can
sort of see how it could be useful when making a partial clone that
does not have blobs. `git pack-objects --objects=tag,commit,tree`
might want to be driven when preparing pack data to populate such a
clone initially.  Unifying these three object flags into "all the
others" is tempting but would close the door for such a tweak.
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4fa9dfc771..683f467051 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1270,7 +1270,7 @@  static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,

 	tmp_blobs = revs->blob_objects;
 	tmp_trees = revs->tree_objects;
-	tmp_tags = revs->blob_objects;
+	tmp_tags = revs->tag_objects;
 	revs->blob_objects = 0;
 	revs->tree_objects = 0;
 	revs->tag_objects = 0;