[07/11] write_reused_pack_one(): translate bit positions directly

Message ID 94e5c96f6859479e0206d6d775eacf54b3639ee5.1728505840.git.me@ttaylorr.com (mailing list archive)
State New
Series pack-bitmap: convert offset to ref deltas where possible

Commit Message

Taylor Blau Oct. 9, 2024, 8:31 p.m. UTC
A future commit will want to deal with bit positions instead of pack
positions from within builtin/pack-objects.c::write_reused_pack_one().

That function at present takes a pack position, so one approach to
accommodating the new functionality would be to add a secondary bit
position parameter, making the function's declaration look something
like:

    static void write_reused_pack_one(struct packed_git *reuse_packfile,
                                      size_t pack_pos, size_t bitmap_pos,
                                      struct hashfile *out,
                                      off_t pack_start,
                                      struct pack_window **w_curs);

But because the pack-relative position can be easily derived from the
bit position, it makes sense to just pass the latter and let the
function itself translate it into a pack-relative position.

To do this, extract a new function `bitmap_to_pack_pos()` from the
existing `write_reused_pack()` function. This new routine is responsible
for performing the conversion from bitmap- to pack-relative positions.

Rather than performing that translation in `write_reused_pack()`,
call the new function from within `write_reused_pack_one()` so that we
can just pass a single bit position to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 78 ++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 37 deletions(-)
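
For readers following along, the shape of the refactoring can be sketched
with a toy model. This is only an illustration under stated assumptions,
not the code from the patch: `struct toy_pack`, `toy_offset_to_pack_pos()`,
`toy_bitmap_to_pack_pos()`, `toy_write_one()` and the `bit_to_offset` table
are all hypothetical stand-ins for the real packfile, reverse-index and MIDX
lookups. The point it shows is that the caller passes only a bit position
and the callee derives the pack-relative position from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a packfile: object offsets in ascending
 * order, so an object's index in this array is its pack position.
 */
struct toy_pack {
	const uint64_t *offsets;
	size_t nr;
};

/* Binary search for the object starting at `ofs`; 0 on success, -1 if absent. */
static int toy_offset_to_pack_pos(const struct toy_pack *p, uint64_t ofs,
				  uint32_t *pos)
{
	size_t lo = 0, hi = p->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (p->offsets[mid] == ofs) {
			*pos = (uint32_t)mid;
			return 0;
		}
		if (p->offsets[mid] < ofs)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/*
 * bit_to_offset[i] is the (assumed) offset of the object that bit i
 * refers to. NULL models the case where bit positions already are
 * pack positions and no translation is needed.
 */
static uint32_t toy_bitmap_to_pack_pos(const struct toy_pack *p,
				       const uint64_t *bit_to_offset,
				       size_t pos)
{
	uint32_t pack_pos = 0;

	if (!bit_to_offset)
		return (uint32_t)pos;
	if (toy_offset_to_pack_pos(p, bit_to_offset[pos], &pack_pos) < 0)
		abort(); /* plays the role of BUG() in this sketch */
	return pack_pos;
}

/* The caller hands over a bit position only; translation happens here. */
static uint64_t toy_write_one(const struct toy_pack *p,
			      const uint64_t *bit_to_offset, size_t pos)
{
	uint32_t pack_pos = toy_bitmap_to_pack_pos(p, bit_to_offset, pos);

	return p->offsets[pack_pos]; /* offset of the object to copy */
}
```

With this arrangement the loop that walks the bitmap never computes a pack
position itself, which is the property the commit message is after.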

Comments

Jeff King Oct. 11, 2024, 8:16 a.m. UTC | #1
On Wed, Oct 09, 2024 at 04:31:21PM -0400, Taylor Blau wrote:

> +static uint32_t bitmap_to_pack_pos(struct packed_git *reuse_packfile,
> +				   size_t pos)
> +{
> +	if (bitmap_is_midx(bitmap_git)) {
> +		/*
> +		 * When doing multi-pack reuse on a
> +		 * non-preferred pack, translate bit positions
> +		 * from the MIDX pseudo-pack order back to their
> +		 * pack-relative positions before attempting
> +		 * reuse.
> +		 */
> +		struct multi_pack_index *m = bitmap_midx(bitmap_git);
> +		uint32_t midx_pos, pack_pos;
> +		off_t pack_ofs;
> +
> +		if (!m)
> +			BUG("non-zero bitmap position without MIDX");

The text of this BUG() seems weird: we haven't asserted a non-zero
bitmap position. We're really only checking that bitmap_is_midx() and
bitmap_midx() agree that there is a midx. I was going to suggest that
the former could be implemented with a NULL-check on the latter, but
really, that is already how it works (except that it accesses
bitmap_index->midx directly).

So yes, it truly would be a surprising BUG() to see them disagree. :)

I do not mind keeping the BUG() there if you want to be extra careful,
but I just found the message text confusing.

Ah...hmm. This is all being copied from the earlier function. So I think
the culprit may be patch 6, which swaps:

  if (reuse_packfile->bitmap_pos)

for:

  if (bitmap_is_midx(bitmap_git))

which is what makes the BUG() text confusing. But then, what about this:

> +	} else {
> +		/*
> +		 * Can use bit positions directly, even for MIDX
> +		 * bitmaps. See comment in try_partial_reuse()
> +		 * for why.
> +		 */
> +		return pos;
> +	}
> +}

This "even for MIDX" is not really accurate, as we know this else block
is for the non-midx case. Are we losing the optimization that the first
pack in the midx can be treated the same as the single-pack case (we
know that its pack positions and the start of the midx bit positions are
identical, which is what the comment it mentions explains)?

-Peff

Taylor Blau Nov. 4, 2024, 8:36 p.m. UTC | #2
On Fri, Oct 11, 2024 at 04:16:15AM -0400, Jeff King wrote:
> Ah...hmm. This is all being copied from the earlier function. So I think
> the culprit may be patch 6, which swaps:
>
>   if (reuse_packfile->bitmap_pos)
>
> for:
>
>   if (bitmap_is_midx(bitmap_git))
>
> which is what makes the BUG() text confusing. But then, what about this:
>
> > +	} else {
> > +		/*
> > +		 * Can use bit positions directly, even for MIDX
> > +		 * bitmaps. See comment in try_partial_reuse()
> > +		 * for why.
> > +		 */
> > +		return pos;
> > +	}
> > +}
>
> This "even for MIDX" is not really accurate, as we know this else block
> is for the non-midx case. Are we losing the optimization that the first
> pack in the midx can be treated the same as the single-pack case (we
> know that its pack positions and the start of the midx bit positions are
> identical, which is what the comment it mentions explains)?

Great catch.

We indeed lost that optimization when converting "if
(reuse_packfile->bitmap_pos)" to "if (bitmap_is_midx(bitmap_git))".
Let's restore that by keeping the conditional unchanged, which:

  - makes the BUG() make sense as written, and

  - preserves the optimization where the first pack in a MIDX can be
    treated the same as if it came from a single-pack bitmap, and
    bypass the bit position translations.
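
As a rough illustration of why the choice of conditional matters, here is a
toy model, not the actual follow-up patch. It assumes a hypothetical
`struct toy_bitmapped_pack` whose `bitmap_pos` is zero for the first
(preferred) pack, and a made-up `lookup` table standing in for the MIDX
translation. Branching on `bitmap_pos` lets the first pack skip the
translation entirely, whereas branching on "is this a MIDX bitmap" would
send every pack through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pack covered by a multi-pack bitmap. */
struct toy_bitmapped_pack {
	size_t bitmap_pos;      /* first bit owned by this pack; 0 for the first pack */
	const uint32_t *lookup; /* made-up table: (bit - bitmap_pos) -> pack position */
};

/* Counts how often the (comparatively expensive) translation runs. */
static unsigned toy_translations;

static uint32_t toy_translate(const struct toy_bitmapped_pack *p, size_t pos)
{
	toy_translations++;
	return p->lookup[pos - p->bitmap_pos];
}

static uint32_t toy_bitmap_to_pack_pos(const struct toy_bitmapped_pack *p,
				       size_t pos)
{
	/*
	 * Branch on the pack's starting bit rather than on whether the
	 * bitmap belongs to a MIDX: a pack starting at bit 0 has bit
	 * positions identical to its pack positions.
	 */
	if (p->bitmap_pos)
		return toy_translate(p, pos);
	return (uint32_t)pos;
}
```

In this sketch a pack with `bitmap_pos == 0` never touches the lookup table,
which mirrors the single-pack fast path described in the bullets above.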

Thanks for spotting.

Thanks,
Taylor

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 097bb5ac2ca..7f50d58a235 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1017,6 +1017,42 @@  static off_t find_reused_offset(off_t where)
 	return reused_chunks[lo-1].difference;
 }
 
+static uint32_t bitmap_to_pack_pos(struct packed_git *reuse_packfile,
+				   size_t pos)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		/*
+		 * When doing multi-pack reuse on a
+		 * non-preferred pack, translate bit positions
+		 * from the MIDX pseudo-pack order back to their
+		 * pack-relative positions before attempting
+		 * reuse.
+		 */
+		struct multi_pack_index *m = bitmap_midx(bitmap_git);
+		uint32_t midx_pos, pack_pos;
+		off_t pack_ofs;
+
+		if (!m)
+			BUG("non-zero bitmap position without MIDX");
+
+		midx_pos = pack_pos_to_midx(m, pos);
+		pack_ofs = nth_midxed_offset(m, midx_pos);
+
+		if (offset_to_pack_pos(reuse_packfile, pack_ofs, &pack_pos) < 0)
+			BUG("could not find expected object at offset %"PRIuMAX" in pack %s",
+			    (uintmax_t)pack_ofs, pack_basename(reuse_packfile));
+
+		return pack_pos;
+	} else {
+		/*
+		 * Can use bit positions directly, even for MIDX
+		 * bitmaps. See comment in try_partial_reuse()
+		 * for why.
+		 */
+		return pos;
+	}
+}
+
 static void write_reused_pack_one(struct packed_git *reuse_packfile,
 				  size_t pos, struct hashfile *out,
 				  off_t pack_start,
@@ -1025,9 +1061,10 @@  static void write_reused_pack_one(struct packed_git *reuse_packfile,
 	off_t offset, next, cur;
 	enum object_type type;
 	unsigned long size;
+	uint32_t pack_pos = bitmap_to_pack_pos(reuse_packfile, pos);
 
-	offset = pack_pos_to_offset(reuse_packfile, pos);
-	next = pack_pos_to_offset(reuse_packfile, pos + 1);
+	offset = pack_pos_to_offset(reuse_packfile, pack_pos);
+	next = pack_pos_to_offset(reuse_packfile, pack_pos + 1);
 
 	record_reused_object(offset,
 			     offset - (hashfile_total(out) - pack_start));
@@ -1191,7 +1228,6 @@  static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
 		size_t pos = (i * BITS_IN_EWORD);
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
-			uint32_t pack_pos;
 			if ((word >> offset) == 0)
 				break;
 
@@ -1201,40 +1237,8 @@  static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
 			if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr)
 				goto done;
 
-			if (bitmap_is_midx(bitmap_git)) {
-				/*
-				 * When doing multi-pack reuse on a
-				 * non-preferred pack, translate bit positions
-				 * from the MIDX pseudo-pack order back to their
-				 * pack-relative positions before attempting
-				 * reuse.
-				 */
-				struct multi_pack_index *m = bitmap_midx(bitmap_git);
-				uint32_t midx_pos;
-				off_t pack_ofs;
-
-				if (!m)
-					BUG("non-zero bitmap position without MIDX");
-
-				midx_pos = pack_pos_to_midx(m, pos + offset);
-				pack_ofs = nth_midxed_offset(m, midx_pos);
-
-				if (offset_to_pack_pos(reuse_packfile->p,
-						       pack_ofs, &pack_pos) < 0)
-					BUG("could not find expected object at offset %"PRIuMAX" in pack %s",
-					    (uintmax_t)pack_ofs,
-					    pack_basename(reuse_packfile->p));
-			} else {
-				/*
-				 * Can use bit positions directly, even for MIDX
-				 * bitmaps. See comment in try_partial_reuse()
-				 * for why.
-				 */
-				pack_pos = pos + offset;
-			}
-
-			write_reused_pack_one(reuse_packfile->p, pack_pos, f,
-					      pack_start, &w_curs);
+			write_reused_pack_one(reuse_packfile->p, pos + offset,
+					      f, pack_start, &w_curs);
 			display_progress(progress_state, ++written);
 		}
 	}