Message ID | 94e5c96f6859479e0206d6d775eacf54b3639ee5.1728505840.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New |
Series | pack-bitmap: convert offset to ref deltas where possible |
On Wed, Oct 09, 2024 at 04:31:21PM -0400, Taylor Blau wrote:

> +static uint32_t bitmap_to_pack_pos(struct packed_git *reuse_packfile,
> +				   size_t pos)
> +{
> +	if (bitmap_is_midx(bitmap_git)) {
> +		/*
> +		 * When doing multi-pack reuse on a
> +		 * non-preferred pack, translate bit positions
> +		 * from the MIDX pseudo-pack order back to their
> +		 * pack-relative positions before attempting
> +		 * reuse.
> +		 */
> +		struct multi_pack_index *m = bitmap_midx(bitmap_git);
> +		uint32_t midx_pos, pack_pos;
> +		off_t pack_ofs;
> +
> +		if (!m)
> +			BUG("non-zero bitmap position without MIDX");

The text of this BUG() seems weird: we haven't asserted a non-zero
bitmap position. We're really only checking that bitmap_is_midx() and
bitmap_midx() agree that there is a midx.

I was going to suggest that the former could be implemented with a
NULL-check on the latter, but really, that is already how it works
(except that it accesses bitmap_index->midx directly). So yes, it truly
would be a surprising BUG() to see them disagree. :)

I do not mind keeping the BUG() there if you want to be extra careful,
but I just found the message text confusing.

Ah...hmm. This is all being copied from the earlier function. So I think
the culprit may be patch 6, which swaps:

  if (reuse_packfile->bitmap_pos)

for:

  if (bitmap_is_midx(bitmap_git))

which is what makes the BUG() text confusing. But then, what about this:

> +	} else {
> +		/*
> +		 * Can use bit positions directly, even for MIDX
> +		 * bitmaps. See comment in try_partial_reuse()
> +		 * for why.
> +		 */
> +		return pos;
> +	}
> +}

This "even for MIDX" is not really accurate, as we know this else block
is for the non-midx case. Are we losing the optimization that the first
pack in the midx can be treated the same as the single-pack case (we
know that its pack positions and the start of the midx bit positions are
identical, which is what the comment it mentions explains)?

-Peff
On Fri, Oct 11, 2024 at 04:16:15AM -0400, Jeff King wrote:

> Ah...hmm. This is all being copied from the earlier function. So I think
> the culprit may be patch 6, which swaps:
>
>   if (reuse_packfile->bitmap_pos)
>
> for:
>
>   if (bitmap_is_midx(bitmap_git))
>
> which is what makes the BUG() text confusing. But then, what about this:
>
> > +	} else {
> > +		/*
> > +		 * Can use bit positions directly, even for MIDX
> > +		 * bitmaps. See comment in try_partial_reuse()
> > +		 * for why.
> > +		 */
> > +		return pos;
> > +	}
> > +}
>
> This "even for MIDX" is not really accurate, as we know this else block
> is for the non-midx case. Are we losing the optimization that the first
> pack in the midx can be treated the same as the single-pack case (we
> know that its pack positions and the start of the midx bit positions are
> identical, which is what the comment it mentions explains)?

Great catch. We indeed lost that optimization when converting "if
(reuse_packfile->bitmap_pos)" to "if (bitmap_is_midx(bitmap_git))".

Let's restore that by keeping the conditional unchanged, which:

  - makes the BUG() make sense as written, and

  - preserves the optimization where the first pack in a MIDX can be
    treated the same as if it came from a single-pack bitmap, and
    bypass the bit position translations.

Thanks for spotting.

Thanks,
Taylor
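The effect of keying the conditional on the pack's bitmap position can be illustrated with a minimal, self-contained C sketch. This is a toy model and not git's code: `struct toy_pack`, `toy_bitmap_to_pack_pos()` and the `bit_to_pack_pos` lookup table are hypothetical names introduced here, and the table is only a stand-in for the real `pack_pos_to_midx()` / `nth_midxed_offset()` / `offset_to_pack_pos()` chain. Under those assumptions, it shows that a pack whose first bit is bit 0 (a single-pack bitmap, or the first pack in a MIDX) can use bit positions directly, while any other pack needs a translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a pack described by a bitmap. */
struct toy_pack {
	uint32_t bitmap_pos;             /* first bit this pack owns */
	uint32_t bitmap_nr;              /* number of bits it owns */
	/*
	 * Toy translation table, indexed by (bit - bitmap_pos). It
	 * replaces the MIDX lookups git would perform and is unused
	 * when bitmap_pos is zero.
	 */
	const uint32_t *bit_to_pack_pos;
};

static uint32_t toy_bitmap_to_pack_pos(const struct toy_pack *pack,
				       size_t pos)
{
	if (pack->bitmap_pos) {
		/*
		 * Non-zero bitmap position: this can only be a later
		 * pack in a MIDX, so bit positions must be translated
		 * back to pack-relative positions.
		 */
		assert(pos >= pack->bitmap_pos);
		assert(pos - pack->bitmap_pos < pack->bitmap_nr);
		return pack->bit_to_pack_pos[pos - pack->bitmap_pos];
	}

	/*
	 * Zero bitmap position: either a single-pack bitmap or the
	 * first pack in a MIDX. Bit and pack positions coincide, so
	 * no translation is needed.
	 */
	return (uint32_t)pos;
}
```

With a first pack owning bits 0..3 and a second pack owning bits 4..6, a call for the first pack returns the bit position unchanged, while a call for the second pack goes through the table. Testing `bitmap_pos` rather than "is this a MIDX bitmap" is what lets the first pack skip the translation.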
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 097bb5ac2ca..7f50d58a235 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1017,6 +1017,42 @@ static off_t find_reused_offset(off_t where)
 	return reused_chunks[lo-1].difference;
 }
 
+static uint32_t bitmap_to_pack_pos(struct packed_git *reuse_packfile,
+				   size_t pos)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		/*
+		 * When doing multi-pack reuse on a
+		 * non-preferred pack, translate bit positions
+		 * from the MIDX pseudo-pack order back to their
+		 * pack-relative positions before attempting
+		 * reuse.
+		 */
+		struct multi_pack_index *m = bitmap_midx(bitmap_git);
+		uint32_t midx_pos, pack_pos;
+		off_t pack_ofs;
+
+		if (!m)
+			BUG("non-zero bitmap position without MIDX");
+
+		midx_pos = pack_pos_to_midx(m, pos);
+		pack_ofs = nth_midxed_offset(m, midx_pos);
+
+		if (offset_to_pack_pos(reuse_packfile, pack_ofs, &pack_pos) < 0)
+			BUG("could not find expected object at offset %"PRIuMAX" in pack %s",
+			    (uintmax_t)pack_ofs, pack_basename(reuse_packfile));
+
+		return pack_pos;
+	} else {
+		/*
+		 * Can use bit positions directly, even for MIDX
+		 * bitmaps. See comment in try_partial_reuse()
+		 * for why.
+		 */
+		return pos;
+	}
+}
+
 static void write_reused_pack_one(struct packed_git *reuse_packfile,
 				  size_t pos, struct hashfile *out,
 				  off_t pack_start,
@@ -1025,9 +1061,10 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
 	off_t offset, next, cur;
 	enum object_type type;
 	unsigned long size;
+	uint32_t pack_pos = bitmap_to_pack_pos(reuse_packfile, pos);
 
-	offset = pack_pos_to_offset(reuse_packfile, pos);
-	next = pack_pos_to_offset(reuse_packfile, pos + 1);
+	offset = pack_pos_to_offset(reuse_packfile, pack_pos);
+	next = pack_pos_to_offset(reuse_packfile, pack_pos + 1);
 
 	record_reused_object(offset,
 			     offset - (hashfile_total(out) - pack_start));
@@ -1191,7 +1228,6 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
 			size_t pos = (i * BITS_IN_EWORD);
 
 			for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
-				uint32_t pack_pos;
 				if ((word >> offset) == 0)
 					break;
 
@@ -1201,40 +1237,8 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
 				if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr)
 					goto done;
 
-				if (bitmap_is_midx(bitmap_git)) {
-					/*
-					 * When doing multi-pack reuse on a
-					 * non-preferred pack, translate bit positions
-					 * from the MIDX pseudo-pack order back to their
-					 * pack-relative positions before attempting
-					 * reuse.
-					 */
-					struct multi_pack_index *m = bitmap_midx(bitmap_git);
-					uint32_t midx_pos;
-					off_t pack_ofs;
-
-					if (!m)
-						BUG("non-zero bitmap position without MIDX");
-
-					midx_pos = pack_pos_to_midx(m, pos + offset);
-					pack_ofs = nth_midxed_offset(m, midx_pos);
-
-					if (offset_to_pack_pos(reuse_packfile->p,
-							       pack_ofs, &pack_pos) < 0)
-						BUG("could not find expected object at offset %"PRIuMAX" in pack %s",
-						    (uintmax_t)pack_ofs,
-						    pack_basename(reuse_packfile->p));
-				} else {
-					/*
-					 * Can use bit positions directly, even for MIDX
-					 * bitmaps. See comment in try_partial_reuse()
-					 * for why.
-					 */
-					pack_pos = pos + offset;
-				}
-
-				write_reused_pack_one(reuse_packfile->p, pack_pos, f,
-						      pack_start, &w_curs);
+				write_reused_pack_one(reuse_packfile->p, pos + offset,
+						      f, pack_start, &w_curs);
 				display_progress(progress_state, ++written);
 			}
 		}
A future commit will want to deal with bit positions instead of pack
positions from within builtin/pack-objects.c::write_reused_pack_one().

That function at present takes a pack position, so one approach to
accommodating the new functionality would be to add a secondary bit
position parameter, making the function's declaration look something
like:

    static void write_reused_pack_one(struct packed_git *reuse_packfile,
                                      size_t pack_pos, size_t bitmap_pos,
                                      struct hashfile *out,
                                      off_t pack_start,
                                      struct pack_window **w_curs);

But because the pack-relative position can be easily derived from the
bit position, it makes sense to just pass the latter and let the
function itself translate it into a pack-relative position.

To do this, extract a new function `bitmap_to_pack_pos()` from the
existing `write_reused_pack()` function. This new routine is responsible
for performing the conversion from bitmap- to pack-relative positions.
Instead of performing that translation in `write_reused_pack()`, call
the new function from within `write_reused_pack_one()` so that we can
just pass a single bit position to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 78 ++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 37 deletions(-)
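The shape of the refactoring described in the commit message can be sketched in isolation. The following is a simplified toy under stated assumptions and not the code from the patch: `struct toy_pack`, `toy_bitmap_to_pack_pos()` and `toy_reused_object_len()` are hypothetical names, the `offsets` array stands in for `pack_pos_to_offset()`, and the translation is reduced to a table lookup keyed on a non-zero `bitmap_pos`, following the review discussion. The point is only that the per-object routine accepts a bit position and derives the pack position itself, so the caller no longer passes both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy pack; none of these fields mirror git's structs. */
struct toy_pack {
	uint32_t bitmap_pos;             /* first bit this pack owns */
	uint32_t bitmap_nr;              /* number of bits it owns */
	const uint32_t *bit_to_pack_pos; /* toy table, unused if bitmap_pos == 0 */
	const uint64_t *offsets;         /* offsets[i]: start of i-th object;
					  * offsets[nr_objects]: end of data */
	uint32_t nr_objects;
};

/* Toy counterpart of the extracted conversion helper. */
static uint32_t toy_bitmap_to_pack_pos(const struct toy_pack *p, size_t bit_pos)
{
	if (!p->bitmap_pos)
		return (uint32_t)bit_pos; /* positions coincide */

	assert(bit_pos >= p->bitmap_pos);
	assert(bit_pos - p->bitmap_pos < p->bitmap_nr);
	return p->bit_to_pack_pos[bit_pos - p->bitmap_pos];
}

/*
 * Toy counterpart of the per-object routine: it takes only a bit
 * position, translates it once, and then works with pack positions
 * to find where the object starts and where the next one begins.
 */
static uint64_t toy_reused_object_len(const struct toy_pack *p, size_t bit_pos)
{
	uint32_t pack_pos = toy_bitmap_to_pack_pos(p, bit_pos);
	uint64_t offset, next;

	assert(pack_pos < p->nr_objects);
	offset = p->offsets[pack_pos];
	next = p->offsets[pack_pos + 1];

	return next - offset;
}
```

Because the translation lives inside the per-object routine, a later change that also needs the bit position has it available there without widening the function's parameter list.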