Message ID | afa25c8806b0e80f1d3ed46f29eb164cab4583ac.1724793201.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Accepted |
Commit | db40e3c92b9ccd03d2263b12cf1824ffab8a1cce |
Headers | show |
Series | pack-objects: brown-paper-bag fixes for multi-pack reuse | expand |
Taylor Blau <me@ttaylorr.com> writes: > @@ -2055,17 +2055,18 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git, > struct bitmapped_pack *pack, > size_t bitmap_pos, > uint32_t pack_pos, > + off_t offset, > struct bitmap *reuse, > struct pack_window **w_curs) > { > - off_t offset, delta_obj_offset; > + off_t delta_obj_offset; > enum object_type type; > unsigned long size; > > if (pack_pos >= pack->p->num_objects) > return -1; /* not actually in the pack */ > > - offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos); We are now passing redundant piece of information "offset" that could be derived from two other parameters, which feels a bit icky, I suspect that try_partial_reuse() can be taught not to require pack_pos and instead work entirely on offset - The caller can pass a very large offset to represent "not actually in the pack" early return condition. - The logic to punt on a delta pointing backwards can be done by comparing the base_offset and offset, instead of comparing the base_pos and pack_pos. but it may not be worth the effort, unless we are going to do similar clean-up globally in all the codepaths that access packfiles. Thanks.
On Wed, Sep 04, 2024 at 11:54:18AM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > @@ -2055,17 +2055,18 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git, > > struct bitmapped_pack *pack, > > size_t bitmap_pos, > > uint32_t pack_pos, > > + off_t offset, > > struct bitmap *reuse, > > struct pack_window **w_curs) > > { > > - off_t offset, delta_obj_offset; > > + off_t delta_obj_offset; > > enum object_type type; > > unsigned long size; > > > > if (pack_pos >= pack->p->num_objects) > > return -1; /* not actually in the pack */ > > > > - offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos); > > We are now passing redundant piece of information "offset" that > could be derived from two other parameters, which feels a bit icky, Yeah, I agree. The only spot we use pack_pos in this function is in the hunk above, and in the check where we reject any deltas whose base has a greater than or equal to bit position than the delta object's bit position. But I think both of those are redundant. The only caller in reuse_partial_packfile_from_bitmap_1() only passes bit positions that are set, so we'll never trip the bounds check above. For the latter check, it would suffice to compare object offsets in the single-pack case, since the bit positions corresponding to objects in a single pack are derived from the ordering of their actual offset. > - The logic to punt on a delta pointing backwards can be done by > comparing the base_offset and offset, instead of comparing the > base_pos and pack_pos. Ah, yes -- exactly ;-). I should have read the email in its entirety before starting to respond. > but it may not be worth the effort, unless we are going to do > similar clean-up globally in all the codepaths that access > packfiles. I'm not so sure we can do it in all cases, but at least in the case of try_partial_reuse(), doing something like this (which compiles and passes "make test") is fine: --- 8< --- diff --git a/pack-bitmap.c b/pack-bitmap.c index 9d9b8c4bfb..fc51f298d5 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2054,7 +2054,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, static int try_partial_reuse(struct bitmap_index *bitmap_git, struct bitmapped_pack *pack, size_t bitmap_pos, - uint32_t pack_pos, off_t offset, struct bitmap *reuse, struct pack_window **w_curs) @@ -2063,9 +2062,6 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git, enum object_type type; unsigned long size; - if (pack_pos >= pack->p->num_objects) - return -1; /* not actually in the pack */ - delta_obj_offset = offset; type = unpack_object_header(pack->p, w_curs, &offset, &size); if (type < 0) @@ -2121,8 +2117,13 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git, * OFS_DELTA is the default). But let's double check to * make sure the pack wasn't written with odd * parameters. + * + * Since we're working on a single-pack bitmap, we can + * use the object offset as a proxy for the bit + * position, since the bits are ordered by their + * positions within the pack. */ - if (base_pos >= pack_pos) + if (base_offset >= offset) return 0; base_bitmap_pos = pack->bitmap_pos + base_pos; } @@ -2218,8 +2219,8 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git ofs = pack_pos_to_offset(pack->p, pack_pos); } - if (try_partial_reuse(bitmap_git, pack, bit_pos, - pack_pos, ofs, reuse, &w_curs) < 0) { + if (try_partial_reuse(bitmap_git, pack, bit_pos, ofs, + reuse, &w_curs) < 0) { /* * try_partial_reuse indicated we couldn't reuse * any bits, so there is no point in trying more --- >8 --- Thanks, Taylor
diff --git a/pack-bitmap.c b/pack-bitmap.c index 218d7ac2eb..9d9b8c4bfb 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2055,17 +2055,18 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git, struct bitmapped_pack *pack, size_t bitmap_pos, uint32_t pack_pos, + off_t offset, struct bitmap *reuse, struct pack_window **w_curs) { - off_t offset, delta_obj_offset; + off_t delta_obj_offset; enum object_type type; unsigned long size; if (pack_pos >= pack->p->num_objects) return -1; /* not actually in the pack */ - offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos); + delta_obj_offset = offset; type = unpack_object_header(pack->p, w_curs, &offset, &size); if (type < 0) return -1; /* broken packfile, punt */ @@ -2184,6 +2185,7 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git for (offset = 0; offset < BITS_IN_EWORD; offset++) { size_t bit_pos; uint32_t pack_pos; + off_t ofs; if (word >> offset == 0) break; @@ -2198,7 +2200,6 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git if (bitmap_is_midx(bitmap_git)) { uint32_t midx_pos; - off_t ofs; midx_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos); ofs = nth_midxed_offset(bitmap_git->midx, midx_pos); @@ -2213,10 +2214,12 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git BUG("advanced beyond the end of pack %s (%"PRIuMAX" > %"PRIu32")", pack_basename(pack->p), (uintmax_t)pack_pos, pack->p->num_objects); + + ofs = pack_pos_to_offset(pack->p, pack_pos); } if (try_partial_reuse(bitmap_git, pack, bit_pos, - pack_pos, reuse, &w_curs) < 0) { + pack_pos, ofs, reuse, &w_curs) < 0) { /* * try_partial_reuse indicated we couldn't reuse * any bits, so there is no point in trying more
When calling `try_partial_reuse()`, the (sole) caller from the function `reuse_partial_packfile_from_bitmap_1()` has to translate its bit position to a pack position. In the MIDX bitmap case, the caller translates from the bit position, to a position in the MIDX's pseudo-pack order (with `pack_pos_to_midx()`), then get a pack offset (with `nth_midxed_offset()`) before finally working backwards to get the pack position in the source pack by calling `offset_to_pack_pos()`. In the non-MIDX bitmap case, we can use the bit position as the pack position directly (see the comment at the beginning of the `reuse_partial_packfile_from_bitmap_1()` function for why). In either case, the first thing that `try_partial_reuse()` does after being called is determine the offset of the object at the given pack position by calling `pack_pos_to_offset()`. But we already have that information in the MIDX case! Avoid re-computing that information by instead passing it in. In the MIDX case, we already have that information stored. In the non-MIDX case, the call to `pack_pos_to_offset()` moves from the function `try_partial_reuse()` to its caller. In total, we'll save one call to `pack_pos_to_offset()` when processing MIDX bitmaps. (On my machine, there is a slight speed-up on the order of ~2ms, but it is within the margin of error over 10 runs, so I think you'd have to have a truly gigantic repository to confidently measure any significant improvement here). Signed-off-by: Taylor Blau <me@ttaylorr.com> --- pack-bitmap.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)