Message ID | 337ebc6de7bdf6ff3b4b09c2bea3df2802174e8b.1721250704.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | midx: incremental multi-pack indexes, part one | expand |
On Wed, Jul 17, 2024 at 05:12:01PM -0400, Taylor Blau wrote: > The incremental MIDX chain feature is designed around the idea of > indexing into a concatenated lexicographic ordering of object IDs > present in the MIDX. > > When given an object position, the MIDX machinery needs to be able to > locate both (a) which MIDX layer contains the given object, and (b) at > what position *within that MIDX layer* that object appears. > > To do this, three new fields are added to the `struct multi_pack_index`: > > - struct multi_pack_index *base_midx; > - uint32_t num_objects_in_base; > - uint32_t num_packs_in_base; > > These three fields store the pieces of information suggested by their > respective field names. In turn, the `num_objects_in_base` and > `num_packs_in_base` fields are used to crawl backwards along the > `base_midx` pointer to locate the appropriate position for a given > object within the MIDX that contains it. OK, so base_midx is a back-pointer. I think in theory you could compute num_objects_in_base on the fly by doing that crawl yourself, but we'd want to be able to do it in constant time, rather than O(# of midx)? Makes sense. -Peff
On Thu, Aug 01, 2024 at 05:21:42AM -0400, Jeff King wrote: > On Wed, Jul 17, 2024 at 05:12:01PM -0400, Taylor Blau wrote: > > > The incremental MIDX chain feature is designed around the idea of > > indexing into a concatenated lexicographic ordering of object IDs > > present in the MIDX. > > > > When given an object position, the MIDX machinery needs to be able to > > locate both (a) which MIDX layer contains the given object, and (b) at > > what position *within that MIDX layer* that object appears. > > > > To do this, three new fields are added to the `struct multi_pack_index`: > > > > - struct multi_pack_index *base_midx; > > - uint32_t num_objects_in_base; > > - uint32_t num_packs_in_base; > > > > These three fields store the pieces of information suggested by their > > respective field names. In turn, the `num_objects_in_base` and > > `num_packs_in_base` fields are used to crawl backwards along the > > `base_midx` pointer to locate the appropriate position for a given > > object within the MIDX that contains it. > > OK, so base_midx is a back-pointer. I think in theory you could compute > num_objects_in_base on the fly by doing that crawl yourself, but we'd > want to be able to do it in constant time, rather than O(# of midx)? > > Makes sense. Yep. As you have seen in the later patches in this series, we end up needing to read num_objects_in_base and num_packs_in_base quite frequently, so it's nice to have them precomputed. We could compute them lazily, but they're easy to build up in midx.c::add_midx_to_chain(), where we're already validating that, for e.g.: if (unsigned_add_overflows(midx_chain->num_packs, midx_chain->num_packs_in_base)) { /* ... */ } midx->num_packs_in_base = midx_chain->num_packs + midx_chain->num_packs_in_base; > -Peff Thanks, Taylor
diff --git a/midx.h b/midx.h index 8554f2d616..020e49f77c 100644 --- a/midx.h +++ b/midx.h @@ -63,6 +63,10 @@ struct multi_pack_index { const unsigned char *chunk_revindex; size_t chunk_revindex_len; + struct multi_pack_index *base_midx; + uint32_t num_objects_in_base; + uint32_t num_packs_in_base; + const char **pack_names; struct packed_git **packs; char object_dir[FLEX_ARRAY];
The incremental MIDX chain feature is designed around the idea of indexing into a concatenated lexicographic ordering of object IDs present in the MIDX. When given an object position, the MIDX machinery needs to be able to locate both (a) which MIDX layer contains the given object, and (b) at what position *within that MIDX layer* that object appears. To do this, three new fields are added to the `struct multi_pack_index`: - struct multi_pack_index *base_midx; - uint32_t num_objects_in_base; - uint32_t num_packs_in_base; These three fields store the pieces of information suggested by their respective field names. In turn, the `num_objects_in_base` and `num_packs_in_base` fields are used to crawl backwards along the `base_midx` pointer to locate the appropriate position for a given object within the MIDX that contains it. The following commits will update various parts of the MIDX machinery (as well as their callers from outside of midx.c and midx-write.c) to be aware and make use of these fields when performing object lookups. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.h | 4 ++++ 1 file changed, 4 insertions(+)