Message ID | 38b642d40423f2e14dd47d676de54f264b9d6f22.1721250704.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | midx: incremental multi-pack indexes, part one | expand |
On Wed, Jul 17, 2024 at 05:12:35PM -0400, Taylor Blau wrote: > The function `midx_preferred_pack()` is used to determine the identity > of the preferred pack, which is the identity of a unique pack within > the MIDX which is used as a tie-breaker when selecting from which pack > to represent an object that appears in multiple packs within the MIDX. > > Historically we have said that the MIDX's preferred pack has the unique > property that all objects from that pack are represented in the MIDX. > But that isn't quite true: a more precise statement would be that all > objects from that pack *which appear in the MIDX* are selected from that > pack. > > This helps us extend the concept of preferred packs across a MIDX chain, > where some object(s) in the preferred pack may appear in other packs > in an earlier MIDX layer, in which case those object(s) will not appear > in a subsequent MIDX layer from either the preferred pack or any other > pack. OK, that matches my intuition for how the preferred concept should exist. I'm not quite clear on how that will affect the code, though. > diff --git a/midx.c b/midx.c > index 0fa8febb9d..d2dbea41e4 100644 > --- a/midx.c > +++ b/midx.c > @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) > int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) > { > if (m->preferred_pack_idx == -1) { > + uint32_t midx_pos; > if (load_midx_revindex(m) < 0) { > m->preferred_pack_idx = -2; > return -1; > } > > - m->preferred_pack_idx = > - nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); > + midx_pos = pack_pos_to_midx(m, m->num_objects_in_base); > + > + m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos); > + OK, so rather than looking for the pack of object 0, we're looking for the first one in _this_ layer, since the position is global within the midx. That makes some sense, but is pack_pos_to_midx() ready for that? It looks like it just looks at m->revindex_data. Are we going to be generating a revindex for the whole chain? I'd think that each layer would have its own revindex (and any trickiness would happen at the generation stage, making sure we don't insert objects that are already mentioned in earlier layers). -Peff
On Thu, Aug 01, 2024 at 06:25:17AM -0400, Jeff King wrote: > > diff --git a/midx.c b/midx.c > > index 0fa8febb9d..d2dbea41e4 100644 > > --- a/midx.c > > +++ b/midx.c > > @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) > > int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) > > { > > if (m->preferred_pack_idx == -1) { > > + uint32_t midx_pos; > > if (load_midx_revindex(m) < 0) { > > m->preferred_pack_idx = -2; > > return -1; > > } > > > > - m->preferred_pack_idx = > > - nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); > > + midx_pos = pack_pos_to_midx(m, m->num_objects_in_base); > > + > > + m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos); > > + > > OK, so rather than looking for the pack of object 0, we're looking for > the first one in _this_ layer, since the position is global within the > midx. That makes some sense, but is pack_pos_to_midx() ready for that? > It looks like it just looks at m->revindex_data. Are we going to be > generating a revindex for the whole chain? I'd think that each layer > would have its own revindex (and any trickiness would happen at the > generation stage, making sure we don't insert objects that are already > mentioned in earlier layers). pack_pos_to_midx() is kind of ready, and kind of not. The way that the pseudo-pack order is constructed within the midx-write.c code, we will write reverse indexes (within each MIDX layer itself as a separate chunk) that contain data for each object within that layer in the expected reverse index format. But we don't bother writing any reverse indexes for MIDXs which are incremental at this point in the multi-series plan, since we just bail if the BITMAP flag is set (saying that it is unsupported at this point). Arguably we could have just left this hunk / patch out of the series as a whole. It's this kind of stuff that's really at the boundary between adjacent "phases" that I think is awkward no matter which way you slice it. > -Peff Thanks, Taylor
diff --git a/midx.c b/midx.c index 0fa8febb9d..d2dbea41e4 100644 --- a/midx.c +++ b/midx.c @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) { if (m->preferred_pack_idx == -1) { + uint32_t midx_pos; if (load_midx_revindex(m) < 0) { m->preferred_pack_idx = -2; return -1; } - m->preferred_pack_idx = - nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); + midx_pos = pack_pos_to_midx(m, m->num_objects_in_base); + + m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos); + } else if (m->preferred_pack_idx == -2) return -1; /* no revindex */
The function `midx_preferred_pack()` is used to determine the identity of the preferred pack, which is the identity of a unique pack within the MIDX which is used as a tie-breaker when selecting from which pack to represent an object that appears in multiple packs within the MIDX. Historically we have said that the MIDX's preferred pack has the unique property that all objects from that pack are represented in the MIDX. But that isn't quite true: a more precise statement would be that all objects from that pack *which appear in the MIDX* are selected from that pack. This helps us extend the concept of preferred packs across a MIDX chain, where some object(s) in the preferred pack may appear in other packs in an earlier MIDX layer, in which case those object(s) will not appear in a subsequent MIDX layer from either the preferred pack or any other pack. Extend the concept of preferred packs by using the pack which represents the object at the first position in MIDX pseudo-pack order belonging to the current MIDX layer (i.e., at position 'm->num_objects_in_base'). Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)