Message ID | ec57ff434900f2b95e31fbdf854b5ebbf46b5c78.1721250704.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | midx: incremental multi-pack indexes, part one | expand |
On Wed, Jul 17, 2024 at 05:12:10PM -0400, Taylor Blau wrote: > The function `nth_midxed_object_oid()` returns the object ID for a given > object position in the MIDX lexicographic order. > > Teach this function to instead operate over the concatenated > lexicographic order defined in an earlier step so that it is able to be > used with incremental MIDXs. > > To do this, we need to both (a) adjust the bounds check for the given > 'n', as well as record the MIDX-local position after chasing the > `->base_midx` pointer to find the MIDX which contains that object. Yep, this makes sense. The hard thing about reviewing this, I think, is that each individual step like this is going to make sense, but I'll have very little clue what spots (if any) were missed. To some degree I think the proof will be in the pudding. If you missed any helpers, then the end result is going to crash and burn quite badly when used with a chained midx, and we'd see it in the test suite. And the nice thing is that most of this is abstracted inside these helpers, so we know the set of tricky places is generally limited to the helpers, and not arbitrary bits of midx code. -Peff
On Thu, Aug 01, 2024 at 05:38:05AM -0400, Jeff King wrote: > To some degree I think the proof will be in the pudding. If you missed > any helpers, then the end result is going to crash and burn quite badly > when used with a chained midx, and we'd see it in the test suite. And > the nice thing is that most of this is abstracted inside these helpers, > so we know the set of tricky places is generally limited to the helpers, > and not arbitrary bits of midx code. I think back in the old days, I might have considered experimenting with GitHub's fork of Git to build up this feature before sharing it with the list. But experience has taught me that it's far better to share early and often. Doing so gets you the benefit of having many eyes on the feature, not just from individuals at GitHub. Selfishly, it also reduces the pain of having to change some on-disk format that has already been widely rolled out within GitHub's infrastructure and similar, but the former motivation is much more compelling. In terms of "the pudding" here, I think that marking this feature as experimental / incomplete is a good way for us to push this forward and build up some real-world experience with brave users via a tagged version of Git. Then we can refine it until we are confident it has graduated the "experimental" phase. Thanks, Taylor
diff --git a/midx.c b/midx.c index da5e0bb940..d470a88755 100644 --- a/midx.c +++ b/midx.c @@ -337,9 +337,11 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid, struct multi_pack_index *m, uint32_t n) { - if (n >= m->num_objects) + if (n >= m->num_objects + m->num_objects_in_base) return NULL; + n = midx_for_object(&m, n); + oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n), the_repository->hash_algo); return oid;
The function `nth_midxed_object_oid()` returns the object ID for a given object position in the MIDX lexicographic order. Teach this function to instead operate over the concatenated lexicographic order defined in an earlier step so that it is able to be used with incremental MIDXs. To do this, we need to both (a) adjust the bounds check for the given 'n', as well as record the MIDX-local position after chasing the `->base_midx` pointer to find the MIDX which contains that object. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)