Message ID | a8251f8278ba9a3b41a8e299cb4918a62df6d1c7.1713163238.git.ps@pks.im (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] pack-bitmap: gracefully handle missing BTMP chunks |
On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
> In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12)
> we have introduced multi-pack verbatim reuse of objects. This series has
> introduced a new BTMP chunk, which encodes information about bitmapped
> objects in the multi-pack index. Starting with dab60934e3 (pack-bitmap:
> pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use
> this information to figure out objects which we can reuse from each of
> the packfiles.
>
> One thing that we glossed over though is backwards compatibility with
> repositories that do not yet have BTMP chunks in their multi-pack index.
> In that case, `nth_bitmapped_pack()` would return an error, which causes
> us to emit a warning followed by another error message. These warnings
> are visible to users that fetch from a repository:
>
> ```
> $ git fetch
> ...
> remote: error: MIDX does not contain the BTMP chunk
> remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
> remote: Enumerating objects: 40, done.
> remote: Counting objects: 100% (40/40), done.
> remote: Compressing objects: 100% (39/39), done.
> remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
> ...
> ```
>
> While the fetch succeeds the user is left wondering what they did wrong.
> Furthermore, as visible both from the warning and from the reuse stats,
> pack-reuse is completely disabled in such repositories.
>
> What is quite interesting is that this issue can even be triggered in
> case `pack.allowPackReuse=single` is set, which is the default value.
> One could have expected that in this case we fall back to the old logic,
> which is to use the preferred packfile without consulting BTMP chunks at
> all. But either we fail with the above error in case they are missing,
> or we use the first pack in the multi-pack-index. The former case
> disables pack-reuse altogether, whereas the latter case may result in
> reusing objects from a suboptimal packfile.
>
> Fix this issue by partially reverting the logic back to what we had
> before this patch series landed. Namely, in the case where we have no
> BTMP chunks or when `pack.allowPackReuse=single` is set, we use the
> preferred pack instead of consulting the BTMP chunks.
>
> Helped-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>

Junio, it would be great if we could still land this fix in Git v2.45
given that it is addressing a regression in Git v2.44. This of course
assumes that the current version of this patch looks good to Taylor.

Patrick

Patrick Steinhardt <ps@pks.im> writes:

>> Helped-by: Taylor Blau <me@ttaylorr.com>
>> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>
> Junio, it would be great if we could still land this fix in Git v2.45
> given that it is addressing a regression in Git v2.44. This of course
> assumes that the current version of this patch looks good to Taylor.

Indeed. It would be nice to see an acked by or something.

Will queue, in the meantime. Thanks for a ping.

On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
> diff --git a/midx.c b/midx.c
> index ae3b49166c..6f07de3688 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -170,9 +170,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>
>  	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
>  		   &m->chunk_large_offsets_len);
> -	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> -		   (const unsigned char **)&m->chunk_bitmapped_packs,
> -		   &m->chunk_bitmapped_packs_len);
> +	if (git_env_bool("GIT_TEST_MIDX_READ_BTMP", 1))
> +		pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> +			   (const unsigned char **)&m->chunk_bitmapped_packs,
> +			   &m->chunk_bitmapped_packs_len);

OK, so we're switching to a new GIT_TEST_-variable here, which controls
whether or not we read the BTMP chunk. That makes sense, and is much
appreciated :-).

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 2baeabacee..35c5ef9d3c 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2049,7 +2049,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>
>  	load_reverse_index(r, bitmap_git);
>
> -	if (bitmap_is_midx(bitmap_git)) {
> +	if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
> +		multi_pack_reuse = 0;
> +

Either we don't have a MIDX, or we do, but it doesn't have a BTMP chunk.
In either case, we should disable multi-pack reuse (either using the
single pack corresponding with a classic pack-bitmap, or the preferred
pack if using a MIDX bitmap written prior to the BTMP chunk).

Looking good.

> +	if (multi_pack_reuse) {
>  		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
>  			struct bitmapped_pack pack;
>  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> @@ -2062,34 +2065,32 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>  			if (!pack.bitmap_nr)
>  				continue;
>
> -			if (!multi_pack_reuse && pack.bitmap_pos) {
> -				/*
> -				 * If we're only reusing a single pack, skip
> -				 * over any packs which are not positioned at
> -				 * the beginning of the MIDX bitmap.
> -				 *
> -				 * This is consistent with the existing
> -				 * single-pack reuse behavior, which only reuses
> -				 * parts of the MIDX's preferred pack.
> -				 */
> -				continue;
> -			}

Yep, this hunk can go since it used to belong to the outer if-statement
in the pre-image that was conditioned on 'bitmap_is_midx()'. This is
dealt with separately, since we know ahead of time we're doing
multi-pack reuse (and can do so).

> -
>  			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
>  			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
>
>  			objects_nr += pack.p->num_objects;
> -
> -			if (!multi_pack_reuse)
> -				break;
>  		}
>
>  		QSORT(packs, packs_nr, bitmapped_pack_cmp);
>  	} else {
> +		struct packed_git *pack;
> +
> +		if (bitmap_is_midx(bitmap_git)) {
> +			uint32_t preferred_pack_pos;
> +
> +			if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> +				warning(_("unable to compute preferred pack, disabling pack-reuse"));
> +				return;
> +			}
> +
> +			pack = bitmap_git->midx->packs[preferred_pack_pos];
> +		} else {
> +			pack = bitmap_git->pack;
> +		}
> +

Looking good. Here we're doing single-pack reuse (either from the pack
corresponding with the bitmap or the MIDX's preferred pack). Either way
we set the 'pack' variable to point at the appropriate pack, and then
add that pack to the list of reusable packs below. Good.

>  		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> -
> -		packs[packs_nr].p = bitmap_git->pack;
> -		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
> +		packs[packs_nr].p = pack;
> +		packs[packs_nr].bitmap_nr = pack->num_objects;
>  		packs[packs_nr].bitmap_pos = 0;
>
>  		objects_nr = packs[packs_nr++].bitmap_nr;

Makes sense.

> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 70d1b58709..5d7d321840 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
>  	)
>  '
>
> +for allow_pack_reuse in single multi
> +do
> +	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
> +		test_when_finished "rm -rf midx-without-btmp" &&
> +		git init midx-without-btmp &&
> +		(
> +			cd midx-without-btmp &&
> +			test_commit initial &&
> +
> +			git repack -Adbl --write-bitmap-index --write-midx &&

`-b` is redundant with `--write-bitmap-index`.

> +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&

A small note here, but setting stdin to read from /dev/null is
unnecessary with `--all`.

> +			test_must_be_empty err
> +		)
> +	'
> +done
> +

This test looks like it's exercising the right thing, but I'm not sure
why it was split into two separate tests. Perhaps to allow the two to
fail separately?

Either way, the repository initialization, test_commit, and repacking
could probably be combined into a single step to avoid re-running them
for different values of $allow_pack_reuse.

I would probably have written:

	git init midx-without-btmp &&
	(
		cd midx-without-btmp &&

		test_commit base &&
		git repack -adb --write-midx &&

		for c in single multi
		do
			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$c pack-objects \
				--all --use-bitmap-index --stdout >/dev/null 2>err &&
			test_must_be_empty err || return 1
		done
	)

TBH, I would like to see this test cleaned up before merging this one
down. But otherwise this patch is looking good.

Thanks,
Taylor

On Mon, Apr 15, 2024 at 10:41:09AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> >> Helped-by: Taylor Blau <me@ttaylorr.com>
> >> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> >
> > Junio, it would be great if we could still land this fix in Git v2.45
> > given that it is addressing a regression in Git v2.44. This of course
> > assumes that the current version of this patch looks good to Taylor.
>
> Indeed. It would be nice to see an acked by or something.
>
> Will queue, in the meantime. Thanks for a ping.

I took a look, and I think the patch is good. I have a couple of notes
on the test that I would prefer to see addressed before merging it down,
though.

Thanks,
Taylor

Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Apr 15, 2024 at 10:41:09AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>> >> Helped-by: Taylor Blau <me@ttaylorr.com>
>> >> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>> >
>> > Junio, it would be great if we could still land this fix in Git v2.45
>> > given that it is addressing a regression in Git v2.44. This of course
>> > assumes that the current version of this patch looks good to Taylor.
>>
>> Indeed. It would be nice to see an acked by or something.
>>
>> Will queue, in the meantime. Thanks for a ping.
>
> I took a look, and I think the patch is good. I have a couple of notes
> on the test that I would prefer to see addressed before merging it down,
> though.

Thanks.

On Mon, Apr 15, 2024 at 06:51:16PM -0400, Taylor Blau wrote:
> On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
[snip]
> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> > index 70d1b58709..5d7d321840 100755
> > --- a/t/t5326-multi-pack-bitmaps.sh
> > +++ b/t/t5326-multi-pack-bitmaps.sh
> > @@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
> >  	)
> >  '
> >
> > +for allow_pack_reuse in single multi
> > +do
> > +	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
> > +		test_when_finished "rm -rf midx-without-btmp" &&
> > +		git init midx-without-btmp &&
> > +		(
> > +			cd midx-without-btmp &&
> > +			test_commit initial &&
> > +
> > +			git repack -Adbl --write-bitmap-index --write-midx &&
>
> `-b` is redundant with `--write-bitmap-index`.

Oops, right.

> > +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> > +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
>
> A small note here, but setting stdin to read from /dev/null is
> unnecessary with `--all`.

Is it really? Executing `git pack-objects --all --stdout` on my system
blocks until stdin is closed. It _seems_ to work in the tests alright,
but doesn't work outside of them. Which is puzzling on its own.

> > +			test_must_be_empty err
> > +		)
> > +	'
> > +done
> > +
>
> This test looks like it's exercising the right thing, but I'm not sure
> why it was split into two separate tests. Perhaps to allow the two to
> fail separately?

Exactly. It makes it easier to see which of the two tests fails in case
only one does.

> Either way, the repository initialization, test_commit, and repacking
> could probably be combined into a single step to avoid re-running them
> for different values of $allow_pack_reuse.
>
> I would probably have written:
>
> 	git init midx-without-btmp &&
> 	(
> 		cd midx-without-btmp &&
>
> 		test_commit base &&
> 		git repack -adb --write-midx &&
>
> 		for c in single multi
> 		do
> 			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$c pack-objects \
> 				--all --use-bitmap-index --stdout >/dev/null 2>err &&
> 			test_must_be_empty err || return 1
> 		done
> 	)
>
> TBH, I would like to see this test cleaned up before merging this one
> down. But otherwise this patch is looking good.

So I'm a bit torn here. I think your proposed way to test things is
inferior in terms of usability, even though it is superior in terms of
performance. We could move the common setup into a separate test, but
then the tests could no longer easily be run as self-contained units.

Patrick

On Tue, Apr 16, 2024 at 06:47:51AM +0200, Patrick Steinhardt wrote:

> > > +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> > > +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
> >
> > A small note here, but setting stdin to read from /dev/null is
> > unnecessary with `--all`.
>
> Is it really? Executing `git pack-objects --all --stdout` on my system
> blocks until stdin is closed. It _seems_ to work in the tests alright,
> but doesn't work outside of them. Which is puzzling on its own.

Inside a test_expect block, stdin is already redirected from /dev/null.
See 781f76b158 (test-lib: redirect stdin of tests, 2011-12-15).

I do think it's still good practice to redirect from /dev/null
explicitly to indicate the intent.

-Peff

On Tue, Apr 16, 2024 at 01:12:32AM -0400, Jeff King wrote:
> On Tue, Apr 16, 2024 at 06:47:51AM +0200, Patrick Steinhardt wrote:
>
> > > > +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> > > > +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
> > >
> > > A small note here, but setting stdin to read from /dev/null is
> > > unnecessary with `--all`.
> >
> > Is it really? Executing `git pack-objects --all --stdout` on my system
> > blocks until stdin is closed. It _seems_ to work in the tests alright,
> > but doesn't work outside of them. Which is puzzling on its own.
>
> Inside a test_expect block, stdin is already redirected from /dev/null.
> See 781f76b158 (test-lib: redirect stdin of tests, 2011-12-15).
>
> I do think it's still good practice to redirect from /dev/null
> explicitly to indicate the intent.

Ah, that explains it. Thanks!

Patrick

diff --git a/midx.c b/midx.c
index ae3b49166c..6f07de3688 100644
--- a/midx.c
+++ b/midx.c
@@ -170,9 +170,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
 		   &m->chunk_large_offsets_len);
-	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-		   (const unsigned char **)&m->chunk_bitmapped_packs,
-		   &m->chunk_bitmapped_packs_len);
+	if (git_env_bool("GIT_TEST_MIDX_READ_BTMP", 1))
+		pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+			   (const unsigned char **)&m->chunk_bitmapped_packs,
+			   &m->chunk_bitmapped_packs_len);
 
 	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
 		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2baeabacee..35c5ef9d3c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2049,7 +2049,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 	load_reverse_index(r, bitmap_git);
 
-	if (bitmap_is_midx(bitmap_git)) {
+	if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
+		multi_pack_reuse = 0;
+
+	if (multi_pack_reuse) {
 		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
 			struct bitmapped_pack pack;
 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
@@ -2062,34 +2065,32 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 			if (!pack.bitmap_nr)
 				continue;
 
-			if (!multi_pack_reuse && pack.bitmap_pos) {
-				/*
-				 * If we're only reusing a single pack, skip
-				 * over any packs which are not positioned at
-				 * the beginning of the MIDX bitmap.
-				 *
-				 * This is consistent with the existing
-				 * single-pack reuse behavior, which only reuses
-				 * parts of the MIDX's preferred pack.
-				 */
-				continue;
-			}
-
 			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
 			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
 
 			objects_nr += pack.p->num_objects;
-
-			if (!multi_pack_reuse)
-				break;
 		}
 
 		QSORT(packs, packs_nr, bitmapped_pack_cmp);
 	} else {
+		struct packed_git *pack;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t preferred_pack_pos;
+
+			if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+				warning(_("unable to compute preferred pack, disabling pack-reuse"));
+				return;
+			}
+
+			pack = bitmap_git->midx->packs[preferred_pack_pos];
+		} else {
+			pack = bitmap_git->pack;
+		}
+
 		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
-
-		packs[packs_nr].p = bitmap_git->pack;
-		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
+		packs[packs_nr].p = pack;
+		packs[packs_nr].bitmap_nr = pack->num_objects;
 		packs[packs_nr].bitmap_pos = 0;
 
 		objects_nr = packs[packs_nr++].bitmap_nr;
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 70d1b58709..5d7d321840 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
 	)
 '
 
+for allow_pack_reuse in single multi
+do
+	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
+		test_when_finished "rm -rf midx-without-btmp" &&
+		git init midx-without-btmp &&
+		(
+			cd midx-without-btmp &&
+			test_commit initial &&
+
+			git repack -Adbl --write-bitmap-index --write-midx &&
+			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
+				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
+			test_must_be_empty err
+		)
+	'
+done
+
 test_done