Message ID | cover.1653418457.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | pack-objects: fix a pair of MIDX bitmap-related races | expand |
Taylor Blau <me@ttaylorr.com> writes: > verbatim reuse (c.f., `write_reused_pack_verbatim()`). Unlike "e.g." and "i.e.", I think these should all be "cf." (there are many others). > + This patch handles the "preferred" pack (c.f., the section > + "multi-pack-index reverse indexes" in > + Documentation/technical/pack-format.txt) specially, since pack-objects > + depends on reusing exact chunks of that pack verbatim in > + reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, > + the utility of a bitmap is significantly diminished. It explains why we care about the "preferred" pack, which is a nice clarification. It hints that the other packs do not matter as much, and it is clearly stated that how they are handled is ... > + Similar to dc1daacdcc, we could technically just add this check in > + reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX > + .bitmap without needing to open any of its packs. But it's simpler to do > + the check as early as possible, covering all direct uses of the > + preferred pack. Note that doing this check early requires us to call > + prepare_midx_pack() early, too, so move the relevant part of that loop > + from load_reverse_index() into open_midx_bitmap_1(). > + > + Subsequent patches handle the non-preferred packs in a slightly > + different fashion. ... left for later steps. Excellent write-up. > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, > + > + preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; > + if (!is_pack_valid(preferred)) { > -+ close(fd); > + warning(_("preferred pack (%s) is invalid"), > + preferred->pack_name); > + goto cleanup; > 2: 9adf6e1341 < -: ---------- builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects > -: ---------- > 2: 2719d33f32 builtin/pack-objects.c: avoid redundant NULL check > -: ---------- > 3: cdc3265ec2 builtin/pack-objects.c: ensure included `--stdin-packs` exist > -: ---------- > 4: 3fc3a95517 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
On Tue, May 24, 2022 at 02:38:39PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > verbatim reuse (c.f., `write_reused_pack_verbatim()`). > > Unlike "e.g." and "i.e.", I think these should all be "cf." (there > are many others). Oops. You learn something new everyday! ;-) > > + This patch handles the "preferred" pack (c.f., the section > > + "multi-pack-index reverse indexes" in > > + Documentation/technical/pack-format.txt) specially, since pack-objects > > + depends on reusing exact chunks of that pack verbatim in > > + reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, > > + the utility of a bitmap is significantly diminished. > > It explains why we care about the "preferred" pack, which is a nice > clarification. It hints that the other packs do not matter as much, > and it is clearly stated that how they are handled is ... > > > + Similar to dc1daacdcc, we could technically just add this check in > > + reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX > > + .bitmap without needing to open any of its packs. But it's simpler to do > > + the check as early as possible, covering all direct uses of the > > + preferred pack. Note that doing this check early requires us to call > > + prepare_midx_pack() early, too, so move the relevant part of that loop > > + from load_reverse_index() into open_midx_bitmap_1(). > > + > > + Subsequent patches handle the non-preferred packs in a slightly > > + different fashion. > > ... left for later steps. > > Excellent write-up. Thanks very much! Taylor
This is a small-ish reroll of mine and Victoria's series to fix a couple of races related to using multi-pack bitmaps in pack-objects. The crux of both is that we call `is_pack_valid()` far too late, leaving us in a situation where `pack-objects` committed to using objects from a specific pack in the MIDX bitmap, without having actually opened those packs. So if those packs are removed in the background (e.g., due to a simultaneous repack), any ongoing clones or fetches will see this error message: remote: Enumerating objects: 1498802, done. remote: fatal: packfile ./objects/pack/pack-$HASH.pack cannot be accessed remote: aborting due to possible repository corruption on the remote side. The first patch is mostly unchanged (except for removing an accidental double-close()), but the second patch has itself turned into three new patches in order to resolve the issue described in [1]. Thanks in advance for your review! [1]: https://lore.kernel.org/git/Yn+v8mEHm2sfo4sn@nand.local/ Taylor Blau (4): pack-bitmap.c: check preferred pack validity when opening MIDX bitmap builtin/pack-objects.c: avoid redundant NULL check builtin/pack-objects.c: ensure included `--stdin-packs` exist builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects builtin/pack-objects.c | 43 +++++++++++++++++++++++++----------------- pack-bitmap.c | 18 ++++++++++++++++-- 2 files changed, 42 insertions(+), 19 deletions(-) Range-diff against v1: 1: 83e2ad3962 ! 1: 618e8a6166 pack-bitmap: check preferred pack validity when opening MIDX bitmap @@ Metadata Author: Taylor Blau <me@ttaylorr.com> ## Commit message ## - pack-bitmap: check preferred pack validity when opening MIDX bitmap + pack-bitmap.c: check preferred pack validity when opening MIDX bitmap When pack-objects adds an entry to its packing list, it marks the packfile and offset containing the object, which we may later use during verbatim reuse (c.f., `write_reused_pack_verbatim()`). If the packfile in question is deleted in the background (e.g., due to a - concurrent `git repack`), we'll die() as a result of calling use_pack(). - 4c08018204 (pack-objects: protect against disappearing packs, - 2011-10-14) worked around this by opening the pack ahead of time before - recording it as a valid source for reuse. + concurrent `git repack`), we'll die() as a result of calling use_pack(), + unless we have an open file descriptor on the pack itself. 4c08018204 + (pack-objects: protect against disappearing packs, 2011-10-14) worked + around this by opening the pack ahead of time before recording it as a + valid source for reuse. 4c08018204's treatment meant that we could tolerate disappearing packs, - since it ensures we always have an open file descriptor any pack that we - mark as a valid source for reuse. This tightens the race to only happen - when we need to close an open pack's file descriptor (c.f., the caller - of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, in - which case we'll complain that a pack could not be accessed and die(). + since it ensures we always have an open file descriptor on any pack that + we mark as a valid source for reuse. This tightens the race to only + happen when we need to close an open pack's file descriptor (c.f., the + caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, + in which case we'll complain that a pack could not be accessed and + die(). - The pack bitmap code does this, too, since prior to bab919bd44 - (pack-bitmap: check pack validity when opening bitmap, 2015-03-26) it + The pack bitmap code does this, too, since prior to dc1daacdcc + (pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it was vulnerable to the same race. The MIDX bitmap code does not do this, and is vulnerable to the same - race. Apply the same treatment as bab919bd44 to the routine responsible - for opening multi-pack bitmaps to close this race. + race. Apply the same treatment as dc1daacdcc to the routine responsible + for opening the multi-pack bitmap's preferred pack to close this race. - Similar to bab919bd44, we could technically just add this check in - reuse_partial_packfile_from_bitmap(), since it's technically possible to - use a MIDX .bitmap without needing to open any of its packs. But it's - simpler to do the check as early as possible, covering all direct uses - of the preferred pack. Note that doing this check early requires us to - call prepare_midx_pack() early, too, so move the relevant part of that - loop from load_reverse_index() into open_midx_bitmap_1(). + This patch handles the "preferred" pack (c.f., the section + "multi-pack-index reverse indexes" in + Documentation/technical/pack-format.txt) specially, since pack-objects + depends on reusing exact chunks of that pack verbatim in + reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, + the utility of a bitmap is significantly diminished. + + Similar to dc1daacdcc, we could technically just add this check in + reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX + .bitmap without needing to open any of its packs. But it's simpler to do + the check as early as possible, covering all direct uses of the + preferred pack. Note that doing this check early requires us to call + prepare_midx_pack() early, too, so move the relevant part of that loop + from load_reverse_index() into open_midx_bitmap_1(). + + Subsequent patches handle the non-preferred packs in a slightly + different fashion. Signed-off-by: Taylor Blau <me@ttaylorr.com> @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, + + preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; + if (!is_pack_valid(preferred)) { -+ close(fd); + warning(_("preferred pack (%s) is invalid"), + preferred->pack_name); + goto cleanup; 2: 9adf6e1341 < -: ---------- builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects -: ---------- > 2: 2719d33f32 builtin/pack-objects.c: avoid redundant NULL check -: ---------- > 3: cdc3265ec2 builtin/pack-objects.c: ensure included `--stdin-packs` exist -: ---------- > 4: 3fc3a95517 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects