Message ID | dfd1daacc5b12d470bb6deec3448cf7dbde2bf0f.1624314293.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Series | multi-pack reachability bitmaps |
On Mon, Jun 21 2021, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> load any existing one to fill in some pieces of information. But it uses
> load_multi_pack_index(), which ignores the configuration
> "core.multiPackIndex", which indicates whether or not Git is allowed to
> read an existing multi-pack-index.
>
> Replace this with a routine that does respect that setting, to avoid
> reading multi-pack-index files when told not to.
>
> This avoids a problem that would arise in subsequent patches due to the
> combination of 'git repack' reopening the object store in-process and
> the multi-pack index code not checking whether a pack already exists in
> the object store when calling add_pack_to_midx().
>
> This would ultimately lead to a cycle being created along the
> 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> has hard-to-debug downstream effects like saying a bitmap can't be
> loaded for a pack because one already exists (for the same pack).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 40eb7974ba..759007d5a8 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>
>  	if (m)
>  		ctx.m = m;
> -	else
> -		ctx.m = load_multi_pack_index(object_dir, 1);
> +	else {

Style nit: leaves the initial "if" braceless now that the "else" gained
braces.
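For reference, a minimal sketch of the braced shape the nit is asking
for (the body of the new "else" arm is elided; this is an illustration
of the style rule, not the committed hunk):

	if (m) {
		ctx.m = m;
	} else {
		struct multi_pack_index *cur;
		/* ... look up the midx matching object_dir, as in the patch ... */
	}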
On Mon, Jun 21, 2021 at 06:25:18PM -0400, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> load any existing one to fill in some pieces of information. But it uses
> load_multi_pack_index(), which ignores the configuration
> "core.multiPackIndex", which indicates whether or not Git is allowed to
> read an existing multi-pack-index.
>
> Replace this with a routine that does respect that setting, to avoid
> reading multi-pack-index files when told not to.
>
> This avoids a problem that would arise in subsequent patches due to the
> combination of 'git repack' reopening the object store in-process and
> the multi-pack index code not checking whether a pack already exists in
> the object store when calling add_pack_to_midx().
>
> This would ultimately lead to a cycle being created along the
> 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> has hard-to-debug downstream effects like saying a bitmap can't be
> loaded for a pack because one already exists (for the same pack).

I'm not sure I completely understand the bug that this causes.

But another question: does this impact how

  git -c core.multipackindex=false multi-pack-index write

behaves? I.e., do we still write, but just avoid reading the existing
midx? That itself seems like a more sensible behavior (e.g., trying to
recover from a broken midx state).

-Peff
On Wed, Jul 21, 2021 at 06:23:23AM -0400, Jeff King wrote:

> On Mon, Jun 21, 2021 at 06:25:18PM -0400, Taylor Blau wrote:
>
> > When writing a new multi-pack index, write_midx_internal() attempts to
> > load any existing one to fill in some pieces of information. But it uses
> > load_multi_pack_index(), which ignores the configuration
> > "core.multiPackIndex", which indicates whether or not Git is allowed to
> > read an existing multi-pack-index.
> >
> > Replace this with a routine that does respect that setting, to avoid
> > reading multi-pack-index files when told not to.
> >
> > This avoids a problem that would arise in subsequent patches due to the
> > combination of 'git repack' reopening the object store in-process and
> > the multi-pack index code not checking whether a pack already exists in
> > the object store when calling add_pack_to_midx().
> >
> > This would ultimately lead to a cycle being created along the
> > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > has hard-to-debug downstream effects like saying a bitmap can't be
> > loaded for a pack because one already exists (for the same pack).
>
> I'm not sure I completely understand the bug that this causes.

Off-hand, I can't quite remember either. But it is important; I do have
a distinct memory of dropping this patch and then watching a 'git repack
--write-midx' (that option will be introduced in a later series) fail
horribly.

If I remember correctly, the bug has to do with loading a MIDX twice in
the same process. When we call add_packed_git() from within
prepare_midx_pack(), we load the pack without caring whether or not it's
already loaded. So loading a MIDX twice in the same process will fail.

So really I think that this is papering over that bug: we're just
removing one of the times that we happened to load a MIDX during the
writing phase.

What I do remember is that this bug was a huge pain to figure out ;).
I'm happy to look further if you aren't satisfied with my vague
explanation here (and I wouldn't blame you).

> But another question: does this impact how
>
>   git -c core.multipackindex=false multi-pack-index write
>
> behaves? I.e., do we still write, but just avoid reading the existing
> midx? That itself seems like a more sensible behavior (e.g., trying to
> recover from a broken midx state).

Yes. Before this patch, that invocation would still load and use any
existing MIDX to write a new one. Now we don't, because (unlike
load_multi_pack_index()) prepare_multi_pack_index_one() does check
core.multiPackIndex before returning anything.

> -Peff

Thanks,
Taylor
On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:

> > > This avoids a problem that would arise in subsequent patches due to the
> > > combination of 'git repack' reopening the object store in-process and
> > > the multi-pack index code not checking whether a pack already exists in
> > > the object store when calling add_pack_to_midx().
> > >
> > > This would ultimately lead to a cycle being created along the
> > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > loaded for a pack because one already exists (for the same pack).
> >
> > I'm not sure I completely understand the bug that this causes.
>
> Off-hand, I can't quite remember either. But it is important; I do have
> a distinct memory of dropping this patch and then watching a 'git repack
> --write-midx' (that option will be introduced in a later series) fail
> horribly.
>
> If I remember correctly, the bug has to do with loading a MIDX twice in
> the same process. When we call add_packed_git() from within
> prepare_midx_pack(), we load the pack without caring whether or not it's
> already loaded. So loading a MIDX twice in the same process will fail.
>
> So really I think that this is papering over that bug: we're just
> removing one of the times that we happened to load a MIDX during the
> writing phase.

Hmm, after staring at this for a bit, I've unconfused and re-confused
myself several times.

Here are some interesting bits:

  - calling load_multi_pack_index() directly creates a new midx object.
    None of its m->packs[] array will be filled in. Nor is it reachable
    as r->objects->multi_pack_index.

  - in using that midx, we end up calling prepare_midx_pack() for
    various packs, which creates a new packed_git struct and adds it to
    r->objects->packed_git (via install_packed_git()).

So that's a bit weird already, because we have packed_git structs in
r->objects that came from a midx that isn't r->objects->multi_pack_index.
And then if we later call prepare_multi_pack_index(), for example as
part of a pack reprepare, then we'd end up with duplicates.

Whereas normally, when a direct load_multi_pack_index() was not called,
our only midx would be r->objects->multi_pack_index, and so we'd avoid
re-loading it.

That seems wrong and wasteful, but I don't see how it results in a
circular linked list. And it seems like it would already be the case for
this write path, independent of your series. Either way, the solution is
probably for prepare_midx_pack() to check for duplicates (which we can
do pretty cheaply these days due to the hashmap; see prepare_pack()).

But I'm worried there is something else going on. Your commit message
mentions add_pack_to_midx(). That's something we call as part of
write_midx_internal(), and it does create other packed_git structs. But
it never calls install_packed_git() on them; they just live in the
write_midx_context. So I'm not sure how they'd interfere with things.

And then there's one final oddity. Your patch assigns to ctx.m from
r->objects->multi_pack_index. But later in write_midx_internal(), we
call close_midx(). In the original, it's in the middle of the function,
but one of your patches puts it at the end of the function. But that
means we are closing r->objects->multi_pack_index.

Looking at close_midx(), it does not actually zero the struct. So we'd
still have r->objects->multi_pack_index->data pointed to memory which
has been unmapped. That seems like an accident waiting to happen. I
guess it doesn't usually cause problems because we'd typically write a
midx near the end of the process, and then not look up other objects?

So I'm concerned this is introducing a subtle bug that will bite us
later. And we should figure out what the actual thing it's fixing is, so
we can understand if there is a better way to fix it (e.g., by removing
duplicates in prepare_midx_pack(), or if it is some interaction with the
writing code).

I guess a good thing to try would be dropping this patch and seeing if
the tests break. ;)

-Peff
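As a concrete illustration of the duplication hazard described above,
here is a self-contained toy model in C. It deliberately uses made-up
stand-in types rather than Git's real packed_git/install_packed_git()
internals; the point is only that an unconditional prepend, combined
with two midx structs opening the same pack, leaves the list with two
entries for one on-disk file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for 'struct packed_git'; the real one lives in packfile.h. */
struct toy_pack {
	char name[64];
	struct toy_pack *next;
};

/* Toy stand-in for r->objects->packed_git. */
static struct toy_pack *packed_git;

/* Mirrors install_packed_git(): prepends unconditionally, no duplicate check. */
static void toy_install(const char *name)
{
	struct toy_pack *p = calloc(1, sizeof(*p));
	strncpy(p->name, name, sizeof(p->name) - 1);
	p->next = packed_git;
	packed_git = p;
}

int main(void)
{
	/* First midx (from load_multi_pack_index()) opens the pack... */
	toy_install("pack-1234.pack");
	/* ...then r->objects->multi_pack_index opens the same pack again. */
	toy_install("pack-1234.pack");

	for (struct toy_pack *p = packed_git; p; p = p->next)
		printf("%s\n", p->name);	/* the same pack, listed twice */
	return 0;
}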
On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:

> On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
>
> > > > This avoids a problem that would arise in subsequent patches due to the
> > > > combination of 'git repack' reopening the object store in-process and
> > > > the multi-pack index code not checking whether a pack already exists in
> > > > the object store when calling add_pack_to_midx().
> > > >
> > > > This would ultimately lead to a cycle being created along the
> > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > loaded for a pack because one already exists (for the same pack).
> > >
> > > I'm not sure I completely understand the bug that this causes.
> >
> > Off-hand, I can't quite remember either. But it is important; I do have
> > a distinct memory of dropping this patch and then watching a 'git repack
> > --write-midx' (that option will be introduced in a later series) fail
> > horribly.
> >
> > If I remember correctly, the bug has to do with loading a MIDX twice in
> > the same process. When we call add_packed_git() from within
> > prepare_midx_pack(), we load the pack without caring whether or not it's
> > already loaded. So loading a MIDX twice in the same process will fail.
> >
> > So really I think that this is papering over that bug: we're just
> > removing one of the times that we happened to load a MIDX during the
> > writing phase.
>
> Hmm, after staring at this for a bit, I've unconfused and re-confused
> myself several times.
>
> Here are some interesting bits:
>
>   - calling load_multi_pack_index() directly creates a new midx object.
>     None of its m->packs[] array will be filled in. Nor is it reachable
>     as r->objects->multi_pack_index.
>
>   - in using that midx, we end up calling prepare_midx_pack() for
>     various packs, which creates a new packed_git struct and adds it to
>     r->objects->packed_git (via install_packed_git()).
>
> So that's a bit weird already, because we have packed_git structs in
> r->objects that came from a midx that isn't r->objects->multi_pack_index.
> And then if we later call prepare_multi_pack_index(), for example as
> part of a pack reprepare, then we'd end up with duplicates.

Ah, this jogged my memory: this is a relic from when we generated MIDX
bitmaps in-process with the rest of the `repack` code. And when we did
that, we did have to call `reprepare_packed_git()` after writing the new
packs but before moving them into place.

So that's where the `reprepare_packed_git()` came from, but we don't
have any of that code anymore, since we now generate MIDX bitmaps by
invoking:

  git multi-pack-index write --bitmap --stdin-packs --refs-snapshot

as a sub-process of `git repack`; no need for any reprepare, which is
what was triggering this bug.

To be sure, I reverted this patch out of GitHub's fork, and reran the
tests both in normal mode (just `make test`) and then once more with the
`GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}` environment variables set.
Unsurprisingly, it passed both times.

I'm happy to keep digging further, but I think that I'm 99% satisfied
here. Digging further involves resurrecting a much older version of this
series (and others adjacent to it), and there are probably other bugs
lurking that would be annoying to tease out.

In any case, let's drop this patch from the series. It's disappointing
that we can't run:

  git -c core.multiPackIndex= multi-pack-index write

anymore, but I guess that's no worse than the state we were in before
this patch, so I'm content to let it live on.

Thanks,
Taylor
On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:

> On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:
>
> > On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
> >
> > > > > This avoids a problem that would arise in subsequent patches due to the
> > > > > combination of 'git repack' reopening the object store in-process and
> > > > > the multi-pack index code not checking whether a pack already exists in
> > > > > the object store when calling add_pack_to_midx().
> > > > >
> > > > > This would ultimately lead to a cycle being created along the
> > > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > > loaded for a pack because one already exists (for the same pack).
> > > >
> > > > I'm not sure I completely understand the bug that this causes.
> > >
> > > Off-hand, I can't quite remember either. But it is important; I do have
> > > a distinct memory of dropping this patch and then watching a 'git repack
> > > --write-midx' (that option will be introduced in a later series) fail
> > > horribly.
> > >
> > > If I remember correctly, the bug has to do with loading a MIDX twice in
> > > the same process. When we call add_packed_git() from within
> > > prepare_midx_pack(), we load the pack without caring whether or not it's
> > > already loaded. So loading a MIDX twice in the same process will fail.
> > >
> > > So really I think that this is papering over that bug: we're just
> > > removing one of the times that we happened to load a MIDX during the
> > > writing phase.
> >
> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
>
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.

Actually, I take that back. You were right from the start: the way the
code is written we *can* end up calling both:

  - load_multi_pack_index(), from write_midx_internal(), which sets up a
    MIDX, but does not update r->objects->multi_pack_index to point at
    it.

  - ...and prepare_multi_pack_index_one() (via prepare_bitmap_git ->
    open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
    prepare_packed_git), which *also* creates a new MIDX, *and* updates
    the_repository->objects->multi_pack_index to point at it.

(The latter codepath is from the check in write_midx_internal() to see
if we already have a MIDX bitmap when the MIDX we are trying to write
already exists on disk.)

So in this scenario, we have two copies of the same MIDX open, and the
repository's single pack is opened in one of the MIDXs, but not both.
One copy of the pack is pointed at via r->objects->packed_git. Then when
we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
prepare_midx_pack(), which installs the second MIDX's copy of the same
pack into r->objects->packed_git, and we have a cycle.

I think there are a few ways to fix this bug. The most obvious is to
make install_packed_git() check for the existence of the pack in the
hashmap of installed packs before (re-)installing it. But that could be
quadratic if the hashmap has too many collisions (and the lookup tends
towards being linear in the number of keys rather than constant).

But I think that a more straightforward way would be to open the MIDX we
use when generating the MIDX with prepare_multi_pack_index_one() instead
of load_multi_pack_index(), so that the resulting MIDX is pointed at by
r->objects->multi_pack_index. That would prevent the latter call from
deep within the callstack of prepare_bitmap_git() from opening another
copy and then (mistakenly) re-installing the same pack twice.

Thoughts?

Thanks,
Taylor
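Continuing the toy model from earlier, the duplicate check proposed here
could look something like the following sketch. A linear scan is used
purely for clarity; Git itself would consult its pack_map hashmap so the
lookup stays roughly constant-time:

/*
 * Duplicate-aware variant of toy_install() from the earlier sketch.
 * Hypothetical illustration only; not Git's actual install_packed_git().
 */
static void toy_install_dedup(const char *name)
{
	struct toy_pack *p;

	for (p = packed_git; p; p = p->next)
		if (!strcmp(p->name, name))
			return;	/* already installed; skip the duplicate */
	toy_install(name);
}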
On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:

> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
>
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.
>
> So that's where the `reprepare_packed_git()` came from, but we don't
> have any of that code anymore, since we now generate MIDX bitmaps by
> invoking:
>
>   git multi-pack-index write --bitmap --stdin-packs --refs-snapshot
>
> as a sub-process of `git repack`; no need for any reprepare, which is
> what was triggering this bug.

OK, that makes sense, especially given the "close_midx() leaves the
pointer bogus" stuff discussed elsewhere.

> To be sure, I reverted this patch out of GitHub's fork, and reran the
> tests both in normal mode (just `make test`) and then once more with the
> `GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}` environment variables set.
> Unsurprisingly, it passed both times.
>
> I'm happy to keep digging further, but I think that I'm 99% satisfied
> here. Digging further involves resurrecting a much older version of this
> series (and others adjacent to it), and there are probably other bugs
> lurking that would be annoying to tease out.
>
> In any case, let's drop this patch from the series. It's disappointing
> that we can't run:
>
>   git -c core.multiPackIndex= multi-pack-index write
>
> anymore, but I guess that's no worse than the state we were in before
> this patch, so I'm content to let it live on.

Great. If we can drop it, I think that is the best path forward.

I think that may simplify things for the writing patch, too, then. It
should not matter if we move close_midx() anymore, because we will not
be closing the main r->objects->multi_pack_index struct.

I do suspect we could be skipping the load _and_ close of the midx
entirely in write_midx_internal(), and just using whatever the caller
has passed in (and arguably just having most callers pass in the regular
midx struct if they want us to reuse parts of it). That might be a
cleanup we can leave for later, but it might be necessary to touch these
bits anyway (if there is still some kind of close_midx() ordering gotcha
in the function).

-Peff
On Mon, Jul 26, 2021 at 06:14:09PM -0400, Taylor Blau wrote:

> > Ah, this jogged my memory: this is a relic from when we generated MIDX
> > bitmaps in-process with the rest of the `repack` code. And when we did
> > that, we did have to call `reprepare_packed_git()` after writing the new
> > packs but before moving them into place.
>
> Actually, I take that back. You were right from the start: the way the
> code is written we *can* end up calling both:
>
>   - load_multi_pack_index(), from write_midx_internal(), which sets up a
>     MIDX, but does not update r->objects->multi_pack_index to point at
>     it.
>
>   - ...and prepare_multi_pack_index_one() (via prepare_bitmap_git ->
>     open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
>     prepare_packed_git), which *also* creates a new MIDX, *and* updates
>     the_repository->objects->multi_pack_index to point at it.
>
> (The latter codepath is from the check in write_midx_internal() to see
> if we already have a MIDX bitmap when the MIDX we are trying to write
> already exists on disk.)
>
> So in this scenario, we have two copies of the same MIDX open, and the
> repository's single pack is opened in one of the MIDXs, but not both.
> One copy of the pack is pointed at via r->objects->packed_git. Then when
> we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
> prepare_midx_pack(), which installs the second MIDX's copy of the same
> pack into r->objects->packed_git, and we have a cycle.

Right, I understand how that ends up with duplicate structs for each
pack. But how do we get a cycle out of that?

> I think there are a few ways to fix this bug. The most obvious is to
> make install_packed_git() check for the existence of the pack in the
> hashmap of installed packs before (re-)installing it. But that could be
> quadratic if the hashmap has too many collisions (and the lookup tends
> towards being linear in the number of keys rather than constant).

I think it may be worth doing that anyway. You can assume the hashmap
will behave reasonably.

But it would mean that the "multi_pack_index" flag in packed_git does
not specify _which_ midx is pointing to it. At the very least, it would
need to become a ref-count (so when one midx goes away, it does not lose
its "I am part of a midx" flag). And possibly it would need to actually
know the complete list of midx structs it's associated with (I haven't
looked at all of the uses of that flag).

That makes things sufficiently tricky that I would prefer not to
untangle it as part of this series.

> But I think that a more straightforward way would be to open the MIDX we
> use when generating the MIDX with prepare_multi_pack_index_one() instead
> of load_multi_pack_index(), so that the resulting MIDX is pointed at by
> r->objects->multi_pack_index. That would prevent the latter call from
> deep within the callstack of prepare_bitmap_git() from opening another
> copy and then (mistakenly) re-installing the same pack twice.

But now the internal midx writing code can never call close_midx() on
that, because it does not own it to close. Can we simply drop the
close_midx() call there?

This would all make much more sense to me if write_midx_internal()
simply took a conceptually read-only midx as a parameter, and the caller
passed in the appropriate one (probably even using
prepare_multi_pack_index_one() to get it).

-Peff
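A hedged sketch of the interface shape being suggested here; the
parameter list is guessed from context and is not the actual signature
in midx.c:

/*
 * Hypothetical: the caller looks up the midx (e.g. via
 * prepare_multi_pack_index_one()) and retains ownership; the writer
 * treats it as read-only and never calls close_midx() on it.
 */
static int write_midx_internal(const char *object_dir,
			       struct multi_pack_index *m, /* read-only */
			       unsigned flags);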
On Tue, Jul 27, 2021 at 01:29:49PM -0400, Jeff King wrote:

> On Mon, Jul 26, 2021 at 06:14:09PM -0400, Taylor Blau wrote:
>
> > So in this scenario, we have two copies of the same MIDX open, and the
> > repository's single pack is opened in one of the MIDXs, but not both.
> > One copy of the pack is pointed at via r->objects->packed_git. Then when
> > we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
> > prepare_midx_pack(), which installs the second MIDX's copy of the same
> > pack into r->objects->packed_git, and we have a cycle.
>
> Right, I understand how that ends up with duplicate structs for each
> pack. But how do we get a cycle out of that?

Sorry, it isn't a true cycle where p->next == p.

> But now the internal midx writing code can never call close_midx() on
> that, because it does not own it to close. Can we simply drop the
> close_midx() call there?
>
> This would all make much more sense to me if write_midx_internal()
> simply took a conceptually read-only midx as a parameter, and the caller
> passed in the appropriate one (probably even using
> prepare_multi_pack_index_one() to get it).

No, we can't drop the close_midx() call there, because we must close the
MIDX file on Windows before moving a new one into place. My feeling is
we should always be working on the r->objects->multi_pack_index pointer,
and calling close_object_store() there instead of close_midx().

Does that seem like a reasonable approach to you?

Thanks,
Taylor
On Tue, Jul 27, 2021 at 01:36:34PM -0400, Taylor Blau wrote:

> > But now the internal midx writing code can never call close_midx() on
> > that, because it does not own it to close. Can we simply drop the
> > close_midx() call there?
> >
> > This would all make much more sense to me if write_midx_internal()
> > simply took a conceptually read-only midx as a parameter, and the caller
> > passed in the appropriate one (probably even using
> > prepare_multi_pack_index_one() to get it).
>
> No, we can't drop the close_midx() call there, because we must close the
> MIDX file on Windows before moving a new one into place. My feeling is
> we should always be working on the r->objects->multi_pack_index pointer,
> and calling close_object_store() there instead of close_midx().
>
> Does that seem like a reasonable approach to you?

Yes, though I'd have said that it is the responsibility of the caller
(who knows we are operating with r->objects->multi_pack_index) to do
that closing. But maybe it's not possible if the rename-into-place
happens at too low a level.

BTW, yet another weirdness: close_object_store() will call close_midx()
on the outermost midx struct, ignoring o->multi_pack_index->next
entirely. So that's a leak, but also means we may not be closing the
midx we're interested in (since write_midx_internal() takes an
object-dir parameter, and we could be pointing to some other
object-dir's midx).

-Peff
On Tue, Jul 27, 2021 at 01:42:35PM -0400, Jeff King wrote:

> On Tue, Jul 27, 2021 at 01:36:34PM -0400, Taylor Blau wrote:
>
> > > But now the internal midx writing code can never call close_midx() on
> > > that, because it does not own it to close. Can we simply drop the
> > > close_midx() call there?
> > >
> > > This would all make much more sense to me if write_midx_internal()
> > > simply took a conceptually read-only midx as a parameter, and the caller
> > > passed in the appropriate one (probably even using
> > > prepare_multi_pack_index_one() to get it).
> >
> > No, we can't drop the close_midx() call there, because we must close the
> > MIDX file on Windows before moving a new one into place. My feeling is
> > we should always be working on the r->objects->multi_pack_index pointer,
> > and calling close_object_store() there instead of close_midx().
> >
> > Does that seem like a reasonable approach to you?
>
> Yes, though I'd have said that it is the responsibility of the caller
> (who knows we are operating with r->objects->multi_pack_index) to do
> that closing. But maybe it's not possible if the rename-into-place
> happens at too low a level.

Right; write_midx_internal() needs to access the MIDX right up until the
point that we write the new one into place, so the only place to close
it is in write_midx_internal().

> BTW, yet another weirdness: close_object_store() will call close_midx()
> on the outermost midx struct, ignoring o->multi_pack_index->next
> entirely. So that's a leak, but also means we may not be closing the
> midx we're interested in (since write_midx_internal() takes an
> object-dir parameter, and we could be pointing to some other
> object-dir's midx).

Yuck, this is a mess. I'm tempted to say that we should be closing the
MIDX that we're operating on inside of write_midx_internal() so we can
write, but then declaring the whole object store to be bunk and calling
close_object_store() before leaving the function. Of course, one of
those steps should be closing the inner-most MIDX before closing the
next one, and so on.

> -Peff

Thanks,
Taylor
On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:

> > BTW, yet another weirdness: close_object_store() will call close_midx()
> > on the outermost midx struct, ignoring o->multi_pack_index->next
> > entirely. So that's a leak, but also means we may not be closing the
> > midx we're interested in (since write_midx_internal() takes an
> > object-dir parameter, and we could be pointing to some other
> > object-dir's midx).
>
> Yuck, this is a mess. I'm tempted to say that we should be closing the
> MIDX that we're operating on inside of write_midx_internal() so we can
> write, but then declaring the whole object store to be bunk and calling
> close_object_store() before leaving the function. Of course, one of
> those steps should be closing the inner-most MIDX before closing the
> next one, and so on.

That gets even weirder when you look at other callers of
write_midx_internal(). For instance, expire_midx_packs() is calling
load_multi_pack_index() directly, and then passing it to
write_midx_internal(). So closing the whole object store there is
likewise weird.

I actually think having write_midx_internal() open up a new midx is
reasonable-ish. It's just that:

  - it's weird when it stuffs duplicate packs into the
    r->objects->packed_git list. But AFAICT that's not actually hurting
    anything?

  - we do need to make sure that the midx is closed (not just our copy,
    but any other open copies that happen to be in the same process) in
    order for things to work on Windows.

So I guess because of the second point, the internal midx write probably
needs to be calling close_object_store(). But because other callers use
load_multi_pack_index(), it probably needs to be closing the one that is
passed in, too! But of course not double-closing it if it did come from
the regular object store. One easy way to avoid that is to just open a
separate one.

I have some spicy takes on how midx's should have been designed, but I
think it's probably not productive to rant about it at this point. ;)

-Peff
On Tue, Jul 27, 2021 at 01:55:52PM -0400, Jeff King wrote:

> On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:
>
> > Yuck, this is a mess. I'm tempted to say that we should be closing the
> > MIDX that we're operating on inside of write_midx_internal() so we can
> > write, but then declaring the whole object store to be bunk and calling
> > close_object_store() before leaving the function. Of course, one of
> > those steps should be closing the inner-most MIDX before closing the
> > next one, and so on.
>
> That gets even weirder when you look at other callers of
> write_midx_internal(). For instance, expire_midx_packs() is calling
> load_multi_pack_index() directly, and then passing it to
> write_midx_internal().
>
> So closing the whole object store there is likewise weird.
>
> I actually think having write_midx_internal() open up a new midx is
> reasonable-ish. It's just that:
>
>   - it's weird when it stuffs duplicate packs into the
>     r->objects->packed_git list. But AFAICT that's not actually hurting
>     anything?

It is hurting us when we try to write a MIDX bitmap, because we try to
see if one already exists. And to do that, we call prepare_bitmap_git(),
which tries to call open_pack_bitmap_1() on *each* pack in the
packed_git list. Critically, prepare_bitmap_git() errors out if it is
called with a bitmap_git that has a non-NULL `->pack` pointer.

So they aren't a cycle in the sense that `p->next == p`, but it does
cause problems for us nonetheless.

I stepped away from my computer for an hour or so and thought about
this, and I think that the solution is two-fold:

  - We should be more careful about freeing up the ->next pointers of a
    MIDX, and releasing the memory we allocated to hold each MIDX struct
    in the first place.

  - We should always be operating on the repository's
    r->objects->multi_pack_index, or any other MIDX that can be reached
    via walking the `->next` pointers. If we do that consistently, then
    we'll only have at most one instance of a MIDX struct corresponding
    to each MIDX file on disk.

In the reroll that I'll send shortly, those are (respectively):

  https://github.com/ttaylorr/git/commit/61a617715f3827401522c7b08b50bb6866f2a4e9
  https://github.com/ttaylorr/git/commit/fd15ecf47c57ce4ff0d31621c2c9f61ff7a74939

It resolves my issue locally, which was that I was previously unable to
run:

  GIT_TEST_MULTI_PACK_INDEX=1 \
  GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 \
  t7700-repack.sh

without t7700.13 failing on me (it previously complained about a
"duplicate" .bitmap file, which is a side-effect of placing duplicates
in the packed_git list, not a true duplicate .bitmap on disk).

I'm waiting on a CI run [1], but I'm relatively happy with the result. I
think we could do something similar to the MIDX code like we did to the
commit-graph code in [2], but I'm reasonably happy with where we are
now.

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/actions/runs/1072513087
[2]: https://lore.kernel.org/git/cover.1580424766.git.me@ttaylorr.com/
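For the first point, a hedged sketch of what "freeing up the ->next
pointers" might look like: close_midx() recursing down the chain before
freeing each struct. Field and function names follow midx.h, but the
per-midx cleanup is elided and this is not necessarily the committed
fix:

void close_midx(struct multi_pack_index *m)
{
	if (!m)
		return;

	close_midx(m->next);	/* release the rest of the chain first */

	/* ... unmap m->data, close the fd, free m->packs and m->pack_names ... */

	free(m);	/* finally release the struct itself, fixing the leak */
}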
On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:

> > I actually think having write_midx_internal() open up a new midx is
> > reasonable-ish. It's just that:
> >
> >   - it's weird when it stuffs duplicate packs into the
> >     r->objects->packed_git list. But AFAICT that's not actually hurting
> >     anything?
>
> It is hurting us when we try to write a MIDX bitmap, because we try to
> see if one already exists. And to do that, we call prepare_bitmap_git(),
> which tries to call open_pack_bitmap_1() on *each* pack in the
> packed_git list. Critically, prepare_bitmap_git() errors out if it is
> called with a bitmap_git that has a non-NULL `->pack` pointer.

It doesn't error out. It does produce a warning(), though, if it ignores
a bitmap (and that warning is doubly confusing, because it is ignoring
bitmap X because it has already loaded and will use that exact same X!).
This causes t7700.13 to fail because it is being picky about stderr
being empty.

So the overall behavior is correct, but I agree it's sufficiently ugly
that we should make sure it doesn't happen.

Side note: IMHO the "check all packs to see if there are any other
bitmaps to warn about" behavior is kind of pointless, and we should
consider just returning as soon as we have one. This is already somewhat
the case after your midx-bitmap patches, as we will not even bother to
look for a pack bitmap after finding a midx bitmap. That is a good
thing, because it means you can keep pack bitmaps around for
flexibility. But let's leave any changes to the pack-only behavior out
of this series for simplicity.

> I stepped away from my computer for an hour or so and thought about
> this, and I think that the solution is two-fold:
>
>   - We should be more careful about freeing up the ->next pointers of a
>     MIDX, and releasing the memory we allocated to hold each MIDX struct
>     in the first place.

Yeah. This is a bug already before your series. I suspect nobody noticed
because it's very rare for us to call close_midx() at all, and it only
matters if there's an alternate odb with a midx. (The common call to
close_midx() is in these write paths, but it is always using a single
midx file.)

>   - We should always be operating on the repository's
>     r->objects->multi_pack_index, or any other MIDX that can be reached
>     via walking the `->next` pointers. If we do that consistently, then
>     we'll only have at most one instance of a MIDX struct corresponding
>     to each MIDX file on disk.

Certainly that makes sense to me in terms of the Windows "must close the
current midx before writing" behavior. We have to realize that we're
operating in the current repo.

But we do allow an "--object-dir" option to "multi-pack-index write",
and I don't see any other code explicitly requiring that it be part of
the current repository. What I'm wondering is whether this would be
breaking:

  cd $REPO/..
  git multi-pack-index --object-dir $REPO/.git/objects write

or:

  cd /some/other/repo
  git multi-pack-index --object-dir $REPO/.git/objects write

The latter does seem to work, but the former segfaults (usually -- if
there's already a midx it is OK). If it is broken now, this may be a
good time to explicitly forbid it.

It does seem to make --object-dir mostly pointless, though it would
still work for operating on a midx in one of your alternates. I'm not
sure I understand the original point of that option, and whether the
current behavior is sufficient.

If it turns out that we can't forbid writing midx's besides the ones in
r->objects, it may be sufficient to just make any assumptions
conditional. I.e., _if_ it's one of the ones mentioned by r->objects,
then close it, but otherwise leave it open.

But if we can get away with restricting ourselves as you described, I
think the result will be much simpler, and we should prefer that.

-Peff
On Wed, Jul 28, 2021 at 01:46:21PM -0400, Jeff King wrote:

> On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:
>
> > It is hurting us when we try to write a MIDX bitmap, because we try to
> > see if one already exists. And to do that, we call prepare_bitmap_git(),
> > which tries to call open_pack_bitmap_1() on *each* pack in the
> > packed_git list. Critically, prepare_bitmap_git() errors out if it is
> > called with a bitmap_git that has a non-NULL `->pack` pointer.
>
> It doesn't error out. It does produce a warning(), though, if it ignores
> a bitmap (and that warning is doubly confusing, because it is ignoring
> bitmap X because it has already loaded and will use that exact same X!).
>
> This causes t7700.13 to fail because it is being picky about stderr
> being empty.

Right, sorry for suggesting that the error was more severe than it
actually is.

> So the overall behavior is correct, but I agree it's sufficiently ugly
> that we should make sure it doesn't happen.

100% agreed. I think the most unfortunate thing is the state of
r->objects->packed_git, since it's utterly bizarre to have the same pack
opened twice and have both of those copies in the list. That is
definitely worth preventing.

> Side note: IMHO the "check all packs to see if there are any other
> bitmaps to warn about" behavior is kind of pointless, and we should
> consider just returning as soon as we have one. This is already somewhat
> the case after your midx-bitmap patches, as we will not even bother to
> look for a pack bitmap after finding a midx bitmap. That is a good
> thing, because it means you can keep pack bitmaps around for
> flexibility. But let's leave any changes to the pack-only behavior out
> of this series for simplicity.

I agree. I'd be in favor of something like

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f599646e19..5450ffb04c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -378,13 +378,6 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack || bitmap_git->midx) {
-		/* ignore extra bitmap file; we can only handle one */
-		warning("ignoring extra bitmap file: %s", packfile->pack_name);
-		close(fd);
-		return -1;
-	}
-
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -465,16 +458,15 @@ static int open_pack_bitmap(struct repository *r,
 			    struct bitmap_index *bitmap_git)
 {
 	struct packed_git *p;
-	int ret = -1;
 
 	assert(!bitmap_git->map);
 
 	for (p = get_all_packs(r); p; p = p->next) {
-		if (open_pack_bitmap_1(bitmap_git, p) == 0)
-			ret = 0;
+		if (!open_pack_bitmap_1(bitmap_git, p))
+			return 0;
 	}
 
-	return ret;
+	return -1;
 }
 
 static int open_midx_bitmap(struct repository *r,

...but I agree that we should wait until after the dust has settled on
this already-complex series.

> > - We should always be operating on the repository's
> >   r->objects->multi_pack_index, or any other MIDX that can be reached
> >   via walking the `->next` pointers. If we do that consistently, then
> >   we'll only have at most one instance of a MIDX struct corresponding
> >   to each MIDX file on disk.
>
> Certainly that makes sense to me in terms of the Windows "must close the
> current midx before writing" behavior. We have to realize that we're
> operating in the current repo.
>
> But we do allow an "--object-dir" option to "multi-pack-index write",
> and I don't see any other code explicitly requiring that it be part of
> the current repository. What I'm wondering is whether this would be
> breaking:
>
>   cd $REPO/..
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> or:
>
>   cd /some/other/repo
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> The latter does seem to work, but the former segfaults (usually -- if
> there's already a midx it is OK).

The former should work, but doesn't, because (as you pointed out to me
in our regular weekly discussion off-list) the "multi-pack-index" entry
in git.c's commands array has the RUN_SETUP_GENTLY option, and probably
should have RUN_SETUP so that we complain with die() instead of BUG().

And the latter will continue to work, but only if, in your scenario,
$REPO is an alternate of /some/other/repo.

I wrote a little bit more in [1] about this behavior, but the upshot is
that we used to technically support passing *any* directory to
`--object-dir`, including directories that didn't belong to an
alternated repository.

And that will cease to work after the patch that [1] is in response to
is applied. But for the reasons that I explain there, I think that is an
acceptable outcome, because the behavior is kind of bizarre to begin
with.

[1]: https://lore.kernel.org/git/YQMB32fvSiH9julg@nand.local/

Thanks,
Taylor
On Thu, Jul 29, 2021 at 03:44:34PM -0400, Taylor Blau wrote:

> > Side note: IMHO the "check all packs to see if there are any other
> > bitmaps to warn about" behavior is kind of pointless, and we should
> > consider just returning as soon as we have one. This is already somewhat
> > the case after your midx-bitmap patches, as we will not even bother to
> > look for a pack bitmap after finding a midx bitmap. That is a good
> > thing, because it means you can keep pack bitmaps around for
> > flexibility. But let's leave any changes to the pack-only behavior out
> > of this series for simplicity.
>
> I agree. I'd be in favor of something like
> [...patch...]

Yep, that looks good. I'd be quite happy if you sent that once the dust
is settled.

> > But we do allow an "--object-dir" option to "multi-pack-index write",
> > and I don't see any other code explicitly requiring that it be part of
> > the current repository. What I'm wondering is whether this would be
> > breaking:
> >
> >   cd $REPO/..
> >   git multi-pack-index --object-dir $REPO/.git/objects write
> >
> > or:
> >
> >   cd /some/other/repo
> >   git multi-pack-index --object-dir $REPO/.git/objects write
> >
> > The latter does seem to work, but the former segfaults (usually -- if
> > there's already a midx it is OK).
>
> The former should work, but doesn't, because (as you pointed out to me
> in our regular weekly discussion off-list) the "multi-pack-index" entry
> in git.c's commands array has the RUN_SETUP_GENTLY option, and probably
> should have RUN_SETUP so that we complain with die() instead of BUG().
>
> And the latter will continue to work, but only if, in your scenario,
> $REPO is an alternate of /some/other/repo.
>
> I wrote a little bit more in [1] about this behavior, but the upshot is
> that we used to technically support passing *any* directory to
> `--object-dir`, including directories that didn't belong to an
> alternated repository.
>
> And that will cease to work after the patch that [1] is in response to
> is applied. But for the reasons that I explain there, I think that is an
> acceptable outcome, because the behavior is kind of bizarre to begin
> with.

Yeah, I think I am comfortable with the change at this point. The only
case that will be broken is one that is quite ridiculous, and I am
surprised worked in the first place.

Thanks for talking it through.

-Peff
diff --git a/midx.c b/midx.c
index 40eb7974ba..759007d5a8 100644
--- a/midx.c
+++ b/midx.c
@@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (m)
 		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	else {
+		struct multi_pack_index *cur;
+
+		prepare_multi_pack_index_one(the_repository, object_dir, 1);
+
+		ctx.m = NULL;
+		for (cur = the_repository->objects->multi_pack_index; cur;
+		     cur = cur->next) {
+			if (!strcmp(object_dir, cur->object_dir))
+				ctx.m = cur;
+		}
+	}
 
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
When writing a new multi-pack index, write_midx_internal() attempts to
load any existing one to fill in some pieces of information. But it uses
load_multi_pack_index(), which ignores the configuration
"core.multiPackIndex", which indicates whether or not Git is allowed to
read an existing multi-pack-index.

Replace this with a routine that does respect that setting, to avoid
reading multi-pack-index files when told not to.

This avoids a problem that would arise in subsequent patches due to the
combination of 'git repack' reopening the object store in-process and
the multi-pack index code not checking whether a pack already exists in
the object store when calling add_pack_to_midx().

This would ultimately lead to a cycle being created along the
'packed_git' struct's '->next' pointer. That is obviously bad, but it
has hard-to-debug downstream effects like saying a bitmap can't be
loaded for a pack because one already exists (for the same pack).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)