Message ID | 20240619130256.GA228005@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 96a6621d256dc39a561913f5aadd1605ba33f161 |
Headers | show |
Series | fetch-pack: fix segfault when fscking without --lock-pack | expand |
On Wed, Jun 19, 2024 at 09:02:56AM -0400, Jeff King wrote: > I think it's a bug that fetch.unpackLimit causes index-pack to create a > lockfile at all, but there's a more direct way to trigger the issue, > which I've used in the patch below. I'll follow up with more details in > a reply to the patch. Your original test used transfer.unpackLimit. By itself that should just cause us to avoid calling unpack-objects, keeping the pack we got, but _not_ creating a .keep file. Likewise, if you pass one "-k" to fetch-pack, it should just keep the pack but without a .keep file (that's what the double "-k -k" does). But both of these do trigger a .keep file, which seems wrong to me. The caller has no idea that the .keep file was created, so it won't clean it up, and the pack will be in limbo forever. I tried bisecting and came up with fa74052922 (Always obtain fetch-pack arguments from struct fetch_pack_args, 2007-09-19). Given the length of time it's been this way, that makes me a little afraid to touch it. ;) But I think in practice it is not really triggered because of what I wrote earlier: > Nobody noticed the bug for so long because the transport code used by > "git fetch" always passes in a pack_lockfiles pointer, and remote-curl > (the main user of the fetch-pack plumbing command) always passes > --lock-pack. That is, we're always asking for a lock-file anyway. But it could affect external users of the fetch-pack plumbing. I.e., the very command that produced the segfault for you is probably leaving an unexpected .keep file in place. > So it's possible to ask index-pack to create the lock-file (using "-k > -k") but not ask to record it (by avoiding "--lock-pack"). This worked > fine until 5476e1efde (fetch-pack: print and use dangling .gitmodules, > 2021-02-22), but now it causes a segfault. > > Before that commit, if pack_lockfiles was NULL, we wouldn't bother > reading the output from index-pack at all. But since that commit, > index-pack may produce extra output if we asked it to fsck. So even if > nobody cares about the lockfile path, we still need to read it to skip > to the output we do care about. There's another interesting fallout from 5476e1efde that I noticed here. Before that commit, if you did not pass --lock-pack to fetch-pack, then we would never bother reading stdout from index-pack, and it would go to the caller's stdout! So doing: git fetch-pack -k -k repo HEAD would produce: keep 3282886e55735beb9a08b048394284b03bef8488 or similar on stdout. Which makes me wonder if some callers depend on that. Or if it is simply a bug, since it would be intermingled with fetch-pack's actual output. We still produce that output today, but if you use fetch.fsckObjects, then we eat it while looking for the other fsck-related output. -Peff
On Wed, Jun 19, 2024 at 09:22:08AM -0400, Jeff King wrote: > On Wed, Jun 19, 2024 at 09:02:56AM -0400, Jeff King wrote: > > > I think it's a bug that fetch.unpackLimit causes index-pack to create a > > lockfile at all, but there's a more direct way to trigger the issue, > > which I've used in the patch below. I'll follow up with more details in > > a reply to the patch. > > Your original test used transfer.unpackLimit. By itself that should just > cause us to avoid calling unpack-objects, keeping the pack we got, but > _not_ creating a .keep file. Likewise, if you pass one "-k" to > fetch-pack, it should just keep the pack but without a .keep file > (that's what the double "-k -k" does). > > But both of these do trigger a .keep file, which seems wrong to me. The > caller has no idea that the .keep file was created, so it won't clean it > up, and the pack will be in limbo forever. > > I tried bisecting and came up with fa74052922 (Always obtain fetch-pack > arguments from struct fetch_pack_args, 2007-09-19). Given the length of > time it's been this way, that makes me a little afraid to touch it. ;) Even before that commit, we'd trigger the lock of unpack_limit was set there. I find all of this code really puzzling (which makes me even more hesitant to touch it). But I really don't understand why it is not just this: diff --git a/fetch-pack.c b/fetch-pack.c index 42f48fbc31..ed57b0fdac 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -971,7 +971,7 @@ static int get_pack(struct fetch_pack_args *args, strvec_push(&cmd.args, "-v"); if (args->use_thin_pack) strvec_push(&cmd.args, "--fix-thin"); - if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) + if ((do_keep || index_pack_args) && args->lock_pack) add_index_pack_keep_option(&cmd.args); if (!index_pack_args && args->check_self_contained_and_connected) strvec_push(&cmd.args, "--check-self-contained-and-connected"); diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 585ea0ee16..d6d6ea6281 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -1003,6 +1003,28 @@ test_expect_success 'fetch-pack with fsckObjects and keep-file does not segfault -C client fetch-pack -k -k ../server HEAD ' +test_expect_success 'fetch-pack -k does not create .keep file' ' + rm -rf server client && + test_create_repo server && + test_commit -C server one && + + test_create_repo client && + git -C client fetch-pack -k ../server HEAD && + find client/.git/objects/pack -name "*.keep" >keep && + test_must_be_empty keep +' + +test_expect_success 'fetch-pack with unpackLimit does not create .keep file' ' + rm -rf server client && + test_create_repo server && + test_commit -C server one && + + test_create_repo client && + git -c fetch.unpackLimit=1 -C client fetch-pack ../server HEAD && + find client/.git/objects/pack -name "*.keep" >keep && + test_must_be_empty keep +' + test_expect_success 'filtering by size' ' rm -rf server client && test_create_repo server &&
Jeff King <peff@peff.net> writes: > Before that commit, if pack_lockfiles was NULL, we wouldn't bother > reading the output from index-pack at all. But since that commit, > index-pack may produce extra output if we asked it to fsck. So even if > nobody cares about the lockfile path, we still need to read it to skip > to the output we do care about. True. Looking at that problematic commit, I wonder how it passed the review process. As a design, adding a list of bare object IDs without marking what they are for is way too inextensible by our standard practice. It is probably not too late to fix it, as this is purely an internal implementation detail between fetch-pack and index-pack that is not even documented ("git index-pack --help" does talk about the "(pack|keep)\t<pack-name>" output, but never the output after that). > We correctly check that we didn't get a NULL lockfile path (which can > happen if we did not ask it to create a .keep file at all), but we > missed the case where the lockfile path is not NULL (due to "-k -k") but > the pack_lockfiles string_list is NULL (because nobody passed > "--lock-pack"), and segfault trying to add to the NULL string-list. > > We can fix this by skipping the append to the string list when either > the value or the list is NULL. In that case we must also free the > lockfile path to avoid leaking it when it's non-NULL. OK. > diff --git a/fetch-pack.c b/fetch-pack.c > index eba9e420ea..42f48fbc31 100644 > --- a/fetch-pack.c > +++ b/fetch-pack.c > @@ -1038,8 +1038,10 @@ static int get_pack(struct fetch_pack_args *args, > > if (!is_well_formed) > die(_("fetch-pack: invalid index-pack output")); > - if (pack_lockfile) > + if (pack_lockfiles && pack_lockfile) > string_list_append_nodup(pack_lockfiles, pack_lockfile); > + else > + free(pack_lockfile); > parse_gitmodules_oids(cmd.out, gitmodules_oids); > close(cmd.out); > } That looks like a very safe thing to do. > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh > index b26f367620..585ea0ee16 100755 > --- a/t/t5500-fetch-pack.sh > +++ b/t/t5500-fetch-pack.sh > @@ -993,6 +993,16 @@ test_expect_success 'ensure bogus fetch.negotiationAlgorithm yields error' ' > fetch origin server_has both_have_2 > ' > > +test_expect_success 'fetch-pack with fsckObjects and keep-file does not segfault' ' > + rm -rf server client && > + test_create_repo server && > + test_commit -C server one && > + > + test_create_repo client && > + git -c fetch.fsckObjects=true \ > + -C client fetch-pack -k -k ../server HEAD > +' > + And the test is quite straight-forward. Will queue. Thanks.
diff --git a/fetch-pack.c b/fetch-pack.c index eba9e420ea..42f48fbc31 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1038,8 +1038,10 @@ static int get_pack(struct fetch_pack_args *args, if (!is_well_formed) die(_("fetch-pack: invalid index-pack output")); - if (pack_lockfile) + if (pack_lockfiles && pack_lockfile) string_list_append_nodup(pack_lockfiles, pack_lockfile); + else + free(pack_lockfile); parse_gitmodules_oids(cmd.out, gitmodules_oids); close(cmd.out); } diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index b26f367620..585ea0ee16 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -993,6 +993,16 @@ test_expect_success 'ensure bogus fetch.negotiationAlgorithm yields error' ' fetch origin server_has both_have_2 ' +test_expect_success 'fetch-pack with fsckObjects and keep-file does not segfault' ' + rm -rf server client && + test_create_repo server && + test_commit -C server one && + + test_create_repo client && + git -c fetch.fsckObjects=true \ + -C client fetch-pack -k -k ../server HEAD +' + test_expect_success 'filtering by size' ' rm -rf server client && test_create_repo server &&