Message ID | 53bffbf4-8308-0dd7-bca5-7c85b8334e05@web.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-objects: fix and simplify --filter handling | expand |
On Sat, Nov 12 2022, René Scharfe wrote: > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > leak, 2022-03-28) --filter options given to git pack-objects overrule > earlier ones, letting only the leftmost win and leaking the memory > allocated for earlier ones. Fix that by only initializing the rev_info > struct once. If I do e.g. this with SANITIZE=leak: echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects --revs --filter=blob:limit=1001 --filter=object:type=blob --stdout >/dev/null I see one leak that wasn't there, but two that are gone now. I haven't looked into it, but I think the commit message should discuss what leaks are fixed, which remain/are new etc. > @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) > { > struct po_filter_data *data = value; > > - repo_init_revisions(the_repository, &data->revs, NULL); > + if (!data->have_revs) > + repo_init_revisions(the_repository, &data->revs, NULL); > data->have_revs = 1; > > return &data->revs.filter; FWIW as this goes away in your 2/3 I think just squashing the two with a leak fix would be nice, if... > diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh > index bb633c9b09..bd8983bb56 100755 > --- a/t/t5317-pack-objects-filter-objects.sh > +++ b/t/t5317-pack-objects-filter-objects.sh > @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' ' > test_cmp expected observed > ' > > +test_expect_success 'verify blob:limit=1001+object:type=blob' ' > + git -C r2 ls-files -s large.1000 | Aside: Should do "git >tmp && test_parse... <tmp", we lose the exit code of "ls-files" here. > + test_parse_ls_files_stage_oids | > + sort >expected && > + > + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ > + --filter=object:type=blob >filter.pack <<-EOF && > + HEAD > + EOF > + git -C r2 index-pack ../filter.pack && > + > + git -C r2 verify-pack -v ../filter.pack >verify_result && > + grep blob verify_result | > + parse_verify_pack_blob_oid | Whereas this one's not a problem, no "git". > + sort >observed && > + > + test_cmp expected observed Aside: It would be nice if we had a "test_cmp_sort", but some other day... > +' > + > test_expect_success 'verify blob:limit=10001' ' > git -C r2 ls-files -s large.1000 large.10000 | > test_parse_ls_files_stage_oids | ...we can test it, but this test is in a file that's not part of "linux-leaks". If that one leak I mentioned above can be fixed (or maybe it's not new?) this could be tested if we put it in a file of its own with TEST_PASSES_SANITIZE_LEAK=true.
On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote: > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > leak, 2022-03-28) --filter options given to git pack-objects overrule > earlier ones, letting only the leftmost win and leaking the memory > allocated for earlier ones. Fix that by only initializing the rev_info > struct once. Yikes. I wondered how we managed to miss this in the tests. Surely we have some examples that use multiple filters? I think the answer is that we do in t5616, but in practice they didn't hit this bug because of the way filter-specs are treated server-sid3. Even if we say "clone --filter=foo --filter=bar" on the client, the server upload-pack coalesces that into a single "combine:foo+bar" string, and that's what its pack-objects sees. Your test here triggers it by invoking pack-objects directly, which makes sense. That's a little different from how pack-objects is invoked by the rest of Git in practice, but it's definitely something that _should_ work. It's conceivable that somebody outside of Git could be driving pack-objects directly, but also it's just a trap waiting to spring for future developers. -Peff
On Sat, Nov 12, 2022 at 11:58:44AM -0500, Jeff King wrote: > On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote: > > > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > > leak, 2022-03-28) --filter options given to git pack-objects overrule > > earlier ones, letting only the leftmost win and leaking the memory > > allocated for earlier ones. Fix that by only initializing the rev_info > > struct once. > > Yikes. I wondered how we managed to miss this in the tests. Surely we > have some examples that use multiple filters? > > I think the answer is that we do in t5616, but in practice they didn't > hit this bug because of the way filter-specs are treated server-sid3. > Even if we say "clone --filter=foo --filter=bar" on the client, the > server upload-pack coalesces that into a single "combine:foo+bar" > string, and that's what its pack-objects sees. ...Or in other words, clients aren't broken here because we always send "combine" filters when multiple `--filter` arguments are passed to `git clone`, `git fetch`, etc. Is that always the case? I *think* so, but it would be nice to know if there was a way that clients would not send the `combine` filter with >1 filter. Thanks, Taylor
On Sun, Nov 13, 2022 at 12:01:51AM -0500, Taylor Blau wrote: > > Yikes. I wondered how we managed to miss this in the tests. Surely we > > have some examples that use multiple filters? > > > > I think the answer is that we do in t5616, but in practice they didn't > > hit this bug because of the way filter-specs are treated server-sid3. > > Even if we say "clone --filter=foo --filter=bar" on the client, the > > server upload-pack coalesces that into a single "combine:foo+bar" > > string, and that's what its pack-objects sees. > > ...Or in other words, clients aren't broken here because we always send > "combine" filters when multiple `--filter` arguments are passed to `git > clone`, `git fetch`, etc. Right. > Is that always the case? I *think* so, but it would be nice to know if > there was a way that clients would not send the `combine` filter with >1 > filter. I think it's always the case at least with Git's code, because the client-side send_filter() sends from list_objects_filter_spec(), which takes the single-string spec of the top-level combine filter. It would have to explicitly iterate over the sub-filters to come up with anything else, and I don't think we even have a function for that (we do when evaluating them, of course, but not for generically iterating or producing a multi-string set of specs). And on the server side, even if a client did send multiple "filter" lines, I think we'd hit list_objects_filter_die_if_populated(), so we'd complain. I don't know of any other code paths that could hit this (can you filter on an outgoing push?). But in the end I'm not sure it even matters. We should be fixing the bug anyway, and this is all just understanding how far-reaching its effects are (and the answer seems to be: not very). -Peff
Am 13.11.22 um 06:01 schrieb Taylor Blau: > On Sat, Nov 12, 2022 at 11:58:44AM -0500, Jeff King wrote: >> On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote: >> >>> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't >>> leak, 2022-03-28) --filter options given to git pack-objects overrule >>> earlier ones, letting only the leftmost win and leaking the memory >>> allocated for earlier ones. Fix that by only initializing the rev_info >>> struct once. >> >> Yikes. I wondered how we managed to miss this in the tests. Surely we >> have some examples that use multiple filters? >> >> I think the answer is that we do in t5616, but in practice they didn't >> hit this bug because of the way filter-specs are treated server-sid3. >> Even if we say "clone --filter=foo --filter=bar" on the client, the >> server upload-pack coalesces that into a single "combine:foo+bar" >> string, and that's what its pack-objects sees. > > ...Or in other words, clients aren't broken here because we always send > "combine" filters when multiple `--filter` arguments are passed to `git > clone`, `git fetch`, etc. > > Is that always the case? I *think* so, but it would be nice to know if > there was a way that clients would not send the `combine` filter with >1 > filter. Searching for pack-objects invocations like this should at least find all internal cases, right? $ git grep -e '"pack-objects"' -e --filter --all-match --name-only builtin/pack-objects.c bundle.c upload-pack.c git pack-objects handles --filter and doesn't pass it on. bundle.c and upload-pack.c only add at most a single --filter option to a strvec. bundle.c uses list_objects_filter_spec() to get the filter specification, upload-pack.c uses expand_list_objects_filter_spec(). René
Am 12.11.22 um 12:41 schrieb Ævar Arnfjörð Bjarmason: > > On Sat, Nov 12 2022, René Scharfe wrote: > >> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't >> leak, 2022-03-28) --filter options given to git pack-objects overrule >> earlier ones, letting only the leftmost win and leaking the memory >> allocated for earlier ones. Fix that by only initializing the rev_info >> struct once. > > If I do e.g. this with SANITIZE=leak: > > echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects --revs --filter=blob:limit=1001 --filter=object:type=blob --stdout >/dev/null > > I see one leak that wasn't there, but two that are gone now. I haven't > looked into it, but I think the commit message should discuss what leaks > are fixed, which remain/are new etc. The leak is insubstantial; I mentioned it just because of the irony. It is caused by initializing an already initialized struct rev_info without releasing it in between, as mentioned in the commit message. .filter_data allocated by filter_combine__init() is not released by filter_combine__init() in list-objects-filter.c. Plugging that leak allows your example command to run with SANITIZE=leak. That is a matter for a separate patch, though. > >> @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) >> { >> struct po_filter_data *data = value; >> >> - repo_init_revisions(the_repository, &data->revs, NULL); >> + if (!data->have_revs) >> + repo_init_revisions(the_repository, &data->revs, NULL); >> data->have_revs = 1; >> >> return &data->revs.filter; > > FWIW as this goes away in your 2/3 I think just squashing the two with a > leak fix would be nice, if... > >> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh >> index bb633c9b09..bd8983bb56 100755 >> --- a/t/t5317-pack-objects-filter-objects.sh >> +++ b/t/t5317-pack-objects-filter-objects.sh >> @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' ' >> test_cmp expected observed >> ' >> >> +test_expect_success 'verify blob:limit=1001+object:type=blob' ' >> + git -C r2 ls-files -s large.1000 | > > Aside: Should do "git >tmp && test_parse... <tmp", we lose the exit code > of "ls-files" here. OK. Copied that line from the surrounding tests. They used temporary files before fb2d0db502 (test-lib-functions: add parsing helpers for ls-files and ls-tree, 2022-04-04). > >> + test_parse_ls_files_stage_oids | >> + sort >expected && >> + >> + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ >> + --filter=object:type=blob >filter.pack <<-EOF && >> + HEAD >> + EOF >> + git -C r2 index-pack ../filter.pack && >> + >> + git -C r2 verify-pack -v ../filter.pack >verify_result && >> + grep blob verify_result | >> + parse_verify_pack_blob_oid | > > Whereas this one's not a problem, no "git". > >> + sort >observed && >> + >> + test_cmp expected observed > > Aside: It would be nice if we had a "test_cmp_sort", but some other day... > >> +' >> + >> test_expect_success 'verify blob:limit=10001' ' >> git -C r2 ls-files -s large.1000 large.10000 | >> test_parse_ls_files_stage_oids | > > ...we can test it, but this test is in a file that's not part of "linux-leaks". > > If that one leak I mentioned above can be fixed (or maybe it's not new?) > this could be tested if we put it in a file of its own with > TEST_PASSES_SANITIZE_LEAK=true. Plugging the leak in your example command is not enough to make t5317 leak free. René
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 573d0b20b7..c702c09dd4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) { struct po_filter_data *data = value; - repo_init_revisions(the_repository, &data->revs, NULL); + if (!data->have_revs) + repo_init_revisions(the_repository, &data->revs, NULL); data->have_revs = 1; return &data->revs.filter; diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index bb633c9b09..bd8983bb56 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' ' test_cmp expected observed ' +test_expect_success 'verify blob:limit=1001+object:type=blob' ' + git -C r2 ls-files -s large.1000 | + test_parse_ls_files_stage_oids | + sort >expected && + + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ + --filter=object:type=blob >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + + git -C r2 verify-pack -v ../filter.pack >verify_result && + grep blob verify_result | + parse_verify_pack_blob_oid | + sort >observed && + + test_cmp expected observed +' + test_expect_success 'verify blob:limit=10001' ' git -C r2 ls-files -s large.1000 large.10000 | test_parse_ls_files_stage_oids |
Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) --filter options given to git pack-objects overrule earlier ones, letting only the leftmost win and leaking the memory allocated for earlier ones. Fix that by only initializing the rev_info struct once. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 3 ++- t/t5317-pack-objects-filter-objects.sh | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+), 1 deletion(-) -- 2.38.1