diff mbox series

[v2,3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"

Message ID f5779e19-813c-cda9-2f84-9fe58f745e89@web.de (mailing list archive)
State Superseded
Headers show
Series pack-objects: fix and simplify --filter handling | expand

Commit Message

René Scharfe Nov. 20, 2022, 10:13 a.m. UTC
This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.

5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
2022-03-28) avoided leaking rev_info allocations in many cases by
calling repo_init_revisions() only when the .filter member was actually
needed, but then still leaking it.  That was fixed later by 2108fe4a19
(revisions API users: add straightforward release_revisions(),
2022-04-13), making the reverted commit unnecessary.

5cb28270a1 broke support for multiple --filter options by calling
repo_init_revisions() for each one, clobbering earlier state and
leaking rev_info allocations.  Reverting it fixes that regression.
Luckily it only affects users that run pack-objects directly.  Internal
callers in bundle.c and upload-pack.c combine multiple filters into a
single option using "+".

5cb28270a1 introduced OPT_PARSE_LIST_OBJECTS_FILTER_INIT.  It relies on
being able to faithfully convert function pointers to object pointers
and back, which is undefined in C99.  The standard mentions this ability
as a common extension in its annex J (J.5.7 Function pointer casts), but
avoiding that dependency is more portable.  The macro hasn't been used
since, OPT_PARSE_LIST_OBJECTS_FILTER is enough so far.

So revert 5cb28270a1 fully to fix support for multiple --filter options
and avoid reliance on undefined behavior.  Its intent is better served
by the release_revisions() call added by 2108fe4a19 alone -- we just
need to do it unconditionally, mirroring the unconditional call of
repo_init_revisions().

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/pack-objects.c                 | 31 +++++---------------------
 list-objects-filter-options.c          |  4 ----
 list-objects-filter-options.h          | 24 +++-----------------
 t/t5317-pack-objects-filter-objects.sh |  2 +-
 4 files changed, 10 insertions(+), 51 deletions(-)

--
2.38.1

Comments

Junio C Hamano Nov. 28, 2022, 10:03 a.m. UTC | #1
René Scharfe <l.s.r@web.de> writes:

> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>
> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
> 2022-03-28) avoided leaking rev_info allocations in many cases by
> calling repo_init_revisions() only when the .filter member was actually
> needed, but then still leaking it.  That was fixed later by 2108fe4a19
> (revisions API users: add straightforward release_revisions(),
> 2022-04-13), making the reverted commit unnecessary.

Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
way.

https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917

Does anybody want to help looking into it?

Thanks.
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 11:12 a.m. UTC | #2
On Mon, Nov 28 2022, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>
>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>> calling repo_init_revisions() only when the .filter member was actually
>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>> (revisions API users: add straightforward release_revisions(),
>> 2022-04-13), making the reverted commit unnecessary.
>
> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
> way.
>
> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>
> Does anybody want to help looking into it?

It's because the tip of that branch introduces a new leak, but it's only
revealed in our test suite by a "git" invocation that we don't check the
exit code of. So testing with "GIT_TEST_SANITIZE_LEAK_LOG=true" (which
is there to check exactly those edge cases) catches it.

René, testing locally with e.g.:

	GIT_TEST_PASSING_SANITIZE_LEAK=check \
	GIT_TEST_SANITIZE_LEAK_LOG=true \
	make test

Should catch it, but note that we might still have new leaks in tests
that are already failing due to other leaks. So diff-ing the resulting
t/test-results/*.leak/* would be a good sanity check to see if there's
any other leak we've missed.

As to the bug itself: This exact edge case is why I went for the
lazy-setup part of 5cb28270a1f (pack-objects: lazily set up "struct
rev_info", don't leak, 2022-03-28), and indeed the example mentioned in
its commit message now leaks again, but not on master:

	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial

Now, maybe I'm missing something, but it the tip of that branch points
out a legitimate bug in my 5cb28270a1f.

But I don't see why the fix for it needs to be a full revert of
5cb28270a1f. If I replace all the C code changes in that commit with
this the test it flipped also passes:
	
	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
	index 573d0b20b76..0de53c5a2df 100644
	--- a/builtin/pack-objects.c
	+++ b/builtin/pack-objects.c
	@@ -4158,8 +4158,10 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
	 {
	 	struct po_filter_data *data = value;
	 
	-	repo_init_revisions(the_repository, &data->revs, NULL);
	-	data->have_revs = 1;
	+	if (!data->have_revs) {
	+		repo_init_revisions(the_repository, &data->revs, NULL);
	+		data->have_revs = 1;
	+	}
	 
	 	return &data->revs.filter;
	 }

I.e. the commit is correct that we shouldn't be clobbering the
"rev_info" when we see --filter again, but then let's just stop doing
that.

Well, I know why, it's because René initially started poking at this to
rid the codebase of the reliance of C99's J.5.7. I think our two
opinions on that are quite clear, so I won't repeat that here (I don't
per-se mind, I just think it's a nothingburger).

But in any case, if we're going to also refactor this code to get rd of
the J.5.7 reliance I'd think we could agree that it really belongs in
its own change. That change could then be a pure refactoring change, and
wouldn't need to also explain how it's fixing a bug while-at-it.
René Scharfe Nov. 28, 2022, 11:26 a.m. UTC | #3
Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>
>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>> calling repo_init_revisions() only when the .filter member was actually
>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>> (revisions API users: add straightforward release_revisions(),
>> 2022-04-13), making the reverted commit unnecessary.
>
> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
> way.
>
> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>
> Does anybody want to help looking into it?
The patch exposes that release_revisions() leaks the diffopt allocations
as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
a TODO for diff_free(&revs->diffopt), 2022-04-14).

The patch below plugs it locally.

--- >8 ---
Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/pack-objects.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3e74fbb0cd..a47a3f0fba 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	} else {
 		get_object_list(&revs, rp.nr, rp.v);
 	}
+	diff_free(&revs.diffopt);
 	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
--
2.30.2
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 11:31 a.m. UTC | #4
On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>> René Scharfe <l.s.r@web.de> writes:
>>
>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>
>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>> calling repo_init_revisions() only when the .filter member was actually
>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>> (revisions API users: add straightforward release_revisions(),
>>> 2022-04-13), making the reverted commit unnecessary.
>>
>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>> way.
>>
>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>
>> Does anybody want to help looking into it?

[I see we crossed E-Mails]:
https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/

> The patch exposes that release_revisions() leaks the diffopt allocations
> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
> a TODO for diff_free(&revs->diffopt), 2022-04-14).

That's correct, and we have that leak in various places in our codebase,
but per the above side-thread I think this is primarily exposing that
we're setting up the "struct rev_info" with your change when we don't
need to. Why can't we just skip it?

Yeah, if we do set it up we'll run into an outstanding leak, and that
should also be fixed (I have some local patches...), but the other cases
I know of where we'll leak that data is where we're actually using the
"struct rev_info".

I haven't tried tearing your change apart to poke at it myself, and
maybe there's some really good reason for why you can't separate getting
rid of the J.5.7 dependency and removing the lazy-init.

> The patch below plugs it locally.
>
> --- >8 ---
> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/pack-objects.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 3e74fbb0cd..a47a3f0fba 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	} else {
>  		get_object_list(&revs, rp.nr, rp.v);
>  	}
> +	diff_free(&revs.diffopt);
>  	release_revisions(&revs);
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)

So, the main motivation for the change was paranoia that a compiler or
platform might show up without J.5.7 support and that would bite us, but
we're now adding a double-free-in-waiting?

I think we're both a bit paranoid, but clearly have different
paranoia-priorities :)

If we do end up with some hack like this instead of fixing the
underlying problem I'd much prefer that such a hack just be an UNLEAK()
here.

I.e. we have a destructor for "revs.*" already, let's not bypass it and
start freeing things from under it, which will result in a double-free
if we forget this callsite once the TODO in 54c8a7c379 is addressed.

As you'd see if you made release_revisions() simply call
diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
cases.

I haven't dug into this one, but offhand I'm not confident in saying
that this isn't exposing us to some aspect of that gnarlyness (maybe
not, it's been a while since I looked).

(IIRC some of the most gnarly edge cases will only show up as CI
failures on Windows, to do with the ordering of when we'll fclose()
files hanging off that "diffopt").
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 12:24 p.m. UTC | #5
On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:

René:

> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>> René Scharfe <l.s.r@web.de> writes:
>>>
>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>
>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>> calling repo_init_revisions() only when the .filter member was actually
>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>> (revisions API users: add straightforward release_revisions(),
>>>> 2022-04-13), making the reverted commit unnecessary.
>>>
>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>> way.
>>>
>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>
>>> Does anybody want to help looking into it?
>
> [I see we crossed E-Mails]:
> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>
>> The patch exposes that release_revisions() leaks the diffopt allocations
>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>
> That's correct, and we have that leak in various places in our codebase,
> but per the above side-thread I think this is primarily exposing that
> we're setting up the "struct rev_info" with your change when we don't
> need to. Why can't we just skip it?
>
> Yeah, if we do set it up we'll run into an outstanding leak, and that
> should also be fixed (I have some local patches...), but the other cases
> I know of where we'll leak that data is where we're actually using the
> "struct rev_info".
>
> I haven't tried tearing your change apart to poke at it myself, and
> maybe there's some really good reason for why you can't separate getting
> rid of the J.5.7 dependency and removing the lazy-init.
>
>> The patch below plugs it locally.
>>
>> --- >8 ---
>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/pack-objects.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 3e74fbb0cd..a47a3f0fba 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>  	} else {
>>  		get_object_list(&revs, rp.nr, rp.v);
>>  	}
>> +	diff_free(&revs.diffopt);
>>  	release_revisions(&revs);
>>  	cleanup_preferred_base();
>>  	if (include_tag && nr_result)
>
> So, the main motivation for the change was paranoia that a compiler or
> platform might show up without J.5.7 support and that would bite us, but
> we're now adding a double-free-in-waiting?
>
> I think we're both a bit paranoid, but clearly have different
> paranoia-priorities :)
>
> If we do end up with some hack like this instead of fixing the
> underlying problem I'd much prefer that such a hack just be an UNLEAK()
> here.
>
> I.e. we have a destructor for "revs.*" already, let's not bypass it and
> start freeing things from under it, which will result in a double-free
> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>
> As you'd see if you made release_revisions() simply call
> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
> cases.
>
> I haven't dug into this one, but offhand I'm not confident in saying
> that this isn't exposing us to some aspect of that gnarlyness (maybe
> not, it's been a while since I looked).
>
> (IIRC some of the most gnarly edge cases will only show up as CI
> failures on Windows, to do with the ordering of when we'll fclose()
> files hanging off that "diffopt").

This squashed into 3/3 seems to me to be a proper fix to a change that
wants to refactor the code for non-J.5.7 compatibility. I.e. this just
does the data<->fp casting part of the change, without refactoring the
"lazy init".

But I think you should check this a bit more carefully. Your 3/3 says
that your change "mak[es] the reverted commit unnecessary", but as I
noted if you'd run the command that commit shows, you'd have seen you're
re-introducing the leak it fixed. So I wonder what else has been missed
here.

I vaguely recall that one reason I ended up with that J.5.7 dependency
was because there was an objection to mocking up the "struct option" as
I'm doing here. I.e. here we assume that the
opt_parse_list_objects_filter() is only ever going to care about the
"value" member.

I think that's probably fine, but I may be misrecalling, or missing some
crucial detail. I'll leave digging that up & convincing us that it's
fine to the person pushing for refactoring all of this :)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3e74fbb0cd5..faf210bfe8c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4149,6 +4149,27 @@ static int option_parse_cruft_expiration(const struct option *opt,
 	return 0;
 }
 
+struct po_filter_data {
+	unsigned have_revs:1;
+	struct rev_info revs;
+};
+
+static int opt_parse_list_objects_filter_init(const struct option *opt,
+					      const char *arg, int unset)
+{
+	struct po_filter_data *data = opt->value;
+	struct rev_info *revs = &data->revs;
+	const struct option opt_rev = {
+		.value = (void *)&revs->filter,
+	};
+
+	if (!data->have_revs)
+		repo_init_revisions(the_repository, revs, NULL);
+	data->have_revs = 1;
+
+	return opt_parse_list_objects_filter(&opt_rev, arg, unset);
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -4159,7 +4180,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct rev_info revs;
+	struct po_filter_data pfd = { .have_revs = 0 };
 
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -4250,7 +4271,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
+		OPT_CALLBACK(0, "filter", &pfd, N_("args"), N_("object filtering"),
+			     opt_parse_list_objects_filter_init),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4269,8 +4291,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4372,7 +4392,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (revs.filter.choice) {
+	if (pfd.have_revs && pfd.revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4459,10 +4479,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
+	} else if (pfd.have_revs) {
+		get_object_list(&pfd.revs, rp.nr, rp.v);
+		release_revisions(&pfd.revs);
 	} else {
+		struct rev_info revs;
+
+		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
+		release_revisions(&revs);
 	}
-	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
 		for_each_tag_ref(add_ref_tag, NULL);
René Scharfe Nov. 28, 2022, 2:29 p.m. UTC | #6
Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>> René Scharfe <l.s.r@web.de> writes:
>>>
>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>
>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>> calling repo_init_revisions() only when the .filter member was actually
>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>> (revisions API users: add straightforward release_revisions(),
>>>> 2022-04-13), making the reverted commit unnecessary.
>>>
>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>> way.
>>>
>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>
>>> Does anybody want to help looking into it?
>
> [I see we crossed E-Mails]:
> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>
>> The patch exposes that release_revisions() leaks the diffopt allocations
>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>
> That's correct, and we have that leak in various places in our codebase,
> but per the above side-thread I think this is primarily exposing that
> we're setting up the "struct rev_info" with your change when we don't
> need to. Why can't we just skip it?

I have no idea how to stop get_object_list() from using struct rev_info.
We could let it take a struct list_objects_filter_options pointer
instead and have it build a struct rev_info internally, but that would
just move the problem, not solve it.

> Yeah, if we do set it up we'll run into an outstanding leak, and that
> should also be fixed (I have some local patches...), but the other cases
> I know of where we'll leak that data is where we're actually using the
> "struct rev_info".
>
> I haven't tried tearing your change apart to poke at it myself, and
> maybe there's some really good reason for why you can't separate getting
> rid of the J.5.7 dependency and removing the lazy-init.
>
>> The patch below plugs it locally.
>>
>> --- >8 ---
>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/pack-objects.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 3e74fbb0cd..a47a3f0fba 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>  	} else {
>>  		get_object_list(&revs, rp.nr, rp.v);
>>  	}
>> +	diff_free(&revs.diffopt);
>>  	release_revisions(&revs);
>>  	cleanup_preferred_base();
>>  	if (include_tag && nr_result)
>
> So, the main motivation for the change was paranoia that a compiler or
> platform might show up without J.5.7 support and that would bite us, but
> we're now adding a double-free-in-waiting?
>
> I think we're both a bit paranoid, but clearly have different
> paranoia-priorities :)
>
> If we do end up with some hack like this instead of fixing the
> underlying problem I'd much prefer that such a hack just be an UNLEAK()
> here.
>
> I.e. we have a destructor for "revs.*" already, let's not bypass it and
> start freeing things from under it, which will result in a double-free
> if we forget this callsite once the TODO in 54c8a7c379 is addressed.

Well, that TODO fix should remove this new diff_free() call, but I
agree that this is fragile.

Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
is probably better.

> As you'd see if you made release_revisions() simply call
> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
> cases.

That was my first attempt; it breaks lots of tests due to double frees.

> I haven't dug into this one, but offhand I'm not confident in saying
> that this isn't exposing us to some aspect of that gnarlyness (maybe
> not, it's been a while since I looked).

I saw it as the way towards a release_revisions() that calls diff_free()
itself: Add such calls to each of them, fix the "gnarlyness"
individually, finally move them all into release_revisions().  The only
problem is that there are 60+ callsites.

> (IIRC some of the most gnarly edge cases will only show up as CI
> failures on Windows, to do with the ordering of when we'll fclose()
> files hanging off that "diffopt").

Fun.

René
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 2:34 p.m. UTC | #7
On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Nov 28 2022, René Scharfe wrote:
>>
>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>> René Scharfe <l.s.r@web.de> writes:
>>>>
>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>
>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>> (revisions API users: add straightforward release_revisions(),
>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>
>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>> way.
>>>>
>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>
>>>> Does anybody want to help looking into it?
>>
>> [I see we crossed E-Mails]:
>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>
>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>
>> That's correct, and we have that leak in various places in our codebase,
>> but per the above side-thread I think this is primarily exposing that
>> we're setting up the "struct rev_info" with your change when we don't
>> need to. Why can't we just skip it?
>
> I have no idea how to stop get_object_list() from using struct rev_info.
> We could let it take a struct list_objects_filter_options pointer
> instead and have it build a struct rev_info internally, but that would
> just move the problem, not solve it.

I mean skip it when it's not needed, it's needed when we call
get_object_list().

But what "problem" is being caused by get_object_list()? That there's
some other case(s) where it'll leak still? I haven't checked, I think we
should leave that for some other time if there's such leaks, and just
not introduce any new leaks in this topic.

>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>> should also be fixed (I have some local patches...), but the other cases
>> I know of where we'll leak that data is where we're actually using the
>> "struct rev_info".
>>
>> I haven't tried tearing your change apart to poke at it myself, and
>> maybe there's some really good reason for why you can't separate getting
>> rid of the J.5.7 dependency and removing the lazy-init.
>>
>>> The patch below plugs it locally.
>>>
>>> --- >8 ---
>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>>  builtin/pack-objects.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>> index 3e74fbb0cd..a47a3f0fba 100644
>>> --- a/builtin/pack-objects.c
>>> +++ b/builtin/pack-objects.c
>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>  	} else {
>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>  	}
>>> +	diff_free(&revs.diffopt);
>>>  	release_revisions(&revs);
>>>  	cleanup_preferred_base();
>>>  	if (include_tag && nr_result)
>>
>> So, the main motivation for the change was paranoia that a compiler or
>> platform might show up without J.5.7 support and that would bite us, but
>> we're now adding a double-free-in-waiting?
>>
>> I think we're both a bit paranoid, but clearly have different
>> paranoia-priorities :)
>>
>> If we do end up with some hack like this instead of fixing the
>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>> here.
>>
>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>> start freeing things from under it, which will result in a double-free
>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>
> Well, that TODO fix should remove this new diff_free() call, but I
> agree that this is fragile.
>
> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
> is probably better.

Or just not introduce new leaks, per my suggested fix-up at
https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
(which it looks like you haven't seen when this E-Mail is composed...).

>> As you'd see if you made release_revisions() simply call
>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>> cases.
>
> That was my first attempt; it breaks lots of tests due to double frees.

Right, to be clear I'm saying that none of this is needed right now,
i.e. I don't get why we'd want the scope-creep past the hunk I noted in
https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
for the --filter bug fix (plus the tests you're adding).

>> I haven't dug into this one, but offhand I'm not confident in saying
>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>> not, it's been a while since I looked).
>
> I saw it as the way towards a release_revisions() that calls diff_free()
> itself: Add such calls to each of them, fix the "gnarlyness"
> individually, finally move them all into release_revisions().  The only
> problem is that there are 60+ callsites.

I think this is a really bad approach in general.

Yes, it may happen to work to free() some data from under an API, but
it's just as likely that we'll miss that this one caller is screwing
with its internal state, and e.g. when some new revision.c code is used
it'll go boom.

If we wanted to phase in such a free() of "foo" I think the right way
would be to add some "revs.free_foo = 1" flag, giving the API a chance
to treat that sanely, not to start poking at members of the struct, and
assuming that its release() won't be free()-ing them.

But as noted above & in the linked I think we can defer all of that. The
only reason we're discussing this is because you're changing the
lazy-init to be not-lazy, and introducing new leaks as a result.

I've shown a couple of approaches in this thread of fixing the issue(s)
at hand without introducing such leaks, so ...
René Scharfe Nov. 28, 2022, 3:16 p.m. UTC | #8
Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:
>
> René:
>
>> On Mon, Nov 28 2022, René Scharfe wrote:
>>
>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>> René Scharfe <l.s.r@web.de> writes:
>>>>
>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>
>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>> (revisions API users: add straightforward release_revisions(),
>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>
>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>> way.
>>>>
>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>
>>>> Does anybody want to help looking into it?
>>
>> [I see we crossed E-Mails]:
>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>
>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>
>> That's correct, and we have that leak in various places in our codebase,
>> but per the above side-thread I think this is primarily exposing that
>> we're setting up the "struct rev_info" with your change when we don't
>> need to. Why can't we just skip it?
>>
>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>> should also be fixed (I have some local patches...), but the other cases
>> I know of where we'll leak that data is where we're actually using the
>> "struct rev_info".
>>
>> I haven't tried tearing your change apart to poke at it myself, and
>> maybe there's some really good reason for why you can't separate getting
>> rid of the J.5.7 dependency and removing the lazy-init.
>>
>>> The patch below plugs it locally.
>>>
>>> --- >8 ---
>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>>  builtin/pack-objects.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>> index 3e74fbb0cd..a47a3f0fba 100644
>>> --- a/builtin/pack-objects.c
>>> +++ b/builtin/pack-objects.c
>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>  	} else {
>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>  	}
>>> +	diff_free(&revs.diffopt);
>>>  	release_revisions(&revs);
>>>  	cleanup_preferred_base();
>>>  	if (include_tag && nr_result)
>>
>> So, the main motivation for the change was paranoia that a compiler or
>> platform might show up without J.5.7 support and that would bite us, but
>> we're now adding a double-free-in-waiting?
>>
>> I think we're both a bit paranoid, but clearly have different
>> paranoia-priorities :)
>>
>> If we do end up with some hack like this instead of fixing the
>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>> here.
>>
>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>> start freeing things from under it, which will result in a double-free
>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>
>> As you'd see if you made release_revisions() simply call
>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>> cases.
>>
>> I haven't dug into this one, but offhand I'm not confident in saying
>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>> not, it's been a while since I looked).
>>
>> (IIRC some of the most gnarly edge cases will only show up as CI
>> failures on Windows, to do with the ordering of when we'll fclose()
>> files hanging off that "diffopt").
>
> This squashed into 3/3 seems to me to be a proper fix to a change that
> wants to refactor the code for non-J.5.7 compatibility. I.e. this just
> does the data<->fp casting part of the change, without refactoring the
> "lazy init".

That works, but lazy code is more complicated and there is no benefit
here -- eager allocations are not noticably slow or big.  Laziness
hides leaks in corners, i.e. requiring invocations with uncommon
options to trigger them.

>
> But I think you should check this a bit more carefully. Your 3/3 says
> that your change "mak[es] the reverted commit unnecessary"

No, it says that _your_ change 2108fe4a19 (revisions API users: add
straightforward release_revisions(), 2022-04-13) made it unnecessary.

> , but as I
> noted if you'd run the command that commit shows, you'd have seen you're
> re-introducing the leak it fixed. So I wonder what else has been missed
> here.

5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
2022-03-28) did not plug the leak.  It only moved it to the corner that
handles the --filter option.

That leak is only interesting to Git developers and harmless for users.
But if the goal is to become free of trivial leaks in order to allow
using tools like LeakSanitizer to find real ones then pushing them into
the shadows not yet reached by our test coverage won't help for long.

> I vaguely recall that one reason I ended up with that J.5.7 dependency
> was because there was an objection to mocking up the "struct option" as
> I'm doing here. I.e. here we assume that the
> opt_parse_list_objects_filter() is only ever going to care about the
> "value" member.

It's probably fine, but unnecessarily complicated compared to calling
repo_init_revisions() eagerly.

>
> I think that's probably fine, but I may be misrecalling, or missing some
> crucial detail. I'll leave digging that up & convincing us that it's
> fine to the person pushing for refactoring all of this :)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 3e74fbb0cd5..faf210bfe8c 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -4149,6 +4149,27 @@ static int option_parse_cruft_expiration(const struct option *opt,
>  	return 0;
>  }
>
> +struct po_filter_data {
> +	unsigned have_revs:1;
> +	struct rev_info revs;
> +};
> +
> +static int opt_parse_list_objects_filter_init(const struct option *opt,
> +					      const char *arg, int unset)
> +{
> +	struct po_filter_data *data = opt->value;
> +	struct rev_info *revs = &data->revs;
> +	const struct option opt_rev = {
> +		.value = (void *)&revs->filter,
> +	};
> +
> +	if (!data->have_revs)
> +		repo_init_revisions(the_repository, revs, NULL);
> +	data->have_revs = 1;
> +
> +	return opt_parse_list_objects_filter(&opt_rev, arg, unset);
> +}
> +
>  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  {
>  	int use_internal_rev_list = 0;
> @@ -4159,7 +4180,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	int rev_list_index = 0;
>  	int stdin_packs = 0;
>  	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
> -	struct rev_info revs;
> +	struct po_filter_data pfd = { .have_revs = 0 };
>
>  	struct option pack_objects_options[] = {
>  		OPT_SET_INT('q', "quiet", &progress,
> @@ -4250,7 +4271,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  			      &write_bitmap_index,
>  			      N_("write a bitmap index if possible"),
>  			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
> -		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
> +		OPT_CALLBACK(0, "filter", &pfd, N_("args"), N_("object filtering"),
> +			     opt_parse_list_objects_filter_init),
>  		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
>  		  N_("handling for missing objects"), PARSE_OPT_NONEG,
>  		  option_parse_missing_action),
> @@ -4269,8 +4291,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>
>  	read_replace_refs = 0;
>
> -	repo_init_revisions(the_repository, &revs, NULL);
> -
>  	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
>  	if (the_repository->gitdir) {
>  		prepare_repo_settings(the_repository);
> @@ -4372,7 +4392,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
>  		unpack_unreachable_expiration = 0;
>
> -	if (revs.filter.choice) {
> +	if (pfd.have_revs && pfd.revs.filter.choice) {
>  		if (!pack_to_stdout)
>  			die(_("cannot use --filter without --stdout"));
>  		if (stdin_packs)
> @@ -4459,10 +4479,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  		read_cruft_objects();
>  	} else if (!use_internal_rev_list) {
>  		read_object_list_from_stdin();
> +	} else if (pfd.have_revs) {
> +		get_object_list(&pfd.revs, rp.nr, rp.v);
> +		release_revisions(&pfd.revs);
>  	} else {
> +		struct rev_info revs;
> +
> +		repo_init_revisions(the_repository, &revs, NULL);
>  		get_object_list(&revs, rp.nr, rp.v);
> +		release_revisions(&revs);
>  	}
> -	release_revisions(&revs);
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
>  		for_each_tag_ref(add_ref_tag, NULL);
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 3:27 p.m. UTC | #9
On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:
>>
>> René:
>>
>>> On Mon, Nov 28 2022, René Scharfe wrote:
>>>
>>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>>> René Scharfe <l.s.r@web.de> writes:
>>>>>
>>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>>
>>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>>> (revisions API users: add straightforward release_revisions(),
>>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>>
>>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>>> way.
>>>>>
>>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>>
>>>>> Does anybody want to help looking into it?
>>>
>>> [I see we crossed E-Mails]:
>>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>>
>>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>>
>>> That's correct, and we have that leak in various places in our codebase,
>>> but per the above side-thread I think this is primarily exposing that
>>> we're setting up the "struct rev_info" with your change when we don't
>>> need to. Why can't we just skip it?
>>>
>>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>>> should also be fixed (I have some local patches...), but the other cases
>>> I know of where we'll leak that data is where we're actually using the
>>> "struct rev_info".
>>>
>>> I haven't tried tearing your change apart to poke at it myself, and
>>> maybe there's some really good reason for why you can't separate getting
>>> rid of the J.5.7 dependency and removing the lazy-init.
>>>
>>>> The patch below plugs it locally.
>>>>
>>>> --- >8 ---
>>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>>
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>>  builtin/pack-objects.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>>> index 3e74fbb0cd..a47a3f0fba 100644
>>>> --- a/builtin/pack-objects.c
>>>> +++ b/builtin/pack-objects.c
>>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>>  	} else {
>>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>>  	}
>>>> +	diff_free(&revs.diffopt);
>>>>  	release_revisions(&revs);
>>>>  	cleanup_preferred_base();
>>>>  	if (include_tag && nr_result)
>>>
>>> So, the main motivation for the change was paranoia that a compiler or
>>> platform might show up without J.5.7 support and that would bite us, but
>>> we're now adding a double-free-in-waiting?
>>>
>>> I think we're both a bit paranoid, but clearly have different
>>> paranoia-priorities :)
>>>
>>> If we do end up with some hack like this instead of fixing the
>>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>>> here.
>>>
>>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>>> start freeing things from under it, which will result in a double-free
>>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>>
>>> As you'd see if you made release_revisions() simply call
>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>> cases.
>>>
>>> I haven't dug into this one, but offhand I'm not confident in saying
>>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>>> not, it's been a while since I looked).
>>>
>>> (IIRC some of the most gnarly edge cases will only show up as CI
>>> failures on Windows, to do with the ordering of when we'll fclose()
>>> files hanging off that "diffopt").
>>
>> This squashed into 3/3 seems to me to be a proper fix to a change that
>> wants to refactor the code for non-J.5.7 compatibility. I.e. this just
>> does the data<->fp casting part of the change, without refactoring the
>> "lazy init".
>
> That works, but lazy code is more complicated and there is no benefit
> here -- eager allocations are not noticably slow or big.  Laziness
> hides leaks in corners, i.e. requiring invocations with uncommon
> options to trigger them.

Yes, sometimes it's easier to just set everything up at the
beginning. As for hiding leaks I think the empirical data here is going
against that, i.e. your change introduced a leak.

I don't think it's realistic that we'll have the side that assigns to
"have_revs" drift from the corresponding code in cmd_pack_objects().

>> But I think you should check this a bit more carefully. Your 3/3 says
>> that your change "mak[es] the reverted commit unnecessary"
>
> No, it says that _your_ change 2108fe4a19 (revisions API users: add
> straightforward release_revisions(), 2022-04-13) made it unnecessary.

Yes, I'm saying that's not correct, because if you run the command that
5cb28270a1 prominently notes we'll now leak with this revert:

	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial

But yes with just 5cb28270a1 didn't add release_revisions(), that came
shortly afterwards in 2108fe4a19.

>> , but as I
>> noted if you'd run the command that commit shows, you'd have seen you're
>> re-introducing the leak it fixed. So I wonder what else has been missed
>> here.
>
> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
> 2022-03-28) did not plug the leak.  It only moved it to the corner that
> handles the --filter option.

I think we're using "the leak" here differently. I mean callstacks that
LeakSanitizer emits & tests we have that do & don't pass with
SANITIZE=leak.

But yes, there may be multiple paths through a function, some of which
leak, some of which don't. I'm not saying that the entire set of API
features that builtin/pack-objects.c uses in the revision API is
leak-free.

> That leak is only interesting to Git developers and harmless for users.
> But if the goal is to become free of trivial leaks in order to allow
> using tools like LeakSanitizer to find real ones then pushing them into
> the shadows not yet reached by our test coverage won't help for long.

It's clearly helping in this case, as our CI had multiple failing tests.

>> I vaguely recall that one reason I ended up with that J.5.7 dependency
>> was because there was an objection to mocking up the "struct option" as
>> I'm doing here. I.e. here we assume that the
>> opt_parse_list_objects_filter() is only ever going to care about the
>> "value" member.
>
> It's probably fine, but unnecessarily complicated compared to calling
> repo_init_revisions() eagerly.

I'm leaving aside the question of whether we should go for some version
of the refactoring in your 3/3.

What I am saying is that such refactoring should be split up from the
more narrow bug fix to the existing code. I.e. this as a replacement for
your 3/3 is all that's needed to pass the test you're adding in 2/3.

-- >8 --
From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de>
Subject: [PATCH] pack-objects: support multiple --filter options again

5cb28270a1f (pack-objects: lazily set up "struct rev_info", don't
leak, 2022-03-28) broke support for multiple --filter options by
calling repo_init_revisions() every time "--filter" was seen. Instead
we should only do so the first time, and subsequently append to the
existing filter data.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c                 | 3 ++-
 t/t5317-pack-objects-filter-objects.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 573d0b20b76..c702c09dd45 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
 {
 	struct po_filter_data *data = value;
 
-	repo_init_revisions(the_repository, &data->revs, NULL);
+	if (!data->have_revs)
+		repo_init_revisions(the_repository, &data->revs, NULL);
 	data->have_revs = 1;
 
 	return &data->revs.filter;
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 25faebaada8..5b707d911b5 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr
 	test_cmp expected observed
 '
 
-test_expect_failure 'verify small limit and big limit results in small limit' '
+test_expect_success 'verify small limit and big limit results in small limit' '
 	git -C r2 ls-files -s large.1000 >ls_files_result &&
 	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&
René Scharfe Nov. 28, 2022, 3:56 p.m. UTC | #10
Am 28.11.2022 um 15:34 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Mon, Nov 28 2022, René Scharfe wrote:
>>>
>>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>>> René Scharfe <l.s.r@web.de> writes:
>>>>>
>>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>>
>>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>>> (revisions API users: add straightforward release_revisions(),
>>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>>
>>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>>> way.
>>>>>
>>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>>
>>>>> Does anybody want to help looking into it?
>>>
>>> [I see we crossed E-Mails]:
>>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>>
>>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>>
>>> That's correct, and we have that leak in various places in our codebase,
>>> but per the above side-thread I think this is primarily exposing that
>>> we're setting up the "struct rev_info" with your change when we don't
>>> need to. Why can't we just skip it?
>>
>> I have no idea how to stop get_object_list() from using struct rev_info.
>> We could let it take a struct list_objects_filter_options pointer
>> instead and have it build a struct rev_info internally, but that would
>> just move the problem, not solve it.
>
> I mean skip it when it's not needed, it's needed when we call
> get_object_list().
>
> But what "problem" is being caused by get_object_list()? That there's
> some other case(s) where it'll leak still? I haven't checked, I think we
> should leave that for some other time if there's such leaks, and just
> not introduce any new leaks in this topic.

The problem is "How to use struct rev_info without leaks?".  No matter
where you move it, the leak will be present until the TODO in
release_revisions() is done.

>
>>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>>> should also be fixed (I have some local patches...), but the other cases
>>> I know of where we'll leak that data is where we're actually using the
>>> "struct rev_info".
>>>
>>> I haven't tried tearing your change apart to poke at it myself, and
>>> maybe there's some really good reason for why you can't separate getting
>>> rid of the J.5.7 dependency and removing the lazy-init.
>>>
>>>> The patch below plugs it locally.
>>>>
>>>> --- >8 ---
>>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>>
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>>  builtin/pack-objects.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>>> index 3e74fbb0cd..a47a3f0fba 100644
>>>> --- a/builtin/pack-objects.c
>>>> +++ b/builtin/pack-objects.c
>>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>>  	} else {
>>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>>  	}
>>>> +	diff_free(&revs.diffopt);
>>>>  	release_revisions(&revs);
>>>>  	cleanup_preferred_base();
>>>>  	if (include_tag && nr_result)
>>>
>>> So, the main motivation for the change was paranoia that a compiler or
>>> platform might show up without J.5.7 support and that would bite us, but
>>> we're now adding a double-free-in-waiting?
>>>
>>> I think we're both a bit paranoid, but clearly have different
>>> paranoia-priorities :)
>>>
>>> If we do end up with some hack like this instead of fixing the
>>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>>> here.
>>>
>>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>>> start freeing things from under it, which will result in a double-free
>>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>
>> Well, that TODO fix should remove this new diff_free() call, but I
>> agree that this is fragile.
>>
>> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
>> is probably better.
>
> Or just not introduce new leaks, per my suggested fix-up at
> https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
> (which it looks like you haven't seen when this E-Mail is composed...).

Not adding leaks is a good idea.  AFAICS none of my patches so far add
any.  Patch 3 of v2 exposes an existing one that was only triggered by
the --filter option before.  Which is also not ideal, of course, but
giving it more visibility hopefully will motivate a proper fix.

>>> As you'd see if you made release_revisions() simply call
>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>> cases.
>>
>> That was my first attempt; it breaks lots of tests due to double frees.
>
> Right, to be clear I'm saying that none of this is needed right now,
> i.e. I don't get why we'd want the scope-creep past the hunk I noted in
> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
> for the --filter bug fix (plus the tests you're adding).

Well, you asked to squash the minimal fix into the laziness removal in
https://lore.kernel.org/git/221112.86bkpcmm6i.gmgdl@evledraar.gmail.com/

Reverting 5cb28270a1 (pack-objects: lazily set up "struct rev_info",
don't leak, 2022-03-28) wholesale is the simplest way to reach the goals
of regression fix, simplification and standard compliance.  Except that
the leak check tests have come to depend on the leak being hidden in the
--filter corner.  So going back to v1 sure seems attractive.

>
>>> I haven't dug into this one, but offhand I'm not confident in saying
>>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>>> not, it's been a while since I looked).
>>
>> I saw it as the way towards a release_revisions() that calls diff_free()
>> itself: Add such calls to each of them, fix the "gnarlyness"
>> individually, finally move them all into release_revisions().  The only
>> problem is that there are 60+ callsites.
>
> I think this is a really bad approach in general.
>
> Yes, it may happen to work to free() some data from under an API, but
> it's just as likely that we'll miss that this one caller is screwing
> with its internal state, and e.g. when some new revision.c code is used
> it'll go boom.
>
> If we wanted to phase in such a free() of "foo" I think the right way
> would be to add some "revs.free_foo = 1" flag, giving the API a chance
> to treat that sanely, not to start poking at members of the struct, and
> assuming that its release() won't be free()-ing them.

And that's why you added no_free to struct diff_options.  We could use
it here by setting it in repo_init_revisions() and unsetting in
cmd_pack_objects() and elsewhere, until it is set everywhere.

> But as noted above & in the linked I think we can defer all of that. The
> only reason we're discussing this is because you're changing the
> lazy-init to be not-lazy, and introducing new leaks as a result.>
> I've shown a couple of approaches in this thread of fixing the issue(s)
> at hand without introducing such leaks, so ...

As noted above: These leaks are not new, they are just moved into
test coverage.

René
René Scharfe Nov. 28, 2022, 5:57 p.m. UTC | #11
Am 28.11.2022 um 16:56 schrieb René Scharfe:>
> The problem is "How to use struct rev_info without leaks?".  No matter
> where you move it, the leak will be present until the TODO in
> release_revisions() is done.

Wrong.  The following sequence leaks:

	struct rev_info revs;
	repo_init_revisions(the_repository, &revs, NULL);
	release_revisions(&revs);

... and this here doesn't:

	struct rev_info revs;
	repo_init_revisions(the_repository, &revs, NULL);
	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
	release_revisions(&revs);

That's because setup_revisions() calls diff_setup_done(), which frees
revs->diffopt.parseopts, and release_revisions() doesn't.

And since builtin/pack-objects.c::get_object_list() calls
setup_revisions(), it really frees that memory, as you claimed from the
start.  Sorry, I was somehow assuming that a setup function wouldn't
clean up.  D'oh!

The first sequence is used in some other places. e.g. builtin/prune.c.
But there LeakSanitizer doesn't complain for some reason; Valgrind
reports the parseopts allocation as "possibly lost".

I still think the assumption that "init_x(x); release_x(x);" doesn't
leak is reasonable.  Let's make it true.  How about this?  It's safe
in the sense that we don't risk double frees and it's close to the
TODO comment so we probably won't forget removing it once diff_free()
becomes used.

---
 revision.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/revision.c b/revision.c
index 439e34a7c5..6a51ef9418 100644
--- a/revision.c
+++ b/revision.c
@@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
 	release_revisions_mailmap(revs->mailmap);
 	free_grep_patterns(&revs->grep_filter);
 	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
+	FREE_AND_NULL(revs->diffopt.parseopts);
 	diff_free(&revs->pruning);
 	reflog_walk_info_release(revs->reflog_info);
 	release_revisions_topo_walk_info(revs->topo_walk_info);
--
2.30.2
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 5:57 p.m. UTC | #12
On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 15:34 schrieb Ævar Arnfjörð Bjarmason:
> [...]
>> I mean skip it when it's not needed, it's needed when we call
>> get_object_list().
>>
>> But what "problem" is being caused by get_object_list()? That there's
>> some other case(s) where it'll leak still? I haven't checked, I think we
>> should leave that for some other time if there's such leaks, and just
>> not introduce any new leaks in this topic.
>
> The problem is "How to use struct rev_info without leaks?".  No matter
> where you move it, the leak will be present until the TODO in
> release_revisions() is done.

No, the problem at hand is that "--filter" doesn't accept N arguments,
which is easily fixed.

Junio then tested the patch, and noted CI failing. Yes we have various
outstanding issues with the revisions API etc. leaking, but the
scope-creep of conflating those questions with a regression fix doesn't
seem necessary.

>>> [...]
>>> Well, that TODO fix should remove this new diff_free() call, but I
>>> agree that this is fragile.
>>>
>>> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
>>> is probably better.
>>
>> Or just not introduce new leaks, per my suggested fix-up at
>> https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
>> (which it looks like you haven't seen when this E-Mail is composed...).
>
> Not adding leaks is a good idea.  AFAICS none of my patches so far add
> any.

I don't see where debating what constitutes a "new" or "added" memory
leak is going to get us.

But to clarify I'm not saying you're responsible for the
revisions/filter APIs you're running into, and some of those are
existing leaks.

I'm also saying that if you run:

	make SANITIZE=leak test T=...

For some values of T=... (and the whole test suite with
"GIT_TEST_PASSING_SANITIZE_LEAK=true") your 2/3 passes certain tests,
and your 3/3 doesn't.

> Patch 3 of v2 exposes an existing one that was only triggered by
> the --filter option before.  Which is also not ideal, of course, but
> giving it more visibility hopefully will motivate a proper fix.

We try not to break CI. Only finding out that existing CI breaks once a
patch is queued & Junio & others start testing it to hunt down issues
isn't a good use of anyone's time.

So, maybe you think this area of CI is useless etc., but again, if
you're submitting a change to advocate for that can we peel that away
from a regression we can fix relatively easily?

>>>> As you'd see if you made release_revisions() simply call
>>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>>> cases.
>>>
>>> That was my first attempt; it breaks lots of tests due to double frees.
>>
>> Right, to be clear I'm saying that none of this is needed right now,
>> i.e. I don't get why we'd want the scope-creep past the hunk I noted in
>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>> for the --filter bug fix (plus the tests you're adding).
>
> Well, you asked to squash the minimal fix into the laziness removal in
> https://lore.kernel.org/git/221112.86bkpcmm6i.gmgdl@evledraar.gmail.com/

The second part of the relevant request being so that we could
regression test the memory leak(s).

So, thanks for trying to address that in some way, but proceeding to
break linux-leaks, and then arguing that that's OK seems like an odd way
to go about that...

> Reverting 5cb28270a1 (pack-objects: lazily set up "struct rev_info",
> don't leak, 2022-03-28) wholesale is the simplest way to reach the goals
> of regression fix, simplification and standard compliance.  Except that
> the leak check tests have come to depend on the leak being hidden in the
> --filter corner.  So going back to v1 sure seems attractive.

Yeah.

Honestly I'd mostly forgotten about the v1 review, and only skimmed
things then. So, I've just noticed that the "squash" I'm suggesting is
basically equivalent to your v1 now.

But yeah I think it's fine to just skip testing the regression fix with
SANITIZE=leak for now, we can do it some other time.

I just brought it up in the v1 feedback because that patch prominently
notes the leak fix, so if we can get regression test it relatively
easily that seemed like a good change to make.

>>> [...]
>>> I saw it as the way towards a release_revisions() that calls diff_free()
>>> itself: Add such calls to each of them, fix the "gnarlyness"
>>> individually, finally move them all into release_revisions().  The only
>>> problem is that there are 60+ callsites.
>>
>> I think this is a really bad approach in general.
>>
>> Yes, it may happen to work to free() some data from under an API, but
>> it's just as likely that we'll miss that this one caller is screwing
>> with its internal state, and e.g. when some new revision.c code is used
>> it'll go boom.
>>
>> If we wanted to phase in such a free() of "foo" I think the right way
>> would be to add some "revs.free_foo = 1" flag, giving the API a chance
>> to treat that sanely, not to start poking at members of the struct, and
>> assuming that its release() won't be free()-ing them.
>
> And that's why you added no_free to struct diff_options.  We could use
> it here by setting it in repo_init_revisions() and unsetting in
> cmd_pack_objects() and elsewhere, until it is set everywhere.

FWIW I think the "no_free" isn't all that good of an approach in
retrospect either, but at least it's (mostly) self-contained in the
struct that owns it.

But it's rather messy, because some of the time it's embedded in a
"struct rev_info" that should really own it, and sometimes now. At the
time release_revisions() didn't exist yet, so making some forward
progress with the diff leaks was easiest with the "no_free".

>> But as noted above & in the linked I think we can defer all of that. The
>> only reason we're discussing this is because you're changing the
>> lazy-init to be not-lazy, and introducing new leaks as a result.>
>> I've shown a couple of approaches in this thread of fixing the issue(s)
>> at hand without introducing such leaks, so ...
>
> As noted above: These leaks are not new, they are just moved into
> test coverage.

Right, and in some cases we should definitely just un-mark a test as
being tested under linux-leaks to make forward progress.

But in this case I just don't see how that's a good trade-off. We can
fix the --filter bug without that, and yes we'll have some new/existing
leaks, but those are all contained in tests that are not tested by
linux-leaks now.

Whereas the larger rewrite to make the init non-lazy would lose us test
coverage.
Ævar Arnfjörð Bjarmason Nov. 28, 2022, 6:32 p.m. UTC | #13
On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 16:56 schrieb René Scharfe:>
>> The problem is "How to use struct rev_info without leaks?".  No matter
>> where you move it, the leak will be present until the TODO in
>> release_revisions() is done.
>
> Wrong.  The following sequence leaks:
>
> 	struct rev_info revs;
> 	repo_init_revisions(the_repository, &revs, NULL);
> 	release_revisions(&revs);
>
> ... and this here doesn't:
>
> 	struct rev_info revs;
> 	repo_init_revisions(the_repository, &revs, NULL);
> 	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
> 	release_revisions(&revs);
>
> That's because setup_revisions() calls diff_setup_done(), which frees
> revs->diffopt.parseopts, and release_revisions() doesn't.
>
> And since builtin/pack-objects.c::get_object_list() calls
> setup_revisions(), it really frees that memory, as you claimed from the
> start.  Sorry, I was somehow assuming that a setup function wouldn't
> clean up.  D'oh!
>
> The first sequence is used in some other places. e.g. builtin/prune.c.
> But there LeakSanitizer doesn't complain for some reason; Valgrind
> reports the parseopts allocation as "possibly lost".

Yes, some of the interactions are tricky. It's really useful to run the
tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to
check these sorts of assumptions for sanity.

> I still think the assumption that "init_x(x); release_x(x);" doesn't
> leak is reasonable.  Let's make it true.  How about this?  It's safe
> in the sense that we don't risk double frees and it's close to the
> TODO comment so we probably won't forget removing it once diff_free()
> becomes used.
>
> ---
>  revision.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/revision.c b/revision.c
> index 439e34a7c5..6a51ef9418 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
>  	release_revisions_mailmap(revs->mailmap);
>  	free_grep_patterns(&revs->grep_filter);
>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
> +	FREE_AND_NULL(revs->diffopt.parseopts);
>  	diff_free(&revs->pruning);
>  	reflog_walk_info_release(revs->reflog_info);
>  	release_revisions_topo_walk_info(revs->topo_walk_info);

At this point I'm unclear on what & why this is needed? I.e. once we
narrowly fix the >1 "--filter" options what still fails?

But in general: I don't really think this sort of thing is worth
it. Here we're reaching into a member of "revs->diffopt" behind its back
rather than calling diff_free(). I think we should just focus on being
able to do do that safely.

WIP patches I have in that direction, partially based on your previous
"is_dead" suggestion:

	https://github.com/avar/git/commit/e02a15f6206
	https://github.com/avar/git/commit/c718f36566a

I haven't poked at that in a while, I think the only outstanding issue
with it is that fclose() interaction.

I think for this particular thing there aren't going to be any bad
side-effects in practice, but I also think convincing oneself of that
basically means putting the same amount of work in as just fixing some
of these properly.
René Scharfe Nov. 28, 2022, 9:57 p.m. UTC | #14
Am 28.11.2022 um 19:32 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 16:56 schrieb René Scharfe:>
>>> The problem is "How to use struct rev_info without leaks?".  No matter
>>> where you move it, the leak will be present until the TODO in
>>> release_revisions() is done.
>>
>> Wrong.  The following sequence leaks:
>>
>> 	struct rev_info revs;
>> 	repo_init_revisions(the_repository, &revs, NULL);
>> 	release_revisions(&revs);
>>
>> ... and this here doesn't:
>>
>> 	struct rev_info revs;
>> 	repo_init_revisions(the_repository, &revs, NULL);
>> 	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
>> 	release_revisions(&revs);
>>
>> That's because setup_revisions() calls diff_setup_done(), which frees
>> revs->diffopt.parseopts, and release_revisions() doesn't.
>>
>> And since builtin/pack-objects.c::get_object_list() calls
>> setup_revisions(), it really frees that memory, as you claimed from the
>> start.  Sorry, I was somehow assuming that a setup function wouldn't
>> clean up.  D'oh!
>>
>> The first sequence is used in some other places. e.g. builtin/prune.c.
>> But there LeakSanitizer doesn't complain for some reason; Valgrind
>> reports the parseopts allocation as "possibly lost".
>
> Yes, some of the interactions are tricky. It's really useful to run the
> tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to
> check these sorts of assumptions for sanity.

That may be true, and looks even useful -- I didn't know the check
value.  I only get a strange error message, though:

   $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh
   Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak

Same with make test and prove, of course.  And of course I compiled
with SANITIZE=leak beforehand.

But I don't see a connection between my comment and yours.  I was
not running any tests, just the above sequences of function calls,
e.g. in git prune.

>
>> I still think the assumption that "init_x(x); release_x(x);" doesn't
>> leak is reasonable.  Let's make it true.  How about this?  It's safe
>> in the sense that we don't risk double frees and it's close to the
>> TODO comment so we probably won't forget removing it once diff_free()
>> becomes used.
>>
>> ---
>>  revision.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/revision.c b/revision.c
>> index 439e34a7c5..6a51ef9418 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
>>  	release_revisions_mailmap(revs->mailmap);
>>  	free_grep_patterns(&revs->grep_filter);
>>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
>> +	FREE_AND_NULL(revs->diffopt.parseopts);
>>  	diff_free(&revs->pruning);
>>  	reflog_walk_info_release(revs->reflog_info);
>>  	release_revisions_topo_walk_info(revs->topo_walk_info);
>
> At this point I'm unclear on what & why this is needed? I.e. once we
> narrowly fix the >1 "--filter" options what still fails?

As I wrote: A call to an initialization function followed by a call to a
cleanup function and nothing else shouldn't leak.  There are examples of
repo_init_revisions()+release_revisions() without setup_revisions() or
diff_setup_done() beyond pack-objects.  I mentioned prune, but there are
more, e.g. in sequencer.c.

> But in general: I don't really think this sort of thing is worth
> it. Here we're reaching into a member of "revs->diffopt" behind its back
> rather than calling diff_free(). I think we should just focus on being
> able to do do that safely.

Sure, but the FREE_AND_NULL call is simple and safe, while diff_free()
is complicated and calling it one time too many can hurt.

> WIP patches I have in that direction, partially based on your previous
> "is_dead" suggestion:
>
> 	https://github.com/avar/git/commit/e02a15f6206
> 	https://github.com/avar/git/commit/c718f36566a

Copy-typed the interesting parts of the first patch like a medieval monk
because there doesn't seem to be a download option. :-|

> I haven't poked at that in a while, I think the only outstanding issue
> with it is that fclose() interaction.

You mean the t3702-add-edit.sh failure on Windows mentioned in the
commit message of e02a15f6206?  That's caused by the file being kept
open and thus locked during the call of the editor.  Moving the
release_revisions() call in builtin/add.c::edit_patch() before the
launch_editor() call fixes that by closing the file.

> I think for this particular thing there aren't going to be any bad
> side-effects in practice, but I also think convincing oneself of that
> basically means putting the same amount of work in as just fixing some
> of these properly.

Not to me, but perhaps that TODO is easier solved that I expected.
In any case, with the mentioned edit_patch() change described above
e02a15f6206 passes the test suite on Windows for me.

René
Jeff King Nov. 29, 2022, 1:26 a.m. UTC | #15
On Mon, Nov 28, 2022 at 10:57:02PM +0100, René Scharfe wrote:

> >> diff --git a/revision.c b/revision.c
> >> index 439e34a7c5..6a51ef9418 100644
> >> --- a/revision.c
> >> +++ b/revision.c
> >> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
> >>  	release_revisions_mailmap(revs->mailmap);
> >>  	free_grep_patterns(&revs->grep_filter);
> >>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
> >> +	FREE_AND_NULL(revs->diffopt.parseopts);
> >>  	diff_free(&revs->pruning);
> >>  	reflog_walk_info_release(revs->reflog_info);
> >>  	release_revisions_topo_walk_info(revs->topo_walk_info);
> >
> > At this point I'm unclear on what & why this is needed? I.e. once we
> > narrowly fix the >1 "--filter" options what still fails?
> 
> As I wrote: A call to an initialization function followed by a call to a
> cleanup function and nothing else shouldn't leak.  There are examples of
> repo_init_revisions()+release_revisions() without setup_revisions() or
> diff_setup_done() beyond pack-objects.  I mentioned prune, but there are
> more, e.g. in sequencer.c.

I tend to agree with you; an init+release combo should release all
memory. We _can_ work around it in the caller here, but I think we are
better to correct the root cause.

I do think what Ævar is saying is: once we have fixed the bug, why are
more changes needed? I.e., why would we get rid of the lazy-init. IMHO
the answer is that the lazy-init is a more complex pattern, and requires
more boilerplate code (which can lead to more bugs, as we saw). So once
the bug is fixed, this is purely about cleanup/simplification (if one
ignores the C-standard issues), but I tend to think it is one worth
doing.

(Apologies to Ævar if I'm mis-stating your position).

> > But in general: I don't really think this sort of thing is worth
> > it. Here we're reaching into a member of "revs->diffopt" behind its back
> > rather than calling diff_free(). I think we should just focus on being
> > able to do do that safely.
> 
> Sure, but the FREE_AND_NULL call is simple and safe, while diff_free()
> is complicated and calling it one time too many can hurt.

Again, I'd agree with you here. In the long run, yes, we want
diff_free(). But if that is not ready yet, the FREE_AND_NULL() is a
stepping stone that takes us in the right direction and which we can do
immediately.

> > WIP patches I have in that direction, partially based on your previous
> > "is_dead" suggestion:
> >
> > 	https://github.com/avar/git/commit/e02a15f6206
> > 	https://github.com/avar/git/commit/c718f36566a
> 
> Copy-typed the interesting parts of the first patch like a medieval monk
> because there doesn't seem to be a download option. :-|

This is the actual reason I responded to your message. ;) Try:

  https://github.com/avar/git/commit/e02a15f6206.patch

etc. I don't think there's a "raw patch" link or similar in the
interface, though. You have to just know about it.

-Peff
Junio C Hamano Nov. 29, 2022, 1:46 a.m. UTC | #16
Jeff King <peff@peff.net> writes:

> This is the actual reason I responded to your message. ;) Try:
>
>   https://github.com/avar/git/commit/e02a15f6206.patch
>
> etc. I don't think there's a "raw patch" link or similar in the
> interface, though. You have to just know about it.

;-)
Ævar Arnfjörð Bjarmason Nov. 29, 2022, 7:12 a.m. UTC | #17
On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 19:32 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Nov 28 2022, René Scharfe wrote:
>>
>>> Am 28.11.2022 um 16:56 schrieb René Scharfe:>
>>>> The problem is "How to use struct rev_info without leaks?".  No matter
>>>> where you move it, the leak will be present until the TODO in
>>>> release_revisions() is done.
>>>
>>> Wrong.  The following sequence leaks:
>>>
>>> 	struct rev_info revs;
>>> 	repo_init_revisions(the_repository, &revs, NULL);
>>> 	release_revisions(&revs);
>>>
>>> ... and this here doesn't:
>>>
>>> 	struct rev_info revs;
>>> 	repo_init_revisions(the_repository, &revs, NULL);
>>> 	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
>>> 	release_revisions(&revs);
>>>
>>> That's because setup_revisions() calls diff_setup_done(), which frees
>>> revs->diffopt.parseopts, and release_revisions() doesn't.
>>>
>>> And since builtin/pack-objects.c::get_object_list() calls
>>> setup_revisions(), it really frees that memory, as you claimed from the
>>> start.  Sorry, I was somehow assuming that a setup function wouldn't
>>> clean up.  D'oh!
>>>
>>> The first sequence is used in some other places. e.g. builtin/prune.c.
>>> But there LeakSanitizer doesn't complain for some reason; Valgrind
>>> reports the parseopts allocation as "possibly lost".
>>
>> Yes, some of the interactions are tricky. It's really useful to run the
>> tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to
>> check these sorts of assumptions for sanity.
>
> That may be true, and looks even useful -- I didn't know the check
> value.  I only get a strange error message, though:
>
>    $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh
>    Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak
>
> Same with make test and prove, of course.  And of course I compiled
> with SANITIZE=leak beforehand.

The "=true" part of the message is unfortunately incorrect (it pre-dates
"check" being a possible value), but I don't see how you could have
compiled with "SANITIZE=leak" and get that message.

It's unreachable if 'test -n "$SANITIZE_LEAK"', and that'll be non-empty
in GIT-BUILD-OPTIONS if compiled with it. Perhaps you gave SANITIZE=leak
to t/Makefile, not the top-level Makefile?

Try this at the top-level:

	GIT_TEST_PASSING_SANITIZE_LEAK=check make SANITIZE= test T=t0001-init.sh

> But I don't see a connection between my comment and yours.  I was
> not running any tests, just the above sequences of function calls,
> e.g. in git prune.

The connection is that you submitted patches that fail the CI. I don't
think you use GitHub, so in particular if you're submitting patches that
claim to do something with leak checking it's useful to run those modes
against them to check your assumptions.

>>
>>> I still think the assumption that "init_x(x); release_x(x);" doesn't
>>> leak is reasonable.  Let's make it true.  How about this?  It's safe
>>> in the sense that we don't risk double frees and it's close to the
>>> TODO comment so we probably won't forget removing it once diff_free()
>>> becomes used.
>>>
>>> ---
>>>  revision.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/revision.c b/revision.c
>>> index 439e34a7c5..6a51ef9418 100644
>>> --- a/revision.c
>>> +++ b/revision.c
>>> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
>>>  	release_revisions_mailmap(revs->mailmap);
>>>  	free_grep_patterns(&revs->grep_filter);
>>>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
>>> +	FREE_AND_NULL(revs->diffopt.parseopts);
>>>  	diff_free(&revs->pruning);
>>>  	reflog_walk_info_release(revs->reflog_info);
>>>  	release_revisions_topo_walk_info(revs->topo_walk_info);
>>
>> At this point I'm unclear on what & why this is needed? I.e. once we
>> narrowly fix the >1 "--filter" options what still fails?
>
> As I wrote: A call to an initialization function followed by a call to a
> cleanup function and nothing else shouldn't leak.  There are examples of
> repo_init_revisions()+release_revisions() without setup_revisions() or
> diff_setup_done() beyond pack-objects.  I mentioned prune, but there are
> more, e.g. in sequencer.c.

Yes, I agree it shouldn't leak. And we should definitely fix those
leaks. I just don't see why a series fixing bugs in --filter needs to
expand the scope to fix those.

>> But in general: I don't really think this sort of thing is worth
>> it. Here we're reaching into a member of "revs->diffopt" behind its back
>> rather than calling diff_free(). I think we should just focus on being
>> able to do do that safely.
>
> Sure, but the FREE_AND_NULL call is simple and safe, while diff_free()
> is complicated and calling it one time too many can hurt.

It's "safe" because you've read the internals of it, and know that it
isn't assuming a non-NULL there once it's past initialization?

Or is it like the revisions init()+release() in this thread, where
you're assuming it works one way based on the function names etc., only
for the CI to fail?

In either case, I'm saying that if someone's confident enough to reach
into the internals of a structure and tweak it they should be confident
enough to just patch diff_free() or the like.

>> WIP patches I have in that direction, partially based on your previous
>> "is_dead" suggestion:
>>
>> 	https://github.com/avar/git/commit/e02a15f6206
>> 	https://github.com/avar/git/commit/c718f36566a
>
> Copy-typed the interesting parts of the first patch like a medieval monk
> because there doesn't seem to be a download option. :-|

Jeff pointed out the ".patch" (there's also ".diff"), but also: Git has
this well-known transport protocol it uses, which typically maps to the
web URL on public hosting sites ... :)

	git remote add avar https://github.com/avar/git.git
	git fetch avar
	git show <OID>

>> I haven't poked at that in a while, I think the only outstanding issue
>> with it is that fclose() interaction.
>
> You mean the t3702-add-edit.sh failure on Windows mentioned in the
> commit message of e02a15f6206?  That's caused by the file being kept
> open and thus locked during the call of the editor.  Moving the
> release_revisions() call in builtin/add.c::edit_patch() before the
> launch_editor() call fixes that by closing the file.

Yeah, I haven't had time to poke at it in a while, and I had it queued
behind various other leak fixes.

I'm sure it's not that complicated in the end, just that that
interaction needs to be dealt with.

>> I think for this particular thing there aren't going to be any bad
>> side-effects in practice, but I also think convincing oneself of that
>> basically means putting the same amount of work in as just fixing some
>> of these properly.
>
> Not to me, but perhaps that TODO is easier solved that I expected.
> In any case, with the mentioned edit_patch() change described above
> e02a15f6206 passes the test suite on Windows for me.

Nice! I'll try that out.
Ævar Arnfjörð Bjarmason Nov. 29, 2022, 10:25 a.m. UTC | #18
On Mon, Nov 28 2022, Jeff King wrote:

> On Mon, Nov 28, 2022 at 10:57:02PM +0100, René Scharfe wrote:
>
>> >> diff --git a/revision.c b/revision.c
>> >> index 439e34a7c5..6a51ef9418 100644
>> >> --- a/revision.c
>> >> +++ b/revision.c
>> >> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
>> >>  	release_revisions_mailmap(revs->mailmap);
>> >>  	free_grep_patterns(&revs->grep_filter);
>> >>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
>> >> +	FREE_AND_NULL(revs->diffopt.parseopts);
>> >>  	diff_free(&revs->pruning);
>> >>  	reflog_walk_info_release(revs->reflog_info);
>> >>  	release_revisions_topo_walk_info(revs->topo_walk_info);
>> >
>> > At this point I'm unclear on what & why this is needed? I.e. once we
>> > narrowly fix the >1 "--filter" options what still fails?
>> 
>> As I wrote: A call to an initialization function followed by a call to a
>> cleanup function and nothing else shouldn't leak.  There are examples of
>> repo_init_revisions()+release_revisions() without setup_revisions() or
>> diff_setup_done() beyond pack-objects.  I mentioned prune, but there are
>> more, e.g. in sequencer.c.
>
> I tend to agree with you; an init+release combo should release all
> memory. We _can_ work around it in the caller here, but I think we are
> better to correct the root cause.
>
> I do think what Ævar is saying is: once we have fixed the bug, why are
> more changes needed? I.e., why would we get rid of the lazy-init. IMHO
> the answer is that the lazy-init is a more complex pattern, and requires
> more boilerplate code (which can lead to more bugs, as we saw). So once
> the bug is fixed, this is purely about cleanup/simplification (if one
> ignores the C-standard issues), but I tend to think it is one worth
> doing.
>
> (Apologies to Ævar if I'm mis-stating your position).

My position isn't that it isn't worth doing, but that we can clearly
proceed in smaller steps here, and changing one thing at a time is
better.

So far in this thread we've had, in relation to 3/3 (and I may have lost
track of some points made):

1. It's fixing the bug where you provide --filter N times
2. It's refactoring the "lazy init" to "non-lazy init"
3. It's refactoring the code to avoid relying on the J.5.7 extension in C99.
4. Due to #2 we run into more leaks
5. A dozen or so test files fail due to #2. Are we digressing to fix the
   root cause of the leak(s), or un-marking those tests as passing
   leak-free & losing test coverage?

As
https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
shows we can do #1 as a stand-alone change.

I'm not saying the rest of this isn't worth pursuing, except to point
out in
https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
that #2 can be split off from #3.
René Scharfe Nov. 29, 2022, 7:18 p.m. UTC | #19
Am 29.11.22 um 08:12 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> That may be true, and looks even useful -- I didn't know the check
>> value.  I only get a strange error message, though:
>>
>>    $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh
>>    Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak
>>
>> Same with make test and prove, of course.  And of course I compiled
>> with SANITIZE=leak beforehand.
>
> The "=true" part of the message is unfortunately incorrect (it pre-dates
> "check" being a possible value), but I don't see how you could have
> compiled with "SANITIZE=leak" and get that message.
>
> It's unreachable if 'test -n "$SANITIZE_LEAK"', and that'll be non-empty
> in GIT-BUILD-OPTIONS if compiled with it. Perhaps you gave SANITIZE=leak
> to t/Makefile, not the top-level Makefile?
>
> Try this at the top-level:
>
> 	GIT_TEST_PASSING_SANITIZE_LEAK=check make SANITIZE= test T=t0001-init.sh

It works today, no idea what I did yesterday.  Did reboot and make
clean in the meantime, shell history is inconclusive.  *shrug*

>> As I wrote: A call to an initialization function followed by a call to a
>> cleanup function and nothing else shouldn't leak.  There are examples of
>> repo_init_revisions()+release_revisions() without setup_revisions() or
>> diff_setup_done() beyond pack-objects.  I mentioned prune, but there are
>> more, e.g. in sequencer.c.
>
> Yes, I agree it shouldn't leak. And we should definitely fix those
> leaks. I just don't see why a series fixing bugs in --filter needs to
> expand the scope to fix those.

The connection is that this is the very leak that 5cb28270a1
(pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28)
plugged locally.  Took me a while to see that.  Anyway, I'm also not
keen on scope creep.

>>> But in general: I don't really think this sort of thing is worth
>>> it. Here we're reaching into a member of "revs->diffopt" behind its back
>>> rather than calling diff_free(). I think we should just focus on being
>>> able to do do that safely.
>>
>> Sure, but the FREE_AND_NULL call is simple and safe, while diff_free()
>> is complicated and calling it one time too many can hurt.
>
> It's "safe" because you've read the internals of it, and know that it
> isn't assuming a non-NULL there once it's past initialization?
>
> Or is it like the revisions init()+release() in this thread, where
> you're assuming it works one way based on the function names etc., only
> for the CI to fail?

Ouch.

> In either case, I'm saying that if someone's confident enough to reach
> into the internals of a structure and tweak it they should be confident
> enough to just patch diff_free() or the like.

diff_free() is more complicated; it does that FREE_AND_NULL plus several
things that are not idempotent.

>>> WIP patches I have in that direction, partially based on your previous
>>> "is_dead" suggestion:
>>>
>>> 	https://github.com/avar/git/commit/e02a15f6206
>>> 	https://github.com/avar/git/commit/c718f36566a
>>
>> Copy-typed the interesting parts of the first patch like a medieval monk
>> because there doesn't seem to be a download option. :-|
>
> Jeff pointed out the ".patch" (there's also ".diff"), but also: Git has
> this well-known transport protocol it uses, which typically maps to the
> web URL on public hosting sites ... :)
>
> 	git remote add avar https://github.com/avar/git.git
> 	git fetch avar
> 	git show <OID>

Yes, I probably should have downloaded everything like that.  Just did
it for the heck of it and got 43.37 MiB at 598.00 KiB/s.  Typing wasn't
that much slower.

René
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 573d0b20b7..3e74fbb0cd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4149,21 +4149,6 @@  static int option_parse_cruft_expiration(const struct option *opt,
 	return 0;
 }

-struct po_filter_data {
-	unsigned have_revs:1;
-	struct rev_info revs;
-};
-
-static struct list_objects_filter_options *po_filter_revs_init(void *value)
-{
-	struct po_filter_data *data = value;
-
-	repo_init_revisions(the_repository, &data->revs, NULL);
-	data->have_revs = 1;
-
-	return &data->revs.filter;
-}
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -4174,7 +4159,7 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct po_filter_data pfd = { .have_revs = 0 };
+	struct rev_info revs;

 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -4265,7 +4250,7 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4284,6 +4269,8 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)

 	read_replace_refs = 0;

+	repo_init_revisions(the_repository, &revs, NULL);
+
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4385,7 +4372,7 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;

-	if (pfd.have_revs && pfd.revs.filter.choice) {
+	if (revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4472,16 +4459,10 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
-	} else if (pfd.have_revs) {
-		get_object_list(&pfd.revs, rp.nr, rp.v);
-		release_revisions(&pfd.revs);
 	} else {
-		struct rev_info revs;
-
-		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
-		release_revisions(&revs);
 	}
+	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
 		for_each_tag_ref(add_ref_tag, NULL);
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 5339660238..ee01bcd2cc 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -290,10 +290,6 @@  int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
-	opt_lof_init init = (opt_lof_init)opt->defval;
-
-	if (init)
-		filter_options = init(opt->value);

 	if (unset || !arg)
 		list_objects_filter_set_no_filter(filter_options);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 7eeadab2dd..64af2e29e2 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -107,31 +107,13 @@  void parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg);

-/**
- * The opt->value to opt_parse_list_objects_filter() is either a
- * "struct list_objects_filter_option *" when using
- * OPT_PARSE_LIST_OBJECTS_FILTER().
- *
- * Or, if using no "struct option" field is used by the callback,
- * except the "defval" which is expected to be an "opt_lof_init"
- * function, which is called with the "opt->value" and must return a
- * pointer to the ""struct list_objects_filter_option *" to be used.
- *
- * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
- * "struct list_objects_filter_option" is embedded in a "struct
- * rev_info", which the "defval" could be tasked with lazily
- * initializing. See cmd_pack_objects() for an example.
- */
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
-typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
-#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
-	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
-	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
-	  (intptr_t)(init) }

 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
+	OPT_CALLBACK(0, "filter", fo, N_("args"), \
+	  N_("object filtering"), \
+	  opt_parse_list_objects_filter)

 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 25faebaada..5b707d911b 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -265,7 +265,7 @@  test_expect_success 'verify normal and blob:limit packfiles have same commits/tr
 	test_cmp expected observed
 '

-test_expect_failure 'verify small limit and big limit results in small limit' '
+test_expect_success 'verify small limit and big limit results in small limit' '
 	git -C r2 ls-files -s large.1000 >ls_files_result &&
 	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&