diff mbox series

[03/23] builtin/describe: fix memory leak with `--contains=`

Message ID 08a12be13c2fed247d6086967e7a3f03fa6519e1.1721995576.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Memory leak fixes (pt.3) | expand

Commit Message

Patrick Steinhardt July 26, 2024, 12:14 p.m. UTC
When calling `git describe --contains=`, we end up invoking
`cmd_name_rev()` with some munged argv array. This array may contain
allocated strings and furthermore will likely be modified by the called
function. This results in two memory leaks:

  - First, we leak the array that we use to assemble the arguments.

  - Second, we leak the allocated strings that we may have put into the
    array.

Fix those leaks by creating a separate copy of the array that we can
hand over to `cmd_name_rev()`. This allows us to free all strings
contained in the `strvec`, as the original vector will not be modified
anymore.

Furthermore, free both the `strvec` and the copied array to fix the
first memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/describe.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Comments

Karthik Nayak July 30, 2024, 9:23 a.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

[snip]

> @@ -637,7 +639,21 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>  			strvec_pushv(&args, argv);
>  		else
>  			strvec_push(&args, "HEAD");
> -		return cmd_name_rev(args.nr, args.v, prefix);
> +
> +		/*
> +		 * `cmd_name_rev()` modifies the array, so we'd link its
> +		 * contained strings if we didn't do a copy here.
> +		 */
> +		ALLOC_ARRAY(argv_copy, args.nr + 1);
> +		for (size_t i = 0; i < args.nr; i++)
> +			argv_copy[i] = args.v[i];
> +		argv_copy[args.nr] = NULL;

Eventhough we pass `args.nr`, we still set the last element to NULL.
This replicates what strvec does and makes it more robust. Nice.

> +		ret = cmd_name_rev(args.nr, argv_copy, prefix);
> +
> +		strvec_clear(&args);
> +		free(argv_copy);
> +		return ret;
>  	}
>
>  	hashmap_init(&names, commit_name_neq, NULL, 0);
> --
> 2.46.0.rc1.dirty
Junio C Hamano July 30, 2024, 3:27 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> When calling `git describe --contains=`, we end up invoking
> `cmd_name_rev()` with some munged argv array. This array may contain
> allocated strings and furthermore will likely be modified by the called
> function. This results in two memory leaks:
>
>   - First, we leak the array that we use to assemble the arguments.
>
>   - Second, we leak the allocated strings that we may have put into the
>     array.
>
> Fix those leaks by creating a separate copy of the array that we can
> hand over to `cmd_name_rev()`. This allows us to free all strings
> contained in the `strvec`, as the original vector will not be modified
> anymore.

OK, the separate copy has to be a shallow copy, as its purpose is
not to lose pointers to the contained strings.

> Furthermore, free both the `strvec` and the copied array to fix the
> first memory leak.
> ...
> +		strvec_clear(&args);
> +		free(argv_copy);

So, calling cmd_name_rev() may shuffle the argv_copy[] array but at
least it will not free any element in it (as expected---it is
typically the (argc, argv[]) the process receives from getting
exec'ed) [*].  Freeing the argv_copy shell itself is sufficient to
discard what we used to call cmd_name_rev().  And we discard args
both its content strings and the array.  OK.

> +		return ret;
>  	}
>  
>  	hashmap_init(&names, commit_name_neq, NULL, 0);


[Footnote]

 * The fact that cmd_foo() is called is not a hygiene thing to do in
   the first place, and in the longer term #leftoverbits we may need
   to refactor the thing further, into a proper library-ish reusable
   helper function that can be used to compute name_rev() any number
   of times, plus cmd_name_rev() and this caller that call it.
Patrick Steinhardt July 31, 2024, 10:42 a.m. UTC | #3
On Tue, Jul 30, 2024 at 08:27:59AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> [Footnote]
> 
>  * The fact that cmd_foo() is called is not a hygiene thing to do in
>    the first place, and in the longer term #leftoverbits we may need
>    to refactor the thing further, into a proper library-ish reusable
>    helper function that can be used to compute name_rev() any number
>    of times, plus cmd_name_rev() and this caller that call it.

Agreed. There have been several instances of this scattered across the
codebase. The fix is quite ugly in my opinion, but it would be a bigger
topic to refactor those cases properly, so I refrained from doing so as
part of this series.

Patrick
Junio C Hamano July 31, 2024, 4:04 p.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Jul 30, 2024 at 08:27:59AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> [Footnote]
>> 
>>  * The fact that cmd_foo() is called is not a hygiene thing to do in
>>    the first place, and in the longer term #leftoverbits we may need
>>    to refactor the thing further, into a proper library-ish reusable
>>    helper function that can be used to compute name_rev() any number
>>    of times, plus cmd_name_rev() and this caller that call it.
>
> Agreed. There have been several instances of this scattered across the
> codebase. The fix is quite ugly in my opinion, but it would be a bigger
> topic to refactor those cases properly, so I refrained from doing so as
> part of this series.

Oh, I agree that it would be a "after all the dust settles" kind of
clean-up.

Thanks.
Taylor Blau July 31, 2024, 4:28 p.m. UTC | #5
On Fri, Jul 26, 2024 at 02:14:18PM +0200, Patrick Steinhardt wrote:
> @@ -637,7 +639,21 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>  			strvec_pushv(&args, argv);
>  		else
>  			strvec_push(&args, "HEAD");
> -		return cmd_name_rev(args.nr, args.v, prefix);
> +
> +		/*
> +		 * `cmd_name_rev()` modifies the array, so we'd link its

s/link/leak/ ?

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/describe.c b/builtin/describe.c
index cf8edc4222..4c0980c675 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -619,6 +619,8 @@  int cmd_describe(int argc, const char **argv, const char *prefix)
 	if (contains) {
 		struct string_list_item *item;
 		struct strvec args;
+		const char **argv_copy;
+		int ret;
 
 		strvec_init(&args);
 		strvec_pushl(&args, "name-rev",
@@ -637,7 +639,21 @@  int cmd_describe(int argc, const char **argv, const char *prefix)
 			strvec_pushv(&args, argv);
 		else
 			strvec_push(&args, "HEAD");
-		return cmd_name_rev(args.nr, args.v, prefix);
+
+		/*
+		 * `cmd_name_rev()` modifies the array, so we'd link its
+		 * contained strings if we didn't do a copy here.
+		 */
+		ALLOC_ARRAY(argv_copy, args.nr + 1);
+		for (size_t i = 0; i < args.nr; i++)
+			argv_copy[i] = args.v[i];
+		argv_copy[args.nr] = NULL;
+
+		ret = cmd_name_rev(args.nr, argv_copy, prefix);
+
+		strvec_clear(&args);
+		free(argv_copy);
+		return ret;
 	}
 
 	hashmap_init(&names, commit_name_neq, NULL, 0);