Message ID | 08a12be13c2fed247d6086967e7a3f03fa6519e1.1721995576.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Memory leak fixes (pt.3) | expand |
Patrick Steinhardt <ps@pks.im> writes: [snip] > @@ -637,7 +639,21 @@ int cmd_describe(int argc, const char **argv, const char *prefix) > strvec_pushv(&args, argv); > else > strvec_push(&args, "HEAD"); > - return cmd_name_rev(args.nr, args.v, prefix); > + > + /* > + * `cmd_name_rev()` modifies the array, so we'd link its > + * contained strings if we didn't do a copy here. > + */ > + ALLOC_ARRAY(argv_copy, args.nr + 1); > + for (size_t i = 0; i < args.nr; i++) > + argv_copy[i] = args.v[i]; > + argv_copy[args.nr] = NULL; Eventhough we pass `args.nr`, we still set the last element to NULL. This replicates what strvec does and makes it more robust. Nice. > + ret = cmd_name_rev(args.nr, argv_copy, prefix); > + > + strvec_clear(&args); > + free(argv_copy); > + return ret; > } > > hashmap_init(&names, commit_name_neq, NULL, 0); > -- > 2.46.0.rc1.dirty
Patrick Steinhardt <ps@pks.im> writes: > When calling `git describe --contains=`, we end up invoking > `cmd_name_rev()` with some munged argv array. This array may contain > allocated strings and furthermore will likely be modified by the called > function. This results in two memory leaks: > > - First, we leak the array that we use to assemble the arguments. > > - Second, we leak the allocated strings that we may have put into the > array. > > Fix those leaks by creating a separate copy of the array that we can > hand over to `cmd_name_rev()`. This allows us to free all strings > contained in the `strvec`, as the original vector will not be modified > anymore. OK, the separate copy has to be a shallow copy, as its purpose is not to lose pointers to the contained strings. > Furthermore, free both the `strvec` and the copied array to fix the > first memory leak. > ... > + strvec_clear(&args); > + free(argv_copy); So, calling cmd_name_rev() may shuffle the argv_copy[] array but at least it will not free any element in it (as expected---it is typically the (argc, argv[]) the process receives from getting exec'ed) [*]. Freeing the argv_copy shell itself is sufficient to discard what we used to call cmd_name_rev(). And we discard args both its content strings and the array. OK. > + return ret; > } > > hashmap_init(&names, commit_name_neq, NULL, 0); [Footnote] * The fact that cmd_foo() is called is not a hygiene thing to do in the first place, and in the longer term #leftoverbits we may need to refactor the thing further, into a proper library-ish reusable helper function that can be used to compute name_rev() any number of times, plus cmd_name_rev() and this caller that call it.
On Tue, Jul 30, 2024 at 08:27:59AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > [Footnote] > > * The fact that cmd_foo() is called is not a hygiene thing to do in > the first place, and in the longer term #leftoverbits we may need > to refactor the thing further, into a proper library-ish reusable > helper function that can be used to compute name_rev() any number > of times, plus cmd_name_rev() and this caller that call it. Agreed. There have been several instances of this scattered across the codebase. The fix is quite ugly in my opinion, but it would be a bigger topic to refactor those cases properly, so I refrained from doing so as part of this series. Patrick
Patrick Steinhardt <ps@pks.im> writes: > On Tue, Jul 30, 2024 at 08:27:59AM -0700, Junio C Hamano wrote: >> Patrick Steinhardt <ps@pks.im> writes: >> [Footnote] >> >> * The fact that cmd_foo() is called is not a hygiene thing to do in >> the first place, and in the longer term #leftoverbits we may need >> to refactor the thing further, into a proper library-ish reusable >> helper function that can be used to compute name_rev() any number >> of times, plus cmd_name_rev() and this caller that call it. > > Agreed. There have been several instances of this scattered across the > codebase. The fix is quite ugly in my opinion, but it would be a bigger > topic to refactor those cases properly, so I refrained from doing so as > part of this series. Oh, I agree that it would be a "after all the dust settles" kind of clean-up. Thanks.
On Fri, Jul 26, 2024 at 02:14:18PM +0200, Patrick Steinhardt wrote: > @@ -637,7 +639,21 @@ int cmd_describe(int argc, const char **argv, const char *prefix) > strvec_pushv(&args, argv); > else > strvec_push(&args, "HEAD"); > - return cmd_name_rev(args.nr, args.v, prefix); > + > + /* > + * `cmd_name_rev()` modifies the array, so we'd link its s/link/leak/ ? Thanks, Taylor
diff --git a/builtin/describe.c b/builtin/describe.c index cf8edc4222..4c0980c675 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -619,6 +619,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix) if (contains) { struct string_list_item *item; struct strvec args; + const char **argv_copy; + int ret; strvec_init(&args); strvec_pushl(&args, "name-rev", @@ -637,7 +639,21 @@ int cmd_describe(int argc, const char **argv, const char *prefix) strvec_pushv(&args, argv); else strvec_push(&args, "HEAD"); - return cmd_name_rev(args.nr, args.v, prefix); + + /* + * `cmd_name_rev()` modifies the array, so we'd link its + * contained strings if we didn't do a copy here. + */ + ALLOC_ARRAY(argv_copy, args.nr + 1); + for (size_t i = 0; i < args.nr; i++) + argv_copy[i] = args.v[i]; + argv_copy[args.nr] = NULL; + + ret = cmd_name_rev(args.nr, argv_copy, prefix); + + strvec_clear(&args); + free(argv_copy); + return ret; } hashmap_init(&names, commit_name_neq, NULL, 0);
When calling `git describe --contains=`, we end up invoking `cmd_name_rev()` with some munged argv array. This array may contain allocated strings and furthermore will likely be modified by the called function. This results in two memory leaks: - First, we leak the array that we use to assemble the arguments. - Second, we leak the allocated strings that we may have put into the array. Fix those leaks by creating a separate copy of the array that we can hand over to `cmd_name_rev()`. This allows us to free all strings contained in the `strvec`, as the original vector will not be modified anymore. Furthermore, free both the `strvec` and the copied array to fix the first memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/describe.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)