diff mbox series

[02/13] sparse-checkout: pass string literals directly to add_pattern()

Message ID 20240531112613.GB428814@coredump.intra.peff.net (mailing list archive)
State Superseded
Headers show
Series leak fixes for sparse-checkout code | expand

Commit Message

Jeff King May 31, 2024, 11:26 a.m. UTC
The add_pattern() function takes a pattern string, but neither makes a
copy of it nor takes ownership of the memory. So it is the caller's
responsibility to make sure the string hangs around as long as the
pattern_list which references it.

There are a few cases in sparse-checkout where we use string literal
patterns by stuffing them into a strbuf, detaching the buffer, and then
passing the result into add_pattern(). This creates a leak when the
pattern_list is eventually cleared, since we don't retain a copy of the
detached buffer to free.

But we can observe that the whole strbuf dance is unnecessary. The point
was presumably[1] to satisfy the lifetime requirement of the string. But
string literals have static duration; we can count on them lasting for
the whole program.

So we can fix the leak by just passing them directly. And as a bonus,
that simplifies the code. The leaks can be seen in t7002, which drops
from 25 leaks to 22 with this patch. It also makes t3602 and t1090
leak-free.

In the long run, we will also want to clean up this (undocumented!)
memory lifetime requirement of add_pattern(). But that can come in a
later patch; passing the string literals directly will be the right
thing either way.

[1] The code in question comes from 416adc8711 (sparse-checkout: update
    working directory in-process for 'init', 2019-11-21) and 99dfa6f970
    (sparse-checkout: use in-process update for disable subcommand,
    2019-11-21), but I didn't see anything in their commit messages or
    on the list explaining the strbufs.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/sparse-checkout.c        | 11 +++--------
 t/t1090-sparse-checkout-scope.sh |  1 +
 t/t3602-rm-sparse-checkout.sh    |  1 +
 3 files changed, 5 insertions(+), 8 deletions(-)

Comments

Junio C Hamano May 31, 2024, 4:14 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> The add_pattern() function takes a pattern string, but neither makes a
> copy of it nor takes ownership of the memory. So it is the caller's
> responsibility to make sure the string hangs around as long as the
> pattern_list which references it.

Wow.  That's an extremely bad API.

I suspect that it long time ago did not aggressively "pre-parse" the
given pattern string and kept the original copy in the struct (i.e.,
taking ownership), and these copies of the literal strings were made
at the calling sites with the expectation that the API takes
ownership of them, as most of the strings fed to add_pattern() are
what we read from the environment into heap.

> In the long run, we will also want to clean up this (undocumented!)
> memory lifetime requirement of add_pattern(). But that can come in a
> later patch; passing the string literals directly will be the right
> thing either way.

OK.
Jeff King June 4, 2024, 9:43 a.m. UTC | #2
On Fri, May 31, 2024 at 09:14:49AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The add_pattern() function takes a pattern string, but neither makes a
> > copy of it nor takes ownership of the memory. So it is the caller's
> > responsibility to make sure the string hangs around as long as the
> > pattern_list which references it.
> 
> Wow.  That's an extremely bad API.
> 
> I suspect that it long time ago did not aggressively "pre-parse" the
> given pattern string and kept the original copy in the struct (i.e.,
> taking ownership), and these copies of the literal strings were made
> at the calling sites with the expectation that the API takes
> ownership of them, as most of the strings fed to add_pattern() are
> what we read from the environment into heap.

Yeah, I actually dug into the history a bit but didn't find any real
smoking gun of when it crossed the line to "horrible". It was gradual. ;)

But yes, it was originally just an array of the pattern strings (in
2005!), and then grew more features, and then grew more of an interface
for adding patterns from other sources, and so on. I'm mildly surprised
nobody ran afoul of it before now.

-Peff
diff mbox series

Patch

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 8747191484..7c17ed238c 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -442,7 +442,6 @@  static int sparse_checkout_init(int argc, const char **argv, const char *prefix)
 	char *sparse_filename;
 	int res;
 	struct object_id oid;
-	struct strbuf pattern = STRBUF_INIT;
 
 	static struct option builtin_sparse_checkout_init_options[] = {
 		OPT_BOOL(0, "cone", &init_opts.cone_mode,
@@ -493,10 +492,8 @@  static int sparse_checkout_init(int argc, const char **argv, const char *prefix)
 		return 0;
 	}
 
-	strbuf_addstr(&pattern, "/*");
-	add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
-	strbuf_addstr(&pattern, "!/*/");
-	add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
+	add_pattern("/*", empty_base, 0, &pl, 0);
+	add_pattern("!/*/", empty_base, 0, &pl, 0);
 	pl.use_cone_patterns = init_opts.cone_mode;
 
 	return write_patterns_and_update(&pl);
@@ -893,7 +890,6 @@  static int sparse_checkout_disable(int argc, const char **argv,
 		OPT_END(),
 	};
 	struct pattern_list pl;
-	struct strbuf match_all = STRBUF_INIT;
 
 	/*
 	 * We do not exit early if !core_apply_sparse_checkout; due to the
@@ -919,8 +915,7 @@  static int sparse_checkout_disable(int argc, const char **argv,
 	pl.use_cone_patterns = 0;
 	core_apply_sparse_checkout = 1;
 
-	strbuf_addstr(&match_all, "/*");
-	add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0);
+	add_pattern("/*", empty_base, 0, &pl, 0);
 
 	prepare_repo_settings(the_repository);
 	the_repository->settings.sparse_index = 0;
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 3a14218b24..da0e7714d5 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -6,6 +6,7 @@  GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 TEST_CREATE_REPO_NO_TEMPLATE=1
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
index 08580fd3dc..fcdefba48c 100755
--- a/t/t3602-rm-sparse-checkout.sh
+++ b/t/t3602-rm-sparse-checkout.sh
@@ -2,6 +2,7 @@ 
 
 test_description='git rm in sparse checked out working trees'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' "