Message ID | 20240531112613.GB428814@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | leak fixes for sparse-checkout code | expand |
Jeff King <peff@peff.net> writes: > The add_pattern() function takes a pattern string, but neither makes a > copy of it nor takes ownership of the memory. So it is the caller's > responsibility to make sure the string hangs around as long as the > pattern_list which references it. Wow. That's an extremely bad API. I suspect that it long time ago did not aggressively "pre-parse" the given pattern string and kept the original copy in the struct (i.e., taking ownership), and these copies of the literal strings were made at the calling sites with the expectation that the API takes ownership of them, as most of the strings fed to add_pattern() are what we read from the environment into heap. > In the long run, we will also want to clean up this (undocumented!) > memory lifetime requirement of add_pattern(). But that can come in a > later patch; passing the string literals directly will be the right > thing either way. OK.
On Fri, May 31, 2024 at 09:14:49AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The add_pattern() function takes a pattern string, but neither makes a > > copy of it nor takes ownership of the memory. So it is the caller's > > responsibility to make sure the string hangs around as long as the > > pattern_list which references it. > > Wow. That's an extremely bad API. > > I suspect that it long time ago did not aggressively "pre-parse" the > given pattern string and kept the original copy in the struct (i.e., > taking ownership), and these copies of the literal strings were made > at the calling sites with the expectation that the API takes > ownership of them, as most of the strings fed to add_pattern() are > what we read from the environment into heap. Yeah, I actually dug into the history a bit but didn't find any real smoking gun of when it crossed the line to "horrible". It was gradual. ;) But yes, it was originally just an array of the pattern strings (in 2005!), and then grew more features, and then grew more of an interface for adding patterns from other sources, and so on. I'm mildly surprised nobody ran afoul of it before now. -Peff
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 8747191484..7c17ed238c 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -442,7 +442,6 @@ static int sparse_checkout_init(int argc, const char **argv, const char *prefix) char *sparse_filename; int res; struct object_id oid; - struct strbuf pattern = STRBUF_INIT; static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, @@ -493,10 +492,8 @@ static int sparse_checkout_init(int argc, const char **argv, const char *prefix) return 0; } - strbuf_addstr(&pattern, "/*"); - add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); - strbuf_addstr(&pattern, "!/*/"); - add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); + add_pattern("/*", empty_base, 0, &pl, 0); + add_pattern("!/*/", empty_base, 0, &pl, 0); pl.use_cone_patterns = init_opts.cone_mode; return write_patterns_and_update(&pl); @@ -893,7 +890,6 @@ static int sparse_checkout_disable(int argc, const char **argv, OPT_END(), }; struct pattern_list pl; - struct strbuf match_all = STRBUF_INIT; /* * We do not exit early if !core_apply_sparse_checkout; due to the @@ -919,8 +915,7 @@ static int sparse_checkout_disable(int argc, const char **argv, pl.use_cone_patterns = 0; core_apply_sparse_checkout = 1; - strbuf_addstr(&match_all, "/*"); - add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0); + add_pattern("/*", empty_base, 0, &pl, 0); prepare_repo_settings(the_repository); the_repository->settings.sparse_index = 0; diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh index 3a14218b24..da0e7714d5 100755 --- a/t/t1090-sparse-checkout-scope.sh +++ b/t/t1090-sparse-checkout-scope.sh @@ -6,6 +6,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME TEST_CREATE_REPO_NO_TEMPLATE=1 +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success 'setup' ' diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh index 08580fd3dc..fcdefba48c 100755 --- a/t/t3602-rm-sparse-checkout.sh +++ b/t/t3602-rm-sparse-checkout.sh @@ -2,6 +2,7 @@ test_description='git rm in sparse checked out working trees' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success 'setup' "
The add_pattern() function takes a pattern string, but neither makes a copy of it nor takes ownership of the memory. So it is the caller's responsibility to make sure the string hangs around as long as the pattern_list which references it. There are a few cases in sparse-checkout where we use string literal patterns by stuffing them into a strbuf, detaching the buffer, and then passing the result into add_pattern(). This creates a leak when the pattern_list is eventually cleared, since we don't retain a copy of the detached buffer to free. But we can observe that the whole strbuf dance is unnecessary. The point was presumably[1] to satisfy the lifetime requirement of the string. But string literals have static duration; we can count on them lasting for the whole program. So we can fix the leak by just passing them directly. And as a bonus, that simplifies the code. The leaks can be seen in t7002, which drops from 25 leaks to 22 with this patch. It also makes t3602 and t1090 leak-free. In the long run, we will also want to clean up this (undocumented!) memory lifetime requirement of add_pattern(). But that can come in a later patch; passing the string literals directly will be the right thing either way. [1] The code in question comes from 416adc8711 (sparse-checkout: update working directory in-process for 'init', 2019-11-21) and 99dfa6f970 (sparse-checkout: use in-process update for disable subcommand, 2019-11-21), but I didn't see anything in their commit messages or on the list explaining the strbufs. Signed-off-by: Jeff King <peff@peff.net> --- builtin/sparse-checkout.c | 11 +++-------- t/t1090-sparse-checkout-scope.sh | 1 + t/t3602-rm-sparse-checkout.sh | 1 + 3 files changed, 5 insertions(+), 8 deletions(-)