diff mbox series

[v2,13/21] config: plug various memory leaks

Message ID 70e8e2651306e9d221e5e472720a7610947580a7.1716541556.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Various memory leak fixes | expand

Commit Message

Patrick Steinhardt May 24, 2024, 10:04 a.m. UTC
Now that memory ownership rules around `git_config_string()` and
`git_config_pathname()` are clearer, it also got easier to spot that
the returned memory needs to be free'd. Plug a subset of those cases and
mark now-passing tests as leak free.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 alias.c                                      |  4 ++-
 config.c                                     | 38 ++++++++++++++------
 t/t1306-xdg-files.sh                         |  1 +
 t/t1350-config-hooks-path.sh                 |  1 +
 t/t3415-rebase-autosquash.sh                 |  1 +
 t/t4041-diff-submodule-option.sh             |  1 +
 t/t4060-diff-submodule-option-diff-format.sh |  1 +
 t/t4210-log-i18n.sh                          |  2 ++
 t/t6006-rev-list-format.sh                   |  1 +
 t/t7005-editor.sh                            |  1 +
 t/t7102-reset.sh                             |  1 +
 t/t9129-git-svn-i18n-commitencoding.sh       |  1 -
 t/t9139-git-svn-non-utf8-commitencoding.sh   |  1 -
 13 files changed, 41 insertions(+), 13 deletions(-)

Comments

Patrick Steinhardt May 24, 2024, 10:13 a.m. UTC | #1
On Fri, May 24, 2024 at 12:04:12PM +0200, Patrick Steinhardt wrote:
> Now that memory ownership rules around `git_config_string()` and
> `git_config_pathname()` are clearer, it also got easier to spot that
> the returned memory needs to be free'd. Plug a subset of those cases and
> mark now-passing tests as leak free.

Junio,

this mail now uses UTF-8 as encoding. Does that fix the issues for you?

Patrick
Jeff King May 25, 2024, 4:33 a.m. UTC | #2
On Fri, May 24, 2024 at 12:04:12PM +0200, Patrick Steinhardt wrote:

> diff --git a/alias.c b/alias.c
> index 269892c356..4daafd9bda 100644
> --- a/alias.c
> +++ b/alias.c
> @@ -21,9 +21,11 @@ static int config_alias_cb(const char *key, const char *value,
>  		return 0;
>  
>  	if (data->alias) {
> -		if (!strcasecmp(p, data->alias))
> +		if (!strcasecmp(p, data->alias)) {
> +			FREE_AND_NULL(data->v);
>  			return git_config_string(&data->v,
>  						 key, value);
> +		}
>  	} else if (data->list) {
>  		string_list_append(data->list, p);
>  	}

IMHO this should be done automatically by git_config_string(). The
current design is an accident waiting to happen, and in the long run
every call is going to need this FREE_AND_NULL(). By doing it in the
function the calling code is shorter, and there's no way we'll forget.

I posted a series along those lines a month or so ago:

  https://lore.kernel.org/git/20240407005656.GA436890@coredump.intra.peff.net/

The catch is that you can't do this:

  const char *foo = "bar";
  git_config_string(&foo, ...);

So I introduced a new function that took a non-const pointer with the
new behavior, with the idea that we'd eventually migrate everything
over. It looks like you may have already done that migration earlier in
your series, since the move to "char *" in the previous patch was OK.

  Though as a side note, sadly:

    char *foo = "bar";

  does not produce an error or even a warning without -Wwrite-strings. I
  think in the long run we should enable that, but there's a little
  cleanup required to do so.

The main reason I didn't follow up on that earlier series is that there
was some discussion about maybe moving this stuff over to strbufs (after
teaching it to handle literal initializers). But if you've managed to
remove all of the cases that needed that, I think just sticking with
"char *" is fine.

The other issue raised in that thread is that many of these config
variables are also passed to parse-options, which treats them as const
strings (and we get no compiler support because it goes through a void
pointer). So they may leak if we overwrite them, or in the unusual case
that we load config after parsing options, we may try to free a non-heap
string. The one we discussed was log's signature_file, and it looks like
you split that to use two variables, which works. Junio suggested an
OPT_FILENAME_DUP() option, which I'm also OK with. The main challenge to
me is being sure we found all such spots (and not accidentally
introducing new ones). But I don't have a good solution there.

> @@ -1566,7 +1569,7 @@ static int git_default_core_config(const char *var, const char *value,
>  
>  	if (!strcmp(var, "core.checkroundtripencoding")) {
>  		FREE_AND_NULL(check_roundtrip_encoding);
> -		return git_config_string((const char **) &check_roundtrip_encoding, var, value);
> +		return git_config_string(&check_roundtrip_encoding, var, value);
>  	}

This should have lost its cast in the previous commit, no? Applying up
to patch 12 and building with DEVELOPER=1 gets a warning.

-Peff
Patrick Steinhardt May 27, 2024, 6:46 a.m. UTC | #3
On Sat, May 25, 2024 at 12:33:47AM -0400, Jeff King wrote:
> On Fri, May 24, 2024 at 12:04:12PM +0200, Patrick Steinhardt wrote:
> 
> > diff --git a/alias.c b/alias.c
> > index 269892c356..4daafd9bda 100644
> > --- a/alias.c
> > +++ b/alias.c
> > @@ -21,9 +21,11 @@ static int config_alias_cb(const char *key, const char *value,
> >  		return 0;
> >  
> >  	if (data->alias) {
> > -		if (!strcasecmp(p, data->alias))
> > +		if (!strcasecmp(p, data->alias)) {
> > +			FREE_AND_NULL(data->v);
> >  			return git_config_string(&data->v,
> >  						 key, value);
> > +		}
> >  	} else if (data->list) {
> >  		string_list_append(data->list, p);
> >  	}
> 
> IMHO this should be done automatically by git_config_string(). The
> current design is an accident waiting to happen, and in the long run
> every call is going to need this FREE_AND_NULL(). By doing it in the
> function the calling code is shorter, and there's no way we'll forget.

In fact, I had this in my first iteration. But I didn't feel comfortable
with it exactly due to the reasons you mention below, where often times
we assign string constants as default values. This requires us to be
extremely careful, also because we do not yet have `-Wwrite-strings`
enabled as you mention.

> I posted a series along those lines a month or so ago:
> 
>   https://lore.kernel.org/git/20240407005656.GA436890@coredump.intra.peff.net/
> 
> The catch is that you can't do this:
> 
>   const char *foo = "bar";
>   git_config_string(&foo, ...);
> 
> So I introduced a new function that took a non-const pointer with the
> new behavior, with the idea that we'd eventually migrate everything
> over. It looks like you may have already done that migration earlier in
> your series, since the move to "char *" in the previous patch was OK.
> 
>   Though as a side note, sadly:
> 
>     char *foo = "bar";
> 
>   does not produce an error or even a warning without -Wwrite-strings. I
>   think in the long run we should enable that, but there's a little
>   cleanup required to do so.

Indeed, I had the exact same observation. I've already got a patch
series that enables `-Wwrite-strings` and that adapts our codebase to
compile cleanly with it. I'll send that series once this one here has
landed.

So my proposal would be to leave this patch as-is for the time being,
but revisit it once both patch series have landed. Does that work for
you?

> The main reason I didn't follow up on that earlier series is that
> there was some discussion about maybe moving this stuff over to
> strbufs (after teaching it to handle literal initializers). But if
> you've managed to remove all of the cases that needed that, I think
> just sticking with "char *" is fine.

I don't think I managed to hit every callsite yet that leaks memory. But
I think it shouldn't be too bad, and especiall if we follow up this
patch series with `FREE_AND_NULL()` on the out-parameter then this
should be fine.

> The other issue raised in that thread is that many of these config
> variables are also passed to parse-options, which treats them as const
> strings (and we get no compiler support because it goes through a void
> pointer). So they may leak if we overwrite them, or in the unusual
> case that we load config after parsing options, we may try to free a
> non-heap string. The one we discussed was log's signature_file, and it
> looks like you split that to use two variables, which works. Junio
> suggested an OPT_FILENAME_DUP() option, which I'm also OK with. The
> main challenge to me is being sure we found all such spots (and not
> accidentally introducing new ones). But I don't have a good solution
> there.

Yup, I remember having some issues with `OPT_FILENAME()`, but to the
best of my knowledge I've fixed all of them.

> > @@ -1566,7 +1569,7 @@ static int git_default_core_config(const char
> > *var, const char *value,
> >  
> >  	if (!strcmp(var, "core.checkroundtripencoding")) {
> >  	FREE_AND_NULL(check_roundtrip_encoding); -		return
> >  	git_config_string((const char **) &check_roundtrip_encoding,
> >  	var, value); +		return
> >  	git_config_string(&check_roundtrip_encoding, var, value); }
> 
> This should have lost its cast in the previous commit, no? Applying up
> to patch 12 and building with DEVELOPER=1 gets a warning.

Ah, good catch. Will fix.

Patrick
Jeff King May 29, 2024, 9:20 a.m. UTC | #4
On Mon, May 27, 2024 at 08:46:02AM +0200, Patrick Steinhardt wrote:

> Indeed, I had the exact same observation. I've already got a patch
> series that enables `-Wwrite-strings` and that adapts our codebase to
> compile cleanly with it. I'll send that series once this one here has
> landed.
> 
> So my proposal would be to leave this patch as-is for the time being,
> but revisit it once both patch series have landed. Does that work for
> you?

Yeah, I can certainly live with that. One of my biggest concerns was
that your patch would invalidate the investigation and work I had
already done, and I'd fail to follow up. But if you are putting it onto
your todo list, then that is even less work for me. ;)

-Peff
diff mbox series

Patch

diff --git a/alias.c b/alias.c
index 269892c356..4daafd9bda 100644
--- a/alias.c
+++ b/alias.c
@@ -21,9 +21,11 @@  static int config_alias_cb(const char *key, const char *value,
 		return 0;
 
 	if (data->alias) {
-		if (!strcasecmp(p, data->alias))
+		if (!strcasecmp(p, data->alias)) {
+			FREE_AND_NULL(data->v);
 			return git_config_string(&data->v,
 						 key, value);
+		}
 	} else if (data->list) {
 		string_list_append(data->list, p);
 	}
diff --git a/config.c b/config.c
index a025cfafe0..496cd1a61e 100644
--- a/config.c
+++ b/config.c
@@ -1414,8 +1414,10 @@  static int git_default_core_config(const char *var, const char *value,
 		return 0;
 	}
 
-	if (!strcmp(var, "core.attributesfile"))
+	if (!strcmp(var, "core.attributesfile")) {
+		FREE_AND_NULL(git_attributes_file);
 		return git_config_pathname(&git_attributes_file, var, value);
+	}
 
 	if (!strcmp(var, "core.hookspath")) {
 		if (ctx->kvi && ctx->kvi->scope == CONFIG_SCOPE_LOCAL &&
@@ -1428,6 +1430,7 @@  static int git_default_core_config(const char *var, const char *value,
 			      "again with "
 			      "`GIT_CLONE_PROTECTION_ACTIVE=false`"),
 			    value);
+		FREE_AND_NULL(git_hooks_path);
 		return git_config_pathname(&git_hooks_path, var, value);
 	}
 
@@ -1566,7 +1569,7 @@  static int git_default_core_config(const char *var, const char *value,
 
 	if (!strcmp(var, "core.checkroundtripencoding")) {
 		FREE_AND_NULL(check_roundtrip_encoding);
-		return git_config_string((const char **) &check_roundtrip_encoding, var, value);
+		return git_config_string(&check_roundtrip_encoding, var, value);
 	}
 
 	if (!strcmp(var, "core.notesref")) {
@@ -1576,8 +1579,10 @@  static int git_default_core_config(const char *var, const char *value,
 		return 0;
 	}
 
-	if (!strcmp(var, "core.editor"))
+	if (!strcmp(var, "core.editor")) {
+		FREE_AND_NULL(editor_program);
 		return git_config_string(&editor_program, var, value);
+	}
 
 	if (!strcmp(var, "core.commentchar") ||
 	    !strcmp(var, "core.commentstring")) {
@@ -1595,11 +1600,13 @@  static int git_default_core_config(const char *var, const char *value,
 		return 0;
 	}
 
-	if (!strcmp(var, "core.askpass"))
+	if (!strcmp(var, "core.askpass")) {
+		FREE_AND_NULL(askpass_program);
 		return git_config_string(&askpass_program, var, value);
+	}
 
 	if (!strcmp(var, "core.excludesfile")) {
-		free(excludes_file);
+		FREE_AND_NULL(excludes_file);
 		return git_config_pathname(&excludes_file, var, value);
 	}
 
@@ -1702,11 +1709,15 @@  static int git_default_sparse_config(const char *var, const char *value)
 
 static int git_default_i18n_config(const char *var, const char *value)
 {
-	if (!strcmp(var, "i18n.commitencoding"))
+	if (!strcmp(var, "i18n.commitencoding")) {
+		FREE_AND_NULL(git_commit_encoding);
 		return git_config_string(&git_commit_encoding, var, value);
+	}
 
-	if (!strcmp(var, "i18n.logoutputencoding"))
+	if (!strcmp(var, "i18n.logoutputencoding")) {
+		FREE_AND_NULL(git_log_output_encoding);
 		return git_config_string(&git_log_output_encoding, var, value);
+	}
 
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
@@ -1779,10 +1790,15 @@  static int git_default_push_config(const char *var, const char *value)
 
 static int git_default_mailmap_config(const char *var, const char *value)
 {
-	if (!strcmp(var, "mailmap.file"))
+	if (!strcmp(var, "mailmap.file")) {
+		FREE_AND_NULL(git_mailmap_file);
 		return git_config_pathname(&git_mailmap_file, var, value);
-	if (!strcmp(var, "mailmap.blob"))
+	}
+
+	if (!strcmp(var, "mailmap.blob")) {
+		FREE_AND_NULL(git_mailmap_blob);
 		return git_config_string(&git_mailmap_blob, var, value);
+	}
 
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
@@ -1790,8 +1806,10 @@  static int git_default_mailmap_config(const char *var, const char *value)
 
 static int git_default_attr_config(const char *var, const char *value)
 {
-	if (!strcmp(var, "attr.tree"))
+	if (!strcmp(var, "attr.tree")) {
+		FREE_AND_NULL(git_attr_tree);
 		return git_config_string(&git_attr_tree, var, value);
+	}
 
 	/*
 	 * Add other attribute related config variables here and to
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index 40d3c42618..53e5b290b9 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -7,6 +7,7 @@ 
 
 test_description='Compatibility with $XDG_CONFIG_HOME/git/ files'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'read config: xdg file exists and ~/.gitconfig doesn'\''t' '
diff --git a/t/t1350-config-hooks-path.sh b/t/t1350-config-hooks-path.sh
index f6dc83e2aa..5c47f9ecc3 100755
--- a/t/t1350-config-hooks-path.sh
+++ b/t/t1350-config-hooks-path.sh
@@ -2,6 +2,7 @@ 
 
 test_description='Test the core.hooksPath configuration variable'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'set up a pre-commit hook in core.hooksPath' '
diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
index fcc40d6fe1..22452ff84c 100755
--- a/t/t3415-rebase-autosquash.sh
+++ b/t/t3415-rebase-autosquash.sh
@@ -5,6 +5,7 @@  test_description='auto squash'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 . "$TEST_DIRECTORY"/lib-rebase.sh
diff --git a/t/t4041-diff-submodule-option.sh b/t/t4041-diff-submodule-option.sh
index 0c1502d4b0..8fc40e75eb 100755
--- a/t/t4041-diff-submodule-option.sh
+++ b/t/t4041-diff-submodule-option.sh
@@ -12,6 +12,7 @@  This test tries to verify the sanity of the --submodule option of git diff.
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 # Tested non-UTF-8 encoding
diff --git a/t/t4060-diff-submodule-option-diff-format.sh b/t/t4060-diff-submodule-option-diff-format.sh
index 97c6424cd5..8ce67442d9 100755
--- a/t/t4060-diff-submodule-option-diff-format.sh
+++ b/t/t4060-diff-submodule-option-diff-format.sh
@@ -10,6 +10,7 @@  test_description='Support for diff format verbose submodule difference in git di
 This test tries to verify the sanity of --submodule=diff option of git diff.
 '
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 # Tested non-UTF-8 encoding
diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
index 75216f19ce..7120030b5c 100755
--- a/t/t4210-log-i18n.sh
+++ b/t/t4210-log-i18n.sh
@@ -1,6 +1,8 @@ 
 #!/bin/sh
 
 test_description='test log with i18n features'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./lib-gettext.sh
 
 # two forms of é
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index 573eb97a0f..f1623b1c06 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -8,6 +8,7 @@  test_description='git rev-list --pretty=format test'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
 
diff --git a/t/t7005-editor.sh b/t/t7005-editor.sh
index 5fcf281dfb..b9822294fe 100755
--- a/t/t7005-editor.sh
+++ b/t/t7005-editor.sh
@@ -2,6 +2,7 @@ 
 
 test_description='GIT_EDITOR, core.editor, and stuff'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 unset EDITOR VISUAL GIT_EDITOR
diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index 62d9f846ce..2add26d768 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -10,6 +10,7 @@  Documented tests for git reset'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 commit_msg () {
diff --git a/t/t9129-git-svn-i18n-commitencoding.sh b/t/t9129-git-svn-i18n-commitencoding.sh
index 185248a4cd..01e1e8a8f7 100755
--- a/t/t9129-git-svn-i18n-commitencoding.sh
+++ b/t/t9129-git-svn-i18n-commitencoding.sh
@@ -4,7 +4,6 @@ 
 
 test_description='git svn honors i18n.commitEncoding in config'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 compare_git_head_with () {
diff --git a/t/t9139-git-svn-non-utf8-commitencoding.sh b/t/t9139-git-svn-non-utf8-commitencoding.sh
index b7f756b2b7..22d80b0be2 100755
--- a/t/t9139-git-svn-non-utf8-commitencoding.sh
+++ b/t/t9139-git-svn-non-utf8-commitencoding.sh
@@ -4,7 +4,6 @@ 
 
 test_description='git svn refuses to dcommit non-UTF8 messages'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 # ISO-2022-JP can pass for valid UTF-8, so skipping that in this test