Message ID | 20181218072528.3870492-4-martin.agren@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | setup: add `clear_repository_format()` |
On Tue, Dec 18, 2018 at 08:25:28AM +0100, Martin Ågren wrote:

> After we set up a `struct repository_format`, it owns various pieces of
> allocated memory. We then either use those members, because we decide we
> want to use the "candidate" repository format, or we discard the
> candidate / scratch space. In the first case, we transfer ownership of
> the memory to a few global variables. In the latter case, we just
> silently drop the struct and end up leaking memory.
>
> Introduce a function `clear_repository_format()` which frees the memory
> the struct holds on to. Call it in the code paths where we currently
> leak the memory. Also call it in the error path of
> `read_repository_format()` to clean up any partial result.
>
> For hygiene, we need to at least set the pointers that we free to NULL.
> For future-proofing, let's zero the entire struct instead. It just means
> that in the error path of `read_...()` we need to restore the error
> sentinel in the `version` field.

This seems reasonable, and I very much agree on the zero-ing (even though it _shouldn't_ matter due to the "undefined" rule). That also makes it safe to clear() multiple times, which is a nice property.

> +void clear_repository_format(struct repository_format *format)
> +{
> +	string_list_clear(&format->unknown_extensions, 0);
> +	free(format->work_tree);
> +	free(format->partial_clone);
> +	memset(format, 0, sizeof(*format));
> }

For the callers that actually pick the values out, I think it might be a little less error-prone if they actually copied the strings and then called clear_repository_format(). That avoids leaks of values that they didn't know or care about (and the cost of an extra strdup for repository setup is not a big deal).
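To illustrate the two properties discussed above (clearing frees everything the struct owns, and zeroing the whole struct makes a repeated clear() harmless), here is a minimal standalone sketch. It uses a hypothetical, trimmed-down `struct demo_format` with two plain `malloc`-owned strings instead of Git's real `struct repository_format` and its `string_list`; only the shape of the cleanup is meant to mirror the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct repository_format: two owned strings
 * instead of Git's work_tree, partial_clone and unknown_extensions.
 */
struct demo_format {
	int version;          /* -1 signals "could not read the format" */
	char *work_tree;      /* owned by the struct, may be NULL */
	char *partial_clone;  /* owned by the struct, may be NULL */
};

/* Free the memory held by `format`, but not the struct itself. */
void demo_clear_format(struct demo_format *format)
{
	assert(format != NULL);
	free(format->work_tree);     /* free(NULL) is a no-op */
	free(format->partial_clone);
	/*
	 * Zero the whole struct rather than only the freed pointers: a
	 * second call then only frees NULL pointers, and non-owning
	 * members added later are reset as well.
	 */
	memset(format, 0, sizeof(*format));
}
```

Calling `demo_clear_format()` twice in a row is safe because the first call leaves only NULL pointers behind, assuming (as the patch's own `memset()` does) that an all-zero bit pattern is a null pointer on the target platform.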
Something like this on top of your patch, I guess (with the idea being that functions which return an error would clear the format, but a "successful" one would get returned back up the stack to setup_git_directory_gently(), which then clears it before returning.

-- >8 --
diff --git a/setup.c b/setup.c
index babe5ea156..a5699f9ee6 100644
--- a/setup.c
+++ b/setup.c
@@ -470,6 +470,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 			warning("%s", err.buf);
 			strbuf_release(&err);
 			*nongit_ok = -1;
+			clear_repository_format(candidate);
 			return -1;
 		}
 		die("%s", err.buf);
@@ -499,7 +500,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 		}
 		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate->work_tree;
+			git_work_tree_cfg = xstrdup(candidate->work_tree);
 			inside_work_tree = -1;
 		}
 	} else {
@@ -1158,6 +1159,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
+	clear_repository_format(&repo_fmt);
 
 	return prefix;
 }

-Peff
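The suggestion above amounts to a copy-then-clear ownership rule: a caller copies out only the values it wants, and the candidate is always cleared afterwards, so fields the caller does not know about are freed too. Below is a minimal sketch of that rule in isolation. `struct demo_format`, the global `demo_work_tree_cfg` (playing the role of `git_work_tree_cfg`), `demo_apply_format()` and `demo_strdup()` are hypothetical names; `demo_strdup()` only approximates Git's `xstrdup()`, which likewise never returns NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct repository_format. */
struct demo_format {
	int version;
	char *work_tree;      /* owned by the struct */
	char *partial_clone;  /* owned by the struct; ignored by the caller below */
};

/* Hypothetical global, playing the role of git_work_tree_cfg. */
char *demo_work_tree_cfg;

/* Approximation of xstrdup(): copy a string, abort on allocation failure. */
char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	return memcpy(copy, s, len);
}

void demo_clear_format(struct demo_format *format)
{
	free(format->work_tree);
	free(format->partial_clone);
	memset(format, 0, sizeof(*format));
}

/* Copy out the value we care about, then clear the whole candidate. */
void demo_apply_format(struct demo_format *candidate)
{
	assert(candidate != NULL);
	if (candidate->work_tree) {
		free(demo_work_tree_cfg);
		demo_work_tree_cfg = demo_strdup(candidate->work_tree);
	}
	/* partial_clone is freed here even though this caller never read it. */
	demo_clear_format(candidate);
}
```

The extra copy costs one allocation, but no code path has to remember which members were "stolen" and which still need freeing.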
On Wed, 19 Dec 2018 at 16:48, Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 18, 2018 at 08:25:28AM +0100, Martin Ågren wrote:
>
> > +void clear_repository_format(struct repository_format *format)
> > +{
> > +	string_list_clear(&format->unknown_extensions, 0);
> > +	free(format->work_tree);
> > +	free(format->partial_clone);
> > +	memset(format, 0, sizeof(*format));
> > }
>
> For the callers that actually pick the values out, I think it might be a
> little less error-prone if they actually copied the strings and then
> called clear_repository_format(). That avoids leaks of values that they
> didn't know or care about (and the cost of an extra strdup for
> repository setup is not a big deal).
>
> Something like this on top of your patch, I guess (with the idea being
> that functions which return an error would clear the format, but a
> "successful" one would get returned back up the stack to
> setup_git_directory_gently(), which then clears it before returning.

Thanks for the suggestion. I'll ponder 1) how to go about this robustifying, 2) how to present the result as part of a v2 series.

To Junio on the sidelines in a cast (hope you're feeling better!): you can expect a v2 of this series.

Martin
diff --git a/cache.h b/cache.h
index 8b9e592c65..53ac01efa7 100644
--- a/cache.h
+++ b/cache.h
@@ -979,6 +979,12 @@ struct repository_format {
  */
 void read_repository_format(struct repository_format *format, const char *path);
 
+/*
+ * Free the memory held onto by `format`, but not the struct itself.
+ * (No need to use this after `read_repository_format()` fails.)
+ */
+void clear_repository_format(struct repository_format *format);
+
 /*
  * Verify that the repository described by repository_format is something we
  * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
diff --git a/repository.c b/repository.c
index 5dd1486718..efa9d1d960 100644
--- a/repository.c
+++ b/repository.c
@@ -159,6 +159,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	clear_repository_format(&format);
 	return 0;
 
 error:
diff --git a/setup.c b/setup.c
index 52c3c9d31f..babe5ea156 100644
--- a/setup.c
+++ b/setup.c
@@ -517,6 +517,18 @@ void read_repository_format(struct repository_format *format, const char *path)
 	format->hash_algo = GIT_HASH_SHA1;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
+	if (format->version == -1) {
+		clear_repository_format(format);
+		format->version = -1;
+	}
+}
+
+void clear_repository_format(struct repository_format *format)
+{
+	string_list_clear(&format->unknown_extensions, 0);
+	free(format->work_tree);
+	free(format->partial_clone);
+	memset(format, 0, sizeof(*format));
 }
 
 int verify_repository_format(const struct repository_format *format,
@@ -1043,9 +1055,11 @@ int discover_git_directory(struct strbuf *commondir,
 		strbuf_release(&err);
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
+		clear_repository_format(&candidate);
 		return -1;
 	}
 
+	clear_repository_format(&candidate);
 	return 0;
 }
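The `read_repository_format()` hunk in setup.c clears any partial result when reading fails and then puts the `-1` sentinel back, because zeroing alone would leave `version == 0`, which looks like a perfectly valid format. A minimal sketch of that error path follows, assuming a hypothetical array of key/value entries (`struct demo_entry`) in place of Git's `git_config_from_file()` callback, and a trimmed-down hypothetical `struct demo_format`; as in Git, `version` stays `-1` unless a version setting is actually seen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct repository_format. */
struct demo_format {
	int version;      /* -1 means "no usable format was read" */
	int is_bare;      /* -1 means "not set" */
	char *work_tree;  /* owned by the struct, may be NULL */
};

/* Hypothetical config entry; Git parses a config file instead. */
struct demo_entry {
	const char *key;
	const char *value;
};

static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort(); /* like xstrdup(), never return NULL */
	return memcpy(copy, s, len);
}

void demo_clear_format(struct demo_format *format)
{
	free(format->work_tree);
	memset(format, 0, sizeof(*format));
}

void demo_read_format(struct demo_format *format,
		      const struct demo_entry *entries, size_t n)
{
	size_t i;

	assert(format != NULL);
	memset(format, 0, sizeof(*format));
	format->version = -1;  /* stays -1 unless a "version" entry is seen */
	format->is_bare = -1;

	for (i = 0; i < n; i++) {
		if (!strcmp(entries[i].key, "version")) {
			format->version = atoi(entries[i].value);
		} else if (!strcmp(entries[i].key, "worktree")) {
			free(format->work_tree);
			format->work_tree = demo_strdup(entries[i].value);
		}
	}

	if (format->version == -1) {
		/* Error path: drop any partial result ... */
		demo_clear_format(format);
		/* ... but restore the sentinel, since zeroing set version to 0. */
		format->version = -1;
	}
}
```

A caller of such a function checks `version` first and treats every other field as meaningless when it is `-1`, which is the contract the commit message below chooses to keep.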
After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory.

Introduce a function `clear_repository_format()` which frees the memory the struct holds on to. Call it in the code paths where we currently leak the memory. Also call it in the error path of `read_repository_format()` to clean up any partial result.

For hygiene, we need to at least set the pointers that we free to NULL. For future-proofing, let's zero the entire struct instead. It just means that in the error path of `read_...()` we need to restore the error sentinel in the `version` field.

We could take this opportunity to stop claiming that all fields except `version` are undefined in case of an error. On the other hand, having them defined as zero is not much better than having them undefined. We could define them to some fallback configuration (`is_bare = -1` and `hash_algo = GIT_HASH_SHA1`?), but "clear()" and/or "read()" seem like the wrong places to enforce fallback configurations. Let's leave things as "undefined" instead to encourage users to check `version`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
The error state can always be defined later. Defining it now, then trying to backpedal, is probably not so fun.

Filling the struct with non-zero values might help flush out bugs like the one fixed in the previous patch, but I'm wary of going that far in this patch.

 cache.h      |  6 ++++++
 repository.c |  1 +
 setup.c      | 14 ++++++++++++++
 3 files changed, 21 insertions(+)