Message ID | f8b021033b887923662eb9fa63f6df1677ebbbb5.1548186510.git.martin.agren@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | setup: fix memory leaks with `struct repository_format` | expand |
On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote: > Call `clear_...()` at the start of `read_...()` instead of just zeroing > the struct, since we sometimes enter the function multiple times. This > means that it is important to initialize the struct before calling > `read_...()`, so document that. This part is a little counter-intuitive to me. Is anybody ever going to pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT? If so, might it be kinder for read_...() to not assume anything about the incoming struct, and initialize it from scratch? I.e., not to use clear() but just do the initialization step? A caller which calls read_() multiple times would presumably have an intervening clear (either their own, or the one done on an error return from the read function). Other than that minor nit, I like the overall shape of this. One interesting tidbit: > +/* > + * Always use this to initialize a `struct repository_format` > + * to a well-defined, default state before calling > + * `read_repository()`. > + */ > +#define REPOSITORY_FORMAT_INIT \ > +{ \ > + .version = -1, \ > + .is_bare = -1, \ > + .hash_algo = GIT_HASH_SHA1, \ > + .unknown_extensions = STRING_LIST_INIT_DUP, \ > +} > [...] > + struct repository_format candidate = REPOSITORY_FORMAT_INIT; This uses designated initializers, which is a C99-ism, but one we've used previously and feel confident in. But... > +void clear_repository_format(struct repository_format *format) > +{ > + string_list_clear(&format->unknown_extensions, 0); > + free(format->work_tree); > + free(format->partial_clone); > + *format = (struct repository_format)REPOSITORY_FORMAT_INIT; > +} ...this uses that expression not as an initializer, but as a compound literal. That's also C99, but AFAIK it's the first usage in our code base. I don't know if it will cause problems or not. The "old" way to do it is: struct repository_format foo = REPOSITORY_FORMAT_INIT; memcpy(format, &foo, sizeof(foo)); Given how simple it is to fix if it turns out to be a problem, I'm OK including it as a weather balloon. -Peff
On Wed, Jan 23, 2019 at 12:57:05AM -0500, Jeff King wrote: > This uses designated initializers, which is a C99-ism, but one we've > used previously and feel confident in. But... > > > +void clear_repository_format(struct repository_format *format) > > +{ > > + string_list_clear(&format->unknown_extensions, 0); > > + free(format->work_tree); > > + free(format->partial_clone); > > + *format = (struct repository_format)REPOSITORY_FORMAT_INIT; > > +} > > ...this uses that expression not as an initializer, but as a compound > literal. That's also C99, but AFAIK it's the first usage in our code > base. I don't know if it will cause problems or not. > > The "old" way to do it is: > > struct repository_format foo = REPOSITORY_FORMAT_INIT; > memcpy(format, &foo, sizeof(foo)); > > Given how simple it is to fix if it turns out to be a problem, I'm OK > including it as a weather balloon. It's my understanding that MSVC doesn't support this construct. If we care about supporting MSVC, then we need to write it without the compound literal. MSVC doesn't support any C99 feature that is not also in C++, unfortunately.
On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote: > > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote: > > > Call `clear_...()` at the start of `read_...()` instead of just zeroing > > the struct, since we sometimes enter the function multiple times. This > > means that it is important to initialize the struct before calling > > `read_...()`, so document that. > > This part is a little counter-intuitive to me. Is anybody ever going to > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT? I do update all users in git.git, but yeah, out-of-tree users and in-flight topics would segfault. > If so, might it be kinder for read_...() to not assume anything about > the incoming struct, and initialize it from scratch? I.e., not to use > clear() but just do the initialization step? I have some vague memory from going down that route and giving up. Now that I'm looking at it again, I think we can at least try to do something. We can make sure that "external" users that call into setup.c are fine (they'll leak, but won't crash). Out-of-tree users inside setup.c will still be able to trip on this. I don't have much spare time over the next few days, but I'll get to this. Or we could accept that we may leak when we end up calling `read()` multiple times (I could catch all leaks now, but new ones might sneak in after that) and come back to this after X months, when we can perhaps afford to be a bit more aggressive. I guess we could just rename the struct to have the compiler catch out-of-tree users... > A caller which calls read_() multiple times would presumably have an > intervening clear (either their own, or the one done on an error return > from the read function). > > Other than that minor nit, I like the overall shape of this. Thank you. Martin
On Thu, 24 Jan 2019 at 01:15, brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On Wed, Jan 23, 2019 at 12:57:05AM -0500, Jeff King wrote: > > > +void clear_repository_format(struct repository_format *format) > > > +{ > > > + string_list_clear(&format->unknown_extensions, 0); > > > + free(format->work_tree); > > > + free(format->partial_clone); > > > + *format = (struct repository_format)REPOSITORY_FORMAT_INIT; > > > +} > > > > ...this uses that expression not as an initializer, but as a compound > > literal. That's also C99, but AFAIK it's the first usage in our code > > base. I don't know if it will cause problems or not. > > Given how simple it is to fix if it turns out to be a problem, I'm OK > > including it as a weather balloon. Thanks for pointing out this potential problem. > It's my understanding that MSVC doesn't support this construct. If we > care about supporting MSVC, then we need to write it without the > compound literal. MSVC doesn't support any C99 feature that is not also > in C++, unfortunately. Ok, better play it safe then. Thanks. Martin
On Fri, Jan 25, 2019 at 08:24:35PM +0100, Martin Ågren wrote: > On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote: > > > > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote: > > > > > Call `clear_...()` at the start of `read_...()` instead of just zeroing > > > the struct, since we sometimes enter the function multiple times. This > > > means that it is important to initialize the struct before calling > > > `read_...()`, so document that. > > > > This part is a little counter-intuitive to me. Is anybody ever going to > > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT? > > I do update all users in git.git, but yeah, out-of-tree users and > in-flight topics would segfault. > > > If so, might it be kinder for read_...() to not assume anything about > > the incoming struct, and initialize it from scratch? I.e., not to use > > clear() but just do the initialization step? > > I have some vague memory from going down that route and giving up. Now > that I'm looking at it again, I think we can at least try to do > something. We can make sure that "external" users that call into setup.c > are fine (they'll leak, but won't crash). Out-of-tree users inside > setup.c will still be able to trip on this. I don't have much spare time > over the next few days, but I'll get to this. > > Or we could accept that we may leak when we end up calling `read()` > multiple times (I could catch all leaks now, but new ones might sneak in > after that) and come back to this after X months, when we can perhaps > afford to be a bit more aggressive. > > I guess we could just rename the struct to have the compiler catch > out-of-tree users... I'm less worried about out-of-tree users, and more concerned with just having a calling convention that matches usual conventions (and is harder to get wrong). It's a pretty minor point, though, so I can live with it either way. -Peff
On Fri, 25 Jan 2019 at 20:51, Jeff King <peff@peff.net> wrote: > > > On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote: > > > > > > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote: > > > > > > > Call `clear_...()` at the start of `read_...()` instead of just zeroing > > > > the struct, since we sometimes enter the function multiple times. This > > > > means that it is important to initialize the struct before calling > > > > `read_...()`, so document that. > > > > > > This part is a little counter-intuitive to me. Is anybody ever going to > > > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT? > > > If so, might it be kinder for read_...() to not assume anything about > > > the incoming struct, and initialize it from scratch? I.e., not to use > > > clear() but just do the initialization step? > > I'm less worried about out-of-tree users, and more concerned with just > having a calling convention that matches usual conventions (and is > harder to get wrong). > > It's a pretty minor point, though, so I can live with it either way. It's time to resurrect this thread. I've reworked this patch to avoid the compound literal when re-initing a struct, and I've been going back and forth on this point about having to initialize to `..._INIT` or risk crashing. And I keep coming back to thinking that it's not *that* different from how `STRBUF_INIT` works. There's the obvious difference that there aren't as many functions to call, so there's certainly a difference in scale. And you'd think that you'll always start with `read_...()`, but another potential first function to call is `clear_...()` (see builtin/init-db.c), in which case you better have used `..._INIT` first. I'm tempted to address this point by documenting as good as I can in the .h-file that one has to use this initializer macro. I'll obviously convert all users, so copy-paste programming should work fine... How does that sound to you? Martin
On Mon, Feb 25, 2019 at 08:21:07PM +0100, Martin Ågren wrote: > It's time to resurrect this thread. I've reworked this patch to avoid > the compound literal when re-initing a struct, and I've been going back > and forth on this point about having to initialize to `..._INIT` or risk > crashing. And I keep coming back to thinking that it's not *that* > different from how `STRBUF_INIT` works. > > There's the obvious difference that there aren't as many functions to > call, so there's certainly a difference in scale. And you'd think that > you'll always start with `read_...()`, but another potential first > function to call is `clear_...()` (see builtin/init-db.c), in which case > you better have used `..._INIT` first. > > I'm tempted to address this point by documenting as good as I can in the > .h-file that one has to use this initializer macro. I'll obviously > convert all users, so copy-paste programming should work fine... > > How does that sound to you? It sounds pretty good. -Peff
diff --git a/cache.h b/cache.h index ca36b44ee0..ec7c043412 100644 --- a/cache.h +++ b/cache.h @@ -972,14 +972,35 @@ struct repository_format { struct string_list unknown_extensions; }; +/* + * Always use this to initialize a `struct repository_format` + * to a well-defined, default state before calling + * `read_repository()`. + */ +#define REPOSITORY_FORMAT_INIT \ +{ \ + .version = -1, \ + .is_bare = -1, \ + .hash_algo = GIT_HASH_SHA1, \ + .unknown_extensions = STRING_LIST_INIT_DUP, \ +} + /* * Read the repository format characteristics from the config file "path" into - * "format" struct. Returns the numeric version. On error, -1 is returned, - * format->version is set to -1, and all other fields in the struct are - * undefined. + * "format" struct. Returns the numeric version. On error, or if no version is + * found in the configuration, -1 is returned, format->version is set to -1, + * and all other fields in the struct are set to the default configuration + * (REPOSITORY_FORMAT_INIT). Always initialize the struct using + * REPOSITORY_FORMAT_INIT before calling this function. */ int read_repository_format(struct repository_format *format, const char *path); +/* + * Free the memory held onto by `format`, but not the struct itself. + * (No need to use this after `read_repository_format()` fails.) + */ +void clear_repository_format(struct repository_format *format); + /* * Verify that the repository described by repository_format is something we * can read. If it is, return 0. Otherwise, return -1, and "err" will describe diff --git a/builtin/init-db.c b/builtin/init-db.c index 41faffd28d..04c60eaad5 100644 --- a/builtin/init-db.c +++ b/builtin/init-db.c @@ -96,7 +96,7 @@ static void copy_templates(const char *template_dir) struct strbuf path = STRBUF_INIT; struct strbuf template_path = STRBUF_INIT; size_t template_len; - struct repository_format template_format; + struct repository_format template_format = REPOSITORY_FORMAT_INIT; struct strbuf err = STRBUF_INIT; DIR *dir; char *to_free = NULL; @@ -148,6 +148,7 @@ static void copy_templates(const char *template_dir) free(to_free); strbuf_release(&path); strbuf_release(&template_path); + clear_repository_format(&template_format); } static int git_init_db_config(const char *k, const char *v, void *cb) diff --git a/repository.c b/repository.c index 7b02e1dffa..df88705574 100644 --- a/repository.c +++ b/repository.c @@ -148,7 +148,7 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree) { - struct repository_format format; + struct repository_format format = REPOSITORY_FORMAT_INIT; memset(repo, 0, sizeof(*repo)); repo->objects = raw_object_store_new(); @@ -165,6 +165,7 @@ int repo_init(struct repository *repo, if (worktree) repo_set_worktree(repo, worktree); + clear_repository_format(&format); return 0; error: diff --git a/setup.c b/setup.c index bb633942bb..14c2f8ce24 100644 --- a/setup.c +++ b/setup.c @@ -477,7 +477,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ } repository_format_precious_objects = candidate->precious_objects; - repository_format_partial_clone = candidate->partial_clone; + repository_format_partial_clone = xstrdup_or_null(candidate->partial_clone); repository_format_worktree_config = candidate->worktree_config; string_list_clear(&candidate->unknown_extensions, 0); @@ -500,11 +500,9 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ } if (candidate->work_tree) { free(git_work_tree_cfg); - git_work_tree_cfg = candidate->work_tree; + git_work_tree_cfg = xstrdup(candidate->work_tree); inside_work_tree = -1; } - } else { - free(candidate->work_tree); } return 0; @@ -512,15 +510,21 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ int read_repository_format(struct repository_format *format, const char *path) { - memset(format, 0, sizeof(*format)); - format->version = -1; - format->is_bare = -1; - format->hash_algo = GIT_HASH_SHA1; - string_list_init(&format->unknown_extensions, 1); + clear_repository_format(format); git_config_from_file(check_repo_format, path, format); + if (format->version == -1) + clear_repository_format(format); return format->version; } +void clear_repository_format(struct repository_format *format) +{ + string_list_clear(&format->unknown_extensions, 0); + free(format->work_tree); + free(format->partial_clone); + *format = (struct repository_format)REPOSITORY_FORMAT_INIT; +} + int verify_repository_format(const struct repository_format *format, struct strbuf *err) { @@ -1008,7 +1012,7 @@ int discover_git_directory(struct strbuf *commondir, struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT; size_t gitdir_offset = gitdir->len, cwd_len; size_t commondir_offset = commondir->len; - struct repository_format candidate; + struct repository_format candidate = REPOSITORY_FORMAT_INIT; if (strbuf_getcwd(&dir)) return -1; @@ -1045,9 +1049,11 @@ int discover_git_directory(struct strbuf *commondir, strbuf_release(&err); strbuf_setlen(commondir, commondir_offset); strbuf_setlen(gitdir, gitdir_offset); + clear_repository_format(&candidate); return -1; } + clear_repository_format(&candidate); return 0; } @@ -1056,7 +1062,7 @@ const char *setup_git_directory_gently(int *nongit_ok) static struct strbuf cwd = STRBUF_INIT; struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT; const char *prefix; - struct repository_format repo_fmt; + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; /* * We may have read an incomplete configuration before @@ -1146,6 +1152,7 @@ const char *setup_git_directory_gently(int *nongit_ok) strbuf_release(&dir); strbuf_release(&gitdir); + clear_repository_format(&repo_fmt); return prefix; } @@ -1203,9 +1210,10 @@ int git_config_perm(const char *var, const char *value) void check_repository_format(void) { - struct repository_format repo_fmt; + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; check_repository_format_gently(get_git_dir(), &repo_fmt, NULL); startup_info->have_repository = 1; + clear_repository_format(&repo_fmt); } /* diff --git a/worktree.c b/worktree.c index d6a0ee7f73..b45bfeb9d3 100644 --- a/worktree.c +++ b/worktree.c @@ -444,7 +444,7 @@ int submodule_uses_worktrees(const char *path) DIR *dir; struct dirent *d; int ret = 0; - struct repository_format format; + struct repository_format format = REPOSITORY_FORMAT_INIT; submodule_gitdir = git_pathdup_submodule(path, "%s", ""); if (!submodule_gitdir) @@ -462,8 +462,10 @@ int submodule_uses_worktrees(const char *path) read_repository_format(&format, sb.buf); if (format.version != 0) { strbuf_release(&sb); + clear_repository_format(&format); return 1; } + clear_repository_format(&format); /* Replace config by worktrees. */ strbuf_setlen(&sb, sb.len - strlen("config"));
After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. This means that it is important to initialize the struct before calling `read_...()`, so document that. Teach `read_...()` to clear the struct upon an error, so that it is reset to a safe state, and document this. About that very last point: In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok. We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- cache.h | 27 ++++++++++++++++++++++++--- builtin/init-db.c | 3 ++- repository.c | 3 ++- setup.c | 32 ++++++++++++++++++++------------ worktree.c | 4 +++- 5 files changed, 51 insertions(+), 18 deletions(-)