Message ID | 20200814111049.GA4101811@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 8dc3156373f4e02c1b1f657350ffae8ee94cbf44 |
Series | clear_pattern_list(): clear embedded hashmaps |
On 8/14/2020 7:10 AM, Jeff King wrote:
> Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
> 2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
> but they're leaked when clear_pattern_list() is called.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I have no idea how often this leak triggers in practice. I just noticed
> it while poking at LSan output (which we remain depressingly far
> from getting a clean run on).

Good find. The impact of the leak is likely low since we don't
create multiple pattern_list structs per process (with these
hashmaps) very often. The sparse-checkout builtin is likely
the only place where multiple could be instantiated at the same
time.

I also double-checked that hashmap_free_entries() handles a NULL
hashmap pointer or uninitialized hashmap, which is what happens
when cone mode is not enabled _or_ the pattern_list corresponds
to something like a .gitignore file.

Thanks,
-Stolee

>  dir.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index fe64be30ed..9411b94e9b 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
>  		free(pl->patterns[i]);
>  	free(pl->patterns);
>  	free(pl->filebuf);
> +	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> +	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
>
>  	memset(pl, 0, sizeof(*pl));
>  }
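The property Stolee describes (clearing is harmless on a NULL pointer or on a map that was never initialized) can be illustrated with a small stand-alone sketch. The names `toy_map`, `toy_map_init()`, `toy_map_clear()` and `demo_clear_uninitialized()` are hypothetical and are not Git's hashmap API; the sketch only assumes that the clearing routine returns early when no bucket table was ever allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy map for illustration; this is not Git's struct hashmap. */
struct toy_map {
    void **table;       /* bucket array, NULL until toy_map_init() runs */
    size_t nr_buckets;
};

static void toy_map_init(struct toy_map *map, size_t nr_buckets)
{
    map->table = calloc(nr_buckets, sizeof(*map->table));
    map->nr_buckets = map->table ? nr_buckets : 0;
}

/* Returns 1 if a bucket table was released, 0 if there was nothing to do. */
static int toy_map_clear(struct toy_map *map)
{
    if (!map || !map->table)
        return 0;       /* NULL pointer or never-initialized map */
    free(map->table);
    map->table = NULL;
    map->nr_buckets = 0;
    return 1;
}

/* Counts how many of the four clear calls actually released memory. */
int demo_clear_uninitialized(void)
{
    struct toy_map never_used = { 0 };  /* e.g. cone mode not enabled */
    struct toy_map used;
    int released = 0;

    toy_map_init(&used, 8);

    released += toy_map_clear(NULL);
    released += toy_map_clear(&never_used);
    released += toy_map_clear(&used);
    released += toy_map_clear(&used);   /* clearing twice is a no-op */
    return released;
}
```

With this shape, a caller such as a struct-wide clear function can clear every embedded map unconditionally instead of tracking which ones were set up.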
Hi,

On Fri, Aug 14, 2020 at 5:23 AM Jeff King <peff@peff.net> wrote:
>
> Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
> 2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
> but they're leaked when clear_pattern_list() is called.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I have no idea how often this leak triggers in practice. I just noticed
> it while poking at LSan output (which we remain depressingly far
> from getting a clean run on).
>
>  dir.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index fe64be30ed..9411b94e9b 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
>  		free(pl->patterns[i]);
>  	free(pl->patterns);
>  	free(pl->filebuf);
> +	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> +	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);

This clears up the hash entries, but continues to leak the hash table.
Since you submitted first, can you fix this to use hashmap_free_()
instead, as per
https://lore.kernel.org/git/932741d7598ca2934dbca40f715ba2d3819fcc51.1597561152.git.gitgitgadget@gmail.com/?
Then I'll rebase my series on yours and drop my first patch (since
it'll then be identical).

Thanks,
Elijah
On Mon, Aug 17, 2020 at 9:55 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> On Fri, Aug 14, 2020 at 5:23 AM Jeff King <peff@peff.net> wrote:
> >
> > Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
> > 2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
> > but they're leaked when clear_pattern_list() is called.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I have no idea how often this leak triggers in practice. I just noticed
> > it while poking at LSan output (which we remain depressingly far
> > from getting a clean run on).
> >
> >  dir.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/dir.c b/dir.c
> > index fe64be30ed..9411b94e9b 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
> >  		free(pl->patterns[i]);
> >  	free(pl->patterns);
> >  	free(pl->filebuf);
> > +	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> > +	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
>
> This clears up the hash entries, but continues to leak the hash table.
> Since you submitted first, can you fix this to use hashmap_free_()
> instead, as per
> https://lore.kernel.org/git/932741d7598ca2934dbca40f715ba2d3819fcc51.1597561152.git.gitgitgadget@gmail.com/?
> Then I'll rebase my series on yours and drop my first patch (since
> it'll then be identical).

Never mind, I got confused once again by the name.
hashmap_free_entries() doesn't mean just free the entries, it means
free what hashmap_free() would plus all the entries, i.e. do what
hashmap_free() *should* *have* *been* defined to do. Such a confusing
API. And hashmap_free() really perplexes me -- it seems like a
function that can't possibly be useful; its sole purpose seems to be
a trap for the unwary.
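The distinction Elijah describes (one call releases only the table, the other releases the table plus every entry) can be sketched with a stand-alone toy map. Everything here is hypothetical and only loosely modeled on the behaviour discussed in this thread: `toy_map_clear_()` takes the offset of the embedded entry inside its containing struct, and a negative offset is assumed to mean "leave the entries alone".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; not Git's hashmap.h. */
struct toy_entry {
    struct toy_entry *next;
};

struct toy_map {
    struct toy_entry **table;
    size_t nr_buckets;
};

struct item {
    int value;
    struct toy_entry ent;   /* deliberately not the first member */
};

static int toy_map_init(struct toy_map *map, size_t nr_buckets)
{
    map->table = calloc(nr_buckets, sizeof(*map->table));
    map->nr_buckets = map->table ? nr_buckets : 0;
    return map->table ? 0 : -1;
}

static void toy_map_add(struct toy_map *map, struct toy_entry *e,
                        unsigned int hash)
{
    size_t bucket = hash % map->nr_buckets;

    e->next = map->table[bucket];
    map->table[bucket] = e;
}

/*
 * entry_offset < 0:  release only the bucket table.
 * entry_offset >= 0: additionally free each containing struct, located by
 *                    stepping back entry_offset bytes from the entry.
 * Returns the number of containing structs that were freed.
 */
static size_t toy_map_clear_(struct toy_map *map, ptrdiff_t entry_offset)
{
    size_t i, freed = 0;

    if (!map || !map->table)
        return 0;
    if (entry_offset >= 0) {
        for (i = 0; i < map->nr_buckets; i++) {
            struct toy_entry *e = map->table[i];

            while (e) {
                struct toy_entry *next = e->next;

                free((char *)e - entry_offset);
                freed++;
                e = next;
            }
        }
    }
    free(map->table);
    map->table = NULL;
    map->nr_buckets = 0;
    return freed;
}

#define toy_map_clear(map) toy_map_clear_(map, -1)
#define toy_map_clear_and_free(map, type, member) \
    toy_map_clear_(map, (ptrdiff_t)offsetof(type, member))

/* Adds three heap-allocated items, then releases table and items together. */
size_t demo_clear_and_free(void)
{
    struct toy_map map;
    int i;

    if (toy_map_init(&map, 4))
        return 0;
    for (i = 0; i < 3; i++) {
        struct item *it = malloc(sizeof(*it));

        if (!it)
            break;
        it->value = i;
        toy_map_add(&map, &it->ent, (unsigned int)i);
    }
    return toy_map_clear_and_free(&map, struct item, ent);
}
```

Calling `toy_map_clear(&map)` in place of the last line would free the table but leak all three items, which is the trap the naming makes easy to fall into when the map owns its entries.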
On Mon, Aug 17, 2020 at 10:22:27AM -0700, Elijah Newren wrote:

> > > +	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> > > +	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
> >
> > This clears up the hash entries, but continues to leak the hash table.
> > Since you submitted first, can you fix this to use hashmap_free_()
> > instead, as per
> > https://lore.kernel.org/git/932741d7598ca2934dbca40f715ba2d3819fcc51.1597561152.git.gitgitgadget@gmail.com/?
> > Then I'll rebase my series on yours and drop my first patch (since
> > it'll then be identical).
>
> Never mind, I got confused once again by the name.
> hashmap_free_entries() doesn't mean just free the entries, it means
> free what hashmap_free() would plus all the entries, i.e. do what
> hashmap_free() *should* *have* *been* defined to do. Such a confusing
> API. And hashmap_free() really perplexes me -- it seems like a
> function that can't possibly be useful; its sole purpose seems to be
> a trap for the unwary.

There used to be an "also free entries" flag, but that got complicated
by the loosening of the "hashmap_entry must be at the front of the
struct to be freed" rule.

With this kind of embedded-entry data structure (and list.h is in the
same boat) it _is_ sometimes useful to be part of a data structure
without giving up ownership of the memory. But I agree that the more
normal case is to free items when the hashmap is destroyed.

Likewise, the whole "you have to define a struct that contains the map
entry" thing is flexible and efficient, but a pain to use. I generally
find khash's "map this type to that type, the hash owns the memory" much
more natural. And it doesn't lose efficiency (and indeed sometimes even
gains it) because it uses macros to store concrete types. But of course
macros create their own headaches. :)

Anyway, I'm definitely open to renaming to something more sensible. I
already mentioned the free/clear thing earlier, but
hashmap_clear_entries() ends up _very_ confusing. Because it's clearing
the hashmap but freeing the entries. hashmap_clear_and_free_entries() is
kind of long, but a lot more descriptive.

-Peff
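Peff's point that an embedded entry can sit in a map "without giving up ownership of the memory" can be illustrated with a minimal sketch in which the records live in a caller-owned array and the index only borrows them. All names (`toy_index`, `toy_index_clear()`, `struct record`, `demo_borrowed_entries()`) are hypothetical and not part of Git; the assumption is simply that the clear routine releases the bucket table and never touches the entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; not Git's hashmap.h or list.h. */
struct toy_entry {
    struct toy_entry *next;
};

struct toy_index {
    struct toy_entry **table;
    size_t nr_buckets;
};

struct record {
    int id;
    struct toy_entry ent;   /* link used only while the record is indexed */
};

static int toy_index_init(struct toy_index *idx, size_t nr_buckets)
{
    idx->table = calloc(nr_buckets, sizeof(*idx->table));
    idx->nr_buckets = idx->table ? nr_buckets : 0;
    return idx->table ? 0 : -1;
}

static void toy_index_add(struct toy_index *idx, struct toy_entry *e,
                          unsigned int hash)
{
    size_t bucket = hash % idx->nr_buckets;

    e->next = idx->table[bucket];
    idx->table[bucket] = e;
}

static size_t toy_index_count(const struct toy_index *idx)
{
    size_t i, n = 0;

    for (i = 0; i < idx->nr_buckets; i++) {
        const struct toy_entry *e;

        for (e = idx->table[i]; e; e = e->next)
            n++;
    }
    return n;
}

/* Releases the bucket table only; entries stay owned by the caller. */
static void toy_index_clear(struct toy_index *idx)
{
    if (!idx || !idx->table)
        return;
    free(idx->table);
    idx->table = NULL;
    idx->nr_buckets = 0;
}

/* Indexes three stack-allocated records, drops the index, keeps the data. */
int demo_borrowed_entries(void)
{
    struct record records[3] = { { 10, { NULL } }, { 20, { NULL } },
                                 { 30, { NULL } } };
    struct toy_index idx;
    size_t i, indexed;
    int sum = 0;

    if (toy_index_init(&idx, 2))
        return -1;
    for (i = 0; i < 3; i++)
        toy_index_add(&idx, &records[i].ent, (unsigned int)records[i].id);
    indexed = toy_index_count(&idx);

    toy_index_clear(&idx);

    /* The records were never owned by the index, so they remain usable. */
    for (i = 0; i < 3; i++)
        sum += records[i].id;
    return indexed == 3 ? sum : -1;
}
```

Freeing the entries here would be a bug, since they were never allocated with `malloc()`; this is the case where a table-only clear is the right call, even if it is the less common one.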
diff --git a/dir.c b/dir.c
index fe64be30ed..9411b94e9b 100644
--- a/dir.c
+++ b/dir.c
@@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
+	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
 
 	memset(pl, 0, sizeof(*pl));
 }
Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
but they're leaked when clear_pattern_list() is called.

Signed-off-by: Jeff King <peff@peff.net>
---
I have no idea how often this leak triggers in practice. I just noticed
it while poking at LSan output (which we remain depressingly far
from getting a clean run on).

 dir.c | 2 ++
 1 file changed, 2 insertions(+)