mbox series

[v2,0/2] Clean up some memory leaks in and around dir.c

Message ID pull.831.v2.git.git.1597791506.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Clean up some memory leaks in and around dir.c | expand

Message

Philippe Blain via GitGitGadget Aug. 18, 2020, 10:58 p.m. UTC
Some memory leaks in dir.c were making it hard for me to find my own leaks
in merge-ort. I followed a few threads and tried to both clean up the leaks
and make the API less prone to incorrect use.

Changes since v1:

 * dropped the first patch, since Peff submitted an identical patch a day
   before me
 * implemented Peff's suggestions to rename dir_free() to dir_clear() and
   have it call dir_init() at the end so that folks can re-use the struct
   after dir_clear() if they so choose.

Elijah Newren (2):
  dir: make clear_directory() free all relevant memory
  dir: fix problematic API to avoid memory leaks

 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  4 ++--
 builtin/clean.c        | 12 ++++--------
 builtin/grep.c         |  3 ++-
 builtin/ls-files.c     |  4 ++--
 builtin/stash.c        |  7 ++-----
 dir.c                  | 20 +++++++++++++++++---
 dir.h                  | 21 +++++++++++----------
 merge.c                |  3 ++-
 wt-status.c            |  8 ++------
 10 files changed, 46 insertions(+), 40 deletions(-)


base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-831%2Fnewren%2Fdir_memory_cleanup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-831/newren/dir_memory_cleanup-v2
Pull-Request: https://github.com/git/git/pull/831

Range-diff vs v1:

 1:  932741d759 < -:  ---------- dir: fix leak of parent_hashmap and recursive_hashmap
 2:  068e097e22 ! 1:  f890ac0279 dir: make clear_directory() free all relevant memory
     @@ Commit message
          The calling convention for the dir API is supposed to end with a call to
          clear_directory() to free up no longer needed memory.  However,
          clear_directory() didn't free dir->entries or dir->ignored.  I believe
     -    this was oversight, but a number of callers noticed memory leaks and
     -    started free'ing these, but often somewhat haphazardly (sometimes
     -    freeing the entries in the arrays, and sometimes only free'ing the
     -    arrays themselves).  This suggests the callers weren't trying to make
     -    sure any possible memory used might be free'd, but just the memory they
     -    noticed their usecase definitely had allocated.  This also caused the
     -    extra memory deallocations to be duplicated in many places.
     +    this was an oversight, but a number of callers noticed memory leaks and
     +    started free'ing these.  Unfortunately, they did so somewhat haphazardly
     +    (sometimes freeing the entries in the arrays, and sometimes only
     +    free'ing the arrays themselves).  This suggests the callers weren't
     +    trying to make sure any possible memory used might be free'd, but just
     +    the memory they noticed their usecase definitely had allocated.
      
          Fix this mess by moving all the duplicated free'ing logic into
     -    clear_directory().
     +    clear_directory().  End by resetting dir to a pristine state so it could
     +    be reused if desired.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
      
       ## dir.c ##
      @@ dir.c: int remove_path(const char *name)
     - 	return 0;
       }
       
     --/*
     + /*
      - * Frees memory within dir which was allocated for exclude lists and
      - * the exclude_stack.  Does not free dir itself.
     -- */
     -+/* Frees memory within dir which was allocated.  Does not free dir itself. */
     ++ * Frees memory within dir which was allocated, and resets fields for further
     ++ * use.  Does not free dir itself.
     +  */
       void clear_directory(struct dir_struct *dir)
       {
     - 	int i, j;
      @@ dir.c: void clear_directory(struct dir_struct *dir)
       		free(group->pl);
       	}
     @@ dir.c: void clear_directory(struct dir_struct *dir)
       	stk = dir->exclude_stack;
       	while (stk) {
       		struct exclude_stack *prev = stk->prev;
     +@@ dir.c: void clear_directory(struct dir_struct *dir)
     + 		stk = prev;
     + 	}
     + 	strbuf_release(&dir->basebuf);
     ++
     ++	memset(&dir, 0, sizeof(*dir));
     + }
     + 
     + struct ondisk_untracked_cache {
      
       ## dir.h ##
      @@
 3:  b9310e9941 ! 2:  dcd72eef48 dir: fix problematic API to avoid memory leaks
     @@ Metadata
       ## Commit message ##
          dir: fix problematic API to avoid memory leaks
      
     -    Two commits ago, we noticed that parent_hashmap and recursive_hashmap
     -    were being leaked and fixed it by making clear_pattern_list() free those
     -    hashmaps.  One commit ago, we noticed that clear_directory() was only
     -    taking responsibility for a subset of fields within dir_struct, despite
     -    the fact that entries[] and ignored[] we allocated internally to dir.c,
     -    resulting in many callers either leaking or haphazardly trying to free
     -    these arrays and their contents.
     +    The dir structure seemed to have a number of leaks and problems around
     +    it.  First I noticed that parent_hashmap and recursive_hashmap were
     +    being leaked (though Peff noticed and submitted fixes before me).  Then
     +    I noticed in the previous commit that clear_directory() was only taking
     +    responsibility for a subset of fields within dir_struct, despite the
     +    fact that entries[] and ignored[] we allocated internally to dir.c.
     +    That, of course, resulted in many callers either leaking or haphazardly
     +    trying to free these arrays and their contents.
      
          Digging further, I found that despite the pretty clear documentation
          near the top of dir.h that folks were supposed to call clear_directory()
     @@ Commit message
          which then snowballed into further problems with the entries[],
          ignored[], parent_hashmap, and recursive_hashmap problems.
      
     -    Rename clear_directory() to dir_free() to be more in line with other
     +    Rename clear_directory() to dir_clear() to be more in line with other
          data structures in git, and introduce a dir_init() to handle the
          suggested memsetting of dir_struct to all zeroes.  I hope that a name
     -    like "dir_free()" is more clear, and that the presence of dir_init()
     -    will provide a hint to those looking at the code that there may be a
     -    corresponding dir_free() that they need to call.
     +    like "dir_clear()" is more clear, and that the presence of dir_init()
     +    will provide a hint to those looking at the code that they need to look
     +    for either a dir_clear() or a dir_free() and lead them to find
     +    dir_clear().
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
       			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
       		die(_("Unable to write new index file"));
       
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	UNLEAK(pathspec);
      -	UNLEAK(dir);
       	return exit_status;
     @@ builtin/check-ignore.c: int cmd_check_ignore(int argc, const char **argv, const
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       
       	return !num_ignored;
       }
     @@ builtin/clean.c: static int filter_by_patterns_cmd(void)
       
       		strbuf_list_free(ignore_list);
      -		clear_directory(&dir);
     -+		dir_free(&dir);
     ++		dir_clear(&dir);
       	}
       
       	strbuf_release(&confirm);
     @@ builtin/clean.c: int cmd_clean(int argc, const char **argv, const char *prefix)
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       
       	if (interactive && del_list.nr > 0)
       		interactive_main_loop();
     @@ builtin/grep.c: static int grep_directory(struct grep_opt *opt, const struct pat
       		if (hit && opt->status_only)
       			break;
       	}
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	return hit;
       }
       
     @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
       	}
       
      -	UNLEAK(dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	return 0;
       }
      
     @@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	return found;
       }
       
     @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
       {
       	int cnt = 0;
      @@ dir.c: int remove_path(const char *name)
     - }
     - 
     - /* Frees memory within dir which was allocated.  Does not free dir itself. */
     +  * Frees memory within dir which was allocated, and resets fields for further
     +  * use.  Does not free dir itself.
     +  */
      -void clear_directory(struct dir_struct *dir)
     -+void dir_free(struct dir_struct *dir)
     ++void dir_clear(struct dir_struct *dir)
       {
       	int i, j;
       	struct exclude_list_group *group;
     +@@ dir.c: void clear_directory(struct dir_struct *dir)
     + 	}
     + 	strbuf_release(&dir->basebuf);
     + 
     +-	memset(&dir, 0, sizeof(*dir));
     ++	dir_init(dir);
     + }
     + 
     + struct ondisk_untracked_cache {
      
       ## dir.h ##
      @@
     @@ dir.h
      - * - Use `dir.entries[]`.
      + * - Use `dir.entries[]` and `dir.ignored[]`.
        *
     -  * - Call `clear_directory()` when the contained elements are no longer in use.
     +- * - Call `clear_directory()` when the contained elements are no longer in use.
     ++ * - Call `dir_clear()` when the contained elements are no longer in use.
        *
     +  */
     + 
      @@ dir.h: int match_pathspec(const struct index_state *istate,
       int report_path_error(const char *ps_matched, const struct pathspec *pathspec);
       int within_depth(const char *name, int namelen, int depth, int max_depth);
     @@ dir.h: void parse_path_pattern(const char **string, int *patternlen, unsigned *f
       		 int baselen, struct pattern_list *pl, int srcpos);
       void clear_pattern_list(struct pattern_list *pl);
      -void clear_directory(struct dir_struct *dir);
     -+void dir_free(struct dir_struct *dir);
     ++void dir_clear(struct dir_struct *dir);
       
       int repo_file_exists(struct repository *repo, const char *path);
       int file_exists(const char *);
     @@ merge.c: int checkout_fast_forward(struct repository *r,
       		clear_unpack_trees_porcelain(&opts);
       		return -1;
       	}
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	clear_unpack_trees_porcelain(&opts);
       
       	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
     @@ wt-status.c: static void wt_status_collect_untracked(struct wt_status *s)
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       
       	if (advice_status_u_option)
       		s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;

Comments

Jeff King Aug. 19, 2020, 1:51 p.m. UTC | #1
On Tue, Aug 18, 2020 at 10:58:24PM +0000, Elijah Newren via GitGitGadget wrote:

> Some memory leaks in dir.c were making it hard for me to find my own leaks
> in merge-ort. I followed a few threads and tried to both clean up the leaks
> and make the API less prone to incorrect use.
> 
> Changes since v1:
> 
>  * dropped the first patch, since Peff submitted an identical patch a day
>    before me
>  * implemented Peff's suggestions to rename dir_free() to dir_clear() and
>    have it call dir_init() at the end so that folks can re-use the struct
>    after dir_clear() if they so choose.

Thanks, both patches look good to me.

-Peff