mbox series

[v4,0/8] Directory traversal fixes

Message ID pull.1020.v4.git.git.1620758049.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Directory traversal fixes | expand

Message

Philippe Blain via GitGitGadget May 11, 2021, 6:34 p.m. UTC
This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should (one of which was originally reported on this
list at [1]). And it includes a few cleanups

Changes since v3, includes numerous cleanups suggested by Junio and Jeff
(thanks for the reviews!):

 * Removed the RFC labels, but if folks want a warning instead of a die on
   ls-files -i (see patch 3), let me know
 * Include the path passed to read_directory() in the printed trace2
   statistics
 * Print trace2 statistics before calling trace2_region_leave()
 * Make sure to convert both trace_performance_leave() calls
 * testcase style fixes
 * left a comment that directories-visited:1 referred to the toplevel
   directory
 * fixed up some commit message comments about "exclude" and mentioned yet
   another way it can be confusing.

As noted in previous versions of this series, if folks would prefer ls-files
-i to continue running but print a warning rather than making it an error as
I did in this series, let me know. Also, if anyone has any ideas about a
better place to put the "Some sidenotes" from the sixth commit message
rather than keeping them in a random commit message, that might be helpful
too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  dir: convert trace calls to trace2 equivalents
  dir: report number of visited directories and paths with trace2
  ls-files: error out on -i unless -o or -c are specified
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles

 builtin/ls-files.c                 |   3 +
 dir.c                              | 112 +++++++++++-----
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 206 +++++++++++++++++------------
 t/t7300-clean.sh                   |  42 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 259 insertions(+), 127 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v4
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v3:

 1:  9f1c0d78d739 ! 1:  9204e36b7e90 [RFC] dir: convert trace calls to trace2 equivalents
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] dir: convert trace calls to trace2 equivalents
     +    dir: convert trace calls to trace2 equivalents
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       }
       
      +static void trace2_read_directory_statistics(struct dir_struct *dir,
     -+					     struct repository *repo)
     ++					     struct repository *repo,
     ++					     const char *path)
      +{
      +	if (!dir->untracked)
      +		return;
     ++	trace2_data_string("read_directory", repo, "path", path);
      +	trace2_data_intmax("read_directory", repo,
      +			   "node-creation", dir->untracked->dir_created);
      +	trace2_data_intmax("read_directory", repo,
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +	trace2_region_enter("dir", "read_directory", istate->repo);
       
       	if (has_symlink_leading_path(path, len)) {
     - 		trace_performance_leave("read directory %.*s", len, path);
     +-		trace_performance_leave("read directory %.*s", len, path);
     ++		trace2_region_leave("dir", "read_directory", istate->repo);
     + 		return dir->nr;
     + 	}
     + 
      @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       	QSORT(dir->entries, dir->nr, cmp_dir_entry);
       	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
       
      -	trace_performance_leave("read directory %.*s", len, path);
     ++	if (trace2_is_enabled()) {
     ++		struct strbuf tmp = STRBUF_INIT;
     ++		strbuf_add(&tmp, path, len);
     ++		trace2_read_directory_statistics(dir, istate->repo, tmp.buf);
     ++		strbuf_release(&tmp);
     ++	}
     ++
      +	trace2_region_leave("dir", "read_directory", istate->repo);
       	if (dir->untracked) {
       		static int force_untracked_cache = -1;
     @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       		}
       	}
      +
     -+	if (trace2_is_enabled())
     -+		trace2_read_directory_statistics(dir, istate->repo);
       	return dir->nr;
       }
       
     @@ t/t7063-status-untracked-cache.sh: iuc () {
       	return $ret
       }
       
     -+get_relevant_traces() {
     ++get_relevant_traces () {
      +	# From the GIT_TRACE2_PERF data of the form
      +	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
      +	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
      +	# or region_leave, or stats for things outside read_directory.
      +	INPUT_FILE=$1
      +	OUTPUT_FILE=$2
     -+	grep data.*read_directo $INPUT_FILE \
     -+	    | cut -d "|" -f 9 \
     -+	    >$OUTPUT_FILE
     ++	grep data.*read_directo $INPUT_FILE |
     ++	    cut -d "|" -f 9 \
     ++	    >"$OUTPUT_FILE"
      +}
      +
      +
     @@ t/t7063-status-untracked-cache.sh: EOF
      -gitignore invalidation: 1
      -directory invalidation: 0
      -opendir: 4
     -+ ..node-creation:3
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:0
     -+ ..opendir:4
     ++ ....path:
     ++ ....node-creation:3
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:0
     ++ ....opendir:4
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache after fi
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 0
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:0
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 0
      -directory invalidation: 1
      -opendir: 1
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:1
     -+ ..opendir:1
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:1
     ++ ....opendir:1
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 1
      -directory invalidation: 1
      -opendir: 4
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:1
     -+ ..opendir:4
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:1
     ++ ....opendir:4
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 1
      -directory invalidation: 0
      -opendir: 4
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:0
     -+ ..opendir:4
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:0
     ++ ....opendir:4
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  one
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 1
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:1
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:1
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 1
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:1
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:1
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'status after commit' '
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 2
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:2
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:2
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status with
      -gitignore invalidation: 1
      -directory invalidation: 2
      -opendir: 2
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:2
     -+ ..opendir:2
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:2
     ++ ....opendir:2
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status again
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 0
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:0
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status with
      -gitignore invalidation: 0
      -directory invalidation: 1
      -opendir: 3
     -+ ..node-creation:2
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:1
     -+ ..opendir:3
     ++ ....path:
     ++ ....node-creation:2
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:1
     ++ ....opendir:3
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: EOF
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 0
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:0
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
 2:  8b511f228af8 ! 2:  6939253be825 [RFC] dir: report number of visited directories and paths with trace2
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] dir: report number of visited directories and paths with trace2
     +    dir: report number of visited directories and paths with trace2
      
          Provide more statistics in trace2 output that include the number of
          directories and total paths visited by the directory traversal logic.
     @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
       
       		if (state > dir_state)
       			dir_state = state;
     -@@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
     - static void trace2_read_directory_statistics(struct dir_struct *dir,
     - 					     struct repository *repo)
     +@@ dir.c: static void trace2_read_directory_statistics(struct dir_struct *dir,
     + 					     struct repository *repo,
     + 					     const char *path)
       {
      +	trace2_data_intmax("read_directory", repo,
      +			   "directories-visited", dir->visited_directories);
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +			   "paths-visited", dir->visited_paths);
       	if (!dir->untracked)
       		return;
     - 	trace2_data_intmax("read_directory", repo,
     + 	trace2_data_string("read_directory", repo, "path", path);
      @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       	struct untracked_cache_dir *untracked;
       
     @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
      +	dir->visited_directories = 0;
       
       	if (has_symlink_leading_path(path, len)) {
     - 		trace_performance_leave("read directory %.*s", len, path);
     + 		trace2_region_leave("dir", "read_directory", istate->repo);
      
       ## dir.h ##
      @@ dir.h: struct dir_struct {
     @@ dir.h: struct dir_struct {
       /*Count the number of slashes for string s*/
      
       ## t/t7063-status-untracked-cache.sh ##
     -@@ t/t7063-status-untracked-cache.sh: get_relevant_traces() {
     +@@ t/t7063-status-untracked-cache.sh: get_relevant_traces () {
       	INPUT_FILE=$1
       	OUTPUT_FILE=$2
     - 	grep data.*read_directo $INPUT_FILE \
     -+	    | grep -v visited \
     - 	    | cut -d "|" -f 9 \
     - 	    >$OUTPUT_FILE
     + 	grep data.*read_directo $INPUT_FILE |
     +-	    cut -d "|" -f 9 \
     ++	    cut -d "|" -f 9 |
     ++	    grep -v visited \
     + 	    >"$OUTPUT_FILE"
       }
     + 
 3:  44a1322c4402 ! 3:  8d0ca8104be6 [RFC] ls-files: error out on -i unless -o or -c are specified
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] ls-files: error out on -i unless -o or -c are specified
     +    ls-files: error out on -i unless -o or -c are specified
      
          ls-files --ignored can be used together with either --others or
          --cached.  After being perplexed for a bit and digging in to the code, I
          assumed that ls-files -i was just broken and not printing anything and
     -    had a nice patch ready to submit when I finally realized that -i can be
     +    I had a nice patch ready to submit when I finally realized that -i can be
          used with --cached to find tracked ignores.
      
          While that was a mistake on my part, and a careful reading of the
     @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
       		ps_matched = xcalloc(pathspec.nr, 1);
       
      +	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
     -+		die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
     ++		die("ls-files -i must be used with either -o or -c");
      +
       	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
       		die("ls-files --ignored needs some exclude pattern");
 4:  dc3d3f247141 ! 4:  317abab3571e t7300: add testcase showing unnecessary traversal into ignored directory
     @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
      +		git clean -ffdxn -e untracked
      +	) &&
      +
     -+	grep data.*read_directo.*visited trace.output \
     -+		| cut -d "|" -f 9 >trace.relevant &&
     ++	# Make sure we only visited into the top-level directory, and did
     ++	# not traverse into the "untracked" subdirectory since it was excluded
     ++	grep data.*read_directo.*directories-visited trace.output |
     ++		cut -d "|" -f 9 >trace.relevant &&
      +	cat >trace.expect <<-EOF &&
     -+	 directories-visited:1
     -+	 paths-visited:4
     ++	 ..directories-visited:1
      +	EOF
      +	test_cmp trace.expect trace.relevant
      +'
 5:  73b03a1e8e05 = 5:  5eb019327b57 t3001, t7300: add testcase showcasing missed directory traversal
 6:  66ffc7f02d08 ! 6:  89cc01ef8598 dir: avoid unnecessary traversal into ignored directory
     @@ Commit message
            since "git ls-files -c -i" is supposed to list them.
      
          * The dir code often uses "ignored" and "excluded" interchangeably,
     -      which you need to keep in mind while reading the code.  Sadly, though,
     -      it can get very confusing since ignore rules can have exclusions, as
     -      in the last of the following .gitignore rules:
     -          .gitignore
     +      which you need to keep in mind while reading the code.
     +
     +    * "exclude" is used multiple ways in the code:
     +
     +      * As noted above, "exclude" is often a synonym for "ignored".
     +
     +      * The logic for parsing .gitignore files was re-used in
     +        .git/info/sparse-checkout, except there it is used to mark paths that
     +        the user wants to *keep*.  This was mostly addressed by commit
     +        65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
     +        2019-09-03), but every once in a while you'll find a comment about
     +        "exclude" referring to these patterns that might in fact be in use
     +        by the sparse-checkout machinery for inclusion rules.
     +
     +      * The word "EXCLUDE" is also used for pathspec negation, as in
     +          (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
     +        Thus if a user had a .gitignore file containing
                *~
                *.log
                !settings.log
     -      In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
     -      will be true due the the '!' negating the rule.  Someone might refer
     -      to this as "excluded".  That means the file 'settings.log' will not
     -      match, and thus not be ignored.  So we won't return path_excluded for
     -      it.  So it's an exclude rule that prevents the file from being an
     -      exclude.  The non-excluded rules are the ones that result in files
     -      being excludes.  Great fun, eh?
     +        And then ran
     +          git add -- 'settings.*' ':^settings.log'
     +        Then :^settings.log is a pathspec negation making settings.log not
     +        be requested to be added even though all other settings.* files are
     +        being added.  Also, !settings.log in the gitignore file is a negative
     +        exclude pattern meaning that settings.log is normally a file we
     +        want to track even though all other *.log files are ignored.
      
          Sometimes it feels like dir.c needs its own glossary with its many
          definitions, including the multiply-defined terms.
 7:  acde436b220e = 7:  4a561e1229e4 dir: traverse into untracked directories if they may have ignored subfiles
 8:  57135c357774 = 8:  2945e749f5e3 dir: update stale description of treat_directory()