Message ID | dc3d3f2471410aa55da4dbc8e2747192888bce5f.1620503945.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Directory traversal fixes | expand |
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > +test_expect_failure 'avoid traversing into ignored directories' ' > + test_when_finished rm -f output error trace.* && > + test_create_repo avoid-traversing-deep-hierarchy && > + ( > + cd avoid-traversing-deep-hierarchy && > + > + mkdir -p untracked/subdir/with/a && > + >untracked/subdir/with/a/random-file.txt && > + > + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ > + git clean -ffdxn -e untracked > + ) && > + > + grep data.*read_directo.*visited trace.output \ > + | cut -d "|" -f 9 >trace.relevant && > + cat >trace.expect <<-EOF && > + directories-visited:1 > + paths-visited:4 Are the origins of '1' and '4' trivially obvious to those who are reading the test, or do these deserve comments? We create an empty test repository, go there and create a untracked/ hierarchy with a junk file, and tell "clean" that 'untracked' is "also" in the exclude pattern (but since there is no other exclude pattern, that is the only one), so everything underneath untracked/ we have no reason to inspect. So, we do not visit 'untracked' directory. Which ones do we visit? Is '1' coming from the top-level of the working tree '.'? What about the number of visited paths '4' (the trace is stored outside this new test repository, so that's not it). Thanks. > + EOF > + test_cmp trace.expect trace.relevant > +' > + > test_done
On Sun, May 9, 2021 at 10:28 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +test_expect_failure 'avoid traversing into ignored directories' ' > > + test_when_finished rm -f output error trace.* && > > + test_create_repo avoid-traversing-deep-hierarchy && > > + ( > > + cd avoid-traversing-deep-hierarchy && > > + > > + mkdir -p untracked/subdir/with/a && > > + >untracked/subdir/with/a/random-file.txt && > > + > > + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ > > + git clean -ffdxn -e untracked > > + ) && > > + > > + grep data.*read_directo.*visited trace.output \ > > + | cut -d "|" -f 9 >trace.relevant && > > + cat >trace.expect <<-EOF && > > + directories-visited:1 > > + paths-visited:4 > > Are the origins of '1' and '4' trivially obvious to those who are > reading the test, or do these deserve comments? > > We create an empty test repository, go there and create a untracked/ > hierarchy with a junk file, and tell "clean" that 'untracked' is > "also" in the exclude pattern (but since there is no other exclude > pattern, that is the only one), so everything underneath untracked/ > we have no reason to inspect. > > So, we do not visit 'untracked' directory. Which ones do we visit? > Is '1' coming from the top-level of the working tree '.'? What > about the number of visited paths '4' (the trace is stored outside > this new test repository, so that's not it). Good points. I'll make a comment that directories-visited:1 is about ensuring we only went into the toplevel directory, and I'll removed the paths-visited check. But to answer your question, the paths we visit are '.', '..', '.git', and 'untracked', the first three of which we mark as path_none and don't recurse into because of special rules for those paths, and the last of which we shouldn't recurse into since it is ignored. There weren't any non-directory files in the toplevel directory, or those would also be included in the paths-visited count. A later patch in the series will fix the code to not recurse into the 'untracked' directory, fixing this test.
Elijah Newren <newren@gmail.com> writes: > But to answer your question, the paths we visit are '.', '..', '.git', > and 'untracked', the first three of which we mark as path_none and > don't recurse into because of special rules for those paths, and the > last of which we shouldn't recurse into since it is ignored. Not a hard requirement, but I wish if we entirely ignored "." and ".." in our code (not just not counting, but making whoever calls readdir() skip and call it again when it gets "." or ".."). https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html seems to imply that readdir() may not give "." or ".." (if dot or dot-dot exists, you are to return them only once, which implies that it is perfectly OK for dot or dot-dot to be missing). So dropping the test for number of visited paths would be nicer from portability's point of view ;-) Thanks.
On Tue, May 11, 2021 at 3:43 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > But to answer your question, the paths we visit are '.', '..', '.git', > > and 'untracked', the first three of which we mark as path_none and > > don't recurse into because of special rules for those paths, and the > > last of which we shouldn't recurse into since it is ignored. > > Not a hard requirement, but I wish if we entirely ignored "." and > ".." in our code (not just not counting, but making whoever calls > readdir() skip and call it again when it gets "." or ".."). > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html > > seems to imply that readdir() may not give "." or ".." (if dot or > dot-dot exists, you are to return them only once, which implies that > it is perfectly OK for dot or dot-dot to be missing). Something like this? diff --git a/dir.c b/dir.c index 993a12145f..7f470bc701 100644 --- a/dir.c +++ b/dir.c @@ -2341,7 +2341,11 @@ static int read_cached_dir(struct cached_dir *cdir) struct dirent *de; if (cdir->fdir) { - de = readdir(cdir->fdir); + while ((de = readdir(cdir->fdir))) { + /* Ignore '.' and '..' by re-looping; handle the rest */ + if (!de || !is_dot_or_dotdot(de->d_name)) + break; + } if (!de) { cdir->d_name = NULL; cdir->d_type = DT_UNKNOWN; It appears that the other two callers of readdir() in dir.c, namely in is_empty_dir() and remove_dir_recurse() already have such special repeat-if-is_dot_or_dotdot() logic built into them, so this was partially lifted from those. If you'd like, I can add another patch in the series with this change so that all readdir() calls in dir.c have such ignore '.' and '..' logic. Or, we could perhaps introduce a new readdir() wrapper that does nothing other than ignore '.' and '..' and have all three of these callsites use that new wrapper. > So dropping the test for number of visited paths would be nicer from > portability's point of view ;-) Yep, makes sense. I already did that in v4, which means it'll continue to pass with or without the above proposed change to read_cached_dir().
Elijah Newren <newren@gmail.com> writes: > If you'd like, I can add another patch in the series with this change > so that all readdir() calls in dir.c have such ignore '.' and '..' > logic. Or, we could perhaps introduce a new readdir() wrapper that > does nothing other than ignore '.' and '..' and have all three of > these callsites use that new wrapper. Yeah, it is good to be consistent (either implementation). >> So dropping the test for number of visited paths would be nicer from >> portability's point of view ;-) > > Yep, makes sense. I already did that in v4, which means it'll > continue to pass with or without the above proposed change to > read_cached_dir(). Yup.
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index a74816ca8b46..b7c9898fac5b 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,4 +746,26 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' +test_expect_failure 'avoid traversing into ignored directories' ' + test_when_finished rm -f output error trace.* && + test_create_repo avoid-traversing-deep-hierarchy && + ( + cd avoid-traversing-deep-hierarchy && + + mkdir -p untracked/subdir/with/a && + >untracked/subdir/with/a/random-file.txt && + + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ + git clean -ffdxn -e untracked + ) && + + grep data.*read_directo.*visited trace.output \ + | cut -d "|" -f 9 >trace.relevant && + cat >trace.expect <<-EOF && + directories-visited:1 + paths-visited:4 + EOF + test_cmp trace.expect trace.relevant +' + test_done