diff mbox series

[v2,1/8] t7300: add testcase showing unnecessary traversal into ignored directory

Message ID a3bd253fa8e8ae47d19beb35327d8283ffa49289.1620432500.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Directory traversal fixes | expand

Commit Message

Elijah Newren May 8, 2021, 12:08 a.m. UTC
From: Elijah Newren <newren@gmail.com>

PNPM is apparently creating deeply nested (but ignored) directory
structures; traversing them is costly performance-wise, unnecessary, and
in some cases is even throwing warnings/errors because the paths are too
long to handle on various platforms.  Add a testcase that demonstrates
this problem.

Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

Comments

Junio C Hamano May 8, 2021, 10:13 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> PNPM is apparently creating deeply nested (but ignored) directory

Sorry, but what's PNPM?

> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms.  Add a testcase that demonstrates
> this problem.
>
> Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
> Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index a74816ca8b46..5f1dc397c11e 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_failure 'avoid traversing into ignored directories' '
> +	test_when_finished rm -f output error &&
> +	test_create_repo avoid-traversing-deep-hierarchy &&
> +	(
> +		cd avoid-traversing-deep-hierarchy &&
> +
> +		>directory-random-file.txt &&
> +		# Put this file under directory400/directory399/.../directory1/
> +		depth=400 &&
> +		for x in $(test_seq 1 $depth); do

Style.  Lose semicolon, have "do" on the next line on its own,
aligned with "for".  Tip: you shouldn't need any semicolon other
than the doubled ones in case/esac in your shell script.

> +			mkdir "tmpdirectory$x" &&
> +			mv directory* "tmpdirectory$x" &&
> +			mv "tmpdirectory$x" "directory$x"
> +		done &&
> +
> +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +		test_must_be_empty ../output &&
> +		# We especially do not want things like
> +		#   "warning: could not open directory "
> +		# appearing in the error output.  It is true that directories
> +		# that are too long cannot be opened, but we should not be
> +		# recursing into those directories anyway since the very first
> +		# level is ignored.
> +		test_must_be_empty ../error &&
> +
> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do

Ditto.

> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done

Hmph, after seeing the discussion thread of v1, I was expecting to
see a helper in Perl that cd's down and then comes back up while
removing what is in its directory (and I expected something similar
for creation side we saw above).

> +	)
> +'
> +
>  test_done
Junio C Hamano May 8, 2021, 10:19 a.m. UTC | #2
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do
> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done

Another thing: this not being a test_when_finished handler means it
would not help after a test failure.

Perhaps wrap it in a helper

    clean_deep_hierarchy () {
	rm -fr directory* ||
	while test ! -f directory-random-file.txt
	do
		...
	done
    }

and call it from test_when_finished?
Elijah Newren May 8, 2021, 5:34 p.m. UTC | #3
On Sat, May 8, 2021 at 3:13 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > PNPM is apparently creating deeply nested (but ignored) directory
>
> Sorry, but what's PNPM?

a package manager; I'll use Philip Oakley's suggestion to make it more clear.

> > structures; traversing them is costly performance-wise, unnecessary, and
> > in some cases is even throwing warnings/errors because the paths are too
> > long to handle on various platforms.  Add a testcase that demonstrates
> > this problem.
> >
> > Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
> > Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> >
> > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> > index a74816ca8b46..5f1dc397c11e 100755
> > --- a/t/t7300-clean.sh
> > +++ b/t/t7300-clean.sh
> > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> >       test_must_be_empty actual
> >  '
> >
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +     test_when_finished rm -f output error &&
> > +     test_create_repo avoid-traversing-deep-hierarchy &&
> > +     (
> > +             cd avoid-traversing-deep-hierarchy &&
> > +
> > +             >directory-random-file.txt &&
> > +             # Put this file under directory400/directory399/.../directory1/
> > +             depth=400 &&
> > +             for x in $(test_seq 1 $depth); do
>
> Style.  Lose semicolon, have "do" on the next line on its own,
> aligned with "for".  Tip: you shouldn't need any semicolon other
> than the doubled ones in case/esac in your shell script.

Thanks.

>
> > +                     mkdir "tmpdirectory$x" &&
> > +                     mv directory* "tmpdirectory$x" &&
> > +                     mv "tmpdirectory$x" "directory$x"
> > +             done &&
> > +
> > +             git clean -ffdxn -e directory$depth >../output 2>../error &&
> > +
> > +             test_must_be_empty ../output &&
> > +             # We especially do not want things like
> > +             #   "warning: could not open directory "
> > +             # appearing in the error output.  It is true that directories
> > +             # that are too long cannot be opened, but we should not be
> > +             # recursing into those directories anyway since the very first
> > +             # level is ignored.
> > +             test_must_be_empty ../error &&
> > +
> > +             # alpine-linux-musl fails to "rm -rf" a directory with such
> > +             # a deeply nested hierarchy.  Help it out by deleting the
> > +             # leading directories ourselves.  Super slow, but, what else
> > +             # can we do?  Without this, we will hit a
> > +             #     error: Tests passed but test cleanup failed; aborting
> > +             # so do this ugly manual cleanup...
> > +             while test ! -f directory-random-file.txt; do
>
> Ditto.

Yep, sorry.

> > +                     name=$(ls -d directory*) &&
> > +                     mv $name/* . &&
> > +                     rmdir $name
> > +             done
>
> Hmph, after seeing the discussion thread of v1, I was expecting to
> see a helper in Perl that cd's down and then comes back up while
> removing what is in its directory (and I expected something similar
> for creation side we saw above).

Hmm, I was a bit unsure of the alternative route I took in patches 7
and 8 (switching trace1 to trace2 in dir.c, then using it to get more
statistics which would allow a much more shallow directory structure
for this test).  I wasn't sure if the strategy seemed acceptable, and
I wanted people to be able to see the two schemes side-by-side, but if
that alternative is acceptable, I want to move patch 7 to the front of
the series, the code change parts of patch 8 as the second patch, and
then squash the rest of patch 8 into this patch vastly simplifying
this testcase and obsoleting everyone's comments on it.

Maybe I should have just refactored the series that way anyway.  I'll
send a reroll that does that, and put all the [RFC] patches first.
Elijah Newren May 8, 2021, 5:41 p.m. UTC | #4
On Sat, May 8, 2021 at 3:19 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +             # alpine-linux-musl fails to "rm -rf" a directory with such
> > +             # a deeply nested hierarchy.  Help it out by deleting the
> > +             # leading directories ourselves.  Super slow, but, what else
> > +             # can we do?  Without this, we will hit a
> > +             #     error: Tests passed but test cleanup failed; aborting
> > +             # so do this ugly manual cleanup...
> > +             while test ! -f directory-random-file.txt; do
> > +                     name=$(ls -d directory*) &&
> > +                     mv $name/* . &&
> > +                     rmdir $name
> > +             done
>
> Another thing: this not being a test_when_finished handler means it
> would not help after a test failure.

test failures are irrelevant here; this code is here to help
test_done's directory cleanup, which only fires when all tests pass.


But if I restructure the series, this whole section of code
disappears.  I'll do that...
diff mbox series

Patch

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..5f1dc397c11e 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,44 @@  test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		>directory-random-file.txt &&
+		# Put this file under directory400/directory399/.../directory1/
+		depth=400 &&
+		for x in $(test_seq 1 $depth); do
+			mkdir "tmpdirectory$x" &&
+			mv directory* "tmpdirectory$x" &&
+			mv "tmpdirectory$x" "directory$x"
+		done &&
+
+		git clean -ffdxn -e directory$depth >../output 2>../error &&
+
+		test_must_be_empty ../output &&
+		# We especially do not want things like
+		#   "warning: could not open directory "
+		# appearing in the error output.  It is true that directories
+		# that are too long cannot be opened, but we should not be
+		# recursing into those directories anyway since the very first
+		# level is ignored.
+		test_must_be_empty ../error &&
+
+		# alpine-linux-musl fails to "rm -rf" a directory with such
+		# a deeply nested hierarchy.  Help it out by deleting the
+		# leading directories ourselves.  Super slow, but, what else
+		# can we do?  Without this, we will hit a
+		#     error: Tests passed but test cleanup failed; aborting
+		# so do this ugly manual cleanup...
+		while test ! -f directory-random-file.txt; do
+			name=$(ls -d directory*) &&
+			mv $name/* . &&
+			rmdir $name
+		done
+	)
+'
+
 test_done