[5/7] t0060: test obscured .gitattributes and .gitignore matching

Message ID 20201005072102.GE2291074@coredump.intra.peff.net (mailing list archive)
State Superseded
Series: forbidding symlinked .gitattributes and .gitignore

Commit Message

Jeff King Oct. 5, 2020, 7:21 a.m. UTC
We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used, let's make sure they actually work by throwing a few basic
checks at them.

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/helper/test-path-utils.c | 41 ++++++++++++++++++++++++++------------
 t/t0060-path-utils.sh      | 20 +++++++++++++++++++
 2 files changed, 48 insertions(+), 13 deletions(-)

Comments

Jonathan Nieder Oct. 5, 2020, 8:03 a.m. UTC | #1
(+cc: Dscho for NTFS savvy)
Jeff King wrote:

> We have tests that cover various filesystem-specific spellings of
> ".gitmodules", because we need to reliably identify that path for some
> security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
> add tests, 2018-05-12), with the actual code coming from e7cb0b4455
> (is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
> (is_hfs_dotgit: match other .git files, 2018-05-02).
>
> Those latter two commits also added similar matching functions for
> .gitattributes and .gitignore. These ended up not being used in the
> final series, and are currently dead code. But in preparation for them
> being used, let's make sure they actually work by throwing a few basic
> checks at them.
>
> I didn't bother with the whole battery of tests that we cover for
> .gitmodules. These functions are all based on the same generic matcher,
> so it's sufficient to test most of the corner cases just once.

Yeah, that's reasonable.

> Note that the ntfs magic prefix names in the tests come from the
> algorithm described in e7cb0b4455 (and are different for each file).

Doesn't block this patch, but I'm curious: how hard would it be to make
a test with an NTFS prerequisite that makes sure we got the magic prefix
right?

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/helper/test-path-utils.c | 41 ++++++++++++++++++++++++++------------
>  t/t0060-path-utils.sh      | 20 +++++++++++++++++++
>  2 files changed, 48 insertions(+), 13 deletions(-)
> 
> diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
> index 313a153209..9e253f8058 100644
> --- a/t/helper/test-path-utils.c
> +++ b/t/helper/test-path-utils.c
> @@ -172,9 +172,22 @@ static struct test_data dirname_data[] = {
>  	{ NULL,              NULL     }
>  };
>  
> -static int is_dotgitmodules(const char *path)
> +static int check_dotgitx(const char *x, const char **argv,
> +			 int (*is_hfs)(const char *),
> +			 int (*is_ntfs)(const char *))
>  {
> -	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
> +	int res = 0, expect = 1;
> +	for (; *argv; argv++) {
> +		if (!strcmp("--not", *argv))
> +			expect = !expect;
> +		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
> +			 res = error("'%s' is %s.%s", *argv,
> +				     expect ? "not " : "", x);
> +		else
> +			fprintf(stderr, "ok: '%s' is %s.%s\n",
> +				*argv, expect ? "" : "not ", x);

micronit: extra space on the "res" line.

This "if" cascade is a little hard to read, even though it does the
right thing.  Can we make it more explicit?  E.g.

		if (!strcmp("--not", *argv)) {
			expect = !expect;
			continue;
		}

		actual = is_hfs(*argv) || is_ntfs(*argv);

		fprintf(stderr, "%s: '%s' is %s%s\n",
			expect == actual ? "ok" : "error",
			*argv, actual ? "" : "not ", x);
		if (expect != actual)
			res = -1;

I think it's a little easier to read with either (a) the dot included
in the 'x' parameter or (b) the entire ".git" missing from the 'x'
parameter.
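
For reference, the restructured loop suggested above can be written out as a
compilable sketch. This is only an illustration, not the patch itself: the
demo_* matchers are hypothetical stand-ins for the real is_hfs_*/is_ntfs_*
functions, the return value is normalized to 0/1, and it assumes option (b),
i.e. the caller passes "ignore" and the ".git" lives in the format string.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in matcher: case-insensitive exact match only. */
static int demo_is_dotgitignore(const char *path)
{
	return !strcasecmp(path, ".gitignore");
}

/* Hypothetical stand-in matcher that never matches. */
static int demo_never(const char *path)
{
	(void)path;
	return 0;
}

/*
 * Returns 0 if every argument met its expectation, 1 otherwise.
 * "--not" flips the expectation for the arguments that follow it.
 */
static int check_dotgitx(const char *x, const char **argv,
			 int (*is_hfs)(const char *),
			 int (*is_ntfs)(const char *))
{
	int res = 0, expect = 1;

	for (; *argv; argv++) {
		int actual;

		if (!strcmp("--not", *argv)) {
			expect = !expect;
			continue;
		}

		actual = is_hfs(*argv) || is_ntfs(*argv);

		/* Report what the matchers actually said about this path. */
		fprintf(stderr, "%s: '%s' is %s.git%s\n",
			expect == actual ? "ok" : "error",
			*argv, actual ? "" : "not ", x);
		if (expect != actual)
			res = 1;
	}
	return res;
}
```

Called with a NULL-terminated list such as { ".gitignore", "--not", "foo",
NULL }, the sketch prints one "ok:" or "error:" line per path to stderr and
returns non-zero if any path did not meet its expectation.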

[...]
> index 56db5c8aba..b2e3cf3f4c 100755
> --- a/t/t0060-path-utils.sh
> +++ b/t/t0060-path-utils.sh
> @@ -468,6 +468,26 @@ test_expect_success 'match .gitmodules' '
>  		.gitmodules,:\$DATA
>  '
>  
> +test_expect_success 'match .gitattributes' '
> +	test-tool path-utils is_dotgitattributes \
> +		.gitattributes \
> +		.git${u200c}attributes \
> +		.Gitattributes \
> +		.gitattributeS \
> +		GITATT~1 \
> +		GI7D29~1
> +'
> +
> +test_expect_success 'match .gitignore' '
> +	test-tool path-utils is_dotgitignore \
> +		.gitignore \
> +		.git${u200c}ignore \
> +		.Gitignore \
> +		.gitignorE \
> +		GITIGN~1 \
> +		GI250A~1
> +'
> +
>  test_expect_success MINGW 'is_valid_path() on Windows' '
>  	test-tool path-utils is_valid_path \
>  		win32 \

With whatever subset of the changes above makes sense,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.
Jeff King Oct. 5, 2020, 8:40 a.m. UTC | #2
On Mon, Oct 05, 2020 at 01:03:53AM -0700, Jonathan Nieder wrote:

> > Note that the ntfs magic prefix names in the tests come from the
> > algorithm described in e7cb0b4455 (and are different for each file).
> 
> Doesn't block this patch, but I'm curious: how hard would it be to make
> a test with an NTFS prerequisite that makes sure we got the magic prefix
> right?

I suspect hard since Dscho punted on it in the original series. :) If I
understand correctly, it would require having an NTFS filesystem, and
generating 10,000+ files with a clashing prefix.

> > +	for (; *argv; argv++) {
> > +		if (!strcmp("--not", *argv))
> > +			expect = !expect;
> > +		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
> > +			 res = error("'%s' is %s.%s", *argv,
> > +				     expect ? "not " : "", x);
> > +		else
> > +			fprintf(stderr, "ok: '%s' is %s.%s\n",
> > +				*argv, expect ? "" : "not ", x);
> 
> micronit: extra space on the "res" line.

Thanks, fixed.

> This "if" cascade is a little hard to read, even though it does the
> right thing.  Can we make it more explicit?  E.g.

This is directly moved from the existing code. I'd prefer to keep the
overall structure intact to make that clear.

> I think it's a little easier to read with either (a) the dot included
> in the 'x' parameter or (b) the entire ".git" missing from the 'x'
> parameter.

Yeah, I agree that's worth doing. I took (b), as "dotgitx" implies that
"x" is "modules", etc. I had originally planned to automatically turn
"gitmodules" into "is_ntfs_dotgitmodules", too, but it required
macros and string-pasting. So I decided it was a bit too ugly. :)

-Peff
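
For readers wondering what the "macros and string-pasting" variant might have
looked like, here is a rough sketch. It is a guess at the general shape, not
code from any version of the series: struct dotgit_matchers and
DOTGIT_MATCHERS() are hypothetical names, and the two matcher functions are
trivial stand-ins defined only so that the example compiles on its own.

```c
#include <string.h>

/* Trivial stand-ins for the real matchers, for illustration only. */
static int is_hfs_dotgitmodules(const char *path)
{
	return !strcmp(path, ".gitmodules");
}

static int is_ntfs_dotgitmodules(const char *path)
{
	return !strcmp(path, "GITMOD~1");
}

/* Hypothetical bundle of a display name and its two matchers. */
struct dotgit_matchers {
	const char *name;
	int (*is_hfs)(const char *);
	int (*is_ntfs)(const char *);
};

/*
 * "#x" turns the argument into a string literal, and "##" pastes it
 * onto the function-name prefixes, so DOTGIT_MATCHERS(gitmodules)
 * expands to
 *   { "gitmodules", is_hfs_dotgitmodules, is_ntfs_dotgitmodules }
 */
#define DOTGIT_MATCHERS(x) { #x, is_hfs_dot##x, is_ntfs_dot##x }

static const struct dotgit_matchers demo = DOTGIT_MATCHERS(gitmodules);
```

The macro saves typing at each call site, but it hides the function names from
grep, which is one reason passing the two function pointers explicitly can be
the more readable choice.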
Johannes Schindelin Oct. 5, 2020, 9:20 p.m. UTC | #3
Hi Peff & Jonathan N,

On Mon, 5 Oct 2020, Jeff King wrote:

> On Mon, Oct 05, 2020 at 01:03:53AM -0700, Jonathan Nieder wrote:
>
> > > Note that the ntfs magic prefix names in the tests come from the
> > > algorithm described in e7cb0b4455 (and are different for each file).
> >
> > Doesn't block this patch, but I'm curious: how hard would it be to make
> > a test with an NTFS prerequisite that makes sure we got the magic prefix
> > right?
>
> I suspect hard since Dscho punted on it in the original series. :) If I
> understand correctly, it would require having an NTFS filesystem, and
> generating 10,000+ files with a clashing prefix.

It's not quite _as_ bad: you only need to generate 4 files with a clashing
prefix and then the real one:

-- snip --
me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes1

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes2

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes3

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes4

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore1

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore2

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore3

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore4

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore

me@work MINGW64 ~/repros/ntfs-short-names
$ cmd //c dir //x
 Volume in drive C is OSDisk
 Volume Serial Number is 5E6B-4E77

 Directory of C:\Users\me\repros\ntfs-short-names

10/05/2020  11:11 PM    <DIR>                       .
10/05/2020  11:11 PM    <DIR>                       ..
10/05/2020  11:08 PM                 0 GI7D29~1     .gitattributes
10/05/2020  11:08 PM                 0 GITATT~1     .gitattributes1
10/05/2020  11:08 PM                 0 GITATT~2     .gitattributes2
10/05/2020  11:08 PM                 0 GITATT~3     .gitattributes3
10/05/2020  11:08 PM                 0 GITATT~4     .gitattributes4
10/05/2020  11:11 PM                 0 GI250A~1     .gitignore
10/05/2020  11:11 PM                 0 GITIGN~1     .gitignore1
10/05/2020  11:11 PM                 0 GITIGN~2     .gitignore2
10/05/2020  11:11 PM                 0 GITIGN~3     .gitignore3
10/05/2020  11:11 PM                 0 GITIGN~4     .gitignore4
              10 File(s)              0 bytes
               2 Dir(s)  314,658,705,408 bytes free
-- snap --

But I don't necessarily think that it would make sense to add that test:
it adds churn _every_ time the regression test is run, and by deity, it
sure takes way too long on Windows _already_, and the test would be for a
regression _in the NTFS driver_.

At this stage, I also highly doubt that the algorithm will change ever
again (the last time it changed was several Windows versions ago, I want
to say in Windows XP, but it could have been all the way back to NT).

In light of that, I'd say that the bang is rather small and the buck would
be not small at all, and would have to be paid by developers on Windows
who already pay a disproportionately high price when running the test
suite, so...

Ciao,
Dscho
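
The listing above shows the two short-name shapes the tests rely on: the first
four clashing files get the first six characters plus ~1 to ~4, and the next
one gets a hashed prefix (GI7D29 or GI250A here) plus ~1. A simplified sketch
of a check for those two shapes might look like the following. The function
looks_like_short_name() is hypothetical and deliberately minimal; git's actual
matcher handles more cases than this, and the hashed prefix is passed in as
data rather than computed.

```c
#include <string.h>
#include <strings.h>

/*
 * Simplified illustration only. `long_name` is the long file name
 * without its leading dot (e.g. "gitattributes"), and `fallback` is
 * the six-character hashed prefix observed for it (e.g. "gi7d29").
 * Returns 1 if `name` has one of the two 8-character shapes, else 0.
 */
static int looks_like_short_name(const char *name, const char *long_name,
				 const char *fallback)
{
	if (strlen(name) != 8 || name[6] != '~')
		return 0;

	/* Regular shape: first six characters, then ~1 through ~4. */
	if (!strncasecmp(name, long_name, 6) &&
	    name[7] >= '1' && name[7] <= '4')
		return 1;

	/* Fallback shape: hashed prefix, then a single non-zero digit. */
	return !strncasecmp(name, fallback, 6) &&
	       name[7] >= '1' && name[7] <= '9';
}
```

With the values from the listing, both "GITATT~1" and "GI7D29~1" are accepted
for "gitattributes" with the prefix "gi7d29", while "GITATT~5" is rejected
because the regular shape stops at ~4.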
Jeff King Oct. 6, 2020, 2:01 p.m. UTC | #4
On Mon, Oct 05, 2020 at 11:20:48PM +0200, Johannes Schindelin wrote:

> > > > Note that the ntfs magic prefix names in the tests come from the
> > > > algorithm described in e7cb0b4455 (and are different for each file).
> > >
> > > Doesn't block this patch, but I'm curious: how hard would it be to make
> > > a test with an NTFS prerequisite that makes sure we got the magic prefix
> > > right?
> >
> > I suspect hard since Dscho punted on it in the original series. :) If I
> > understand correctly, it would require having an NTFS filesystem, and
> > generating 10,000+ files with a clashing prefix.
> 
> It's not quite _as_ bad: you only need to generate 4 files with a clashing
> prefix and then the real one:

Ah, that really isn't that bad, then. Still, I don't mind leaving this
as-is under the notion that if the algorithm does change, it would
likely make it onto your radar anyway (or the radar of _anybody_ who
would raise the issue).

-Peff
Patch

diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 313a153209..9e253f8058 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -172,9 +172,22 @@  static struct test_data dirname_data[] = {
 	{ NULL,              NULL     }
 };
 
-static int is_dotgitmodules(const char *path)
+static int check_dotgitx(const char *x, const char **argv,
+			 int (*is_hfs)(const char *),
+			 int (*is_ntfs)(const char *))
 {
-	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
+	int res = 0, expect = 1;
+	for (; *argv; argv++) {
+		if (!strcmp("--not", *argv))
+			expect = !expect;
+		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
+			res = error("'%s' is %s.%s", *argv,
+				    expect ? "not " : "", x);
+		else
+			fprintf(stderr, "ok: '%s' is %s.%s\n",
+				*argv, expect ? "" : "not ", x);
+	}
+	return !!res;
 }
 
 static int cmp_by_st_size(const void *a, const void *b)
@@ -382,17 +395,19 @@  int cmd__path_utils(int argc, const char **argv)
 		return test_function(dirname_data, posix_dirname, argv[1]);
 
 	if (argc > 2 && !strcmp(argv[1], "is_dotgitmodules")) {
-		int res = 0, expect = 1, i;
-		for (i = 2; i < argc; i++)
-			if (!strcmp("--not", argv[i]))
-				expect = !expect;
-			else if (expect != is_dotgitmodules(argv[i]))
-				res = error("'%s' is %s.gitmodules", argv[i],
-					    expect ? "not " : "");
-			else
-				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
-					argv[i], expect ? "" : "not ");
-		return !!res;
+		return check_dotgitx("gitmodules", argv + 2,
+				     is_hfs_dotgitmodules,
+				     is_ntfs_dotgitmodules);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
+		return check_dotgitx("gitignore", argv + 2,
+				     is_hfs_dotgitignore,
+				     is_ntfs_dotgitignore);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
+		return check_dotgitx("gitattributes", argv + 2,
+				     is_hfs_dotgitattributes,
+				     is_ntfs_dotgitattributes);
 	}
 
 	if (argc > 2 && !strcmp(argv[1], "file-size")) {
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 56db5c8aba..b2e3cf3f4c 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -468,6 +468,26 @@  test_expect_success 'match .gitmodules' '
 		.gitmodules,:\$DATA
 '
 
+test_expect_success 'match .gitattributes' '
+	test-tool path-utils is_dotgitattributes \
+		.gitattributes \
+		.git${u200c}attributes \
+		.Gitattributes \
+		.gitattributeS \
+		GITATT~1 \
+		GI7D29~1
+'
+
+test_expect_success 'match .gitignore' '
+	test-tool path-utils is_dotgitignore \
+		.gitignore \
+		.git${u200c}ignore \
+		.Gitignore \
+		.gitignorE \
+		GITIGN~1 \
+		GI250A~1
+'
+
 test_expect_success MINGW 'is_valid_path() on Windows' '
 	test-tool path-utils is_valid_path \
 		win32 \