diff mbox series

[v4,04/13] dir: fix pattern matching on dirs

Message ID 24bffdab139435173712101aaf72f7277298c99d.1632497954.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior | expand

Commit Message

Derrick Stolee Sept. 24, 2021, 3:39 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Within match_pathname(), one successful matching category happens when
the pattern is equal to its non-wildcard prefix. At this point, we have
checked that the input 'pathname' matches the pattern up to the prefix
length, and then we subtraced that length from both 'patternlen' and
'namelen'.

In the case of a directory match, this prefix match should be
sufficient. However, the success condition only cared about _exact_
equality here. Instead, we should allow any path that agrees on this
prefix in the case of PATTERN_FLAG_MUSTBEDIR.

This case was not tested before because of the way unpack_trees() would
match a parent directory before visiting the contained paths. This
approach is changing, so we must change this comparison.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Glen Choo Nov. 2, 2021, 12:15 a.m. UTC | #1
This patch changes the behavior of .gitignore such that directories are
now matched by prefix instead of matching exactly.

The failure that we observed is something like the following:

In "a/.gitignore", we have the pattern "git/". We should expect that
"a/git/foo" to be ignored because "git/" should be matched exactly.
However, "a/git-foo/bar" is also ignored because "git-foo" matches the
prefix.

I'll prepare a test case for this as soon as I figure out how to write
it..
Junio C Hamano Nov. 2, 2021, 12:34 a.m. UTC | #2
Glen Choo <chooglen@google.com> writes:

> This patch changes the behavior of .gitignore such that directories are
> now matched by prefix instead of matching exactly.
>
> The failure that we observed is something like the following:
>
> In "a/.gitignore", we have the pattern "git/". We should expect that
> "a/git/foo" to be ignored because "git/" should be matched exactly.
> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
> prefix.
>
> I'll prepare a test case for this as soon as I figure out how to write
> it..

FWIW, reverting this commit (and nothing else) from 'master' does not
seem to break any test.  That does not necessarily mean much (it may
be indicating a gap in the test coverage).
Derrick Stolee Nov. 2, 2021, 1:42 p.m. UTC | #3
On 11/1/2021 8:34 PM, Junio C Hamano wrote:
> Glen Choo <chooglen@google.com> writes:
> 
>> This patch changes the behavior of .gitignore such that directories are
>> now matched by prefix instead of matching exactly.

Thank you for pointing out an unintended consequence.

>> The failure that we observed is something like the following:
>>
>> In "a/.gitignore", we have the pattern "git/". We should expect that
>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>> prefix.
>>
>> I'll prepare a test case for this as soon as I figure out how to write
>> it..
> 
> FWIW, reverting this commit (and nothing else) from 'master' does not
> seem to break any test.  That does not necessarily mean much (it may
> be indicating a gap in the test coverage).

Hm. I definitely seemed confident in my commit message that this change
was important for a test I was introducing at some point. Let me revert
it in microsoft/git and see if our Scalar functional tests find an
issue.

In the meantime, I'll try to create a Git test that demonstrates a
problem one way or another.

Thanks,
-Stolee
Derrick Stolee Nov. 2, 2021, 2:50 p.m. UTC | #4
On 11/2/2021 9:42 AM, Derrick Stolee wrote:
> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>> Glen Choo <chooglen@google.com> writes:
>>
>>> This patch changes the behavior of .gitignore such that directories are
>>> now matched by prefix instead of matching exactly.
> 
> Thank you for pointing out an unintended consequence.
> 
>>> The failure that we observed is something like the following:
>>>
>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>> prefix.
>>>
>>> I'll prepare a test case for this as soon as I figure out how to write
>>> it..
...
> In the meantime, I'll try to create a Git test that demonstrates a
> problem one way or another.

I created a test, but had some trouble reproducing it due to some
subtleties higher in the call stack. Here is a patch that reverts
the change and adds some tests.

The Scalar functional tests passed with the revert, so the original
patch was worthless to begin with. I don't recall what motivated the
change, but clearly it was a mistake. Sorry.

---- >8 ----

From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 2 Nov 2021 10:40:06 -0400
Subject: [PATCH] dir: fix directory-matching bug

This reverts the change from ed49584 (dir: fix pattern matching on dirs,
2021-09-24), which claimed to fix a directory-matching problem without a
test case. It turns out to _create_ a bug, but it is a bit subtle.

The bug would have been revealed by the first of two tests being added to
t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
would fail before the revert.

The second test shows what happens if the test instead uses a pattern "git/"
and this test passes both before and after the revert.

The difference in these two cases are due to how
last_matching_pattern_from_list() checks patterns both if they have the
PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
match_pathname() not affect the end result of
last_matching_pattern_from_list().

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c              |  2 +-
 t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index c6d7a8647b9..94489298f4c 100644
--- a/dir.c
+++ b/dir.c
@@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
 		 * then our prefix match is all we need; we
 		 * do not need to call fnmatch at all.
 		 */
-		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
+		if (!patternlen && !namelen)
 			return 1;
 	}
 
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 532637de882..1889cfc60e0 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
 	grep top-level-dir actual
 '
 
+test_expect_success 'exact prefix matching (with root)' '
+	test_when_finished rm -r a &&
+	mkdir -p a/git a/git-foo &&
+	touch a/git/foo a/git-foo/bar &&
+	echo /git/ >a/.gitignore &&
+	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
+	cat >expect <<-\EOF &&
+	a/git
+	a/git/foo
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'exact prefix matching (without root)' '
+	test_when_finished rm -r a &&
+	mkdir -p a/git a/git-foo &&
+	touch a/git/foo a/git-foo/bar &&
+	echo git/ >a/.gitignore &&
+	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
+	cat >expect <<-\EOF &&
+	a/git
+	a/git/foo
+	EOF
+	test_cmp expect actual
+'
+
 ############################################################################
 #
 # test whitespace handling
Ævar Arnfjörð Bjarmason Nov. 2, 2021, 3:33 p.m. UTC | #5
On Tue, Nov 02 2021, Derrick Stolee wrote:

> On 11/2/2021 9:42 AM, Derrick Stolee wrote:
>> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>>> Glen Choo <chooglen@google.com> writes:
>>>
>>>> This patch changes the behavior of .gitignore such that directories are
>>>> now matched by prefix instead of matching exactly.
>> 
>> Thank you for pointing out an unintended consequence.
>> 
>>>> The failure that we observed is something like the following:
>>>>
>>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>>> prefix.
>>>>
>>>> I'll prepare a test case for this as soon as I figure out how to write
>>>> it..
> ...
>> In the meantime, I'll try to create a Git test that demonstrates a
>> problem one way or another.
>
> I created a test, but had some trouble reproducing it due to some
> subtleties higher in the call stack. Here is a patch that reverts
> the change and adds some tests.
>
> The Scalar functional tests passed with the revert, so the original
> patch was worthless to begin with. I don't recall what motivated the
> change, but clearly it was a mistake. Sorry.
>
> ---- >8 ----
>
> From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Tue, 2 Nov 2021 10:40:06 -0400
> Subject: [PATCH] dir: fix directory-matching bug
>
> This reverts the change from ed49584 (dir: fix pattern matching on dirs,
> 2021-09-24), which claimed to fix a directory-matching problem without a
> test case. It turns out to _create_ a bug, but it is a bit subtle.
>
> The bug would have been revealed by the first of two tests being added to
> t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
> file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
> would fail before the revert.
>
> The second test shows what happens if the test instead uses a pattern "git/"
> and this test passes both before and after the revert.
>
> The difference in these two cases are due to how
> last_matching_pattern_from_list() checks patterns both if they have the
> PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
> the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
> match_pathname() not affect the end result of
> last_matching_pattern_from_list().
>
> Reported-by: Glen Choo <chooglen@google.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c              |  2 +-
>  t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index c6d7a8647b9..94489298f4c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
>  		 * then our prefix match is all we need; we
>  		 * do not need to call fnmatch at all.
>  		 */
> -		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
> +		if (!patternlen && !namelen)
>  			return 1;
>  	}
>  
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 532637de882..1889cfc60e0 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
>  	grep top-level-dir actual
>  '
>  
> +test_expect_success 'exact prefix matching (with root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo /git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'exact prefix matching (without root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  ############################################################################
>  #
>  # test whitespace handling

We have t3070-wildmatch.sh testing various combinations of these, and
indeed this code resolves back to wildmatch().

But I think in this case this is due to dir.c's particular behavior of
splitting paths before feeding them to wildmatch, as it needs to match
things relative to the subdirectory.

Still, we've got a matrix of these in t3070-wildmatch.sh, which already
tests some (but apparently not all) cases where we need to create an
actual file on disk. These sorts of test blindspots are why I added that
in de8bada2bf6 (wildmatch test: create & test files on disk in addition
to in-memory, 2018-01-30).

Wouldn't it be better & more exhaustive here to change its test lines
like:


    match 0 0 1 1 foo/bar/baz 'bar'

To say:

    match 0 0 1 1 ? ? foo/bar/baz 'bar'

And just add to its match() function so that if we have a subject with a
slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
.gitignore file therein with the "foo" directory with the "bar" content,
perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.

Then create a "bar.txt" in the directory as well, and a
bar-otherdir/somefile.txt or whatever.

And fill in the "? ?" depending on whether those variants matched or
not...

Anyway, just an idea, but if you pursue that you should be able to get
much more exhaustive testing in this area that we've apparently had a
blindspot in.
Derrick Stolee Nov. 3, 2021, 2:40 p.m. UTC | #6
On 11/2/2021 11:33 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> We have t3070-wildmatch.sh testing various combinations of these, and
> indeed this code resolves back to wildmatch().
> 
> But I think in this case this is due to dir.c's particular behavior of
> splitting paths before feeding them to wildmatch, as it needs to match
> things relative to the subdirectory.
> 
> Still, we've got a matrix of these in t3070-wildmatch.sh, which already
> tests some (but apparently not all) cases where we need to create an
> actual file on disk. These sorts of test blindspots are why I added that
> in de8bada2bf6 (wildmatch test: create & test files on disk in addition
> to in-memory, 2018-01-30).
> 
> Wouldn't it be better & more exhaustive here to change its test lines
> like:
> 
> 
>     match 0 0 1 1 foo/bar/baz 'bar'
> 
> To say:
> 
>     match 0 0 1 1 ? ? foo/bar/baz 'bar'
> 
> And just add to its match() function so that if we have a subject with a
> slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
> .gitignore file therein with the "foo" directory with the "bar" content,
> perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.
> 
> Then create a "bar.txt" in the directory as well, and a
> bar-otherdir/somefile.txt or whatever.
> 
> And fill in the "? ?" depending on whether those variants matched or
> not...
> 
> Anyway, just an idea, but if you pursue that you should be able to get
> much more exhaustive testing in this area that we've apparently had a
> blindspot in.

Those tests are quite exhaustive, but the test script is pretty
inscrutable and the refactor you suggest is pretty major. I'd prefer
to keep to the focused test for the sake of fixing this in time for
the release.

Thanks,
-Stolee
Junio C Hamano Nov. 3, 2021, 5:14 p.m. UTC | #7
Derrick Stolee <stolee@gmail.com> writes:

> Those tests are quite exhaustive, but the test script is pretty
> inscrutable and the refactor you suggest is pretty major. I'd prefer
> to keep to the focused test for the sake of fixing this in time for
> the release.

Thanks, both.
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index 9ea6cfe61cb..174d336c30e 100644
--- a/dir.c
+++ b/dir.c
@@ -1294,7 +1294,7 @@  int match_pathname(const char *pathname, int pathlen,
 		 * then our prefix match is all we need; we
 		 * do not need to call fnmatch at all.
 		 */
-		if (!patternlen && !namelen)
+		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
 			return 1;
 	}