diff mbox series

[v2] add, rm, mv: fix bug that prevents the update of non-sparse dirs

Message ID 5e99c039db0b9644fb21f2ea72a464c67a74ff64.1635191000.git.matheus.bernardino@usp.br (mailing list archive)
State Superseded
Headers show
Series [v2] add, rm, mv: fix bug that prevents the update of non-sparse dirs | expand

Commit Message

Matheus Tavares Oct. 25, 2021, 9:07 p.m. UTC
These three commands recently learned to avoid updating paths outside
the sparse checkout even if they are missing the SKIP_WORKTREE bit. This
is done using path_in_sparse_checkout(), which checks whether a given
path matches the current list of sparsity rules, similar to what
clear_ce_flags() does when we run "git sparse checkout init" or "git
sparse-checkout reapply". However, clear_ce_flags() uses a recursive
approach, applying the match results from parent directories on paths
that get the UNDECIDED result, whereas path_in_sparse_checkout() only
attempts to match the full path and immediately considers UNDECIDED as
NOT_MATCHED. This makes the function miss matches with leading
directories. For example, if the user has the sparsity patterns "!/a"
and "b/", add, rm, and mv will fail to update the path "a/b/c" and end
up displaying a warning about it being outside the sparse checkout even
though it isn't. This problem only occurs in full pattern mode as the
pattern matching functions never return UNDECIDED for cone mode.

To fix this, replicate the recursive behavior of clear_ce_flags() in
path_in_sparse_checkout(), falling back to the parent directory match
when a path gets the UNDECIDED result. (If this turns out to be too
expensive in some cases, we may want to later add some form of caching
to accelerate multiple queries within the same directory. This is not
implemented in this patch, though.) Also add two tests for each affected
command (add, rm, and mv) to check that they behave correctly with the
recursive pattern matching. The first test would previously fail without
this patch while the second already succeeded. It is added mostly to
make sure that we are not breaking the existing pattern matching for
directories that are really sparse, and also as a protection against any
future regressions.

Two other existing tests had to be changed as well: one test in t3602
checks that "git rm -r <dir>" won't remove sparse entries, but it didn't
allow the non-sparse entries inside <dir> to be removed. The other one,
in t7002, tested that "git mv" would correctly display a warning message
for sparse paths, but it accidentally expected the message to include
two non-sparse paths as well.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

Changes since RFC/v1 [1]:

- Inverted the loop direction to start from the full path and go backwards in
  the parent dirs. This way we can stop early when we find the first
  non-UNDECIDED match result.

- Simplified the implementation by unifing the code path for cone mode and
  full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED
  for cone mode, it will always execute only one iteration of the loop and then
  find the final answer. There is no need to handle this case in a separate
  block.

- Inside the loop, made sure to change dtype to DT_DIR when going to parent
  directories. Without this, the pattern match would fail if we had a path
  like "a/b/c" and the pattern "b/" (with trailing slash).

- Changed the tests to use trailing slash to make sure they cover the corner
  case described above.

- Improved commit message.

[1]: https://lore.kernel.org/git/80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br
(The RFC was deep down another thread, so I separated v2 to help
readers. Please, let me know if that is not a good approach and I will
avoid it in the future.)

 dir.c                          | 25 +++++++++++++++++------
 t/t3602-rm-sparse-checkout.sh  | 37 +++++++++++++++++++++++++++++++---
 t/t3705-add-sparse-checkout.sh | 18 +++++++++++++++++
 t/t7002-mv-sparse-checkout.sh  | 24 ++++++++++++++++++++--
 4 files changed, 93 insertions(+), 11 deletions(-)

Comments

Derrick Stolee Oct. 26, 2021, 12:53 p.m. UTC | #1
On 10/25/2021 5:07 PM, Matheus Tavares wrote:
> Changes since RFC/v1 [1]:
> 
> - Inverted the loop direction to start from the full path and go backwards in
>   the parent dirs. This way we can stop early when we find the first
>   non-UNDECIDED match result.

This loop direction change is a good idea.
 
> - Simplified the implementation by unifing the code path for cone mode and
>   full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED
>   for cone mode, it will always execute only one iteration of the loop and then
>   find the final answer. There is no need to handle this case in a separate
>   block.

This was unexpected, but makes sense. While your commit message hints at the
fact that cone mode never returns UNDECIDED, it doesn't explicitly mention
that cone mode will exit the loop after a single iteration. It might be nice
to make that explicit either in the commit message or the block comment before
the loop.

> - Inside the loop, made sure to change dtype to DT_DIR when going to parent
>   directories. Without this, the pattern match would fail if we had a path
>   like "a/b/c" and the pattern "b/" (with trailing slash).

Very good. We typically need to detect the type for the first path given,
but we know that all parents are directories. I've used this trick elsewhere.

I see in the code that the first path is used as DT_REG. It's my fault, but
perhaps it should be made more clear that path_in_sparse_checkout() will
consider the given path as a file, not a directory. The current users of the
method are using it properly, but I'm suddenly worried about another caller
misinterpreting the generality of the problem.

Would a comment be sufficient? Or should we rename it to something like
file_path_patches_pattern_list()? (A rename can be done separately.)

> - Changed the tests to use trailing slash to make sure they cover the corner
>   case described above.

Good.

> - Improved commit message.
> 
> [1]: https://lore.kernel.org/git/80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br
> (The RFC was deep down another thread, so I separated v2 to help
> readers. Please, let me know if that is not a good approach and I will
> avoid it in the future.)

I appreciate that you split this out into its own thread!

> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;

Here is where we assume a file to start.

> +	enum pattern_match_result match = UNDECIDED;
> +	const char *end, *slash;
>  
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,23 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>  
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	/*
> +	 * If UNDECIDED, use the match from the parent dir (recursively),
> +	 * or fall back to NOT_MATCHED at the topmost level.
> +	 */
> +	for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {

nit: since this line is long and the sentinel is complicated, it might
be worth splitting the different parts into their own lines:

	for (end = path + strlen(path);
	     end > path && match == UNDECIDED;
	     end = slash) {

> +
> +		for (slash = end - 1; slash >= path && *slash != '/'; slash--)
> +			; /* do nothing */
> +
> +		match = path_matches_pattern_list(path, end - path,
> +				slash >= path ? slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		/* We are going to match the parent dir now */
> +		dtype = DT_DIR;
> +	}
> +	return match > 0;
>  }

This implementation looks good.

And I appreciate the robust tests you added.

Thanks,
-Stolee
René Scharfe Oct. 26, 2021, 4:22 p.m. UTC | #2
Am 25.10.21 um 23:07 schrieb Matheus Tavares:
> These three commands recently learned to avoid updating paths outside
> the sparse checkout even if they are missing the SKIP_WORKTREE bit. This
> is done using path_in_sparse_checkout(), which checks whether a given
> path matches the current list of sparsity rules, similar to what
> clear_ce_flags() does when we run "git sparse checkout init" or "git
> sparse-checkout reapply". However, clear_ce_flags() uses a recursive
> approach, applying the match results from parent directories on paths
> that get the UNDECIDED result, whereas path_in_sparse_checkout() only
> attempts to match the full path and immediately considers UNDECIDED as
> NOT_MATCHED. This makes the function miss matches with leading
> directories. For example, if the user has the sparsity patterns "!/a"
> and "b/", add, rm, and mv will fail to update the path "a/b/c" and end
> up displaying a warning about it being outside the sparse checkout even
> though it isn't. This problem only occurs in full pattern mode as the
> pattern matching functions never return UNDECIDED for cone mode.
>
> To fix this, replicate the recursive behavior of clear_ce_flags() in
> path_in_sparse_checkout(), falling back to the parent directory match
> when a path gets the UNDECIDED result. (If this turns out to be too
> expensive in some cases, we may want to later add some form of caching
> to accelerate multiple queries within the same directory. This is not
> implemented in this patch, though.) Also add two tests for each affected
> command (add, rm, and mv) to check that they behave correctly with the
> recursive pattern matching. The first test would previously fail without
> this patch while the second already succeeded. It is added mostly to
> make sure that we are not breaking the existing pattern matching for
> directories that are really sparse, and also as a protection against any
> future regressions.
>
> Two other existing tests had to be changed as well: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it didn't
> allow the non-sparse entries inside <dir> to be removed. The other one,
> in t7002, tested that "git mv" would correctly display a warning message
> for sparse paths, but it accidentally expected the message to include
> two non-sparse paths as well.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>
> Changes since RFC/v1 [1]:
>
> - Inverted the loop direction to start from the full path and go backwards in
>   the parent dirs. This way we can stop early when we find the first
>   non-UNDECIDED match result.
>
> - Simplified the implementation by unifing the code path for cone mode and
>   full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED
>   for cone mode, it will always execute only one iteration of the loop and then
>   find the final answer. There is no need to handle this case in a separate
>   block.
>
> - Inside the loop, made sure to change dtype to DT_DIR when going to parent
>   directories. Without this, the pattern match would fail if we had a path
>   like "a/b/c" and the pattern "b/" (with trailing slash).
>
> - Changed the tests to use trailing slash to make sure they cover the corner
>   case described above.
>
> - Improved commit message.
>
> [1]: https://lore.kernel.org/git/80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br
> (The RFC was deep down another thread, so I separated v2 to help
> readers. Please, let me know if that is not a good approach and I will
> avoid it in the future.)
>
>  dir.c                          | 25 +++++++++++++++++------
>  t/t3602-rm-sparse-checkout.sh  | 37 +++++++++++++++++++++++++++++++---
>  t/t3705-add-sparse-checkout.sh | 18 +++++++++++++++++
>  t/t7002-mv-sparse-checkout.sh  | 24 ++++++++++++++++++++--
>  4 files changed, 93 insertions(+), 11 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index a4306ab874..248f72e732 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result match = UNDECIDED;
> +	const char *end, *slash;
>
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,23 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	/*
> +	 * If UNDECIDED, use the match from the parent dir (recursively),
> +	 * or fall back to NOT_MATCHED at the topmost level.
> +	 */
> +	for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {
> +
> +		for (slash = end - 1; slash >= path && *slash != '/'; slash--)
> +			; /* do nothing */

slash can end up one less than path.  If path points to the first char
of a string object this would be undefined if I read 6.5.6 of C99
correctly.  (A pointer to the array element just after the last one is
specified as fine as long as it's not dereferenced, but a pointer to
the element before the first one is not mentioned and thus undefined.)

Do you really need the ">=" instead of ">"?

> +
> +		match = path_matches_pattern_list(path, end - path,
> +				slash >= path ? slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		/* We are going to match the parent dir now */
> +		dtype = DT_DIR;
> +	}
> +	return match > 0;
>  }
>
>  int path_in_sparse_checkout(const char *path,
Derrick Stolee Oct. 26, 2021, 7:04 p.m. UTC | #3
On 10/26/2021 12:22 PM, René Scharfe wrote:
> Am 25.10.21 um 23:07 schrieb Matheus Tavares:

I reordered some things to first audit that 'slash' is used safely,
assuming that we can store "p - 1" if p is a non-zero pointer.

>> +	/*
>> +	 * If UNDECIDED, use the match from the parent dir (recursively),
>> +	 * or fall back to NOT_MATCHED at the topmost level.
>> +	 */
>> +	for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {
>> +
>> +		for (slash = end - 1; slash >= path && *slash != '/'; slash--)

Since "slash >= path" is compared before dereferencing '*slash', this is safe.

>> +			; /* do nothing */
> 

>> +
>> +		match = path_matches_pattern_list(path, end - path,
>> +				slash >= path ? slash + 1 : path, &dtype,

This is also a safe use of 'slash'.

> slash can end up one less than path.  If path points to the first char
> of a string object this would be undefined if I read 6.5.6 of C99
> correctly.  (A pointer to the array element just after the last one is
> specified as fine as long as it's not dereferenced, but a pointer to
> the element before the first one is not mentioned and thus undefined.)

I also see the specification saying this is undefined, but I do not
understand how any reasonable compiler/runtime could do anything
other than store "path - 1" as if it was an unsigned integer. There
are a lot of references about "the array" that the pointer points to,
but these pointer arithmetic things are not actually accessing the
memory allocator.

> Do you really need the ">=" instead of ">"?

I think the only case that would be of any interest is if the path
started with a slash, which would not be a valid worktree path. I
believe we could use ">" for an abundance of caution with the
undefined nature of subtracting from a pointer, but it is non-
obvious that that is a real problem.

Thanks,
-Stolee
Matheus Tavares Oct. 26, 2021, 10:43 p.m. UTC | #4
Hi, Stolee

Thanks for the review!

On Tue, Oct 26, 2021 at 9:53 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 10/25/2021 5:07 PM, Matheus Tavares wrote:
> >
> > - Simplified the implementation by unifing the code path for cone mode and
> >   full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED
> >   for cone mode, it will always execute only one iteration of the loop and then
> >   find the final answer. There is no need to handle this case in a separate
> >   block.
>
> This was unexpected, but makes sense. While your commit message hints at the
> fact that cone mode never returns UNDECIDED, it doesn't explicitly mention
> that cone mode will exit the loop after a single iteration. It might be nice
> to make that explicit either in the commit message or the block comment before
> the loop.

Yup, will do.

> > - Inside the loop, made sure to change dtype to DT_DIR when going to parent
> >   directories. Without this, the pattern match would fail if we had a path
> >   like "a/b/c" and the pattern "b/" (with trailing slash).
>
> Very good. We typically need to detect the type for the first path given,
> but we know that all parents are directories. I've used this trick elsewhere.
>
> I see in the code that the first path is used as DT_REG. It's my fault, but
> perhaps it should be made more clear that path_in_sparse_checkout() will
> consider the given path as a file, not a directory. The current users of the
> method are using it properly, but I'm suddenly worried about another caller
> misinterpreting the generality of the problem.

Yeah, I was thinking about this too... I'm afraid there might be at
least two users of this function which already pass non-regular files
to it: builtin/add.c:refresh() and
sparse-index.c:convert_to_sparse_rec().

The first calls the function passing the user-given pathspec, which
may be a directory. But this one is easy to solve: I think we don't
even need the path_in_sparse_checkout() here as the `git add
--refresh` only work on tracked files, and the previous
matches_skip_worktree() call covers both skip_worktree and
non-skip_worktree index entries (maybe we should rename this function
to matches_sparse_ce()?)

As for convert_to_sparse_rec(), it seems to call
path_in_sparse_checkout() with the directory components of paths, so
something like "dir/". Perhaps we can make path_in_sparse_checkout()
receive a dtype argument and pass DT_UNKNOWN in this case?

Another case I haven't given much thought yet is submodules. For example:

git init sub &&
test_commit -C sub file &&
git submodule add ./sub &&
git commit -m sub &&
git sparse-checkout set 'sub/' &&
git mv sub sub2

Erroneously gives:
The following paths and/or pathspecs matched paths that exist
outside of your sparse-checkout definition, so will not be
updated in the index:
sub

But it works if we change DT_REG to DT_UNKNOWN in
path_in_sparse_checkout(). So, I'm not sure, should we use DT_UNKNOWN
for all calls?

> > @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
> >                                    struct index_state *istate,
> >                                    int require_cone_mode)
> >  {
> > -     const char *base;
> >       int dtype = DT_REG;
>
> Here is where we assume a file to start.
>
> > +     enum pattern_match_result match = UNDECIDED;
> > +     const char *end, *slash;
> >
> >       /*
> >        * We default to accepting a path if there are no patterns or
> > @@ -1516,11 +1517,23 @@ static int path_in_sparse_checkout_1(const char *path,
> >            !istate->sparse_checkout_patterns->use_cone_patterns))
> >               return 1;
> >
> > -     base = strrchr(path, '/');
> > -     return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> > -                                      &dtype,
> > -                                      istate->sparse_checkout_patterns,
> > -                                      istate) > 0;
> > +     /*
> > +      * If UNDECIDED, use the match from the parent dir (recursively),
> > +      * or fall back to NOT_MATCHED at the topmost level.
> > +      */
> > +     for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {
>
> nit: since this line is long and the sentinel is complicated, it might
> be worth splitting the different parts into their own lines:
>
>         for (end = path + strlen(path);
>              end > path && match == UNDECIDED;
>              end = slash) {

Good idea, I will split the lines like you did.
Matheus Tavares Oct. 27, 2021, midnight UTC | #5
On Tue, Oct 26, 2021 at 4:04 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 10/26/2021 12:22 PM, René Scharfe wrote:
> >
> > slash can end up one less than path.  If path points to the first char
> > of a string object this would be undefined if I read 6.5.6 of C99
> > correctly.  (A pointer to the array element just after the last one is
> > specified as fine as long as it's not dereferenced, but a pointer to
> > the element before the first one is not mentioned and thus undefined.)
>
> I also see the specification saying this is undefined, but I do not
> understand how any reasonable compiler/runtime could do anything
> other than store "path - 1" as if it was an unsigned integer. There
> are a lot of references about "the array" that the pointer points to,
> but these pointer arithmetic things are not actually accessing the
> memory allocator.
>
> > Do you really need the ">=" instead of ">"?
>
> I think the only case that would be of any interest is if the path
> started with a slash, which would not be a valid worktree path.

Agreed.

> I believe we could use ">" for an abundance of caution with the
> undefined nature of subtracting from a pointer, but it is non-
> obvious that that is a real problem.

Yeah, I think it should be fine to use ">" to be extra careful.
Derrick Stolee Oct. 27, 2021, 11:35 a.m. UTC | #6
On 10/26/2021 6:43 PM, Matheus Tavares wrote:> On Tue, Oct 26, 2021 at 9:53 AM Derrick Stolee <stolee@gmail.com> wrote:
>>> - Inside the loop, made sure to change dtype to DT_DIR when going to parent
>>>   directories. Without this, the pattern match would fail if we had a path
>>>   like "a/b/c" and the pattern "b/" (with trailing slash).
>>
>> Very good. We typically need to detect the type for the first path given,
>> but we know that all parents are directories. I've used this trick elsewhere.
>>
>> I see in the code that the first path is used as DT_REG. It's my fault, but
>> perhaps it should be made more clear that path_in_sparse_checkout() will
>> consider the given path as a file, not a directory. The current users of the
>> method are using it properly, but I'm suddenly worried about another caller
>> misinterpreting the generality of the problem.
> 
> Yeah, I was thinking about this too... I'm afraid there might be at
> least two users of this function which already pass non-regular files
> to it: builtin/add.c:refresh() and
> sparse-index.c:convert_to_sparse_rec().
> 
> The first calls the function passing the user-given pathspec, which
> may be a directory. But this one is easy to solve: I think we don't
> even need the path_in_sparse_checkout() here as the `git add
> --refresh` only work on tracked files, and the previous
> matches_skip_worktree() call covers both skip_worktree and
> non-skip_worktree index entries (maybe we should rename this function
> to matches_sparse_ce()?)
> 
> As for convert_to_sparse_rec(), it seems to call
> path_in_sparse_checkout() with the directory components of paths, so
> something like "dir/". Perhaps we can make path_in_sparse_checkout()
> receive a dtype argument and pass DT_UNKNOWN in this case?

This might be necessary. Thanks for digging into the details here.

> Another case I haven't given much thought yet is submodules. For example:
> 
> git init sub &&
> test_commit -C sub file &&
> git submodule add ./sub &&
> git commit -m sub &&
> git sparse-checkout set 'sub/' &&
> git mv sub sub2
> 
> Erroneously gives:
> The following paths and/or pathspecs matched paths that exist
> outside of your sparse-checkout definition, so will not be
> updated in the index:
> sub
> 
> But it works if we change DT_REG to DT_UNKNOWN in
> path_in_sparse_checkout(). So, I'm not sure, should we use DT_UNKNOWN
> for all calls?

This is interesting. Submodules aren't controlled by the sparse-checkout,
so we should probably check the cache entry to see if it is a gitlink
and skip the path_in_sparse_checkout() if so.

Good find!
-Stolee
Sean Christopherson Oct. 27, 2021, 5:36 p.m. UTC | #7
On Mon, Oct 25, 2021, Matheus Tavares wrote:
> - Changed the tests to use trailing slash to make sure they cover the corner
>   case described above.

...

> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> index 5b904988d4..54f3db4304 100755
> --- a/t/t3705-add-sparse-checkout.sh
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -214,4 +214,22 @@ test_expect_success 'add allows sparse entries with --sparse' '
>  	test_must_be_empty stderr
>  '
>  
> +test_expect_success 'can add files from non-sparse dir' '
> +	git sparse-checkout set w !/x y/ &&

Aha!  I re-discovered out the original problematic pattern that led me to omitting
the trailing slash.

Sparse checkout doesn't play nice with "/", e.g. "git sparse-checkout set /"
results in an empty working directory.  Using "/*/" omits top-level files, which
is expected, thus the only pattern that works for "include everything relative to
the top-level directory" is "/*".  And to workaround the bug being fixed here, it
requires adding both "/*" and "/*/" to sparse-checkout.

The docs clearly state that "/*" is the full pattern set, but on the other hand
I don't see anything that states that "/" isn't supported.  Part of the confusion
is that the asterisk is not needed for sub-directories, and a trailing slash for
sub-directories even results in different matching behavior per gitignore docs.
The omission of a trailing slash is also reinforced by the git-sparse-checkout
docs' examples.

If "/" can't be handled for some reason, it would helpful to explictly call that
out in the docs, e.g. git-sparse-checkout and/or the gitignore pattern docs.

Thanks!

> +	mkdir -p w x/y &&
> +	touch w/f x/y/f &&
> +	git add w/f x/y/f 2>stderr &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success 'refuse to add non-skip-worktree file from sparse dir' '
> +	git sparse-checkout set !/x y/ !x/y/z &&
> +	mkdir -p x/y/z &&
> +	touch x/y/z/f &&
> +	test_must_fail git add x/y/z/f 2>stderr &&
> +	echo x/y/z/f | cat sparse_error_header - sparse_hint >expect &&
> +	test_cmp expect stderr
> +
Chris Torek Oct. 27, 2021, 10:13 p.m. UTC | #8
On Tue, Oct 26, 2021 at 11:17 PM Derrick Stolee <stolee@gmail.com> wrote:
(regarding a pointer that "backs up one before the beginning
of the array" as it were)

> I also see the specification saying this is undefined, but I do not
> understand how any reasonable compiler/runtime could do anything
> other than store "path - 1" as if it was an unsigned integer. ...

This Standard C rule dates back to old segmented systems.
If you put some array A into its own segment, and use only
the offset as the "pointer", and the segment offset starts at
zero, then A[0] is at "address zero".  So the imaginary element
at A[-1] is at "address max", and a loop like:

    for (p = &A[N]; p >= &A[0]; p--)

is an infinite loop.

In practice nobody is using these architectures today, but the
restriction still exists.

Chris
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index a4306ab874..248f72e732 100644
--- a/dir.c
+++ b/dir.c
@@ -1504,8 +1504,9 @@  static int path_in_sparse_checkout_1(const char *path,
 				     struct index_state *istate,
 				     int require_cone_mode)
 {
-	const char *base;
 	int dtype = DT_REG;
+	enum pattern_match_result match = UNDECIDED;
+	const char *end, *slash;
 
 	/*
 	 * We default to accepting a path if there are no patterns or
@@ -1516,11 +1517,23 @@  static int path_in_sparse_checkout_1(const char *path,
 	     !istate->sparse_checkout_patterns->use_cone_patterns))
 		return 1;
 
-	base = strrchr(path, '/');
-	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
-					 &dtype,
-					 istate->sparse_checkout_patterns,
-					 istate) > 0;
+	/*
+	 * If UNDECIDED, use the match from the parent dir (recursively),
+	 * or fall back to NOT_MATCHED at the topmost level.
+	 */
+	for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {
+
+		for (slash = end - 1; slash >= path && *slash != '/'; slash--)
+			; /* do nothing */
+
+		match = path_matches_pattern_list(path, end - path,
+				slash >= path ? slash + 1 : path, &dtype,
+				istate->sparse_checkout_patterns, istate);
+
+		/* We are going to match the parent dir now */
+		dtype = DT_DIR;
+	}
+	return match > 0;
 }
 
 int path_in_sparse_checkout(const char *path,
diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
index ecce497a9c..034ec01091 100755
--- a/t/t3602-rm-sparse-checkout.sh
+++ b/t/t3602-rm-sparse-checkout.sh
@@ -40,14 +40,20 @@  done
 test_expect_success 'recursive rm does not remove sparse entries' '
 	git reset --hard &&
 	git sparse-checkout set sub/dir &&
-	test_must_fail git rm -r sub &&
-	git rm --sparse -r sub &&
+	git rm -r sub &&
 	git status --porcelain -uno >actual &&
 	cat >expected <<-\EOF &&
+	D  sub/dir/e
+	EOF
+	test_cmp expected actual &&
+
+	git rm --sparse -r sub &&
+	git status --porcelain -uno >actual2 &&
+	cat >expected2 <<-\EOF &&
 	D  sub/d
 	D  sub/dir/e
 	EOF
-	test_cmp expected actual
+	test_cmp expected2 actual2
 '
 
 test_expect_success 'recursive rm --sparse removes sparse entries' '
@@ -105,4 +111,29 @@  test_expect_success 'refuse to rm a non-skip-worktree path outside sparse cone'
 	test_path_is_missing b
 '
 
+test_expect_success 'can remove files from non-sparse dir' '
+	git reset --hard &&
+	git sparse-checkout disable &&
+	mkdir -p w x/y &&
+	test_commit w/f &&
+	test_commit x/y/f &&
+
+	git sparse-checkout set w !/x y/ &&
+	git rm w/f.t x/y/f.t 2>stderr &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'refuse to remove non-skip-worktree file from sparse dir' '
+	git reset --hard &&
+	git sparse-checkout disable &&
+	mkdir -p x/y/z &&
+	test_commit x/y/z/f &&
+	git sparse-checkout set !/x y/ !x/y/z &&
+
+	git update-index --no-skip-worktree x/y/z/f.t &&
+	test_must_fail git rm x/y/z/f.t 2>stderr &&
+	echo x/y/z/f.t | cat sparse_error_header - sparse_hint >expect &&
+	test_cmp expect stderr
+'
+
 test_done
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 5b904988d4..54f3db4304 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -214,4 +214,22 @@  test_expect_success 'add allows sparse entries with --sparse' '
 	test_must_be_empty stderr
 '
 
+test_expect_success 'can add files from non-sparse dir' '
+	git sparse-checkout set w !/x y/ &&
+	mkdir -p w x/y &&
+	touch w/f x/y/f &&
+	git add w/f x/y/f 2>stderr &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'refuse to add non-skip-worktree file from sparse dir' '
+	git sparse-checkout set !/x y/ !x/y/z &&
+	mkdir -p x/y/z &&
+	touch x/y/z/f &&
+	test_must_fail git add x/y/z/f 2>stderr &&
+	echo x/y/z/f | cat sparse_error_header - sparse_hint >expect &&
+	test_cmp expect stderr
+
+'
+
 test_done
diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh
index 545748949a..1d3d2aca21 100755
--- a/t/t7002-mv-sparse-checkout.sh
+++ b/t/t7002-mv-sparse-checkout.sh
@@ -143,8 +143,6 @@  test_expect_success 'recursive mv refuses to move (possible) sparse' '
 	cat >>expect <<-\EOF &&
 	sub/d
 	sub2/d
-	sub/dir/e
-	sub2/dir/e
 	sub/dir2/e
 	sub2/dir2/e
 	EOF
@@ -186,4 +184,26 @@  test_expect_success 'recursive mv refuses to move sparse' '
 	git reset --hard HEAD~1
 '
 
+test_expect_success 'can move files to non-sparse dir' '
+	git reset --hard &&
+	git sparse-checkout init --no-cone &&
+	git sparse-checkout set a b c w !/x y/ &&
+	mkdir -p w x/y &&
+
+	git mv a w/new-a 2>stderr &&
+	git mv b x/y/new-b 2>stderr &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'refuse to move file to non-skip-worktree sparse path' '
+	git reset --hard &&
+	git sparse-checkout init --no-cone &&
+	git sparse-checkout set a !/x y/ !x/y/z &&
+	mkdir -p x/y/z &&
+
+	test_must_fail git mv a x/y/z/new-a 2>stderr &&
+	echo x/y/z/new-a | cat sparse_error_header - sparse_hint >expect &&
+	test_cmp expect stderr
+'
+
 test_done