diff mbox series

[2/2] unpack-trees: use traverse_path instead of name

Message ID c91421996562ecedab18f8af4ed8a02896584540.1638586534.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse Index: fix a checkout bug with deep sparse-checkout patterns | expand

Commit Message

Derrick Stolee Dec. 4, 2021, 2:55 a.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The sparse_dir_matches_path() method compares a cache entry that is a
sparse directory entry against a 'struct traverse_info *info' and a
'struct name_entry *p' to see if the cache entry has exactly the right
name for those other inputs.

This method was introduced in 523506d (unpack-trees: unpack sparse
directory entries, 2021-07-14), but included a significant mistake. The
path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while
'info->traverse_path' stores the full path from root, this method does
not work when 'info' is in a subdirectory of a directory. Replacing the
right strings and their corresponding lengths make the method work
properly.

The previous change included a failing test that exposes this issue.
That test now passes. The critical detail is that as we go deep into
unpack_trees(), the logic for merging a sparse directory entry with a
tree entry during 'git checkout' relies on this
sparse_dir_matches_path() in order to avoid calling
traverse_trees_recursive() during unpack_callback() in this hunk:

	if (!is_sparse_directory_entry(src[0], names, info) &&
	    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
					    names, info) < 0) {
		return -1;
	}

For deep paths, the short-circuit never occurred and
traverse_trees_recursive() was being called incorrectly and that was
causing other strange issues. Specifically, the error message from the
now-passing test previously included this:

      error: Your local changes to the following files would be overwritten by checkout:
              deep/deeper1/deepest2/a
              deep/deeper1/deepest3/a
      Please commit your changes or stash them before you switch branches.
      Aborting

These messages occurred because the 'current' cache entry in
twoway_merge() was showing as NULL because the index did not contain
entries for the paths contained within the sparse directory entries. We
instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
entry in the target tree. This led to reject_merge() listing these
paths.

Now that sparse_dir_matches_path() works the same for deep paths as it
does for shallow depths, the rest of the logic kicks in to properly
handle modifying the sparse directory entries as designed.

Reported-by: Gustave Granroth <gus.gran@gmail.com>
Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  2 +-
 unpack-trees.c                           | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

Comments

Elijah Newren Dec. 4, 2021, 5:42 a.m. UTC | #1
On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The sparse_dir_matches_path() method compares a cache entry that is a
> sparse directory entry against a 'struct traverse_info *info' and a
> 'struct name_entry *p' to see if the cache entry has exactly the right
> name for those other inputs.
>
> This method was introduced in 523506d (unpack-trees: unpack sparse
> directory entries, 2021-07-14), but included a significant mistake. The
> path comparisons used 'info->name' instead of 'info->traverse_path'.
> Since 'info->name' only stores a single tree entry name while
> 'info->traverse_path' stores the full path from root, this method does
> not work when 'info' is in a subdirectory of a directory. Replacing the
> right strings and their corresponding lengths make the method work
> properly.
>
> The previous change included a failing test that exposes this issue.
> That test now passes. The critical detail is that as we go deep into
> unpack_trees(), the logic for merging a sparse directory entry with a
> tree entry during 'git checkout' relies on this
> sparse_dir_matches_path() in order to avoid calling
> traverse_trees_recursive() during unpack_callback() in this hunk:
>
>         if (!is_sparse_directory_entry(src[0], names, info) &&
>             traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                             names, info) < 0) {
>                 return -1;
>         }
>
> For deep paths, the short-circuit never occurred and
> traverse_trees_recursive() was being called incorrectly and that was
> causing other strange issues. Specifically, the error message from the
> now-passing test previously included this:
>
>       error: Your local changes to the following files would be overwritten by checkout:
>               deep/deeper1/deepest2/a
>               deep/deeper1/deepest3/a
>       Please commit your changes or stash them before you switch branches.
>       Aborting
>
> These messages occurred because the 'current' cache entry in
> twoway_merge() was showing as NULL because the index did not contain
> entries for the paths contained within the sparse directory entries. We
> instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
> entry in the target tree. This led to reject_merge() listing these
> paths.
>
> Now that sparse_dir_matches_path() works the same for deep paths as it
> does for shallow depths, the rest of the logic kicks in to properly
> handle modifying the sparse directory entries as designed.

Eek, sorry for not catching this in my earlier review.  Thanks for the
detailed explanation; well analyzed.

>
> Reported-by: Gustave Granroth <gus.gran@gmail.com>
> Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh |  2 +-
>  unpack-trees.c                           | 10 +++++-----
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e6aef40e9b3..f04a02c6b20 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -307,7 +307,7 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> -test_expect_failure 'deep changes during checkout' '
> +test_expect_success 'deep changes during checkout' '
>         init_repos &&
>
>         test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 89ca95ce90b..7381c275768 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
>         assert(S_ISSPARSEDIR(ce->ce_mode));
>         assert(ce->name[ce->ce_namelen - 1] == '/');
>
> -       if (info->namelen)
> -               return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
> -                      ce->name[info->namelen] == '/' &&
> -                      !strncmp(ce->name, info->name, info->namelen) &&
> -                      !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
> +       if (info->pathlen)
> +               return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
> +                      ce->name[info->pathlen - 1] == '/' &&
> +                      !strncmp(ce->name, info->traverse_path, info->pathlen) &&
> +                      !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
>         return ce->ce_namelen == p->pathlen + 1 &&
>                !strncmp(ce->name, p->path, p->pathlen);
>  }
> --

The comment at the beginning of this function (not shown in this
patch) is now stale and misleading; it should be corrected too.
Derrick Stolee Dec. 6, 2021, 1:59 p.m. UTC | #2
On 12/4/2021 12:42 AM, Elijah Newren wrote:
> On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> @@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
>>         assert(S_ISSPARSEDIR(ce->ce_mode));
>>         assert(ce->name[ce->ce_namelen - 1] == '/');
>>
>> -       if (info->namelen)
>> -               return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
>> -                      ce->name[info->namelen] == '/' &&
>> -                      !strncmp(ce->name, info->name, info->namelen) &&
>> -                      !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
>> +       if (info->pathlen)
>> +               return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
>> +                      ce->name[info->pathlen - 1] == '/' &&
>> +                      !strncmp(ce->name, info->traverse_path, info->pathlen) &&
>> +                      !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
>>         return ce->ce_namelen == p->pathlen + 1 &&
>>                !strncmp(ce->name, p->path, p->pathlen);
>>  }
>> --
> 
> The comment at the beginning of this function (not shown in this
> patch) is now stale and misleading; it should be corrected too.
 
Will do! Thanks for catching that.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e6aef40e9b3..f04a02c6b20 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -307,7 +307,7 @@  test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout -
 '
 
-test_expect_failure 'deep changes during checkout' '
+test_expect_success 'deep changes during checkout' '
 	init_repos &&
 
 	test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 89ca95ce90b..7381c275768 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1243,11 +1243,11 @@  static int sparse_dir_matches_path(const struct cache_entry *ce,
 	assert(S_ISSPARSEDIR(ce->ce_mode));
 	assert(ce->name[ce->ce_namelen - 1] == '/');
 
-	if (info->namelen)
-		return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
-		       ce->name[info->namelen] == '/' &&
-		       !strncmp(ce->name, info->name, info->namelen) &&
-		       !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
+	if (info->pathlen)
+		return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
+		       ce->name[info->pathlen - 1] == '/' &&
+		       !strncmp(ce->name, info->traverse_path, info->pathlen) &&
+		       !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
 	return ce->ce_namelen == p->pathlen + 1 &&
 	       !strncmp(ce->name, p->path, p->pathlen);
 }