diff mbox series

[4/4] object-name: diagnose trees in index properly

Message ID 99c09ccc2406e4f54c620bd7fb2d1205386e23a6.1649349442.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse index integration with 'git show' | expand

Commit Message

Derrick Stolee April 7, 2022, 4:37 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

When running 'git show :<path>' where '<path>' is a directory, then
there is a subtle difference between a full checkout and a sparse
checkout. The error message from diagnose_invalid_index_path() reports
whether the path is on disk or not. The full checkout will have the
directory on disk, but the path will not be in the index. The sparse
checokut could have the directory not exist, specifically when that
directory is outside of the sparse-checkout cone.

In the case of a sparse index, we have yet another state: the path can
be a sparse directory in the index. In this case, the error message from
diagnose_invalid_index_path() would erroneously say "path '<path>' is in
the index, but not at stage 0", which is false.

Add special casing around sparse directory entries so we get to the
correct error message. This requires two checks in order to get parity
with the normal sparse-checkout case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 object-name.c                            |  6 ++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++++--
 2 files changed, 15 insertions(+), 4 deletions(-)

Comments

Philip Oakley April 7, 2022, 8:46 p.m. UTC | #1
spelling nit

On 07/04/2022 17:37, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> When running 'git show :<path>' where '<path>' is a directory, then
> there is a subtle difference between a full checkout and a sparse
> checkout. The error message from diagnose_invalid_index_path() reports
> whether the path is on disk or not. The full checkout will have the
> directory on disk, but the path will not be in the index. The sparse
> checokut could have the directory not exist, specifically when that

s/checokut/checkout/
> directory is outside of the sparse-checkout cone.
[...]
--
Philip
Junio C Hamano April 12, 2022, 6:32 a.m. UTC | #2
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When running 'git show :<path>' where '<path>' is a directory, then
> there is a subtle difference between a full checkout and a sparse
> checkout. The error message from diagnose_invalid_index_path() reports
> whether the path is on disk or not. The full checkout will have the
> directory on disk, but the path will not be in the index. The sparse
> checokut could have the directory not exist, specifically when that
> directory is outside of the sparse-checkout cone.
>
> In the case of a sparse index, we have yet another state: the path can
> be a sparse directory in the index. In this case, the error message from
> diagnose_invalid_index_path() would erroneously say "path '<path>' is in
> the index, but not at stage 0", which is false.
>
> Add special casing around sparse directory entries so we get to the
> correct error message. This requires two checks in order to get parity
> with the normal sparse-checkout case.

That all makes sense, but let me ask a more basic (and possibly
stupid) question and a half. 

 - When running 'git cmd :<path>' where '<path>' is a directory,
   even before the "sparse-index" or "sparse-checkout" world, I
   sometimes wished that ":<path>" resolved to the object name of
   the tree recorded in the cache-tree, if we have one.  If we are
   living in the "sparse-index" world, and the paths underneath
   '<path>' are not in the index (because we are not interested in
   them), we should be able answer "git rev-parse :<path>" with the
   name of the tree object.  It is a "new feature" regardless of
   sparse-index, but I wonder if it is sensible to add more code to
   engrave in stone that we would not support it and keep treating
   the index as if it is a flat list of paths to blobs (and commits,
   for submodules)?

 - When running 'git cmd :<path>' where '<path>' is *not* a
   directory but is not in the index because of "sparse-index"
   (i.e. a leading directory of '<path>' is represented as a tree in
   the index), should ":<path>" index expand the "tree" index entry
   on-demand, so that <path> has its own entry in the index, as if
   no "sparse-index" is in effect?  The tests I saw in the series
   were mostly asserted with test_sparse_match, not test_all_match,
   so I couldn't tell what the expectations are.

 - More generally, if <leading> path is represented as a
   "sparse-dir" entry, should ":<leading>/<lower>" cause the index
   entry for <leading> path to be expanded on-demand?  I am guessing
   that the answer is yes, as we wouldn't be able to even tell if
   such a path <leading>/<lower> would exist in the index if the
   index were converted to full upfront.

Thanks.
Derrick Stolee April 12, 2022, 1:52 p.m. UTC | #3
On 4/12/2022 2:32 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When running 'git show :<path>' where '<path>' is a directory, then
>> there is a subtle difference between a full checkout and a sparse
>> checkout. The error message from diagnose_invalid_index_path() reports
>> whether the path is on disk or not. The full checkout will have the
>> directory on disk, but the path will not be in the index. The sparse
>> checokut could have the directory not exist, specifically when that
>> directory is outside of the sparse-checkout cone.
>>
>> In the case of a sparse index, we have yet another state: the path can
>> be a sparse directory in the index. In this case, the error message from
>> diagnose_invalid_index_path() would erroneously say "path '<path>' is in
>> the index, but not at stage 0", which is false.
>>
>> Add special casing around sparse directory entries so we get to the
>> correct error message. This requires two checks in order to get parity
>> with the normal sparse-checkout case.
> 
> That all makes sense, but let me ask a more basic (and possibly
> stupid) question and a half.

All of these questions are good thoughts. 

>  - When running 'git cmd :<path>' where '<path>' is a directory,
>    even before the "sparse-index" or "sparse-checkout" world, I
>    sometimes wished that ":<path>" resolved to the object name of
>    the tree recorded in the cache-tree, if we have one.  If we are
>    living in the "sparse-index" world, and the paths underneath
>    '<path>' are not in the index (because we are not interested in
>    them), we should be able answer "git rev-parse :<path>" with the
>    name of the tree object.  It is a "new feature" regardless of
>    sparse-index, but...

I also considered "what if we made this 'just work' for trees in the
index?" The issue I found is that we might need to generate the cache-tree
extension when it does not exist (or is not up-to-date).

For this series, I erred on matching existing behavior before
adding such a feature. It is unfortunate that any work to show
trees this way would essentially revert patches 3 and 4 of the
current series.

>    ...I wonder if it is sensible to add more code to
>    engrave in stone that we would not support it and keep treating
>    the index as if it is a flat list of paths to blobs (and commits,
>    for submodules)?

I don't consider any code change to be "engraved in stone", including
these tests that I am adding. Tests document expected behavior. If we
decide to change that expected behavior, then we can change the tests.
Better to have tests that change when we are making a choice to change
behavior than to not have that evidence of a behavior change at all.

Unfortunately, we did not previously have tests for "git show :dir",
so this is the first time we are documenting the behavior.

>  - When running 'git cmd :<path>' where '<path>' is *not* a
>    directory but is not in the index because of "sparse-index"
>    (i.e. a leading directory of '<path>' is represented as a tree in
>    the index), should ":<path>" index expand the "tree" index entry
>    on-demand, so that <path> has its own entry in the index, as if
>    no "sparse-index" is in effect?

There was one case of test_sparse_match that could have been a
test_all_match: "git show :folder1/a". Since folder1/a is discovered in
the index, Git doesn't check the worktree for the existence of the file to
report any different behavior. Changing the test works without any code
change, so I'll update that in v2.

>    The tests I saw in the series
>    were mostly asserted with test_sparse_match, not test_all_match,
>    so I couldn't tell what the expectations are.

The reason for test_sparse_match instead of test_all_match is because the
error messages change depending on the existence of the path in the
worktree. For example "git show :folder1/" in the test script would have
this output difference:

+ diff -u full-checkout-err sparse-checkout-err
--- full-checkout-err   2022-04-12 13:35:51.430805689 +0000
+++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000
@@ -1 +1 @@
-fatal: path 'folder1/' exists on disk, but not in the index
+fatal: path 'folder1/' does not exist (neither on disk nor in the index)

>  - More generally, if <leading> path is represented as a
>    "sparse-dir" entry, should ":<leading>/<lower>" cause the index
>    entry for <leading> path to be expanded on-demand?  I am guessing
>    that the answer is yes, as we wouldn't be able to even tell if
>    such a path <leading>/<lower> would exist in the index if the
>    index were converted to full upfront.

This is what happens, but it happens inside index_name_pos(), so there
isn't any change necessary in the builtin to handle this. Tests such as
"git show :folder1/a" demonstrate this behavior.

Now, there could be a _faster_ way to do this if we did not depend
directly on index_name_pos(). Using index_name_pos() will trigger the
index expansion, which is necessary for some compatibility with the sparse
index.

In this case, we don't really care what _position_ the cache entry is in,
we just want to know if the path exists in the index or not.

For that, we could extract a helper method out of the implementation of
index_name_pos() to be something like index_name_exists(). The common code
between the two implementations is to find a position in the index that
either corresponds to the exact path _or_ is a sparse directory (or the
path definitely doesn't exist in the index). From that point,
index_name_exists() could do a tree walk to find the path inside the tree.

All of this is _possible_, but I also understand that running "git show
:<path>" is not common for users to run directly. Instead, I've seen this
used by IDEs to find the staged version of a file. This is not likely to
be run for paths outside of the sparse-checkout cone.

I think that this is not the only place that could benefit from an
index_name_exists() method, but I'm not ready to focus on improving
performance for queries outside of the sparse-checkout cone.

Thanks,
-Stolee
Junio C Hamano April 12, 2022, 3:45 p.m. UTC | #4
Derrick Stolee <derrickstolee@github.com> writes:

> The reason for test_sparse_match instead of test_all_match is because the
> error messages change depending on the existence of the path in the
> worktree. For example "git show :folder1/" in the test script would have
> this output difference:
>
> + diff -u full-checkout-err sparse-checkout-err
> --- full-checkout-err   2022-04-12 13:35:51.430805689 +0000
> +++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000
> @@ -1 +1 @@
> -fatal: path 'folder1/' exists on disk, but not in the index
> +fatal: path 'folder1/' does not exist (neither on disk nor in the index)

Ah, because naturally folder1/ would not be on disk if it is outside
the sparse cones?  Makes sense.

Thanks.
diff mbox series

Patch

diff --git a/object-name.c b/object-name.c
index 2dc5d2549b8..4d2746574cd 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1832,7 +1832,8 @@  static void diagnose_invalid_index_path(struct repository *r,
 		pos = -pos - 1;
 	if (pos < istate->cache_nr) {
 		ce = istate->cache[pos];
-		if (ce_namelen(ce) == namelen &&
+		if (!S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) == namelen &&
 		    !memcmp(ce->name, filename, namelen))
 			die(_("path '%s' is in the index, but not at stage %d\n"
 			    "hint: Did you mean ':%d:%s'?"),
@@ -1848,7 +1849,8 @@  static void diagnose_invalid_index_path(struct repository *r,
 		pos = -pos - 1;
 	if (pos < istate->cache_nr) {
 		ce = istate->cache[pos];
-		if (ce_namelen(ce) == fullname.len &&
+		if (!S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) == fullname.len &&
 		    !memcmp(ce->name, fullname.buf, fullname.len))
 			die(_("path '%s' is in the index, but not '%s'\n"
 			    "hint: Did you mean ':%d:%s' aka ':%d:./%s'?"),
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 9d32361110d..921cafb950a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1164,8 +1164,17 @@  test_expect_success 'show (cached blobs/trees)' '
 	test_must_fail git -C full-checkout show :folder1/ &&
 	test_must_fail git -C sparse-checkout show :folder1/ &&
 
-	test_must_fail git -C sparse-index show :folder1/ 2>err &&
-	grep "is in the index, but not at stage 0" err
+	test_sparse_match test_must_fail git show :folder1/ &&
+
+	# Change the sparse cone for an extra case:
+	run_on_sparse git sparse-checkout set deep/deeper1 &&
+
+	# deep/deeper2 is a sparse directory in the sparse index.
+	test_sparse_match test_must_fail git show :deep/deeper2/ &&
+
+	# deep/deeper2/deepest is not in the sparse index, but
+	# will trigger an index expansion.
+	test_sparse_match test_must_fail git show :deep/deeper2/deepest/
 '
 
 test_expect_success 'submodule handling' '