Message ID | 99c09ccc2406e4f54c620bd7fb2d1205386e23a6.1649349442.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse index integration with 'git show' | expand |
spelling nit On 07/04/2022 17:37, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > When running 'git show :<path>' where '<path>' is a directory, then > there is a subtle difference between a full checkout and a sparse > checkout. The error message from diagnose_invalid_index_path() reports > whether the path is on disk or not. The full checkout will have the > directory on disk, but the path will not be in the index. The sparse > checokut could have the directory not exist, specifically when that s/checokut/checkout/ > directory is outside of the sparse-checkout cone. [...] -- Philip
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <derrickstolee@github.com> > > When running 'git show :<path>' where '<path>' is a directory, then > there is a subtle difference between a full checkout and a sparse > checkout. The error message from diagnose_invalid_index_path() reports > whether the path is on disk or not. The full checkout will have the > directory on disk, but the path will not be in the index. The sparse > checokut could have the directory not exist, specifically when that > directory is outside of the sparse-checkout cone. > > In the case of a sparse index, we have yet another state: the path can > be a sparse directory in the index. In this case, the error message from > diagnose_invalid_index_path() would erroneously say "path '<path>' is in > the index, but not at stage 0", which is false. > > Add special casing around sparse directory entries so we get to the > correct error message. This requires two checks in order to get parity > with the normal sparse-checkout case. That all makes sense, but let me ask a more basic (and possibly stupid) question and a half. - When running 'git cmd :<path>' where '<path>' is a directory, even before the "sparse-index" or "sparse-checkout" world, I sometimes wished that ":<path>" resolved to the object name of the tree recorded in the cache-tree, if we have one. If we are living in the "sparse-index" world, and the paths underneath '<path>' are not in the index (because we are not interested in them), we should be able answer "git rev-parse :<path>" with the name of the tree object. It is a "new feature" regardless of sparse-index, but I wonder if it is sensible to add more code to engrave in stone that we would not support it and keep treating the index as if it is a flat list of paths to blobs (and commits, for submodules)? - When running 'git cmd :<path>' where '<path>' is *not* a directory but is not in the index because of "sparse-index" (i.e. a leading directory of '<path>' is represented as a tree in the index), should ":<path>" index expand the "tree" index entry on-demand, so that <path> has its own entry in the index, as if no "sparse-index" is in effect? The tests I saw in the series were mostly asserted with test_sparse_match, not test_all_match, so I couldn't tell what the expectations are. - More generally, if <leading> path is represented as a "sparse-dir" entry, should ":<leading>/<lower>" cause the index entry for <leading> path to be expanded on-demand? I am guessing that the answer is yes, as we wouldn't be able to even tell if such a path <leading>/<lower> would exist in the index if the index were converted to full upfront. Thanks.
On 4/12/2022 2:32 AM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <derrickstolee@github.com> >> >> When running 'git show :<path>' where '<path>' is a directory, then >> there is a subtle difference between a full checkout and a sparse >> checkout. The error message from diagnose_invalid_index_path() reports >> whether the path is on disk or not. The full checkout will have the >> directory on disk, but the path will not be in the index. The sparse >> checokut could have the directory not exist, specifically when that >> directory is outside of the sparse-checkout cone. >> >> In the case of a sparse index, we have yet another state: the path can >> be a sparse directory in the index. In this case, the error message from >> diagnose_invalid_index_path() would erroneously say "path '<path>' is in >> the index, but not at stage 0", which is false. >> >> Add special casing around sparse directory entries so we get to the >> correct error message. This requires two checks in order to get parity >> with the normal sparse-checkout case. > > That all makes sense, but let me ask a more basic (and possibly > stupid) question and a half. All of these questions are good thoughts. > - When running 'git cmd :<path>' where '<path>' is a directory, > even before the "sparse-index" or "sparse-checkout" world, I > sometimes wished that ":<path>" resolved to the object name of > the tree recorded in the cache-tree, if we have one. If we are > living in the "sparse-index" world, and the paths underneath > '<path>' are not in the index (because we are not interested in > them), we should be able answer "git rev-parse :<path>" with the > name of the tree object. It is a "new feature" regardless of > sparse-index, but... I also considered "what if we made this 'just work' for trees in the index?" The issue I found is that we might need to generate the cache-tree extension when it does not exist (or is not up-to-date). For this series, I erred on matching existing behavior before adding such a feature. It is unfortunate that any work to show trees this way would essentially revert patches 3 and 4 of the current series. > ...I wonder if it is sensible to add more code to > engrave in stone that we would not support it and keep treating > the index as if it is a flat list of paths to blobs (and commits, > for submodules)? I don't consider any code change to be "engraved in stone", including these tests that I am adding. Tests document expected behavior. If we decide to change that expected behavior, then we can change the tests. Better to have tests that change when we are making a choice to change behavior than to not have that evidence of a behavior change at all. Unfortunately, we did not previously have tests for "git show :dir", so this is the first time we are documenting the behavior. > - When running 'git cmd :<path>' where '<path>' is *not* a > directory but is not in the index because of "sparse-index" > (i.e. a leading directory of '<path>' is represented as a tree in > the index), should ":<path>" index expand the "tree" index entry > on-demand, so that <path> has its own entry in the index, as if > no "sparse-index" is in effect? There was one case of test_sparse_match that could have been a test_all_match: "git show :folder1/a". Since folder1/a is discovered in the index, Git doesn't check the worktree for the existence of the file to report any different behavior. Changing the test works without any code change, so I'll update that in v2. > The tests I saw in the series > were mostly asserted with test_sparse_match, not test_all_match, > so I couldn't tell what the expectations are. The reason for test_sparse_match instead of test_all_match is because the error messages change depending on the existence of the path in the worktree. For example "git show :folder1/" in the test script would have this output difference: + diff -u full-checkout-err sparse-checkout-err --- full-checkout-err 2022-04-12 13:35:51.430805689 +0000 +++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000 @@ -1 +1 @@ -fatal: path 'folder1/' exists on disk, but not in the index +fatal: path 'folder1/' does not exist (neither on disk nor in the index) > - More generally, if <leading> path is represented as a > "sparse-dir" entry, should ":<leading>/<lower>" cause the index > entry for <leading> path to be expanded on-demand? I am guessing > that the answer is yes, as we wouldn't be able to even tell if > such a path <leading>/<lower> would exist in the index if the > index were converted to full upfront. This is what happens, but it happens inside index_name_pos(), so there isn't any change necessary in the builtin to handle this. Tests such as "git show :folder1/a" demonstrate this behavior. Now, there could be a _faster_ way to do this if we did not depend directly on index_name_pos(). Using index_name_pos() will trigger the index expansion, which is necessary for some compatibility with the sparse index. In this case, we don't really care what _position_ the cache entry is in, we just want to know if the path exists in the index or not. For that, we could extract a helper method out of the implementation of index_name_pos() to be something like index_name_exists(). The common code between the two implementations is to find a position in the index that either corresponds to the exact path _or_ is a sparse directory (or the path definitely doesn't exist in the index). From that point, index_name_exists() could do a tree walk to find the path inside the tree. All of this is _possible_, but I also understand that running "git show :<path>" is not common for users to run directly. Instead, I've seen this used by IDEs to find the staged version of a file. This is not likely to be run for paths outside of the sparse-checkout cone. I think that this is not the only place that could benefit from an index_name_exists() method, but I'm not ready to focus on improving performance for queries outside of the sparse-checkout cone. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > The reason for test_sparse_match instead of test_all_match is because the > error messages change depending on the existence of the path in the > worktree. For example "git show :folder1/" in the test script would have > this output difference: > > + diff -u full-checkout-err sparse-checkout-err > --- full-checkout-err 2022-04-12 13:35:51.430805689 +0000 > +++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000 > @@ -1 +1 @@ > -fatal: path 'folder1/' exists on disk, but not in the index > +fatal: path 'folder1/' does not exist (neither on disk nor in the index) Ah, because naturally folder1/ would not be on disk if it is outside the sparse cones? Makes sense. Thanks.
diff --git a/object-name.c b/object-name.c index 2dc5d2549b8..4d2746574cd 100644 --- a/object-name.c +++ b/object-name.c @@ -1832,7 +1832,8 @@ static void diagnose_invalid_index_path(struct repository *r, pos = -pos - 1; if (pos < istate->cache_nr) { ce = istate->cache[pos]; - if (ce_namelen(ce) == namelen && + if (!S_ISSPARSEDIR(ce->ce_mode) && + ce_namelen(ce) == namelen && !memcmp(ce->name, filename, namelen)) die(_("path '%s' is in the index, but not at stage %d\n" "hint: Did you mean ':%d:%s'?"), @@ -1848,7 +1849,8 @@ static void diagnose_invalid_index_path(struct repository *r, pos = -pos - 1; if (pos < istate->cache_nr) { ce = istate->cache[pos]; - if (ce_namelen(ce) == fullname.len && + if (!S_ISSPARSEDIR(ce->ce_mode) && + ce_namelen(ce) == fullname.len && !memcmp(ce->name, fullname.buf, fullname.len)) die(_("path '%s' is in the index, but not '%s'\n" "hint: Did you mean ':%d:%s' aka ':%d:./%s'?"), diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 9d32361110d..921cafb950a 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -1164,8 +1164,17 @@ test_expect_success 'show (cached blobs/trees)' ' test_must_fail git -C full-checkout show :folder1/ && test_must_fail git -C sparse-checkout show :folder1/ && - test_must_fail git -C sparse-index show :folder1/ 2>err && - grep "is in the index, but not at stage 0" err + test_sparse_match test_must_fail git show :folder1/ && + + # Change the sparse cone for an extra case: + run_on_sparse git sparse-checkout set deep/deeper1 && + + # deep/deeper2 is a sparse directory in the sparse index. + test_sparse_match test_must_fail git show :deep/deeper2/ && + + # deep/deeper2/deepest is not in the sparse index, but + # will trigger an index expansion. + test_sparse_match test_must_fail git show :deep/deeper2/deepest/ ' test_expect_success 'submodule handling' '