mbox series

[v2,0/5] Sparse index integration with 'git show'

Message ID pull.1207.v2.git.1651005800.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse index integration with 'git show' | expand

Message

Philippe Blain via GitGitGadget April 26, 2022, 8:43 p.m. UTC
This continues our sequence of integrating builtins with the sparse index.

'git show' is relatively simple to get working in a way that doesn't fail
when it would previously succeed, but there are some subtleties when the
user passes a directory path using the ":" syntax to get the path out of the
index. If that path happens to be a sparse directory entry, we suddenly
start succeeding and printing the tree information!

Since this behavior can change depending on the sparse checkout definition
and the state of index entries within that directory, this new behavior
would be more likely to confuse users than help them.

This ":" syntax is shared by other commands like "git rev-parse", but we are
not adding those integrations at this point.

Some background: as we shipped our sparse index integrations in the
microsoft/git fork and measured performance of real users, we noticed that
'git show' was a frequently-used command that went from ~0.5 seconds to ~4
seconds in monorepo situations. This was unexpected, because we didn't know
about the ":" syntax. Further, it seemed that some third-party tools were
triggering this behavior as a frequent way to check on staged content. That
motivated quickly shipping this integration. Performance improved to ~0.1
seconds because of the reduced index size. While inspecting our rebase of
microsoft/git commits onto the 2.36.0 release candidates, I noticed this
integration would be a simpler example to demonstrate how sparse index
integrations should go when the behavior is not too complicated.

Here is an outline of this series:

 * Patch 1: Add more tests around 'git show :' in t1092. These tests are
   only to establish the existing differences between the full and
   sparse-checkout cases, since 'git show' is still protected by
   'command_requires_full_index'.

 * Patch 2: Make 'git show' stop expanding the index by default. Make note
   of this behavior change in the tests.

 * Patches 3-4: Make the subtle changes to object-name.c that help us reject
   sparse directories (patch 3) and print the correct error message (patch
   4).

 * Patch 5: Now that the common index-parsing code is updated, do the
   minimum change to 'git rev-parse' to avoid expanding the index to parse
   index entries for a sparse index.

Patches 2-4 could realistically be squashed into a single commit, but I
thought it might be instructive to show these individual steps, especially
as an example for our GSoC project.

I know that Victoria intends to submit her 'git stash' integration soon, and
this provides a way to test if our split of test changes in t1092 are easy
to merge without conflict. If that is successful, then I will likely submit
my integration with the 'sparse-checkout' builtin after this series is
complete. (UPDATE: we inserted a test in the same location of t1092, but
otherwise there are no textual or semantic conflicts.)


Updates in v2
=============

 * The test comment in patch 2 is updated.
 * A commit message typo in patch 4 is fixed.
 * Patch 4 simplified the behavior, but the previous version didn't clean up
   a test comment about that. It now cleans up the test to be simpler.
 * Patch 5 includes an integration with 'git rev-parse'.
 * The cover letter is expanded with more context.
 * The only conflict with Victoria's new 'git stash' patch series is that we
   both added a test in the same position of t1092. Including both new tests
   is the right way to resolve the conflict. Order does not matter.

[1]
https://lore.kernel.org/git/pull.1171.git.1650908957.gitgitgadget@gmail.com/

Thanks, -Stolee

Derrick Stolee (5):
  t1092: add compatibility tests for 'git show'
  show: integrate with the sparse index
  object-name: reject trees found in the index
  object-name: diagnose trees in index properly
  rev-parse: integrate with sparse index

 builtin/log.c                            |  5 ++++
 builtin/rev-parse.c                      |  3 ++
 object-name.c                            | 25 ++++++++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 3 deletions(-)


base-commit: 07330a41d66a2c9589b585a3a24ecdcf19994f19
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1207%2Fderrickstolee%2Fsparse-index%2Fgit-show-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1207/derrickstolee/sparse-index/git-show-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1207

Range-diff vs v1:

 1:  8c2fdb5a4fc = 1:  8c2fdb5a4fc t1092: add compatibility tests for 'git show'
 2:  27ab853a9b4 ! 2:  2e9d47ab09b show: integrate with the sparse index
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blob
      -	# does not work as implemented. The error message is
      -	# different for a full checkout and a sparse checkout
      -	# when the directory is outside of the cone.
     -+	# changes depending on the existence of a sparse index.
     ++	# had different behavior depending on the existence
     ++	# of a sparse index.
       	test_all_match test_must_fail git show :deep/ &&
       	test_must_fail git -C full-checkout show :folder1/ &&
      -	test_sparse_match test_must_fail git show :folder1/
 3:  f5da5327673 = 3:  5a7561637f0 object-name: reject trees found in the index
 4:  99c09ccc240 ! 4:  b730457fccc object-name: diagnose trees in index properly
     @@ Commit message
          checkout. The error message from diagnose_invalid_index_path() reports
          whether the path is on disk or not. The full checkout will have the
          directory on disk, but the path will not be in the index. The sparse
     -    checokut could have the directory not exist, specifically when that
     +    checkout could have the directory not exist, specifically when that
          directory is outside of the sparse-checkout cone.
      
          In the case of a sparse index, we have yet another state: the path can
     @@ object-name.c: static void diagnose_invalid_index_path(struct repository *r,
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blobs/trees)' '
     + 	test_all_match git show :deep/a &&
     + 	test_sparse_match git show :folder1/a &&
     + 
     +-	# Asking "git show" for directories in the index
     +-	# had different behavior depending on the existence
     +-	# of a sparse index.
     ++	# The error message differs depending on whether
     ++	# the directory exists in the worktree.
     + 	test_all_match test_must_fail git show :deep/ &&
       	test_must_fail git -C full-checkout show :folder1/ &&
     - 	test_must_fail git -C sparse-checkout show :folder1/ &&
     +-	test_must_fail git -C sparse-checkout show :folder1/ &&
     ++	test_sparse_match test_must_fail git show :folder1/ &&
       
      -	test_must_fail git -C sparse-index show :folder1/ 2>err &&
      -	grep "is in the index, but not at stage 0" err
     -+	test_sparse_match test_must_fail git show :folder1/ &&
     -+
      +	# Change the sparse cone for an extra case:
      +	run_on_sparse git sparse-checkout set deep/deeper1 &&
      +
 -:  ----------- > 5:  69efe637a18 rev-parse: integrate with sparse index

Comments

Junio C Hamano April 26, 2022, 8:55 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This ":" syntax is shared by other commands like "git rev-parse", but we are
> not adding those integrations at this point.

This has been the most curious thing since the initial round.  The
changes in the series are mostly about the code that parses the
":<path>" syntax and yield an object name (when exists) or give an
error messages (otherwise) and die, before the computed object name
gets used by "git show", or any other command that takes an object
name from the command line.

I guess what has been confusing me was the phrase "integration",
that you seem to be using to refer only to the final step of setting
require_full_index to 0, while that is the least interesting part of
a series like this one.  The work done in patches 3/ and 4/ that
paves the way to allow us to set the require_full_index to 0 in any
command that needs to work with the ":<path>" syntax is much more
interesting part of the series, and when viewed from that angle, the
series is not about preparing "show" but about ":<path>" syntax to
work better with the sparse index.

> I know that Victoria intends to submit her 'git stash' integration soon, and
> this provides a way to test if our split of test changes in t1092 are easy
> to merge without conflict. If that is successful, then I will likely submit
> my integration with the 'sparse-checkout' builtin after this series is
> complete. (UPDATE: we inserted a test in the same location of t1092, but
> otherwise there are no textual or semantic conflicts.)

Yeah, I think I already queued Victoria's and the previous round of
this series in 'seen' without problematic conflicts.

Let's take a brief look at the range-diff before I go do something
else...

>  2:  27ab853a9b4 ! 2:  2e9d47ab09b show: integrate with the sparse index
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blob
>       -	# does not work as implemented. The error message is
>       -	# different for a full checkout and a sparse checkout
>       -	# when the directory is outside of the cone.
>      -+	# changes depending on the existence of a sparse index.
>      ++	# had different behavior depending on the existence
>      ++	# of a sparse index.
>        	test_all_match test_must_fail git show :deep/ &&
>        	test_must_fail git -C full-checkout show :folder1/ &&
>       -	test_sparse_match test_must_fail git show :folder1/
>  3:  f5da5327673 = 3:  5a7561637f0 object-name: reject trees found in the index
>  4:  99c09ccc240 ! 4:  b730457fccc object-name: diagnose trees in index properly
>      @@ Commit message
>           checkout. The error message from diagnose_invalid_index_path() reports
>           whether the path is on disk or not. The full checkout will have the
>           directory on disk, but the path will not be in the index. The sparse
>      -    checokut could have the directory not exist, specifically when that
>      +    checkout could have the directory not exist, specifically when that
>           directory is outside of the sparse-checkout cone.
>       
>           In the case of a sparse index, we have yet another state: the path can
>      @@ object-name.c: static void diagnose_invalid_index_path(struct repository *r,
>       
>        ## t/t1092-sparse-checkout-compatibility.sh ##
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blobs/trees)' '
>      + 	test_all_match git show :deep/a &&
>      + 	test_sparse_match git show :folder1/a &&
>      + 
>      +-	# Asking "git show" for directories in the index
>      +-	# had different behavior depending on the existence
>      +-	# of a sparse index.
>      ++	# The error message differs depending on whether
>      ++	# the directory exists in the worktree.
>      + 	test_all_match test_must_fail git show :deep/ &&
>        	test_must_fail git -C full-checkout show :folder1/ &&
>      - 	test_must_fail git -C sparse-checkout show :folder1/ &&
>      +-	test_must_fail git -C sparse-checkout show :folder1/ &&
>      ++	test_sparse_match test_must_fail git show :folder1/ &&
>        
>       -	test_must_fail git -C sparse-index show :folder1/ 2>err &&
>       -	grep "is in the index, but not at stage 0" err
>      -+	test_sparse_match test_must_fail git show :folder1/ &&
>      -+
>       +	# Change the sparse cone for an extra case:
>       +	run_on_sparse git sparse-checkout set deep/deeper1 &&
>       +
>  -:  ----------- > 5:  69efe637a18 rev-parse: integrate with sparse index

OK, nothing unexpected other than the rev-parse stuff.

Thanks.
Derrick Stolee April 27, 2022, 1:47 p.m. UTC | #2
On 4/26/2022 4:55 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This ":" syntax is shared by other commands like "git rev-parse", but we are
>> not adding those integrations at this point.
> 
> This has been the most curious thing since the initial round.  The
> changes in the series are mostly about the code that parses the
> ":<path>" syntax and yield an object name (when exists) or give an
> error messages (otherwise) and die, before the computed object name
> gets used by "git show", or any other command that takes an object
> name from the command line.
> 
> I guess what has been confusing me was the phrase "integration",
> that you seem to be using to refer only to the final step of setting
> require_full_index to 0, while that is the least interesting part of
> a series like this one.  The work done in patches 3/ and 4/ that
> paves the way to allow us to set the require_full_index to 0 in any
> command that needs to work with the ":<path>" syntax is much more
> interesting part of the series, and when viewed from that angle, the
> series is not about preparing "show" but about ":<path>" syntax to
> work better with the sparse index.

In general, yes, we do need to be teaching different parts of the codebase
about the sparse index. The way I've been tracking that progress is by
which builtins can stop expanding a sparse index to a full one upon parse.
This progress indicator also matches the testing strategy, which focuses
on preserving behavior for top-level Git commands (and checking that they
don't expand the sparse index when they don't need to).

I understand your thought that it would be better to sell the series to
reviewers by the interesting pieces under the hood that are changing. I
think this is one of the first times where only one of these systems is
sufficient to make an entire builtin (or two) work with the sparse index.

I'll keep this in mind for pitching future updates.

Thanks,
-Stolee