mbox series

[0/4] Sparse index integration with 'git show'

Message ID pull.1207.git.1649349442.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse index integration with 'git show' | expand

Message

Philippe Blain via GitGitGadget April 7, 2022, 4:37 p.m. UTC
This continues our sequence of integrating builtins with the sparse index.

'git show' is relatively simple to get working in a way that doesn't fail
when it would previously succeed, but there are some sutbleties when the
user passes a directory path. If that path happens to be a sparse directory
entry, we suddenly start succeeding and printing the tree information!

Since this behavior can change depending on the sparse checkout definition
and the state of index entries within that directory, this new behavior
would be more likely to confuse users than help them.

Here is an outline of this series:

 * Patch 1: Add more tests around 'git show :' in t1092.

 * Patch 2: Make 'git show' stop expanding the index by default. Make note
   of this behavior change in the tests.

 * Patches 3-4: Make the subtle changes to object-name.c that help us reject
   sparse directories (patch 3) and print the correct error message (patch
   4).

Patches 2-4 could realistically be squashed into a single commit, but I
thought it might be instructive to show these individual steps, especially
as an example for our GSoC project.

This series is based on the current 'master'. I know that Victoria intends
to submit her 'git stash' integration soon, and this provides a way to test
if our split of test changes in t1092 are easy to merge without conflict. If
that is successful, then I will likely submit my integration with the
'sparse-checkout' builtin after this series is complete.

Thanks, -Stolee

Derrick Stolee (4):
  t1092: add compatibility tests for 'git show'
  show: integrate with the sparse index
  object-name: reject trees found in the index
  object-name: diagnose trees in index properly

 builtin/log.c                            |  5 ++++
 object-name.c                            | 25 +++++++++++++++---
 t/t1092-sparse-checkout-compatibility.sh | 33 ++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 3 deletions(-)


base-commit: 07330a41d66a2c9589b585a3a24ecdcf19994f19
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1207%2Fderrickstolee%2Fsparse-index%2Fgit-show-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1207/derrickstolee/sparse-index/git-show-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1207

Comments

Josh Steadmon April 14, 2022, 6:37 p.m. UTC | #1
Hi Stolee,

Thanks for the series! All my comments are from the perspective of
someone without much knowledge of the index, much less sparse-index, or
sparse-checkout. I am not sure whether the project should cater to
reviewers in my position, but I did have trouble understanding most of
this series. I didn't have any concerns with the implementation, but the
cover letter, commit messages, and tests were fairly confusing and I
think can be explained better with a few changes:

On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
> This continues our sequence of integrating builtins with the sparse index.
> 
> 'git show' is relatively simple to get working in a way that doesn't fail
> when it would previously succeed, but there are some sutbleties when the
> user passes a directory path. If that path happens to be a sparse directory
> entry, we suddenly start succeeding and printing the tree information!
>
> Since this behavior can change depending on the sparse checkout definition
> and the state of index entries within that directory, this new behavior
> would be more likely to confuse users than help them.

The two paragraphs above did not really convey to me on first
read-through what the problem was. IIUC the main points are:

* git-show does not currently work with the sparse index.
* A simple change fixes the above, but causes us to sometimes print
  unexpected information about trees.
* We can use the simple change in our implementation, but must do
  additional work to make sure we only print the expected information.

I think this could be clarified by:
* Including examples of the unhelpful output from the simple
  implementation.
* Describing in more detail the situations that trigger this.
* Describing why those situations don't currently happen without support
  for sparse indexes.

> Here is an outline of this series:
> 
>  * Patch 1: Add more tests around 'git show :' in t1092.
> 
>  * Patch 2: Make 'git show' stop expanding the index by default. Make note
>    of this behavior change in the tests.
> 
>  * Patches 3-4: Make the subtle changes to object-name.c that help us reject
>    sparse directories (patch 3) and print the correct error message (patch
>    4).
> 
> Patches 2-4 could realistically be squashed into a single commit, but I
> thought it might be instructive to show these individual steps, especially
> as an example for our GSoC project.
> 
> This series is based on the current 'master'. I know that Victoria intends
> to submit her 'git stash' integration soon, and this provides a way to test
> if our split of test changes in t1092 are easy to merge without conflict. If
> that is successful, then I will likely submit my integration with the
> 'sparse-checkout' builtin after this series is complete.
> 
> Thanks, -Stolee


Thanks,
Josh
Junio C Hamano April 14, 2022, 9:14 p.m. UTC | #2
Josh Steadmon <steadmon@google.com> writes:

>> 'git show' is relatively simple to get working in a way that doesn't fail
>> when it would previously succeed, but there are some sutbleties when the
>> user passes a directory path. If that path happens to be a sparse directory
>> entry, we suddenly start succeeding and printing the tree information!
>>
>> Since this behavior can change depending on the sparse checkout definition
>> and the state of index entries within that directory, this new behavior
>> would be more likely to confuse users than help them.
>
> The two paragraphs above did not really convey to me on first
> read-through what the problem was. IIUC the main points are:
>
> * git-show does not currently work with the sparse index.
> * A simple change fixes the above, but causes us to sometimes print
>   unexpected information about trees.
> * We can use the simple change in our implementation, but must do
>   additional work to make sure we only print the expected information.
>
> I think this could be clarified by:
> * Including examples of the unhelpful output from the simple
>   implementation.
> * Describing in more detail the situations that trigger this.
> * Describing why those situations don't currently happen without support
>   for sparse indexes.

I think the issues patches 2-4 addresses are not limited to any
specific command, but how ":<path-in-the-index>" syntax is resolved
to an object name (including the way how error messages are issued
when the given path is not found in the index).

But the title of the series and presentation place undue stress on
"git show", which I suspect may be confusing to some readers.

For example, with these patches, wouldn't "git rev-parse" start
working better, even though the proposed log message for no commit
in the series talks about "rev-parse" and no tests are done with the
"rev-parse" command?

I am not suggesting that commands other than "show" should be also
discussed in detail or tested.  It would help readers if we said
that we are using 'show' merely as an example to demonstrate how
":<path-in-the-index>" syntax interacts with the sparse index
entries (1) before the series, and (2) how well the improved
object-name parsing logic works after the series.
Derrick Stolee April 18, 2022, 12:42 p.m. UTC | #3
On 4/14/2022 5:14 PM, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
>>> 'git show' is relatively simple to get working in a way that doesn't fail
>>> when it would previously succeed, but there are some sutbleties when the
>>> user passes a directory path. If that path happens to be a sparse directory
>>> entry, we suddenly start succeeding and printing the tree information!
>>>
>>> Since this behavior can change depending on the sparse checkout definition
>>> and the state of index entries within that directory, this new behavior
>>> would be more likely to confuse users than help them.
>>
>> The two paragraphs above did not really convey to me on first
>> read-through what the problem was. IIUC the main points are:
>>
>> * git-show does not currently work with the sparse index.
>> * A simple change fixes the above, but causes us to sometimes print
>>   unexpected information about trees.
>> * We can use the simple change in our implementation, but must do
>>   additional work to make sure we only print the expected information.
>>
>> I think this could be clarified by:
>> * Including examples of the unhelpful output from the simple
>>   implementation.
>> * Describing in more detail the situations that trigger this.
>> * Describing why those situations don't currently happen without support
>>   for sparse indexes.
> 
> I think the issues patches 2-4 addresses are not limited to any
> specific command, but how ":<path-in-the-index>" syntax is resolved
> to an object name (including the way how error messages are issued
> when the given path is not found in the index).

Yes, this is the critical bit. It's the only part of "git show"
that cares about the index.
 
> But the title of the series and presentation place undue stress on
> "git show", which I suspect may be confusing to some readers.
> 
> For example, with these patches, wouldn't "git rev-parse" start
> working better, even though the proposed log message for no commit
> in the series talks about "rev-parse" and no tests are done with the
> "rev-parse" command?

Probably, assuming we unset the command_requires_full_index
protection on 'git rev-parse'. This might be a good case for
Shaoxuan to try.

> I am not suggesting that commands other than "show" should be also
> discussed in detail or tested.  It would help readers if we said
> that we are using 'show' merely as an example to demonstrate how
> ":<path-in-the-index>" syntax interacts with the sparse index
> entries (1) before the series, and (2) how well the improved
> object-name parsing logic works after the series.
 
I can see that.
Some background: as we shipped our sparse index integrations in
the microsoft/git fork and measured performance of real users, we
noticed that 'git show' was a frequently-used command that went
from ~0.5 seconds to ~4 seconds in monorepo situations. This was
unexpected, because we didn't know about the ":<path>" syntax.
Further, it seemed that some third-party tools were triggering
this behavior as a frequent way to check on staged content. That
motivated quickly shipping this integration. Performance improved
to ~0.1 seconds because of the reduced index size.

Thanks,
-Stolee