Message ID | pull.1207.git.1649349442.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Sparse index integration with 'git show' | expand |
Hi Stolee, Thanks for the series! All my comments are from the perspective of someone without much knowledge of the index, much less sparse-index, or sparse-checkout. I am not sure whether the project should cater to reviewers in my position, but I did have trouble understanding most of this series. I didn't have any concerns with the implementation, but the cover letter, commit messages, and tests were fairly confusing and I think can be explained better with a few changes: On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote: > This continues our sequence of integrating builtins with the sparse index. > > 'git show' is relatively simple to get working in a way that doesn't fail > when it would previously succeed, but there are some sutbleties when the > user passes a directory path. If that path happens to be a sparse directory > entry, we suddenly start succeeding and printing the tree information! > > Since this behavior can change depending on the sparse checkout definition > and the state of index entries within that directory, this new behavior > would be more likely to confuse users than help them. The two paragraphs above did not really convey to me on first read-through what the problem was. IIUC the main points are: * git-show does not currently work with the sparse index. * A simple change fixes the above, but causes us to sometimes print unexpected information about trees. * We can use the simple change in our implementation, but must do additional work to make sure we only print the expected information. I think this could be clarified by: * Including examples of the unhelpful output from the simple implementation. * Describing in more detail the situations that trigger this. * Describing why those situations don't currently happen without support for sparse indexes. > Here is an outline of this series: > > * Patch 1: Add more tests around 'git show :' in t1092. > > * Patch 2: Make 'git show' stop expanding the index by default. Make note > of this behavior change in the tests. > > * Patches 3-4: Make the subtle changes to object-name.c that help us reject > sparse directories (patch 3) and print the correct error message (patch > 4). > > Patches 2-4 could realistically be squashed into a single commit, but I > thought it might be instructive to show these individual steps, especially > as an example for our GSoC project. > > This series is based on the current 'master'. I know that Victoria intends > to submit her 'git stash' integration soon, and this provides a way to test > if our split of test changes in t1092 are easy to merge without conflict. If > that is successful, then I will likely submit my integration with the > 'sparse-checkout' builtin after this series is complete. > > Thanks, -Stolee Thanks, Josh
Josh Steadmon <steadmon@google.com> writes: >> 'git show' is relatively simple to get working in a way that doesn't fail >> when it would previously succeed, but there are some sutbleties when the >> user passes a directory path. If that path happens to be a sparse directory >> entry, we suddenly start succeeding and printing the tree information! >> >> Since this behavior can change depending on the sparse checkout definition >> and the state of index entries within that directory, this new behavior >> would be more likely to confuse users than help them. > > The two paragraphs above did not really convey to me on first > read-through what the problem was. IIUC the main points are: > > * git-show does not currently work with the sparse index. > * A simple change fixes the above, but causes us to sometimes print > unexpected information about trees. > * We can use the simple change in our implementation, but must do > additional work to make sure we only print the expected information. > > I think this could be clarified by: > * Including examples of the unhelpful output from the simple > implementation. > * Describing in more detail the situations that trigger this. > * Describing why those situations don't currently happen without support > for sparse indexes. I think the issues patches 2-4 addresses are not limited to any specific command, but how ":<path-in-the-index>" syntax is resolved to an object name (including the way how error messages are issued when the given path is not found in the index). But the title of the series and presentation place undue stress on "git show", which I suspect may be confusing to some readers. For example, with these patches, wouldn't "git rev-parse" start working better, even though the proposed log message for no commit in the series talks about "rev-parse" and no tests are done with the "rev-parse" command? I am not suggesting that commands other than "show" should be also discussed in detail or tested. It would help readers if we said that we are using 'show' merely as an example to demonstrate how ":<path-in-the-index>" syntax interacts with the sparse index entries (1) before the series, and (2) how well the improved object-name parsing logic works after the series.
On 4/14/2022 5:14 PM, Junio C Hamano wrote: > Josh Steadmon <steadmon@google.com> writes: > >>> 'git show' is relatively simple to get working in a way that doesn't fail >>> when it would previously succeed, but there are some sutbleties when the >>> user passes a directory path. If that path happens to be a sparse directory >>> entry, we suddenly start succeeding and printing the tree information! >>> >>> Since this behavior can change depending on the sparse checkout definition >>> and the state of index entries within that directory, this new behavior >>> would be more likely to confuse users than help them. >> >> The two paragraphs above did not really convey to me on first >> read-through what the problem was. IIUC the main points are: >> >> * git-show does not currently work with the sparse index. >> * A simple change fixes the above, but causes us to sometimes print >> unexpected information about trees. >> * We can use the simple change in our implementation, but must do >> additional work to make sure we only print the expected information. >> >> I think this could be clarified by: >> * Including examples of the unhelpful output from the simple >> implementation. >> * Describing in more detail the situations that trigger this. >> * Describing why those situations don't currently happen without support >> for sparse indexes. > > I think the issues patches 2-4 addresses are not limited to any > specific command, but how ":<path-in-the-index>" syntax is resolved > to an object name (including the way how error messages are issued > when the given path is not found in the index). Yes, this is the critical bit. It's the only part of "git show" that cares about the index. > But the title of the series and presentation place undue stress on > "git show", which I suspect may be confusing to some readers. > > For example, with these patches, wouldn't "git rev-parse" start > working better, even though the proposed log message for no commit > in the series talks about "rev-parse" and no tests are done with the > "rev-parse" command? Probably, assuming we unset the command_requires_full_index protection on 'git rev-parse'. This might be a good case for Shaoxuan to try. > I am not suggesting that commands other than "show" should be also > discussed in detail or tested. It would help readers if we said > that we are using 'show' merely as an example to demonstrate how > ":<path-in-the-index>" syntax interacts with the sparse index > entries (1) before the series, and (2) how well the improved > object-name parsing logic works after the series. I can see that. Some background: as we shipped our sparse index integrations in the microsoft/git fork and measured performance of real users, we noticed that 'git show' was a frequently-used command that went from ~0.5 seconds to ~4 seconds in monorepo situations. This was unexpected, because we didn't know about the ":<path>" syntax. Further, it seemed that some third-party tools were triggering this behavior as a frequent way to check on staged content. That motivated quickly shipping this integration. Performance improved to ~0.1 seconds because of the reduced index size. Thanks, -Stolee