Message ID | 20220908001854.206789-2-shaoxuan.yuan02@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | grep: integrate with sparse index | expand |
Shaoxuan Yuan wrote: > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > index eb59564565..a9879cc980 100755 > --- a/t/t7817-grep-sparse-checkout.sh > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -118,13 +118,19 @@ test_expect_success 'grep searches unmerged file despite not matching sparsity p > test_cmp expect actual > ' > > -test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' ' > +test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep --cached "text" >actual && > + test_cmp expect actual && > + > cat >expect <<-EOF && > a:text > b:text > dir/c:text > EOF > - git grep --cached "text" >actual && > + git grep --cached --sparse "text" >actual && > test_cmp expect actual > ' At first, seeing that all the test titles were changed from "grep --cached <does something>" to "grep --cached and --sparse <does something>", I was going to suggest that 'git grep --cached' (without '--sparse') should receive some new tests in addition to updating existing ones (which now require '--sparse' to work as before). However, looking at the actual content of the tests like the one above, I can see that you've added cases demonstrating the expected difference in behavior between 'grep --cached' and 'grep --cached --sparse'. I can't think of a clearer way to name the tests, though, so this looks okay to me. The rest of the patch (namely, the implementation of '--sparse' and corresponding documentation) looked good as well - I didn't have anything specific to note on that.
Hi Shaoxuan, Please note that it's customary to cc folks who have commented on previous versions of your patch series when you re-roll. On Wed, Sep 7, 2022 at 5:28 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote: > > Add a --sparse option to `git-grep`. It's awesome you're working on this. Adding more of "behavior A" (restricting querying commands to the sparse cone) is something I've wanted for a long time. I think most of your code is beneficial, but I do have some issues with high level direction you were implementing, which may require some tweaks... > When the '--cached' option is used with the 'git grep' command, the > search is limited to the blobs found in the index, not in the worktree. > If the user has enabled sparse-checkout, this might present more results > than they would like, since the files outside of the sparse-checkout are > unlikely to be important to them. "files outside of the sparse-checkout are unlikely to be important to [users]" is certainly an issue. But it's *much* wider than this. Beyond `grep --cached`, it also affects `grep REVISION`, `log`, `diff [REVISION]`, and related things...perhaps even something like `blame`. I think all those other commands probably deserve a mode where they restrict output to the view associated with the user's cone. I've brought that up before[1]. I was skeptical of making it the default, because it'd probably take a long time to implement it everywhere. Slowly changing defaults of all commands over many git releases seems like a poor strategy, but I'm afraid that's what it looks like we are doing here. I'm also worried that slowly changing the defaults without a high-level plan will lead to users struggling to figure out what flag(s) to pass. Are we going to be stuck in a situation where users have to remember that for a dense search, they use one flag for `grep --cached`, a different one for `grep [REVISION]`, no flag is needed for `diff [REVISION]`, but yet a different flag is needed for `git log`? I'm also curious whether there shouldn't be a config option for something like this, so folks don't have to specify it with every invocation. In particular, while I certainly have users that want to just query git for information about the part of the history they are interested in, there are other users who are fully aware they are working in a bigger repository and want to search for additional things to add to their sparse-checkout and predominantly use grep for things like that. They have even documented that `git grep --cached <TERM>` can be used in sparse-checkouts for this purpose...and have been using that for a few years. (I did warn them at the time that there was a risk they'd have to change their command, but it's still going to be a behavioral change they might not expect.) Further, when I brought up changing the behavior of commands during sparse-checkouts to limit to files matching the sparsity paths in that old thread at [1], Stolee was a bit skeptical of making that the default. That suggests, at least, that two independent groups of users would want to use the non-sparse searching frequently, and frequently enough that they'd appreciate a config option. I also brought up in that old thread that perhaps we want to avoid adding a flag to every subcommand, and instead just having a git-global flag for triggering this type of behavior. (e.g. `git --no-restrict grep --cached ...` or `git --dense grep --cached ...`). [1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ and the responses to that email. > Change the default behavior of 'git grep' to focus on the files within > the sparse-checkout definition. To enable the previous behavior, add a > '--sparse' option to 'git grep' that triggers the old behavior that > inspects paths outside of the sparse-checkout definition when paired > with the '--cached' option. I still think the flag name of `--sparse` is totally backwards and highly confusing for the described behavior. I missed Stolee's email at the time (wasn't cc'ed) where he brought up that "--sparse" had already been added to "git-add" and "git-rm", but in those cases the commands aren't querying and I just don't see how they lead to the same level of user confusion. This one seems glaringly wrong to me and both Junio and I flagged it on v1 when we first saw it. (Perhaps it also helps that for the add/rm cases, that a user is often given an error message with the suggested flag to use, which just doesn't make sense here either.) If there is concern that this flag should be the same as add and rm, then I think we need to do the backward compatibility dance and fix add and rm by adding an alias over there so that grep's flag won't be so confusing. I really don't want to have to deal with the backward compatibility headache of "git grep --sparse" means do a non-sparse search for backward compatibility reasons. Here's the flag you should really use..." > Suggested-by: Victoria Dye <vdye@github.com> > Helped-by: Derrick Stolee <derrickstolee@github.com> > Helped-by: Victoria Dye <vdye@github.com> > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> > --- > Documentation/git-grep.txt | 5 ++++- > builtin/grep.c | 10 +++++++++- > t/t7817-grep-sparse-checkout.sh | 34 +++++++++++++++++++++++++++------ > 3 files changed, 41 insertions(+), 8 deletions(-) > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 58d944bd57..bdd3d5b8a6 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -28,7 +28,7 @@ SYNOPSIS > [-f <file>] [-e] <pattern> > [--and|--or|--not|(|)|-e <pattern>...] > [--recurse-submodules] [--parent-basename <basename>] > - [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...] > + [ [--[no-]exclude-standard] [--cached [--sparse] | --no-index | --untracked] | <tree>...] > [--] [<pathspec>...] > > DESCRIPTION > @@ -45,6 +45,9 @@ OPTIONS > Instead of searching tracked files in the working tree, search > blobs registered in the index file. > > +--sparse:: > + Use with --cached. Search outside of sparse-checkout definition. > + > --no-index:: > Search files in the current directory that is not managed by Git. > > diff --git a/builtin/grep.c b/builtin/grep.c > index e6bcdf860c..12abd832fa 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -96,6 +96,8 @@ static pthread_cond_t cond_result; > > static int skip_first_line; > > +static int grep_sparse = 0; > + > static void add_work(struct grep_opt *opt, struct grep_source *gs) > { > if (opt->binary != GREP_BINARY_TEXT) > @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt, > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > > - if (!cached && ce_skip_worktree(ce)) > + /* > + * Skip entries with SKIP_WORKTREE unless both --sparse and > + * --cached are given. > + */ > + if (!(grep_sparse && cached) && ce_skip_worktree(ce)) > continue; > > strbuf_setlen(&name, name_base_len); > @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > PARSE_OPT_NOCOMPLETE), > OPT_INTEGER('m', "max-count", &opt.max_count, > N_("maximum number of results per file")), > + OPT_BOOL(0, "sparse", &grep_sparse, > + N_("search the contents of files outside the sparse-checkout definition")), > OPT_END() > }; > grep_prefix = prefix; > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > index eb59564565..a9879cc980 100755 > --- a/t/t7817-grep-sparse-checkout.sh > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -118,13 +118,19 @@ test_expect_success 'grep searches unmerged file despite not matching sparsity p > test_cmp expect actual > ' > > -test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' ' > +test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep --cached "text" >actual && > + test_cmp expect actual && > + > cat >expect <<-EOF && > a:text > b:text > dir/c:text > EOF > - git grep --cached "text" >actual && > + git grep --cached --sparse "text" >actual && > test_cmp expect actual > ' > > @@ -143,7 +149,15 @@ test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu > test_cmp expect actual > ' > > -test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' ' > +test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' ' > + cat >expect <<-EOF && > + a:text > + sub/B/b:text > + sub2/a:text > + EOF > + git grep --recurse-submodules --cached "text" >actual && > + test_cmp expect actual && > + > cat >expect <<-EOF && > a:text > b:text > @@ -152,7 +166,7 @@ test_expect_success 'grep --recurse-submodules --cached searches entries with th > sub/B/b:text > sub2/a:text > EOF > - git grep --recurse-submodules --cached "text" >actual && > + git grep --recurse-submodules --cached --sparse "text" >actual && > test_cmp expect actual > ' > > @@ -166,7 +180,15 @@ test_expect_success 'working tree grep does not search the index with CE_VALID a > test_cmp expect actual > ' > > -test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' ' > +test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' ' > + cat >expect <<-EOF && > + a:text > + EOF > + test_when_finished "git update-index --no-assume-unchanged b" && > + git update-index --assume-unchanged b && > + git grep --cached text >actual && > + test_cmp expect actual && > + > cat >expect <<-EOF && > a:text > b:text > @@ -174,7 +196,7 @@ test_expect_success 'grep --cached searches index entries with both CE_VALID and > EOF > test_when_finished "git update-index --no-assume-unchanged b" && > git update-index --assume-unchanged b && > - git grep --cached text >actual && > + git grep --cached --sparse text >actual && > test_cmp expect actual > ' > > -- > 2.37.0 I read over this patch and the other two patches. Other than things like variable names propagating the sparse/dense confusion, and the high level goals already discussed, I didn't spot any other issues.
Elijah Newren <newren@gmail.com> writes: > ... I think all those other commands probably deserve a mode where they > restrict output to the view associated with the user's cone. I've > brought that up before[1]. I was skeptical of making it the default, > because it'd probably take a long time to implement it everywhere. > Slowly changing defaults of all commands over many git releases seems > like a poor strategy, but I'm afraid that's what it looks like we are > doing here. > > I'm also worried that slowly changing the defaults without a > high-level plan will lead to users struggling to figure out what > flag(s) to pass. Are we going to be stuck in a situation where users > have to remember that for a dense search, they use one flag for `grep > --cached`, a different one for `grep [REVISION]`, no flag is needed > for `diff [REVISION]`, but yet a different flag is needed for `git > log`? In short, the default should be "everywhere in tree, regardless of the current sparse-checkout settings", with commands opting into implementing "limit only to sparse-checkout settings" as an option, at least initially, with an eye to possibly flip the default later when all commands support that position but not before? I think that is a reasonable position to take. I lean towards the default of limiting the operations to inside sparse cone(s) for all subcommands when all subcommands learn to be capable to do so, but I also agree that using that default for only for subcommands that have learned to do, which will happen over time, would be way too confusing for our users. By the way, I briefly wondered if "limit to sparse-checkout setting" can be done by introducing a fake "attribute" and using the "attr" pathspec magic, but it may probably be a bad match, and separate option would be more appropriate. >> Change the default behavior of 'git grep' to focus on the files within >> the sparse-checkout definition. To enable the previous behavior, add a >> '--sparse' option to 'git grep' that triggers the old behavior that >> inspects paths outside of the sparse-checkout definition when paired >> with the '--cached' option. > > I still think the flag name of `--sparse` is totally backwards and > highly confusing for the described behavior. Yeah, regardless of which between "--sparse" and "--no-sparse" should be the default, I am in 100% agreement that "--sparse" meaning "affect things both inside and outside the sparse cones" is totally backwards. How strongly ingrained is this UI mistake? I have a feeling that this may be something we still can undo and redo relatively easily, i.e. "--sparse" may be that "limit to sparse-checkout setting" option, not "--no-sparse".
On 9/13/2022 11:08 PM, Elijah Newren wrote: > Hi Shaoxuan, > > Please note that it's customary to cc folks who have commented on > previous versions of your patch series when you re-roll. Hi Elijah, Sorry for the delay, I didn't have my computer with me during Merge 2022 and couldn't respond. I'm sorry that I somehow lost you along the way :( > On Wed, Sep 7, 2022 at 5:28 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote: >> >> Add a --sparse option to `git-grep`. > > It's awesome you're working on this. Adding more of "behavior A" > (restricting querying commands to the sparse cone) is something I've > wanted for a long time. Thanks :) > I think most of your code is beneficial, but I do have some issues > with high level direction you were implementing, which may require > some tweaks... OK. >> When the '--cached' option is used with the 'git grep' command, the >> search is limited to the blobs found in the index, not in the worktree. >> If the user has enabled sparse-checkout, this might present more results >> than they would like, since the files outside of the sparse-checkout are >> unlikely to be important to them. > > "files outside of the sparse-checkout are unlikely to be important to > [users]" is certainly an issue. But it's *much* wider than this. > Beyond `grep --cached`, it also affects `grep REVISION`, `log`, `diff > [REVISION]`, and related things...perhaps even something like `blame`. Agree. Keep reading... > I think all those other commands probably deserve a mode where they > restrict output to the view associated with the user's cone. I've Agree. > brought that up before[1]. I was skeptical of making it the default, > because it'd probably take a long time to implement it everywhere. > Slowly changing defaults of all commands over many git releases seems > like a poor strategy, but I'm afraid that's what it looks like we are > doing here. True. > I'm also worried that slowly changing the defaults without a > high-level plan will lead to users struggling to figure out what > flag(s) to pass. Are we going to be stuck in a situation where users > have to remember that for a dense search, they use one flag for `grep > --cached`, a different one for `grep [REVISION]`, no flag is needed > for `diff [REVISION]`, but yet a different flag is needed for `git > log`? I think the inconsistency is certainly unsettling. > I'm also curious whether there shouldn't be a config option for > something like this, so folks don't have to specify it with every > invocation. In particular, while I certainly have users that want to > just query git for information about the part of the history they are > interested in, there are other users who are fully aware they are > working in a bigger repository and want to search for additional > things to add to their sparse-checkout and predominantly use grep for > things like that. They have even documented that `git grep --cached > <TERM>` can be used in sparse-checkouts for this purpose...and have > been using that for a few years. (I did warn them at the time that > there was a risk they'd have to change their command, but it's still > going to be a behavioral change they might not expect.) Further, when > I brought up changing the behavior of commands during sparse-checkouts > to limit to files matching the sparsity paths in that old thread at > [1], Stolee was a bit skeptical of making that the default. That > suggests, at least, that two independent groups of users would want to > use the non-sparse searching frequently, and frequently enough that > they'd appreciate a config option. A config option sounds good. Though I think 1. If this option is for global behavior: users may better off turning off sparse-checkout if they want a config to do things densely everywhere. 2. If this option is for a single subcommand (e.g. 'grep'): I don't have much thoughts here. It certainly can be nice for users who need to do non-sparse searching frequently. This design, if necessary, should belong to a patch where this config is added for every single subcommand? > I also brought up in that old thread that perhaps we want to avoid > adding a flag to every subcommand, and instead just having a > git-global flag for triggering this type of behavior. (e.g. `git > --no-restrict grep --cached ...` or `git --dense grep --cached ...`). This looks more like the answer to me. It's a peace of mind for users if they don't have to worry about whether a subcommand is sparse-aware, and how may their behaviors differ. Though we still may need to update the actual behavior in each subcommand over an extended period of time (though may not be difficult?), which you mentioned above "seems like a poor strategy". > [1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ > and the responses to that email> >> Change the default behavior of 'git grep' to focus on the files within >> the sparse-checkout definition. To enable the previous behavior, add a >> '--sparse' option to 'git grep' that triggers the old behavior that >> inspects paths outside of the sparse-checkout definition when paired >> with the '--cached' option. > > I still think the flag name of `--sparse` is totally backwards and > highly confusing for the described behavior. I missed Stolee's email > at the time (wasn't cc'ed) where he brought up that "--sparse" had > already been added to "git-add" and "git-rm", but in those cases the > commands aren't querying and I just don't see how they lead to the > same level of user confusion. This one seems glaringly wrong to me > and both Junio and I flagged it on v1 when we first saw it. (Perhaps > it also helps that for the add/rm cases, that a user is often given an > error message with the suggested flag to use, which just doesn't make > sense here either.) If there is concern that this flag should be the > same as add and rm, then I think we need to do the backward > compatibility dance and fix add and rm by adding an alias over there > so that grep's flag won't be so confusing. I guess I'm using "--sparse" here because "add", "rm" and "mv" all imply that "when operating on a sparse path, ignores/warns unless '--sparse' is used". I take it as an analogy so "when searching a sparse path, ignores/warns unless '--sparse' is used". As the idea that "Git does *not* care sparse contents unless '--[no-]sparse' is specified" is sort of established through the implementations in "add", "rm", or "mv", I don't see a big problem using "--sparse" here. I *think*, as long as the users are informed that the default is to ignore things outside of the sparse-checkout definition, and they have to do something (using "--sparse" or a potential better name) to override the default, we are safe to use a name that is famous (i.e. "--sparse") even though its literal meaning is not perfectly descriptive. One outlier I do find confusing though, is the "--sparse" option from "git-ls-files". Without it, Git expands the index and show everything outside of sparse-checkout definition, which seems a bit controversial. ... > > I read over this patch and the other two patches. Other than things > like variable names propagating the sparse/dense confusion, and the > high level goals already discussed, I didn't spot any other issues. Thanks, Shaoxuan
On 9/13/2022 11:08 PM, Elijah Newren wrote: ... I think we are now at a point to make this UI decision, which may not be easily (and should not be?) reverted once it's made in this patch. So, is "--sparse" we want for "grep", even for "rm", "add", or "mv"? Love to hear from other contributors :) Thanks, Shaoxuan
On Wed, Sep 14, 2022 at 7:57 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > ... I think all those other commands probably deserve a mode where they > > restrict output to the view associated with the user's cone. I've > > brought that up before[1]. I was skeptical of making it the default, > > because it'd probably take a long time to implement it everywhere. > > Slowly changing defaults of all commands over many git releases seems > > like a poor strategy, but I'm afraid that's what it looks like we are > > doing here. > > > > I'm also worried that slowly changing the defaults without a > > high-level plan will lead to users struggling to figure out what > > flag(s) to pass. Are we going to be stuck in a situation where users > > have to remember that for a dense search, they use one flag for `grep > > --cached`, a different one for `grep [REVISION]`, no flag is needed > > for `diff [REVISION]`, but yet a different flag is needed for `git > > log`? > > In short, the default should be "everywhere in tree, regardless of > the current sparse-checkout settings", with commands opting into > implementing "limit only to sparse-checkout settings" as an option, > at least initially, with an eye to possibly flip the default later > when all commands support that position but not before? > > I think that is a reasonable position to take. I lean towards the > default of limiting the operations to inside sparse cone(s) for all > subcommands when all subcommands learn to be capable to do so, but I > also agree that using that default for only for subcommands that > have learned to do, which will happen over time, would be way too > confusing for our users. > > By the way, I briefly wondered if "limit to sparse-checkout setting" > can be done by introducing a fake "attribute" and using the "attr" > pathspec magic, but it may probably be a bad match, and separate > option would be more appropriate. > > >> Change the default behavior of 'git grep' to focus on the files within > >> the sparse-checkout definition. To enable the previous behavior, add a > >> '--sparse' option to 'git grep' that triggers the old behavior that > >> inspects paths outside of the sparse-checkout definition when paired > >> with the '--cached' option. > > > > I still think the flag name of `--sparse` is totally backwards and > > highly confusing for the described behavior. > > Yeah, regardless of which between "--sparse" and "--no-sparse" > should be the default, I am in 100% agreement that "--sparse" > meaning "affect things both inside and outside the sparse cones" is > totally backwards. > > How strongly ingrained is this UI mistake? I have a feeling that > this may be something we still can undo and redo relatively easily, > i.e. "--sparse" may be that "limit to sparse-checkout setting" > option, not "--no-sparse". It's gotten into a few commands, but I agree it seems like something we can still undo. In fact, not all uses of `--sparse` are backwards; two commands (clone & ls-files) use `--sparse` to mean limit to sparsity specification. There are three commands that use `--sparse` in a potentially confusing or backwards way, though one is new to this cycle and isn't even documented. In more detail... == clone --sparse == For clone, `--sparse` definitely means limit to the sparsity patterns. That's the meaning we want. == ls-files --sparse == For ls-files, the meaning of `--sparse` is "do not recurse into sparse directory entries in order to print the traditional ls-files output, just print the sparse directory entry". So, I'd say that also has the meaning we want; it's for restricting rather than expanding. This one is also interesting in that it is the only command in the list about querying for information rather than modifying the worktree/index, and is thus the best precedent for grep. If grep behaved similarly to ls-files, it would suggest that Shaoxuan's series should default to searching the whole index (the opposite of what his current series does) and that --sparse would be used to restrict to the sparsity patterns (also the opposite of the meaning for his flag). == add --sparse == For add, `--sparse` affects the behavior of untracked files. Its usage allows untracked files to be added to the index despite the file normally being outside the sparsity patterns. There are two ways for users to view this: * The file added is now tracked, and is present (or "checked-out"). Thus, the new file is part of the user's "sparse checkout" now. Perhaps the flag makes sense viewed from this light? (I had actually looked at it this way previously). * We used the `--sparse` flag to allow git-add to operate on something outside of the normal sparsity patterns. The flag is backwards. It might be worth noting that the reason this flag was added was that users are likely to be surprised later when some other command runs and causes the file to vanish when they update the working tree to match the sparsity patterns. == rm --sparse == For rm, `--sparse` allows files to be removed from the index despite normally being outside the sparsity patterns. There's also a couple ways to view this: * Any file being removed is not going to be part of the sparse checkout anymore. Thus there is no meaning to `--sparse`, but git-add used it as a safety check to avoid surprises by operating outside the normal patterns so perhaps we re-use that? * We used the `--sparse` flag to allow rm to operate on something outside the normal sparsity patterns. The flag is backwards. Much like add, it might be worth noting that this flag was added for cases like `git rm '*.jpg'` -- users probably only want such expressions to operate on their sparse-checkout and they could be negatively surprised by also removing stuff elsewhere. == mv --sparse == For mv, `--sparse` feels like it's stretching the logic used for `git-add` and isn't so clear that it could make sense anymore. The connection might be that when it moves files outside the sparsity specification, it actually leaves them materialized, so in that sense you could argue the files are still part of the sparse checkout, but I'd say we're stretching that a bit. However, the `mv` changes were made earlier this same cycle and aren't part of a release yet. It doesn't feel like this should be setting a precedent for how grep should behave. Especially since it's a modification command, and grep is a querying command; ls-files seems like a better precedent. Also, the `--sparse` flag was not documented for mv for whatever reason. == Overall == For existing querying commands (just ls-files), `--sparse` already means restrict to the sparse cone. If we keep using the existing flag names, grep should follow suit. For existing modification commands already released (add, rm), the fact that the command is modifying actually gives a different way to interpret things such that it's not clear `--sparse` was even a problem. However, perhaps the name of the flag is bad just because there are multiple ways to view it and those who view it one way will see it as counter-intuitive. == Flag rename? == There's another reason to potentially rename the flag. We already have `--sparse` and `--dense` flags for rev-list and friends. So, when we want to enable those other commands to restrict to the sparsity patterns, we probably need a different name. So, perhaps, we should rename our `--sparse/--dense` to `--restrict/--no-restrict`. Such a rename would also likely clear up the ambiguity about which way to interpret the command for the add & rm commands (though it'd pick the second one and suggest we were using the wrong name after all). (There are also two other commands that use `--sparse` -- pack-objects and show-branch, though in a much different way and neither would ever be affected by our new --sparse/--dense/--restrict/--no-restrict flags.) Other names are also possible. Any suggestions? == global flag vs subcommand flags == Do we want to make --[no-]restrict a flag for each subcommand, or just make it a global git flag? I kind of think it'd make sense to do the latter == Defaults == As discussed before, we probably want querying commands (ls-files, grep, log, etc.) to default to --no-restrict for now, since we are otherwise slowly changing the defaults. We may want to swap that default in the future. However, for modification commands, I think we want the default to be --restrict, regardless of the default for querying commands. There are some potentially very negative surprises for users if we don't, and those surprises will be delayed rather than occur at the time the user runs the command. In fact, those negative surprises are likely why those commands were the first to gain an option controlling whether they operated on paths outside the sparsity specification. (Also, the modification commands print a warning if they could have affected other files but didn't due the the default of restricting, so I think we have their default correct, even if the flag name is suboptimal.)
On Fri, Sep 16, 2022 at 8:34 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote: > > > I'm also curious whether there shouldn't be a config option for > > something like this, so folks don't have to specify it with every > > invocation. In particular, while I certainly have users that want to > > just query git for information about the part of the history they are > > interested in, there are other users who are fully aware they are > > working in a bigger repository and want to search for additional > > things to add to their sparse-checkout and predominantly use grep for > > things like that. They have even documented that `git grep --cached > > <TERM>` can be used in sparse-checkouts for this purpose...and have > > been using that for a few years. (I did warn them at the time that > > there was a risk they'd have to change their command, but it's still > > going to be a behavioral change they might not expect.) Further, when > > I brought up changing the behavior of commands during sparse-checkouts > > to limit to files matching the sparsity paths in that old thread at > > [1], Stolee was a bit skeptical of making that the default. That > > suggests, at least, that two independent groups of users would want to > > use the non-sparse searching frequently, and frequently enough that > > they'd appreciate a config option. > > A config option sounds good. Though I think > > 1. If this option is for global behavior: users may better off turning > off sparse-checkout if they want a config to do things densely everywhere. Sorry, it sounds like I haven't explained the usecases to you very well. Let me try again. There are people who want to do everything densely, as you say, and those folks can just turn off sparse-checkout or not use it in the first place. Git has traditionally catered to these folks just fine. However, it's not a subset of interest for this discussion and wasn't what I was talking about. There are (at least) two different usecases for people wanting to use sparse-checkouts; I have users that fall under each category: 1) Working on a repository subset; users are _only_ interested in that subset. This usecase is very poorly supported in Git right now, but I think you understand it so I'll only briefly describe it. These folks might know there are other things in the repository, but don't care. Not only should the working tree be sparse, but grep, log, diff, etc. should be restricted to the subset of the tree they are interested in. Restricting operations to the sparsity specification is also important for marrying partial clones with sparse checkouts while allowing disconnected development. Without such a restrict-to-sparsity-paths feature, the partial clones will attempt to download objects the first time they try to grep an old revision, or do log with a glob path. The download will fail, causing the operation to fail, and break the ability of the user to work in a disconnected manner. 2) The working directory is sparse, but users are working in a larger whole. Stolee described this usecase this way[2]: "I'm also focused on users that know that they are a part of a larger whole. They know they are operating on a large repository but focus on what they need to contribute their part. I expect multiple "roles" to use very different, almost disjoint parts of the codebase. Some other "architect" users operate across the entire tree or hop between different sections of the codebase as necessary. In this situation, I'm wary of scoping too many features to the sparse-checkout definition, especially "git log," as it can be too confusing to have their view of the codebase depend on your "point of view." [2] https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ I describe it very similarly, but I'd like to point out something additional around this usecase and how it can be influenced by dependencies. The first cut for sparse-checkouts is usually the directories you are interested in plus what those directories depend upon within your repository. But there's a monkey wrench here: if you have integration tests, they invert the hierarchy: to run integration tests, you need not only what you are interested in and its dependencies, you also need everything that depends upon what you are interested in or that depends upon one of your dependencies...AND you need all the dependencies of that expanded group. That can easily change your sparse-checkout into a nearly dense one. Naturally, that tends to kill the benefits of sparse-checkouts. There are a couple solutions to this conundrum: either avoid grabbing dependencies (maybe have built versions of your dependencies pulled from a CI cache somewhere), or say that users shouldn't run integration tests directly and instead do it on the CI server when they submit a code review. Or do both. Regardless of whether you stub out your dependencies or stub out the things that depend upon you, there is certainly a reason to want to query and be aware of those other parts of the repository. Thus, sparse-checkouts can be used to limit what you directly build and modify, but these users do not want it to limit their queries of history. Once users pick either the first or the second usecase, they often stick within it. For either group, regardless of what Git's default is, needing to specify an additional flag for *every* grep/log/diff/etc. they run would just be a total annoyance. Neither wants a dense worktree, but one side wants a dense history query while the other wants a sparse one. Different groups should be able to configure the default that works well for them, much like we allow users to configure whether they want "git pull" to rebase or merge. > 2. If this option is for a single subcommand (e.g. 'grep'): I don't have > much thoughts here. It certainly can be nice for users who need to do > non-sparse searching frequently. This design, if necessary, should > belong to a patch where this config is added for every single subcommand? > > > I also brought up in that old thread that perhaps we want to avoid > > adding a flag to every subcommand, and instead just having a > > git-global flag for triggering this type of behavior. (e.g. `git > > --no-restrict grep --cached ...` or `git --dense grep --cached ...`). > > This looks more like the answer to me. It's a peace of mind for users if > they don't have to worry about whether a subcommand is sparse-aware, and > how may their behaviors differ. Though we still may need to update the > actual behavior in each subcommand over an extended period of time > (though may not be difficult?), which you mentioned above "seems like a > poor strategy". > > > [1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ > > and the responses to that email> > >> Change the default behavior of 'git grep' to focus on the files within > >> the sparse-checkout definition. To enable the previous behavior, add a > >> '--sparse' option to 'git grep' that triggers the old behavior that > >> inspects paths outside of the sparse-checkout definition when paired > >> with the '--cached' option. > > > > I still think the flag name of `--sparse` is totally backwards and > > highly confusing for the described behavior. I missed Stolee's email > > at the time (wasn't cc'ed) where he brought up that "--sparse" had > > already been added to "git-add" and "git-rm", but in those cases the > > commands aren't querying and I just don't see how they lead to the > > same level of user confusion. This one seems glaringly wrong to me > > and both Junio and I flagged it on v1 when we first saw it. (Perhaps > > it also helps that for the add/rm cases, that a user is often given an > > error message with the suggested flag to use, which just doesn't make > > sense here either.) If there is concern that this flag should be the > > same as add and rm, then I think we need to do the backward > > compatibility dance and fix add and rm by adding an alias over there > > so that grep's flag won't be so confusing. > > I guess I'm using "--sparse" here because "add", "rm" and "mv" all imply > that "when operating on a sparse path, ignores/warns unless '--sparse' > is used". I take it as an analogy so "when searching a sparse path, > ignores/warns unless '--sparse' is used". As the idea that "Git does > *not* care sparse contents unless '--[no-]sparse' is specified" is sort > of established through the implementations in "add", "rm", or "mv", I > don't see a big problem using "--sparse" here. Well, I do. In addition to just being utterly backwards and confusing in the context of grep: * Both `clone` and `ls-files` use `--sparse` to mean to limit things to the sparsity cone, so we're already kinda split-brained. * grep is more like ls-files (both being querying functions) than add/rm/mv, so should really follow its lead instead of the one from add/rm/mv. * There's another way to interpret `--sparse` for `add` and `rm` such that it makes sense (at least to me); see my other email to Junio in this thread. * `mv` is indeed using it backward, but the `mv` change is new to this cycle (and undocumented) so I'm not sure it counts as much of a precedent yet. > I *think*, as long as the users are informed that the default is to > ignore things outside of the sparse-checkout definition, and they have > to do something (using "--sparse" or a potential better name) to > override the default, we are safe to use a name that is famous (i.e. > "--sparse") even though its literal meaning is not perfectly descriptive. > > One outlier I do find confusing though, is the "--sparse" option from > "git-ls-files". Without it, Git expands the index and show everything > outside of sparse-checkout definition, which seems a bit controversial. Nah, that perfectly matches the expectation of users in the second usecase above -- querying (ls-files/grep/log/diff) defaults to non-restricted history, modifying (add/rm/mv) defaults to restricted paths but warns if the arguments could have matched something else, and the working tree is restricted to sparse paths. It doesn't seem too controversial to me, even if it's not what we want for the long-term default. The defaults for the first usecase would be defaulting to restricted paths for everything, and perhaps not warn if arguments to a modifying command could have matched something else. Anyway, hope that helps you understand my perspective and framing.
Elijah Newren wrote: > == Overall == > > For existing querying commands (just ls-files), `--sparse` already > means restrict to the sparse cone. If we keep using the existing flag > names, grep should follow suit. > > For existing modification commands already released (add, rm), the > fact that the command is modifying actually gives a different way to > interpret things such that it's not clear `--sparse` was even a > problem. However, perhaps the name of the flag is bad just because > there are multiple ways to view it and those who view it one way will > see it as counter-intuitive. > > == Flag rename? == > > There's another reason to potentially rename the flag. We already > have `--sparse` and `--dense` flags for rev-list and friends. So, > when we want to enable those other commands to restrict to the > sparsity patterns, we probably need a different name. So, perhaps, we > should rename our `--sparse/--dense` to `--restrict/--no-restrict`. > Such a rename would also likely clear up the ambiguity about which way > to interpret the command for the add & rm commands (though it'd pick > the second one and suggest we were using the wrong name after all). > > (There are also two other commands that use `--sparse` -- pack-objects > and show-branch, though in a much different way and neither would ever > be affected by our new --sparse/--dense/--restrict/--no-restrict > flags.) > > Other names are also possible. Any suggestions? > > == global flag vs subcommand flags == > > Do we want to make --[no-]restrict a flag for each subcommand, or just > make it a global git flag? I kind of think it'd make sense to do the > latter > > == Defaults == > > As discussed before, we probably want querying commands (ls-files, > grep, log, etc.) to default to --no-restrict for now, since we are > otherwise slowly changing the defaults. We may want to swap that > default in the future. > > However, for modification commands, I think we want the default to be > --restrict, regardless of the default for querying commands. There > are some potentially very negative surprises for users if we don't, > and those surprises will be delayed rather than occur at the time the > user runs the command. In fact, those negative surprises are likely > why those commands were the first to gain an option controlling > whether they operated on paths outside the sparsity specification. > (Also, the modification commands print a warning if they could have > affected other files but didn't due the the default of restricting, so > I think we have their default correct, even if the flag name is > suboptimal.) One of the things I've found myself a bit frustrated with while working on these sparse index integrations is that we haven't had a clear set of guidelines for times when we need to make UI/UX changes relating to 'sparse-checkout' compatibility. I think what you've outlined here is a good start to a larger discussion on the topic, but in the middle of this series might not be the best place for that discussion (at least in terms of preserving for later reference). Elijah, would you be interested in compiling your thoughts into a document in 'Documentation/technical'? If not, Stolee or I could do it. If we could settle on some guidelines (option names, behavior, etc.) for better incorporating 'sparse-checkout' support into existing commands, it'd make future sparse index work substantially easier for everyone involved. As for this series, I think the best way to move the sparse index work along is to drop this patch ("builtin/grep.c: add --sparse option") altogether. Shaoxuan's updates in patch 3 [1] make 'git grep' sparse index-compatible for *all* invocations (not just those without '--sparse'), so we don't need the new option for sparse index compatibility. It can then be re-introduced later (possibly modified) in a series dedicated to unifying the sparse-checkout UX. [1] https://lore.kernel.org/git/20220908001854.206789-4-shaoxuan.yuan02@gmail.com/
Victoria Dye <vdye@github.com> writes: > Elijah Newren wrote: > ... >> However, for modification commands, I think we want the default to be >> --restrict, regardless of the default for querying commands. There >> are some potentially very negative surprises for users if we don't, >> and those surprises will be delayed rather than occur at the time the >> user runs the command. In fact, those negative surprises are likely >> why those commands were the first to gain an option controlling >> whether they operated on paths outside the sparsity specification. >> (Also, the modification commands print a warning if they could have >> affected other files but didn't due the the default of restricting, so >> I think we have their default correct, even if the flag name is >> suboptimal.) > > One of the things I've found myself a bit frustrated with while working on > these sparse index integrations is that we haven't had a clear set of > guidelines for times when we need to make UI/UX changes relating to > 'sparse-checkout' compatibility. I think what you've outlined here is a good > start to a larger discussion on the topic, but in the middle of this series > might not be the best place for that discussion (at least in terms of > preserving for later reference). Yup, I think we were a bit too quick to add the "hide outside sparse cones" feature without first coming up with a reasonable guideline that is designed to keep things consistent. It might have been nice if we did this "make X sparse checkout aware" effort in two separate steps. The first step will not change any behaviour, i.e. no optional or default "hide outside sparse cones" at all, just "we do not upfront expand the index fully; instead as we discover we need to inspect the contents in a subdirectory that is compacted to a tree in the index, we lazily expand it" as performance optimization. And once we made sure we taught all commands that used to expand the index fully upfront not to do so, we do the "guideline" design for UI to "hide outside sparse cones", and add that feature to the commands in the second step. Unfortunately we all get excited too much when we find a new shiny toy, and we ended up getting ahead of ourselves before designing a consistent end user experience. But better late than never ;-) > As for this series, I think the best way to move the sparse index work along > is to drop this patch ("builtin/grep.c: add --sparse option") altogether. Does that roughly correspond to the first step in my "It would have been nice if we did these in two steps" above? That would be a sensible thing to do, as it would be less surprises to the users, I hope. Thanks.
On 9/17/2022 9:24 PM, Elijah Newren wrote: > On Fri, Sep 16, 2022 at 8:34 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote: >> >>> I'm also curious whether there shouldn't be a config option for >>> something like this, so folks don't have to specify it with every >>> invocation. In particular, while I certainly have users that want to >>> just query git for information about the part of the history they are >>> interested in, there are other users who are fully aware they are >>> working in a bigger repository and want to search for additional >>> things to add to their sparse-checkout and predominantly use grep for >>> things like that. They have even documented that `git grep --cached >>> <TERM>` can be used in sparse-checkouts for this purpose...and have >>> been using that for a few years. (I did warn them at the time that >>> there was a risk they'd have to change their command, but it's still >>> going to be a behavioral change they might not expect.) Further, when >>> I brought up changing the behavior of commands during sparse-checkouts >>> to limit to files matching the sparsity paths in that old thread at >>> [1], Stolee was a bit skeptical of making that the default. That >>> suggests, at least, that two independent groups of users would want to >>> use the non-sparse searching frequently, and frequently enough that >>> they'd appreciate a config option. >> >> A config option sounds good. Though I think >> >> 1. If this option is for global behavior: users may better off turning >> off sparse-checkout if they want a config to do things densely everywhere. > > Sorry, it sounds like I haven't explained the usecases to you very > well. Let me try again. > > There are people who want to do everything densely, as you say, and > those folks can just turn off sparse-checkout or not use it in the > first place. Git has traditionally catered to these folks just fine. > However, it's not a subset of interest for this discussion and wasn't > what I was talking about. OK, reading... > There are (at least) two different usecases for people wanting to use > sparse-checkouts; I have users that fall under each category: > > > 1) Working on a repository subset; users are _only_ interested in that subset. > > This usecase is very poorly supported in Git right now, but I think > you understand it so I'll only briefly describe it. > > These folks might know there are other things in the repository, but > don't care. Not only should the working tree be sparse, but grep, > log, diff, etc. should be restricted to the subset of the tree they > are interested in. Right, this is the usecase I am familiar with. > Restricting operations to the sparsity specification is also important > for marrying partial clones with sparse checkouts while allowing > disconnected development. Without such a restrict-to-sparsity-paths > feature, the partial clones will attempt to download objects the first > time they try to grep an old revision, or do log with a glob path. > The download will fail, causing the operation to fail, and break the > ability of the user to work in a disconnected manner. OK, I'm still learning about partial clone, didn't get a chance to look at it. Will try to figure out what this means :) > 2) The working directory is sparse, but users are working in a larger whole. > > Stolee described this usecase this way[2]: > > "I'm also focused on users that know that they are a part of a larger > whole. They know they are operating on a large repository but focus on > what they need to contribute their part. I expect multiple "roles" to > use very different, almost disjoint parts of the codebase. Some other > "architect" users operate across the entire tree or hop between different > sections of the codebase as necessary. In this situation, I'm wary of > scoping too many features to the sparse-checkout definition, especially > "git log," as it can be too confusing to have their view of the codebase > depend on your "point of view." > > [2] https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ > > I describe it very similarly, but I'd like to point out something > additional around this usecase and how it can be influenced by > dependencies. The first cut for sparse-checkouts is usually the > directories you are interested in plus what those directories depend > upon within your repository. But there's a monkey wrench here: if you > have integration tests, they invert the hierarchy: to run integration > tests, you need not only what you are interested in and its > dependencies, you also need everything that depends upon what you are > interested in or that depends upon one of your dependencies...AND you > need all the dependencies of that expanded group. That can easily > change your sparse-checkout into a nearly dense one. Naturally, that > tends to kill the benefits of sparse-checkouts. There are a couple > solutions to this conundrum: either avoid grabbing dependencies (maybe > have built versions of your dependencies pulled from a CI cache > somewhere), or say that users shouldn't run integration tests directly > and instead do it on the CI server when they submit a code review. Or > do both. Regardless of whether you stub out your dependencies or stub > out the things that depend upon you, there is certainly a reason to > want to query and be aware of those other parts of the repository. > Thus, sparse-checkouts can be used to limit what you directly build > and modify, but these users do not want it to limit their queries of > history. > > > Once users pick either the first or the second usecase, they often > stick within it. For either group, regardless of what Git's default > is, needing to specify an additional flag for *every* > grep/log/diff/etc. they run would just be a total annoyance. Neither > wants a dense worktree, but one side wants a dense history query while > the other wants a sparse one. Different groups should be able to > configure the default that works well for them, much like we allow > users to configure whether they want "git pull" to rebase or merge. OK, now I get it: Case A: users only interested in a subset, so they need only sparse history and a sparse worktree. v.s. Case B: users works within a subset but needs a larger context, so they need a dense history/query (that's why we should let grep default to --no-restrict, as you suggested?), though still a sparse worktree. > >> 2. If this option is for a single subcommand (e.g. 'grep'): I don't have >> much thoughts here. It certainly can be nice for users who need to do >> non-sparse searching frequently. This design, if necessary, should >> belong to a patch where this config is added for every single subcommand? >> >>> I also brought up in that old thread that perhaps we want to avoid >>> adding a flag to every subcommand, and instead just having a >>> git-global flag for triggering this type of behavior. (e.g. `git >>> --no-restrict grep --cached ...` or `git --dense grep --cached ...`). >> >> This looks more like the answer to me. It's a peace of mind for users if >> they don't have to worry about whether a subcommand is sparse-aware, and >> how may their behaviors differ. Though we still may need to update the >> actual behavior in each subcommand over an extended period of time >> (though may not be difficult?), which you mentioned above "seems like a >> poor strategy". >> >>> [1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ >>> and the responses to that email> >>>> Change the default behavior of 'git grep' to focus on the files within >>>> the sparse-checkout definition. To enable the previous behavior, add a >>>> '--sparse' option to 'git grep' that triggers the old behavior that >>>> inspects paths outside of the sparse-checkout definition when paired >>>> with the '--cached' option. >>> >>> I still think the flag name of `--sparse` is totally backwards and >>> highly confusing for the described behavior. I missed Stolee's email >>> at the time (wasn't cc'ed) where he brought up that "--sparse" had >>> already been added to "git-add" and "git-rm", but in those cases the >>> commands aren't querying and I just don't see how they lead to the >>> same level of user confusion. This one seems glaringly wrong to me >>> and both Junio and I flagged it on v1 when we first saw it. (Perhaps >>> it also helps that for the add/rm cases, that a user is often given an >>> error message with the suggested flag to use, which just doesn't make >>> sense here either.) If there is concern that this flag should be the >>> same as add and rm, then I think we need to do the backward >>> compatibility dance and fix add and rm by adding an alias over there >>> so that grep's flag won't be so confusing. >> >> I guess I'm using "--sparse" here because "add", "rm" and "mv" all imply >> that "when operating on a sparse path, ignores/warns unless '--sparse' >> is used". I take it as an analogy so "when searching a sparse path, >> ignores/warns unless '--sparse' is used". As the idea that "Git does >> *not* care sparse contents unless '--[no-]sparse' is specified" is sort >> of established through the implementations in "add", "rm", or "mv", I >> don't see a big problem using "--sparse" here. > > Well, I do. > > In addition to just being utterly backwards and confusing in the > context of grep: > * Both `clone` and `ls-files` use `--sparse` to mean to limit things > to the sparsity cone, so we're already kinda split-brained. Agree. > * grep is more like ls-files (both being querying functions) than > add/rm/mv, so should really follow its lead instead of the one from > add/rm/mv. Agree. > * There's another way to interpret `--sparse` for `add` and `rm` > such that it makes sense (at least to me); see my other email to Junio > in this thread. According to the spirit of your points, I think they should be defaulting to --restrict (a rename perhaps) right now. > * `mv` is indeed using it backward, but the `mv` change is new to > this cycle (and undocumented) so I'm not sure it counts as much of a > precedent yet. Oops, I was making the modifications to `mv` and forgot to add documentation to it. Though the --sparse of `mv` was not documented before I touching it. Perhaps it can be added later if we are going to rename --sparse/--dense to --restrict/--no-restrict. >> I *think*, as long as the users are informed that the default is to >> ignore things outside of the sparse-checkout definition, and they have >> to do something (using "--sparse" or a potential better name) to >> override the default, we are safe to use a name that is famous (i.e. >> "--sparse") even though its literal meaning is not perfectly descriptive. >> >> One outlier I do find confusing though, is the "--sparse" option from >> "git-ls-files". Without it, Git expands the index and show everything >> outside of sparse-checkout definition, which seems a bit controversial. > > Nah, that perfectly matches the expectation of users in the second > usecase above -- querying (ls-files/grep/log/diff) defaults to > non-restricted history, modifying (add/rm/mv) defaults to restricted > paths but warns if the arguments could have matched something else, > and the working tree is restricted to sparse paths. It doesn't seem > too controversial to me, even if it's not what we want for the > long-term default. OK. After the reasoning you gave above, now the --sparse of ls-files looks good. > > The defaults for the first usecase would be defaulting to restricted > paths for everything, and perhaps not warn if arguments to a modifying > command could have matched something else. > > > Anyway, hope that helps you understand my perspective and framing. Thanks for the explanations, now I get it and agree with your points :) Thanks, Shaoxuan
Hi Victoria, :-) On 9/18/2022 12:52 PM, Victoria Dye wrote: > Elijah Newren wrote: >> == Overall == >> >> For existing querying commands (just ls-files), `--sparse` already >> means restrict to the sparse cone. If we keep using the existing flag >> names, grep should follow suit. >> >> For existing modification commands already released (add, rm), the >> fact that the command is modifying actually gives a different way to >> interpret things such that it's not clear `--sparse` was even a >> problem. However, perhaps the name of the flag is bad just because >> there are multiple ways to view it and those who view it one way will >> see it as counter-intuitive. >> >> == Flag rename? == >> >> There's another reason to potentially rename the flag. We already >> have `--sparse` and `--dense` flags for rev-list and friends. So, >> when we want to enable those other commands to restrict to the >> sparsity patterns, we probably need a different name. So, perhaps, we >> should rename our `--sparse/--dense` to `--restrict/--no-restrict`. >> Such a rename would also likely clear up the ambiguity about which way >> to interpret the command for the add & rm commands (though it'd pick >> the second one and suggest we were using the wrong name after all). >> >> (There are also two other commands that use `--sparse` -- pack-objects >> and show-branch, though in a much different way and neither would ever >> be affected by our new --sparse/--dense/--restrict/--no-restrict >> flags.) >> >> Other names are also possible. Any suggestions? >> >> == global flag vs subcommand flags == >> >> Do we want to make --[no-]restrict a flag for each subcommand, or just >> make it a global git flag? I kind of think it'd make sense to do the >> latter >> >> == Defaults == >> >> As discussed before, we probably want querying commands (ls-files, >> grep, log, etc.) to default to --no-restrict for now, since we are >> otherwise slowly changing the defaults. We may want to swap that >> default in the future. >> >> However, for modification commands, I think we want the default to be >> --restrict, regardless of the default for querying commands. There >> are some potentially very negative surprises for users if we don't, >> and those surprises will be delayed rather than occur at the time the >> user runs the command. In fact, those negative surprises are likely >> why those commands were the first to gain an option controlling >> whether they operated on paths outside the sparsity specification. >> (Also, the modification commands print a warning if they could have >> affected other files but didn't due the the default of restricting, so >> I think we have their default correct, even if the flag name is >> suboptimal.) > > One of the things I've found myself a bit frustrated with while working on > these sparse index integrations is that we haven't had a clear set of > guidelines for times when we need to make UI/UX changes relating to > 'sparse-checkout' compatibility. I think what you've outlined here is a good > start to a larger discussion on the topic, but in the middle of this series > might not be the best place for that discussion (at least in terms of > preserving for later reference). > > Elijah, would you be interested in compiling your thoughts into a document > in 'Documentation/technical'? If not, Stolee or I could do it. If we could > settle on some guidelines (option names, behavior, etc.) for better > incorporating 'sparse-checkout' support into existing commands, it'd make > future sparse index work substantially easier for everyone involved. This sounds good! I am always confused about the inconsistency of the meaning of "--sparse" across a variety of commands. A guideline definitely corrects prior integrations and helps future ones. > As for this series, I think the best way to move the sparse index work along > is to drop this patch ("builtin/grep.c: add --sparse option") altogether. > Shaoxuan's updates in patch 3 [1] make 'git grep' sparse index-compatible > for *all* invocations (not just those without '--sparse'), so we don't need > the new option for sparse index compatibility. It can then be re-introduced > later (possibly modified) in a series dedicated to unifying the > sparse-checkout UX. Are you suggesting that we should still follow the original "use --cache to search within the index and show SKIP_WORKTREE entries found"? I'm asking because the tests in the second patch [2] are still using the lately-introduced "--sparse". If yes, then I think it sounds good to re-introduce the (potentially) modified UI in the future :-). [2] https://lore.kernel.org/git/20220908001854.206789-3-shaoxuan.yuan02@gmail.com/ > > [1] https://lore.kernel.org/git/20220908001854.206789-4-shaoxuan.yuan02@gmail.com/ Thanks, Shaoxuan
On Sun, Sep 18 2022, Victoria Dye wrote: > Elijah, would you be interested in compiling your thoughts into a document > in 'Documentation/technical'? If not, Stolee or I could do it. If we could > settle on some guidelines (option names, behavior, etc.) for better > incorporating 'sparse-checkout' support into existing commands, it'd make > future sparse index work substantially easier for everyone involved. This sounds good. I'd just like to suggest that incorporating a table similar to the one I made for checkout/switch in would be useful for such documentation: https://lore.kernel.org/git/211021.86wnm6l1ip.gmgdl@evledraar.gmail.com/ We ended up dropping the ball on that topic, but for cross-command UX I think it's a very useful way to present how a "meta option", or an option shared across many commands is expected to behave.
On Sun, Sep 18, 2022 at 12:52 PM Victoria Dye <vdye@github.com> wrote: > > Elijah Newren wrote: > > == Overall == > > > > For existing querying commands (just ls-files), `--sparse` already > > means restrict to the sparse cone. If we keep using the existing flag > > names, grep should follow suit. > > > > For existing modification commands already released (add, rm), the > > fact that the command is modifying actually gives a different way to > > interpret things such that it's not clear `--sparse` was even a > > problem. However, perhaps the name of the flag is bad just because > > there are multiple ways to view it and those who view it one way will > > see it as counter-intuitive. > > > > == Flag rename? == > > > > There's another reason to potentially rename the flag. We already > > have `--sparse` and `--dense` flags for rev-list and friends. So, > > when we want to enable those other commands to restrict to the > > sparsity patterns, we probably need a different name. So, perhaps, we > > should rename our `--sparse/--dense` to `--restrict/--no-restrict`. > > Such a rename would also likely clear up the ambiguity about which way > > to interpret the command for the add & rm commands (though it'd pick > > the second one and suggest we were using the wrong name after all). > > > > (There are also two other commands that use `--sparse` -- pack-objects > > and show-branch, though in a much different way and neither would ever > > be affected by our new --sparse/--dense/--restrict/--no-restrict > > flags.) > > > > Other names are also possible. Any suggestions? > > > > == global flag vs subcommand flags == > > > > Do we want to make --[no-]restrict a flag for each subcommand, or just > > make it a global git flag? I kind of think it'd make sense to do the > > latter > > > > == Defaults == > > > > As discussed before, we probably want querying commands (ls-files, > > grep, log, etc.) to default to --no-restrict for now, since we are > > otherwise slowly changing the defaults. We may want to swap that > > default in the future. > > > > However, for modification commands, I think we want the default to be > > --restrict, regardless of the default for querying commands. There > > are some potentially very negative surprises for users if we don't, > > and those surprises will be delayed rather than occur at the time the > > user runs the command. In fact, those negative surprises are likely > > why those commands were the first to gain an option controlling > > whether they operated on paths outside the sparsity specification. > > (Also, the modification commands print a warning if they could have > > affected other files but didn't due the the default of restricting, so > > I think we have their default correct, even if the flag name is > > suboptimal.) > > One of the things I've found myself a bit frustrated with while working on > these sparse index integrations is that we haven't had a clear set of > guidelines for times when we need to make UI/UX changes relating to > 'sparse-checkout' compatibility. I think what you've outlined here is a good > start to a larger discussion on the topic, but in the middle of this series > might not be the best place for that discussion (at least in terms of > preserving for later reference). Yeah, that's fair, and I apologize for the problems. I should have pushed for a resolution and/or documentation of these issues at some point; particularly since I was the one to bring it up in the first place. Between Stolee asking us to defer for a year-ish on UI/UX changes in sparse-checkout while he got sparse-index into place, and various other things coming up in the meantime, I just didn't get back to it. I probably should have, especially since we also had other similar discussions going back to when git-sparse-checkout was first introduced, but we've often focused on just solving the next subset of usecases that were within reach rather than getting a bigger design document. Knowing that these kinds of issues were lurking was part of the reason I insisted on having the big scary warning in the docs: """ THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. """ I'm glad I at least had the foresight to insist on that small measure... :-) > Elijah, would you be interested in compiling your thoughts into a document > in 'Documentation/technical'? If not, Stolee or I could do it. If we could > settle on some guidelines (option names, behavior, etc.) for better > incorporating 'sparse-checkout' support into existing commands, it'd make > future sparse index work substantially easier for everyone involved. Sure, I'll take a stab at it this week. > As for this series, I think the best way to move the sparse index work along > is to drop this patch ("builtin/grep.c: add --sparse option") altogether. > Shaoxuan's updates in patch 3 [1] make 'git grep' sparse index-compatible > for *all* invocations (not just those without '--sparse'), so we don't need > the new option for sparse index compatibility. It can then be re-introduced > later (possibly modified) in a series dedicated to unifying the > sparse-checkout UX. Seems reasonable. > [1] https://lore.kernel.org/git/20220908001854.206789-4-shaoxuan.yuan02@gmail.com/
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 58d944bd57..bdd3d5b8a6 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -28,7 +28,7 @@ SYNOPSIS [-f <file>] [-e] <pattern> [--and|--or|--not|(|)|-e <pattern>...] [--recurse-submodules] [--parent-basename <basename>] - [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...] + [ [--[no-]exclude-standard] [--cached [--sparse] | --no-index | --untracked] | <tree>...] [--] [<pathspec>...] DESCRIPTION @@ -45,6 +45,9 @@ OPTIONS Instead of searching tracked files in the working tree, search blobs registered in the index file. +--sparse:: + Use with --cached. Search outside of sparse-checkout definition. + --no-index:: Search files in the current directory that is not managed by Git. diff --git a/builtin/grep.c b/builtin/grep.c index e6bcdf860c..12abd832fa 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -96,6 +96,8 @@ static pthread_cond_t cond_result; static int skip_first_line; +static int grep_sparse = 0; + static void add_work(struct grep_opt *opt, struct grep_source *gs) { if (opt->binary != GREP_BINARY_TEXT) @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (!cached && ce_skip_worktree(ce)) + /* + * Skip entries with SKIP_WORKTREE unless both --sparse and + * --cached are given. + */ + if (!(grep_sparse && cached) && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) PARSE_OPT_NOCOMPLETE), OPT_INTEGER('m', "max-count", &opt.max_count, N_("maximum number of results per file")), + OPT_BOOL(0, "sparse", &grep_sparse, + N_("search the contents of files outside the sparse-checkout definition")), OPT_END() }; grep_prefix = prefix; diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index eb59564565..a9879cc980 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -118,13 +118,19 @@ test_expect_success 'grep searches unmerged file despite not matching sparsity p test_cmp expect actual ' -test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' ' +test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual && + cat >expect <<-EOF && a:text b:text dir/c:text EOF - git grep --cached "text" >actual && + git grep --cached --sparse "text" >actual && test_cmp expect actual ' @@ -143,7 +149,15 @@ test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu test_cmp expect actual ' -test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' ' +test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + sub2/a:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual && + cat >expect <<-EOF && a:text b:text @@ -152,7 +166,7 @@ test_expect_success 'grep --recurse-submodules --cached searches entries with th sub/B/b:text sub2/a:text EOF - git grep --recurse-submodules --cached "text" >actual && + git grep --recurse-submodules --cached --sparse "text" >actual && test_cmp expect actual ' @@ -166,7 +180,15 @@ test_expect_success 'working tree grep does not search the index with CE_VALID a test_cmp expect actual ' -test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' ' +test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' ' + cat >expect <<-EOF && + a:text + EOF + test_when_finished "git update-index --no-assume-unchanged b" && + git update-index --assume-unchanged b && + git grep --cached text >actual && + test_cmp expect actual && + cat >expect <<-EOF && a:text b:text @@ -174,7 +196,7 @@ test_expect_success 'grep --cached searches index entries with both CE_VALID and EOF test_when_finished "git update-index --no-assume-unchanged b" && git update-index --assume-unchanged b && - git grep --cached text >actual && + git grep --cached --sparse text >actual && test_cmp expect actual '
Add a --sparse option to `git-grep`. When the '--cached' option is used with the 'git grep' command, the search is limited to the blobs found in the index, not in the worktree. If the user has enabled sparse-checkout, this might present more results than they would like, since the files outside of the sparse-checkout are unlikely to be important to them. Change the default behavior of 'git grep' to focus on the files within the sparse-checkout definition. To enable the previous behavior, add a '--sparse' option to 'git grep' that triggers the old behavior that inspects paths outside of the sparse-checkout definition when paired with the '--cached' option. Suggested-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> --- Documentation/git-grep.txt | 5 ++++- builtin/grep.c | 10 +++++++++- t/t7817-grep-sparse-checkout.sh | 34 +++++++++++++++++++++++++++------ 3 files changed, 41 insertions(+), 8 deletions(-)