Message ID | 20191122011646.218346-1-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | grep: provide pathspecs/patterns via file or stdin | expand |
Hi Emily, On Thu, Nov 21, 2019 at 05:16:46PM -0800, Emily Shaffer wrote: > Teach 'git grep' how to receive pathspecs via file, and add convenience > flags for receiving pathspecs or patterns via stdin. > > In some cases, the list of pathspecs to 'git grep' is longer than the > command line allows, or long enough that a user becomes fatigued reading > the command. In other cases, a user may wish to write a script or > oneliner which generates a list of files to 'git grep' against. In both > scenarios, a reasonable workaround is to provide the list of pathspecs > via a file or stdin. > > Before this change, it was already possible to receive patterns via > stdin by using the argument '-f -'. Now that there are two arguments > which could both use stdin, help safeguard the user from doing so by > providing convenience '--stdin-patterns' and '--stdin-pathspecs' which > are mutually exclusive. (It is still possible to ask for '-f - > --pathspecs-file -', though.) > > Finally, in order to match '--pathspecs-file', give '-f' a long form > '--patterns-file' too. > > Signed-off-by: Emily Shaffer <emilyshaffer@google.com> > --- > Providing too many pathspecs for the argument list to handle is causing > some gore out in the wild with git-secrets: > https://github.com/awslabs/git-secrets/issues/78 Out of curiosity, are there any situations that --pathspecs-file handles better than an xargs invocation? I didn't take a look at the code of git-secrets but I suspect that that could solve their problem. The reason I ask is because (correct me if I'm wrong) a lot of other git commands (like add, reset and checkout) don't seem to accept pathspecs via stdin and could suffer the same problem. xargs seems like a more general way of solving the problem of long command lines. > > A short arg for --pathspecs-file didn't occur to me because grep has > many short args already, but if anyone has a suggestion I'll happily add > it. > > The pathspec list is gathered into an argv_array because parse_pathspec > is only called once, against the entire list of pathspecs. Perhaps it > makes sense to concatenate the paths given on the command line with the > paths given via file, but since it was nontrivial to merge two > argv_arrays and I wasn't certain it was the right choice, I didn't do > so. I'd probably implement that by adding a utility function to > argv-array. ('-f somepatterns.txt -e otherpattern' concatenates the > patterns, by the way.) > > - Emily > > Documentation/git-grep.txt | 18 ++++++++++- > builtin/grep.c | 61 +++++++++++++++++++++++++++++++++----- > t/t7810-grep.sh | 52 ++++++++++++++++++++++++++++++-- > 3 files changed, 121 insertions(+), 10 deletions(-) > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index c89fb569e3..fb588a3846 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -24,7 +24,8 @@ SYNOPSIS > [-A <post-context>] [-B <pre-context>] [-C <context>] > [-W | --function-context] > [--threads <num>] > - [-f <file>] [-e] <pattern> > + [--pathspecs-file <file>] [--stdin-pathspecs] > + [-f | --patterns-file <file>] [--stdin-patterns] [-e] <pattern> > [--and|--or|--not|(|)|-e <pattern>...] > [--recurse-submodules] [--parent-basename <basename>] > [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...] > @@ -270,7 +271,22 @@ providing this option will cause it to die. > See `grep.threads` in 'CONFIGURATION' for more information. > > -f <file>:: > +--patterns-file <file>:: > Read patterns from <file>, one per line. > + > +--stdin-patterns:: > + Read patterns from stdin, one per line. Equivalent to `-f -`. Cannot be > + combined with `--stdin-pathspecs`. > + > +--pathspecs-file <file>:: > + Read pathspecs from <file>, one per line. Pathspecs passed in the > + argument list will be ignored in this mode. > + > +--stdin-pathspecs:: > + Read pathspecs from stdin, one per line. Pathspecs passed in the > + argument list will be ignored in this mode. Equivalent to > + `--pathspecs-file -`. > + > + > Passing the pattern via <file> allows for providing a search pattern > containing a \0. > diff --git a/builtin/grep.c b/builtin/grep.c > index 50ce8d9461..f6f9f84002 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -720,16 +720,38 @@ static int context_callback(const struct option *opt, const char *arg, > return 0; > } > > -static int file_callback(const struct option *opt, const char *arg, int unset) > +static int pathspec_file(struct argv_array *pathspec_argv, const char *arg) > +{ > + struct strbuf sb = STRBUF_INIT; > + > + if (!strcmp(arg, "-")) { > + while (strbuf_getline(&sb, stdin) != EOF) > + argv_array_push(pathspec_argv, sb.buf); > + } > + else { > + if (!strbuf_read_file(&sb, arg, 0)) > + die_errno(_("cannot open '%s'"), arg); > + argv_array_split(pathspec_argv, sb.buf); > + } > + > + strbuf_release(&sb); > + return 0; > +} > + > +static int pathspec_file_callback(const struct option *opt, > + const char *arg, int unset) > +{ > + BUG_ON_OPT_NEG(unset); > + return pathspec_file(opt->value, arg); > +} > + > +static int pattern_file(struct grep_opt *grep_opt, const char *arg) > { > - struct grep_opt *grep_opt = opt->value; > int from_stdin; > FILE *patterns; > int lno = 0; > struct strbuf sb = STRBUF_INIT; > > - BUG_ON_OPT_NEG(unset); > - > from_stdin = !strcmp(arg, "-"); > patterns = from_stdin ? stdin : fopen(arg, "r"); > if (!patterns) > @@ -748,6 +770,12 @@ static int file_callback(const struct option *opt, const char *arg, int unset) > return 0; > } > > +static int pattern_file_callback(const struct option *opt, const char *arg, int unset) > +{ > + BUG_ON_OPT_NEG(unset); > + return pattern_file(opt->value, arg); > +} > + > static int not_callback(const struct option *opt, const char *arg, int unset) > { > struct grep_opt *grep_opt = opt->value; > @@ -803,12 +831,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > struct grep_opt opt; > struct object_array list = OBJECT_ARRAY_INIT; > struct pathspec pathspec; > + struct argv_array pathspec_argv; > struct string_list path_list = STRING_LIST_INIT_NODUP; > int i; > int dummy; > int use_index = 1; > int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; > int allow_revs; > + int stdin_patterns = 0; > + int stdin_pathspecs = 0; > > struct option options[] = { > OPT_BOOL(0, "cached", &cached, > @@ -896,8 +927,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > OPT_BOOL('W', "function-context", &opt.funcbody, > N_("show the surrounding function")), > OPT_GROUP(""), > - OPT_CALLBACK('f', NULL, &opt, N_("file"), > - N_("read patterns from file"), file_callback), > + OPT_CALLBACK('f', "patterns-file", &opt, N_("file"), > + N_("read patterns from file"), pattern_file_callback), > + OPT_BOOL(0, "stdin-patterns", &stdin_patterns, > + N_("read patterns from stdin")), > + OPT_CALLBACK(0, "pathspecs-file", &pathspec_argv, N_("file"), > + N_("read pathspecs from file"), pathspec_file_callback), > + OPT_BOOL(0, "stdin-pathspecs", &stdin_pathspecs, > + N_("read pathspecs from stdin")), > { OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"), > N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback }, > { OPTION_CALLBACK, 0, "and", &opt, NULL, > @@ -949,6 +986,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > PARSE_OPT_STOP_AT_NON_OPTION); > grep_commit_pattern_type(pattern_type_arg, &opt); > > + if (stdin_patterns && stdin_pathspecs) > + die(_("cannot use stdin for both patterns and pathspecs")); > + > + if (stdin_patterns) > + pattern_file(&opt, "-"); > + > + if (stdin_pathspecs) > + pathspec_file(&pathspec_argv, "-"); > + > if (use_index && !startup_info->have_repository) { > int fallback = 0; > git_config_get_bool("grep.fallbacktonoindex", &fallback); > @@ -1057,7 +1103,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > parse_pathspec(&pathspec, 0, > PATHSPEC_PREFER_CWD | > (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0), > - prefix, argv + i); > + prefix, > + (pathspec_argv.argc ? pathspec_argv.argv : argv) + i); > pathspec.max_depth = opt.max_depth; > pathspec.recursive = 1; > pathspec.recurse_submodules = !!recurse_submodules; > diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh > index 7d7b396c23..be8e60cfa0 100755 > --- a/t/t7810-grep.sh > +++ b/t/t7810-grep.sh > @@ -549,6 +549,10 @@ test_expect_success 'grep -f, non-existent file' ' > test_must_fail git grep -f patterns > ' > > +text_expect_success 'grep --pathspec-file, non-existent file' ' > + test_must_fail git grep --pathspecs-file pathspecs > +' > + > cat >expected <<EOF > file:foo mmap bar > file:foo_mmap bar > @@ -582,8 +586,8 @@ mmap > vvv > EOF > > -test_expect_success 'grep -f, multiple patterns' ' > - git grep -f patterns >actual && > +test_expect_success 'grep --patterns-file, multiple patterns' ' > + git grep --patterns-file patterns >actual && > test_cmp expected actual > ' > > @@ -621,6 +625,11 @@ test_expect_success 'grep -f, ignore empty lines, read patterns from stdin' ' > test_cmp expected actual > ' > > +test_expect_success 'grep --stdin--patterns' ' > + git grep --stdin-patterns <patterns >actual && > + test_cmp expected actual > +' > + > cat >expected <<EOF > y:y yy > -- > @@ -1125,6 +1134,45 @@ test_expect_success 'grep --no-index descends into repos, but not .git' ' > ) > ' > > +test_expect_success 'setup pathspecs-file tests' ' > +cat >excluded-file <<EOF && > +bar > +EOF > +cat >pathspec-file <<EOF && > +foo > +bar > +baz > +EOF > +cat >unrelated-file <<EOF && > +xyz > +EOF > +git add excluded-file pathspec-file unrelated-file > +' Please indent these commands in and use `-\EOF` for the here-doc. > + > +cat >pathspecs <<EOF > +pathspec-file > +unrelated-file > +EOF > + > +cat >expected <<EOF > +pathspec-file:bar > +EOF Is there any reason why these two aren't part of the setup above? Also, even though 'expect' is the conventional name for the expected output file, it seems like the rest of the test uses 'expected' so I guess it's fine to leave it as is. Thanks, Denton > + > +test_expect_success 'grep --stdin-pathspecs' ' > + git grep --stdin-pathspecs "bar" <pathspecs >actual && > + test_cmp expected actual > +' > + > +test_expect_success 'grep --pathspecs-file with file' ' > + git grep --pathspecs-file pathspecs "bar" >actual && > + test_cmp expected actual > +' > + > +test_expect_success 'grep --pathspec-file with stdin' ' > + git grep --pathspecs-file - "bar" <pathspecs >actual && > + test_cmp expected actual > +' > + > test_expect_success 'setup double-dash tests' ' > cat >double-dash <<EOF && > -- > -- > 2.24.0.432.g9d3f5f5b63-goog >
Emily Shaffer <emilyshaffer@google.com> writes: > Teach 'git grep' how to receive pathspecs via file, and add convenience > flags for receiving pathspecs or patterns via stdin. When I invented the terms "pathspec", "refspec", etc. for this project, I meant them to be collective nouns. A pathspec is a set of pathspec elements, each of which is usually given as a separate argument on the command line. So "Read pathspec from file" is good (not "Read pathspecs from file"). Are you aware of the am/pathspec-from-file topic that has been cooking for a while by now? I do not necessarily agree with the second rationale why many commands may want to learn this option given in that series, but do agree with the first one. A lot of times running underlying "git" command multiple times via xargs invocation can mean quite different thing from giving a large pathspec to a single invocation of "git". But not with "git grep" ;-) So I'd say this is a second class target for the pathspec-from-file theme. Thanks.
Denton Liu <liu.denton@gmail.com> writes: > The reason I ask is because (correct me if I'm wrong) a lot of other git > commands (like add, reset and checkout) don't seem to accept pathspecs > via stdin and could suffer the same problem. xargs seems like a more > general way of solving the problem of long command lines. You contributors who are potentially throwing your own topics into the cauldron, please be paying a bit more attention to other topics already cooking in the pot. I think am/pathspec-from-file wants to go in the general direction. There are things "xargs" is sufficient, and there are things that absolutely requires a single invocation of "git". "grep" is a bit of both. $ git grep -e "$pattern" -- A B C (where A, B and C are hundreds) can be split into three independent invocations of "git grep" via "xargs", essentially running $ git grep -e "$pattern" -- A $ git grep -e "$pattern" -- B $ git grep -e "$pattern" -- C independently. But $ git grep -e P1 -e P2 -e P3 -- A (where each of "-e Pn" in reality may be "-e Pn1 -e Pn2 -e Pn3..." that has hundreds of patterns) cannot be split into separate invocations and keep the same meaning. $ git grep -e P1 -- A $ git grep -e P2 -- A $ git grep -e P3 -- A may show the same lines, but (1) lines with both P1 and P2 would be shown duplicated, and (2) the order of the output would be different from a single invocation looking for all patterns at once. Needless to say, the ability to combine patterns with --all-match, --and, etc., and negate them would mean the list of patterns must be split (even when it makes sense to do so) at the right places. $ git grep -e P1 --and -e P2 cannot be split into two or more invocations, for example.
Junio C Hamano <gitster@pobox.com> writes: > Denton Liu <liu.denton@gmail.com> writes: > >> The reason I ask is because (correct me if I'm wrong) a lot of other git >> commands (like add, reset and checkout) don't seem to accept pathspecs >> via stdin and could suffer the same problem. xargs seems like a more >> general way of solving the problem of long command lines. > > There are things "xargs" is sufficient, and there are things that > absolutely requires a single invocation of "git". "grep" is a bit > of both. Sorry for not addressing your sample commands. The "add", "reset", "checkout" subcommands are fine examples that driving them with "xargs" is 100% sufficient and there is no need for the pathspec-from-file thing; adding it to them would probably not make the system any worse, but it is far far less important than some other "git" subcommands for which "xargs" does not work as a replacement. Thanks.
Hi Junio, On Fri, Nov 22, 2019 at 11:34:17AM +0900, Junio C Hamano wrote: > Denton Liu <liu.denton@gmail.com> writes: > > > The reason I ask is because (correct me if I'm wrong) a lot of other git > > commands (like add, reset and checkout) don't seem to accept pathspecs > > via stdin and could suffer the same problem. xargs seems like a more > > general way of solving the problem of long command lines. > > You contributors who are potentially throwing your own topics into > the cauldron, please be paying a bit more attention to other topics > already cooking in the pot. I think am/pathspec-from-file wants to > go in the general direction. Interesting, I never caught this topic when it went over the list. I guess I should read your What's Cooking emails more thoroughly instead of just scanning for my own contributions. > > There are things "xargs" is sufficient, and there are things that > absolutely requires a single invocation of "git". "grep" is a bit > of both. > > $ git grep -e "$pattern" -- A B C > > (where A, B and C are hundreds) can be split into three independent > invocations of "git grep" via "xargs", essentially running > > $ git grep -e "$pattern" -- A > $ git grep -e "$pattern" -- B > $ git grep -e "$pattern" -- C > > independently. In the above, I was talking about the new --pathspecs-file option in particular. So it looks like you agree with me that the new option doesn't supercede xargs? > > But > > $ git grep -e P1 -e P2 -e P3 -- A > > (where each of "-e Pn" in reality may be "-e Pn1 -e Pn2 -e Pn3..." > that has hundreds of patterns) cannot be split into separate > invocations and keep the same meaning. > > $ git grep -e P1 -- A > $ git grep -e P2 -- A > $ git grep -e P3 -- A > > may show the same lines, but (1) lines with both P1 and P2 would be > shown duplicated, and (2) the order of the output would be different > from a single invocation looking for all patterns at once. We already have `-f` to handle this particular case, no? > > Needless to say, the ability to combine patterns with --all-match, > --and, etc., and negate them would mean the list of patterns must be > split (even when it makes sense to do so) at the right places. > > $ git grep -e P1 --and -e P2 > > cannot be split into two or more invocations, for example. Anyway, thanks for going into detail about this. It makes things a lot more clear. -Denton
On Fri, Nov 22, 2019 at 11:34:17AM +0900, Junio C Hamano wrote: > Denton Liu <liu.denton@gmail.com> writes: > > > The reason I ask is because (correct me if I'm wrong) a lot of other git > > commands (like add, reset and checkout) don't seem to accept pathspecs > > via stdin and could suffer the same problem. xargs seems like a more > > general way of solving the problem of long command lines. > > You contributors who are potentially throwing your own topics into > the cauldron, please be paying a bit more attention to other topics > already cooking in the pot. I think am/pathspec-from-file wants to > go in the general direction. Thanks for pointing it out. It is certainly easy to miss the big picture when you spend your time looking at a bug queue instead of a review queue; we who stand in different places see different things more clearly. I appreciate the reminder to look around a little more. am/pathspec-from-file does solve this issue in the way that we discussed internally, so I'm excited to base a solution for this issue on that branch instead. In this situation - where it is not a related fixup for a branch in next, but the topic does rely on that branch - should I send a series which is based on 'next'? How do I make the dependency clear to you via emailed patch? - Emily
On Fri, Nov 22, 2019 at 02:02:41PM -0800, Emily Shaffer wrote: > On Fri, Nov 22, 2019 at 11:34:17AM +0900, Junio C Hamano wrote: > > Denton Liu <liu.denton@gmail.com> writes: > > > > > The reason I ask is because (correct me if I'm wrong) a lot of other git > > > commands (like add, reset and checkout) don't seem to accept pathspecs > > > via stdin and could suffer the same problem. xargs seems like a more > > > general way of solving the problem of long command lines. > > > > You contributors who are potentially throwing your own topics into > > the cauldron, please be paying a bit more attention to other topics > > already cooking in the pot. I think am/pathspec-from-file wants to > > go in the general direction. > > Thanks for pointing it out. It is certainly easy to miss the big picture > when you spend your time looking at a bug queue instead of a review > queue; we who stand in different places see different things more > clearly. I appreciate the reminder to look around a little more. > > am/pathspec-from-file does solve this issue in the way that we discussed > internally, so I'm excited to base a solution for this issue on that > branch instead. In fact, as I think further, it could be that the solution is "don't change anything, just use --pathspec-from-file". But I think my next question will still be useful to learn the response to anyway. :) > In this situation - where it is not a related fixup for a branch in > next, but the topic does rely on that branch - should I send a series > which is based on 'next'? How do I make the dependency clear to you via > emailed patch? > > - Emily
Emily Shaffer <emilyshaffer@google.com> writes: > In this situation - where it is not a related fixup for a branch in > next, but the topic does rely on that branch - should I send a series > which is based on 'next'? How do I make the dependency clear to you via > emailed patch? If you need to use a particular new feature from a topic (or two), not depending on 'next' but either building on (1) directly that topic, or (2) a local merge you create across the 'master' branch and the topic(s) you want to depend on would be good. Whichever way you go, leaving a note either on the cover letter or after the three-dash line to say what you did would always be helpful to the reviewers. In this particular case, I think (1) would work better for you, as the topic you want to use builds directly on the latest release so you won't be missing anything that is not yet on that topic but already in 'master' at this point. Thanks.
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index c89fb569e3..fb588a3846 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -24,7 +24,8 @@ SYNOPSIS [-A <post-context>] [-B <pre-context>] [-C <context>] [-W | --function-context] [--threads <num>] - [-f <file>] [-e] <pattern> + [--pathspecs-file <file>] [--stdin-pathspecs] + [-f | --patterns-file <file>] [--stdin-patterns] [-e] <pattern> [--and|--or|--not|(|)|-e <pattern>...] [--recurse-submodules] [--parent-basename <basename>] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...] @@ -270,7 +271,22 @@ providing this option will cause it to die. See `grep.threads` in 'CONFIGURATION' for more information. -f <file>:: +--patterns-file <file>:: Read patterns from <file>, one per line. + +--stdin-patterns:: + Read patterns from stdin, one per line. Equivalent to `-f -`. Cannot be + combined with `--stdin-pathspecs`. + +--pathspecs-file <file>:: + Read pathspecs from <file>, one per line. Pathspecs passed in the + argument list will be ignored in this mode. + +--stdin-pathspecs:: + Read pathspecs from stdin, one per line. Pathspecs passed in the + argument list will be ignored in this mode. Equivalent to + `--pathspecs-file -`. + + Passing the pattern via <file> allows for providing a search pattern containing a \0. diff --git a/builtin/grep.c b/builtin/grep.c index 50ce8d9461..f6f9f84002 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -720,16 +720,38 @@ static int context_callback(const struct option *opt, const char *arg, return 0; } -static int file_callback(const struct option *opt, const char *arg, int unset) +static int pathspec_file(struct argv_array *pathspec_argv, const char *arg) +{ + struct strbuf sb = STRBUF_INIT; + + if (!strcmp(arg, "-")) { + while (strbuf_getline(&sb, stdin) != EOF) + argv_array_push(pathspec_argv, sb.buf); + } + else { + if (!strbuf_read_file(&sb, arg, 0)) + die_errno(_("cannot open '%s'"), arg); + argv_array_split(pathspec_argv, sb.buf); + } + + strbuf_release(&sb); + return 0; +} + +static int pathspec_file_callback(const struct option *opt, + const char *arg, int unset) +{ + BUG_ON_OPT_NEG(unset); + return pathspec_file(opt->value, arg); +} + +static int pattern_file(struct grep_opt *grep_opt, const char *arg) { - struct grep_opt *grep_opt = opt->value; int from_stdin; FILE *patterns; int lno = 0; struct strbuf sb = STRBUF_INIT; - BUG_ON_OPT_NEG(unset); - from_stdin = !strcmp(arg, "-"); patterns = from_stdin ? stdin : fopen(arg, "r"); if (!patterns) @@ -748,6 +770,12 @@ static int file_callback(const struct option *opt, const char *arg, int unset) return 0; } +static int pattern_file_callback(const struct option *opt, const char *arg, int unset) +{ + BUG_ON_OPT_NEG(unset); + return pattern_file(opt->value, arg); +} + static int not_callback(const struct option *opt, const char *arg, int unset) { struct grep_opt *grep_opt = opt->value; @@ -803,12 +831,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) struct grep_opt opt; struct object_array list = OBJECT_ARRAY_INIT; struct pathspec pathspec; + struct argv_array pathspec_argv; struct string_list path_list = STRING_LIST_INIT_NODUP; int i; int dummy; int use_index = 1; int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; int allow_revs; + int stdin_patterns = 0; + int stdin_pathspecs = 0; struct option options[] = { OPT_BOOL(0, "cached", &cached, @@ -896,8 +927,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix) OPT_BOOL('W', "function-context", &opt.funcbody, N_("show the surrounding function")), OPT_GROUP(""), - OPT_CALLBACK('f', NULL, &opt, N_("file"), - N_("read patterns from file"), file_callback), + OPT_CALLBACK('f', "patterns-file", &opt, N_("file"), + N_("read patterns from file"), pattern_file_callback), + OPT_BOOL(0, "stdin-patterns", &stdin_patterns, + N_("read patterns from stdin")), + OPT_CALLBACK(0, "pathspecs-file", &pathspec_argv, N_("file"), + N_("read pathspecs from file"), pathspec_file_callback), + OPT_BOOL(0, "stdin-pathspecs", &stdin_pathspecs, + N_("read pathspecs from stdin")), { OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"), N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback }, { OPTION_CALLBACK, 0, "and", &opt, NULL, @@ -949,6 +986,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) PARSE_OPT_STOP_AT_NON_OPTION); grep_commit_pattern_type(pattern_type_arg, &opt); + if (stdin_patterns && stdin_pathspecs) + die(_("cannot use stdin for both patterns and pathspecs")); + + if (stdin_patterns) + pattern_file(&opt, "-"); + + if (stdin_pathspecs) + pathspec_file(&pathspec_argv, "-"); + if (use_index && !startup_info->have_repository) { int fallback = 0; git_config_get_bool("grep.fallbacktonoindex", &fallback); @@ -1057,7 +1103,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD | (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0), - prefix, argv + i); + prefix, + (pathspec_argv.argc ? pathspec_argv.argv : argv) + i); pathspec.max_depth = opt.max_depth; pathspec.recursive = 1; pathspec.recurse_submodules = !!recurse_submodules; diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 7d7b396c23..be8e60cfa0 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -549,6 +549,10 @@ test_expect_success 'grep -f, non-existent file' ' test_must_fail git grep -f patterns ' +text_expect_success 'grep --pathspec-file, non-existent file' ' + test_must_fail git grep --pathspecs-file pathspecs +' + cat >expected <<EOF file:foo mmap bar file:foo_mmap bar @@ -582,8 +586,8 @@ mmap vvv EOF -test_expect_success 'grep -f, multiple patterns' ' - git grep -f patterns >actual && +test_expect_success 'grep --patterns-file, multiple patterns' ' + git grep --patterns-file patterns >actual && test_cmp expected actual ' @@ -621,6 +625,11 @@ test_expect_success 'grep -f, ignore empty lines, read patterns from stdin' ' test_cmp expected actual ' +test_expect_success 'grep --stdin--patterns' ' + git grep --stdin-patterns <patterns >actual && + test_cmp expected actual +' + cat >expected <<EOF y:y yy -- @@ -1125,6 +1134,45 @@ test_expect_success 'grep --no-index descends into repos, but not .git' ' ) ' +test_expect_success 'setup pathspecs-file tests' ' +cat >excluded-file <<EOF && +bar +EOF +cat >pathspec-file <<EOF && +foo +bar +baz +EOF +cat >unrelated-file <<EOF && +xyz +EOF +git add excluded-file pathspec-file unrelated-file +' + +cat >pathspecs <<EOF +pathspec-file +unrelated-file +EOF + +cat >expected <<EOF +pathspec-file:bar +EOF + +test_expect_success 'grep --stdin-pathspecs' ' + git grep --stdin-pathspecs "bar" <pathspecs >actual && + test_cmp expected actual +' + +test_expect_success 'grep --pathspecs-file with file' ' + git grep --pathspecs-file pathspecs "bar" >actual && + test_cmp expected actual +' + +test_expect_success 'grep --pathspec-file with stdin' ' + git grep --pathspecs-file - "bar" <pathspecs >actual && + test_cmp expected actual +' + test_expect_success 'setup double-dash tests' ' cat >double-dash <<EOF && --
Teach 'git grep' how to receive pathspecs via file, and add convenience flags for receiving pathspecs or patterns via stdin. In some cases, the list of pathspecs to 'git grep' is longer than the command line allows, or long enough that a user becomes fatigued reading the command. In other cases, a user may wish to write a script or oneliner which generates a list of files to 'git grep' against. In both scenarios, a reasonable workaround is to provide the list of pathspecs via a file or stdin. Before this change, it was already possible to receive patterns via stdin by using the argument '-f -'. Now that there are two arguments which could both use stdin, help safeguard the user from doing so by providing convenience '--stdin-patterns' and '--stdin-pathspecs' which are mutually exclusive. (It is still possible to ask for '-f - --pathspecs-file -', though.) Finally, in order to match '--pathspecs-file', give '-f' a long form '--patterns-file' too. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Providing too many pathspecs for the argument list to handle is causing some gore out in the wild with git-secrets: https://github.com/awslabs/git-secrets/issues/78 A short arg for --pathspecs-file didn't occur to me because grep has many short args already, but if anyone has a suggestion I'll happily add it. The pathspec list is gathered into an argv_array because parse_pathspec is only called once, against the entire list of pathspecs. Perhaps it makes sense to concatenate the paths given on the command line with the paths given via file, but since it was nontrivial to merge two argv_arrays and I wasn't certain it was the right choice, I didn't do so. I'd probably implement that by adding a utility function to argv-array. ('-f somepatterns.txt -e otherpattern' concatenates the patterns, by the way.) - Emily Documentation/git-grep.txt | 18 ++++++++++- builtin/grep.c | 61 +++++++++++++++++++++++++++++++++----- t/t7810-grep.sh | 52 ++++++++++++++++++++++++++++++-- 3 files changed, 121 insertions(+), 10 deletions(-)