Message ID | 20191204203911.237056-1-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] grep: support the --pathspec-from-file option | expand |
Hi Emily, On Wed, Dec 04, 2019 at 12:39:11PM -0800, Emily Shaffer wrote: > @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for > more search backends, until then we'll die when the pattern type in > question doesn't support them. > > +--pathspec-from-file <file>:: > + Read pathspec from <file> instead of the command line. If `<file>` is > + exactly `-` then standard input is used; standard input cannot be used > + for both --patterns-from-file and --pathspec-from-file. Pathspec elements > + are separated by LF or CR/LF. Pathspec elements can be quoted as > + explained for the configuration variable `core.quotePath` (see > + linkgit:git-config[1]). See also `--pathspec-file-nul` and global > + `--literal-pathspecs`. > + > +--pathspec-file-nul:: > + Only meaningful with `--pathspec-from-file`. Pathspec elements are > + separated with NUL character and all other characters are taken > + literally (including newlines and quotes). Does it make sense to have a corresponding --patterns-file-nul option? As in, is it possible for patterns to contain inline newlines? If it's not possible, then that option probably isn't necessary. > + > -e:: > The next parameter is the pattern. This option has to be > used for patterns starting with `-` and should be used in > @@ -1125,6 +1129,44 @@ test_expect_success 'grep --no-index descends into repos, but not .git' ' > ) > ' > > +test_expect_success 'setup pathspecs-file tests' ' > +cat >excluded-file <<EOF && > +bar > +EOF > +cat >pathspec-file <<EOF && > +foo > +bar > +baz > +EOF > +cat >unrelated-file <<EOF && > +xyz > +EOF > +git add excluded-file pathspec-file unrelated-file > +' Could you please change these here-docs to be <<-\EOF and then indent the test case? > + > +cat >pathspecs <<EOF > +pathspec-file > +unrelated-file > +EOF > + > +cat >expected <<EOF > +pathspec-file:bar > +EOF In an earlier email, I was wondering aloud why these two blocks were outside of the test case above. I think the answer to that is that you're following the existing style of the test case. In that case, could I pester you to do some test clean up while you're at it? I think it'd be nice to move the cats into their respective test cases (with <<-\EOF) and also rename the files from 'expected' to 'expect'. But otherwise, I think it's fine as is as well. Thanks, Denton > + > +test_expect_success 'grep --pathspec-from-file with file' ' > + git grep --pathspec-from-file pathspecs "bar" >actual && > + test_cmp expected actual > +' > + > +test_expect_success 'grep --pathspec-file with stdin' ' > + git grep --pathspec-from-file - "bar" <pathspecs >actual && > + test_cmp expected actual > +' > + > +test_expect_success 'grep with two stdin inputs fails' ' > + test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs > +' > + > test_expect_success 'setup double-dash tests' ' > cat >double-dash <<EOF && > -- > -- > 2.24.0.393.g34dc348eaf-goog >
Denton Liu <liu.denton@gmail.com> writes: >> +--pathspec-file-nul:: >> + Only meaningful with `--pathspec-from-file`. Pathspec elements are >> + separated with NUL character and all other characters are taken >> + literally (including newlines and quotes). > > Does it make sense to have a corresponding --patterns-file-nul option? I do not think so. "grep" is always record oriented and the record separter is the LF, so patterns file can safely be delimited with LF. >> +test_expect_success 'setup pathspecs-file tests' ' >> +cat >excluded-file <<EOF && >> +bar >> +EOF >> ... >> +git add excluded-file pathspec-file unrelated-file >> +' > > Could you please change these here-docs to be <<-\EOF and then indent > the test case? Good suggestion. test_expect_success 'setup ...' ' cat >excluded-file <<-\EOF && bar EOF ... git add ... ' If each line in these files consists of a short single token (which seems to be the case), perhaps consider using test_write_lines? test_write_lines >excluded-file bar && test_write_lines >pathspec-file foo bar baz && test_write_lines >unrelated-file xyz && test_write_lines >pathspecs pathspec-file unrelated-file &&
Emily Shaffer <emilyshaffer@google.com> writes: > Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the > documentation accordingly. > > This changes enables 'git grep' to receive the pathspec from a file by > specifying the path, or from stdin by specifying '-' as the path. This > matches the previous functionality of '-f', so the documentation of '-f' > has been expanded to describe this functionality. To let '-f' match the > new '--pathspec-from-file' option, also teach a '--patterns-from-file' > long name to '-f'. > > Since there are now two arguments which can attempt to read from stdin, > add a safeguard to check whether the user specified '-' for both of > them. It is still possible for a user to pass '/dev/stdin' to one or > both arguments at present; we do not explicitly check. OK. I guess this is good enough at least for now and possibly forever as that falls into the "doctor, it hurts when I do this" category. > Refactored to use am/pathspec-from-file. This changes the implementation > significantly since v1, but the testcases mostly remain the same. I am hoping that this topic, and Alexandr's "add" and "checkout" stuff, would help establishing a good pattern for other commands to follow. > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index c89fb569e3..56b1c5a302 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -24,7 +24,8 @@ SYNOPSIS > [-A <post-context>] [-B <pre-context>] [-C <context>] > [-W | --function-context] > [--threads <num>] > - [-f <file>] [-e] <pattern> > + [-f | --patterns-from-file <file>] [-e] <pattern> OK. > + [--pathspec-from-file=<file> [--pathspec-file-nul]] OK. > @@ -270,7 +271,10 @@ providing this option will cause it to die. > See `grep.threads` in 'CONFIGURATION' for more information. > > -f <file>:: > - Read patterns from <file>, one per line. > +--patterns-from-file <file>:: > + Read patterns from <file>, one per line. If `<file>` is exactly `-` then > + standard input is used; standard input cannot be used for both > + --patterns-from-file and --pathspec-from-file. Makes sense---otherwise they would compete and we cannot know which kind each line in the standard input is ;-) > @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for > more search backends, until then we'll die when the pattern type in > question doesn't support them. > > +--pathspec-from-file <file>:: > + Read pathspec from <file> instead of the command line. If `<file>` is > + exactly `-` then standard input is used; standard input cannot be used > + for both --patterns-from-file and --pathspec-from-file. Pathspec elements > + are separated by LF or CR/LF. Pathspec elements can be quoted as > + explained for the configuration variable `core.quotePath` (see > + linkgit:git-config[1]). See also `--pathspec-file-nul` and global > + `--literal-pathspecs`. > + > +--pathspec-file-nul:: > + Only meaningful with `--pathspec-from-file`. Pathspec elements are > + separated with NUL character and all other characters are taken > + literally (including newlines and quotes). > + I think these matches the ones in git-reset.txt from the earlier work by Alexandr's, which is good. > diff --git a/builtin/grep.c b/builtin/grep.c > index 50ce8d9461..54ba991c42 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -31,6 +31,7 @@ static char const * const grep_usage[] = { > }; > > static int recurse_submodules; > +static int patterns_from_stdin, pathspec_from_stdin; > > #define GREP_NUM_THREADS_DEFAULT 8 > static int num_threads; > @@ -723,15 +724,18 @@ static int context_callback(const struct option *opt, const char *arg, > static int file_callback(const struct option *opt, const char *arg, int unset) > { Shouldn't this be renamed? Earlier, the only thing that can be taken from a file was the patterns, but now a file can be given to specify a set of pathspec elements. "patterns_file_callback()" or something like taht? > struct grep_opt *grep_opt = opt->value; > - int from_stdin; > FILE *patterns; > int lno = 0; > struct strbuf sb = STRBUF_INIT; > > BUG_ON_OPT_NEG(unset); > > - from_stdin = !strcmp(arg, "-"); > - patterns = from_stdin ? stdin : fopen(arg, "r"); > + patterns_from_stdin = !strcmp(arg, "-"); Totally outside the scope of this patch, but we may want to introduce int is_stdin_cmdline_marker(const char *arg) { return !strcmp(arg, "-"); } which later can be taught also about "/dev/stdin". Even outside the pathspec-from-file option, I think "diff --no-index" also has some special-case provision for treating "-" as "input coming from the standard input string", which would benefit from such a helper. > + if (patterns_from_stdin && pathspec_from_stdin) > + die(_("cannot specify both patterns and pathspec via stdin")); It's kind of sad that we need to check this in this callback and also inside cmd_grep() top-level we will see further down... > + if (pathspec_from_file) { > + if (pathspec.nr) > + die(_("--pathspec-from-file is incompatible with pathspec arguments")); > + > + pathspec_from_stdin = !strcmp(pathspec_from_file, "-"); > + > + if (patterns_from_stdin && pathspec_from_stdin) > + die(_("cannot specify both patterns and pathspec via stdin")); ... here. Thanks.
I'm excited to see someone else join my effort, thanks for continuing my effort! Also, less work for me :) On 04.12.2019 21:39, Emily Shaffer wrote: > static int file_callback(const struct option *opt, const char *arg, int unset) > { > struct grep_opt *grep_opt = opt->value; > - int from_stdin; > FILE *patterns; > int lno = 0; > struct strbuf sb = STRBUF_INIT; > > BUG_ON_OPT_NEG(unset); > > - from_stdin = !strcmp(arg, "-"); > - patterns = from_stdin ? stdin : fopen(arg, "r"); > + patterns_from_stdin = !strcmp(arg, "-"); > + > + if (patterns_from_stdin && pathspec_from_stdin) To my understanding, this check will not work as expected. `file_callback` will be called at the moment of parsing args. `pathspec_from_stdin` is only initialized later. Maybe it would be better to convert `file_callback` into a regular function and call it after the options were parsed, similar to how `pathspec_from_file` is parsed later? This will also allow to move global variables into local scope and resolve other small issues raised by other reviewers. > +test_expect_success 'grep with two stdin inputs fails' ' > + test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs > +' > + It is usually a good idea to test for specific error, like this: test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs 2>err && test_i18ngrep "cannot specify both patterns and pathspec via stdin" err &&
Hi Emily, On Wed, 4 Dec 2019, Emily Shaffer wrote: > diff --git a/builtin/grep.c b/builtin/grep.c > index 50ce8d9461..54ba991c42 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -809,6 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > int use_index = 1; > int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; > int allow_revs; > + char *pathspec_from_file; > + int pathspec_file_nul; > > struct option options[] = { I need this on top to make it work without problems (I have not yet triggered the CI build, but I will in a moment): -- snipsnap -- Subject: [PATCH] fixup??? grep: support the --pathspec-from-file option We must not use those variables uninitialized, this causes CI build failures left and right (see e.g. https://dev.azure.com/gitgitgadget/git/_build/results?buildId=22821) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- builtin/grep.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 54ba991c425..c2aa1aeebd6 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -813,8 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) int use_index = 1; int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; int allow_revs; - char *pathspec_from_file; - int pathspec_file_nul; + char *pathspec_from_file = NULL; + int pathspec_file_nul = 0; struct option options[] = { OPT_BOOL(0, "cached", &cached, -- 2.24.0.windows.2.611.ge9aced84530
On Wed, Dec 04, 2019 at 12:39:11PM -0800, Emily Shaffer wrote: > Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the > documentation accordingly. > diff --git a/builtin/grep.c b/builtin/grep.c > index 50ce8d9461..54ba991c42 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -809,6 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > int use_index = 1; > int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; > int allow_revs; > + char *pathspec_from_file; > + int pathspec_file_nul; Uninitialized variables ... > > struct option options[] = { > OPT_BOOL(0, "cached", &cached, > @@ -896,8 +902,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > OPT_BOOL('W', "function-context", &opt.funcbody, > N_("show the surrounding function")), > OPT_GROUP(""), > - OPT_CALLBACK('f', NULL, &opt, N_("file"), > + OPT_CALLBACK('f', "patterns-from-file", &opt, N_("file"), > N_("read patterns from file"), file_callback), > + OPT_PATHSPEC_FROM_FILE(&pathspec_from_file), > + OPT_PATHSPEC_FILE_NUL(&pathspec_file_nul), > { OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"), > N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback }, > { OPTION_CALLBACK, 0, "and", &opt, NULL, > @@ -1062,6 +1070,23 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > pathspec.recursive = 1; > pathspec.recurse_submodules = !!recurse_submodules; > > + if (pathspec_from_file) { ... one of which is checked here ... > + if (pathspec.nr) > + die(_("--pathspec-from-file is incompatible with pathspec arguments")); ... causing a lot of tests to fail in our OSX CI builds (but, strangely, in none of the Linux builds), either with the above error message, e.g.: expecting success of 4014.150 'format-patch --pretty=mboxrd': [...] ++git format-patch --pretty=mboxrd --stdout -1 01b96447182ff76f30268fb82d3f9e9e09b14e6c~1..01b96447182ff76f30268fb82d3f9e9e09b14e6c ++git grep -h --no-index -A11 '^>From could trip up a loose mbox parser' patch fatal: --pathspec-from-file is incompatible with pathspec arguments error: last command exited with $?=128 or with: expecting success of 7814.2 'grep correctly finds patterns in a submodule': [...] ++git grep -e '(3|4)' --recurse-submodules fatal: could not open '����?' for reading: No such file or directory error: last command exited with $?=128 > + pathspec_from_stdin = !strcmp(pathspec_from_file, "-"); > + > + if (patterns_from_stdin && pathspec_from_stdin) > + die(_("cannot specify both patterns and pathspec via stdin")); > + > + parse_pathspec_file(&pathspec, 0, PATHSPEC_PREFER_CWD | > + (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0), > + prefix, pathspec_from_file, > + pathspec_file_nul); The other uninitialized variable is passed as parameter above and checked below, but I haven't seen any test failures caused by it. > + } else if (pathspec_file_nul) { > + die(_("--pathspec-file-nul requires --pathspec-from-file")); > + } > + > if (list.nr || cached || show_in_pager) { > if (num_threads > 1) > warning(_("invalid option combination, ignoring --threads"));
On Wed, Dec 04, 2019 at 02:24:07PM -0800, Junio C Hamano wrote: > Emily Shaffer <emilyshaffer@google.com> writes: > > > Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the > > documentation accordingly. > > > > This changes enables 'git grep' to receive the pathspec from a file by > > specifying the path, or from stdin by specifying '-' as the path. This > > matches the previous functionality of '-f', so the documentation of '-f' > > has been expanded to describe this functionality. To let '-f' match the > > new '--pathspec-from-file' option, also teach a '--patterns-from-file' > > long name to '-f'. > > > > Since there are now two arguments which can attempt to read from stdin, > > add a safeguard to check whether the user specified '-' for both of > > them. It is still possible for a user to pass '/dev/stdin' to one or > > both arguments at present; we do not explicitly check. > > OK. I guess this is good enough at least for now and possibly > forever as that falls into the "doctor, it hurts when I do this" > category. > > > Refactored to use am/pathspec-from-file. This changes the implementation > > significantly since v1, but the testcases mostly remain the same. > > I am hoping that this topic, and Alexandr's "add" and "checkout" > stuff, would help establishing a good pattern for other commands to > follow. > > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > > index c89fb569e3..56b1c5a302 100644 > > --- a/Documentation/git-grep.txt > > +++ b/Documentation/git-grep.txt > > @@ -24,7 +24,8 @@ SYNOPSIS > > [-A <post-context>] [-B <pre-context>] [-C <context>] > > [-W | --function-context] > > [--threads <num>] > > - [-f <file>] [-e] <pattern> > > + [-f | --patterns-from-file <file>] [-e] <pattern> > > OK. > > > + [--pathspec-from-file=<file> [--pathspec-file-nul]] > > OK. > > > @@ -270,7 +271,10 @@ providing this option will cause it to die. > > See `grep.threads` in 'CONFIGURATION' for more information. > > > > -f <file>:: > > - Read patterns from <file>, one per line. > > +--patterns-from-file <file>:: > > + Read patterns from <file>, one per line. If `<file>` is exactly `-` then > > + standard input is used; standard input cannot be used for both > > + --patterns-from-file and --pathspec-from-file. > > Makes sense---otherwise they would compete and we cannot know which > kind each line in the standard input is ;-) At one point Jonathan Nieder had suggested to me that it might be nice to watch for "--" in the standard input. But I think I wasn't confident in the idea that all commands which take arbitrary number of arguments, and pathspec expect their arg list to end in <arg...> -- <pathspec>. > > > @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for > > more search backends, until then we'll die when the pattern type in > > question doesn't support them. > > > > +--pathspec-from-file <file>:: > > + Read pathspec from <file> instead of the command line. If `<file>` is > > + exactly `-` then standard input is used; standard input cannot be used > > + for both --patterns-from-file and --pathspec-from-file. Pathspec elements > > + are separated by LF or CR/LF. Pathspec elements can be quoted as > > + explained for the configuration variable `core.quotePath` (see > > + linkgit:git-config[1]). See also `--pathspec-file-nul` and global > > + `--literal-pathspecs`. > > + > > +--pathspec-file-nul:: > > + Only meaningful with `--pathspec-from-file`. Pathspec elements are > > + separated with NUL character and all other characters are taken > > + literally (including newlines and quotes). > > + > > I think these matches the ones in git-reset.txt from the earlier > work by Alexandr's, which is good. Yes, I took them verbatim. > > > diff --git a/builtin/grep.c b/builtin/grep.c > > index 50ce8d9461..54ba991c42 100644 > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -31,6 +31,7 @@ static char const * const grep_usage[] = { > > }; > > > > static int recurse_submodules; > > +static int patterns_from_stdin, pathspec_from_stdin; > > > > #define GREP_NUM_THREADS_DEFAULT 8 > > static int num_threads; > > @@ -723,15 +724,18 @@ static int context_callback(const struct option *opt, const char *arg, > > static int file_callback(const struct option *opt, const char *arg, int unset) > > { > > Shouldn't this be renamed? Earlier, the only thing that can be > taken from a file was the patterns, but now a file can be given > to specify a set of pathspec elements. "patterns_file_callback()" > or something like taht? Done. I was hesitant to touch the other code; I guess it is fine. > > > struct grep_opt *grep_opt = opt->value; > > - int from_stdin; > > FILE *patterns; > > int lno = 0; > > struct strbuf sb = STRBUF_INIT; > > > > BUG_ON_OPT_NEG(unset); > > > > - from_stdin = !strcmp(arg, "-"); > > - patterns = from_stdin ? stdin : fopen(arg, "r"); > > + patterns_from_stdin = !strcmp(arg, "-"); > > Totally outside the scope of this patch, but we may want to > introduce > > int is_stdin_cmdline_marker(const char *arg) > { > return !strcmp(arg, "-"); > } > > which later can be taught also about "/dev/stdin". Even outside the > pathspec-from-file option, I think "diff --no-index" also has some > special-case provision for treating "-" as "input coming from the > standard input string", which would benefit from such a helper. Agreed; that's probably a handy thing to put into parse-options.h. I suggest we think of this next time we are looking for Outreachy microprojects or similar. I'll write it down somewhere. > > > + if (patterns_from_stdin && pathspec_from_stdin) > > + die(_("cannot specify both patterns and pathspec via stdin")); > > It's kind of sad that we need to check this in this callback and > also inside cmd_grep() top-level we will see further down... Another reviewer suggested not using a callback and instead waiting until after the entire argparse to do file reads; this would let us avoid the weird double check here. I think I will do it. > > > + if (pathspec_from_file) { > > + if (pathspec.nr) > > + die(_("--pathspec-from-file is incompatible with pathspec arguments")); > > + > > + pathspec_from_stdin = !strcmp(pathspec_from_file, "-"); > > + > > + if (patterns_from_stdin && pathspec_from_stdin) > > + die(_("cannot specify both patterns and pathspec via stdin")); > > ... here. > > Thanks.
On Thu, Dec 05, 2019 at 12:58:19PM +0100, Alexandr Miloslavskiy wrote: > I'm excited to see someone else join my effort, thanks for continuing my > effort! Also, less work for me :) > > On 04.12.2019 21:39, Emily Shaffer wrote: > > > static int file_callback(const struct option *opt, const char *arg, int unset) > > { > > struct grep_opt *grep_opt = opt->value; > > - int from_stdin; > > FILE *patterns; > > int lno = 0; > > struct strbuf sb = STRBUF_INIT; > > BUG_ON_OPT_NEG(unset); > > - from_stdin = !strcmp(arg, "-"); > > - patterns = from_stdin ? stdin : fopen(arg, "r"); > > + patterns_from_stdin = !strcmp(arg, "-"); > > + > > + if (patterns_from_stdin && pathspec_from_stdin) > > To my understanding, this check will not work as expected. `file_callback` > will be called at the moment of parsing args. `pathspec_from_stdin` is only > initialized later. > > Maybe it would be better to convert `file_callback` into a regular function > and call it after the options were parsed, similar to how > `pathspec_from_file` is parsed later? > > This will also allow to move global variables into local scope and resolve > other small issues raised by other reviewers. Yeah, I think this is a good idea for the reasons you stated, so I'll do so. > > > +test_expect_success 'grep with two stdin inputs fails' ' > > + test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs > > +' > > + > > It is usually a good idea to test for specific error, like this: > > test_must_fail git grep --pathspec-from-file - --patterns-from-file - > <pathspecs 2>err && > test_i18ngrep "cannot specify both patterns and pathspec via stdin" err && Sure. Thanks.
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index c89fb569e3..56b1c5a302 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -24,7 +24,8 @@ SYNOPSIS [-A <post-context>] [-B <pre-context>] [-C <context>] [-W | --function-context] [--threads <num>] - [-f <file>] [-e] <pattern> + [-f | --patterns-from-file <file>] [-e] <pattern> + [--pathspec-from-file=<file> [--pathspec-file-nul]] [--and|--or|--not|(|)|-e <pattern>...] [--recurse-submodules] [--parent-basename <basename>] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...] @@ -270,7 +271,10 @@ providing this option will cause it to die. See `grep.threads` in 'CONFIGURATION' for more information. -f <file>:: - Read patterns from <file>, one per line. +--patterns-from-file <file>:: + Read patterns from <file>, one per line. If `<file>` is exactly `-` then + standard input is used; standard input cannot be used for both + --patterns-from-file and --pathspec-from-file. + Passing the pattern via <file> allows for providing a search pattern containing a \0. @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for more search backends, until then we'll die when the pattern type in question doesn't support them. +--pathspec-from-file <file>:: + Read pathspec from <file> instead of the command line. If `<file>` is + exactly `-` then standard input is used; standard input cannot be used + for both --patterns-from-file and --pathspec-from-file. Pathspec elements + are separated by LF or CR/LF. Pathspec elements can be quoted as + explained for the configuration variable `core.quotePath` (see + linkgit:git-config[1]). See also `--pathspec-file-nul` and global + `--literal-pathspecs`. + +--pathspec-file-nul:: + Only meaningful with `--pathspec-from-file`. Pathspec elements are + separated with NUL character and all other characters are taken + literally (including newlines and quotes). + -e:: The next parameter is the pattern. This option has to be used for patterns starting with `-` and should be used in diff --git a/builtin/grep.c b/builtin/grep.c index 50ce8d9461..54ba991c42 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -31,6 +31,7 @@ static char const * const grep_usage[] = { }; static int recurse_submodules; +static int patterns_from_stdin, pathspec_from_stdin; #define GREP_NUM_THREADS_DEFAULT 8 static int num_threads; @@ -723,15 +724,18 @@ static int context_callback(const struct option *opt, const char *arg, static int file_callback(const struct option *opt, const char *arg, int unset) { struct grep_opt *grep_opt = opt->value; - int from_stdin; FILE *patterns; int lno = 0; struct strbuf sb = STRBUF_INIT; BUG_ON_OPT_NEG(unset); - from_stdin = !strcmp(arg, "-"); - patterns = from_stdin ? stdin : fopen(arg, "r"); + patterns_from_stdin = !strcmp(arg, "-"); + + if (patterns_from_stdin && pathspec_from_stdin) + die(_("cannot specify both patterns and pathspec via stdin")); + + patterns = patterns_from_stdin ? stdin : fopen(arg, "r"); if (!patterns) die_errno(_("cannot open '%s'"), arg); while (strbuf_getline(&sb, patterns) == 0) { @@ -742,7 +746,7 @@ static int file_callback(const struct option *opt, const char *arg, int unset) append_grep_pat(grep_opt, sb.buf, sb.len, arg, ++lno, GREP_PATTERN); } - if (!from_stdin) + if (!patterns_from_stdin) fclose(patterns); strbuf_release(&sb); return 0; @@ -809,6 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) int use_index = 1; int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; int allow_revs; + char *pathspec_from_file; + int pathspec_file_nul; struct option options[] = { OPT_BOOL(0, "cached", &cached, @@ -896,8 +902,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix) OPT_BOOL('W', "function-context", &opt.funcbody, N_("show the surrounding function")), OPT_GROUP(""), - OPT_CALLBACK('f', NULL, &opt, N_("file"), + OPT_CALLBACK('f', "patterns-from-file", &opt, N_("file"), N_("read patterns from file"), file_callback), + OPT_PATHSPEC_FROM_FILE(&pathspec_from_file), + OPT_PATHSPEC_FILE_NUL(&pathspec_file_nul), { OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"), N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback }, { OPTION_CALLBACK, 0, "and", &opt, NULL, @@ -1062,6 +1070,23 @@ int cmd_grep(int argc, const char **argv, const char *prefix) pathspec.recursive = 1; pathspec.recurse_submodules = !!recurse_submodules; + if (pathspec_from_file) { + if (pathspec.nr) + die(_("--pathspec-from-file is incompatible with pathspec arguments")); + + pathspec_from_stdin = !strcmp(pathspec_from_file, "-"); + + if (patterns_from_stdin && pathspec_from_stdin) + die(_("cannot specify both patterns and pathspec via stdin")); + + parse_pathspec_file(&pathspec, 0, PATHSPEC_PREFER_CWD | + (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0), + prefix, pathspec_from_file, + pathspec_file_nul); + } else if (pathspec_file_nul) { + die(_("--pathspec-file-nul requires --pathspec-from-file")); + } + if (list.nr || cached || show_in_pager) { if (num_threads > 1) warning(_("invalid option combination, ignoring --threads")); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 7d7b396c23..355890a72a 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -549,6 +549,10 @@ test_expect_success 'grep -f, non-existent file' ' test_must_fail git grep -f patterns ' +text_expect_success 'grep --pathspec-from-file, non-existent file' ' + test_must_fail git grep --pathspec-from-file pathspecs +' + cat >expected <<EOF file:foo mmap bar file:foo_mmap bar @@ -582,8 +586,8 @@ mmap vvv EOF -test_expect_success 'grep -f, multiple patterns' ' - git grep -f patterns >actual && +test_expect_success 'grep --patterns-from-file, multiple patterns' ' + git grep --patterns-from-file patterns >actual && test_cmp expected actual ' @@ -1125,6 +1129,44 @@ test_expect_success 'grep --no-index descends into repos, but not .git' ' ) ' +test_expect_success 'setup pathspecs-file tests' ' +cat >excluded-file <<EOF && +bar +EOF +cat >pathspec-file <<EOF && +foo +bar +baz +EOF +cat >unrelated-file <<EOF && +xyz +EOF +git add excluded-file pathspec-file unrelated-file +' + +cat >pathspecs <<EOF +pathspec-file +unrelated-file +EOF + +cat >expected <<EOF +pathspec-file:bar +EOF + +test_expect_success 'grep --pathspec-from-file with file' ' + git grep --pathspec-from-file pathspecs "bar" >actual && + test_cmp expected actual +' + +test_expect_success 'grep --pathspec-file with stdin' ' + git grep --pathspec-from-file - "bar" <pathspecs >actual && + test_cmp expected actual +' + +test_expect_success 'grep with two stdin inputs fails' ' + test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs +' + test_expect_success 'setup double-dash tests' ' cat >double-dash <<EOF && --
Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the documentation accordingly. This changes enables 'git grep' to receive the pathspec from a file by specifying the path, or from stdin by specifying '-' as the path. This matches the previous functionality of '-f', so the documentation of '-f' has been expanded to describe this functionality. To let '-f' match the new '--pathspec-from-file' option, also teach a '--patterns-from-file' long name to '-f'. Since there are now two arguments which can attempt to read from stdin, add a safeguard to check whether the user specified '-' for both of them. It is still possible for a user to pass '/dev/stdin' to one or both arguments at present; we do not explicitly check. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Refactored to use am/pathspec-from-file. This changes the implementation significantly since v1, but the testcases mostly remain the same. This change builds on top of am/pathspec-from-file and is dependent on at least "parse-options.h: add new options `--pathspec-from-file`, `--pathspec-file-nul`". I followed the example of "commit: support the --pathspec-from-file" option and retained the tests from v1 of this topic. Documentation/git-grep.txt | 22 ++++++++++++++++-- builtin/grep.c | 35 ++++++++++++++++++++++++----- t/t7810-grep.sh | 46 ++++++++++++++++++++++++++++++++++++-- 3 files changed, 94 insertions(+), 9 deletions(-)