Message ID | 20210325233648.31162-2-stdedos+git@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pathspec: warn for a no-glob entry that contains `**` | expand |
"Σταύρος Ντέντος" <stdedos@gmail.com> writes: > From: Stavros Ntentos <133706+stdedos@users.noreply.github.com> > > If a pathspec given that contains `**`, chances are that someone is > naively expecting that it will do what the manual has told him > (i.e. that `**` will match 0-or-more directories). OK. this is a well written introduction of the problem being solved. But chances are that the user may have meant to give double-asterisk just out of habit, very well knowing that it would not be an error to give ** when a single * suffices. Which leads us to suspect that it is a bad change to make this a warning that unconditionally fires and cannot squelch. It may be better to implement it as an advice, where the user who knowingly uses that construct can say "yes, I know what I am doing" and squelch it. > However, without an explicit `:(glob)` magic, that will fall out the sky: > the two `**` will merge into one star, which surrounded by slashes, will > match any directory name. I am not sure everything after the "the sky:" you wrote is what you meant to write. Without being marked with a "glob" magic, a wildcard character in a pattern matches even a slash, so these two git ls-files 'Documentation**v2.txt' git ls-files 'Documentation*v2.txt' give the identical result and there is nothing about "surrounded by slashes" involved in it. So, perhaps taking the first two paragraphs together and rewriting: When '/**/' appears in the pathspec, the user may be expecting that it would be matched using the "wildmatch" semantics, matching 0 or more directories. But that is not what happens without ":(glob)" magic. or did you want to warn about "foo**bar" as if it were "foo/**/bar"? The output from these commands: git ls-files ':(glob)Documentation/**/*stash.txt' git ls-files ':(glob)Documentation***stash.txt' git ls-files ':(glob)Documentation**stash.txt' seems to tell me that the "zero-or-more directories" magic happens only when /**/ form is used, not when two asterisks are placed next to each other in a random context. > These changes attempt to bring awareness to this issue. In any case, after presenting what problem we are trying to address, we present our approach / solution in a form as if we are giving orders to the codebase to "become like so". Teach the pathspec parser to emit an advice message when a substring "/**/" appears in a pathspec element that does not have a ":(glob)" magic. Make sure we don't disturb users who use ":(literal)" magic with such a substring, as it is clear they want to find these strings literally. perhaps. > Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com> > --- > pathspec.c | 13 +++++++++++++ > pathspec.h | 1 + > t/t6130-pathspec-noglob.sh | 13 +++++++++++++ > 3 files changed, 27 insertions(+) > > diff --git a/pathspec.c b/pathspec.c > index 7a229d8d22..d5b9c0d792 100644 > --- a/pathspec.c > +++ b/pathspec.c > @@ -1,3 +1,4 @@ > +#include <string.h> Never include any header file before including <git-compat-util.h>. Including <git-compat-util.h> indirectly by including <cache.h> or <builtin.h> counts as including it as the first thing. As all necessary standard system headers are supposed to be given by including <git-compat-util.h>, if you needed to explicitly include the above, that may mean <git-compat-util.h>, which should cause the header that lists necessary function, is not working properly on your platform and needs to be adjusted. Are you on a minority platform we haven't adjusted the header inclusion order for, and what function are you trying to use that does not work without adding this extra #include here? We already use strstr() in many places, so that is not the function we are missing system includes for. Or perhaps this is a remnant of what you added while you were experimenting, without realizing that "#include <cache.h>" that is already there already gives you anything you need. If that is the case, iow, if the code works without the extra include, please remove that line. > @@ -588,6 +589,8 @@ void parse_pathspec(struct pathspec *pathspec, > > init_pathspec_item(item + i, flags, prefix, prefixlen, entry); > > + check_missing_glob(entry, item[i].magic); > + We have only one caller of the helper (i.e. this one) and nowhere else, and the helper is small enough ... > @@ -739,3 +742,13 @@ int match_pathspec_attrs(const struct index_state *istate, > > return 1; > } > + > +void check_missing_glob(const char *pathspec_entry, int flags) { > + if (flags & (PATHSPEC_GLOB | PATHSPEC_LITERAL)) { > + return; > + } > + > + if (strstr(pathspec_entry, "**")) { > + warning(_("Pathspec provided contains `**`, but no :(glob) magic.\nIt will not match 0 or more directories!")); > + } > +} ... hence, there is no reason why this helper should exist, let alone be a public function. Also, there is no reason to split the conditional into two. Just "it is OK if we have GLOB or LITERAL, otherwise see if there is /**/" is sufficient. It would be more sensible to just add the check to parse_pathspec() directly. Also, our warning messages do not begin with an uppercase. See attached patch for all of the above, but it is for illustration purposes only; not even compile tested. I am not illustrating how to turn this into an advice message that the user can squelch with the illustration patch. > diff --git a/t/t6130-pathspec-noglob.sh b/t/t6130-pathspec-noglob.sh > index ba7902c9cd..af6cd16f76 100755 > --- a/t/t6130-pathspec-noglob.sh > +++ b/t/t6130-pathspec-noglob.sh > @@ -157,4 +157,17 @@ test_expect_success '**/ does not work with :(literal) and --glob-pathspecs' ' > test_must_be_empty actual > ' > > +cat > expected <<"EOF" > +warning: Pathspec provided contains `**`, but no :(glob) magic. > +EOF Please don't imitate ancient tests. These days, all preparations for individual tests, including the expected outcome, are to be done inside the test itself. Study nearby tests in the same script for better examples. > +test_expect_success '** without :(glob) warns of lacking glob magic' ' > + test_might_fail git stash -- "**/bar" 2>warns && > + grep -Ff expected warns Same comment. Nearby examples all set up expeced output and never uses "grep" to see if the expectation is satisfied. Imitate them. > +' > + > +test_expect_success '** with :(literal) does not warn of lacking glob magic' ' > + test_might_fail git stash -- ":(literal)**/bar" 2>warns && > + ! grep -Ff expected warns Ditto. Thanks. pathspec.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git i/pathspec.c w/pathspec.c index 18b3be362a..5b0eed5a65 100644 --- i/pathspec.c +++ w/pathspec.c @@ -598,6 +598,10 @@ void parse_pathspec(struct pathspec *pathspec, die(_("pathspec '%s' is beyond a symbolic link"), entry); } + if (!(item[i].magic & (PATHSPEC_GLOB | PATHSPEC_LITERAL)) && + strstr(entry, "**")) + warning(_("** in '%s'without :(glob) magic:"), entry); + if (item[i].nowildcard_len < item[i].len) pathspec->has_wildcard = 1; pathspec->magic |= item[i].magic;
On Thu, Mar 25, 2021 at 8:43 PM Junio C Hamano <gitster@pobox.com> wrote: > So, perhaps taking the first two paragraphs together and rewriting: > > When '/**/' appears in the pathspec, the user may be > expecting that it would be matched using the "wildmatch" > semantics, matching 0 or more directories. But that is > not what happens without ":(glob)" magic. > > Teach the pathspec parser to emit an advice message when a > substring "/**/" appears in a pathspec element that does not > have a ":(glob)" magic. Make sure we don't disturb users > who use ":(literal)" magic with such a substring, as it is > clear they want to find these strings literally. I haven't been following the discussion, but is there a reason we need to penalize the user with a warning rather than helping, for instance by inferring ":(glob)" in the presence of `/**/` if not otherwise countermanded by ":(literal)" or whatnot?
Eric Sunshine <sunshine@sunshineco.com> writes: > I haven't been following the discussion, but is there a reason we need > to penalize the user with a warning rather than helping, for instance > by inferring ":(glob)" in the presence of `/**/` if not otherwise > countermanded by ":(literal)" or whatnot? Two reasons I can think of offhand are - How /**/ is interpreted is not the only thing that is different between the normal mode and the glob magic mode. IIRC, an asterisk * or a question mark ? matches slash in normal mode (it started out as fnmatch() without FNM_PATHNAME). Should we warn about ":(glob)" if somebody asks for "foo*", "*foo", or "foo*bar". If not, why shouldn't? - Thers is no explicit magic that says "there is no magic" to countermand such a DWIM.
On Thu, Mar 25, 2021 at 08:02:31PM -0700, Junio C Hamano wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > > > I haven't been following the discussion, but is there a reason we need > > to penalize the user with a warning rather than helping, for instance > > by inferring ":(glob)" in the presence of `/**/` if not otherwise > > countermanded by ":(literal)" or whatnot? > > Two reasons I can think of offhand are > > - How /**/ is interpreted is not the only thing that is different > between the normal mode and the glob magic mode. IIRC, an > asterisk * or a question mark ? matches slash in normal mode (it > started out as fnmatch() without FNM_PATHNAME). Should we warn > about ":(glob)" if somebody asks for "foo*", "*foo", or > "foo*bar". If not, why shouldn't? > > - Thers is no explicit magic that says "there is no magic" to > countermand such a DWIM. I do wonder if this distinction creates more harm than good. As somebody who has never used ":(glob)" myself, I was confused about what it even does (and it was not easy to find the documentation; I ended up finding the original commit in the history first!). We have three modes: - no globbing - globbing with fnmatch(), with FNM_PATHNAME according to the docs - globbing with wildmatch You may notice that I would call both of those latter two "globbing", but only one of them is triggered by the ":(glob)" magic. :) This just seems really confusing, and I wonder if anybody would be that sad if we just used wildmatch everywhere. The original bd30c2e484 (pathspec: support :(glob) syntax, 2013-07-14) even says: The old fnmatch behavior is considered deprecated and discouraged to use. but I guess it would be backwards-incompatible. Maybe it would be less confusing if we named the three states explicitly: :(literal) :(fnmatch) :(wildmatch) (and keeping :(glob) as a synonym for compatibility). -Peff
I think I have managed to address everyone's questions in this thread. Hopefully everything will be addressed by this, and the patch that will soon follow: > You may notice that I would call both of those latter two "globbing", > but only one of them is triggered by the ":(glob)" magic. :) And that's why I argued a DWIM was warranted here (https://lore.kernel.org/git/xmqqft1iquka.fsf@gitster.g/; it's more clear in Junio's quote, but you could of course read the original). I would want to be considered as an above-average git user, and I still was oblivious to the fact that `**` required a `:(glob)` from the command line. Especially since `.gitignore` files are treated differently (i.e. don't require `:(glob)`) I cannot / don't want to argue to "do it" or not, as I think my experience is not substantial enough to navigate such argument, and go from concept to materialization. That being said, if there was a cli.pathspec.wildmatch flag, I would've had it set to true in my global gitconfig. Ofc I could set `GIT_GLOB_PATHSPECS=1`, but I am not that happy to force it by setting it in a shell rc, instead of where it belongs. > I am not sure everything after the "the sky:" you wrote is what you > meant to write. Without being marked with a "glob" magic, a > wildcard character in a pattern matches even a slash, so these two > > git ls-files 'Documentation**v2.txt' > git ls-files 'Documentation*v2.txt' > > give the identical result and there is nothing about "surrounded by > slashes" involved in it. It seems you are right - in `fnmatch` mode, `:!*.gitignore` would've served me right (and avoided the whole thread). If you think that it's just my (i.e. few people's) lacking of understanding `fnmatch` glob, then we can drop this. However, given the disparity between `.gitignore` syntax and pathspec given from the command line (from my limited POV meaningless/confusing), and your (plural) arguments of backwards compatibility, I think we can draw the line at an advice been acceptable. Unless I see other points (like the warn vs advice), or pure C code-review points, I am getting the impression that this thread will not move forward. As I don't know how reviews usually happen here, and lacking some other medium of discussion, I would appreciate (at some point) an explicit go/no-go decision - to save everyone's time. Being unfamiliar to the project, and being who I am, I prefer explicit vs implicit points. Navigating an unknown process from top to bottom is pressure enough to me :-D > seems to tell me that the "zero-or-more directories" magic happens > only when /**/ form is used, not when two asterisks are placed next > to each other in a random context. Not exactly: ** needs to either start or end with a slash (or both) for its wildmatch behavior. I can try to make the code more explicit, but of course the code (and tests) will increase - and then maybe disproportionately to the otherwise intended-to-be small change (since DWIM is not wanted).
diff --git a/pathspec.c b/pathspec.c index 7a229d8d22..d5b9c0d792 100644 --- a/pathspec.c +++ b/pathspec.c @@ -1,3 +1,4 @@ +#include <string.h> #include "cache.h" #include "config.h" #include "dir.h" @@ -588,6 +589,8 @@ void parse_pathspec(struct pathspec *pathspec, init_pathspec_item(item + i, flags, prefix, prefixlen, entry); + check_missing_glob(entry, item[i].magic); + if (item[i].magic & PATHSPEC_EXCLUDE) nr_exclude++; if (item[i].magic & magic_mask) @@ -739,3 +742,13 @@ int match_pathspec_attrs(const struct index_state *istate, return 1; } + +void check_missing_glob(const char *pathspec_entry, int flags) { + if (flags & (PATHSPEC_GLOB | PATHSPEC_LITERAL)) { + return; + } + + if (strstr(pathspec_entry, "**")) { + warning(_("Pathspec provided contains `**`, but no :(glob) magic.\nIt will not match 0 or more directories!")); + } +} diff --git a/pathspec.h b/pathspec.h index 454ce364fa..913518ebd3 100644 --- a/pathspec.h +++ b/pathspec.h @@ -157,5 +157,6 @@ char *find_pathspecs_matching_against_index(const struct pathspec *pathspec, int match_pathspec_attrs(const struct index_state *istate, const char *name, int namelen, const struct pathspec_item *item); +void check_missing_glob(const char* pathspec_entry, int flags); #endif /* PATHSPEC_H */ diff --git a/t/t6130-pathspec-noglob.sh b/t/t6130-pathspec-noglob.sh index ba7902c9cd..af6cd16f76 100755 --- a/t/t6130-pathspec-noglob.sh +++ b/t/t6130-pathspec-noglob.sh @@ -157,4 +157,17 @@ test_expect_success '**/ does not work with :(literal) and --glob-pathspecs' ' test_must_be_empty actual ' +cat > expected <<"EOF" +warning: Pathspec provided contains `**`, but no :(glob) magic. +EOF +test_expect_success '** without :(glob) warns of lacking glob magic' ' + test_might_fail git stash -- "**/bar" 2>warns && + grep -Ff expected warns +' + +test_expect_success '** with :(literal) does not warn of lacking glob magic' ' + test_might_fail git stash -- ":(literal)**/bar" 2>warns && + ! grep -Ff expected warns +' + test_done