Message ID | 20210326024005.26962-2-stdedos+git@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pathspec: warn: long and short forms are incompatible | expand |
On Fri, Mar 26, 2021 at 04:40:04AM +0200, Σταύρος Ντέντος wrote: > +void check_mishandled_exclude(const char *entry) { > + size_t entry_len = strlen(entry); > + char flags[entry_len]; > + char path[entry_len]; We avoid using variable-length arrays in our codebase. For one thing, they were not historically supported by all platforms (we are slowly using more C99 features, but we are introducing them slowly and intentionally). But two, they are limited in size and the failure mode is not graceful. If "entry" is larger than the available stack, then we'll get a segfault with no option to handle it better. > + if (sscanf(entry, ":!(%4096[^)])%4096s", &flags, &path) != 2) { > + return; We also generally avoid using scanf, because it's error-prone. The "4096" is scary here, but I don't _think_ it's a buffer overflow, because "path" is already the same size as "entry" (not including the NUL terminator, but that is negated by the fact that we'll have skipped at least ":!"). Is this "%4096[^)]" actually valid? I don't think scanf understands regular expressions. We'd want to avoid making an extra copy of the string anyway. So you'd probably want to just parse left-to-right in the original string, like: const char *p = entry; /* skip past stuff we know must be there */ if (!skip_prefix(p, ":!(", &p)) return; /* this checks count_slashes() > 0 in the flags section, though I'm * not sure I understand what that is looking for... */ for (; *p && *p != ')'; p++) { if (*p == '/') return; } if (*p++ != ')') return; /* now p is pointing at "path", though we don't seem to do anything * with it... */ -Peff
> We avoid using variable-length arrays in our codebase. ... Hear hear, however, I wanted to avoid the "small mess" that allocate/free would cause (one or more of ++sloc, labels, if-nesting); ... > But two, they are limited in size and the failure mode is not graceful. ... ... however, my main issue is that - I don't know what's a sane allocation size. > ... The "4096" is scary here ... While scary, it is "a safe" upper high. The first time the string ends up in pathspec.c for processing it's here: entry = argv[i]; which comes from here parse_pathspec(pathspec, magic_mask, flags, prefix, parsed_file.v); and I don't know what's the maximum size of `parsed_file.v[0]` > Is this "%4096[^)]" actually valid? I don't think scanf understands > regular expressions. I was suprised too - but it's not regex. see: https://www.cplusplus.com/reference/cstdio/scanf/#parameters - 4th/3rd row from the end %s is greedy and there would be no other way to limit `sscanf` to not include `)` on its first variable. > ... though I'm not sure I understand what that is looking for ... I think it will help you see what I am trying to achieve if you read at the warning message / testcase https://lore.kernel.org/git/20210326024005.26962-2-stdedos+git@gmail.com/#iZ30t:t6132-pathspec-exclude.sh And, to clean up the testcase: git log --oneline --format=%s -- ':!(glob)**/file' I guess it should be now obvious what am I targetting: If someone naively mixes short and long pathspec magics (`!`, and `(glob)`), short form takes precedence and ignores long magic / assumes long magic as part of path. (If it's not obvious, all the more reason to include such warning)
On Fri, Mar 26, 2021 at 06:16:26PM +0200, Stavros Ntentos wrote: > > We avoid using variable-length arrays in our codebase. ... > > Hear hear, however, I wanted to avoid the "small mess" > that allocate/free would cause (one or more of ++sloc, labels, if-nesting); ... > > > But two, they are limited in size and the failure mode is not graceful. ... > ... however, my main issue is that - I don't know what's a sane allocation size. I don't think a VLA gets you out of knowing the allocation size. Either way, you are using strlen(entry). But I do think avoiding allocating altogether is better (as I showed in my previous response). > > ... The "4096" is scary here ... > While scary, it is "a safe" upper high. > The first time the string ends up in pathspec.c for processing it's here: > entry = argv[i]; > which comes from here > parse_pathspec(pathspec, magic_mask, flags, prefix, parsed_file.v); > > and I don't know what's the maximum size of `parsed_file.v[0]` There is no reasonable maximum size you can assume. Using 4096 is most definitely not a safe upper bound. However, as I said, I don't think it is doing anything useful in the first place. You have sized the destination buffers as large as the original string, so they must be large enough to hold any subset of the original. Dropping them would be equally correct, but less distracting to a reader. > > Is this "%4096[^)]" actually valid? I don't think scanf understands > > regular expressions. > > I was suprised too - but it's not regex. > > see: https://www.cplusplus.com/reference/cstdio/scanf/#parameters - 4th/3rd row from the end Thanks, this is a corner of scanf I haven't looked at (mostly because again, we generally avoid scanf in our code base entirely). > I think it will help you see what I am trying to achieve if you read at the warning message / testcase > https://lore.kernel.org/git/20210326024005.26962-2-stdedos+git@gmail.com/#iZ30t:t6132-pathspec-exclude.sh > > And, to clean up the testcase: > git log --oneline --format=%s -- ':!(glob)**/file' > > I guess it should be now obvious what am I targetting: > > If someone naively mixes short and long pathspec magics (`!`, and `(glob)`), > short form takes precedence and ignores long magic / assumes long magic as part of path. > > (If it's not obvious, all the more reason to include such warning) I understand the overall goal. I am not sure why slashes in the flags section are a reliable indicator that this mixing is not happening and we should not show the warning. It also feels like any checks like this should be relying on the existing pathspec-magic parser a bit more. I don't know the pathspec code that well, but surely at some point it has a notion of which parts are magic flags (e.g., after parse_element_magic in init_pathspec_item). -Peff
Jeff King <peff@peff.net> writes: > It also feels like any checks like this should be relying on the > existing pathspec-magic parser a bit more. I don't know the pathspec > code that well, but surely at some point it has a notion of which parts > are magic flags (e.g., after parse_element_magic in init_pathspec_item). Absolutely. parse_element_magic() decides, by looking at the character after the initial ':', the we are looking at the longhand form when magic sequence is introduced by ":(". Otherwise, : must be followed by shorthand form, where the only usable ones are '/' (synonym for top), '!' and '^' (synonym for exclude), and ':' (end of short-form marker), but most importantly, '(' will *never* be a valid shorthand form (because it is not part of the GIT_PATHSPEC_MAGIC class of bytes). So, presumably, inside parse_short_magic(), you could detect '(' and complain it as an attempt to switch to longform. Here is to illustrate where to hook the check. With this, I can get something like this: $ git show -s --oneline -- ':!/(foobar)frotz' ':/(bar)nitfol' . warning: :!/: cannot mix shortform magic with longform like (exclude) warning: :/: cannot mix shortform magic with longform like (exclude) 84d06cdc06 Sync with v2.31.1 Note that this illustration does not show the best warning message. To improve it to a bit more useful level, I think two things need to happen on top. * Use advice to allow users sequelch. * Show a few letters after '(' (but pay attention to the end of the string, which ends with either '\0' or ':'), and lose the substring "like (exclude)" from the message. That would have given hint: ":!/(foo...": cannot mix shortform and longform magic for the above example. I initially thought it might make it even better if we looked ahead of what comes after '(', took the substring before ',' or ')', and looked up pathspec_magic[] array, BUT I do not think it is a good idea. It would be confusing if we do not give the same advice to ":!(literol)" when the user does get one for ":!(literal)". So the above "two things need to happen" list deliberately excludes an "improvement" doing so. pathspec.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git i/pathspec.c w/pathspec.c index 18b3be362a..cd343d5b54 100644 --- i/pathspec.c +++ w/pathspec.c @@ -336,6 +336,9 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len, return pos; } +static const char mixed_pathspec_magic[] = + N_("%.*s: cannot mix shortform magic with longform like (exclude)"); + /* * Parse the pathspec element looking for short magic * @@ -356,6 +359,16 @@ static const char *parse_short_magic(unsigned *magic, const char *elem) continue; } + if (ch == '(') { + /* + * a possible common mistake: once you started + * a shortform you cannot add (longform), e.g. + * ":!(top)" + */ + warning(_(mixed_pathspec_magic), + (int)(pos-elem), elem); + } + if (!is_pathspec_magic(ch)) break;
Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > > > It also feels like any checks like this should be relying on the > > existing pathspec-magic parser a bit more. I don't know the pathspec > > code that well, but surely at some point it has a notion of which parts > > are magic flags (e.g., after parse_element_magic in init_pathspec_item). > > Absolutely. parse_element_magic() decides, by looking at the > character after the initial ':', the we are looking at the longhand > form when magic sequence is introduced by ":(". Absolutely from me too! I would want to re-use more of the existing code. In response to that, patch https://lore.kernel.org/git/20210328152629.16486-1-133706+stdedos@users.noreply.github.com/ clearly shows that it is possible to re-use the de-facto long form magic parsing code to attempt to parse the string that continues as a long form magic. This could theoretically pave the road for mixed short/long form parsing (in that order), if would be so desired. However, as probably suspected, it suffers from one big problem: if (ARRAY_SIZE(pathspec_magic) <= i) die(_("Invalid pathspec magic '%.*s' in '%s'"), (int) len, pos, elem); which is an unconditional `exit(128)`. I don't know if it's possible to rescue (or redirect) that `die` call. In any case, considering that, this again deviates from the "small informative" change this commit was supposed to be. Having said that, I have cooked a working patch with the proposed code.
diff --git a/pathspec.c b/pathspec.c index 7a229d8d22..374c529569 100644 --- a/pathspec.c +++ b/pathspec.c @@ -1,3 +1,5 @@ +#include <stdio.h> +#include <string.h> #include "cache.h" #include "config.h" #include "dir.h" @@ -586,6 +588,8 @@ void parse_pathspec(struct pathspec *pathspec, for (i = 0; i < n; i++) { entry = argv[i]; + check_mishandled_exclude(entry); + init_pathspec_item(item + i, flags, prefix, prefixlen, entry); if (item[i].magic & PATHSPEC_EXCLUDE) @@ -739,3 +743,18 @@ int match_pathspec_attrs(const struct index_state *istate, return 1; } + +void check_mishandled_exclude(const char *entry) { + size_t entry_len = strlen(entry); + char flags[entry_len]; + char path[entry_len]; + + if (sscanf(entry, ":!(%4096[^)])%4096s", &flags, &path) != 2) { + return; + } + if (count_slashes(flags) > 0) { + return; + } + + warning(_("Pathspec provided matches `:!(...)`\n\tDid you mean `:(exclude,...)`?")); +} diff --git a/pathspec.h b/pathspec.h index 454ce364fa..879d4e82c6 100644 --- a/pathspec.h +++ b/pathspec.h @@ -157,5 +157,6 @@ char *find_pathspecs_matching_against_index(const struct pathspec *pathspec, int match_pathspec_attrs(const struct index_state *istate, const char *name, int namelen, const struct pathspec_item *item); +void check_mishandled_exclude(const char* pathspec_entry); #endif /* PATHSPEC_H */ diff --git a/t/t6132-pathspec-exclude.sh b/t/t6132-pathspec-exclude.sh index 30328b87f0..b32ddb2a56 100755 --- a/t/t6132-pathspec-exclude.sh +++ b/t/t6132-pathspec-exclude.sh @@ -244,4 +244,37 @@ test_expect_success 'grep --untracked PATTERN :(exclude)*FILE' ' test_cmp expect-grep actual-grep ' +cat > expected_warn <<"EOF" +Pathspec provided matches `:!(...)` +EOF +test_expect_success 'warn pathspec :!(...) skips the parenthesized magics' ' + git log --oneline --format=%s -- '"'"':!(glob)**/file'"'"' >actual 2>warn && + cat <<EOF >expect && +sub2/file +sub/sub/sub/file +sub/file2 +sub/sub/file +sub/file +file +EOF + cat actual && + cat warn && + test_cmp expect actual && + grep -Ff expected_warn warn +' + +test_expect_success 'do not warn that pathspec :!(...) skips the parenthesized magics (if parenthesis would not be part of the magic)' ' + git log --oneline --format=%s -- '"'"':!(gl/ob)/file'"'"' >actual 2>warn && + cat <<EOF >expect && +sub2/file +sub/sub/sub/file +sub/file2 +sub/sub/file +sub/file +file +EOF + test_cmp expect actual && + ! grep -Ff expected_warn warn +' + test_done