Message ID | YeejsuVQaGCkQ97t@gondor.apana.org.au (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Herbert Xu |
Headers | show |
Series | [v2] expand: Always quote caret when using fnmatch | expand |
2022-01-19 16:37:54 +1100, Herbert Xu:
[...]
> This patch forces ^ to be a literal when we use fnmatch.
[...]
Hi,
I'm coming here from a discussion at
https://www.austingroupbugs.net/view.php?id=1558 where I'm
trying to get POSIX to mandate (or suggest that it will mandate
in the future) [^x] works like [!x] to negate a bracket
expression set like it does already in most
shell/fnmatch()/glob() implementations, and to remove that
non-sensical (to me at least) discrepancy with regexps and most
other shells / languages.
To me, that patch is a step in the wrong direction.
Other Ash-based shells already made the move (make [^x] an alias
for [!x]) a long time ago (FreeBSD / NetBSD sh in 1997), BSD
fnmatch() in 1994. The GNU libc and GNU shell have always
supported [^x] AFAIK. So has zsh. ksh made the move in 2005.
In dash, the move to fnmatch() in that case was a step in the
right direction, and reverting it would be an unfortunate step
back to me.
If the point of having dash use fnmatch() is to add consistency
with other tools used from the shell (like find -name/-path...),
then this change doesn't help either on the majority of systems.
The "fix" also seems incomplete and causes at least this
regression:
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
dash->fnmatch("[\\\\^]", "\\", 0) = 0
match
See how that added \ caused it to be included in the set. Compare with:
+++ exited (status 0) +++
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
dash->fnmatch("[\\!]", "\\", 0) = 1
+++ exited (status 0) +++
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
dash->fnmatch("[\\x]", "\\", 0) = 1
+++ exited (status 0) +++
(here on Debian with glibc 2.33)
On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote: > 2022-01-19 16:37:54 +1100, Herbert Xu: > [...] > > This patch forces ^ to be a literal when we use fnmatch. > [...] > The "fix" also seems incomplete and causes at least this > regression: > > $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]' > dash->fnmatch("[\\\\^]", "\\", 0) = 0 > match > > See how that added \ caused it to be included in the set. Compare with: > > +++ exited (status 0) +++ > $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]' > dash->fnmatch("[\\!]", "\\", 0) = 1 > +++ exited (status 0) +++ > $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]' > dash->fnmatch("[\\x]", "\\", 0) = 1 > +++ exited (status 0) +++ If you want to escape ^ to always be treated as a literal when calling fnmatch(), you have to spell it something like [[=^=]...], not [\^...] (that is, since \^ inside [] does NOT treat the \ as an escape character, but as yet another literal character as part of the brakcet group, you instead have to resort to calling out the literal character ^ via alternative spellings such as by a collation equivalent). Another alternative is to rewrite the [] bracket exprsesion so that ^ is no longer first (but be careful if the character after ^ is -, ], or !, as the rewrite is not always a trivial swap with the next character).
On 21/02/2022 16:39, Eric Blake wrote: > On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote: >> 2022-01-19 16:37:54 +1100, Herbert Xu: >> [...] >>> This patch forces ^ to be a literal when we use fnmatch. >> [...] >> The "fix" also seems incomplete and causes at least this >> regression: >> >> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]' >> dash->fnmatch("[\\\\^]", "\\", 0) = 0 >> match >> >> See how that added \ caused it to be included in the set. Compare with: >> >> +++ exited (status 0) +++ >> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]' >> dash->fnmatch("[\\!]", "\\", 0) = 1 >> +++ exited (status 0) +++ >> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]' >> dash->fnmatch("[\\x]", "\\", 0) = 1 >> +++ exited (status 0) +++ > > If you want to escape ^ to always be treated as a literal when calling > fnmatch(), you have to spell it something like [[=^=]...], not [\^...] > (that is, since \^ inside [] does NOT treat the \ as an escape > character, but as yet another literal character as part of the brakcet > group, you instead have to resort to calling out the literal character > ^ via alternative spellings such as by a collation equivalent). This is correct for regular expressions, unspecified for shell patterns in general after bug 1234 was resolved, and always incorrect for fnmatch() after bug 1190 was resolved. In fnmatch(), backslash has to work even inside bracket expressions. Whether it had to may not previously have been clear in the standard, but it has been made clear now and that is also how it works in glibc. That part of the patch is not a problem. Cheers, Harald van Dijk
2022-02-21 17:06:13 +0000, Harald van Dijk: [...] > > If you want to escape ^ to always be treated as a literal when calling > > fnmatch(), you have to spell it something like [[=^=]...], not [\^...] > > (that is, since \^ inside [] does NOT treat the \ as an escape > > character, but as yet another literal character as part of the brakcet > > group, you instead have to resort to calling out the literal character > > ^ via alternative spellings such as by a collation equivalent). > > This is correct for regular expressions, unspecified for shell patterns in > general after bug 1234 was resolved, and always incorrect for fnmatch() > after bug 1190 was resolved. In fnmatch(), backslash has to work even inside > bracket expressions. Whether it had to may not previously have been clear in > the standard, but it has been made clear now and that is also how it works > in glibc. That part of the patch is not a problem. [...] Thanks, and sorry about that. I was actually the one raising bug 1234 and misremembered the outcome. case $var in ([\!x]) is specified, but: pattern='[\!x]' case $var in ($pattern) is not indeed, so, the behaviour of: sh -c 'case "\\" in ($1) echo match; esac' sh '[\^x]' is unspecified, so even though that patch changes the behaviour in this instance, it's still not a conformance bug.
diff --git a/src/expand.c b/src/expand.c index aea5cc4..9906d8a 100644 --- a/src/expand.c +++ b/src/expand.c @@ -135,8 +135,6 @@ STATIC int pmatch(const char *, const char *); #endif static size_t cvtnum(intmax_t num, int flags); STATIC size_t esclen(const char *, const char *); -STATIC char *scanleft(char *, char *, char *, char *, int, int); -STATIC char *scanright(char *, char *, char *, char *, int, int); STATIC void varunset(const char *, const char *, const char *, int) __attribute__((__noreturn__)); @@ -541,10 +539,8 @@ out: } -STATIC char * -scanleft( - char *startp, char *rmesc, char *rmescend, char *str, int quotes, - int zero +static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, + char *str, int quotes, int zero ) { char *loc; char *loc2; @@ -573,16 +569,14 @@ scanleft( } -STATIC char * -scanright( - char *startp, char *rmesc, char *rmescend, char *str, int quotes, - int zero +static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend, + char *str, int quotes, int zero ) { int esc = 0; char *loc; char *loc2; - for (loc = str - 1, loc2 = rmescend; loc >= startp; loc2--) { + for (loc = endp, loc2 = rmescend; loc >= startp; loc2--) { int match; char c = *loc2; const char *s = loc2; @@ -618,7 +612,9 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, long amount; char *rmesc, *rmescend; int zero; - char *(*scan)(char *, char *, char *, char *, int , int); + char *(*scan)(char *, char *, char *, char *, char *, int , int); + int nstrloc = strloc; + char *endp; char *p; p = argstr(start, (flag & EXP_DISCARD) | EXP_TILDE | @@ -646,33 +642,40 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, abort(); #endif - rmesc = startp; rmescend = stackblock() + strloc; + str = preglob(rmescend, FNMATCH_IS_ENABLED ? + RMESCAPE_ALLOC | RMESCAPE_GROW : 0); + if (FNMATCH_IS_ENABLED) { + startp = stackblock() + startloc; + rmescend = stackblock() + strloc; + nstrloc = str - (char *)stackblock(); + } + + rmesc = startp; if (quotes) { rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW); - if (rmesc != startp) { + if (rmesc != startp) rmescend = expdest; - startp = stackblock() + startloc; - } + startp = stackblock() + startloc; + str = stackblock() + nstrloc; } rmescend--; - str = stackblock() + strloc; - preglob(str, 0); /* zero = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX */ zero = subtype >> 1; /* VSTRIMLEFT/VSTRIMRIGHTMAX -> scanleft */ scan = (subtype & 1) ^ zero ? scanleft : scanright; - loc = scan(startp, rmesc, rmescend, str, quotes, zero); + endp = stackblock() + strloc - 1; + loc = scan(startp, endp, rmesc, rmescend, str, quotes, zero); if (loc) { if (zero) { - memmove(startp, loc, str - loc); - loc = startp + (str - loc) - 1; + memmove(startp, loc, endp - loc); + loc = startp + (endp - loc); } *loc = '\0'; } else - loc = str - 1; + loc = endp; out: amount = loc - expdest; @@ -1501,7 +1504,9 @@ msort(struct strlist *list, int len) STATIC inline int patmatch(char *pattern, const char *string) { - return pmatch(preglob(pattern, 0), string); + return pmatch(preglob(pattern, FNMATCH_IS_ENABLED ? + RMESCAPE_ALLOC | RMESCAPE_GROW : 0), + string); } @@ -1654,15 +1659,22 @@ _rmescapes(char *str, int flag) int notescaped; int globbing; - p = strpbrk(str, qchars); + p = strpbrk(str, cqchars); if (!p) { return str; } q = p; r = str; + globbing = flag & RMESCAPE_GLOB; + if (flag & RMESCAPE_ALLOC) { size_t len = p - str; - size_t fulllen = len + strlen(p) + 1; + size_t fulllen = strlen(p); + + if (FNMATCH_IS_ENABLED && globbing) + fulllen *= 2; + + fulllen += len + 1; if (flag & RMESCAPE_GROW) { int strloc = str - (char *)stackblock(); @@ -1680,7 +1692,6 @@ _rmescapes(char *str, int flag) q = mempcpy(q, str, len); } } - globbing = flag & RMESCAPE_GLOB; notescaped = globbing; while (*p) { if (*p == (char)CTLQUOTEMARK) { @@ -1693,8 +1704,11 @@ _rmescapes(char *str, int flag) notescaped = 0; goto copy; } + if (FNMATCH_IS_ENABLED && *p == '^') + goto add_escape; if (*p == (char)CTLESC) { p++; +add_escape: if (notescaped) *q++ = '\\'; } diff --git a/src/mystring.c b/src/mystring.c index de624b8..ed3c8f6 100644 --- a/src/mystring.c +++ b/src/mystring.c @@ -62,7 +62,12 @@ const char spcstr[] = " "; const char snlfmt[] = "%s\n"; const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=', CTLQUOTEMARK, '\0' }; -const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 }; +const char cqchars[] = { +#ifdef HAVE_FNMATCH + '^', +#endif + CTLESC, CTLQUOTEMARK, 0 +}; const char illnum[] = "Illegal number: %s"; const char homestr[] = "HOME"; diff --git a/src/mystring.h b/src/mystring.h index 083ea98..564b911 100644 --- a/src/mystring.h +++ b/src/mystring.h @@ -37,11 +37,18 @@ #include <inttypes.h> #include <string.h> +#ifdef HAVE_FNMATCH +#define FNMATCH_IS_ENABLED 1 +#else +#define FNMATCH_IS_ENABLED 0 +#endif + extern const char snlfmt[]; extern const char spcstr[]; extern const char dolatstr[]; #define DOLATSTRLEN 6 -extern const char qchars[]; +extern const char cqchars[]; +#define qchars (cqchars + FNMATCH_IS_ENABLED) extern const char illnum[]; extern const char homestr[];