diff mbox series

[v2] expand: Always quote caret when using fnmatch

Message ID YeejsuVQaGCkQ97t@gondor.apana.org.au (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Headers show
Series [v2] expand: Always quote caret when using fnmatch | expand

Commit Message

Herbert Xu Jan. 19, 2022, 5:37 a.m. UTC
On Tue, Jan 18, 2022 at 08:44:05AM +0000, Harald van Dijk wrote:
>
> The loop that is modified by this patch is only taken after any qchars are
> seen, so for e.g.
> 
>   var=abc
>   echo ${var#[^a]}
> 
> it has no effect. More importantly though, _rmescapes is used to modify

Good catch.  I have modified qchars accordingly.

> strings in place. This patch causes _rmescapes to try and grow strings,
> which cannot ever work. A test case for this is
> 
>   case aa in \a[^a]) echo match ;; esac
> 
> which fails with a segfault after this patch is applied.

Indeed.  I have modified the RMESCAPE_ALLOC to allow growth.

Thanks,

---8<---
This patch forces ^ to be a literal when we use fnmatch.

In order to allow for the extra space to quote the caret, the
function _rmescapes will allocate up to twice the memory if the
flag RMESCAPE_GLOB is set.

Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default")
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Suggested-by: Harald van Dijk <harald@gigawatt.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Stephane Chazelas Feb. 20, 2022, 7:15 a.m. UTC | #1
2022-01-19 16:37:54 +1100, Herbert Xu:
[...]
> This patch forces ^ to be a literal when we use fnmatch.
[...]

Hi,

I'm coming here from a discussion at
https://www.austingroupbugs.net/view.php?id=1558 where I'm
trying to get POSIX to mandate (or suggest that it will mandate
in the future) [^x] works like [!x] to negate a bracket
expression set like it does already in most
shell/fnmatch()/glob() implementations, and to remove that
non-sensical (to me at least) discrepancy with regexps and most
other shells / languages.

To me, that patch is a step in the wrong direction.

Other Ash-based shells already made the move (make [^x] an alias
for [!x]) a long time ago (FreeBSD / NetBSD sh in 1997), BSD
fnmatch() in 1994. The GNU libc and GNU shell have always
supported [^x] AFAIK. So has zsh. ksh made the move in 2005.

In dash, the move to fnmatch() in that case was a step in the
right direction, and reverting it would be an unfortunate step
back to me.

If the point of having dash use fnmatch() is to add consistency
with other tools used from the shell (like find -name/-path...),
then this change doesn't help either on the majority of systems.

The "fix" also seems incomplete and causes at least this
regression:

$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
dash->fnmatch("[\\\\^]", "\\", 0)                              = 0
match

See how that added \ caused it to be included in the set. Compare with:

+++ exited (status 0) +++
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
dash->fnmatch("[\\!]", "\\", 0)                                = 1
+++ exited (status 0) +++
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
dash->fnmatch("[\\x]", "\\", 0)                                = 1
+++ exited (status 0) +++

(here on Debian with glibc 2.33)
Eric Blake Feb. 21, 2022, 4:39 p.m. UTC | #2
On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote:
> 2022-01-19 16:37:54 +1100, Herbert Xu:
> [...]
> > This patch forces ^ to be a literal when we use fnmatch.
> [...]
> The "fix" also seems incomplete and causes at least this
> regression:
> 
> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
> dash->fnmatch("[\\\\^]", "\\", 0)                              = 0
> match
> 
> See how that added \ caused it to be included in the set. Compare with:
> 
> +++ exited (status 0) +++
> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
> dash->fnmatch("[\\!]", "\\", 0)                                = 1
> +++ exited (status 0) +++
> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
> dash->fnmatch("[\\x]", "\\", 0)                                = 1
> +++ exited (status 0) +++

If you want to escape ^ to always be treated as a literal when calling
fnmatch(), you have to spell it something like [[=^=]...], not [\^...]
(that is, since \^ inside [] does NOT treat the \ as an escape
character, but as yet another literal character as part of the brakcet
group, you instead have to resort to calling out the literal character
^ via alternative spellings such as by a collation equivalent).
Another alternative is to rewrite the [] bracket exprsesion so that ^
is no longer first (but be careful if the character after ^ is -, ],
or !, as the rewrite is not always a trivial swap with the next
character).
Harald van Dijk Feb. 21, 2022, 5:06 p.m. UTC | #3
On 21/02/2022 16:39, Eric Blake wrote:
> On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote:
>> 2022-01-19 16:37:54 +1100, Herbert Xu:
>> [...]
>>> This patch forces ^ to be a literal when we use fnmatch.
>> [...]
>> The "fix" also seems incomplete and causes at least this
>> regression:
>>
>> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
>> dash->fnmatch("[\\\\^]", "\\", 0)                              = 0
>> match
>>
>> See how that added \ caused it to be included in the set. Compare with:
>>
>> +++ exited (status 0) +++
>> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
>> dash->fnmatch("[\\!]", "\\", 0)                                = 1
>> +++ exited (status 0) +++
>> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
>> dash->fnmatch("[\\x]", "\\", 0)                                = 1
>> +++ exited (status 0) +++
> 
> If you want to escape ^ to always be treated as a literal when calling
> fnmatch(), you have to spell it something like [[=^=]...], not [\^...]
> (that is, since \^ inside [] does NOT treat the \ as an escape
> character, but as yet another literal character as part of the brakcet
> group, you instead have to resort to calling out the literal character
> ^ via alternative spellings such as by a collation equivalent).

This is correct for regular expressions, unspecified for shell patterns 
in general after bug 1234 was resolved, and always incorrect for 
fnmatch() after bug 1190 was resolved. In fnmatch(), backslash has to 
work even inside bracket expressions. Whether it had to may not 
previously have been clear in the standard, but it has been made clear 
now and that is also how it works in glibc. That part of the patch is 
not a problem.

Cheers,
Harald van Dijk
Stephane Chazelas Feb. 21, 2022, 7:15 p.m. UTC | #4
2022-02-21 17:06:13 +0000, Harald van Dijk:
[...]
> > If you want to escape ^ to always be treated as a literal when calling
> > fnmatch(), you have to spell it something like [[=^=]...], not [\^...]
> > (that is, since \^ inside [] does NOT treat the \ as an escape
> > character, but as yet another literal character as part of the brakcet
> > group, you instead have to resort to calling out the literal character
> > ^ via alternative spellings such as by a collation equivalent).
> 
> This is correct for regular expressions, unspecified for shell patterns in
> general after bug 1234 was resolved, and always incorrect for fnmatch()
> after bug 1190 was resolved. In fnmatch(), backslash has to work even inside
> bracket expressions. Whether it had to may not previously have been clear in
> the standard, but it has been made clear now and that is also how it works
> in glibc. That part of the patch is not a problem.
[...]

Thanks, and sorry about that. I was actually the one raising bug
1234 and misremembered the outcome.

case $var in ([\!x])

is specified, but:

pattern='[\!x]'
case $var in ($pattern)

is not indeed, so, the behaviour  of:

sh -c 'case "\\" in ($1) echo match; esac' sh '[\^x]'

is unspecified, so even though that patch changes the behaviour
in this instance, it's still not a conformance bug.
diff mbox series

Patch

diff --git a/src/expand.c b/src/expand.c
index aea5cc4..9906d8a 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -135,8 +135,6 @@  STATIC int pmatch(const char *, const char *);
 #endif
 static size_t cvtnum(intmax_t num, int flags);
 STATIC size_t esclen(const char *, const char *);
-STATIC char *scanleft(char *, char *, char *, char *, int, int);
-STATIC char *scanright(char *, char *, char *, char *, int, int);
 STATIC void varunset(const char *, const char *, const char *, int)
 	__attribute__((__noreturn__));
 
@@ -541,10 +539,8 @@  out:
 }
 
 
-STATIC char *
-scanleft(
-	char *startp, char *rmesc, char *rmescend, char *str, int quotes,
-	int zero
+static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend,
+		      char *str, int quotes, int zero
 ) {
 	char *loc;
 	char *loc2;
@@ -573,16 +569,14 @@  scanleft(
 }
 
 
-STATIC char *
-scanright(
-	char *startp, char *rmesc, char *rmescend, char *str, int quotes,
-	int zero
+static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend,
+		       char *str, int quotes, int zero
 ) {
 	int esc = 0;
 	char *loc;
 	char *loc2;
 
-	for (loc = str - 1, loc2 = rmescend; loc >= startp; loc2--) {
+	for (loc = endp, loc2 = rmescend; loc >= startp; loc2--) {
 		int match;
 		char c = *loc2;
 		const char *s = loc2;
@@ -618,7 +612,9 @@  static char *subevalvar(char *start, char *str, int strloc, int startloc,
 	long amount;
 	char *rmesc, *rmescend;
 	int zero;
-	char *(*scan)(char *, char *, char *, char *, int , int);
+	char *(*scan)(char *, char *, char *, char *, char *, int , int);
+	int nstrloc = strloc;
+	char *endp;
 	char *p;
 
 	p = argstr(start, (flag & EXP_DISCARD) | EXP_TILDE |
@@ -646,33 +642,40 @@  static char *subevalvar(char *start, char *str, int strloc, int startloc,
 		abort();
 #endif
 
-	rmesc = startp;
 	rmescend = stackblock() + strloc;
+	str = preglob(rmescend, FNMATCH_IS_ENABLED ?
+				RMESCAPE_ALLOC | RMESCAPE_GROW : 0);
+	if (FNMATCH_IS_ENABLED) {
+		startp = stackblock() + startloc;
+		rmescend = stackblock() + strloc;
+		nstrloc = str - (char *)stackblock();
+	}
+
+	rmesc = startp;
 	if (quotes) {
 		rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW);
-		if (rmesc != startp) {
+		if (rmesc != startp)
 			rmescend = expdest;
-			startp = stackblock() + startloc;
-		}
+		startp = stackblock() + startloc;
+		str = stackblock() + nstrloc;
 	}
 	rmescend--;
-	str = stackblock() + strloc;
-	preglob(str, 0);
 
 	/* zero = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX */
 	zero = subtype >> 1;
 	/* VSTRIMLEFT/VSTRIMRIGHTMAX -> scanleft */
 	scan = (subtype & 1) ^ zero ? scanleft : scanright;
 
-	loc = scan(startp, rmesc, rmescend, str, quotes, zero);
+	endp = stackblock() + strloc - 1;
+	loc = scan(startp, endp, rmesc, rmescend, str, quotes, zero);
 	if (loc) {
 		if (zero) {
-			memmove(startp, loc, str - loc);
-			loc = startp + (str - loc) - 1;
+			memmove(startp, loc, endp - loc);
+			loc = startp + (endp - loc);
 		}
 		*loc = '\0';
 	} else
-		loc = str - 1;
+		loc = endp;
 
 out:
 	amount = loc - expdest;
@@ -1501,7 +1504,9 @@  msort(struct strlist *list, int len)
 STATIC inline int
 patmatch(char *pattern, const char *string)
 {
-	return pmatch(preglob(pattern, 0), string);
+	return pmatch(preglob(pattern, FNMATCH_IS_ENABLED ?
+				       RMESCAPE_ALLOC | RMESCAPE_GROW : 0),
+		      string);
 }
 
 
@@ -1654,15 +1659,22 @@  _rmescapes(char *str, int flag)
 	int notescaped;
 	int globbing;
 
-	p = strpbrk(str, qchars);
+	p = strpbrk(str, cqchars);
 	if (!p) {
 		return str;
 	}
 	q = p;
 	r = str;
+	globbing = flag & RMESCAPE_GLOB;
+
 	if (flag & RMESCAPE_ALLOC) {
 		size_t len = p - str;
-		size_t fulllen = len + strlen(p) + 1;
+		size_t fulllen = strlen(p);
+
+		if (FNMATCH_IS_ENABLED && globbing)
+			fulllen *= 2;
+
+		fulllen += len + 1;
 
 		if (flag & RMESCAPE_GROW) {
 			int strloc = str - (char *)stackblock();
@@ -1680,7 +1692,6 @@  _rmescapes(char *str, int flag)
 			q = mempcpy(q, str, len);
 		}
 	}
-	globbing = flag & RMESCAPE_GLOB;
 	notescaped = globbing;
 	while (*p) {
 		if (*p == (char)CTLQUOTEMARK) {
@@ -1693,8 +1704,11 @@  _rmescapes(char *str, int flag)
 			notescaped = 0;
 			goto copy;
 		}
+		if (FNMATCH_IS_ENABLED && *p == '^')
+			goto add_escape;
 		if (*p == (char)CTLESC) {
 			p++;
+add_escape:
 			if (notescaped)
 				*q++ = '\\';
 		}
diff --git a/src/mystring.c b/src/mystring.c
index de624b8..ed3c8f6 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -62,7 +62,12 @@  const char spcstr[] = " ";
 const char snlfmt[] = "%s\n";
 const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=',
 			  CTLQUOTEMARK, '\0' };
-const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 };
+const char cqchars[] = {
+#ifdef HAVE_FNMATCH
+	'^',
+#endif
+	CTLESC, CTLQUOTEMARK, 0
+};
 const char illnum[] = "Illegal number: %s";
 const char homestr[] = "HOME";
 
diff --git a/src/mystring.h b/src/mystring.h
index 083ea98..564b911 100644
--- a/src/mystring.h
+++ b/src/mystring.h
@@ -37,11 +37,18 @@ 
 #include <inttypes.h>
 #include <string.h>
 
+#ifdef HAVE_FNMATCH
+#define FNMATCH_IS_ENABLED 1
+#else
+#define FNMATCH_IS_ENABLED 0
+#endif
+
 extern const char snlfmt[];
 extern const char spcstr[];
 extern const char dolatstr[];
 #define DOLATSTRLEN 6
-extern const char qchars[];
+extern const char cqchars[];
+#define qchars (cqchars + FNMATCH_IS_ENABLED)
 extern const char illnum[];
 extern const char homestr[];