Message ID | 20210716094227.15177-1-mscalindt@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Series | expand: Recognize '^' as a negation character in BE |
On 16/07/2021 10:42, Dimitar Yurukov wrote:
> While parsing bracket expression ('[...]'), DASH recognizes only '!' as
> a special character for negation/inversion, but POSIX specifies '^'.
>
> The POSIX specification (2018 edition) states:
>
> ^ The <circumflex> shall signify a non-matching list expression when
> it occurs first in a list, immediately following a
> <left-square-bracket> (see RE Bracket Expression).

It also states:

the <exclamation-mark> character ( '!' ) shall replace the <circumflex> character ( '^' ) in its role in a non-matching list in the regular expression notation

and

A bracket expression starting with an unquoted <circumflex> character produces unspecified results.

See 2.13.1 Patterns Matching a Single Character.

So both the dash and the bash behaviour are permitted and this patch does not address a correctness issue. Scripts that rely on ^ for negation should be modified to use !.

The patch may still be worthwhile to increase compatibility, but in that case the same change also needs to be made to expmeta().

Cheers,
Harald van Dijk
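The advice to use '!' rather than '^' can be illustrated with a short sketch. The function names `last_lower_run` and `caret_negates` are hypothetical and appear neither in dash nor in the patch. The first uses only the negation form POSIX defines for shell patterns; the second probes how the running shell treats a leading '^', which POSIX leaves unspecified, so its result is expected to differ between shells.

```shell
#!/bin/sh
# Hypothetical helper (not part of dash or the patch): print what remains
# after removing the longest prefix that ends in a non-lowercase character.
# '!' is the negation character POSIX defines for shell patterns.
last_lower_run() {
    printf '%s\n' "${1##*[!a-z]}"
}

# Hypothetical probe: returns 0 if this shell treats a leading '^' in a
# bracket expression as negation, 1 if it treats '^' as a literal member.
# POSIX leaves this unspecified, so either outcome is conforming.
caret_negates() {
    case x in
        [^a]) return 0 ;;
        *)    return 1 ;;
    esac
}

last_lower_run '123 asd'    # prints "asd"
```

A script that needs to behave identically across shells should rely on the first form only; the probe is useful mainly for diagnosing which behaviour a given shell implements.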
On 16/07/2021 11:00, Harald van Dijk wrote:
> On 16/07/2021 10:42, Dimitar Yurukov wrote:
> > While parsing bracket expression ('[...]'), DASH recognizes only '!' as
> > a special character for negation/inversion, but POSIX specifies '^'.
> >
> > The POSIX specification (2018 edition) states:
> >
> > ^ The <circumflex> shall signify a non-matching list expression when
> > it occurs first in a list, immediately following a
> > <left-square-bracket> (see RE Bracket Expression).
>
> It also states:
>
> the <exclamation-mark> character ( '!' ) shall replace the
> <circumflex> character ( '^' ) in its role in a non-matching list in the
> regular expression notation
>
> and
>
> A bracket expression starting with an unquoted <circumflex> character
> produces unspecified results.
>
> See 2.13.1 Patterns Matching a Single Character.

Oh, my bad, sorry for the noise.

> The patch may still be worthwhile to increase compatibility, but in that
> case the same change also needs to be made to expmeta().

Oops, you are right. Will attach v2.
diff --git a/src/expand.c b/src/expand.c
index 1730670..06392ff 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1565,7 +1565,7 @@ pmatch(const char *pattern, const char *string)
 
 			startp = p;
 			invert = 0;
-			if (*p == '!') {
+			if (*p == '!' || *p == '^') {
 				invert++;
 				p++;
 			}
While parsing bracket expression ('[...]'), DASH recognizes only '!' as
a special character for negation/inversion, but POSIX specifies '^'.

The POSIX specification (2018 edition) states:

^ The <circumflex> shall signify a non-matching list expression when
it occurs first in a list, immediately following a
<left-square-bracket> (see RE Bracket Expression).

DASH:
$ i='123 asd' && printf "%s\n" "${i##*[!a-z]}"
asd
$ i='123 asd' && printf "%s\n" "${i##*[^a-z]}"
<empty expansion>

BASH (with --posix):
$ i='123 asd' && printf "%s\n" "${i##*[!a-z]}"
asd
$ i='123 asd' && printf "%s\n" "${i##*[^a-z]}"
asd

Make <circumflex> ('^') a special character used to specify
negation/inversion in bracket expressions:
$ i='123 asd' && printf "%s\n" "${i##*[^a-z]}"
asd

Signed-off-by: Dimitar Yurukov <mscalindt@gmail.com>
---
 src/expand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)