Message ID | 20180224003344.GA3354@gondor.apana.org.au (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
On 2/24/18 1:33 AM, Herbert Xu wrote: > On Wed, Feb 21, 2018 at 11:47:58PM +0100, Harald van Dijk wrote: >> On 2/21/18 2:50 PM, Denys Vlasenko wrote: >>> I propose replacing this: >> >> Agreed, that looks better. Thanks! > > Good work guys. However, could you check to see if this patch > works too? It passes all my tests so far. It seems like the new control character doesn't get escaped in one place where it should be, and gets lost because of that: Unpatched: $ dash -c 'x=`printf \\\211X`; echo $x | cat -v' M-^IX Patched: $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v' X Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote: > > It seems like the new control character doesn't get escaped in one place > where it should be, and gets lost because of that: > > Unpatched: > > $ dash -c 'x=`printf \\\211X`; echo $x | cat -v' > M-^IX > > Patched: > > $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v' > X Hmm, it works here. Can you check that your Makefile is up-to-date and the generated syntax.c looks like this: const char basesyntax[] = { CEOF, CSPCL, CWORD, CCTL, CCTL, CCTL, CCTL, CCTL, CCTL, CCTL, CCTL, CCTL, Cheers,
On Sat, Feb 24, 2018 at 5:52 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote: >> >> It seems like the new control character doesn't get escaped in one place >> where it should be, and gets lost because of that: >> >> Unpatched: >> >> $ dash -c 'x=`printf \\\211X`; echo $x | cat -v' >> M-^IX >> >> Patched: >> >> $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v' >> X > > Hmm, it works here. Can you check that your Makefile is up-to-date > and the generated syntax.c looks like this: > > const char basesyntax[] = { > CEOF, CSPCL, CWORD, CCTL, > CCTL, CCTL, CCTL, CCTL, > CCTL, CCTL, CCTL, CCTL, > > Cheers, Guys, while you work on fixing this, consider taking a look at some more parsing cases in this bbox bug: https://bugs.busybox.net/show_bug.cgi?id=10821 I suppose some of them may fail in dash too (I did not test, sorry). -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2/28/18 11:14 PM, Denys Vlasenko wrote: > Guys, while you work on fixing this, consider taking a look at > some more parsing cases in this bbox bug: > > https://bugs.busybox.net/show_bug.cgi?id=10821 > > I suppose some of them may fail in dash too (I did not test, sorry). $'...' isn't supported in dash, but the underlying problem does appear to exist in dash too: $ bash -c 'x=yz; echo "${x#'"'y'"'}"' z $ dash -c 'x=yz; echo "${x#'"'y'"'}"' yz (That is, they are executing x=yz; echo "${x#'y'}".) POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the pattern is considered unquoted regardless of the outer quotation marks. Because of that, the single quote characters should not be taken literally, but should be taken as quoting the y. ksh, posh and zsh agree with bash. But: POSIX does not say the same for +/-/..., so dash is correct there: $ bash -c 'x=yz; echo "${x+'"'y'"'}"' 'y' $ dash -c 'x=yz; echo "${x+'"'y'"'}"' 'y' Because of that, for that report's follow-up comment about (unset x; echo "${x-$'\x41'}") I would not have expected this to be taken as a $'...' string. ksh (which does support $'...' strings too) prints the literal text $'\x41', and so does bash if invoked as sh. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/03/2018 00:04, Harald van Dijk wrote: > $ bash -c 'x=yz; echo "${x#'"'y'"'}"' > z > > $ dash -c 'x=yz; echo "${x#'"'y'"'}"' > yz > > (That is, they are executing x=yz; echo "${x#'y'}".) > > POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the > pattern is considered unquoted regardless of the outer quotation marks. > Because of that, the single quote characters should not be taken > literally, but should be taken as quoting the y. ksh, posh and zsh agree > with bash. Unfortunately, this causes another problem with all of the backslash approaches so far: x='\\\\'; printf "%s\n" "${x#'\\\\'}" This should print a blank line. (bash, ksh, posh and zsh agree.) Here, dash's parser stores '$\$\', where $ is a control character. preglob would need to turn this into \\\\\\\\. The problem is again that preglob cannot increase the string length. Perhaps the parser needs to store this as '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it requires some more invasive changes. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote: > On 01/03/2018 00:04, Harald van Dijk wrote: > >$ bash -c 'x=yz; echo "${x#'"'y'"'}"' > >z > > > >$ dash -c 'x=yz; echo "${x#'"'y'"'}"' > >yz > > > >(That is, they are executing x=yz; echo "${x#'y'}".) > > > >POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the > >pattern is considered unquoted regardless of the outer quotation marks. > >Because of that, the single quote characters should not be taken > >literally, but should be taken as quoting the y. ksh, posh and zsh agree > >with bash. > > Unfortunately, this causes another problem with all of the backslash > approaches so far: > > x='\\\\'; printf "%s\n" "${x#'\\\\'}" > > This should print a blank line. (bash, ksh, posh and zsh agree.) > > Here, dash's parser stores '$\$\', where $ is a control character. preglob > would need to turn this into \\\\\\\\. The problem is again that preglob > cannot increase the string length. Perhaps the parser needs to store this as > '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it > requires some more invasive changes. These are different issues. dash's parser currently does not understand nested quoting in patterns at all. That is, if your parameter expansion are within double quotes, then dash at the parser level will consider the pattern to be double-quoted. Thus any nested single-quotes will be literals instead of actual quotes. If we fix this in the parser then everything should just work. Cheers,
On 02/03/2018 08:49, Herbert Xu wrote: > On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote: >> On 01/03/2018 00:04, Harald van Dijk wrote: >>> $ bash -c 'x=yz; echo "${x#'"'y'"'}"' >>> z >>> >>> $ dash -c 'x=yz; echo "${x#'"'y'"'}"' >>> yz >>> >>> (That is, they are executing x=yz; echo "${x#'y'}".) >>> >>> POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the >>> pattern is considered unquoted regardless of the outer quotation marks. >>> Because of that, the single quote characters should not be taken >>> literally, but should be taken as quoting the y. ksh, posh and zsh agree >>> with bash. >> >> Unfortunately, this causes another problem with all of the backslash >> approaches so far: >> >> x='\\\\'; printf "%s\n" "${x#'\\\\'}" >> >> This should print a blank line. (bash, ksh, posh and zsh agree.) >> >> Here, dash's parser stores '$\$\', where $ is a control character. preglob >> would need to turn this into \\\\\\\\. The problem is again that preglob >> cannot increase the string length. Perhaps the parser needs to store this as >> '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it >> requires some more invasive changes. > > These are different issues. dash's parser currently does not > understand nested quoting in patterns at all. That is, if your > parameter expansion are within double quotes, then dash at the > parser level will consider the pattern to be double-quoted. Thus > any nested single-quotes will be literals instead of actual quotes. That's the same thing though. The problem with the backslashes is also that dash sees them as double-quoted when they should be seen as unquoted, and the approach taken in commit 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was specifically to *not* fix this in the parser, but to simply have the parser record enough information so that quote status can be determined and patched up during expansion. It's just that in the case of single quotes, expansion was never modified to recognise them. Thinking some more, I don't think the parser actually records enough information to let that work. > If we fix this in the parser then everything should just work. Right, that's the approach FreeBSD sh has taken that I referred to in my message from Feb 18, that I'd personally prefer as well. It basically involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a variable expansion starts, and finding a sensible way to change the syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an explicit stack of syntaxes is created for this, but that might be avoidable: with slight modifications to what gets stored in the byte after CTLVAR/CTLARI, it might be possible to go back through the parser output to determine the syntax to revert to. I'll see if I can get that working. > Cheers, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote: > > >If we fix this in the parser then everything should just work. > > Right, that's the approach FreeBSD sh has taken that I referred to in my > message from Feb 18, that I'd personally prefer as well. It basically > involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax > to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a > variable expansion starts, and finding a sensible way to change the syntax > back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an > explicit stack of syntaxes is created for this, but that might be avoidable: > with slight modifications to what gets stored in the byte after > CTLVAR/CTLARI, it might be possible to go back through the parser output to > determine the syntax to revert to. I'll see if I can get that working. Yes but that's overkill just to fix single quote within patterns. We already support nested double-quotes in patterns correctly. As single quotes cannot nest, it should be an easy fix. Cheers,
On 02/03/2018 16:28, Herbert Xu wrote: > On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote: >> >>> If we fix this in the parser then everything should just work. >> >> Right, that's the approach FreeBSD sh has taken that I referred to in my >> message from Feb 18, that I'd personally prefer as well. It basically >> involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax >> to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a >> variable expansion starts, and finding a sensible way to change the syntax >> back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an >> explicit stack of syntaxes is created for this, but that might be avoidable: >> with slight modifications to what gets stored in the byte after >> CTLVAR/CTLARI, it might be possible to go back through the parser output to >> determine the syntax to revert to. I'll see if I can get that working. > > Yes but that's overkill just to fix single quote within patterns. > We already support nested double-quotes in patterns correctly. As > single quotes cannot nest, it should be an easy fix. Single quotes indeed cannot nest, but you do need to reliably determine when a single quote is a special character, which gets very tricky very quickly with nested substitutions. In "${x+''}", the ' is the literal ' character. In "${x#''}", the ' is a quote character. This part is easy, this part is just a matter of setting another variable when the parse of the substitution starts. But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}", the ' is a quote character. This part is hard. If the above is done simply using another local variable, then the parse of the nested ${y-} would clobber that variable. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote: > > But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}", > the ' is a quote character. This part is hard. If the above is done simply > using another local variable, then the parse of the nested ${y-} would > clobber that variable. I don't see why that's hard. You just need to remember whether you're in a pattern context (i.e., after a %/%% or #/##). If you are, then you need to go back to basesyntax instead of dqsyntax until the next right brace. Cheers,
On 02/03/2018 17:33, Herbert Xu wrote: > On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote: >> >> But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}", >> the ' is a quote character. This part is hard. If the above is done simply >> using another local variable, then the parse of the nested ${y-} would >> clobber that variable. > > I don't see why that's hard. You just need to remember whether > you're in a pattern context (i.e., after a %/%% or #/##). If you > are, then you need to go back to basesyntax instead of dqsyntax > until the next right brace. Let me slightly modify my example so that the effect becomes different: "${x#"${y-*}"''}" When the parser has processed "${x#"${y- would the state be "in a pattern context", or "not in a pattern context"? If the state is "in a pattern context", how do you prevent the next * from being taken as unquoted? It should be taken as quoted. If it stores "not in a pattern context", and the parser is processing the last character in "${x#"${y-*} how can it reset the state to "in a pattern context"? Where could this state be stored? With arbitrarily deep nesting of variable expansions, I do not see how you can avoid requiring arbitrarily large state, which gets difficult with dash's non-recursive parser. Mind you, I do hope I'm missing something obvious here! Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/Makefile.am b/src/Makefile.am index 139355e..b5bff06 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -60,7 +60,7 @@ init.c: mkinit $(dash_CFILES) nodes.c nodes.h: mknodes nodetypes nodes.c.pat ./$^ -syntax.c syntax.h: mksyntax +syntax.c syntax.h: mksyntax parser.h ./$^ signames.c: mksignames diff --git a/src/expand.c b/src/expand.c index 2a50830..aeffe82 100644 --- a/src/expand.c +++ b/src/expand.c @@ -244,6 +244,7 @@ argstr(char *p, int flag) CTLVAR, CTLBACKQ, CTLENDARI, + CTLBACK, 0 }; const char *reject = spclchars; @@ -330,6 +331,7 @@ addquote: startloc++; } break; + case CTLBACK: case CTLESC: startloc++; length++; @@ -340,7 +342,7 @@ addquote: * backslash. */ if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) == - EXP_QPAT && *p != '\\') + EXP_QPAT && (*p != '\\' || c == CTLBACK)) break; goto addquote; @@ -373,6 +375,7 @@ exptilde(char *startp, char *p, int flag) while ((c = *++p) != '\0') { switch(c) { + case CTLBACK: case CTLESC: return (startp); case CTLQUOTEMARK: @@ -594,7 +597,7 @@ scanleft( *loc2 = c; if (match) return loc; - if (quotes && *loc == (char)CTLESC) + if (quotes && (*loc == (char)CTLESC || *loc == (char)CTLBACK)) loc++; loc++; loc2++; @@ -821,7 +824,8 @@ end: if (subtype != VSNORMAL) { /* skip to end of alternative */ int nesting = 1; for (;;) { - if ((c = (signed char)*p++) == CTLESC) + if ((c = (signed char)*p++) == CTLESC || + c == CTLBACK) p++; else if (c == CTLBACKQ) { if (varlen >= 0) @@ -1052,7 +1056,7 @@ ifsbreakup(char *string, int maxargs, struct arglist *arglist) q = p; c = *p++; - if (c == (char)CTLESC) + if (c == (char)CTLESC || c == (char)CTLBACK) c = *p++; isifs = strchr(ifs, c); @@ -1684,7 +1688,7 @@ _rmescapes(char *str, int flag) notescaped = globbing; continue; } - if (*p == (char)CTLESC) { + if (*p == (char)CTLESC || *p == (char)CTLBACK) { p++; if (notescaped) *q++ = '\\'; diff --git a/src/jobs.c b/src/jobs.c index 4f02e38..a3aef20 100644 --- a/src/jobs.c +++ b/src/jobs.c @@ -1386,6 +1386,7 @@ cmdputs(const char *s) while ((c = *p++) != 0) { str = 0; switch (c) { + case CTLBACK: case CTLESC: c = *p++; break; diff --git a/src/mystring.c b/src/mystring.c index 0106bd2..2573d1f 100644 --- a/src/mystring.c +++ b/src/mystring.c @@ -62,7 +62,7 @@ const char spcstr[] = " "; const char snlfmt[] = "%s\n"; const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=', CTLQUOTEMARK, '\0' }; -const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 }; +const char qchars[] = { CTLESC, CTLQUOTEMARK, CTLBACK, 0 }; const char illnum[] = "Illegal number: %s"; const char homestr[] = "HOME"; diff --git a/src/parser.c b/src/parser.c index 382658e..22fc84a 100644 --- a/src/parser.c +++ b/src/parser.c @@ -930,7 +930,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) case CBACK: c = pgetc2(); if (c == PEOF) { - USTPUTC(CTLESC, out); + USTPUTC(CTLBACK, out); USTPUTC('\\', out); pungetc(); } else if (c == '\n') { @@ -944,6 +944,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) eofmark != NULL ) ) { + USTPUTC(CTLBACK, out); USTPUTC('\\', out); } USTPUTC(CTLESC, out); diff --git a/src/parser.h b/src/parser.h index 2875cce..54c02eb 100644 --- a/src/parser.h +++ b/src/parser.h @@ -45,7 +45,8 @@ #define CTLARI -122 /* arithmetic expression */ #define CTLENDARI -121 #define CTLQUOTEMARK -120 -#define CTL_LAST -120 /* last 'special' character */ +#define CTLBACK -119 +#define CTL_LAST -119 /* last 'special' character */ /* variable substitution byte (follows CTLVAR) */ #define VSTYPE 0x0f /* type of variable substitution */ diff --git a/src/show.c b/src/show.c index 4a049e9..ed840aa 100644 --- a/src/show.c +++ b/src/show.c @@ -166,6 +166,7 @@ sharg(union node *arg, FILE *fp) bqlist = arg->narg.backquote; for (p = arg->narg.text ; *p ; p++) { switch ((signed char)*p) { + case CTLBACK: case CTLESC: putc(*++p, fp); break; @@ -311,6 +312,7 @@ trstring(char *s) case '\r': c = 'r'; goto backslash; case '"': c = '"'; goto backslash; case '\\': c = '\\'; goto backslash; + case CTLBACK: case CTLESC: c = 'e'; goto backslash; case CTLVAR: c = 'v'; goto backslash; case CTLBACKQ: c = 'q'; goto backslash;