Message ID | 2d33238c-55c6-1aed-f16b-46be410ca993@gigawatt.nl (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Herbert Xu |
Headers | show |
On Wed, Oct 12, 2016 at 07:24:26PM +0200, Harald van Dijk wrote: > > > I would have expected another exception to be in alias expansions that > > end in a backslash. Shells are not entirely in agreement there, but most > > appear to treat this the regular way, meaning > > > > dash -c 'alias bs=\\ > > bs > > ' > > > > prints nothing. I think your patch changes this. In order to preserve the existing behaviour (which seems logical), you should change the second pgetc call in pgetc_eatbnl to pgetc2. > With more extensive testing, the only issue I've seen is what Jilles > Tjoelker had already mentioned, namely that backslash-newline should be > preserved inside single-quoted strings, and also that it should be > preserved inside heredocs where any part of the delimiter is quoted: > > cat <<\EOF > \ > EOF > > dash's parsing treats this mostly the same as a single-quoted string, > and the same extra check handles both cases. > > Here's an updated patch. Hoping this looks okay and can be applied. I'm fine with the concept. However, your patch also breaks here- document parsing when the delimiter is a single backslash. cat << "\" \ If you can fix these two problems it should be good to go. Cheers,
Op 06-03-18 om 08:45 schreef Herbert Xu: > However, your patch also breaks here- > document parsing when the delimiter is a single backslash. > > cat << "\" > \ That is supposed to break. "\" is not a correctly quoted backslash. Try '\' or "\\" or \\ - M. -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 3/6/18 9:45 AM, Herbert Xu wrote: > On Wed, Oct 12, 2016 at 07:24:26PM +0200, Harald van Dijk wrote: >> >>> I would have expected another exception to be in alias expansions that >>> end in a backslash. Shells are not entirely in agreement there, but most >>> appear to treat this the regular way, meaning >>> >>> dash -c 'alias bs=\\ >>> bs >>> ' >>> >>> prints nothing. > > I think your patch changes this. In order to preserve the existing > behaviour (which seems logical), you should change the second pgetc > call in pgetc_eatbnl to pgetc2. Oh, indeed, thanks. There's another problem: when there is no following command (as in the above example), things break. A shorter reproducer that has failed for years is $ dash -c 'alias x= x' dash: 2: Syntax error: end of file unexpected This breaks because the part where list() checks for NL/EOF, checkkwd==0, so aliases aren't expanded. Immediately after that, checkkwd is set and the next call to readtoken() would return TEOF, but by that point, dash has already committed to parsing a command. Since this is actually a long-standing problem, not something introduced by the patch, I think it's okay to ignore for now. Do you agree? >> With more extensive testing, the only issue I've seen is what Jilles >> Tjoelker had already mentioned, namely that backslash-newline should be >> preserved inside single-quoted strings, and also that it should be >> preserved inside heredocs where any part of the delimiter is quoted: >> >> cat <<\EOF >> \ >> EOF >> >> dash's parsing treats this mostly the same as a single-quoted string, >> and the same extra check handles both cases. >> >> Here's an updated patch. Hoping this looks okay and can be applied. > > > I'm fine with the concept. However, your patch also breaks here- > document parsing when the delimiter is a single backslash. > > cat << "\" > \ > > If you can fix these two problems it should be good to go. As Martijn Dekker wrote, this should work when the backslash is escaped or single-quoted, and in my testing does. But what you have is a nice start of another corner case: cat << "\" \ "EOF ok " EOF I'm happily surprised to see that dash accepts and gives sensible treatment to multi-line heredoc delimiters. Okay with your one extra pgetc()=>pgetc2() change, then? Cheers, Harald van Dijk > Cheers, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 07, 2018 at 01:36:08AM +0100, Harald van Dijk wrote: > > Oh, indeed, thanks. > > There's another problem: when there is no following command (as in the above > example), things break. A shorter reproducer that has failed for years is > > $ dash -c 'alias x= > x' > dash: 2: Syntax error: end of file unexpected > > This breaks because the part where list() checks for NL/EOF, checkkwd==0, so > aliases aren't expanded. Immediately after that, checkkwd is set and the > next call to readtoken() would return TEOF, but by that point, dash has > already committed to parsing a command. > > Since this is actually a long-standing problem, not something introduced by > the patch, I think it's okay to ignore for now. Do you agree? Sure. > >I'm fine with the concept. However, your patch also breaks here- > >document parsing when the delimiter is a single backslash. > > > > cat << "\" > > \ > > > >If you can fix these two problems it should be good to go. > > As Martijn Dekker wrote, this should work when the backslash is escaped or > single-quoted, and in my testing does. But what you have is a nice start of > another corner case: I made mistake with the quotes but it's a real problem: cat << '\' \ The issue is that your pgetc_eatbnl executes before we check for the end of here-documents. It has to be moved after it. IOW I think you should leave the pgetc's before CHECKEND as is, which also means keeping some of the CBACK processing as is. Thanks,
On 3/7/18 7:18 AM, Herbert Xu wrote: > On Wed, Mar 07, 2018 at 01:36:08AM +0100, Harald van Dijk wrote: > >>> I'm fine with the concept. However, your patch also breaks here- >>> document parsing when the delimiter is a single backslash. >>> >>> cat << "\" >>> \ >>> >>> If you can fix these two problems it should be good to go. >> >> As Martijn Dekker wrote, this should work when the backslash is escaped or >> single-quoted, and in my testing does. But what you have is a nice start of >> another corner case: > > I made mistake with the quotes but it's a real problem: > > cat << '\' > \ I expect this to print nothing, and it does. > The issue is that your pgetc_eatbnl executes before we check for the > end of here-documents. It has to be moved after it. IOW I think > you should leave the pgetc's before CHECKEND as is, which also means > keeping some of the CBACK processing as is. This was wrong in the original patch, but I'm not seeing it in the updated patch that you replied to. When parsing a heredoc where part of delimiter is quoted, syntax==SQSYNTAX. Since the calls to pgetc_eatbnl() are conditional on syntax!=SQSYNTAX, there shouldn't be a problem. It would be a different story if the delimiter could be an unquoted backslash, but thankfully that is not possible. > Thanks, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 07, 2018 at 07:49:16AM +0100, Harald van Dijk wrote: > > This was wrong in the original patch, but I'm not seeing it in the updated > patch that you replied to. When parsing a heredoc where part of delimiter is > quoted, syntax==SQSYNTAX. Since the calls to pgetc_eatbnl() are conditional > on syntax!=SQSYNTAX, there shouldn't be a problem. It would be a different > story if the delimiter could be an unquoted backslash, but thankfully that > is not possible. Good point. In that case please resend it with the pgetc2 change and it should be good to go. Thanks,
--- a/src/parser.c +++ b/src/parser.c @@ -106,6 +106,7 @@ STATIC void parseheredoc(void); STATIC int peektoken(void); STATIC int readtoken(void); STATIC int xxreadtoken(void); +STATIC int pgetc_eatbnl(); STATIC int readtoken1(int, char const *, char *, int); STATIC void synexpect(int) __attribute__((__noreturn__)); STATIC void synerror(const char *) __attribute__((__noreturn__)); @@ -656,8 +657,10 @@ parseheredoc(void) if (needprompt) { setprompt(2); } - readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX, - here->eofmark, here->striptabs); + if (here->here->type == NHERE) + readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs); + else + readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs); n = (union node *)stalloc(sizeof (struct narg)); n->narg.type = NARG; n->narg.next = NULL; @@ -782,7 +785,7 @@ xxreadtoken(void) setprompt(2); } for (;;) { /* until token or start of word found */ - c = pgetc(); + c = pgetc_eatbnl(); switch (c) { case ' ': case '\t': case PEOA: @@ -791,30 +794,23 @@ xxreadtoken(void) while ((c = pgetc()) != '\n' && c != PEOF); pungetc(); continue; - case '\\': - if (pgetc() == '\n') { - nlprompt(); - continue; - } - pungetc(); - goto breakloop; case '\n': nlnoprompt(); RETURN(TNL); case PEOF: RETURN(TEOF); case '&': - if (pgetc() == '&') + if (pgetc_eatbnl() == '&') RETURN(TAND); pungetc(); RETURN(TBACKGND); case '|': - if (pgetc() == '|') + if (pgetc_eatbnl() == '|') RETURN(TOR); pungetc(); RETURN(TPIPE); case ';': - if (pgetc() == ';') + if (pgetc_eatbnl() == ';') RETURN(TENDCASE); pungetc(); RETURN(TSEMI); @@ -822,11 +818,9 @@ xxreadtoken(void) RETURN(TLP); case ')': RETURN(TRP); - default: - goto breakloop; } + break; } -breakloop: return readtoken1(c, BASESYNTAX, (char *)NULL, 0); #undef RETURN } @@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) attyline(); if (syntax == BASESYNTAX) return readtoken(); - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; } #endif @@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) goto endword; /* exit outer loop */ USTPUTC(c, out); nlprompt(); - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; /* continue outer loop */ case CWORD: USTPUTC(c, out); @@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) USTPUTC(CTLESC, out); USTPUTC('\\', out); pungetc(); - } else if (c == '\n') { - nlprompt(); } else { if ( dblquote && @@ -997,7 +989,7 @@ quotemark: USTPUTC(c, out); --parenlevel; } else { - if (pgetc() == ')') { + if (pgetc_eatbnl() == ')') { USTPUTC(CTLENDARI, out); if (!--arinest) syntax = prevsyntax; @@ -1025,7 +1017,7 @@ quotemark: USTPUTC(c, out); } } - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); } } endword: @@ -1132,7 +1124,7 @@ parseredir: { np = (union node *)stalloc(sizeof (struct nfile)); if (c == '>') { np->nfile.fd = 1; - c = pgetc(); + c = pgetc_eatbnl(); if (c == '>') np->type = NAPPEND; else if (c == '|') @@ -1145,7 +1137,7 @@ parseredir: { } } else { /* c == '<' */ np->nfile.fd = 0; - switch (c = pgetc()) { + switch (c = pgetc_eatbnl()) { case '<': if (sizeof (struct nfile) != sizeof (struct nhere)) { np = (union node *)stalloc(sizeof (struct nhere)); @@ -1154,7 +1146,7 @@ parseredir: { np->type = NHERE; heredoc = (struct heredoc *)stalloc(sizeof (struct heredoc)); heredoc->here = np; - if ((c = pgetc()) == '-') { + if ((c = pgetc_eatbnl()) == '-') { heredoc->striptabs = 1; } else { heredoc->striptabs = 0; @@ -1336,21 +1328,12 @@ parsebackq: { if (needprompt) { setprompt(2); } - switch (pc = pgetc()) { + switch (pc = pgetc_eatbnl()) { case '`': goto done; case '\\': - if ((pc = pgetc()) == '\n') { - nlprompt(); - /* - * If eating a newline, avoid putting - * the newline into the new character - * stream (via the STPUTC after the - * switch). - */ - continue; - } + pc = pgetc_eatbnl(); if (pc != '\\' && pc != '`' && pc != '$' && (!dblquote || pc != '"')) STPUTC('\\', pout); @@ -1529,7 +1512,7 @@ expandstr(const char *ps) saveprompt = doprompt; doprompt = 0; - readtoken1(pgetc(), DQSYNTAX, FAKEEOFMARK, 0); + readtoken1(pgetc_eatbnl(), DQSYNTAX, FAKEEOFMARK, 0); doprompt = saveprompt;