From patchwork Sun Mar 4 11:44:59 2018
X-Patchwork-Submitter: Harald van Dijk
X-Patchwork-Id: 10257609
Subject: Re: dash bug: double-quoted "\" breaks glob protection for next char
From: Harald van Dijk
To: Herbert Xu
Cc: Denys Vlasenko, dash@vger.kernel.org
Date: Sun, 4 Mar 2018 12:44:59 +0100
In-Reply-To: <4242819b-4aee-1238-203f-ec08d001be05@gigawatt.nl>

On 3/2/18 11:58 AM, Harald van Dijk wrote:
> On 02/03/2018 08:49, Herbert Xu wrote:
>> If we fix this in the parser then everything should just work.
>
> Right, that's the approach FreeBSD sh has taken that I referred to in my
> message from Feb 18, that I'd personally prefer as well. It basically
> involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting
> syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse
> of a variable expansion starts, and finding a sensible way to change the
> syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends.
> In FreeBSD sh, an explicit stack of syntaxes is created for this, but
> that might be avoidable: with slight modifications to what gets stored
> in the byte after CTLVAR/CTLARI, it might be possible to go back through
> the parser output to determine the syntax to revert to. I'll see if I
> can get that working.

Since I didn't see how to avoid this approach, I went ahead with this
attempt and the attached is the result.

I started out by reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c.
Since that also removed some dead code, I re-removed the dead code.

I then modified the byte after CTLVAR so that it didn't just store
whether the result was quoted, it stored the prior syntax, and forcibly
reset syntax to either BASESYNTAX or DQSYNTAX as appropriate. Then, in
CTLENDVAR, I look for the opening CTLVAR, and use that to restore the
prior syntax. The same goes for CTLARI/CTLENDARI too.

When CTLENDVAR is seen, I double-check that syntax has the expected
value. This fixes the handling of "${$+"}"}", where the inner } was seen
as ending the variable substitution.

This fixes more cases than just backslashes and single quotes: another
character that's special in unquoted contexts is ~, so "${HOME#~}"
should expand to an empty string.

This changes how $(( ${$+"123"} )) gets handled: POSIX doesn't really
answer this, I think. POSIX says that $(( "123" )) is a syntax error,
but doesn't address whether " is special when it appears in other places
than directly in the $((...)). Most shells accept $(( ${$+"123"} )), and
with this patch, dash accepts it too.

This changes how "${x+"$y"}" gets handled: POSIX is silent about whether
the $y should be treated as quoted. dash has treated it as quoted for a
very long time. ash has historically treated it as unquoted. With this
patch, it gets treated as unquoted.

Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how "$@"
got handled and reverting that changed it, I looked into how this works
and fixed another bug.
It also changes the handling of $* and $@ when IFS is set but empty:
dash 0.5.8 didn't handle empty IFS properly at all, even if all
parameters were non-empty. dash 0.5.9.1 preserves empty parameters. With
this patch, they get removed just like in bash. POSIX allows for either.

I would be a bit surprised if the patch is acceptable in its current
form, but it's worth seeing which of the current results are definitely
correct, which of the results are acceptable, which results may well be
unwanted, and which special cases I missed.

Cheers,
Harald van Dijk

command: echo "\*"
bash: \*
dash 0.5.8: \wwww \zzzz
dash 0.5.9.1: \wwww \zzzz
dash patched: \*

command: case \\ab in "\*") echo BUG;; esac
bash:
dash 0.5.8: BUG
dash 0.5.9.1: BUG
dash patched:

command: case \\a in "\?") echo BUG;; esac
bash:
dash 0.5.8: BUG
dash 0.5.9.1: BUG
dash patched:

command: foo=\\; echo "<${foo#[\\]]}>"
bash: <\>
dash 0.5.8: <\>
dash 0.5.9.1: <\>
dash patched: <\>

command: foo=a; echo "<${foo#[a\]]}>"
bash: <>
dash 0.5.8: <>
dash 0.5.9.1: <>
dash patched: <>

command: x=yz; echo "${x#'y'}"
bash: z
dash 0.5.8: yz
dash 0.5.9.1: yz
dash patched: z

command: x=yz; echo "${x+'y'}"
bash: 'y'
dash 0.5.8: 'y'
dash 0.5.9.1: 'y'
dash patched: 'y'

command: x="''''"; echo "${x#"${x+''}"''}"
bash: ''
dash 0.5.8:
dash 0.5.9.1:
dash patched: ''

command: HOME=/; echo "${HOME#~}"
bash:
dash 0.5.8: /
dash 0.5.9.1: /
dash patched:

command: x="13"; echo $(( ${x#'1'} ))
bash: 3
dash 0.5.8: 13
dash 0.5.9.1: 13
dash patched: 3

command: echo $(( ${$+"123"} ))
bash: 123
dash 0.5.8: dash: 1: arithmetic expression: expecting primary: " "123" "
dash 0.5.9.1: dash: 1: arithmetic expression: expecting primary: " "123" "
dash patched: 123

command: set -- a ""; space=" "; printf "<%s>" "$@"$space
bash: <>
dash 0.5.8: < >
dash 0.5.9.1: < >
dash patched: <>

command: IFS=; set -- a b; printf "<%s>" $@
bash:
dash 0.5.8:
dash 0.5.9.1:
dash patched:

command: IFS=; set -- a ""; printf "<%s>" $@
bash:
dash 0.5.8:
dash 0.5.9.1: <>
dash patched:

command: IFS=; set -- a ""; printf "<%s>" $*
bash:
dash 0.5.8:
dash 0.5.9.1: <>
dash patched:

command: echo "${$+"{}"}"
bash: {}
dash 0.5.8: dash: 1: Syntax error: Unterminated quoted string
dash 0.5.9.1: dash: 1: Syntax error: Unterminated quoted string
dash patched: {}

command: x="a b"; printf "<%s>" "${x+"$x"}"
bash:
dash 0.5.8:
dash 0.5.9.1:
dash patched:

diff --git a/src/Makefile.am b/src/Makefile.am
index 139355e..525f8ef 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax
 signames.c: mksignames
 	./$^
 
-mksyntax: token.h
+mksyntax: parser.h token.h
 
 $(HELPERS): %: %.c
 	$(COMPILE_FOR_BUILD) -o $@ $<
diff --git a/src/TOUR b/src/TOUR
index 056e79b..f6a4641 100644
--- a/src/TOUR
+++ b/src/TOUR
@@ -150,6 +150,7 @@ special codes defined in parser.h. The special codes are:
 	CTLVAR              Variable substitution
 	CTLENDVAR           End of variable substitution
 	CTLBACKQ            Command substitution
+	CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
 	CTLESC              Escape next character
 
 A variable substitution contains the following elements:
@@ -169,13 +170,17 @@ stitution. The possible types are:
 	VSASSIGN            ${var=text}
 	VSASSIGN|VSNUL      ${var=text}
 
-The name of the variable comes next, terminated by an equals
-sign. If the type is not VSNORMAL, then the text field in the
-substitution follows, terminated by a CTLENDVAR byte.
+In addition, the type field will have the VSQUOTE flag set if the
+variable is enclosed in double quotes, or VSARITH set if the variable
+appears inside an $((...)) arithmetic expansion. The name of the
+variable comes next, terminated by an equals sign. If the type is not
+VSNORMAL, then the text field in the substitution follows, ter-
+minated by a CTLENDVAR byte.
 
 Commands in back quotes are parsed and stored in a linked list.
 The locations of these commands in the string are indicated by
-the CTLBACKQ character.
+CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether +the back quotes were enclosed in double quotes. The character CTLESC escapes the next character, so that in case any of the CTL characters mentioned above appear in the input, diff --git a/src/expand.c b/src/expand.c index 2a50830..67eb747 100644 --- a/src/expand.c +++ b/src/expand.c @@ -83,7 +83,7 @@ #define RMESCAPE_HEAP 0x10 /* Malloc strings instead of stalloc */ /* Add CTLESC when necessary. */ -#define QUOTES_ESC (EXP_FULL | EXP_CASE | EXP_QPAT) +#define QUOTES_ESC (EXP_FULL | EXP_CASE) /* Do not skip NUL characters. */ #define QUOTES_KEEPNUL EXP_TILDE @@ -112,12 +112,12 @@ static struct arglist exparg; STATIC void argstr(char *, int); STATIC char *exptilde(char *, char *, int); -STATIC void expbackq(union node *, int); +STATIC void expbackq(union node *, int, int); STATIC const char *subevalvar(char *, char *, int, int, int, int, int); STATIC char *evalvar(char *, int); STATIC size_t strtodest(const char *, const char *, int); STATIC void memtodest(const char *, size_t, const char *, int); -STATIC ssize_t varvalue(char *, int, int, int *); +STATIC ssize_t varvalue(char *, int, int, int); STATIC void expandmeta(struct strlist *, int); #ifdef HAVE_GLOB STATIC void addglob(const glob_t *); @@ -243,15 +243,19 @@ argstr(char *p, int flag) CTLESC, CTLVAR, CTLBACKQ, + CTLBACKQ | CTLQUOTE, CTLENDARI, 0 }; const char *reject = spclchars; - int c; + int c = 0; + int quotes = flag & QUOTES_ESC; int breakall = (flag & (EXP_WORD | EXP_QUOTED)) == EXP_WORD; int inquotes; size_t length; int startloc; + int prev; + int dolatstrhack; if (!(flag & EXP_VARTILDE)) { reject += 2; @@ -273,6 +277,7 @@ start: startloc = expdest - (char *)stackblock(); for (;;) { length += strcspn(p + length, reject); + prev = c; c = (signed char)p[length]; if (c && (!(c & 0x80) || c == CTLENDARI)) { /* c == '=' || c == ':' || c == CTLENDARI */ @@ -316,15 +321,9 @@ start: case CTLENDVAR: /* ??? 
*/ goto breakloop; case CTLQUOTEMARK: - inquotes ^= EXP_QUOTED; - /* "$@" syntax adherence hack */ - if (inquotes && !memcmp(p, dolatstr + 1, - DOLATSTRLEN - 1)) { - p = evalvar(p + 1, flag | inquotes) + 1; - goto start; - } + inquotes = !inquotes; addquote: - if (flag & QUOTES_ESC) { + if (quotes) { p--; length++; startloc++; @@ -333,27 +332,26 @@ addquote: case CTLESC: startloc++; length++; - - /* - * Quoted parameter expansion pattern: remove quote - * unless inside inner quotes or we have a literal - * backslash. - */ - if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) == - EXP_QPAT && *p != '\\') - break; - goto addquote; case CTLVAR: - p = evalvar(p, flag | inquotes); + /* "$@" syntax adherence hack */ + dolatstrhack = !memcmp(p, dolatstr+1, DOLATSTRLEN-1) && !shellparam.nparam && quotes; + p = evalvar(p, flag); + if (dolatstrhack && prev == (char)CTLQUOTEMARK && *p == (char)CTLQUOTEMARK) { + expdest--; + inquotes = !inquotes; + p++; + } goto start; case CTLBACKQ: - expbackq(argbackq->n, flag | inquotes); + c = 0; + case CTLBACKQ|CTLQUOTE: + expbackq(argbackq->n, c, quotes); argbackq = argbackq->next; goto start; case CTLENDARI: p--; - expari(flag | inquotes); + expari(quotes); goto start; } } @@ -449,11 +447,12 @@ removerecordregions(int endoff) * evaluate, place result in (backed up) result, adjust string position. 
*/ void -expari(int flag) +expari(int quotes) { struct stackmark sm; char *p, *start; int begoff; + char flag; int len; intmax_t result; @@ -468,42 +467,24 @@ expari(int flag) p = expdest; pushstackmark(&sm, p - start); *--p = '\0'; - p--; - do { - int esc; - - while (*p != (char)CTLARI) { - p--; -#ifdef DEBUG - if (p < start) { - sh_error("missing CTLARI (shouldn't happen)"); - } -#endif - } - - esc = esclen(start, p); - if (!(esc % 2)) { - break; - } - - p -= esc + 1; - } while (1); - + p = findstartchar(start, p, CTLARI, CTLENDARI); begoff = p - start; removerecordregions(begoff); + flag = p[1] & VSSYNTAX; + expdest = p; - if (likely(flag & QUOTES_ESC)) - rmescapes(p + 1); + if (likely(quotes)) + rmescapes(p + 2); - result = arith(p + 1); + result = arith(p + 2); popstackmark(&sm); len = cvtnum(result); - if (likely(!(flag & EXP_QUOTED))) + if (likely(!flag)) recordregion(begoff, begoff + len, 0); } @@ -513,7 +494,7 @@ expari(int flag) */ STATIC void -expbackq(union node *cmd, int flag) +expbackq(union node *cmd, int quoted, int quotes) { struct backcmd in; int i; @@ -521,7 +502,7 @@ expbackq(union node *cmd, int flag) char *p; char *dest; int startloc; - char const *syntax = flag & EXP_QUOTED ? DQSYNTAX : BASESYNTAX; + char const *syntax = quoted ? 
DQSYNTAX : BASESYNTAX;
 	struct stackmark smark;
 
 	INTOFF;
@@ -535,7 +516,7 @@ expbackq(union node *cmd, int flag)
 	if (i == 0)
 		goto read;
 	for (;;) {
-		memtodest(p, i, syntax, flag & QUOTES_ESC);
+		memtodest(p, i, syntax, quotes);
 read:
 		if (in.fd < 0)
 			break;
@@ -562,7 +543,7 @@ read:
 		STUNPUTC(dest);
 	expdest = dest;
 
-	if (!(flag & EXP_QUOTED))
+	if (!quoted)
 		recordregion(startloc, dest - (char *)stackblock(), 0);
 	TRACE(("evalbackq: size=%d: \"%.*s\"\n",
 		(dest - (char *)stackblock()) - startloc,
@@ -639,9 +620,8 @@ scanright(
 }
 
 STATIC const char *
-subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varflags, int flag)
+subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varflags, int quotes)
 {
-	int quotes = flag & QUOTES_ESC;
 	char *startp;
 	char *loc;
 	struct nodelist *saveargbackq = argbackq;
@@ -651,8 +631,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			       (flag & (EXP_QUOTED | EXP_QPAT) ?
-			        EXP_QPAT : EXP_CASE) : 0));
+			       EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -722,22 +701,25 @@ evalvar(char *p, int flag)
 	int startloc;
 	ssize_t varlen;
 	int easy;
+	int quotes;
 	int quoted;
 
+	quotes = flag & QUOTES_ESC;
 	varflags = *p++;
 	subtype = varflags & VSTYPE;
 
 	if (!subtype)
 		sh_error("Bad substitution");
 
-	quoted = flag & EXP_QUOTED;
+	quoted = varflags & VSQUOTE;
 	var = p;
 	easy = (!quoted || (*var == '@' && shellparam.nparam));
+
 	startloc = expdest - (char *)stackblock();
 	p = strchr(p, '=') + 1;
 
 again:
-	varlen = varvalue(var, varflags, flag, &quoted);
+	varlen = varvalue(var, varflags, flag, quoted);
 	if (varflags & VSNUL)
 		varlen--;
 
@@ -749,7 +731,8 @@ again:
 	if (subtype == VSMINUS) {
 vsplus:
 		if (varlen < 0) {
-			argstr(p, flag | EXP_TILDE | EXP_WORD);
+			argstr(p, flag | EXP_TILDE | EXP_WORD |
+				  (quoted ?
EXP_QUOTED : 0)); goto end; } goto record; @@ -759,8 +742,7 @@ vsplus: if (varlen >= 0) goto record; - subevalvar(p, var, 0, subtype, startloc, varflags, - flag & ~QUOTES_ESC); + subevalvar(p, var, 0, subtype, startloc, varflags, 0); varflags &= ~VSNUL; /* * Remove any recorded regions beyond @@ -806,7 +788,7 @@ record: STPUTC('\0', expdest); patloc = expdest - (char *)stackblock(); if (subevalvar(p, NULL, patloc, subtype, - startloc, varflags, flag) == 0) { + startloc, varflags, quotes) == 0) { int amount = expdest - ( (char *)stackblock() + patloc - 1 ); @@ -823,7 +805,7 @@ end: for (;;) { if ((c = (signed char)*p++) == CTLESC) p++; - else if (c == CTLBACKQ) { + else if (c == CTLBACKQ || c == (CTLBACKQ|CTLQUOTE)) { if (varlen >= 0) argbackq = argbackq->next; } else if (c == CTLVAR) { @@ -887,7 +869,7 @@ strtodest(p, syntax, quotes) */ STATIC ssize_t -varvalue(char *name, int varflags, int flags, int *quotedp) +varvalue(char *name, int varflags, int flags, int quoted) { int num; char *p; @@ -896,7 +878,6 @@ varvalue(char *name, int varflags, int flags, int *quotedp) char sepc; char **ap; char const *syntax; - int quoted = *quotedp; int subtype = varflags & VSTYPE; int discard = subtype == VSPLUS || subtype == VSLENGTH; int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL; @@ -942,7 +923,6 @@ numvar: sep |= ifsset() ? 
ifsval()[0] : ' '; param: sepc = sep; - *quotedp = !sepc; if (!(ap = shellparam.p)) return -1; while ((p = *ap++)) { @@ -1644,7 +1624,6 @@ char * _rmescapes(char *str, int flag) { char *p, *q, *r; - unsigned inquotes; int notescaped; int globbing; @@ -1674,24 +1653,23 @@ _rmescapes(char *str, int flag) q = mempcpy(q, str, len); } } - inquotes = 0; globbing = flag & RMESCAPE_GLOB; notescaped = globbing; while (*p) { if (*p == (char)CTLQUOTEMARK) { - inquotes = ~inquotes; p++; notescaped = globbing; continue; } + if (*p == '\\') { + /* naked back slash */ + notescaped = 0; + goto copy; + } if (*p == (char)CTLESC) { p++; if (notescaped) *q++ = '\\'; - } else if (*p == '\\' && !inquotes) { - /* naked back slash */ - notescaped = 0; - goto copy; } notescaped = globbing; copy: diff --git a/src/expand.h b/src/expand.h index 26dc5b4..90f5328 100644 --- a/src/expand.h +++ b/src/expand.h @@ -55,7 +55,6 @@ struct arglist { #define EXP_VARTILDE 0x4 /* expand tildes in an assignment */ #define EXP_REDIR 0x8 /* file glob for a redirection (1 match only) */ #define EXP_CASE 0x10 /* keeps quotes around for CASE pattern */ -#define EXP_QPAT 0x20 /* pattern in quoted parameter expansion */ #define EXP_VARTILDE2 0x40 /* expand tildes after colons only */ #define EXP_WORD 0x80 /* expand word in parameter expansion */ #define EXP_QUOTED 0x100 /* expand word in double quotes */ diff --git a/src/jobs.c b/src/jobs.c index 4f02e38..6ba6b48 100644 --- a/src/jobs.c +++ b/src/jobs.c @@ -1375,7 +1375,6 @@ cmdputs(const char *s) char *nextc; signed char c; int subtype = 0; - int quoted = 0; static const char vstype[VSTYPE + 1][4] = { "", "}", "-", "+", "?", "=", "%", "%%", "#", "##", @@ -1397,11 +1396,11 @@ cmdputs(const char *s) str = "${"; goto dostr; case CTLENDVAR: - str = "\"}" + !(quoted & 1); - quoted >>= 1; + str = "}"; subtype = 0; goto dostr; case CTLBACKQ: + case CTLBACKQ|CTLQUOTE: str = "$(...)"; goto dostr; case CTLARI: @@ -1411,14 +1410,11 @@ cmdputs(const char *s) str = "))"; 
goto dostr; case CTLQUOTEMARK: - quoted ^= 1; c = '"'; break; case '=': if (subtype == 0) break; - if ((subtype & VSTYPE) != VSNORMAL) - quoted <<= 1; str = vstype[subtype & VSTYPE]; if (subtype & VSNUL) c = ':'; @@ -1446,9 +1442,6 @@ dostr: USTPUTC(c, nextc); } } - if (quoted & 1) { - USTPUTC('"', nextc); - } *nextc = 0; cmdnextc = nextc; } diff --git a/src/mksyntax.c b/src/mksyntax.c index a23c18c..41c9ceb 100644 --- a/src/mksyntax.c +++ b/src/mksyntax.c @@ -145,7 +145,8 @@ main(int argc, char **argv) fprintf(hfile, "/* %s */\n", is_entry[i].comment); } putc('\n', hfile); - fprintf(hfile, "#define SYNBASE %d\n", 130); + fprintf(hfile, "#define SYNBASE %d\n", 131); + fprintf(hfile, "#define PVSSYNTAX %d\n", -131); fprintf(hfile, "#define PEOF %d\n\n", -130); fprintf(hfile, "#define PEOA %d\n\n", -129); putc('\n', hfile); @@ -158,6 +159,7 @@ main(int argc, char **argv) putc('\n', hfile); /* Generate the syntax tables. */ + fputs("#include \"parser.h\"\n\n", cfile); fputs("#include \"shell.h\"\n", cfile); fputs("#include \"syntax.h\"\n\n", cfile); init(); @@ -170,7 +172,8 @@ main(int argc, char **argv) add("$", "CVAR"); add("}", "CENDVAR"); add("<>();&| \t", "CSPCL"); - syntax[1] = "CSPCL"; + syntax[0] = "0"; + syntax[2] = "CSPCL"; print("basesyntax"); init(); fputs("\n/* syntax table used when in double quotes */\n", cfile); @@ -182,6 +185,7 @@ main(int argc, char **argv) add("}", "CENDVAR"); /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */ add("!*?[=~:/-]", "CCTL"); + syntax[0] = "VSQUOTE"; print("dqsyntax"); init(); fputs("\n/* syntax table used when in single quotes */\n", cfile); @@ -189,6 +193,7 @@ main(int argc, char **argv) add("'", "CENDQUOTE"); /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */ add("!*?[=~:/-]\\", "CCTL"); + syntax[0] = "0"; print("sqsyntax"); init(); fputs("\n/* syntax table used when in arithmetic */\n", cfile); @@ -199,6 +204,7 @@ main(int argc, char **argv) add("}", "CENDVAR"); add("(", "CLP"); add(")", "CRP"); 
+ syntax[0] = "VSARITH"; print("arisyntax"); filltable("0"); fputs("\n/* character classification table */\n", cfile); @@ -223,7 +229,7 @@ filltable(char *dftval) { int i; - for (i = 0 ; i < 258; i++) + for (i = 0 ; i < 259; i++) syntax[i] = dftval; } @@ -238,10 +244,10 @@ init(void) int ctl; filltable("CWORD"); - syntax[0] = "CEOF"; - syntax[1] = "CIGN"; + syntax[1] = "CEOF"; + syntax[2] = "CIGN"; for (ctl = CTL_FIRST; ctl <= CTL_LAST; ctl++ ) - syntax[130 + ctl] = "CCTL"; + syntax[131 + ctl] = "CCTL"; } @@ -253,7 +259,7 @@ static void add(char *p, char *type) { while (*p) - syntax[(signed char)*p++ + 130] = type; + syntax[(signed char)*p++ + 131] = type; } @@ -271,7 +277,7 @@ print(char *name) fprintf(hfile, "extern const char %s[];\n", name); fprintf(cfile, "const char %s[] = {\n", name); col = 0; - for (i = 0 ; i < 258; i++) { + for (i = 0 ; i < 259; i++) { if (i == 0) { fputs(" ", cfile); } else if ((i & 03) == 0) { diff --git a/src/mystring.c b/src/mystring.c index 0106bd2..a0d5e47 100644 --- a/src/mystring.c +++ b/src/mystring.c @@ -60,8 +60,7 @@ char nullstr[1]; /* zero length string */ const char spcstr[] = " "; const char snlfmt[] = "%s\n"; -const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=', - CTLQUOTEMARK, '\0' }; +const char dolatstr[] = { CTLVAR, VSNORMAL|VSQUOTE, '@', '=', '\0' }; const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 }; const char illnum[] = "Illegal number: %s"; const char homestr[] = "HOME"; diff --git a/src/mystring.h b/src/mystring.h index 083ea98..3a82f05 100644 --- a/src/mystring.h +++ b/src/mystring.h @@ -40,7 +40,7 @@ extern const char snlfmt[]; extern const char spcstr[]; extern const char dolatstr[]; -#define DOLATSTRLEN 6 +#define DOLATSTRLEN 4 extern const char qchars[]; extern const char illnum[]; extern const char homestr[]; diff --git a/src/parser.c b/src/parser.c index b3f3684..0e86cff 100644 --- a/src/parser.c +++ b/src/parser.c @@ -876,24 +876,18 @@ readtoken1(int firstc, char const *syntax, char *eofmark, 
int striptabs) size_t len; struct nodelist *bqlist; int quotef; - int dblquote; + int nhere; int varnest; /* levels of variables expansion */ int arinest; /* levels of arithmetic expansion */ int parenlevel; /* levels of parens in arithmetic */ - int dqvarnest; /* levels of variables expansion within double quotes */ int oldstyle; - /* syntax before arithmetic */ - char const *uninitialized_var(prevsyntax); - dblquote = 0; - if (syntax == DQSYNTAX) - dblquote = 1; + nhere = eofmark && syntax == SQSYNTAX; quotef = 0; bqlist = NULL; varnest = 0; arinest = 0; parenlevel = 0; - dqvarnest = 0; STARTSTACKSTR(out); loop: { /* for each line, until end of word */ @@ -922,7 +916,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) USTPUTC(c, out); break; case CCTL: - if (eofmark == NULL || dblquote) + if (!nhere) USTPUTC(CTLESC, out); USTPUTC(c, out); break; @@ -937,13 +931,14 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) nlprompt(); } else { if ( - dblquote && + syntax != BASESYNTAX && c != '\\' && c != '`' && c != '$' && ( c != '"' || eofmark != NULL ) ) { + USTPUTC(CTLESC, out); USTPUTC('\\', out); } USTPUTC(CTLESC, out); @@ -960,16 +955,12 @@ quotemark: break; case CDQUOTE: syntax = DQSYNTAX; - dblquote = 1; goto quotemark; case CENDQUOTE: if (eofmark && !varnest) USTPUTC(c, out); else { - if (dqvarnest == 0) { - syntax = BASESYNTAX; - dblquote = 0; - } + syntax = BASESYNTAX; quotef++; goto quotemark; } @@ -979,14 +970,18 @@ quotemark: break; case CENDVAR: /* '}' */ if (varnest > 0) { - varnest--; - if (dqvarnest > 0) { - dqvarnest--; + const char *startchar = findstartchar((char *)stackblock(), out - 1, CTLVAR, CTLENDVAR); + char vstype = startchar[1] & VSTYPE; + char vssyntax = startchar[1] & VSSYNTAX; + const char *prevsyntax = vssyntax == (char)VSARITH ? ARISYNTAX : vssyntax == (char)VSQUOTE ? DQSYNTAX : BASESYNTAX; + if (syntax == (prevsyntax == BASESYNTAX || (vstype >= VSTRIM_FIRST && vstype <= VSTRIM_LAST) ? 
BASESYNTAX : DQSYNTAX)) { + syntax = prevsyntax; + varnest--; + USTPUTC(CTLENDVAR, out); + break; } - USTPUTC(CTLENDVAR, out); - } else { - USTPUTC(c, out); } + USTPUTC(c, out); break; case CLP: /* '(' in arithmetic */ parenlevel++; @@ -999,8 +994,10 @@ quotemark: } else { if (pgetc() == ')') { USTPUTC(CTLENDARI, out); - if (!--arinest) - syntax = prevsyntax; + --arinest; + + char type = findstartchar((char *)stackblock(), out - 1, CTLARI, CTLENDARI)[1] & VSSYNTAX; + syntax = type == (char)VSARITH ? ARISYNTAX : type == (char)VSQUOTE ? DQSYNTAX : BASESYNTAX; } else { /* * unbalanced parens @@ -1292,12 +1289,13 @@ varname: badsub: pungetc(); } - *((char *)stackblock() + typeloc) = subtype; + const char *prevsyntax = syntax; if (subtype != VSNORMAL) { varnest++; - if (dblquote) - dqvarnest++; + syntax = syntax == BASESYNTAX || (subtype >= VSTRIM_FIRST && subtype <= VSTRIM_LAST) ? BASESYNTAX : DQSYNTAX; } + subtype |= prevsyntax[PVSSYNTAX]; + *((char *)stackblock() + typeloc) = subtype; STPUTC('=', out); } goto parsesub_return; @@ -1355,7 +1353,7 @@ parsebackq: { continue; } if (pc != '\\' && pc != '`' && pc != '$' - && (!dblquote || pc != '"')) + && (syntax == BASESYNTAX || pc != '"')) STPUTC('\\', pout); if (pc > PEOA) { break; @@ -1419,7 +1417,10 @@ done: memcpy(out, str, savelen); STADJUST(savelen, out); } - USTPUTC(CTLBACKQ, out); + if (syntax != BASESYNTAX) + USTPUTC(CTLBACKQ | CTLQUOTE, out); + else + USTPUTC(CTLBACKQ, out); if (oldstyle) goto parsebackq_oldreturn; else @@ -1431,11 +1432,10 @@ done: */ parsearith: { - if (++arinest == 1) { - prevsyntax = syntax; - syntax = ARISYNTAX; - } + ++arinest; USTPUTC(CTLARI, out); + USTPUTC(VSTYPE | syntax[PVSSYNTAX], out); + syntax = ARISYNTAX; goto parsearith_return; } @@ -1469,6 +1469,39 @@ endofname(const char *name) } +const char * +findstartchar(const char *start, const char *p, char open, char close) { + int nest = 1; + const char *q; + for (;; ) { + int d; + + --p; + +#if DEBUG + if (p < start) + 
sh_error("missing start char (shouldn't happen)"); +#endif + + if (*p == open) { + if ((p[1] & VSTYPE) == VSNORMAL) + continue; + + d = -1; + checkescapes: + for (q = p; q != start && q[-1] == (char)CTLESC; q--) + ; + + if ((p - q) % 2 == 0 && !(nest += d)) + return p; + } else if (*p == close) { + d = 1; + goto checkescapes; + } + } +} + + /* * Called when an unexpected token is read during the parse. The argument * is the token that is expected, or -1 if more than one type of token can @@ -1543,7 +1576,7 @@ expandstr(const char *ps) n.narg.text = wordtext; n.narg.backquote = backquotelist; - expandarg(&n, NULL, EXP_QUOTED); + expandarg(&n, NULL, 0); return stackblock(); } diff --git a/src/parser.h b/src/parser.h index 2875cce..d239043 100644 --- a/src/parser.h +++ b/src/parser.h @@ -42,14 +42,19 @@ #define CTLVAR -126 /* variable defn */ #define CTLENDVAR -125 #define CTLBACKQ -124 +#define CTLQUOTE 01 /* ored with CTLBACKQ code if in quotes */ +/* CTLBACKQ | CTLQUOTE == -123 */ #define CTLARI -122 /* arithmetic expression */ #define CTLENDARI -121 #define CTLQUOTEMARK -120 #define CTL_LAST -120 /* last 'special' character */ -/* variable substitution byte (follows CTLVAR) */ +/* variable substitution byte (follows CTLVAR), values picked to be distinct from control characters */ #define VSTYPE 0x0f /* type of variable substitution */ #define VSNUL 0x10 /* colon--treat the empty string as unset */ +#define VSSYNTAX 0xc0 +#define VSQUOTE 0x40 /* inside double quotes--suppress splitting */ +#define VSARITH 0xc0 /* inside $((...)) arithmetic */ /* values of VSTYPE field */ #define VSNORMAL 0x1 /* normal variable: $var or ${var} */ @@ -57,10 +62,12 @@ #define VSPLUS 0x3 /* ${var+text} */ #define VSQUESTION 0x4 /* ${var?message} */ #define VSASSIGN 0x5 /* ${var=text} */ +#define VSTRIM_FIRST 0x6 #define VSTRIMRIGHT 0x6 /* ${var%pattern} */ #define VSTRIMRIGHTMAX 0x7 /* ${var%%pattern} */ #define VSTRIMLEFT 0x8 /* ${var#pattern} */ #define VSTRIMLEFTMAX 0x9 /* 
${var##pattern} */ +#define VSTRIM_LAST 0x9 #define VSLENGTH 0xa /* ${#var} */ /* values of checkkwd variable */ @@ -88,6 +95,7 @@ const char *getprompt(void *); const char *const *findkwd(const char *); char *endofname(const char *); const char *expandstr(const char *); +const char *findstartchar(const char *, const char *, char, char); static inline int goodname(const char *p) diff --git a/src/redir.c b/src/redir.c index f96a76b..527b3be 100644 --- a/src/redir.c +++ b/src/redir.c @@ -304,7 +304,7 @@ openhere(union node *redir) p = redir->nhere.doc->narg.text; if (redir->type == NXHERE) { - expandarg(redir->nhere.doc, NULL, EXP_QUOTED); + expandarg(redir->nhere.doc, NULL, 0); p = stackblock(); } diff --git a/src/show.c b/src/show.c index 4a049e9..839a40a 100644 --- a/src/show.c +++ b/src/show.c @@ -222,6 +222,7 @@ sharg(union node *arg, FILE *fp) putc('}', fp); break; case CTLBACKQ: + case CTLBACKQ|CTLQUOTE: putc('$', fp); putc('(', fp); shtree(bqlist->n, -1, NULL, fp); @@ -314,6 +315,7 @@ trstring(char *s) case CTLESC: c = 'e'; goto backslash; case CTLVAR: c = 'v'; goto backslash; case CTLBACKQ: c = 'q'; goto backslash; + case CTLBACKQ|CTLQUOTE: c = 'Q'; goto backslash; backslash: putc('\\', tracefile); putc(c, tracefile); break;
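
For readers following the description above ("I then modified the byte after CTLVAR ... Then, in CTLENDVAR, I look for the opening CTLVAR, and use that to restore the prior syntax"), here is a minimal standalone sketch of that idea. It is not the patch's code: the names pack_flags, prev_syntax and find_open, the enum syntax type and the unsigned byte values are all assumptions made for illustration. Only the mask layout (0x0f for the type, 0xc0 for the syntax bits, 0x40 for double quotes) is taken from the parser.h hunk. The scan also checks escapes before the type byte and returns NULL instead of raising an error, which differs from findstartchar() in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative control bytes (assumed values, written as unsigned bytes). */
enum { ESC_B = 0x81, OPEN_B = 0x82, CLOSE_B = 0x83 };

/* Flag-byte layout mirroring VSTYPE/VSSYNTAX/VSQUOTE/VSARITH in the patch. */
enum { TYPE_MASK = 0x0f, TYPE_NORMAL = 0x01, SYNTAX_MASK = 0xc0 };
enum syntax { SYN_BASE = 0x00, SYN_DQ = 0x40, SYN_ARITH = 0xc0 };

/* Pack the substitution type together with the syntax active before it. */
static unsigned char
pack_flags(unsigned type, enum syntax prev)
{
	return (unsigned char)((type & TYPE_MASK) | (unsigned)prev);
}

/* Recover the syntax that was active before the substitution started. */
static enum syntax
prev_syntax(unsigned char flags)
{
	return (enum syntax)(flags & SYNTAX_MASK);
}

/*
 * Scan backward from end for the unescaped OPEN_B that matches a closer
 * about to be appended at end. Returns NULL if there is none.
 */
static const unsigned char *
find_open(const unsigned char *start, const unsigned char *end)
{
	const unsigned char *p = end;
	int nest = 1;

	assert(start <= end);
	while (p > start) {
		const unsigned char *q;

		--p;
		if (*p != OPEN_B && *p != CLOSE_B)
			continue;
		/* An odd run of ESC_B bytes just before p makes p a literal. */
		for (q = p; q > start && q[-1] == ESC_B; q--)
			;
		if ((p - q) % 2 != 0)
			continue;
		if (*p == CLOSE_B) {
			nest++;		/* a nested pair closes here */
			continue;
		}
		/* Simple $var forms have no closer, so they never count. */
		if (p + 1 < end && (p[1] & TYPE_MASK) == TYPE_NORMAL)
			continue;
		if (--nest == 0)
			return p;
	}
	return NULL;
}
```

A caller would write pack_flags(type, current_syntax) right after the opener, and on seeing a closing brace call find_open() on the output so far, then switch back to prev_syntax(p[1]). This avoids an explicit stack of syntaxes at the cost of a backward scan per closer.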