From patchwork Fri Mar 23 10:58:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 10303993 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 5E9E560385 for ; Fri, 23 Mar 2018 10:59:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4C8D728D60 for ; Fri, 23 Mar 2018 10:59:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 40E4D28DAF; Fri, 23 Mar 2018 10:59:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4129D28D60 for ; Fri, 23 Mar 2018 10:59:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752675AbeCWK7B (ORCPT ); Fri, 23 Mar 2018 06:59:01 -0400 Received: from orcrist.hmeau.com ([104.223.48.154]:34090 "EHLO deadmen.hmeau.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752538AbeCWK67 (ORCPT ); Fri, 23 Mar 2018 06:58:59 -0400 Received: from gondobar.mordor.me.apana.org.au ([192.168.128.4] helo=gondobar) by deadmen.hmeau.com with esmtp (Exim 4.84_2 #2 (Debian)) id 1ezKPK-0002pG-RI; Fri, 23 Mar 2018 18:58:50 +0800 Received: from herbert by gondobar with local (Exim 4.84_2) (envelope-from ) id 1ezKPH-000055-5V; Fri, 23 Mar 2018 18:58:47 +0800 Date: Fri, 23 Mar 2018 18:58:47 +0800 From: Herbert Xu To: Martijn Dekker Cc: Harald van Dijk , dash@vger.kernel.org Subject: [PATCH v2] expand: Fix ghost fields with unquoted $@/$* Message-ID: <20180323105847.GB32220@gondor.apana.org.au> References: <20180323043732.GA30910@gondor.apana.org.au> <22bf1638-9a0b-74ce-8f31-a52883f21184@inlv.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <22bf1638-9a0b-74ce-8f31-a52883f21184@inlv.org> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: dash-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dash@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Fri, Mar 23, 2018 at 10:48:13AM +0100, Martijn Dekker wrote: > > Unfortunately it also introduces a bug with $*. > > $ src/dash -c 'IFS=:; v=$*; printf "[%s]\n" "$v"' _ abc 'def ghi' jkl > [abcdef ghijkl] > > Expected: > [abc:def ghi:jkl] Thanks, the problem here is that we need to set c to 0 not just when quoted is true but also if sep is 0 since both imply that field splitting is disabled. Here is an second revision which should fix this by checking (quoted || !sep) instead of just quoted when determining whether we're doing field expansion in $*. ---8<--- Harald van Dijk wrote: > On 22/03/2018 22:38, Martijn Dekker wrote: >> Op 22-03-18 om 20:28 schreef Harald van Dijk: >>> On 22/03/2018 03:40, Martijn Dekker wrote: >>>> This patch fixes the bug that, given no positional parameters, unquoted >>>> $@ and $* incorrectly generate one empty field (they should generate no >>>> fields). Apparently that was a side effect of the above. >>> >>> This seems weird though. If you want to remove the recording of empty >>> regions because they are pointless, then how does removing them fix a >>> bug? Doesn't this show that empty regions do have an effect? Perhaps >>> they're not supposed to have any effect, perhaps it's a specific >>> combination of empty regions and something else that triggers some bug, >>> and perhaps that combination can no longer occur with your patch. >> >> The latter is my guess, but I haven't had time to investigate it. > > Looking into it again: > > When IFS is set to an empty string, sepc is set to '\0' in varvalue(). > This then causes *quotedp to be set to true, meaning evalvar()'s quoted > variable is turned on. quoted is then passed to recordregion() as the > nulonly parameter. > > ifsp->nulonly has a bigger effect than merely selecting whether to use > $IFS or whether to only split on null bytes: in ifsbreakup(), nulonly > also causes string termination to be suppressed. That's correct: that > special treatment is required to preserve empty fields in "$@" > expansion. But it should *only* be used when $@ is quoted: ifsbreakup() > takes nulonly from the last IFS region, even if it's empty, so having an > additional zero-length region with nulonly enabled causes confusion. > > Passing quoted by value to varvalue() and not attempting to modify it > should therefore, and in my quick testing does, also work to fix the > original $@ bug. You're right. The proper fix to this is to ensure that nulonly is not set in varvalue for $*. It should only be set for $@ when it's inside double quotes. In fact there is another bug while we're playing with $@/$*. When IFS is set to a non-whitespace character such as :, $* outside quotes won't remove empty fields as it should. This patch fixes both problems. Reported-by: Martijn Dekker Suggested-by: Harald van Dijk Signed-off-by: Herbert Xu diff --git a/src/expand.c b/src/expand.c index 705fef7..c14350c 100644 --- a/src/expand.c +++ b/src/expand.c @@ -119,7 +119,7 @@ STATIC const char *subevalvar(char *, char *, int, int, int, int, int); STATIC char *evalvar(char *, int); STATIC size_t strtodest(const char *, const char *, int); STATIC void memtodest(const char *, size_t, const char *, int); -STATIC ssize_t varvalue(char *, int, int, int *); +STATIC ssize_t varvalue(char *, int, int, int); STATIC void expandmeta(struct strlist *, int); #ifdef HAVE_GLOB STATIC void addglob(const glob_t *); @@ -712,7 +712,6 @@ evalvar(char *p, int flag) int c; int startloc; ssize_t varlen; - int easy; int quoted; varflags = *p++; @@ -723,12 +722,11 @@ evalvar(char *p, int flag) quoted = flag & EXP_QUOTED; var = p; - easy = (!quoted || (*var == '@' && shellparam.nparam)); startloc = expdest - (char *)stackblock(); p = strchr(p, '=') + 1; again: - varlen = varvalue(var, varflags, flag, "ed); + varlen = varvalue(var, varflags, flag, quoted); if (varflags & VSNUL) varlen--; @@ -771,8 +769,11 @@ vsplus: if (subtype == VSNORMAL) { record: - if (!easy) - goto end; + if (quoted) { + quoted = *var == '@' && shellparam.nparam; + if (!quoted) + goto end; + } recordregion(startloc, expdest - (char *)stackblock(), quoted); goto end; } @@ -878,7 +879,7 @@ strtodest(p, syntax, quotes) */ STATIC ssize_t -varvalue(char *name, int varflags, int flags, int *quotedp) +varvalue(char *name, int varflags, int flags, int quoted) { int num; char *p; @@ -887,11 +888,11 @@ varvalue(char *name, int varflags, int flags, int *quotedp) char sepc; char **ap; char const *syntax; - int quoted = *quotedp; int subtype = varflags & VSTYPE; int discard = subtype == VSPLUS || subtype == VSLENGTH; int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL; ssize_t len = 0; + char c; sep = (flags & EXP_FULL) << CHAR_BIT; syntax = quoted ? DQSYNTAX : BASESYNTAX; @@ -928,12 +929,25 @@ numvar: goto param; /* fall through */ case '*': - if (quoted) - sep = 0; - sep |= ifsset() ? ifsval()[0] : ' '; + /* We will set c to 0 or ~0 depending on whether + * we're doing field splitting. We won't do field + * splitting if either we're quoted or sep is zero. + * + * Instead of testing (quoted || !sep) the following + * trick optimises away any branches by using the + * fact that EXP_QUOTED (which is the only bit that + * can be set in quoted) is the same as EXP_FULL << + * CHAR_BIT (which is the only bit that can be set + * in sep). + */ +#if EXP_QUOTED >> CHAR_BIT != EXP_FULL +#error The following two lines expect EXP_QUOTED == EXP_FULL << CHAR_BIT +#endif + c = !((quoted | ~sep) & EXP_QUOTED) - 1; + sep &= ~quoted; + sep |= ifsset() ? (unsigned char)(c & ifsval()[0]) : ' '; param: sepc = sep; - *quotedp = !sepc; if (!(ap = shellparam.p)) return -1; while ((p = *ap++)) {