From patchwork Mon Feb 19 22:13:03 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Harald van Dijk
X-Patchwork-Id: 10229143
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Subject: Re: dash bug: double-quoted "\" breaks glob protection for next char
From: Harald van Dijk
To: Denys Vlasenko, Herbert Xu, dash@vger.kernel.org
References: <5e1ee06d-d1ca-6442-51de-786e2739d4df@gigawatt.nl>
 <297b41c4-0e15-e0d8-f088-d68acfcc3c0f@gigawatt.nl>
 <5a9d8ab5-ec2d-8101-2853-ac3be6b0c4f6@gigawatt.nl>
Message-ID: <9f37ae19-6f74-f527-aa49-dd04c3c010f6@gigawatt.nl>
Date: Mon, 19 Feb 2018 23:13:03 +0100
X-Mailing-List: dash@vger.kernel.org

On 2/18/18 11:50 PM, Harald van Dijk wrote:
> On 2/14/18 11:50 PM, Harald van Dijk wrote:
>> On 2/14/18 10:44 PM, Harald van Dijk wrote:
>>> On 2/14/18 9:03 PM, Harald van Dijk wrote:
>>>> On 13/02/2018 14:53, Denys Vlasenko wrote:
>>>>> $ >'\zzzz'
>>>>> $ >'\wwww'
>>>>> $ dash -c 'echo "\*"'
>>>>> \wwww \zzzz
>>>>
>>>> [...]
>>>>
>>>> Currently:
>>>>
>>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>> <>
>>>>
>>>> This is what I expect, and also what bash, ksh and posh do.
>>>>
>>>> With your patch:
>>>>
>>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>> <a>
>>>
>>> Does the attached look right as an alternative? It treats a quoted
>>> backslash the same way as if it were preceded by CTLESC in
>>> _rmescapes. It passes your test case and mine, but I'll do more
>>> extensive testing.
>>
>> It causes preglob's string to potentially grow larger than the
>> original. When called with RMESCAPE_ALLOC, that can be handled by
>> increasing the buffer size, but preglob also gets called without
>> RMESCAPE_ALLOC to modify a string in-place.
>> That's never going to work
>> with this approach. Back to the drawing board...
>
> There is a way to make it work: ensure sufficient memory is always
> available. Instead of inserting CTLESC, which caused problems,
> CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a
> no-op here.

It required one obvious additional trivial change to the CHECKSTRSPACE
invocation, but with that added, the attached passed all testing I could
think of. Does this look okay to include, did I miss something, or is
there perhaps a better alternative?

Cheers,
Harald van Dijk

diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-				*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+				if (notescaped)
+					*q++ = '\\';
+			} else {
+				/* naked back slash */
+				notescaped = 0;
+				goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..a847b2e 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -909,7 +909,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 #endif
 		CHECKEND();	/* set c to PEOF if at end of here document */
 		for (;;) {	/* until end of line or end of word */
-			CHECKSTRSPACE(4, out);	/* permit 4 calls to USTPUTC */
+			CHECKSTRSPACE(5, out);	/* permit 5 calls to USTPUTC */
 			switch(syntax[c]) {
 			case CNL:	/* '\n' */
 				if (syntax == BASESYNTAX)
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 							eofmark != NULL
 						)
 					) {
+						/* Reserve extra memory in case this backslash will require later escaping. */
+						USTPUTC(CTLQUOTEMARK, out);
+						USTPUTC(CTLQUOTEMARK, out);
 						USTPUTC('\\', out);
 					}
 					USTPUTC(CTLESC, out);
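
For readers who want to see why reserving bytes up front makes the
in-place case safe, here is a minimal standalone sketch in C. It is not
dash code and makes several assumptions: PAD, reserve_padding and
escape_in_place are hypothetical names, PAD only stands in for the role
the CTLQUOTEMARK pair plays in the patch, and the "escaping" is reduced
to doubling each backslash. The point it tries to show is that if the
first pass writes two no-op bytes before every backslash, a later pass
can add one byte per backslash in place without the write pointer ever
catching up with the read pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder byte; it plays the part of a no-op marker. */
#define PAD '\x01'

/*
 * First pass (think "parser"): copy src to dst and reserve two PAD
 * bytes in front of every backslash. The worst case needs
 * 3 * strlen(src) + 1 bytes. Returns 0 on success, -1 if dst is too small.
 */
int reserve_padding(const char *src, char *dst, size_t dst_size)
{
	const char *p;
	char *q = dst;

	if (3 * strlen(src) + 1 > dst_size)
		return -1;
	for (p = src; *p; p++) {
		if (*p == '\\') {
			*q++ = PAD;
			*q++ = PAD;
		}
		*q++ = *p;
	}
	*q = '\0';
	return 0;
}

/*
 * Second pass (think "in-place escape removal"): drop the PAD bytes and
 * double every backslash, writing into the same buffer. This is only
 * safe because each backslash was preceded by reserved bytes, so the
 * write pointer q stays strictly behind the read pointer p whenever an
 * extra byte is emitted. Returns the new string length.
 */
size_t escape_in_place(char *s)
{
	char *p = s;	/* read position */
	char *q = s;	/* write position */

	while (*p) {
		if (*p == PAD) {
			p++;		/* reserved byte: consume, emit nothing */
			continue;
		}
		if (*p == '\\') {
			assert(q < p);	/* fails if no space was reserved */
			*q++ = '\\';	/* the extra byte the string grows by */
		}
		*q++ = *p++;
	}
	*q = '\0';
	return (size_t)(q - s);
}
```

Calling reserve_padding on the four-character string a\*b and then
escape_in_place on the result leaves a\\*b in the same buffer. Feeding
escape_in_place a string with an unpadded backslash trips the assertion,
which is the toy equivalent of the problem with inserting CTLESC
described in the quoted message.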