From patchwork Sun Feb 18 22:50:26 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Harald van Dijk
X-Patchwork-Id: 10226977
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id B43EB6055B
	for ; Sun, 18 Feb 2018 22:49:45 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A20AD28793
	for ; Sun, 18 Feb 2018 22:49:45 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 86586287AD; Sun, 18 Feb 2018 22:49:45 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_HI, T_TVD_MIME_EPI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4726628793
	for ; Sun, 18 Feb 2018 22:49:44 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751658AbeBRWtm (ORCPT ); Sun, 18 Feb 2018 17:49:42 -0500
Received: from home.gigawatt.nl ([83.163.3.213]:43078 "EHLO home.gigawatt.nl"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751642AbeBRWtl (ORCPT ); Sun, 18 Feb 2018 17:49:41 -0500
Received: from [IPv6:2001:980:4809:1:e045:1301:c405:78bf] (unknown [IPv6:2001:980:4809:1:e045:1301:c405:78bf])
	by home.gigawatt.nl (Postfix) with ESMTPSA id 48A075402945;
	Sun, 18 Feb 2018 22:49:38 +0000 (UTC)
DKIM-Filter: OpenDKIM Filter v2.11.0 home.gigawatt.nl 48A075402945
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gigawatt.nl;
	s=default; t=1518994178;
	bh=RhV2wRQ0Okbj4KrCgCtqlvuBvXkRxo9ycjAtXhiDf58=; l=3019;
	h=Subject:From:To:References:Date:In-Reply-To:From;
	b=ayZYo5UHpjpoGhEE4eXF5iimazSe3HYAa5TkOfI/1vggGSb3YOBa15nJaZZ+kLLkD
	tTqQc2zytHoThQ8JGzbdkybUIptMl42swegfmX5i78NpLztZt58S2UEIJueA9Saenn
	sQEwsGkBr93CU355Mh7hqB2DhvabztjucxhkTBJY=
Subject: Re: dash bug: double-quoted "\" breaks glob protection for next char
From: Harald van Dijk
To: Denys Vlasenko , Herbert Xu , dash@vger.kernel.org
References: <5e1ee06d-d1ca-6442-51de-786e2739d4df@gigawatt.nl>
	<297b41c4-0e15-e0d8-f088-d68acfcc3c0f@gigawatt.nl>
	<5a9d8ab5-ec2d-8101-2853-ac3be6b0c4f6@gigawatt.nl>
Message-ID:
Date: Sun, 18 Feb 2018 23:50:26 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0
MIME-Version: 1.0
In-Reply-To: <5a9d8ab5-ec2d-8101-2853-ac3be6b0c4f6@gigawatt.nl>
Content-Language: en-US
Sender: dash-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: dash@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 2/14/18 11:50 PM, Harald van Dijk wrote:
> On 2/14/18 10:44 PM, Harald van Dijk wrote:
>> On 2/14/18 9:03 PM, Harald van Dijk wrote:
>>> On 13/02/2018 14:53, Denys Vlasenko wrote:
>>>> $ >'\zzzz'
>>>> $ >'\wwww'
>>>> $ dash -c 'echo "\*"'
>>>> \wwww \zzzz
>>>
>>> [...]
>>>
>>> Currently:
>>>
>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>> <>
>>>
>>> This is what I expect, and also what bash, ksh and posh do.
>>>
>>> With your patch:
>>>
>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>
>>
>> Does the attached look right as an alternative? It treats a quoted
>> backslash the same way as if it were preceded by CTLESC in _rmescapes.
>> It passes your test case and mine, but I'll do more extensive testing.
>
> It causes preglob's string to potentially grow larger than the original.
> When called with RMESCAPE_ALLOC, that can be handled by increasing the
> buffer size, but preglob also gets called without RMESCAPE_ALLOC to
> modify a string in-place. That's never going to work with this approach.
> Back to the drawing board...

There is a way to make it work: ensure sufficient memory is always
available. Instead of inserting CTLESC, which caused problems,
CTLQUOTEMARK+CTLQUOTEMARK can be inserted. It's effectively a no-op
here.

I'm currently testing the attached.

To be honest, FreeBSD sh's approach, keeping a syntax stack to detect
characters' meaning reliably at parse time, feels more elegant to me
right now, but that requires invasive and therefore risky changes to
dash's code.

Cheers,
Harald van Dijk

diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-				*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+				if (notescaped)
+					*q++ = '\\';
+			} else {
+				/* naked back slash */
+				notescaped = 0;
+				goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..bb16a46 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 					eofmark != NULL
 				)
 			) {
+				/* Reserve extra memory in case this backslash will require later escaping. */
+				USTPUTC(CTLQUOTEMARK, out);
+				USTPUTC(CTLQUOTEMARK, out);
 				USTPUTC('\\', out);
 			}
 			USTPUTC(CTLESC, out);
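
Editorial illustration (not part of the patch): the reserve-then-rewrite
idea can be sketched in a small standalone model. This is a simplified
sketch under stated assumptions, not dash's actual _rmescapes. MARK and
ESC are hypothetical one-byte stand-ins for CTLQUOTEMARK and CTLESC, and
to_pattern is an invented name. The point it tries to show is that every
marker byte dropped by the rewrite leaves room for one extra backslash
written later, so an in-place pass never needs a larger buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical one-byte markers; stand-ins for dash's control characters. */
#define MARK '\x01'	/* toggles the "inside quotes" state; emits nothing */
#define ESC  '\x02'	/* the next byte is a literal character */

/*
 * Rewrite s in place into a glob pattern:
 *   - MARK bytes are dropped (each one frees a byte of slack),
 *   - ESC c becomes \c,
 *   - a backslash seen inside quotes becomes \\.
 * Returns the length of the rewritten string.
 */
static size_t to_pattern(char *s)
{
	char *p = s;	/* read position */
	char *q = s;	/* write position, must never pass p */
	int inquotes = 0;

	while (*p) {
		if (*p == MARK) {
			inquotes ^= 1;
			p++;
			continue;
		}
		if (*p == ESC) {
			p++;
			assert(*p != '\0');
			assert(q < p);
			*q++ = '\\';
		} else if (*p == '\\' && inquotes) {
			/* Needs one byte of slack, or unread input is clobbered. */
			assert(q < p);
			*q++ = '\\';
		}
		*q++ = *p++;
	}
	*q = '\0';
	return (size_t)(q - s);
}
```

In this model, the input MARK MARK MARK \ ESC * MARK (an opening quote,
a reserved pair, the quoted backslash, an escaped star, a closing quote)
is rewritten to the four-character pattern \\\* without growing. An input
with several quoted backslashes and no reserved pairs, such as
MARK \ ESC * \ ESC * MARK, would trip the slack assertion at the second
backslash, which is the situation the reserved pair is meant to avoid.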