Message ID | 0ce0bca2-3bdd-a1f5-169e-0291a49cd6c7@gigawatt.nl (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Herbert Xu |
Headers | show |
On 2016-08-09 23:39 +0200, Harald van Dijk wrote: > Yes, this looks like a bug in dash. With the default > --disable-fnmatch code, when dash encounters [ in a pattern, it > immediately treats the following characters as part of the set. If > it then encounters the end of the pattern without having seen a > matching ], it attempts to reset the state and continue as if [ was > treated as a literal character right from the start. The attempt to > reset the state doesn't look right, and has been like this since at > least the initial Git commit in 2005. > > This also affects > > case [a in [?) echo ok ;; *) echo bad ;; esac > > which should print ok. > > Attached is a patch that attempts to reset the state correctly. It > passes your test and mine, but I have not yet tested it extensively. Thanks a lot! For what it's worth, I've been using dash with this patch this for about a week now, including building a lot of open source software (via OpenEmbedded). I haven't seen any issues yet.
Harald van Dijk <harald@gigawatt.nl> wrote: > > Yes, this looks like a bug in dash. With the default --disable-fnmatch > code, when dash encounters [ in a pattern, it immediately treats the > following characters as part of the set. If it then encounters the end > of the pattern without having seen a matching ], it attempts to reset > the state and continue as if [ was treated as a literal character right > from the start. The attempt to reset the state doesn't look right, and > has been like this since at least the initial Git commit in 2005. pdksh exhibits the same behaviour: $ pdksh -c 'foo=[abc]; echo ${foo#[}' [abc] $ POSIX says: 9.3.3 BRE Special Characters A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows: .[\ The period, left-bracket, and backslash shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is not preceded by a backslash and is not part of a bracket expression produces undefined results. > This also affects > > case [a in [?) echo ok ;; *) echo bad ;; esac > > which should print ok. Even ksh prints bad here. I would however consider a patch that simplifies the code in the undefined case. Cheers,
On 09/02/2016 09:04 AM, Herbert Xu wrote: >> Yes, this looks like a bug in dash. With the default --disable-fnmatch >> code, when dash encounters [ in a pattern, it immediately treats the >> following characters as part of the set. If it then encounters the end >> of the pattern without having seen a matching ], it attempts to reset >> the state and continue as if [ was treated as a literal character right >> from the start. > > POSIX says: > > 9.3.3 BRE Special Characters > > A BRE special character has special properties in certain contexts. ... > An > expression containing a '[' that is not preceded by a backslash > and is not part of a bracket expression produces undefined results. Ah, but POSIX also says: 2.13.1 Patterns Matching a Single Character [ If an open bracket introduces a bracket expression as in XBD RE Bracket Expression, except that the <exclamation-mark> character ( '!' ) shall replace the <circumflex> character ( '^' ) in its role in a non-matching list in the regular expression notation, it shall introduce a pattern bracket expression. A bracket expression starting with an unquoted <circumflex> character produces unspecified results. Otherwise, '[' shall match the character itself. So while a lone '[' is unspecified in a normal BRE, it is well-defined in a shell filename pattern matching context. Since '[' is not a bracket expression, it MUST be treated as a literal '[', so ${foo#[} MUST strip the leading [ from the contents of foo, without requiring that the [ be quoted. > >> This also affects >> >> case [a in [?) echo ok ;; *) echo bad ;; esac >> >> which should print ok. > > Even ksh prints bad here. So ksh is also buggy. > > I would however consider a patch that simplifies the code in the > undefined case. Except that it is well-defined by POSIX, not undefined.
On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: > > >> This also affects > >> > >> case [a in [?) echo ok ;; *) echo bad ;; esac > >> > >> which should print ok. > > > > Even ksh prints bad here. > > So ksh is also buggy. Good luck writing a script with an unquoted [ expecting it to be portable :)
On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: > > 2.13.1 Patterns Matching a Single Character > > [ > If an open bracket introduces a bracket expression as in XBD RE > Bracket Expression, except that the <exclamation-mark> character ( '!' ) > shall replace the <circumflex> character ( '^' ) in its role in a > non-matching list in the regular expression notation, it shall introduce > a pattern bracket expression. A bracket expression starting with an > unquoted <circumflex> character produces unspecified results. Otherwise, > '[' shall match the character itself. BTW, this last sentence is not present in http://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13 So I presume it's a newer unreleased revision. Seriously, you guys are turning POSIX into a joke by introducing all these new requirements. At this point I think we should pretty much give up on POSIX compliance the way it's headed. Cheers,
On 09/02/2016 09:25 AM, Eric Blake wrote: > > So while a lone '[' is unspecified in a normal BRE, it is well-defined > in a shell filename pattern matching context. Since '[' is not a > bracket expression, it MUST be treated as a literal '[', so ${foo#[} > MUST strip the leading [ from the contents of foo, without requiring > that the [ be quoted. Rationale: The '[' shell builtin is not undefined behavior. This was a specific addition made in POSIX 2008 that was not present in POSIX 2001 (although I can't easily find the bug report that justified it, since the bugs for POSIX 2008 predate the current austingroupbugs.net database)
On 09/02/2016 09:29 AM, Herbert Xu wrote: > On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: >> >>>> This also affects >>>> >>>> case [a in [?) echo ok ;; *) echo bad ;; esac >>>> >>>> which should print ok. >>> >>> Even ksh prints bad here. >> >> So ksh is also buggy. > > Good luck writing a script with an unquoted [ expecting it to be > portable :) [ '' ] || echo empty There, I just wrote a portable script with unquoted [ portably interpreted as itself and not as a bracket filename expansion pattern.
On Fri, Sep 02, 2016 at 09:49:53AM -0500, Eric Blake wrote: > On 09/02/2016 09:29 AM, Herbert Xu wrote: > > On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: > >> > >>>> This also affects > >>>> > >>>> case [a in [?) echo ok ;; *) echo bad ;; esac > >>>> > >>>> which should print ok. > >>> > >>> Even ksh prints bad here. > >> > >> So ksh is also buggy. > > > > Good luck writing a script with an unquoted [ expecting it to be > > portable :) > > [ '' ] || echo empty > > There, I just wrote a portable script with unquoted [ portably > interpreted as itself and not as a bracket filename expansion pattern. dash handles that just fine. We're not talking about a lone [ in file expansion context here. Cheers,
On 09/02/2016 09:46 AM, Herbert Xu wrote: > On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: >> >> 2.13.1 Patterns Matching a Single Character >> >> [ >> If an open bracket introduces a bracket expression as in XBD RE >> Bracket Expression, except that the <exclamation-mark> character ( '!' ) >> shall replace the <circumflex> character ( '^' ) in its role in a >> non-matching list in the regular expression notation, it shall introduce >> a pattern bracket expression. A bracket expression starting with an >> unquoted <circumflex> character produces unspecified results. Otherwise, >> '[' shall match the character itself. > > BTW, this last sentence is not present in > > http://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13 > That's the 2004 edition (TC1 of the 2001 spec, aka Issue 6). > So I presume it's a newer unreleased revision. Newer but released (TC2 of the 2008 spec, aka Issue 7): http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01 The requirement has been there for 8 years now. > > Seriously, you guys are turning POSIX into a joke by introducing > all these new requirements. At this point I think we should > pretty much give up on POSIX compliance the way it's headed. I hope you're just stating that out of frustration, and not something that you actually intend to follow through with. And if there is a requirement being considered in the Austin Group that you disagree with, please speak up on the Austin Group - membership is free.
On 9/2/16 10:46 AM, Herbert Xu wrote: > On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: >> >> 2.13.1 Patterns Matching a Single Character >> >> [ >> If an open bracket introduces a bracket expression as in XBD RE >> Bracket Expression, except that the <exclamation-mark> character ( '!' ) >> shall replace the <circumflex> character ( '^' ) in its role in a >> non-matching list in the regular expression notation, it shall introduce >> a pattern bracket expression. A bracket expression starting with an >> unquoted <circumflex> character produces unspecified results. Otherwise, >> '[' shall match the character itself. > > BTW, this last sentence is not present in > > http://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13 That's pretty old; it's from 2004. > So I presume it's a newer unreleased revision. It's in the current revision, which dates from 2008. > > Seriously, you guys are turning POSIX into a joke by introducing > all these new requirements. At this point I think we should > pretty much give up on POSIX compliance the way it's headed. Let's not go overboard here. This was introduced to tighten up ambiguous behavior: what happens when the open bracket *doesn't* introduce a bracket expression.
On Fri, Sep 02, 2016 at 10:04:37PM +0800, Herbert Xu wrote: > Harald van Dijk <harald@gigawatt.nl> wrote: > > Yes, this looks like a bug in dash. With the default --disable-fnmatch > > code, when dash encounters [ in a pattern, it immediately treats the > > following characters as part of the set. If it then encounters the end > > of the pattern without having seen a matching ], it attempts to reset > > the state and continue as if [ was treated as a literal character right > > from the start. The attempt to reset the state doesn't look right, and > > has been like this since at least the initial Git commit in 2005. > pdksh exhibits the same behaviour: > $ pdksh -c 'foo=[abc]; echo ${foo#[}' > [abc] > $ > POSIX says: > 9.3.3 BRE Special Characters > A BRE special character has special properties in certain contexts. > Outside those contexts, or when preceded by a backslash, such a > character is a BRE that matches the special character itself. The > BRE special characters and the contexts in which they have their > special meaning are as follows: > .[\ > The period, left-bracket, and backslash shall be special except > when used in a bracket expression (see RE Bracket Expression). An > expression containing a '[' that is not preceded by a backslash > and is not part of a bracket expression produces undefined results. I think this interpretation of POSIX is incorrect. This is about shell patterns, not basic regular expressions. Shell patterns are specified in XCU 2.13 Pattern Matching Notation. In XCU 2.13.1, it is written: ] [ ] If an open bracket introduces a bracket expression as in XBD Section ] 9.3.5, except that the <exclamation-mark> character ('!') shall ] replace the <circumflex> character ('^') in its role in a non-matching ] list in the regular expression notation, it shall introduce a pattern ] bracket expression. A bracket expression starting with an unquoted ] <circumflex> character produces unspecified results. Otherwise, '[' ] shall match the character itself. Therefore, pdksh is wrong and the output should be abc]. It is normally better to test against the actively developed mksh instead of pdksh, but here mksh has the same bug. OpenBSD's ksh also has some active development but stays closer to the original pdksh. > > This also affects > > case [a in [?) echo ok ;; *) echo bad ;; esac > > which should print ok. > Even ksh prints bad here. I think POSIX may be saying something different here from what it really wants to say. There is text in 2.13.3 Patterns Used for Filename Expansion that leaves unspecified whether [? matches only the literal filename component [? or all two-character filename components starting with [ (other slash-separated components in the same pattern are unaffected). However, if ksh93 behaves similarly in a case statement, that may have been what the standard had intended to say. Looking at as simple code as possible, this seems, however, unhelpful. Since a pattern like *[ should match the literal string *[ in the choice where brackets that do not introduce a bracket expression are supposed to disable other special characters and any earlier work on the * is therefore wrong, implementing this choice requires an additional scan for brackets that do not introduce a bracket expression.
On 02/09/16 16:51, Herbert Xu wrote: > On Fri, Sep 02, 2016 at 09:49:53AM -0500, Eric Blake wrote: >> On 09/02/2016 09:29 AM, Herbert Xu wrote: >>> On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote: >>>> >>>>>> This also affects >>>>>> >>>>>> case [a in [?) echo ok ;; *) echo bad ;; esac >>>>>> >>>>>> which should print ok. >>>>> >>>>> Even ksh prints bad here. >>>> >>>> So ksh is also buggy. >>> >>> Good luck writing a script with an unquoted [ expecting it to be >>> portable :) >> >> [ '' ] || echo empty >> >> There, I just wrote a portable script with unquoted [ portably >> interpreted as itself and not as a bracket filename expansion pattern. > > dash handles that just fine. We're not talking about a lone [ > in file expansion context here. We were originally talking about the [ in ${foo#[}. That is a lone [ in pattern matching context, which is what file expansion context is. We're talking about something that dash has had code to handle for >10 years, that's been documented by dash as supported for >10 years, and now that it turns out there's a flaw in the code where dash does not behave as documented and as clearly intended by the code, it's POSIX's fault? -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Sep 03, 2016 at 02:03:28PM +0200, Harald van Dijk wrote: > >>>> > >>>>>>This also affects > >>>>>> > >>>>>> case [a in [?) echo ok ;; *) echo bad ;; esac > > We're talking about something that dash has had code to handle for > >10 years, that's been documented by dash as supported for >10 > years, and now that it turns out there's a flaw in the code where > dash does not behave as documented and as clearly intended by the > code, it's POSIX's fault? dash has never worked this way. What's worse is that your patch changes dash's behaviour in a way that is inconsistent with every shell out there except bash. Cheers,
On 03/09/16 15:05, Herbert Xu wrote: > On Sat, Sep 03, 2016 at 02:03:28PM +0200, Harald van Dijk wrote: >>>>>> >>>>>>>> This also affects >>>>>>>> >>>>>>>> case [a in [?) echo ok ;; *) echo bad ;; esac >> >> We're talking about something that dash has had code to handle for >>> 10 years, that's been documented by dash as supported for >10 >> years, and now that it turns out there's a flaw in the code where >> dash does not behave as documented and as clearly intended by the >> code, it's POSIX's fault? > > dash has never worked this way. Very misleading to continue quoting the complicated case but leave out the simple case again: dash doesn't handle even ${foo#[} correctly. And again, by correctly I mean as documented by dash, as required by POSIX, and as clearly intended by the author at the time the code was written. I do not know if you were the author of that specific piece of code. But yeah, sure, if the bug has been there for over 10 years, and I'm unable to find older versions of dash to check, I would have guessed that dash indeed has never worked this way. > What's worse is that your patch > changes dash's behaviour in a way that is inconsistent with every > shell out there except bash. For the simple case, ksh and posh also strip a leading [ if doing ${foo#[}. For the complicated case, at the very least, it makes dash consistent between builds. If you feel that for the complicated case, bad should be printed rather than ok, then that just makes dash buggy in the --enable-fnmatch builds, it still means there's something to fix. -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/src/expand.c +++ b/src/expand.c @@ -1594,7 +1594,8 @@ pmatch(const char *pattern, const char *string) do { if (!c) { p = startp; - c = *p; + q--; + c = '['; goto dft; } if (c == '[') {