Message ID | 20180315102730.GA32076@gondor.apana.org.au (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Herbert Xu |
Headers | show |
On 15/03/2018 11:27, Herbert Xu wrote: > Harald van Dijk <harald@gigawatt.nl> wrote: >> >> Changing dash to support this would in my opinion be unreasonable, as it >> would require a total rewrite of the parser. > > I don't think it's that hard. It is if you want to do it the way POSIX specifies. You're adding a special exception in the parser. I don't see how this approach can be extended to handle the other examples in my mail: cat <<`one two` ok `one two` `one two` is a single word, so should be allowed as a heredoc delimiter. By treating ` as a literal and not looking for the matching closing `, this only ends up taking `one as the delimiter, does it not? And to hopefully better demonstrate the need to not introduce special parsing exceptions: cat <<$(case x in 0) echo bug ;; esac) ok $(case x in 0) echo bug ;; esac) Yes, this actually works in bash/zsh/yash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 15, 2018 at 12:41:10PM +0100, Harald van Dijk wrote: > > It is if you want to do it the way POSIX specifies. You're adding a special > exception in the parser. I don't see how this approach can be extended to > handle the other examples in my mail: I don't think it's exactly clear what POSIX says on this. > cat <<`one two` > ok > `one two` Try this one: cat << ${ ${ bash/ksh93 both will get stuck forever until a right brace is entered. While other shells all behave the way dash does. Considering the fact that even if you closed the brace after the newline bash would still be stuck forever, I think this behaviour is suboptimal. ksh93 seems to do the right thing though. Another interesting thing to try is cat << ${ foo ${ Also you can look at the quotes: cat << " " IOW it's not clear that "word" in this context necessarily follows the same rules used during normal tokenisation. Anyway, dash has had CHKEOFMARK since 2007 so this is just an extension of existing behaviour. Cheers,
On 15/03/2018 15:52, Herbert Xu wrote: > On Thu, Mar 15, 2018 at 12:41:10PM +0100, Harald van Dijk wrote: >> >> It is if you want to do it the way POSIX specifies. You're adding a special >> exception in the parser. I don't see how this approach can be extended to >> handle the other examples in my mail: > > I don't think it's exactly clear what POSIX says on this. I don't think it's clear what POSIX means to say, but I do think it's clear what it does say. >> cat <<`one two` >> ok >> `one two` > > Try this one: > > cat << ${ > ${ > > bash/ksh93 both will get stuck forever until a right brace is > entered. That's because POSIX specifies that after ${, everything up to the matching }, not including nested strings, expansions, etc., is part of the word. No exception is made when it spans multiple lines. Another instance of this is if false; then echo ${ fi } fi echo ok This is accepted by bash, ksh, zsh and posh. However, in this instance, I'm having trouble finding where, but IIRC, POSIX says or means to say that it's okay to flag this as an invalid parameter expansion at parse time even though the evaluation would otherwise be skipped. > While other shells all behave the way dash does. All? At least two others don't: yash rejects ${ as invalid because of the missing parameter name. zsh agrees with bash and ksh that it should look for the closing }. > Considering the fact that even if you closed the brace after > the newline bash would still be stuck forever, That's because bash doesn't ever match multi-line heredoc delimiters. Which is what POSIX technically requires: "The here-document shall be treated as a single word that begins after the next <newline> and continues until there is a line containing only the delimiter and a <newline>, with no <blank> characters in between." No single line will ever match a multi-line heredoc delimiter. > I think this > behaviour is suboptimal. ksh93 seems to do the right thing though. Yes, I agree that multi-line heredoc delimiters are useful. > Another interesting thing to try is > > cat << ${ foo > ${ > > Also you can look at the quotes: > > cat << " > " > > IOW it's not clear that "word" in this context necessarily follows > the same rules used during normal tokenisation. Quotes are clear: as far as they go, this *must* follow the same rules used during normal tokenisation. Does any shell disagree? cat << "A ' B" ok 1 A ' B echo ok 2 I expect this to print ok 1 ok 2 and I'm not finding any shell that disagrees, not even dash. Do you have an example of a shell that sees only "A as the heredoc delimiter, or one that perhaps performs quote removal on the inner '? For your example, does any shell, including dash, *not* treat this as a multi-line heredoc delimiter consisting of a newline character (which may or may not be matched by two blank lines, depending on the shell)? > Anyway, dash has had CHKEOFMARK since 2007 so this is just an > extension of existing behaviour. This is definitely true: your patch, regardless of whether it matches POSIX's requirements, is consistent with how dash has behaved for a long time. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 15, 2018 at 05:29:27PM +0100, Harald van Dijk wrote: > > That's because POSIX specifies that after ${, everything up to the matching > }, not including nested strings, expansions, etc., is part of the word. No > exception is made when it spans multiple lines. > > Another instance of this is > > if false; then > echo ${ > fi > } > fi > echo ok > > This is accepted by bash, ksh, zsh and posh. Interestingly, ksh bombs out on this: echo ${ fi } So this behaviour is not exactly consistent. In any case, this substituion is invalid in all of these shells so does it really matter? > >While other shells all behave the way dash does. > > All? At least two others don't: yash rejects ${ as invalid because of the > missing parameter name. zsh agrees with bash and ksh that it should look for > the closing }. pdksh/mksh are the two shells that I tried. OK I admit "all" was stretching things :) But my point regarding the inconsistency remains. > >Considering the fact that even if you closed the brace after > >the newline bash would still be stuck forever, > > That's because bash doesn't ever match multi-line heredoc delimiters. Which > is what POSIX technically requires: > > "The here-document shall be treated as a single word that begins after the > next <newline> and continues until there is a line containing only the > delimiter and a <newline>, with no <blank> characters in between." > > No single line will ever match a multi-line heredoc delimiter. Sure. But this is a quality of implementation issue. If you're never going to match the delimter you should probably emit an error at the very start rather than at the EOF. > >Another interesting thing to try is > > > > cat << ${ foo > > ${ > > > >Also you can look at the quotes: > > > > cat << " > > " > > > >IOW it's not clear that "word" in this context necessarily follows > >the same rules used during normal tokenisation. > > Quotes are clear: as far as they go, this *must* follow the same rules used > during normal tokenisation. Does any shell disagree? I was talking about multi-line quotes, which you conveniently dismissed :) > For your example, does any shell, including dash, *not* treat this as a > multi-line heredoc delimiter consisting of a newline character (which may or > may not be matched by two blank lines, depending on the shell)? Yes, almost every shell fails with the multi-line delimiter inside double quotes. Only dash and ksh93 seem to get it right. Cheers,
On 15/03/2018 18:11, Herbert Xu wrote: > On Thu, Mar 15, 2018 at 05:29:27PM +0100, Harald van Dijk wrote: >> >> That's because POSIX specifies that after ${, everything up to the matching >> }, not including nested strings, expansions, etc., is part of the word. No >> exception is made when it spans multiple lines. >> >> Another instance of this is >> >> if false; then >> echo ${ >> fi >> } >> fi >> echo ok >> >> This is accepted by bash, ksh, zsh and posh. > > Interestingly, ksh bombs out on this: > > echo ${ > fi > } > > So this behaviour is not exactly consistent. It's perfectly consistent. It gets accepted at parse time, it only gets rejected at expansion time. That's how dash generally behaves as well: $ dash -c 'echo ${x^}' dash: 1: Bad substitution $ dash -c ': || echo ${x^}' $ Historically, as I understand it, ash would reject both of these, but you specifically modified dash to accept invalid expansions during parsing (which makes sense): <https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=3df3edd13389ae768010bfacee5612346b413e38> > In any case, this substituion is invalid in all of these shells so > does it really matter? Okay, it can be trivially modified to something that does work in other shells (even if it were actually executed), but gets rejected at parse time by dash: if false; then : ${$+ } fi It'd be nice if all shells used the same parse rules, so that scripts can detect dynamically which expansions are supported, but don't have to go through ugly eval commands to use them. >> That's because bash doesn't ever match multi-line heredoc delimiters. Which >> is what POSIX technically requires: >> >> "The here-document shall be treated as a single word that begins after the >> next <newline> and continues until there is a line containing only the >> delimiter and a <newline>, with no <blank> characters in between." >> >> No single line will ever match a multi-line heredoc delimiter. > > Sure. But this is a quality of implementation issue. If you're > never going to match the delimter you should probably emit an error > at the very start rather than at the EOF. POSIX isn't clear on whether reaching EOF without seeing the heredoc delimiter is an error and shells disagree. dash silently accepts it, as do ksh, yash and zsh. When EOF is an acceptable way to terminate a heredoc, a delimiter which never matches can be useful to force the complete remainder of the file to be treated as a heredoc. >>> Another interesting thing to try is >>> >>> cat << ${ foo >>> ${ >>> >>> Also you can look at the quotes: >>> >>> cat << " >>> " >>> >>> IOW it's not clear that "word" in this context necessarily follows >>> the same rules used during normal tokenisation. >> >> Quotes are clear: as far as they go, this *must* follow the same rules used >> during normal tokenisation. Does any shell disagree? > > I was talking about multi-line quotes, which you conveniently dismissed :) I got back to that :) I picked another example first because I thought it would be clearer using that that the normal tokenisation rules should be used. >> For your example, does any shell, including dash, *not* treat this as a >> multi-line heredoc delimiter consisting of a newline character (which may or >> may not be matched by two blank lines, depending on the shell)? > > Yes, almost every shell fails with the multi-line delimiter inside > double quotes. Only dash and ksh93 seem to get it right. posh accepts multi-line heredoc delimiters as well. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 15, 2018 at 10:49:15PM +0100, Harald van Dijk wrote: > > It's perfectly consistent. It gets accepted at parse time, it only gets > rejected at expansion time. That's how dash generally behaves as well: > > $ dash -c 'echo ${x^}' > dash: 1: Bad substitution > $ dash -c ': || echo ${x^}' > $ > > Historically, as I understand it, ash would reject both of these, but you > specifically modified dash to accept invalid expansions during parsing > (which makes sense): > > > <https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=3df3edd13389ae768010bfacee5612346b413e38> > > >In any case, this substituion is invalid in all of these shells so > >does it really matter? > > Okay, it can be trivially modified to something that does work in other > shells (even if it were actually executed), but gets rejected at parse time > by dash: > > if false; then > : ${$+ > } > fi That's just a bug in dash's parser with ${} in general, because it bombs out without the if clause too: : ${$+ } > It'd be nice if all shells used the same parse rules, so that scripts can > detect dynamically which expansions are supported, but don't have to go > through ugly eval commands to use them. I think the reason dash bombs out on your first example (fi) isn't because of the keyword at all, but it's actually the same reason as the second, namely newlines are not accepted within parameter substitution. If we fixed this then it should work correctly. Cheers,
diff --git a/src/parser.c b/src/parser.c index 382658e..283ed63 100644 --- a/src/parser.c +++ b/src/parser.c @@ -1012,6 +1012,11 @@ quotemark: } break; case CBQUOTE: /* '`' */ + if (checkkwd & CHKEOFMARK) { + USTPUTC('`', out); + break; + } + PARSEBACKQOLD(); break; case CEOF: