Message ID | 20180323043732.GA30910@gondor.apana.org.au (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
On 23/03/2018 05:37, Herbert Xu wrote: > You're right. The proper fix to this is to ensure that nulonly > is not set in varvalue for $*. It should only be set for $@ when > it's inside double quotes. > > In fact there is another bug while we're playing with $@/$*. > When IFS is set to a non-whitespace character such as :, $* > outside quotes won't remove empty fields as it should. That's not limited to empty fields: IFS=, set -- , , set -- $@ echo $# This should print 2, not 3. (bash gets this wrong too.) > This patch fixes both problems. Also the above. But it breaks a traditional ash extension: unset IFS set -- A1 B2 C3 echo ${@%2 C3} This used to print A1 B, but prints A1 B2 C3 with your patch. echo ${@%2} This used to print A1 B2 C3, but prints A1 B with your patch. This extension was already pretty much broken within quoted expansions: "${@%2}" would already expand to A1 B. Your patch extends that to the non-quoted case. I do not know if you want to keep this extension in. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote: > > Also the above. But it breaks a traditional ash extension: > > unset IFS > set -- A1 B2 C3 > echo ${@%2 C3} > > This used to print A1 B, but prints A1 B2 C3 with your patch. > > echo ${@%2} > > This used to print A1 B2 C3, but prints A1 B with your patch. Hmm, it still does on my machine: $ build/src/dash $ unset IFS $ set -- A1 B2 C3 $ echo ${@%2 C3} A1 B $ I'm testing against this commit: commit c166b718b496da63c4df7a0972df2fc6cd38256b Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Thu Mar 15 18:27:30 2018 +0800 parser: Fix backquote support in here-document EOF mark Are you testing against the same commit or have you applied other patches already? Thanks,
On 23/03/2018 10:10, Herbert Xu wrote: > On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote: >> >> Also the above. But it breaks a traditional ash extension: >> >> unset IFS >> set -- A1 B2 C3 >> echo ${@%2 C3} >> >> This used to print A1 B, but prints A1 B2 C3 with your patch. >> >> echo ${@%2} >> >> This used to print A1 B2 C3, but prints A1 B with your patch. > > Hmm, it still does on my machine: Apologies, I ended up sending the wrong test case. It's when IFS has its default value that the behaviour is changed. It's when it's unset that it still works. Initial testing was against 0.5.9.1 plus your patch, now retested against c166b71 plus your patch. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Op 23-03-18 om 05:37 schreef Herbert Xu: > You're right. The proper fix to this is to ensure that nulonly > is not set in varvalue for $*. It should only be set for $@ when > it's inside double quotes. > > In fact there is another bug while we're playing with $@/$*. > When IFS is set to a non-whitespace character such as :, $* > outside quotes won't remove empty fields as it should. > > This patch fixes both problems. Unfortunately it also introduces a bug with $*. $ src/dash -c 'IFS=:; v=$*; printf "[%s]\n" "$v"' _ abc 'def ghi' jkl [abcdef ghijkl] Expected: [abc:def ghi:jkl] $ src/dash -fc 'IFS=:; set ${v=$*}; printf [%s]\\n "$v" "$@"' \ _ abc 'def ghi' jkl [abcdef ghijkl] [abcdef ghijkl] Expected: [abc:def ghi:jkl] [abc] [def ghi] [jkl] $ src/dash -fc 'IFS=:; set "${v=$*}"; printf [%s]\\n "$v" "$@"' \ _ "abc" "def ghi" "jkl" [abcdef ghijkl] [abcdef ghijkl] Expected: [abc:def ghi:jkl] [abc:def ghi:jkl] Thanks, - M. -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 23, 2018 at 10:27:20AM +0100, Harald van Dijk wrote: > On 23/03/2018 10:10, Herbert Xu wrote: > >On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote: > >> > >>Also the above. But it breaks a traditional ash extension: > >> > >> unset IFS > >> set -- A1 B2 C3 > >> echo ${@%2 C3} > >> > >>This used to print A1 B, but prints A1 B2 C3 with your patch. > >> > >> echo ${@%2} > >> > >>This used to print A1 B2 C3, but prints A1 B with your patch. > > > >Hmm, it still does on my machine: > > Apologies, I ended up sending the wrong test case. It's when IFS has its > default value that the behaviour is changed. It's when it's unset that it > still works. Right. I think I'll leave this one alone. It worked purely by chance previously because dash incorrectly used the first IFS character even when it's outside of double-quotes, which is why the expansion ${@%2 C3} matches. The second example looks strange, but bash/ksh does something funky too in this case: bash-4.2$ echo ${*%2} A1 B C3 bash-4.2$ We could change dash to match this behaviour as it seems to make a little bit more sense but I'm not too bothered. Cheers,
On 23/03/2018 10:52, Herbert Xu wrote: > On Fri, Mar 23, 2018 at 10:27:20AM +0100, Harald van Dijk wrote: >> On 23/03/2018 10:10, Herbert Xu wrote: >>> On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote: >>>> >>>> Also the above. But it breaks a traditional ash extension: >>>> >>>> unset IFS >>>> set -- A1 B2 C3 >>>> echo ${@%2 C3} >>>> >>>> This used to print A1 B, but prints A1 B2 C3 with your patch. >>>> >>>> echo ${@%2} >>>> >>>> This used to print A1 B2 C3, but prints A1 B with your patch. >>> >>> Hmm, it still does on my machine: >> >> Apologies, I ended up sending the wrong test case. It's when IFS has its >> default value that the behaviour is changed. It's when it's unset that it >> still works. > > Right. I think I'll leave this one alone. It worked purely by > chance previously because dash incorrectly used the first IFS > character even when it's outside of double-quotes, which is why > the expansion ${@%2 C3} matches. When the first IFS character is a whitespace character, which it usually is, it's not incorrect: inserting that character and then splitting on it produces the exact results that POSIX requires. It's only when IFS doesn't start with a whitespace character that it's incorrect. > The second example looks strange, but bash/ksh does something funky > too in this case: > > bash-4.2$ echo ${*%2} > A1 B C3 > bash-4.2$ > > We could change dash to match this behaviour as it seems to make > a little bit more sense but I'm not too bothered. In bash and ksh nothing funky is going on. This is well-defined and documented in the manuals: the trimming is done for each field. So when $# is 3, ${@%pat} is equivalent to ${1%pat} ${2%pat} ${3%pat}, except that pat is only expanded once. If you're interested in implementing that in dash, a relevant test case is set -- A1 B2 C3 unset p echo ${1%${p+x}${p=?}} ${2%${p+x}${p=?}} ${3%${p+x}${p=?}} unset p echo ${@%${p+x}${p=?}} In bash and ksh, this is supposed to print, and does print: A B2 C3 A B C Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 23, 2018 at 12:19:28PM +0100, Harald van Dijk wrote: > > >Right. I think I'll leave this one alone. It worked purely by > >chance previously because dash incorrectly used the first IFS > >character even when it's outside of double-quotes, which is why > >the expansion ${@%2 C3} matches. > > When the first IFS character is a whitespace character, which it usually is, > it's not incorrect: inserting that character and then splitting on it > produces the exact results that POSIX requires. It's only when IFS doesn't > start with a whitespace character that it's incorrect. Where does POSIX require this? AFAICS the only time IFS comes into play in forming the fields is with "$*" which is not relevant here. @ Expands to the positional parameters, starting from one, initially producing one field for each positional parameter that is set. When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in Field Splitting. When the expansion occurs within double-quotes, the behavior is unspecified unless one of the following is true: Field splitting as described in Field Splitting would be performed if the expansion were not within double-quotes (regardless of whether field splitting would have any effect; for example, if IFS is null). The double-quotes are within the word of a ${parameter:-word} or a ${parameter:+word} expansion (with or without the <colon>; see Parameter Expansion) which would have been subject to field splitting if parameter had been expanded instead of word. If one of these conditions is true, the initial fields shall be retained as separate fields, except that if the parameter being expanded was embedded within a word, the first field shall be joined with the beginning part of the original word and the last field shall be joined with the end part of the original word. In all other contexts the results of the expansion are unspecified. If there are no positional parameters, the expansion of '@' shall generate zero fields, even when '@' is within double-quotes; however, if the expansion is embedded within a word which contains one or more other parts that expand to a quoted null string, these null string(s) shall still produce an empty field, except that if the other parts are all within the same double-quotes as the '@', it is unspecified whether the result is zero fields or one empty field. * Expands to the positional parameters, starting from one, initially producing one field for each positional parameter that is set. When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in Field Splitting. When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a single field with the value of each parameter separated by the first character of the IFS variable if IFS contains at least one character, or separated by a <space> if IFS is unset, or with no separation if IFS is set to a null string. > In bash and ksh nothing funky is going on. This is well-defined and > documented in the manuals: the trimming is done for each field. So when $# > is 3, ${@%pat} is equivalent to ${1%pat} ${2%pat} ${3%pat}, except that pat > is only expanded once. > > If you're interested in implementing that in dash, a relevant test case is > > set -- A1 B2 C3 > unset p > echo ${1%${p+x}${p=?}} ${2%${p+x}${p=?}} ${3%${p+x}${p=?}} > unset p > echo ${@%${p+x}${p=?}} > > In bash and ksh, this is supposed to print, and does print: > > A B2 C3 > A B C Fair enough. I'm not interested in implementing that right now. Cheers,
On 23/03/2018 13:20, Herbert Xu wrote: > On Fri, Mar 23, 2018 at 12:19:28PM +0100, Harald van Dijk wrote: >> >>> Right. I think I'll leave this one alone. It worked purely by >>> chance previously because dash incorrectly used the first IFS >>> character even when it's outside of double-quotes, which is why >>> the expansion ${@%2 C3} matches. >> >> When the first IFS character is a whitespace character, which it usually is, >> it's not incorrect: inserting that character and then splitting on it >> produces the exact results that POSIX requires. It's only when IFS doesn't >> start with a whitespace character that it's incorrect. > > Where does POSIX require this? It doesn't, and I didn't say it did. POSIX doesn't care how it's implemented, POSIX cares about the results produced. When another approach produces the same results, both approaches are equally correct. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 23, 2018 at 01:52:10PM +0100, Harald van Dijk wrote: > > It doesn't, and I didn't say it did. POSIX doesn't care how it's > implemented, POSIX cares about the results produced. When another approach > produces the same results, both approaches are equally correct. Right, but restoring the old behaviour for white spaces only is silly. It's silly to have this work: set -- A1 B2 C3 echo ${@%2 C3} but not this: set -- A1 B2 C3 IFS=: echo ${@%2:C3} Of course, in the old dash both worked which makes more sense. But that was at the cost of not eliminating empty fields which violates POSIX. Cheers,
On 23/03/2018 14:14, Herbert Xu wrote: > On Fri, Mar 23, 2018 at 01:52:10PM +0100, Harald van Dijk wrote: >> >> It doesn't, and I didn't say it did. POSIX doesn't care how it's >> implemented, POSIX cares about the results produced. When another approach >> produces the same results, both approaches are equally correct. > > Right, but restoring the old behaviour for white spaces only is > silly. It's silly > to have this work: > > set -- A1 B2 C3 > echo ${@%2 C3} > > but not this: > > set -- A1 B2 C3 > IFS=: > echo ${@%2:C3} Agreed. Both are doable in a POSIX-compatible way, but the second would require a different approach: either scanning the strings to suppress the addition of the separator character in those cases where its presence would cause splitting to produce an additional field, or big changes in how regions are recorded and split. Either way, that's a large cost for a feature that's likely seen very little real-world use. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/expand.c b/src/expand.c index 705fef7..a84c015 100644 --- a/src/expand.c +++ b/src/expand.c @@ -119,7 +119,7 @@ STATIC const char *subevalvar(char *, char *, int, int, int, int, int); STATIC char *evalvar(char *, int); STATIC size_t strtodest(const char *, const char *, int); STATIC void memtodest(const char *, size_t, const char *, int); -STATIC ssize_t varvalue(char *, int, int, int *); +STATIC ssize_t varvalue(char *, int, int, int); STATIC void expandmeta(struct strlist *, int); #ifdef HAVE_GLOB STATIC void addglob(const glob_t *); @@ -712,7 +712,6 @@ evalvar(char *p, int flag) int c; int startloc; ssize_t varlen; - int easy; int quoted; varflags = *p++; @@ -723,12 +722,11 @@ evalvar(char *p, int flag) quoted = flag & EXP_QUOTED; var = p; - easy = (!quoted || (*var == '@' && shellparam.nparam)); startloc = expdest - (char *)stackblock(); p = strchr(p, '=') + 1; again: - varlen = varvalue(var, varflags, flag, "ed); + varlen = varvalue(var, varflags, flag, quoted); if (varflags & VSNUL) varlen--; @@ -771,8 +769,11 @@ vsplus: if (subtype == VSNORMAL) { record: - if (!easy) - goto end; + if (quoted) { + quoted = *var == '@' && shellparam.nparam; + if (!quoted) + goto end; + } recordregion(startloc, expdest - (char *)stackblock(), quoted); goto end; } @@ -878,7 +879,7 @@ strtodest(p, syntax, quotes) */ STATIC ssize_t -varvalue(char *name, int varflags, int flags, int *quotedp) +varvalue(char *name, int varflags, int flags, int quoted) { int num; char *p; @@ -887,11 +888,11 @@ varvalue(char *name, int varflags, int flags, int *quotedp) char sepc; char **ap; char const *syntax; - int quoted = *quotedp; int subtype = varflags & VSTYPE; int discard = subtype == VSPLUS || subtype == VSLENGTH; int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL; ssize_t len = 0; + char c; sep = (flags & EXP_FULL) << CHAR_BIT; syntax = quoted ? DQSYNTAX : BASESYNTAX; @@ -928,12 +929,15 @@ numvar: goto param; /* fall through */ case '*': - if (quoted) + c = 0; + if (quoted) { sep = 0; - sep |= ifsset() ? ifsval()[0] : ' '; + c = ~0; + } + sep |= ifsset() ? (unsigned char)(c & ifsval()[0]) : ' '; + quoted = 0; param: sepc = sep; - *quotedp = !sepc; if (!(ap = shellparam.p)) return -1; while ((p = *ap++)) {