Message ID | 5660ADD6.4020308@gigawatt.nl (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
2015-12-03 22:02:14 +0100, Harald van Dijk: [....] > $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell > -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done > bash:<a> > mksh:<a> > posh:<a,> > zsh:<a,> > > As far as I can tell, the posh/zsh behaviour is the correct > behaviour, but I'm not convinced yet my interpretation is correct. [...] No, that would be the same as for: v=a:b: IFS=: set -f set -- $v It's meant to split into "a" and "b", not "a", "b" and "". As ":" is meant to be treated as a *delimiter* or *terminator*. That has been discussed a few times on the austin group mailing list. zsh and pdksh (and other descendants of the Forsyth shell) treat it as separator (and are not compliant), mksh (derived from pdksh) changed it recently. posh (also based on pdksh) still hasn't changed it. That's a bit counter-intuitive and means for instance you can't use $IFS to split variables like $PATH/$LD_LIBRARY_PATH... In the case of read, it even makes less sense because: ~$ echo a:: | sh -c 'IFS=: read a; echo "$a"' a:: ~$ echo a: | sh -c 'IFS=: read a; echo "$a"' a But that's how it's specified. So dash is indeed conformant there.
Stephane Chazelas schreef op 03-12-15 om 22:17: > It's meant to split into "a" and "b", not "a", "b" and "". As > ":" is meant to be treated as a *delimiter* or *terminator*. That was my interpretation of the standard, too. So I reported this as a bug to author of yash, but he reads the standard differently and came up with a good argument for that. See: https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070 Summarising: POSIX states that "each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field". This *may* be interpreted to read that a final non-whitespace IFS character denotes an empty final field, because otherwise that final character wouldn't be delimiting any field, but only terminating one. It's pretty ambiguous, though. Based on that discussion I concluded that both interpretations are defensible, so I didn't report this to the zsh-workers list as a bug. (In the shell library I'm developing, which you have an earlier copy of, I gave this quirk the id QRK_IFSFINAL instead of BUG_IFSFINAL to reflect that it's a quirk.) Maybe this is still worth bringing up on zsh-workers as a discussion, but lately I haven't had much time. - Martijn -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/12/2015 22:17, Stephane Chazelas wrote: > 2015-12-03 22:02:14 +0100, Harald van Dijk: > [....] >> $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell >> -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done >> bash:<a> >> mksh:<a> >> posh:<a,> >> zsh:<a,> >> >> As far as I can tell, the posh/zsh behaviour is the correct >> behaviour, but I'm not convinced yet my interpretation is correct. > [...] > > No, that would be the same as for: > > v=a:b: > IFS=: > set -f > > set -- $v > > It's meant to split into "a" and "b", not "a", "b" and "". As > ":" is meant to be treated as a *delimiter* or *terminator*. > > That has been discussed a few times on the austin group mailing > list. > > zsh and pdksh (and other descendants of the Forsyth shell) treat it as > separator (and are not compliant), mksh (derived from pdksh) > changed it recently. posh (also based on pdksh) still hasn't changed it. zsh indeed expands this into "a", "b" and "". The same version of posh that gives <a,> for my test gives just "a" and "b" for yours though. I do see your point. Thanks for the clear example, I think I agree with you, the description of field splitting mentions that delimiters are used as terminators: "The shell shall treat each character of the IFS as a delimiter and use the delimiters as field terminators to [...]" It should not be much of a problem to extend the patch I posted to cover the rules as you describe them, I will make an attempt at this later. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-12-03 22:43:39 +0100, Martijn Dekker: > Stephane Chazelas schreef op 03-12-15 om 22:17: > > It's meant to split into "a" and "b", not "a", "b" and "". As > > ":" is meant to be treated as a *delimiter* or *terminator*. > > That was my interpretation of the standard, too. So I reported this as a > bug to author of yash, but he reads the standard differently and came up > with a good argument for that. See: > > https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070 > > Summarising: POSIX states that "each occurrence in the input of an IFS > character that is not IFS white space, along with any adjacent IFS white > space, shall delimit a field". This *may* be interpreted to read that a > final non-whitespace IFS character denotes an empty final field, because > otherwise that final character wouldn't be delimiting any field, but > only terminating one. It's pretty ambiguous, though. [...] I agree the spec is not very clear http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825 http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768 But see this interpretation: https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html I can't find the austin-group discussions, but I'm pretty sure I've seen several and Chet is refering to one from 2005 over there.
2015-12-03 23:04:31 +0000, Stephane Chazelas: [...] > > Summarising: POSIX states that "each occurrence in the input of an IFS > > character that is not IFS white space, along with any adjacent IFS white > > space, shall delimit a field". This *may* be interpreted to read that a > > final non-whitespace IFS character denotes an empty final field, because > > otherwise that final character wouldn't be delimiting any field, but > > only terminating one. It's pretty ambiguous, though. > [...] > > I agree the spec is not very clear > http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825 > http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768 > > But see this interpretation: > https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html > > I can't find the austin-group discussions, but I'm pretty sure > I've seen several and Chet is refering to one from 2005 over > there. [...] See also: https://groups.google.com/forum/#!topic/comp.unix.shell/krZy2rvnv2g (Geoff is from the OpenGroup). https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512438 I actually was the one raising it for posh. I (and the posh maintainer apparently) failed to notice that it was applying to "read" there as well.
2015-12-03 23:17:58 +0000, Stephane Chazelas: > 2015-12-03 23:04:31 +0000, Stephane Chazelas: > [...] > > > Summarising: POSIX states that "each occurrence in the input of an IFS > > > character that is not IFS white space, along with any adjacent IFS white > > > space, shall delimit a field". This *may* be interpreted to read that a > > > final non-whitespace IFS character denotes an empty final field, because > > > otherwise that final character wouldn't be delimiting any field, but > > > only terminating one. It's pretty ambiguous, though. > > [...] > > > > I agree the spec is not very clear > > http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825 > > http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768 > > > > But see this interpretation: > > https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html > > > > I can't find the austin-group discussions, but I'm pretty sure > > I've seen several and Chet is refering to one from 2005 over > > there. > [...] > > See also: > https://groups.google.com/forum/#!topic/comp.unix.shell/krZy2rvnv2g > (Geoff is from the OpenGroup). > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512438 > I actually was the one raising it for posh. I (and the posh > maintainer apparently) failed to notice that it was applying to > "read" there as well. [...] More: http://thread.gmane.org/gmane.comp.shells.bash.bugs/9661/focus=9662 Where Chet (I think) refers to: https://standards.ieee.org/findstds/interps/1003-2-92_int/pasc-1003.2-98.html The 2005 Austin group discussion can be seen on the web archive: http://web.archive.org/web/20050301075813/http://www.opengroup.org/austin/mailarchives/ One article there was forwarded to the zsh mailing list: http://www.zsh.org/mla/workers/2005/msg00306.html (but not acted upon) Which leads to https://www.opengroup.org/austin/aardvark/latest/xcubug2.txt confirming that https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html is contemporary and a follow-up of that 2005 discussion.
diff --git a/src/expand.c b/src/expand.c index b2d710d..6afd562 100644 --- a/src/expand.c +++ b/src/expand.c @@ -203,7 +203,7 @@ expandarg(union node *arg, struct arglist *arglist, int flag) * TODO - EXP_REDIR */ if (flag & EXP_FULL) { - ifsbreakup(p, &exparg); + ifsbreakup(p, 0, &exparg); *exparg.lastp = NULL; exparg.lastp = &exparg.list; expandmeta(exparg.list, flag); @@ -1016,9 +1016,11 @@ recordregion(int start, int end, int nulonly) * Break the argument string into pieces based upon IFS and add the * strings to the argument list. The regions of the string to be * searched for IFS characters have been stored by recordregion. + * If maxargs is non-zero, at most maxargs arguments will be created, by + * joining together the last arguments. */ void -ifsbreakup(char *string, struct arglist *arglist) +ifsbreakup(char *string, int maxargs, struct arglist *arglist) { struct ifsregion *ifsp; struct strlist *sp; @@ -1054,12 +1056,36 @@ ifsbreakup(char *string, struct arglist *arglist) start = p; continue; } + /* If only reading one more argument, combine any field terminators, + * except for trailing IFS whitespace. */ + if (maxargs == 1) { + q = p; + p++; + if (ifsspc) { + /* Ignore IFS whitespace at end */ + for (;;) { + if (p >= string + ifsp->endoff) { + *q = '\0'; + goto add; + } + if (*p == (char)CTLESC) + p++; + ifsspc = strchr(ifs, *p) && strchr(defifs, *p); + p++; + if (!ifsspc) { + break; + } + } + } + continue; + } *q = '\0'; sp = (struct strlist *)stalloc(sizeof *sp); sp->text = start; *arglist->lastp = sp; arglist->lastp = &sp->next; p++; + if (maxargs) maxargs--; if (!nulonly) { for (;;) { if (p >= string + ifsp->endoff) { diff --git a/src/expand.h b/src/expand.h index 6a90f67..26dc5b4 100644 --- a/src/expand.h +++ b/src/expand.h @@ -69,7 +69,7 @@ char *_rmescapes(char *, int); int casematch(union node *, char *); void recordregion(int, int, int); void removerecordregions(int); -void ifsbreakup(char *, struct arglist *); +void ifsbreakup(char *, int, struct arglist *); void ifsfree(void); /* From arith.y */ diff --git a/src/miscbltin.c b/src/miscbltin.c index b596fd2..39b9c47 100644 --- a/src/miscbltin.c +++ b/src/miscbltin.c @@ -67,28 +67,21 @@ * less fields than variables -> remaining variables unset. * * @param line complete line of input + * @param ac argument count * @param ap argument (variable) list * @param len length of line including trailing '\0' */ static void -readcmd_handle_line(char *s, char **ap) +readcmd_handle_line(char *s, int ac, char **ap) { struct arglist arglist; struct strlist *sl; - char *backup; - char *line; - /* ifsbreakup will fiddle with stack region... */ - line = stackblock(); s = grabstackstr(s); - /* need a copy, so that delimiters aren't lost - * in case there are more fields than variables */ - backup = sstrdup(line); - arglist.lastp = &arglist.list; - ifsbreakup(s, &arglist); + ifsbreakup(s, ac, &arglist); *arglist.lastp = NULL; ifsfree(); @@ -104,21 +97,6 @@ readcmd_handle_line(char *s, char **ap) return; } - /* remaining fields present, but no variables left. */ - if (!ap[1] && sl->next) { - size_t offset; - char *remainder; - - /* FIXME little bit hacky, assuming that ifsbreakup - * will not modify the length of the string */ - offset = sl->text - s; - remainder = backup + offset; - rmescapes(remainder); - setvar(*ap, remainder, 0); - - return; - } - /* set variable to field */ rmescapes(sl->text); setvar(*ap, sl->text, 0); @@ -211,7 +189,7 @@ start: out: recordregion(startloc, p - (char *)stackblock(), 0); STACKSTRNUL(p); - readcmd_handle_line(p + 1, ap); + readcmd_handle_line(p + 1, argc - (ap - argv), ap); return status; }