diff mbox

dash: read does not ignore trailing spaces

Message ID 5660ADD6.4020308@gigawatt.nl (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show

Commit Message

Harald van Dijk Dec. 3, 2015, 9:02 p.m. UTC
On 02/12/2015 23:37, Gioele Barabucci wrote:
> Hello,
>
> I am forwarding a bug [1] reported by a Debian user: `read` does not
> ignore trailing spaces. The current version of dash is affected by
> this bug.
>
> A simple test from the original reporter:
>
>      $ dash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
>      <a b  >
>
>      $ bash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
>      <a b>
>
> Other shells like posh and mksh behave like bash.

This is indeed a bug based on the current specification. In the past, 
the specification was unclear and the dash interpretation was a 
legitimate one, but currently it explicitly spells out that trailing IFS 
whitespace gets ignored.

However, unless I'm misreading the spec, mksh and bash don't seem to 
implement it properly either: only trailing IFS whitespace is supposed 
to be dropped. IFS non-whitespace is supposed to remain, even at the end 
of the input. mksh and bash remove it, posh and zsh leave it in:

   $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell -c 
'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
   bash:<a>
   mksh:<a>
   posh:<a,>
   zsh:<a,>

As far as I can tell, the posh/zsh behaviour is the correct behaviour, 
but I'm not convinced yet my interpretation is correct.

Attached is a not fully tested proof of concept to implement the 
posh/zsh behaviour in dash by extending ifsbreakup() to allow specifying 
a maximum number of arguments instead of fixing it up in 
readcmd_handle_line(). It returns <a b> in your test, and <a,> in mine. 
Feedback welcome.

Cheers,
Harald van Dijk

> This error is reproducible with dash 0.5.7 and with the current master
> git master branch, commit 2e5842258bd5b252ffdaa630db09c9a19a9717ca.
>
> [1] https://bugs.debian.org/794965
>
> --
> Gioele Barabucci <gioele@svario.it>

Comments

Stephane Chazelas Dec. 3, 2015, 9:17 p.m. UTC | #1
2015-12-03 22:02:14 +0100, Harald van Dijk:
[....]
>   $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
> -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
>   bash:<a>
>   mksh:<a>
>   posh:<a,>
>   zsh:<a,>
> 
> As far as I can tell, the posh/zsh behaviour is the correct
> behaviour, but I'm not convinced yet my interpretation is correct.
[...]

No, that would be the same as for:

v=a:b:
IFS=:
set -f

set -- $v

It's meant to split into "a" and "b", not "a", "b" and "". As
":" is meant to be treated as a *delimiter* or *terminator*.

That has been discussed a few times on the austin group mailing
list.

zsh and pdksh (and other descendants of the Forsyth shell) treat it as
separator (and are not compliant), mksh (derived from pdksh)
changed it recently. posh (also based on pdksh) still hasn't changed it.

That's a bit counter-intuitive and means for instance you can't
use $IFS to split variables like $PATH/$LD_LIBRARY_PATH...

In the case of read, it even makes less sense because:

~$ echo a:: | sh -c 'IFS=: read a; echo "$a"'
a::
~$ echo a: | sh -c 'IFS=: read a; echo "$a"'
a

But that's how it's specified.

So dash is indeed conformant there.
Martijn Dekker Dec. 3, 2015, 9:43 p.m. UTC | #2
Stephane Chazelas schreef op 03-12-15 om 22:17:
> It's meant to split into "a" and "b", not "a", "b" and "". As
> ":" is meant to be treated as a *delimiter* or *terminator*.

That was my interpretation of the standard, too. So I reported this as a
bug to author of yash, but he reads the standard differently and came up
with a good argument for that. See:

https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070

Summarising: POSIX states that "each occurrence in the input of an IFS
character that is not IFS white space, along with any adjacent IFS white
space, shall delimit a field". This *may* be interpreted to read that a
final non-whitespace IFS character denotes an empty final field, because
otherwise that final character wouldn't be delimiting any field, but
only terminating one. It's pretty ambiguous, though.

Based on that discussion I concluded that both interpretations are
defensible, so I didn't report this to the zsh-workers list as a bug.
(In the shell library I'm developing, which you have an earlier copy of,
I gave this quirk the id QRK_IFSFINAL instead of BUG_IFSFINAL to reflect
that it's a quirk.) Maybe this is still worth bringing up on zsh-workers
as a discussion, but lately I haven't had much time.

- Martijn

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Harald van Dijk Dec. 3, 2015, 10:26 p.m. UTC | #3
On 03/12/2015 22:17, Stephane Chazelas wrote:
> 2015-12-03 22:02:14 +0100, Harald van Dijk:
> [....]
>>    $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
>> -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
>>    bash:<a>
>>    mksh:<a>
>>    posh:<a,>
>>    zsh:<a,>
>>
>> As far as I can tell, the posh/zsh behaviour is the correct
>> behaviour, but I'm not convinced yet my interpretation is correct.
> [...]
>
> No, that would be the same as for:
>
> v=a:b:
> IFS=:
> set -f
>
> set -- $v
>
> It's meant to split into "a" and "b", not "a", "b" and "". As
> ":" is meant to be treated as a *delimiter* or *terminator*.
>
> That has been discussed a few times on the austin group mailing
> list.
>
> zsh and pdksh (and other descendants of the Forsyth shell) treat it as
> separator (and are not compliant), mksh (derived from pdksh)
> changed it recently. posh (also based on pdksh) still hasn't changed it.

zsh indeed expands this into "a", "b" and "". The same version of posh 
that gives <a,> for my test gives just "a" and "b" for yours though.

I do see your point. Thanks for the clear example, I think I agree with 
you, the description of field splitting mentions that delimiters are 
used as terminators:

   "The shell shall treat each character of the IFS as a delimiter and 
use the delimiters as field terminators to [...]"

It should not be much of a problem to extend the patch I posted to cover 
the rules as you describe them, I will make an attempt at this later.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephane Chazelas Dec. 3, 2015, 11:04 p.m. UTC | #4
2015-12-03 22:43:39 +0100, Martijn Dekker:
> Stephane Chazelas schreef op 03-12-15 om 22:17:
> > It's meant to split into "a" and "b", not "a", "b" and "". As
> > ":" is meant to be treated as a *delimiter* or *terminator*.
> 
> That was my interpretation of the standard, too. So I reported this as a
> bug to author of yash, but he reads the standard differently and came up
> with a good argument for that. See:
> 
> https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070
> 
> Summarising: POSIX states that "each occurrence in the input of an IFS
> character that is not IFS white space, along with any adjacent IFS white
> space, shall delimit a field". This *may* be interpreted to read that a
> final non-whitespace IFS character denotes an empty final field, because
> otherwise that final character wouldn't be delimiting any field, but
> only terminating one. It's pretty ambiguous, though.
[...]

I agree the spec is not very clear
http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825
http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768

But see this interpretation:
https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html

I can't find the austin-group discussions, but I'm pretty sure
I've seen several and Chet is refering to one from 2005 over
there.
Stephane Chazelas Dec. 3, 2015, 11:17 p.m. UTC | #5
2015-12-03 23:04:31 +0000, Stephane Chazelas:
[...]
> > Summarising: POSIX states that "each occurrence in the input of an IFS
> > character that is not IFS white space, along with any adjacent IFS white
> > space, shall delimit a field". This *may* be interpreted to read that a
> > final non-whitespace IFS character denotes an empty final field, because
> > otherwise that final character wouldn't be delimiting any field, but
> > only terminating one. It's pretty ambiguous, though.
> [...]
> 
> I agree the spec is not very clear
> http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825
> http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768
> 
> But see this interpretation:
> https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html
> 
> I can't find the austin-group discussions, but I'm pretty sure
> I've seen several and Chet is refering to one from 2005 over
> there.
[...]

See also:
https://groups.google.com/forum/#!topic/comp.unix.shell/krZy2rvnv2g
(Geoff is from the OpenGroup).

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512438
I actually was the one raising it for posh. I (and the posh
maintainer apparently) failed to notice that it was applying to
"read" there as well.
Stephane Chazelas Dec. 4, 2015, midnight UTC | #6
2015-12-03 23:17:58 +0000, Stephane Chazelas:
> 2015-12-03 23:04:31 +0000, Stephane Chazelas:
> [...]
> > > Summarising: POSIX states that "each occurrence in the input of an IFS
> > > character that is not IFS white space, along with any adjacent IFS white
> > > space, shall delimit a field". This *may* be interpreted to read that a
> > > final non-whitespace IFS character denotes an empty final field, because
> > > otherwise that final character wouldn't be delimiting any field, but
> > > only terminating one. It's pretty ambiguous, though.
> > [...]
> > 
> > I agree the spec is not very clear
> > http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825
> > http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768
> > 
> > But see this interpretation:
> > https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html
> > 
> > I can't find the austin-group discussions, but I'm pretty sure
> > I've seen several and Chet is refering to one from 2005 over
> > there.
> [...]
> 
> See also:
> https://groups.google.com/forum/#!topic/comp.unix.shell/krZy2rvnv2g
> (Geoff is from the OpenGroup).
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512438
> I actually was the one raising it for posh. I (and the posh
> maintainer apparently) failed to notice that it was applying to
> "read" there as well.
[...]

More:

http://thread.gmane.org/gmane.comp.shells.bash.bugs/9661/focus=9662

Where Chet (I think) refers to:
https://standards.ieee.org/findstds/interps/1003-2-92_int/pasc-1003.2-98.html

The 2005 Austin group discussion can be seen on the web archive:
http://web.archive.org/web/20050301075813/http://www.opengroup.org/austin/mailarchives/

One article there was forwarded to the zsh mailing list:
http://www.zsh.org/mla/workers/2005/msg00306.html
(but not acted upon)

Which leads to https://www.opengroup.org/austin/aardvark/latest/xcubug2.txt
confirming that
https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html
is contemporary and a follow-up of that 2005 discussion.
diff mbox

Patch

diff --git a/src/expand.c b/src/expand.c
index b2d710d..6afd562 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -203,7 +203,7 @@  expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, 0, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,9 +1016,11 @@  recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-zero, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
@@ -1054,12 +1056,36 @@  ifsbreakup(char *string, struct arglist *arglist)
 						start = p;
 						continue;
 					}
+					/* If only reading one more argument, combine any field terminators,
+					 * except for trailing IFS whitespace. */
+					if (maxargs == 1) {
+						q = p;
+						p++;
+						if (ifsspc) {
+							/* Ignore IFS whitespace at end */
+							for (;;) {
+								if (p >= string + ifsp->endoff) {
+									*q = '\0';
+									goto add;
+								}
+								if (*p == (char)CTLESC)
+									p++;
+								ifsspc = strchr(ifs, *p) && strchr(defifs, *p);
+								p++;
+								if (!ifsspc) {
+									break;
+								}
+							}
+						}
+						continue;
+					}
 					*q = '\0';
 					sp = (struct strlist *)stalloc(sizeof *sp);
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
 					p++;
+					if (maxargs) maxargs--;
 					if (!nulonly) {
 						for (;;) {
 							if (p >= string + ifsp->endoff) {
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@  char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@ 
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@  readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);
-
-			return;
-		}
-		
 		/* set variable to field */
 		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
@@ -211,7 +189,7 @@  start:
 out:
 	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(p + 1, ap);
+	readcmd_handle_line(p + 1, argc - (ap - argv), ap);
 	return status;
 }