diff mbox

expand: Fix ghost fields with unquoted $@/$*

Message ID 20180323043732.GA30910@gondor.apana.org.au (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show

Commit Message

Herbert Xu March 23, 2018, 4:37 a.m. UTC
Harald van Dijk <harald@gigawatt.nl> wrote:
> On 22/03/2018 22:38, Martijn Dekker wrote:
>> Op 22-03-18 om 20:28 schreef Harald van Dijk:
>>> On 22/03/2018 03:40, Martijn Dekker wrote:
>>>> This patch fixes the bug that, given no positional parameters, unquoted
>>>> $@ and $* incorrectly generate one empty field (they should generate no
>>>> fields). Apparently that was a side effect of the above.
>>>
>>> This seems weird though. If you want to remove the recording of empty
>>> regions because they are pointless, then how does removing them fix a
>>> bug? Doesn't this show that empty regions do have an effect? Perhaps
>>> they're not supposed to have any effect, perhaps it's a specific
>>> combination of empty regions and something else that triggers some bug,
>>> and perhaps that combination can no longer occur with your patch.
>> 
>> The latter is my guess, but I haven't had time to investigate it.
> 
> Looking into it again:
> 
> When IFS is set to an empty string, sepc is set to '\0' in varvalue(). 
> This then causes *quotedp to be set to true, meaning evalvar()'s quoted 
> variable is turned on. quoted is then passed to recordregion() as the 
> nulonly parameter.
> 
> ifsp->nulonly has a bigger effect than merely selecting whether to use 
> $IFS or whether to only split on null bytes: in ifsbreakup(), nulonly 
> also causes string termination to be suppressed. That's correct: that 
> special treatment is required to preserve empty fields in "$@" 
> expansion. But it should *only* be used when $@ is quoted: ifsbreakup() 
> takes nulonly from the last IFS region, even if it's empty, so having an 
> additional zero-length region with nulonly enabled causes confusion.
>
> Passing quoted by value to varvalue() and not attempting to modify it 
> should therefore, and in my quick testing does, also work to fix the 
> original $@ bug.

You're right.  The proper fix to this is to ensure that nulonly
is not set in varvalue for $*.  It should only be set for $@ when
it's inside double quotes.

In fact there is another bug while we're playing with $@/$*.
When IFS is set to a non-whitespace character such as :, $*
outside quotes won't remove empty fields as it should.

This patch fixes both problems.

Reported-by: Martijn Dekker <martijn@inlv.org>
Suggested-by: Harald van Dijk <harald@gigawatt.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Harald van Dijk March 23, 2018, 8:27 a.m. UTC | #1
On 23/03/2018 05:37, Herbert Xu wrote:
> You're right.  The proper fix to this is to ensure that nulonly
> is not set in varvalue for $*.  It should only be set for $@ when
> it's inside double quotes.
> 
> In fact there is another bug while we're playing with $@/$*.
> When IFS is set to a non-whitespace character such as :, $*
> outside quotes won't remove empty fields as it should.

That's not limited to empty fields:

   IFS=,
   set -- , ,
   set -- $@
   echo $#

This should print 2, not 3. (bash gets this wrong too.)

> This patch fixes both problems.

Also the above. But it breaks a traditional ash extension:

   unset IFS
   set -- A1 B2 C3
   echo ${@%2 C3}

This used to print A1 B, but prints A1 B2 C3 with your patch.

   echo ${@%2}

This used to print A1 B2 C3, but prints A1 B with your patch.

This extension was already pretty much broken within quoted expansions: 
"${@%2}" would already expand to A1 B. Your patch extends that to the 
non-quoted case.

I do not know if you want to keep this extension in.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 23, 2018, 9:10 a.m. UTC | #2
On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote:
> 
> Also the above. But it breaks a traditional ash extension:
> 
>   unset IFS
>   set -- A1 B2 C3
>   echo ${@%2 C3}
> 
> This used to print A1 B, but prints A1 B2 C3 with your patch.
> 
>   echo ${@%2}
> 
> This used to print A1 B2 C3, but prints A1 B with your patch.

Hmm, it still does on my machine:

$ build/src/dash
$ unset IFS
$ set -- A1 B2 C3
$ echo ${@%2 C3}
A1 B
$

I'm testing against this commit:

commit c166b718b496da63c4df7a0972df2fc6cd38256b
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Mar 15 18:27:30 2018 +0800

    parser: Fix backquote support in here-document EOF mark

Are you testing against the same commit or have you applied other
patches already?

Thanks,
Harald van Dijk March 23, 2018, 9:27 a.m. UTC | #3
On 23/03/2018 10:10, Herbert Xu wrote:
> On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote:
>>
>> Also the above. But it breaks a traditional ash extension:
>>
>>    unset IFS
>>    set -- A1 B2 C3
>>    echo ${@%2 C3}
>>
>> This used to print A1 B, but prints A1 B2 C3 with your patch.
>>
>>    echo ${@%2}
>>
>> This used to print A1 B2 C3, but prints A1 B with your patch.
> 
> Hmm, it still does on my machine:

Apologies, I ended up sending the wrong test case. It's when IFS has its 
default value that the behaviour is changed. It's when it's unset that 
it still works.

Initial testing was against 0.5.9.1 plus your patch, now retested 
against c166b71 plus your patch.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martijn Dekker March 23, 2018, 9:48 a.m. UTC | #4
Op 23-03-18 om 05:37 schreef Herbert Xu:
> You're right.  The proper fix to this is to ensure that nulonly
> is not set in varvalue for $*.  It should only be set for $@ when
> it's inside double quotes.
> 
> In fact there is another bug while we're playing with $@/$*.
> When IFS is set to a non-whitespace character such as :, $*
> outside quotes won't remove empty fields as it should.
> 
> This patch fixes both problems.

Unfortunately it also introduces a bug with $*.

$ src/dash -c 'IFS=:; v=$*; printf "[%s]\n" "$v"' _ abc 'def ghi' jkl
[abcdef ghijkl]

Expected:
[abc:def ghi:jkl]


$ src/dash -fc 'IFS=:; set ${v=$*}; printf [%s]\\n "$v" "$@"' \
	_ abc 'def ghi' jkl
[abcdef ghijkl]
[abcdef ghijkl]

Expected:

[abc:def ghi:jkl]
[abc]
[def ghi]
[jkl]


$ src/dash -fc 'IFS=:; set "${v=$*}"; printf [%s]\\n "$v" "$@"' \
	_ "abc" "def ghi" "jkl"
[abcdef ghijkl]
[abcdef ghijkl]

Expected:

[abc:def ghi:jkl]
[abc:def ghi:jkl]


Thanks,

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 23, 2018, 9:52 a.m. UTC | #5
On Fri, Mar 23, 2018 at 10:27:20AM +0100, Harald van Dijk wrote:
> On 23/03/2018 10:10, Herbert Xu wrote:
> >On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote:
> >>
> >>Also the above. But it breaks a traditional ash extension:
> >>
> >>   unset IFS
> >>   set -- A1 B2 C3
> >>   echo ${@%2 C3}
> >>
> >>This used to print A1 B, but prints A1 B2 C3 with your patch.
> >>
> >>   echo ${@%2}
> >>
> >>This used to print A1 B2 C3, but prints A1 B with your patch.
> >
> >Hmm, it still does on my machine:
> 
> Apologies, I ended up sending the wrong test case. It's when IFS has its
> default value that the behaviour is changed. It's when it's unset that it
> still works.

Right.  I think I'll leave this one alone.  It worked purely by
chance previously because dash incorrectly used the first IFS
character even when it's outside of double-quotes, which is why
the expansion ${@%2 C3} matches.

The second example looks strange, but bash/ksh does something funky
too in this case:

bash-4.2$ echo ${*%2}
A1 B C3
bash-4.2$

We could change dash to match this behaviour as it seems to make
a little bit more sense but I'm not too bothered.

Cheers,
Harald van Dijk March 23, 2018, 11:19 a.m. UTC | #6
On 23/03/2018 10:52, Herbert Xu wrote:
> On Fri, Mar 23, 2018 at 10:27:20AM +0100, Harald van Dijk wrote:
>> On 23/03/2018 10:10, Herbert Xu wrote:
>>> On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote:
>>>>
>>>> Also the above. But it breaks a traditional ash extension:
>>>>
>>>>    unset IFS
>>>>    set -- A1 B2 C3
>>>>    echo ${@%2 C3}
>>>>
>>>> This used to print A1 B, but prints A1 B2 C3 with your patch.
>>>>
>>>>    echo ${@%2}
>>>>
>>>> This used to print A1 B2 C3, but prints A1 B with your patch.
>>>
>>> Hmm, it still does on my machine:
>>
>> Apologies, I ended up sending the wrong test case. It's when IFS has its
>> default value that the behaviour is changed. It's when it's unset that it
>> still works.
> 
> Right.  I think I'll leave this one alone.  It worked purely by
> chance previously because dash incorrectly used the first IFS
> character even when it's outside of double-quotes, which is why
> the expansion ${@%2 C3} matches.

When the first IFS character is a whitespace character, which it usually 
is, it's not incorrect: inserting that character and then splitting on 
it produces the exact results that POSIX requires. It's only when IFS 
doesn't start with a whitespace character that it's incorrect.

> The second example looks strange, but bash/ksh does something funky
> too in this case:
> 
> bash-4.2$ echo ${*%2}
> A1 B C3
> bash-4.2$
 >
 > We could change dash to match this behaviour as it seems to make
 > a little bit more sense but I'm not too bothered.

In bash and ksh nothing funky is going on. This is well-defined and 
documented in the manuals: the trimming is done for each field. So when 
$# is 3, ${@%pat} is equivalent to ${1%pat} ${2%pat} ${3%pat}, except 
that pat is only expanded once.

If you're interested in implementing that in dash, a relevant test case is

   set -- A1 B2 C3
   unset p
   echo ${1%${p+x}${p=?}} ${2%${p+x}${p=?}} ${3%${p+x}${p=?}}
   unset p
   echo ${@%${p+x}${p=?}}

In bash and ksh, this is supposed to print, and does print:

   A B2 C3
   A B C

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 23, 2018, 12:20 p.m. UTC | #7
On Fri, Mar 23, 2018 at 12:19:28PM +0100, Harald van Dijk wrote:
>
> >Right.  I think I'll leave this one alone.  It worked purely by
> >chance previously because dash incorrectly used the first IFS
> >character even when it's outside of double-quotes, which is why
> >the expansion ${@%2 C3} matches.
> 
> When the first IFS character is a whitespace character, which it usually is,
> it's not incorrect: inserting that character and then splitting on it
> produces the exact results that POSIX requires. It's only when IFS doesn't
> start with a whitespace character that it's incorrect.

Where does POSIX require this? AFAICS the only time IFS comes into
play in forming the fields is with "$*" which is not relevant here.

	@
	Expands to the positional parameters, starting from one,
	initially producing one field for each positional parameter
	that is set. When the expansion occurs in a context where field
	splitting will be performed, any empty fields may be discarded
	and each of the non-empty fields shall be further split as
	described in Field Splitting. When the expansion occurs within
	double-quotes, the behavior is unspecified unless one of the
	following is true: Field splitting as described in Field Splitting
	would be performed if the expansion were not within double-quotes
	(regardless of whether field splitting would have any effect;
	for example, if IFS is null).

	The double-quotes are within the word of a ${parameter:-word}
	or a ${parameter:+word} expansion (with or without the <colon>;
	see Parameter Expansion) which would have been subject to field
	splitting if parameter had been expanded instead of word.

	If one of these conditions is true, the initial fields shall be
	retained as separate fields, except that if the parameter being
	expanded was embedded within a word, the first field shall be
	joined with the beginning part of the original word and the
	last field shall be joined with the end part of the original
	word. In all other contexts the results of the expansion
	are unspecified. If there are no positional parameters, the
	expansion of '@' shall generate zero fields, even when '@'
	is within double-quotes; however, if the expansion is embedded
	within a word which contains one or more other parts that expand
	to a quoted null string, these null string(s) shall still produce
	an empty field, except that if the other parts are all within
	the same double-quotes as the '@', it is unspecified whether
	the result is zero fields or one empty field.

	*
	Expands to the positional parameters, starting from one,
	initially producing one field for each positional parameter
	that is set. When the expansion occurs in a context where field
	splitting will be performed, any empty fields may be discarded and
	each of the non-empty fields shall be further split as described
	in Field Splitting. When the expansion occurs in a context where
	field splitting will not be performed, the initial fields shall be
	joined to form a single field with the value of each parameter
	separated by the first character of the IFS variable if IFS
	contains at least one character, or separated by a <space> if IFS
	is unset, or with no separation if IFS is set to a null string.

> In bash and ksh nothing funky is going on. This is well-defined and
> documented in the manuals: the trimming is done for each field. So when $#
> is 3, ${@%pat} is equivalent to ${1%pat} ${2%pat} ${3%pat}, except that pat
> is only expanded once.
> 
> If you're interested in implementing that in dash, a relevant test case is
> 
>   set -- A1 B2 C3
>   unset p
>   echo ${1%${p+x}${p=?}} ${2%${p+x}${p=?}} ${3%${p+x}${p=?}}
>   unset p
>   echo ${@%${p+x}${p=?}}
> 
> In bash and ksh, this is supposed to print, and does print:
> 
>   A B2 C3
>   A B C

Fair enough.  I'm not interested in implementing that right now.

Cheers,
Harald van Dijk March 23, 2018, 12:52 p.m. UTC | #8
On 23/03/2018 13:20, Herbert Xu wrote:
> On Fri, Mar 23, 2018 at 12:19:28PM +0100, Harald van Dijk wrote:
>>
>>> Right.  I think I'll leave this one alone.  It worked purely by
>>> chance previously because dash incorrectly used the first IFS
>>> character even when it's outside of double-quotes, which is why
>>> the expansion ${@%2 C3} matches.
>>
>> When the first IFS character is a whitespace character, which it usually is,
>> it's not incorrect: inserting that character and then splitting on it
>> produces the exact results that POSIX requires. It's only when IFS doesn't
>> start with a whitespace character that it's incorrect.
> 
> Where does POSIX require this?

It doesn't, and I didn't say it did. POSIX doesn't care how it's 
implemented, POSIX cares about the results produced. When another 
approach produces the same results, both approaches are equally correct.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 23, 2018, 1:14 p.m. UTC | #9
On Fri, Mar 23, 2018 at 01:52:10PM +0100, Harald van Dijk wrote:
>
> It doesn't, and I didn't say it did. POSIX doesn't care how it's
> implemented, POSIX cares about the results produced. When another approach
> produces the same results, both approaches are equally correct.

Right, but restoring the old behaviour for white spaces only is
silly.  It's silly
to have this work:

	set -- A1 B2 C3
	echo ${@%2 C3}

but not this:

	set -- A1 B2 C3
	IFS=:
	echo ${@%2:C3}

Of course, in the old dash both worked which makes more sense.
But that was at the cost of not eliminating empty fields which
violates POSIX.

Cheers,
Harald van Dijk March 23, 2018, 1:57 p.m. UTC | #10
On 23/03/2018 14:14, Herbert Xu wrote:
> On Fri, Mar 23, 2018 at 01:52:10PM +0100, Harald van Dijk wrote:
>>
>> It doesn't, and I didn't say it did. POSIX doesn't care how it's
>> implemented, POSIX cares about the results produced. When another approach
>> produces the same results, both approaches are equally correct.
> 
> Right, but restoring the old behaviour for white spaces only is
> silly.  It's silly
> to have this work:
> 
> 	set -- A1 B2 C3
> 	echo ${@%2 C3}
> 
> but not this:
> 
> 	set -- A1 B2 C3
> 	IFS=:
> 	echo ${@%2:C3}

Agreed. Both are doable in a POSIX-compatible way, but the second would 
require a different approach: either scanning the strings to suppress 
the addition of the separator character in those cases where its 
presence would cause splitting to produce an additional field, or big 
changes in how regions are recorded and split. Either way, that's a 
large cost for a feature that's likely seen very little real-world use.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/expand.c b/src/expand.c
index 705fef7..a84c015 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -119,7 +119,7 @@  STATIC const char *subevalvar(char *, char *, int, int, int, int, int);
 STATIC char *evalvar(char *, int);
 STATIC size_t strtodest(const char *, const char *, int);
 STATIC void memtodest(const char *, size_t, const char *, int);
-STATIC ssize_t varvalue(char *, int, int, int *);
+STATIC ssize_t varvalue(char *, int, int, int);
 STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
 STATIC void addglob(const glob_t *);
@@ -712,7 +712,6 @@  evalvar(char *p, int flag)
 	int c;
 	int startloc;
 	ssize_t varlen;
-	int easy;
 	int quoted;
 
 	varflags = *p++;
@@ -723,12 +722,11 @@  evalvar(char *p, int flag)
 
 	quoted = flag & EXP_QUOTED;
 	var = p;
-	easy = (!quoted || (*var == '@' && shellparam.nparam));
 	startloc = expdest - (char *)stackblock();
 	p = strchr(p, '=') + 1;
 
 again:
-	varlen = varvalue(var, varflags, flag, &quoted);
+	varlen = varvalue(var, varflags, flag, quoted);
 	if (varflags & VSNUL)
 		varlen--;
 
@@ -771,8 +769,11 @@  vsplus:
 
 	if (subtype == VSNORMAL) {
 record:
-		if (!easy)
-			goto end;
+		if (quoted) {
+			quoted = *var == '@' && shellparam.nparam;
+			if (!quoted)
+				goto end;
+		}
 		recordregion(startloc, expdest - (char *)stackblock(), quoted);
 		goto end;
 	}
@@ -878,7 +879,7 @@  strtodest(p, syntax, quotes)
  */
 
 STATIC ssize_t
-varvalue(char *name, int varflags, int flags, int *quotedp)
+varvalue(char *name, int varflags, int flags, int quoted)
 {
 	int num;
 	char *p;
@@ -887,11 +888,11 @@  varvalue(char *name, int varflags, int flags, int *quotedp)
 	char sepc;
 	char **ap;
 	char const *syntax;
-	int quoted = *quotedp;
 	int subtype = varflags & VSTYPE;
 	int discard = subtype == VSPLUS || subtype == VSLENGTH;
 	int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
 	ssize_t len = 0;
+	char c;
 
 	sep = (flags & EXP_FULL) << CHAR_BIT;
 	syntax = quoted ? DQSYNTAX : BASESYNTAX;
@@ -928,12 +929,15 @@  numvar:
 			goto param;
 		/* fall through */
 	case '*':
-		if (quoted)
+		c = 0;
+		if (quoted) {
 			sep = 0;
-		sep |= ifsset() ? ifsval()[0] : ' ';
+			c = ~0;
+		}
+		sep |= ifsset() ? (unsigned char)(c & ifsval()[0]) : ' ';
+		quoted = 0;
 param:
 		sepc = sep;
-		*quotedp = !sepc;
 		if (!(ap = shellparam.p))
 			return -1;
 		while ((p = *ap++)) {