[v2] expand: Fix ghost fields with unquoted $@/$*

Message ID 20180323105847.GB32220@gondor.apana.org.au (mailing list archive)
State Accepted
Delegated to: Herbert Xu

Commit Message

Herbert Xu March 23, 2018, 10:58 a.m. UTC
On Fri, Mar 23, 2018 at 10:48:13AM +0100, Martijn Dekker wrote:
>
> Unfortunately it also introduces a bug with $*.
> 
> $ src/dash -c 'IFS=:; v=$*; printf "[%s]\n" "$v"' _ abc 'def ghi' jkl
> [abcdef ghijkl]
> 
> Expected:
> [abc:def ghi:jkl]

Thanks, the problem here is that we need to set c to 0 not just
when quoted is true but also if sep is 0 since both imply that
field splitting is disabled.  Here is a second revision which
should fix this by checking (quoted || !sep) instead of just
quoted when determining whether we're doing field expansion in $*.
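
The branch-free test in the patch can be checked in isolation.  The
following is a minimal sketch, not dash code: the flag values are
assumed (the patch itself only requires EXP_QUOTED == EXP_FULL <<
CHAR_BIT), and mask_plain(), star_sepc(), masks_agree(), ifs_set and
ifs0 are hypothetical names introduced for illustration, with ifs_set
and ifs0 standing in for ifsset() and ifsval()[0].  Note that in the
code as posted, c ends up as ~0 (not 0) when (quoted || !sep) holds,
so that c & ifsval()[0] keeps the first IFS character as the
separator; c is 0 only when field splitting is in effect.

```c
#include <assert.h>
#include <limits.h>	/* CHAR_BIT, UCHAR_MAX */

/* Assumed flag values for this sketch only. */
#define EXP_FULL	0x1
#define EXP_QUOTED	(EXP_FULL << CHAR_BIT)

/* Mask exactly as computed in the patch: ~0 if (quoted || !sep), else 0. */
static char mask_branchless(int quoted, int sep)
{
	return !((quoted | ~sep) & EXP_QUOTED) - 1;
}

/* The same mask written with an explicit test, for comparison. */
static char mask_plain(int quoted, int sep)
{
	return (quoted || !sep) ? ~0 : 0;
}

/*
 * Separator byte used to join "$*", following the three lines of the
 * patch.  quoted is 0 or EXP_QUOTED; sep is 0 or EXP_FULL << CHAR_BIT.
 */
static char star_sepc(int quoted, int sep, int ifs_set, char ifs0)
{
	char c = mask_branchless(quoted, sep);

	sep &= ~quoted;
	sep |= ifs_set ? (unsigned char)(c & ifs0) : ' ';
	/* dash assigns sep to a char; keep only the low byte here. */
	return (char)(sep & UCHAR_MAX);
}

/* Returns 1 if both mask versions agree on all four input combinations. */
static int masks_agree(void)
{
	static const int bits[] = { 0, EXP_QUOTED };
	int i, j;

	for (i = 0; i < 2; i++)
		for (j = 0; j < 2; j++)
			if (mask_branchless(bits[i], bits[j]) !=
			    mask_plain(bits[i], bits[j]))
				return 0;
	return 1;
}
```

With IFS=: the v=$* case from the report corresponds to
star_sepc(0, 0, 1, ':'), which yields ':' and so matches the expected
[abc:def ghi:jkl].  An unquoted $* that is subject to field splitting
corresponds to star_sepc(0, EXP_QUOTED, 1, ':'), which yields a NUL
separator.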

---8<---
Harald van Dijk <harald@gigawatt.nl> wrote:
> On 22/03/2018 22:38, Martijn Dekker wrote:
>> On 22-03-18 at 20:28, Harald van Dijk wrote:
>>> On 22/03/2018 03:40, Martijn Dekker wrote:
>>>> This patch fixes the bug that, given no positional parameters, unquoted
>>>> $@ and $* incorrectly generate one empty field (they should generate no
>>>> fields). Apparently that was a side effect of the above.
>>>
>>> This seems weird though. If you want to remove the recording of empty
>>> regions because they are pointless, then how does removing them fix a
>>> bug? Doesn't this show that empty regions do have an effect? Perhaps
>>> they're not supposed to have any effect, perhaps it's a specific
>>> combination of empty regions and something else that triggers some bug,
>>> and perhaps that combination can no longer occur with your patch.
>>
>> The latter is my guess, but I haven't had time to investigate it.
>
> Looking into it again:
>
> When IFS is set to an empty string, sepc is set to '\0' in varvalue().
> This then causes *quotedp to be set to true, meaning evalvar()'s quoted
> variable is turned on. quoted is then passed to recordregion() as the
> nulonly parameter.
>
> ifsp->nulonly has a bigger effect than merely selecting whether to use
> $IFS or whether to only split on null bytes: in ifsbreakup(), nulonly
> also causes string termination to be suppressed. That's correct: that
> special treatment is required to preserve empty fields in "$@"
> expansion. But it should *only* be used when $@ is quoted: ifsbreakup()
> takes nulonly from the last IFS region, even if it's empty, so having an
> additional zero-length region with nulonly enabled causes confusion.
>
> Passing quoted by value to varvalue() and not attempting to modify it
> should therefore, and in my quick testing does, also work to fix the
> original $@ bug.

You're right.  The proper fix to this is to ensure that nulonly
is not set in varvalue for $*.  It should only be set for $@ when
it's inside double quotes.

In fact there is another bug while we're playing with $@/$*.
When IFS is set to a non-whitespace character such as :, $*
outside quotes won't remove empty fields as it should.

This patch fixes both problems.

Reported-by: Martijn Dekker <martijn@inlv.org>
Suggested-by: Harald van Dijk <harald@gigawatt.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Martijn Dekker March 23, 2018, 2:34 p.m. UTC | #1
On 23-03-18 at 11:58, Herbert Xu wrote:
> Thanks, the problem here is that we need to set c to 0 not just
> when quoted is true but also if sep is 0 since both imply that
> field splitting is disabled.  Here is a second revision which
> should fix this by checking (quoted || !sep) instead of just
> quoted when determining whether we're doing field expansion in $*.

FWIW, this passes all my tests.

- Martijn

Patch

diff --git a/src/expand.c b/src/expand.c
index 705fef7..c14350c 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -119,7 +119,7 @@  STATIC const char *subevalvar(char *, char *, int, int, int, int, int);
 STATIC char *evalvar(char *, int);
 STATIC size_t strtodest(const char *, const char *, int);
 STATIC void memtodest(const char *, size_t, const char *, int);
-STATIC ssize_t varvalue(char *, int, int, int *);
+STATIC ssize_t varvalue(char *, int, int, int);
 STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
 STATIC void addglob(const glob_t *);
@@ -712,7 +712,6 @@  evalvar(char *p, int flag)
 	int c;
 	int startloc;
 	ssize_t varlen;
-	int easy;
 	int quoted;
 
 	varflags = *p++;
@@ -723,12 +722,11 @@  evalvar(char *p, int flag)
 
 	quoted = flag & EXP_QUOTED;
 	var = p;
-	easy = (!quoted || (*var == '@' && shellparam.nparam));
 	startloc = expdest - (char *)stackblock();
 	p = strchr(p, '=') + 1;
 
 again:
-	varlen = varvalue(var, varflags, flag, &quoted);
+	varlen = varvalue(var, varflags, flag, quoted);
 	if (varflags & VSNUL)
 		varlen--;
 
@@ -771,8 +769,11 @@  vsplus:
 
 	if (subtype == VSNORMAL) {
 record:
-		if (!easy)
-			goto end;
+		if (quoted) {
+			quoted = *var == '@' && shellparam.nparam;
+			if (!quoted)
+				goto end;
+		}
 		recordregion(startloc, expdest - (char *)stackblock(), quoted);
 		goto end;
 	}
@@ -878,7 +879,7 @@  strtodest(p, syntax, quotes)
  */
 
 STATIC ssize_t
-varvalue(char *name, int varflags, int flags, int *quotedp)
+varvalue(char *name, int varflags, int flags, int quoted)
 {
 	int num;
 	char *p;
@@ -887,11 +888,11 @@  varvalue(char *name, int varflags, int flags, int *quotedp)
 	char sepc;
 	char **ap;
 	char const *syntax;
-	int quoted = *quotedp;
 	int subtype = varflags & VSTYPE;
 	int discard = subtype == VSPLUS || subtype == VSLENGTH;
 	int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
 	ssize_t len = 0;
+	char c;
 
 	sep = (flags & EXP_FULL) << CHAR_BIT;
 	syntax = quoted ? DQSYNTAX : BASESYNTAX;
@@ -928,12 +929,25 @@  numvar:
 			goto param;
 		/* fall through */
 	case '*':
-		if (quoted)
-			sep = 0;
-		sep |= ifsset() ? ifsval()[0] : ' ';
+		/* We will set c to 0 or ~0 depending on whether
+		 * we're doing field splitting.  We won't do field
+		 * splitting if either we're quoted or sep is zero.
+		 *
+		 * Instead of testing (quoted || !sep) the following
+		 * trick optimises away any branches by using the
+		 * fact that EXP_QUOTED (which is the only bit that
+		 * can be set in quoted) is the same as EXP_FULL <<
+		 * CHAR_BIT (which is the only bit that can be set
+		 * in sep).
+		 */
+#if EXP_QUOTED >> CHAR_BIT != EXP_FULL
+#error The following two lines expect EXP_QUOTED == EXP_FULL << CHAR_BIT
+#endif
+		c = !((quoted | ~sep) & EXP_QUOTED) - 1;
+		sep &= ~quoted;
+		sep |= ifsset() ? (unsigned char)(c & ifsval()[0]) : ' ';
 param:
 		sepc = sep;
-		*quotedp = !sepc;
 		if (!(ap = shellparam.p))
 			return -1;
 		while ((p = *ap++)) {