From patchwork Sun May 19 05:20:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667764 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E1FB4437 for ; Sun, 19 May 2024 05:20:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096016; cv=none; b=LxCaFHnPaGlNJ86vCnQaV0doPW3m+YB99au48CxBVl0Jall1pkFBHT6PuLnkL2RgN2/AOS39ReEpEDWqUfh7u1Bb5q/G5Jzh694BhLl9dGgxF3J/SPbImkLQyXTdHSdNh2yzefi38aB+2XFxltdCJmtDmAisNMTMz+ospiTfUTs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096016; c=relaxed/simple; bh=PTuY1FwaYoW0+ftpuGe503VHsrk4LYewOCTtoEdnLtQ=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=k3IDFT/O5LEwiaRqHV5X1LunonvyjEoa7x9VrdtDO2xX1SRhdcGzPs9YSveLz+XuwnexgKfcO2clykS388OdiTEBtv4+ZeWnWh8Cl7WSYGQ/M1MFYzRUd+DUsKJa1ca0Qu0Wx74hXuOwL5FauLBgeOEmtEieS0Ezxk3dvfYkoSU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yxs-00HGA0-2d; Sun, 19 May 2024 13:20:09 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:10 +0800 Date: Sun, 19 May 2024 13:20:10 +0800 Message-Id: 
In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 01/13] shell: Call setlocale To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Call setlocale to initialise locale settings for libc. Signed-off-by: Herbert Xu --- src/main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/main.c b/src/main.c index 7beb280..1e192f8 100644 --- a/src/main.c +++ b/src/main.c @@ -32,6 +32,7 @@ * SUCH DAMAGE. */ +#include #include #include #include @@ -101,6 +102,9 @@ main(int argc, char **argv) #if PROFILE monitor(4, etext, profile_buf, sizeof profile_buf, 50); #endif + + setlocale(LC_ALL, ""); + state = 0; if (unlikely(setjmp(main_handler.loc))) { int e; From patchwork Sun May 19 05:20:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667765 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9729833D5 for ; Sun, 19 May 2024 05:20:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096017; cv=none; b=SOe/bEPuk9SPuTWknV/5/9TVCzeTPiokCxcpKEAmdVPn5u3s6xikJKoNcS6RuvzjUeSPYRngCYb/vTAQZ4tzH4dSb7NTySMDPCnjp6CHlod0zJKSmP7udXm27xSR6jw/Skrj4Zyaq1B+sMbfbA1AynfJWMyE7osnDAOJymCKyYg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096017; c=relaxed/simple; bh=1refcKD/Z5qW8Czk7ev9Teg/iSBIjJQ7KrlUcn8DCq0=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=YssUZuKnAap0/Rg21R6yqyky7bcPtqsF62XQ6F79L9IzdKM7hYmoQXvGFLHPm3mcqUhEc/oTzz3r2HS9YxVooVBzA9L2WVLrq9Cq0GEIpOeVRiY2V+Bgk30DU6Bc8PlhpMDZAbBXufO9qUw81VqZ66zHAIkmMtw9JI2E4r8bPaE= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yxv-00HGAA-0O; Sun, 19 May 2024 13:20:12 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:12 +0800 Date: Sun, 19 May 2024 13:20:12 +0800 Message-Id: <57e08dd1fd32c616bf8cf60315edb6e1739efe6a.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 02/13] shell: Use strcoll instead of strcmp where applicable To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Use strcoll instead of strcmp so that the locale is taken into account when sorting strings during pathname expansion, and for the built-in test(1) string comparison operators. 
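The effect of this change can be illustrated outside of dash with a minimal sketch. The helper names below (sign, cmp_coll, sort_names, check_c_locale) are hypothetical and are not part of the patch. The sketch assumes only standard C library behaviour: in the "C" locale strcoll orders strings the same way as strcmp, while in other locales the order may differ.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so results can be compared. */
static int sign(int v)
{
	return (v > 0) - (v < 0);
}

/* qsort comparator that honours LC_COLLATE (hypothetical helper). */
static int cmp_coll(const void *a, const void *b)
{
	return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Sort a list of names the way a locale-aware glob sort might. */
static void sort_names(const char **names, size_t n)
{
	qsort(names, n, sizeof(*names), cmp_coll);
}

/* Sanity check: in the "C" locale both functions agree on the order. */
static void check_c_locale(void)
{
	setlocale(LC_ALL, "C");
	assert(sign(strcoll("B", "a")) == sign(strcmp("B", "a")));
}
```

After setlocale(LC_ALL, "") with a locale such as en_US.UTF-8 installed, sort_names would typically place "a" before "B", whereas byte-wise strcmp order places "B" first. The exact result depends on the collation data of the locale in use.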
Signed-off-by: Herbert Xu --- src/bltin/test.c | 8 ++++---- src/expand.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/bltin/test.c b/src/bltin/test.c index fd8a43b..2db4d0f 100644 --- a/src/bltin/test.c +++ b/src/bltin/test.c @@ -353,13 +353,13 @@ binop(void) /* NOTREACHED */ #endif case STREQ: - return strcmp(opnd1, opnd2) == 0; + return strcoll(opnd1, opnd2) == 0; case STRNE: - return strcmp(opnd1, opnd2) != 0; + return strcoll(opnd1, opnd2) != 0; case STRLT: - return strcmp(opnd1, opnd2) < 0; + return strcoll(opnd1, opnd2) < 0; case STRGT: - return strcmp(opnd1, opnd2) > 0; + return strcoll(opnd1, opnd2) > 0; case INTEQ: return getn(opnd1) == getn(opnd2); case INTNE: diff --git a/src/expand.c b/src/expand.c index d8b354c..38f8785 100644 --- a/src/expand.c +++ b/src/expand.c @@ -1464,7 +1464,7 @@ msort(struct strlist *list, int len) p = msort(p, len - half); /* sort second half */ lpp = &list; for (;;) { - if (strcmp(p->text, q->text) < 0) { + if (strcoll(p->text, q->text) < 0) { *lpp = p; lpp = &p->next; if ((p = *lpp) == NULL) { From patchwork Sun May 19 05:20:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667766 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 958914437 for ; Sun, 19 May 2024 05:20:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096019; cv=none; b=QaqBVmr1GRQtVsHlraYdu1Bn3+Y5L57+N6ov84ATKO1+jkFZc30oTYs89p7CRurfnPAHiFZ51uiKHoY8q0erOdYpC4+o9xMvt9G0apl2T7DoQds+EMtoTMSkVfAkH0/XAznmqt4Y+vLh2x18wDcoWNb+cbOXEpm72+t76lsKz70= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1716096019; c=relaxed/simple; bh=N7HmBoF6ebGEKvB6WOYYtSMbHYZ4y7BXPR9dum58Yk0=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=nPf1OyDI78E+utCzM2Mqki2zRFQVn8THRkq+dxoOC8IBGLsu7PLFP7+KED+AqiFx1RxOR6GxAicaNr6INBw/rM2pULzHw+j/0vviGrYtm3JVaLqDhg1KBPJu1SdhwH5/l8WUjKRNpK00uoKLskzbMjwKajQn7ziCFXPVfmCq6tA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yxx-00HGAL-1O; Sun, 19 May 2024 13:20:14 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:14 +0800 Date: Sun, 19 May 2024 13:20:14 +0800 Message-Id: <165ebdcfeeedf01a7f5894c8bea3ea4d002e3866.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 03/13] expand: Count multi-byte characters for VSLENGTH To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Count multi-byte characters in variables rather than bytes and return that count as the length expansion. Signed-off-by: Herbert Xu --- src/expand.c | 105 ++++++++++++++++++++++++++++++++++++++++--------- src/memalloc.h | 10 ++--- 2 files changed, 92 insertions(+), 23 deletions(-) diff --git a/src/expand.c b/src/expand.c index 38f8785..5260d16 100644 --- a/src/expand.c +++ b/src/expand.c @@ -53,6 +53,7 @@ #endif #include #include +#include /* * Routines to expand arguments to commands. 
We have to deal with @@ -789,6 +790,41 @@ really_record: return p; } +static char *chtodest(int c, const char *syntax, char *out) +{ + if (syntax[c] == CCTL) + USTPUTC(CTLESC, out); + USTPUTC(c, out); + + return out; +} + +struct mbpair { + unsigned ml; + unsigned ql; +}; + +static struct mbpair mbtodest(const char *p, char *q, const char *syntax, + size_t len) +{ + mbstate_t mbs = {}; + struct mbpair mbp; + char *q0 = q; + size_t ml; + + ml = mbrlen(--p, len, &mbs); + if (ml == -2 || ml == -1 || ml < 2) + ml = 1; + + len = ml; + do { + q = chtodest((signed char)*p++, syntax, q); + } while (--len); + + mbp.ml = ml - 1; + mbp.ql = q - q0; + return mbp; +} /* * Put a string on the stack. @@ -796,38 +832,70 @@ really_record: static size_t memtodest(const char *p, size_t len, int flags) { - const char *syntax = flags & EXP_QUOTED ? DQSYNTAX : BASESYNTAX; + const char *syntax; + size_t count = 0; char *q; - char *s; if (unlikely(!len)) return 0; q = makestrspace(len * 2, expdest); - s = q; - do { +#if QUOTES_ESC != 0x11 || EXP_QUOTED != 0x100 +#error QUOTES_ESC != 0x11 || EXP_QUOTED != 0x100 +#endif + if (likely(!(flags & (flags >> 4 | flags >> 8) & QUOTES_ESC))) { + while (len >= 8) { + uint64_t x = *(uint64_t *)(p + count); + + if ((x | (x - 0x0101010101010101)) & + 0x8080808080808080) + break; + + *(uint64_t *)(q + count) = x; + + count += 8; + len -= 8; + } + + q += count; + p += count; + + syntax = flags & QUOTES_ESC ? 
BASESYNTAX : is_type; + } else + syntax = SQSYNTAX; + + for (; len; len--) { int c = (signed char)*p++; - if (c) { - if ((flags & QUOTES_ESC) && - ((syntax[c] == CCTL) || - (flags & EXP_QUOTED && syntax[c] == CBACK))) - USTPUTC(CTLESC, q); - } else if (!(flags & EXP_KEEPNUL)) + + if (unlikely(!c && !(flags & EXP_KEEPNUL))) continue; - USTPUTC(c, q); - } while (--len); + + count++; + + if (unlikely(c < 0)) { + struct mbpair mbp = mbtodest(p, q, syntax, len); + unsigned mlm; + + q += mbp.ql; + mlm = mbp.ml; + p += mlm; + len -= mlm; + continue; + } + + q = chtodest(c, syntax, q); + } expdest = q; - return q - s; + return count; } static size_t strtodest(const char *p, int flags) { size_t len = strlen(p); - memtodest(p, len, flags); - return len; + return memtodest(p, len, flags); } @@ -849,6 +917,7 @@ varvalue(char *name, int varflags, int flags, int quoted) int discard = (subtype == VSPLUS || subtype == VSLENGTH) | (flags & EXP_DISCARD); ssize_t len = 0; + size_t start; char c; if (!subtype) { @@ -858,9 +927,9 @@ varvalue(char *name, int varflags, int flags, int quoted) sh_error("Bad substitution"); } - flags |= EXP_KEEPNUL; flags &= discard ? 
~QUOTES_ESC : ~0; sep = (flags & EXP_FULL) << CHAR_BIT; + start = expdest - (char *)stackblock(); switch (*name) { case '$': @@ -920,7 +989,7 @@ param: if (*ap && sep) { len++; - memtodest(&sepc, 1, flags); + memtodest(&sepc, 1, flags | EXP_KEEPNUL); } } break; @@ -950,7 +1019,7 @@ value: } if (discard) - STADJUST(-len, expdest); + expdest = (char *)stackblock() + start; return len; } diff --git a/src/memalloc.h b/src/memalloc.h index a7f7996..1895c1e 100644 --- a/src/memalloc.h +++ b/src/memalloc.h @@ -81,11 +81,11 @@ static inline char *_STPUTC(int c, char *p) { #define STPUTC(c, p) ((p) = _STPUTC((c), (p))) #define CHECKSTRSPACE(n, p) \ ({ \ - char *q = (p); \ - size_t l = (n); \ - size_t m = sstrend - q; \ - if (l > m) \ - (p) = makestrspace(l, q); \ + char *_q = (p); \ + size_t _l = (n); \ + size_t _m = sstrend - _q; \ + if (_l > _m) \ + (p) = makestrspace(_l, _q); \ 0; \ }) #define USTPUTC(c, p) (*p++ = (c)) From patchwork Sun May 19 05:20:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667767 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 52BB833D5 for ; Sun, 19 May 2024 05:20:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096022; cv=none; b=s6aVkEhit6WDem8EzoLThqbXuRdASD4Ibx6Nn2I9ki+xnwckOfJZd+zUcBbFPU2+504o1moJdlPBuXrSv/LVwOQ25cwTRMZ1KY/dkv2Xw/qkYPYLLehrgM1J3nF2e/4cWClLv+AEBu+CH42CH3LdH2LVTLaaQuXHKxGWtF86RqM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096022; c=relaxed/simple; bh=E+I7Go8iQnJXch1LYhLVSAboUi+DR48eNENd6CcRDXc=; 
h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=f3KK53quZiwJauFPtAM2bONrCw0JDRjEIh23IEEl6ElCkKG32jysFWSUFgr6h1+3+c43fjKDNNCyaV/uoc0nmXdXkkcUtLOSp9W6ZyB6gKUHZG/hwNS0VhOue/i3eWLtZkyAouzgYXZ0b3Q6+aZaQ90c/bgs27zyWYtiagwOKLw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yxz-00HGAW-2R; Sun, 19 May 2024 13:20:16 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:17 +0800 Date: Sun, 19 May 2024 13:20:17 +0800 Message-Id: In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 04/13] expand: Process multi-byte characters in subevalvar To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: When trimming variables in subevalvar, process multi-byte characters as one unit instead of their constituent bytes. Signed-off-by: Herbert Xu --- src/expand.c | 196 ++++++++++++++++++++++++++++++++++--------------- src/expand.h | 1 + src/mystring.c | 2 +- src/parser.h | 1 + 4 files changed, 138 insertions(+), 62 deletions(-) diff --git a/src/expand.c b/src/expand.c index 5260d16..b627c7a 100644 --- a/src/expand.c +++ b/src/expand.c @@ -32,27 +32,27 @@ * SUCH DAMAGE. 
*/ -#include -#include -#include +#include #include -#include -#ifdef HAVE_GETPWNAM -#include -#endif -#include -#include -#include -#include -#include #ifdef HAVE_FNMATCH #include #endif #ifdef HAVE_GLOB #include #endif -#include +#include +#include +#ifdef HAVE_GETPWNAM +#include +#endif +#include #include +#include +#include +#include +#include +#include +#include #include /* @@ -543,8 +543,10 @@ static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, loc = startp; loc2 = rmesc; do { - int match; const char *s = loc2; + unsigned ml; + int match; + c = *loc2; if (zero) { *loc2 = '\0'; @@ -553,12 +555,26 @@ static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, match = pmatch(str, s); *loc2 = c; if (match) - return loc; - if (quotes && *loc == (char)CTLESC) + return quotes ? loc : loc2; + + if (!c) + break; + + if (*loc != (char)CTLMBCHAR) { + if (*loc == (char)CTLESC) + loc++; loc++; - loc++; - loc2++; - } while (c); + loc2++; + continue; + } + + if (*++loc == (char)CTLESC) + loc++; + + ml = (unsigned char)*loc; + loc += ml + 3; + loc2 += ml; + } while (1); return 0; } @@ -566,14 +582,16 @@ static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend, char *str, int quotes, int zero ) { - int esc = 0; + size_t esc = 0; char *loc; char *loc2; for (loc = endp, loc2 = rmescend; loc >= startp; loc2--) { - int match; - char c = *loc2; const char *s = loc2; + char c = *loc2; + unsigned ml; + int match; + if (zero) { *loc2 = '\0'; s = rmesc; @@ -581,17 +599,23 @@ static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend, match = pmatch(str, s); *loc2 = c; if (match) - return loc; + return quotes ? 
loc : loc2; loc--; - if (quotes) { - if (--esc < 0) { - esc = esclen(startp, loc); - } - if (esc % 2) { - esc--; - loc--; - } + if (!esc--) + esc = esclen(startp, loc); + if (esc % 2) { + esc--; + loc--; + continue; } + if (*loc != (char)CTLMBCHAR) + continue; + + ml = (unsigned char)*--loc; + loc -= ml + 2; + if (*loc == (char)CTLESC) + loc--; + loc2 -= ml - 1; } return 0; } @@ -645,14 +669,11 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, nstrloc = str - (char *)stackblock(); } - rmesc = startp; - if (quotes) { - rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW); - if (rmesc != startp) - rmescend = expdest; - startp = stackblock() + startloc; - str = stackblock() + nstrloc; - } + rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW); + if (rmesc != startp) + rmescend = expdest; + startp = stackblock() + startloc; + str = stackblock() + nstrloc; rmescend--; /* zero = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX */ @@ -662,16 +683,29 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, endp = stackblock() + strloc - 1; loc = scan(startp, endp, rmesc, rmescend, str, quotes, zero); - if (loc) { - if (zero) { - memmove(startp, loc, endp - loc); - loc = startp + (endp - loc); + if (!loc) { + if (quotes) { + rmesc = startp; + rmescend = endp; } - *loc = '\0'; - } else - loc = endp; + } else if (!quotes) { + if (zero) + rmesc = loc; + else + rmescend = loc; + } else if (zero) { + rmesc = loc; + rmescend = endp; + } else { + rmesc = startp; + rmescend = loc; + } + + memmove(startp, rmesc, rmescend - rmesc); + loc = startp + (rmescend - rmesc); out: + *loc = '\0'; amount = loc - expdest; STADJUST(amount, expdest); @@ -697,6 +731,7 @@ evalvar(char *p, int flag) ssize_t varlen; int discard; int quoted; + int mbchar; varflags = *p++ & ~VSBIT; subtype = varflags & VSTYPE; @@ -706,8 +741,18 @@ evalvar(char *p, int flag) startloc = expdest - (char *)stackblock(); p = strchr(p, '=') + 1; + mbchar = 0; + 
switch (subtype) { + case VSTRIMLEFT: + case VSTRIMLEFTMAX: + case VSTRIMRIGHT: + case VSTRIMRIGHTMAX: + mbchar = EXP_MBCHAR; + break; + } + again: - varlen = varvalue(var, varflags, flag, quoted); + varlen = varvalue(var, varflags, flag | mbchar, quoted); if (varflags & VSNUL) varlen--; @@ -813,14 +858,31 @@ static struct mbpair mbtodest(const char *p, char *q, const char *syntax, size_t ml; ml = mbrlen(--p, len, &mbs); - if (ml == -2 || ml == -1 || ml < 2) + if (ml == -2 || ml == -1 || ml < 2) { + q = chtodest((signed char)*p, syntax, q); ml = 1; + goto out; + } len = ml; do { q = chtodest((signed char)*p++, syntax, q); } while (--len); + goto out; + if (syntax[CTLMBCHAR] == CCTL) { + USTPUTC(CTLMBCHAR, q); + USTPUTC(ml, q); + } + + q = mempcpy(q, p, ml); + + if (syntax[CTLMBCHAR] == CCTL) { + USTPUTC(ml, q); + USTPUTC(CTLMBCHAR, q); + } + +out: mbp.ml = ml - 1; mbp.ql = q - q0; return mbp; @@ -839,12 +901,14 @@ static size_t memtodest(const char *p, size_t len, int flags) if (unlikely(!len)) return 0; - q = makestrspace(len * 2, expdest); + /* CTLMBCHAR, 2, c, c, 2, CTLMBCHAR */ + q = makestrspace(len * 3, expdest); -#if QUOTES_ESC != 0x11 || EXP_QUOTED != 0x100 -#error QUOTES_ESC != 0x11 || EXP_QUOTED != 0x100 +#if QUOTES_ESC != 0x11 || EXP_MBCHAR != 0x20 || EXP_QUOTED != 0x100 +#error QUOTES_ESC != 0x11 || EXP_MBCHAR != 0x20 || EXP_QUOTED != 0x100 #endif - if (likely(!(flags & (flags >> 4 | flags >> 8) & QUOTES_ESC))) { + if (likely(!(flags & (flags >> 3 | flags >> 4 | flags >> 8) & + (QUOTES_ESC | EXP_MBCHAR)))) { while (len >= 8) { uint64_t x = *(uint64_t *)(p + count); @@ -861,7 +925,8 @@ static size_t memtodest(const char *p, size_t len, int flags) q += count; p += count; - syntax = flags & QUOTES_ESC ? BASESYNTAX : is_type; + syntax = flags & (QUOTES_ESC | EXP_MBCHAR) ? 
+ BASESYNTAX : is_type; } else syntax = SQSYNTAX; @@ -1753,17 +1818,25 @@ _rmescapes(char *str, int flag) inquotes = 0; notescaped = globbing; while (*p) { + unsigned ml; + int newnesc = globbing; + if (*p == (char)CTLQUOTEMARK) { p++; inquotes ^= globbing; continue; - } - if (*p == '\\') { + } else if (*p == '\\') { /* naked back slash */ - notescaped ^= globbing; - goto copy; - } - if (*p == (char)CTLESC) { + newnesc ^= notescaped; + } else if (*p == (char)CTLMBCHAR) { + if (*++p == (char)CTLESC) + p++; + + ml = (unsigned char)*p++; + q = mempcpy(q, p, ml); + p += ml + 2; + goto setnesc; + } else if (*p == (char)CTLESC) { p++; if (notescaped) *q++ = '\\'; @@ -1772,9 +1845,10 @@ _rmescapes(char *str, int flag) *q++ = '\\'; } } - notescaped = globbing; -copy: + *q++ = *p++; +setnesc: + notescaped = newnesc; } *q = '\0'; if (flag & RMESCAPE_GROW) { diff --git a/src/expand.h b/src/expand.h index 49a18f9..a78564f 100644 --- a/src/expand.h +++ b/src/expand.h @@ -55,6 +55,7 @@ struct arglist { #define EXP_VARTILDE 0x4 /* expand tildes in an assignment */ #define EXP_REDIR 0x8 /* file glob for a redirection (1 match only) */ #define EXP_CASE 0x10 /* keeps quotes around for CASE pattern */ +#define EXP_MBCHAR 0x20 /* mark multi-byte characters */ #define EXP_VARTILDE2 0x40 /* expand tildes after colons only */ #define EXP_WORD 0x80 /* expand word in parameter expansion */ #define EXP_QUOTED 0x100 /* expand word in double quotes */ diff --git a/src/mystring.c b/src/mystring.c index 978bbb5..afaa508 100644 --- a/src/mystring.c +++ b/src/mystring.c @@ -64,7 +64,7 @@ const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL | VSBIT, '@', '=', CTLQUOTEMARK, '\0' }; const char cqchars[] = { '\\', - CTLESC, CTLQUOTEMARK, 0 + CTLESC, CTLMBCHAR, CTLQUOTEMARK, 0 }; const char illnum[] = "Illegal number: %s"; const char homestr[] = "HOME"; diff --git a/src/parser.h b/src/parser.h index 433573d..14bfc4f 100644 --- a/src/parser.h +++ b/src/parser.h @@ -44,6 +44,7 @@ union node; #define 
CTLVAR -126 /* variable defn */ #define CTLENDVAR -125 #define CTLBACKQ -124 +#define CTLMBCHAR -123 #define CTLARI -122 /* arithmetic expression */ #define CTLENDARI -121 #define CTLQUOTEMARK -120 From patchwork Sun May 19 05:20:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667768 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7025A4437 for ; Sun, 19 May 2024 05:20:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096024; cv=none; b=a/spJhmwvAKU+hig6ga6ROKfaAY3i7dYAjcBCVgBVOOubns3/yduZWUKo06ty3iWC/Yhitpown7at1YXxQAkZxAYlGNsN0R3UbHS5Tp0+I0tXPvQlOO5cvsk4/kxiKoAslCEbzVzJHyptRbJ8LQDP/Lo9izIboxD84bbPfwPGAU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096024; c=relaxed/simple; bh=nh5M77j97n58tWm+CMchhUBZZ/aiUXAvssszTTitQ9A=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=iF+aAxAi7+rYCOGmiS/rtZAYJ1xqCrfIWy9oh1z/GOf+1yZbzTgYPx3XFg412X2X0FeI+HBRmQ2uZJ8xghCGTaoHDt6m8IeWAHoSQpY/VwMBnIwk9H3I3obTGj+rNJdQc/L2dUKpiJLzJWCmVg7XDsX8i6PzLI2VPA4H2KKdNP0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 
1s8Yy2-00HGAh-09; Sun, 19 May 2024 13:20:19 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:19 +0800 Date: Sun, 19 May 2024 13:20:19 +0800 Message-Id: <34f30d88b665583154bd20b833d99efb40847815.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 05/13] expand: Process multi-byte characters in expmeta To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: When glob(3) is not in use, make sure that expmeta processes multi-byte characters correctly. Signed-off-by: Herbert Xu --- src/expand.c | 109 ++++++++++++++++++++++++++++++++++----------------- 1 file changed, 72 insertions(+), 37 deletions(-) diff --git a/src/expand.c b/src/expand.c index b627c7a..714eae9 100644 --- a/src/expand.c +++ b/src/expand.c @@ -84,6 +84,7 @@ #define RMESCAPE_GLOB 0x2 /* Add backslashes for glob */ #define RMESCAPE_GROW 0x8 /* Grow strings instead of stalloc */ #define RMESCAPE_HEAP 0x10 /* Malloc strings instead of stalloc */ +#define RMESCAPE_EMETA 0x20 /* Remove backslashes too */ /* Add CTLESC when necessary. 
*/ #define QUOTES_ESC (EXP_FULL | EXP_CASE) @@ -1387,15 +1388,13 @@ expandmeta(struct strlist *str) savelastp = exparg.lastp; INTOFF; - p = preglob(str->text, RMESCAPE_ALLOC | RMESCAPE_HEAP); + p = str->text; len = strlen(p); expdir_max = len + PATH_MAX; expdir = ckmalloc(expdir_max); expmeta(p, len, 0); ckfree(expdir); - if (p != str->text) - ckfree(p); INTON; if (exparg.lastp == savelastp) { /* @@ -1416,6 +1415,40 @@ nometa: } } +static void expmeta_rmescapes(char *enddir, char *name) +{ + preglob(strcpy(enddir, name), RMESCAPE_EMETA); +} + +static unsigned mbcharlen(char *p) +{ + int esc = 0; + + if (*++p == (char)CTLESC) + esc++; + + return esc + 3 + (unsigned char)p[esc]; +} + +static size_t skipesc(char *p) +{ + size_t esc = 0; + + if (p[esc] == (char)CTLMBCHAR) + esc += mbcharlen(p); + else if (p[esc] == (char)CTLESC) + esc++; + else if (p[esc] == '\\' && p[esc + 1]) { + while (p[++esc] == (char)CTLQUOTEMARK) + ; + if (p[esc] == (char)CTLMBCHAR) + esc += mbcharlen(p + esc); + else if (p[esc] == (char)CTLESC) + esc++; + } + + return esc; +} /* * Do metacharacter (i.e. *, ?, [...]) expansion. 
@@ -1425,17 +1458,18 @@ STATIC void expmeta(char *name, unsigned name_len, unsigned expdir_len) { char *enddir = expdir + expdir_len; - char *p; + struct stat64 statb; + struct dirent64 *dp; const char *cp; - char *start; char *endname; int metaflag; - struct stat64 statb; - DIR *dirp; - struct dirent64 *dp; - int atend; int matchdot; + char *start; + DIR *dirp; + char *pat; + char *p; int esc; + int c; metaflag = 0; start = name; @@ -1444,11 +1478,8 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) metaflag = 1; else if (*p == '[') { char *q = p + 1; - if (*q == '!') - q++; for (;;) { - if (*q == '\\') - q++; + q += skipesc(q); if (*q == '/' || *q == '\0') break; if (*++q == ']') { @@ -1457,8 +1488,7 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) } } } else { - if (*p == '\\' && p[1]) - esc++; + esc = skipesc(p); if (p[esc] == '/') { if (metaflag) break; @@ -1469,24 +1499,18 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) if (metaflag == 0) { /* we've reached the end of the file name */ if (!expdir_len) return; - p = name; - do { - if (*p == '\\' && p[1]) - p++; - *enddir++ = *p; - } while (*p++); + expmeta_rmescapes(enddir, name); if (lstat64(expdir, &statb) >= 0) addfname(expdir); return; } endname = p; if (name < start) { - p = name; - do { - if (*p == '\\' && p[1]) - p++; - *enddir++ = *p++; - } while (p < start); + c = *start; + *start = 0; + expmeta_rmescapes(enddir, name); + *start = c; + enddir += strlen(enddir); } *enddir = 0; cp = expdir; @@ -1495,16 +1519,15 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) cp = "."; if ((dirp = opendir(cp)) == NULL) return; - if (*endname == 0) { - atend = 1; - } else { - atend = 0; + c = *endname; + if (c) { *endname = '\0'; endname += esc + 1; } name_len -= endname - name; matchdot = 0; - p = start; + pat = preglob(start, RMESCAPE_ALLOC | RMESCAPE_HEAP); + p = pat; if (*p == '\\') p++; if (*p == '.') @@ -1512,8 +1535,8 @@ expmeta(char *name, unsigned name_len, 
unsigned expdir_len) while (! int_pending() && (dp = readdir64(dirp)) != NULL) { if (dp->d_name[0] == '.' && ! matchdot) continue; - if (pmatch(start, dp->d_name)) { - if (atend) { + if (pmatch(pat, dp->d_name)) { + if (!c) { scopy(dp->d_name, enddir); addfname(expdir); } else { @@ -1536,9 +1559,11 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) } } } + if (pat != start) + ckfree(pat); closedir(dirp); - if (! atend) - endname[-esc - 1] = esc ? '\\' : '/'; + if (c) + endname[-esc - 1] = c; } @@ -1781,6 +1806,7 @@ _rmescapes(char *str, int flag) int notescaped; int globbing; int inquotes; + int expmeta; p = strpbrk(str, cqchars); if (!p) { @@ -1789,6 +1815,7 @@ _rmescapes(char *str, int flag) q = p; r = str; globbing = flag & RMESCAPE_GLOB; + expmeta = (flag & RMESCAPE_EMETA) ? RMESCAPE_GLOB : 0; if (flag & RMESCAPE_ALLOC) { size_t len = p - str; @@ -1828,6 +1855,12 @@ _rmescapes(char *str, int flag) } else if (*p == '\\') { /* naked back slash */ newnesc ^= notescaped; + /* naked backslashes can only occur outside quotes */ + inquotes = 0; + if (expmeta & ~newnesc) { + p++; + goto setnesc; + } } else if (*p == (char)CTLMBCHAR) { if (*++p == (char)CTLESC) p++; @@ -1838,7 +1871,9 @@ _rmescapes(char *str, int flag) goto setnesc; } else if (*p == (char)CTLESC) { p++; - if (notescaped) + if (expmeta) + ; + else if (notescaped) *q++ = '\\'; else if (inquotes) { *q++ = '\\'; From patchwork Sun May 19 05:20:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667769 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA31733D5 for ; Sun, 19 May 2024 05:20:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096027; cv=none; b=hb7UGkp1pGWVU5pVJKyZsJ+GEne/RchloAD7Sg028gr0wu3vdqHEE8kAbugLojoZy/S+BnPJbOYlV0tBX7j2v0fGRVg87Ajr1TtgrEBg6Yl4BZH8fU2epanhooCUxI8b4jQbQkr8TX002mbQMZiyXxiXxB/Y8Yvpk45OIFwh9Mw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096027; c=relaxed/simple; bh=8WMDiRmmgwaw32K/e3v/EM/rYzbhUoi6WnORt0TXZ8o=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=giqMolQQC4hmkpH9kdpCw5Kr30yr2c5w10JV74ZGFEQoFmIyWXHct5BiVIpKOgfbYnwQO3L8+iwBdHEj2+4vf7q/Mf0S4y5v97VM/DZitBQ00pj6Pv2UYR4qMSoWBLDZztL8HJhUg7b0BUclshCyaViTsBI97Bn+w1CLBh9ClwI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yy4-00HGBR-10; Sun, 19 May 2024 13:20:21 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:21 +0800 Date: Sun, 19 May 2024 13:20:21 +0800 Message-Id: <94990e2a097cc2b8f1fed8e179e4db455c1b4935.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 06/13] expand: Support multi-byte characters during field splitting To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: When multi-byte characters are used in IFS, they will be used for field splitting. 
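The idea of splitting on a multi-byte separator can be sketched as follows. This is not the code in the patch: count_fields is a hypothetical helper, and it deliberately ignores the IFS white-space collapsing rules, so every separator character delimits a field. It assumes only the standard mbrtowc and wcschr functions, and that the caller has already selected a suitable locale with setlocale.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the fields in s when it is split on any character found in the
 * wide string seps.  Each character is decoded with mbrtowc() so that a
 * multi-byte separator is matched as one unit rather than byte by byte.
 * Hypothetical helper for illustration; an empty string counts as one
 * (empty) field here, which is simpler than real shell field splitting.
 */
static size_t count_fields(const char *s, const wchar_t *seps)
{
	size_t left = strlen(s);
	size_t fields = 1;
	mbstate_t mbs;

	memset(&mbs, 0, sizeof(mbs));
	while (left) {
		wchar_t wc;
		size_t ml = mbrtowc(&wc, s, left, &mbs);

		if (ml == (size_t)-1 || ml == (size_t)-2) {
			/* Invalid or truncated sequence: skip one byte as data. */
			memset(&mbs, 0, sizeof(mbs));
			ml = 1;
		} else if (ml == 0) {
			/* Decoded a NUL character: stop scanning. */
			break;
		} else if (wcschr(seps, wc)) {
			fields++;
		}

		s += ml;
		left -= ml;
	}

	return fields;
}
```

With a UTF-8 locale active, a call such as count_fields(value, L"\u00e9") would treat the two-byte encoding of that character as a single separator. In the default "C" locale only single-byte separators can be relied upon.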
Signed-off-by: Herbert Xu --- src/expand.c | 455 ++++++++++++++++++++++++++++++++++----------------- src/expand.h | 1 + src/var.c | 12 +- 3 files changed, 309 insertions(+), 159 deletions(-) diff --git a/src/expand.c b/src/expand.c index 714eae9..8f30e46 100644 --- a/src/expand.c +++ b/src/expand.c @@ -54,6 +54,7 @@ #include #include #include +#include /* * Routines to expand arguments to commands. We have to deal with @@ -101,6 +102,14 @@ struct ifsregion { int nulonly; /* search for nul bytes only */ }; +struct ifs_state { + const char *ifs; + char *start; + char *r; + int maxargs; + int ifsspc; +}; + /* output of current string */ static char *expdest; /* list of back quote expressions */ @@ -112,6 +121,11 @@ static struct ifsregion *ifslastp; /* holds expanded arg list */ static struct arglist exparg; +static char ifsmap[128]; +static const char *ncifs; +static size_t ifsmb0len; +static wchar_t *wcifs; + static char *argstr(char *p, int flag); static char *exptilde(char *startp, int flag); static char *expari(char *start, int flag); @@ -119,7 +133,7 @@ STATIC void expbackq(union node *, int); STATIC char *evalvar(char *, int); static size_t strtodest(const char *p, int flags); static size_t memtodest(const char *p, size_t len, int flags); -STATIC ssize_t varvalue(char *, int, int, int); +STATIC ssize_t varvalue(char *, int, unsigned); STATIC void expandmeta(struct strlist *); static void addglob(const glob64_t *); STATIC void expmeta(char *, unsigned, unsigned); @@ -157,6 +171,30 @@ esclen(const char *start, const char *p) { return esc; } +static __attribute__((noinline)) unsigned mbnext(const char *p) +{ + unsigned start = 0; + unsigned end = 0; + unsigned ml; + int c; + + c = (signed char)p[end++]; + + switch (__builtin_expect(c, 0)) { + case CTLMBCHAR: + if ((signed char)p[end] == CTLESC) + end++; + ml = (unsigned char)p[end++]; + start = end; + end = ml + 2; + break; + case CTLESC: + start++; + break; + } + + return start | end << 8; +} static inline const 
char *getpwhome(const char *name) { @@ -545,6 +583,7 @@ static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, loc2 = rmesc; do { const char *s = loc2; + unsigned mb; unsigned ml; int match; @@ -561,19 +600,9 @@ static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, if (!c) break; - if (*loc != (char)CTLMBCHAR) { - if (*loc == (char)CTLESC) - loc++; - loc++; - loc2++; - continue; - } - - if (*++loc == (char)CTLESC) - loc++; - - ml = (unsigned char)*loc; - loc += ml + 3; + mb = mbnext(loc); + loc += (mb & 0xff) + (mb >> 8); + ml = (mb >> 8) > 3 ? (mb >> 8) - 2 : 1; loc2 += ml; } while (1); return 0; @@ -753,7 +782,7 @@ evalvar(char *p, int flag) } again: - varlen = varvalue(var, varflags, flag | mbchar, quoted); + varlen = varvalue(var, varflags, flag | mbchar); if (varflags & VSNUL) varlen--; @@ -970,23 +999,23 @@ static size_t strtodest(const char *p, int flags) * Add the value of a specialized variable to the stack string. */ -STATIC ssize_t -varvalue(char *name, int varflags, int flags, int quoted) +static ssize_t varvalue(char *name, int varflags, unsigned flags) { + int subtype = varflags & VSTYPE; + unsigned long seplen; + const char *seps; + ssize_t len = 0; + size_t start; + int discard; + char **ap; int num; char *p; int i; - int sep; - char sepc; - char **ap; - int subtype = varflags & VSTYPE; - int discard = (subtype == VSPLUS || subtype == VSLENGTH) | - (flags & EXP_DISCARD); - ssize_t len = 0; - size_t start; - char c; - if (!subtype) { + discard = (subtype == VSPLUS || subtype == VSLENGTH) | + (flags & EXP_DISCARD); + + if (unlikely(!subtype)) { if (discard) return -1; @@ -994,7 +1023,8 @@ varvalue(char *name, int varflags, int flags, int quoted) } flags &= discard ? 
~QUOTES_ESC : ~0; - sep = (flags & EXP_FULL) << CHAR_BIT; + seps = nullstr; + seplen = flags & EXP_FULL; start = expdest - (char *)stackblock(); switch (*name) { @@ -1025,13 +1055,14 @@ numvar: expdest = p; break; case '@': - if (quoted && sep) + if ((flags & (EXP_QUOTED | EXP_FULL)) == + (EXP_QUOTED | EXP_FULL)) goto param; /* fall through */ case '*': - /* We will set c to 0 or ~0 depending on whether + /* We will set seplen to 0 or !0 depending on whether * we're doing field splitting. We won't do field - * splitting if either we're quoted or sep is zero. + * splitting if either we're quoted or seplen is zero. * * Instead of testing (quoted || !sep) the following * trick optimises away any branches by using the @@ -1043,20 +1074,22 @@ numvar: #if EXP_QUOTED >> CHAR_BIT != EXP_FULL #error The following two lines expect EXP_QUOTED == EXP_FULL << CHAR_BIT #endif - c = !((quoted | ~sep) & EXP_QUOTED) - 1; - sep &= ~quoted; - sep |= ifsset() ? (unsigned char)(c & ifsval()[0]) : ' '; + seplen &= ~(flags >> CHAR_BIT); + if (!seplen) + seps = ncifs; + seplen = ((seplen - 1) & (ifsmb0len - 1)) + 1; param: - sepc = sep; if (!(ap = shellparam.p)) return -1; - while ((p = *ap++)) { + if (!(p = *ap)) + break; + for (;;) { len += strtodest(p, flags); - if (*ap && sep) { - len++; - memtodest(&sepc, 1, flags | EXP_KEEPNUL); - } + if (!(p = *++ap)) + break; + + len += memtodest(seps, seplen, flags | EXP_KEEPNUL); } break; case '0': @@ -1117,7 +1150,126 @@ recordregion(int start, int end, int nulonly) ifslastp->nulonly = nulonly; } +static unsigned ifsisifs(const char *p, unsigned ml, const char *ifs) +{ + bool isdefifs = false; + bool isifs = false; + wchar_t wc = *p; + wchar_t ifs0; + if (likely(ifs[0]) && unlikely(wcifs)) { + if (wc & 0x80) { + mbstate_t mbst = {}; + wchar_t wc2; + + if (mbrtowc(&wc2, p, ml, &mbst) != ml) + goto out; + wc = wc2; + } + + isifs = wcschr(wcifs, wc); + ifs0 = wcifs[0]; + } else if (likely(!ml)) { + isifs = strchr(ifs, wc); + ifs0 = ifs[0]; + } + + 
if (isifs) + isdefifs = iswspace(wc ?: ifs0); + +out: + return isifs << 1 | isdefifs; +} + +static char *ifsbreakup_slow(struct ifs_state *ifst, struct arglist *arglist, + int nulonly, char *p) +{ + struct strlist *sp; + unsigned ifschar; + unsigned sisifs; + bool isdefifs; + unsigned ml; + bool isifs; + char *q; + + q = p; + + ifschar = mbnext(p); + p += ifschar & 0xff; + ml = (ifschar >> 8) > 3 ? + (ifschar >> 8) - 2 : 0; + + sisifs = ifsisifs(p, ml, ifst->ifs); + p += ifschar >> 8; + + isifs = sisifs >> 1; + isdefifs = sisifs & 1; + + /* If only reading one more argument: + * If we have exactly one field, + * read that field without its terminator. + * If we have more than one field, + * read all fields including their terminators, + * except for trailing IFS whitespace. + * + * This means that if we have only IFS + * characters left, and at most one + * of them is non-whitespace, we stop + * reading here. + * Otherwise, we read all the remaining + * characters except for trailing + * IFS whitespace. + * + * In any case, r indicates the start + * of the characters to remove, or NULL + * if no characters should be removed. 
+ */ + if (!ifst->maxargs) { + if (isdefifs) { + if (!ifst->r) + ifst->r = q; + return p; + } + + if (!(isifs && ifst->ifsspc)) + ifst->r = NULL; + } else if (ifst->ifsspc) { + if (isifs) + q = p; + + ifst->start = q; + + if (isdefifs) + return p; + } else if (isifs) { + int ifsspc = ifst->ifsspc; + + if (!nulonly) { + ifsspc = isdefifs; + ifst->ifsspc = ifsspc; + } + + /* Ignore IFS whitespace at start */ + if (q == ifst->start && ifsspc) { + ifst->start = p; + return p; + } + if (ifst->maxargs > 0 && !--ifst->maxargs) { + ifst->r = q; + return p; + } + *q = '\0'; + sp = (struct strlist *)stalloc(sizeof *sp); + sp->text = ifst->start; + *arglist->lastp = sp; + arglist->lastp = &sp->next; + ifst->start = p; + return p; + } + + ifst->ifsspc = 0; + return p; +} /* * Break the argument string into pieces based upon IFS and add the @@ -1130,21 +1282,19 @@ void ifsbreakup(char *string, int maxargs, struct arglist *arglist) { struct ifsregion *ifsp; + struct ifs_state ifst; + const char *realifs; struct strlist *sp; - char *start; - char *p; - char *q; - char *r = NULL; - const char *ifs, *realifs; - int ifsspc; int nulonly; + char *p; - - start = string; + ifst.r = NULL; + ifst.start = string; + ifst.maxargs = maxargs; if (ifslastp != NULL) { - ifsspc = 0; + ifst.ifsspc = 0; nulonly = 0; - realifs = ifsset() ? ifsval() : defifs; + realifs = ncifs; ifsp = &ifsfirst; do { int afternul; @@ -1152,106 +1302,60 @@ ifsbreakup(char *string, int maxargs, struct arglist *arglist) p = string + ifsp->begoff; afternul = nulonly; nulonly = ifsp->nulonly; - ifs = nulonly ? nullstr : realifs; - ifsspc = 0; - while (p < string + ifsp->endoff) { - int c; - bool isifs; - bool isdefifs; + ifst.ifs = nulonly ? 
nullstr : realifs; + ifst.ifsspc = 0; + for (;;) { + char *p0 = p; - q = p; - c = *p++; - if (c == (char)CTLESC) - c = *p++; + while (string + ifsp->endoff - p >= 8) { + union { + uint64_t qw; + unsigned char b[8]; + } x; - isifs = strchr(ifs, c); - isdefifs = false; - if (isifs) - isdefifs = strchr(defifs, c); + x.qw = *(uint64_t *)p; - /* If only reading one more argument: - * If we have exactly one field, - * read that field without its terminator. - * If we have more than one field, - * read all fields including their terminators, - * except for trailing IFS whitespace. - * - * This means that if we have only IFS - * characters left, and at most one - * of them is non-whitespace, we stop - * reading here. - * Otherwise, we read all the remaining - * characters except for trailing - * IFS whitespace. - * - * In any case, r indicates the start - * of the characters to remove, or NULL - * if no characters should be removed. - */ - if (!maxargs) { - if (isdefifs) { - if (!r) - r = q; - continue; - } - - if (!(isifs && ifsspc)) - r = NULL; - - ifsspc = 0; - continue; + if ((x.qw & 0x8080808080808080)) + break; + if (ifsmap[x.b[0]] | + ifsmap[x.b[1]] | + ifsmap[x.b[2]] | + ifsmap[x.b[3]] | + ifsmap[x.b[4]] | + ifsmap[x.b[5]] | + ifsmap[x.b[6]] | + ifsmap[x.b[7]]) + break; + p += 8; } - if (ifsspc) { - if (isifs) - q = p; - - start = q; - - if (isdefifs) - continue; - - isifs = false; + if (p != p0) { + if (!ifst.maxargs) + ifst.r = NULL; + else if (ifst.ifsspc) + ifst.start = p0; + ifst.ifsspc = 0; } - if (isifs) { - if (!(afternul || nulonly)) - ifsspc = isdefifs; - /* Ignore IFS whitespace at start */ - if (q == start && ifsspc) { - start = p; - ifsspc = 0; - continue; - } - if (maxargs > 0 && !--maxargs) { - r = q; - continue; - } - *q = '\0'; - sp = (struct strlist *)stalloc(sizeof *sp); - sp->text = start; - *arglist->lastp = sp; - arglist->lastp = &sp->next; - start = p; - continue; - } + if (p >= string + ifsp->endoff) + break; - ifsspc = 0; + p = 
ifsbreakup_slow(&ifst, arglist, + afternul | nulonly, p); } } while ((ifsp = ifsp->next) != NULL); if (nulonly) goto add; + if (ifst.r) + *ifst.r = '\0'; } - if (r) - *r = '\0'; - - if (!*start) + if (!*ifst.start) return; add: sp = (struct strlist *)stalloc(sizeof *sp); - sp->text = start; + sp->text = ifst.start; *arglist->lastp = sp; arglist->lastp = &sp->next; } @@ -1277,7 +1381,56 @@ out: ifslastp = NULL; } +void changeifs(const char *ifs) +{ + mbstate_t mbs = {}; + wchar_t *nwcifs; + unsigned mb = 0; + size_t len = 0; + const char *p; + size_t ml; + if (!ifsset()) + ifs = defifs; + ncifs = ifs; + + memset(ifsmap, 0, sizeof(ifsmap)); + + for (p = ifs;; p++) { + unsigned c = (unsigned char)*p; + + mb |= c >> 7; + if (!(c >> 7)) + ifsmap[c] = 1; + + if (c == 0) + break; + + len++; + } + + nwcifs = NULL; + + ifsmb0len = !!len; + + if (!mb) + goto out; + + ml = mbrlen(ifs, len, &mbs); + if (ml == -2 || ml == -1) + ml = 1; + ifsmb0len = ml; + + nwcifs = ckmalloc((len + 1) * sizeof(*wcifs)); + memset(nwcifs, 0, (len + 1) * sizeof(*wcifs)); + + p = ifs; + mbsrtowcs(nwcifs, &p, len + 1, &mbs); + +out: + ckfree(wcifs); + wcifs = nwcifs; +} /* * Expand shell metacharacters. 
At this point, the only control characters @@ -1420,31 +1573,25 @@ static void expmeta_rmescapes(char *enddir, char *name) preglob(strcpy(enddir, name), RMESCAPE_EMETA); } -static unsigned mbcharlen(char *p) +static int skipesc(char *p) { + unsigned short mb; int esc = 0; - if (*++p == (char)CTLESC) - esc++; + mb = mbnext(p); + if ((mb >> 8) > 3) + return (mb & 0xff) + (mb >> 8) - 1; - return esc + 3 + (unsigned char)p[esc]; -} + esc = mb & 0xff; -static size_t skipesc(char *p) -{ - size_t esc = 0; - - if (p[esc] == (char)CTLMBCHAR) - esc += mbcharlen(p); - else if (p[esc] == (char)CTLESC) - esc++; - else if (p[esc] == '\\' && p[esc + 1]) { + if (!esc && p[esc] == '\\' && p[esc + 1]) { while (p[++esc] == (char)CTLQUOTEMARK) ; - if (p[esc] == (char)CTLMBCHAR) - esc += mbcharlen(p + esc); - else if (p[esc] == (char)CTLESC) - esc++; + mb = mbnext(p + esc); + esc += mb & 0xff; + + if ((mb >> 8) > 3) + esc += (mb >> 8) - 1; } return esc; @@ -1845,6 +1992,7 @@ _rmescapes(char *str, int flag) inquotes = 0; notescaped = globbing; while (*p) { + unsigned mb; unsigned ml; int newnesc = globbing; @@ -1862,10 +2010,11 @@ _rmescapes(char *str, int flag) goto setnesc; } } else if (*p == (char)CTLMBCHAR) { - if (*++p == (char)CTLESC) - p++; + mb = mbnext(p); + ml = mb >> 8; - ml = (unsigned char)*p++; + ml -= 2; + p += mb & 0xff; q = mempcpy(q, p, ml); p += ml + 2; goto setnesc; diff --git a/src/expand.h b/src/expand.h index a78564f..7bcff75 100644 --- a/src/expand.h +++ b/src/expand.h @@ -75,6 +75,7 @@ void removerecordregions(int); void ifsbreakup(char *, int, struct arglist *); void ifsfree(void); void restore_handler_expandarg(struct jmploc *savehandler, int err); +void changeifs(const char *); /* From arith.y */ intmax_t arith(const char *); diff --git a/src/var.c b/src/var.c index 35ea7c6..df432b5 100644 --- a/src/var.c +++ b/src/var.c @@ -86,7 +86,7 @@ struct var varinit[] = { #if ATTY { 0, VSTRFIXED|VTEXTFIXED|VUNSET, "ATTY\0", 0 }, #endif - { 0, VSTRFIXED|VTEXTFIXED, 
defifsvar, 0 }, + { 0, VSTRFIXED|VTEXTFIXED, defifsvar, changeifs }, { 0, VSTRFIXED|VTEXTFIXED|VUNSET, "MAIL\0", changemail }, { 0, VSTRFIXED|VTEXTFIXED|VUNSET, "MAILPATH\0", changemail }, { 0, VSTRFIXED|VTEXTFIXED, defpathvar, changepath }, @@ -267,9 +267,6 @@ struct var *setvareq(char *s, int flags) n); } - if (vp->func && (flags & VNOFUNC) == 0) - (*vp->func)(varnull(s)); - if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0) ckfree(vp->text); @@ -301,6 +298,9 @@ out_free: vp->text = s; vp->flags = flags; + if (vp->func && (flags & VNOFUNC) == 0) + (*vp->func)(varnull(s)); + out: return vp; } @@ -531,12 +531,12 @@ poplocalvars(void) vp->flags &= ~(VSTRFIXED|VREADONLY); unsetvar(vp->text); } else { - if (vp->func) - (*vp->func)(varnull(lvp->text)); if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0) ckfree(vp->text); vp->flags = lvp->flags; vp->text = lvp->text; + if (vp->func) + (*vp->func)(varnull(vp->text)); } ckfree(lvp); } From patchwork Sun May 19 05:20:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667770 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C33884437 for ; Sun, 19 May 2024 05:20:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096029; cv=none; b=vFGRvNW4l7cYlFf9fL6llfu8w8BZYxr2/HuLlOWm62LZa79Zr5Bv8OJUglc/26y6QDLbzF94IndUSGJtGeW0Q2X57+fvlEi+lJ69bmF1dCAqJuBkyh/i7H9GGFIuC5wAFDB1FH5MjJ2YbREDdI/mJIhFOzoJ4zYIT+CPHEMw5fo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096029; c=relaxed/simple; bh=H+z+9ci96L6S/y5Xo4/HZZY27rypgoKjB7rvSrpRqbc=; 
h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=XSP4HYXq8XAmtJG3snnChnq2SAwcHShAvuG9LuVELI1QaWgB6WTROYsvEiCgGjO3//hxaKTj8hkwCbhjGnVxY2CPpoNh3PgzIXAMzhaeuUDjwsSeNbLLAsmcq4iyRw7XUAXDOM3zkWgY1QbALfHPaLbTjYz8pviqepk1GKHuPZE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yy6-00HGCb-1z; Sun, 19 May 2024 13:20:23 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:23 +0800 Date: Sun, 19 May 2024 13:20:23 +0800 Message-Id: <2e7b3b33d7c9d4df3f30b4371e74be38f934e154.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 07/13] expand: Add multi-byte support to pmatch To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add CTLMBCHAR support to pmatch. POSIX equivalence classes and collating symbols are not supported. Enable CTLMBCHAR generation in mbtodest. Signed-off-by: Herbert Xu --- src/eval.c | 3 +- src/expand.c | 351 +++++++++++++++++++++++++++++++-------------------- 2 files changed, 217 insertions(+), 137 deletions(-) diff --git a/src/eval.c b/src/eval.c index d169eb8..32f1e64 100644 --- a/src/eval.c +++ b/src/eval.c @@ -451,7 +451,8 @@ evalcase(union node *n, int flags) lineno -= funcline - 1; arglist.lastp = &arglist.list; - expandarg(n->ncase.expr, &arglist, EXP_TILDE); + expandarg(n->ncase.expr, &arglist, FNMATCH_IS_ENABLED ?
EXP_TILDE : + EXP_TILDE | EXP_MBCHAR); for (cp = n->ncase.cases ; cp && evalskip == 0 ; cp = cp->nclist.next) { for (patp = cp->nclist.pattern ; patp ; patp = patp->narg.next) { if (casematch(patp, arglist.list->text)) { diff --git a/src/expand.c b/src/expand.c index 8f30e46..a3b81d5 100644 --- a/src/expand.c +++ b/src/expand.c @@ -85,7 +85,6 @@ #define RMESCAPE_GLOB 0x2 /* Add backslashes for glob */ #define RMESCAPE_GROW 0x8 /* Grow strings instead of stalloc */ #define RMESCAPE_HEAP 0x10 /* Malloc strings instead of stalloc */ -#define RMESCAPE_EMETA 0x20 /* Remove backslashes too */ /* Add CTLESC when necessary. */ #define QUOTES_ESC (EXP_FULL | EXP_CASE) @@ -141,7 +140,7 @@ STATIC struct strlist *expsort(struct strlist *); STATIC struct strlist *msort(struct strlist *, int); STATIC void addfname(char *); STATIC int patmatch(char *, const char *); -STATIC int pmatch(const char *, const char *); +STATIC int pmatch(char *, const char *); static size_t cvtnum(intmax_t num, int flags); STATIC size_t esclen(const char *, const char *); STATIC void varunset(const char *, const char *, const char *, int) @@ -156,6 +155,11 @@ STATIC void varunset(const char *, const char *, const char *, int) STATIC inline char * preglob(const char *pattern, int flag) { + if (FNMATCH_IS_ENABLED) { + if (!flag) + flag = RMESCAPE_GROW; + flag |= RMESCAPE_ALLOC; + } flag |= RMESCAPE_GLOB; return _rmescapes((char *)pattern, flag); } @@ -582,28 +586,31 @@ static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, loc = startp; loc2 = rmesc; do { - const char *s = loc2; + char *s = FNMATCH_IS_ENABLED ? loc2 : loc; unsigned mb; unsigned ml; int match; - c = *loc2; + c = *s; if (zero) { - *loc2 = '\0'; - s = rmesc; + *s = '\0'; + s = FNMATCH_IS_ENABLED ? rmesc : startp; } match = pmatch(str, s); - *loc2 = c; + *(FNMATCH_IS_ENABLED ? loc2 : loc) = c; if (match) - return quotes ? loc : loc2; + return FNMATCH_IS_ENABLED && quotes ? 
loc : loc2; if (!c) break; mb = mbnext(loc); loc += (mb & 0xff) + (mb >> 8); - ml = (mb >> 8) > 3 ? (mb >> 8) - 2 : 1; - loc2 += ml; + if (unlikely(FNMATCH_IS_ENABLED || !quotes)) { + ml = (mb >> 8) > 3 ? (mb >> 8) - 2 : 1; + loc2 += ml; + } else + loc2 = loc; } while (1); return 0; } @@ -616,21 +623,23 @@ static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend, char *loc; char *loc2; - for (loc = endp, loc2 = rmescend; loc >= startp; loc2--) { - const char *s = loc2; - char c = *loc2; + for (loc = endp, loc2 = rmescend;; + FNMATCH_IS_ENABLED ? loc2-- : (loc2 = loc)) { + char *s = FNMATCH_IS_ENABLED ? loc2 : loc; + char c = *s; unsigned ml; int match; if (zero) { - *loc2 = '\0'; + *s = '\0'; s = rmesc; } match = pmatch(str, s); - *loc2 = c; + *(FNMATCH_IS_ENABLED ? loc2 : loc) = c; if (match) - return quotes ? loc : loc2; - loc--; + return FNMATCH_IS_ENABLED && quotes ? loc : loc2; + if (--loc < startp) + break; if (!esc--) esc = esclen(startp, loc); if (esc % 2) { @@ -645,7 +654,8 @@ static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend, loc -= ml + 2; if (*loc == (char)CTLESC) loc--; - loc2 -= ml - 1; + if (FNMATCH_IS_ENABLED) + loc2 -= ml - 1; } return 0; } @@ -691,19 +701,21 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, #endif rmescend = stackblock() + strloc; - str = preglob(rmescend, FNMATCH_IS_ENABLED ? 
- RMESCAPE_ALLOC | RMESCAPE_GROW : 0); + str = preglob(rmescend, 0); if (FNMATCH_IS_ENABLED) { startp = stackblock() + startloc; rmescend = stackblock() + strloc; nstrloc = str - (char *)stackblock(); } - rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW); - if (rmesc != startp) - rmescend = expdest; - startp = stackblock() + startloc; - str = stackblock() + nstrloc; + rmesc = startp; + if (FNMATCH_IS_ENABLED || !quotes) { + rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW); + if (rmesc != startp) + rmescend = expdest; + startp = stackblock() + startloc; + str = stackblock() + nstrloc; + } rmescend--; /* zero = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX */ @@ -894,12 +906,6 @@ static struct mbpair mbtodest(const char *p, char *q, const char *syntax, goto out; } - len = ml; - do { - q = chtodest((signed char)*p++, syntax, q); - } while (--len); - goto out; - if (syntax[CTLMBCHAR] == CCTL) { USTPUTC(CTLMBCHAR, q); USTPUTC(ml, q); @@ -1470,7 +1476,7 @@ static void expandmeta_glob(struct strlist *str) #endif INTOFF; - p = preglob(str->text, RMESCAPE_ALLOC | RMESCAPE_HEAP); + p = preglob(str->text, RMESCAPE_HEAP); i = glob64(p, GLOB_ALTDIRFUNC | GLOB_NOMAGIC, 0, &pglob); if (p != str->text) ckfree(p); @@ -1541,13 +1547,15 @@ expandmeta(struct strlist *str) savelastp = exparg.lastp; INTOFF; - p = str->text; + p = preglob(str->text, RMESCAPE_ALLOC | RMESCAPE_HEAP); len = strlen(p); expdir_max = len + PATH_MAX; expdir = ckmalloc(expdir_max); expmeta(p, len, 0); ckfree(expdir); + if (p != str->text) + ckfree(p); INTON; if (exparg.lastp == savelastp) { /* @@ -1568,9 +1576,21 @@ nometa: } } -static void expmeta_rmescapes(char *enddir, char *name) +static char *expmeta_rmescapes(char *enddir, const char *name) { - preglob(strcpy(enddir, name), RMESCAPE_EMETA); + const char *p; + + if (!FNMATCH_IS_ENABLED) + return strchrnul(rmescapes(strcpy(enddir, name)), 0); + + p = name; + do { + if (*p == '\\' && p[1]) + p++; + *enddir++ = *p; + } while (*p++); + + 
return enddir - 1; } static int skipesc(char *p) @@ -1585,8 +1605,7 @@ static int skipesc(char *p) esc = mb & 0xff; if (!esc && p[esc] == '\\' && p[esc + 1]) { - while (p[++esc] == (char)CTLQUOTEMARK) - ; + esc++; mb = mbnext(p + esc); esc += mb & 0xff; @@ -1655,9 +1674,8 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) if (name < start) { c = *start; *start = 0; - expmeta_rmescapes(enddir, name); + enddir = expmeta_rmescapes(enddir, name); *start = c; - enddir += strlen(enddir); } *enddir = 0; cp = expdir; @@ -1673,16 +1691,25 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) } name_len -= endname - name; matchdot = 0; - pat = preglob(start, RMESCAPE_ALLOC | RMESCAPE_HEAP); + pat = start; + if (FNMATCH_IS_ENABLED) + pat = preglob(pat, RMESCAPE_HEAP); p = pat; - if (*p == '\\') + if (*p == (FNMATCH_IS_ENABLED ? '\\' : (char)CTLESC)) p++; if (*p == '.') matchdot++; while (! int_pending() && (dp = readdir64(dirp)) != NULL) { if (dp->d_name[0] == '.' && ! matchdot) continue; - if (pmatch(pat, dp->d_name)) { + p = dp->d_name; + if (!FNMATCH_IS_ENABLED) { + STARTSTACKSTR(expdest); + strtodest(p, EXP_MBCHAR); + *expdest = 0; + p = stackblock(); + } + if (pmatch(pat, p)) { if (!c) { scopy(dp->d_name, enddir); addfname(expdir); @@ -1706,7 +1733,7 @@ expmeta(char *name, unsigned name_len, unsigned expdir_len) } } } - if (pat != start) + if (FNMATCH_IS_ENABLED && pat != start) ckfree(pat); closedir(dirp); if (c) @@ -1797,52 +1824,48 @@ msort(struct strlist *list, int len) STATIC inline int patmatch(char *pattern, const char *string) { - return pmatch(preglob(pattern, FNMATCH_IS_ENABLED ? 
- RMESCAPE_ALLOC | RMESCAPE_GROW : 0), - string); + return pmatch(preglob(pattern, 0), string); } -STATIC int ccmatch(const char *p, int chr, const char **r) +static __attribute__((noinline)) int ccmatch(char *p, const char *mbc, int ml, + char **r) { - static const struct class { - char name[10]; - int (*fn)(int); - } classes[] = { - { .name = ":alnum:]", .fn = isalnum }, - { .name = ":cntrl:]", .fn = iscntrl }, - { .name = ":lower:]", .fn = islower }, - { .name = ":space:]", .fn = isspace }, - { .name = ":alpha:]", .fn = isalpha }, - { .name = ":digit:]", .fn = isdigit }, - { .name = ":print:]", .fn = isprint }, - { .name = ":upper:]", .fn = isupper }, - { .name = ":blank:]", .fn = isblank }, - { .name = ":graph:]", .fn = isgraph }, - { .name = ":punct:]", .fn = ispunct }, - { .name = ":xdigit:]", .fn = isxdigit }, - }; - const struct class *class, *end; - - end = classes + sizeof(classes) / sizeof(classes[0]); - for (class = classes; class < end; class++) { - const char *q; - - q = prefix(p, class->name); - if (!q) - continue; - *r = q; - return class->fn(chr); - } + mbstate_t mbst = {}; + wctype_t type; + wchar_t wc; + char *q; *r = 0; - return 0; + + if (*p++ != ':') + return 0; + + q = strstr(p, ":]"); + if (!q) + return 0; + + *q = 0; + type = wctype(p); + *q = ':'; + + if (!type) + return 0; + + *r = q + 2; + + if (mbrtowc(&wc, mbc, ml, &mbst) != ml) + return 0; + + return iswctype(wc, type); } -STATIC int -pmatch(const char *pattern, const char *string) +static int pmatch(char *pattern, const char *string) { - const char *p, *q; + char stop[] = { 0, CTLESC, CTLMBCHAR }; + const char *q; + unsigned mb; + char *p; char c; if (FNMATCH_IS_ENABLED) @@ -1851,36 +1874,43 @@ pmatch(const char *pattern, const char *string) p = pattern; q = string; for (;;) { - switch (c = *p++) { + switch ((signed char)(c = *p++)) { case '\0': goto breakloop; - case '\\': - if (*p) { - c = *p++; - } - goto dft; - case '?': - if (*q++ == '\0') - return 0; + case CTLESC: + c = *p++; 
break; + case '?': + if (*q == '\0') + return 0; + mb = mbnext(q); + q += (mb >> 8) + (mb & 0xff); + continue; case '*': c = *p; while (c == '*') c = *++p; - if (c != '\\' && c != '?' && c != '*' && c != '[') { - while (*q != c) { - if (*q == '\0') + stop[0] = CTLESC; + if (c != '?' && c != '*' && c != '[') + stop[0] = c; + for (;;) { + if (!stop[0]) + q = nullstr; + else if (stop[0] != (char)CTLESC) { + q = strpbrk(q, stop); + if (!q) return 0; - q++; } - } - do { if (pmatch(p, q)) return 1; - } while (*q++ != '\0'); + if (!*q) + break; + mb = mbnext(q); + q += (mb >> 8) + (mb & 0xff); + } return 0; case '[': { - const char *startp; + char *startp; int invert, found; char chr; @@ -1891,48 +1921,85 @@ pmatch(const char *pattern, const char *string) p++; } found = 0; + mb = mbnext(q); + q += mb & 0xff; + mb >>= 8; chr = *q; if (chr == '\0') return 0; c = *p++; do { + unsigned mbp = 0; + const char *mbs = &c; + if (!c) { p = startp; c = '['; goto dft; } if (c == '[') { - const char *r; + char *r; - found |= !!ccmatch(p, chr, &r); + found |= !!ccmatch(p, q, mb > 1 ? 
+ mb - 2 : mb, + &r); if (r) { p = r; continue; } - } else if (c == '\\') + } else if (c == (char)CTLESC) c = *p++; + else if (c == (char)CTLMBCHAR) { + mbp = mbnext(--p); + p += mbp & 0xff; + mbs = p; + mbp >>= 8; + p += mbp; + } if (*p == '-' && p[1] != ']') { p++; - if (*p == '\\') + if (*p == (char)CTLESC) p++; - if (chr >= c && chr <= *p) + else if (*p == CTLMBCHAR) { + mbp = mbnext(p); + p += mbp & 0xff; + p += mbp >> 8; + continue; + } + if (!(mbp | (mb - 1)) && + chr >= c && chr <= *p) found = 1; p++; - } else { - if (chr == c) - found = 1; - } + } else if (!memcmp(mbs, q, mb)) + found = 1; } while ((c = *p++) != ']'); if (found == invert) return 0; - q++; - break; + q += mb; + continue; } -dft: default: - if (*q++ != c) + case CTLMBCHAR: + mb = mbnext(--p); + p += mb & 0xff; + mb = mbnext(q); + q += mb & 0xff; + mb >>= 8; + + if (memcmp(p - 1, q - 1, mb + 1)) return 0; - break; + + p += mb; + q += mb; + continue; } +dft: + mb = mbnext(q); + if ((mb >> 8) > 1) + return 0; + q += mb & 0xff; + if (*q != c) + return 0; + q += mb >> 8; } breakloop: if (*q != '\0') @@ -1953,7 +2020,6 @@ _rmescapes(char *str, int flag) int notescaped; int globbing; int inquotes; - int expmeta; p = strpbrk(str, cqchars); if (!p) { @@ -1962,7 +2028,6 @@ _rmescapes(char *str, int flag) q = p; r = str; globbing = flag & RMESCAPE_GLOB; - expmeta = (flag & RMESCAPE_EMETA) ? 
RMESCAPE_GLOB : 0; if (flag & RMESCAPE_ALLOC) { size_t len = p - str; @@ -1992,50 +2057,64 @@ _rmescapes(char *str, int flag) inquotes = 0; notescaped = globbing; while (*p) { + int c = (signed char)*p; + int newnesc = globbing; unsigned mb; unsigned ml; - int newnesc = globbing; - if (*p == (char)CTLQUOTEMARK) { + if (c == CTLQUOTEMARK) { p++; inquotes ^= globbing; continue; - } else if (*p == '\\') { + } else if (c == '\\') { /* naked back slash */ newnesc ^= notescaped; /* naked backslashes can only occur outside quotes */ inquotes = 0; - if (expmeta & ~newnesc) { - p++; - goto setnesc; + if (!FNMATCH_IS_ENABLED && notescaped) + c = CTLESC; + } else if (c == CTLESC) { + if ((notescaped ^ inquotes) & inquotes) { + if (FNMATCH_IS_ENABLED) + *q++ = '\\'; + else + q[-1] = '\\'; } - } else if (*p == (char)CTLMBCHAR) { + if (globbing) + *q++ = FNMATCH_IS_ENABLED ? '\\' : CTLESC; + + c = *++p; + } else if (c == CTLMBCHAR) { + unsigned tail = 2; + + if (!FNMATCH_IS_ENABLED && (globbing ^ notescaped)) + q--; + mb = mbnext(p); ml = mb >> 8; - ml -= 2; - p += mb & 0xff; - q = mempcpy(q, p, ml); - p += ml + 2; - goto setnesc; - } else if (*p == (char)CTLESC) { - p++; - if (expmeta) - ; - else if (notescaped) - *q++ = '\\'; - else if (inquotes) { - *q++ = '\\'; - *q++ = '\\'; + if (!globbing || FNMATCH_IS_ENABLED) { + p += mb & 0xff; + ml -= 2; + } else { + ml += mb & 0xff; + tail = 0; } + + q = mempcpy(q, p, ml); + p += ml + tail; + goto setnesc; } - *q++ = *p++; + *q++ = c; + p++; setnesc: notescaped = newnesc; } + if (!FNMATCH_IS_ENABLED && (globbing ^ notescaped)) + q[-1] = '\\'; *q = '\0'; - if (flag & RMESCAPE_GROW) { + if (flag & (RMESCAPE_ALLOC | RMESCAPE_GROW)) { expdest = r; STADJUST(q - r + 1, expdest); } From patchwork Sun May 19 05:20:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667771 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from 
abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A10433D5 for ; Sun, 19 May 2024 05:20:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096031; cv=none; b=r0Snh/A1XI0+IrNbkuUL+8JjWYffaCc+43YlI7zXKXzpkA/QXAGdmo0aNcw6UqNZptIsYGsdqP05UhIfWcNjCzoQZiFhwltBidFdmKg1CRYHUZhIYmUrbAKjeyzjtq5rupOjKbn8K2s2W4eOJ8puu6ylO6DpKjnq8K8pOqnqINY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096031; c=relaxed/simple; bh=e6fMkV1KzRDWE3ia5oUDlujL69MRfZXpp5nblpvJEtA=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=utMpGyx4YVrcUZ9Q1v6ePZj64NgUQsn0Qwa7uKVYlgUGLLm9WfI6YijXWYiSWbTOnuGM0dXDWogmrapB8MZbrR4cpQagmhH8Dzxwugp+0IccSbSaKHobYuHXXaWHkkYNHLZxcSMkTZK6BL4wG2D6m4KI1X+tBRbhFypZA7IS9WM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8Yy8-00HGCm-2z; Sun, 19 May 2024 13:20:25 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:26 +0800 Date: Sun, 19 May 2024 13:20:26 +0800 Message-Id: <8f8ef6a507c7cc98ca2bbfdc7b9c1d514b0eb801.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 08/13] input: Allow MB_LEN_MAX calls to pungetc To: DASH Mailing List Precedence: bulk X-Mailing-List: 
dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: In order to parse multi-byte characters which may be up to MB_LEN_MAX bytes long, allow enough calls to pungetc to undo a single multi-byte character. Also add a function pungetn to do multiple pungetc calls in a row. Signed-off-by: Herbert Xu --- src/input.c | 58 ++++++++++++++++++++++++++++++++++------------------- src/input.h | 11 +++++----- 2 files changed, 42 insertions(+), 27 deletions(-) diff --git a/src/input.c b/src/input.c index 1c598b2..e17e067 100644 --- a/src/input.c +++ b/src/input.c @@ -56,7 +56,7 @@ #include "main.h" #include "myhistedit.h" -#define IBUFSIZ (BUFSIZ + 1) +#define IBUFSIZ (BUFSIZ + PUNGETC_MAX + 1) MKINIT struct parsefile basepf; /* top level input file */ @@ -83,13 +83,16 @@ INIT { } RESET { + int c; + /* clear input buffer */ popallfiles(); - basepf.unget = 0; - while (basepf.lastc[0] != '\n' && - basepf.lastc[0] != PEOF && - !int_pending()) - pgetc(); + + c = PEOF; + if (basepf.nextc - basebuf > basepf.unget) + c = basepf.nextc[-basepf.unget - 1]; + while (c != '\n' && c != PEOF && !int_pending()) + c = pgetc(); } FORKRESET { @@ -131,17 +134,20 @@ static int __pgetc(void) { int c; - if (parsefile->unget) - return parsefile->lastc[--parsefile->unget]; + if (parsefile->unget) { + long unget = -(long)(unsigned)parsefile->unget--; + + if (parsefile->nleft < 0) + return preadbuffer(); + + return parsefile->nextc[unget]; + } if (--parsefile->nleft >= 0) c = (signed char)*parsefile->nextc++; else c = preadbuffer(); - parsefile->lastc[1] = parsefile->lastc[0]; - parsefile->lastc[0] = c; - return c; } @@ -176,9 +182,16 @@ static int stdin_clear_nonblock(void) static int preadfd(void) { + char *buf = parsefile->buf; + int unget; int nr; - char *buf = parsefile->buf; - parsefile->nextc = buf; + + unget = parsefile->nextc - buf; + if (unget > PUNGETC_MAX) + unget = PUNGETC_MAX; + + memmove(buf, parsefile->nextc - unget, unget); + parsefile->nextc = buf += unget; retry: #ifndef 
SMALL @@ -196,8 +209,8 @@ retry: nr = 0; else { nr = el_len; - if (nr > IBUFSIZ - 1) - nr = IBUFSIZ - 1; + if (nr > BUFSIZ) + nr = BUFSIZ; memcpy(buf, rl_cp, nr); if (nr != el_len) { el_len -= nr; @@ -209,9 +222,9 @@ retry: } else #endif if (parsefile->fd) - nr = read(parsefile->fd, buf, IBUFSIZ - 1); + nr = read(parsefile->fd, buf, BUFSIZ); else { - unsigned len = IBUFSIZ - 1; + unsigned len = BUFSIZ; nr = 0; @@ -348,6 +361,11 @@ done: return (signed char)*parsefile->nextc++; } +void pungetn(int n) +{ + parsefile->unget += n; +} + /* * Undo a call to pgetc. Only two characters may be pushed back. * PEOF may be pushed back. @@ -356,7 +374,7 @@ done: void pungetc(void) { - parsefile->unget++; + pungetn(1); } /* @@ -383,7 +401,6 @@ pushstring(char *s, void *ap) sp->prevnleft = parsefile->nleft; sp->unget = parsefile->unget; sp->spfree = parsefile->spfree; - memcpy(sp->lastc, parsefile->lastc, sizeof(sp->lastc)); sp->ap = (struct alias *)ap; if (ap) { ((struct alias *)ap)->flag |= ALIASINUSE; @@ -413,7 +430,6 @@ static void popstring(void) parsefile->nextc = sp->prevstring; parsefile->nleft = sp->prevnleft; parsefile->unget = sp->unget; - memcpy(parsefile->lastc, sp->lastc, sizeof(sp->lastc)); /*dprintf("*** calling popstring: restoring to '%s'\n", parsenextc);*/ parsefile->strpush = sp->prev; parsefile->spfree = sp; @@ -457,7 +473,7 @@ setinputfd(int fd, int push) } parsefile->fd = fd; if (parsefile->buf == NULL) - parsefile->buf = ckmalloc(IBUFSIZ); + parsefile->nextc = parsefile->buf = ckmalloc(IBUFSIZ); input_set_lleft(parsefile, parsefile->nleft = 0); plinno = 1; } diff --git a/src/input.h b/src/input.h index 1ff5773..5b4a045 100644 --- a/src/input.h +++ b/src/input.h @@ -34,12 +34,16 @@ * @(#)input.h 8.2 (Berkeley) 5/4/95 */ +#include + #ifdef SMALL #define IS_DEFINED_SMALL 1 #else #define IS_DEFINED_SMALL 0 #endif +#define PUNGETC_MAX (MB_LEN_MAX > 16 ? 
MB_LEN_MAX : 16) + /* PEOF (the end of file marker) is defined in syntax.h */ enum { @@ -59,9 +63,6 @@ struct strpush { /* Delay freeing so we can stop nested aliases. */ struct strpush *spfree; - /* Remember last two characters for pungetc. */ - int lastc[2]; - /* Number of outstanding calls to pungetc. */ int unget; }; @@ -87,9 +88,6 @@ struct parsefile { /* Delay freeing so we can stop nested aliases. */ struct strpush *spfree; - /* Remember last two characters for pungetc. */ - int lastc[2]; - /* Number of outstanding calls to pungetc. */ int unget; }; @@ -106,6 +104,7 @@ extern struct parsefile *parsefile; int pgetc(void); int pgetc2(void); void pungetc(void); +void pungetn(int); void pushstring(char *, void *); int setinputfile(const char *, int); void setinputstring(char *); From patchwork Sun May 19 05:20:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667772 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58F5F4437 for ; Sun, 19 May 2024 05:20:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096032; cv=none; b=bXTMjBqbJkd2ip7fbniRUY7V5u9MUXWDuo2zNIsVQFm2iS1eMK0kMBPh6aYkZXlwhXe/ivzBm2nTVNgj/Al7w/qqhBmraCMMSnVn6I42rdofrKAgY9Yb4KmP0hckEQO0O+IQwnZPMcRcc/BrQKP+C+T58ZIAnrQrPM0IVLywiMU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096032; c=relaxed/simple; bh=mDQ9ZCbicr5SFLv6A+EEB8nvfMcv5Am7kAM0I2z4Evs=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; 
b=EmUslaevXCTPGzG771zuhJu18lM65d69dvY62arrIPWdF9X5RGrMfI8GAkZ8q+sH2ESfOUByXRCPZRIF5lcO65MIzQT5iTsTVygDHLqxDCpkuI03f7OI/YKGSDLWiYs377FGZLjFJp7XBsUhP+Q8gtcSSSi29TIQxerbghnG+S0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au
Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8YyB-00HGD7-0k; Sun, 19 May 2024 13:20:28 +0800
Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:28 +0800
Date: Sun, 19 May 2024 13:20:28 +0800
Message-Id: <97c5c86c09bb88eb4f85a4fd8acf2c5f3cdaf22a.1716095868.git.herbert@gondor.apana.org.au>
In-Reply-To:
References:
From: Herbert Xu
Subject: [v4 PATCH 09/13] input: Add pgetc_eoa
To: DASH Mailing List
Precedence: bulk
X-Mailing-List: dash@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:

This reintroduces PEOA in a limited way. Instead of allowing pgetc
to return it, limit it to a new function pgetc_eoa so only specific
callers need to deal with PEOA.

Signed-off-by: Herbert Xu
---
 src/input.c | 8 +++++++-
 src/input.h | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/input.c b/src/input.c
index e17e067..bedc581 100644
--- a/src/input.c
+++ b/src/input.c
@@ -157,7 +157,7 @@ static int __pgetc(void)
  * Nul characters in the input are silently discarded.
  */
 
-int pgetc(void)
+int __attribute__((noinline)) pgetc(void)
 {
 	struct strpush *sp = parsefile->spfree;
 
@@ -167,6 +167,12 @@ int pgetc(void)
 	return __pgetc();
 }
 
+int pgetc_eoa(void)
+{
+	return parsefile->strpush && parsefile->nleft == -1 &&
+	       parsefile->strpush->ap ? PEOA : pgetc();
+}
+
 static int stdin_clear_nonblock(void)
 {
 	int flags = fcntl(0, F_GETFL, 0);
diff --git a/src/input.h b/src/input.h
index 5b4a045..151b1c6 100644
--- a/src/input.h
+++ b/src/input.h
@@ -45,6 +45,7 @@
 #define PUNGETC_MAX (MB_LEN_MAX > 16 ? MB_LEN_MAX : 16)
 
 /* PEOF (the end of file marker) is defined in syntax.h */
+#define PEOA ((PEOF) - 1)
 
 enum {
 	INPUT_PUSH_FILE = 1,
@@ -102,7 +103,7 @@ extern struct parsefile *parsefile;
 #define plinno (parsefile->linno)
 
 int pgetc(void);
-int pgetc2(void);
+int pgetc_eoa(void);
 void pungetc(void);
 void pungetn(int);
 void pushstring(char *, void *);

From patchwork Sun May 19 05:20:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Herbert Xu
X-Patchwork-Id: 13667773
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D5C3A46BA for ; Sun, 19 May 2024 05:20:33 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096036; cv=none; b=ZXQL/whmKMPEOoJyZLD1zX0yZZ/S2vkqp8NOxyp7KmMOdDE/95cP/F876PsMjmgrdtK0C89Ed1JphU6Ke9DdqQUtisRKyxQ9NU2131Jd/qRGRbGGp4utEuBnRAqerV4pnXZsUyO7TkdYSTfkHGDowmqfaM/zb8eFa4mgNBaTaqI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096036; c=relaxed/simple; bh=M/YIgXVDEOwBJLWzQruiTSVttfG7mFe65owbsN9dSJg=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=IHFl27VOAflm+6cnjmjSm39v31QaAKXQUBka6C6G3si6z+Dob4yuIfe460buvJwG10Dlgz/ghVkQQmmbxX1LywJxkFyiRIaIy26o/jjCdgfNsNDTLmqkxgGb+vN5ZTr5Hs/qPQHpUuyMmNEQHlyWd1V6BzQh7tiM4zpNOwR58II=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au;
spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8YyD-00HGDu-1g; Sun, 19 May 2024 13:20:30 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:30 +0800 Date: Sun, 19 May 2024 13:20:30 +0800 Message-Id: <88f209b1040d15c717b03da71dc7698862721c2d.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 10/13] parser: Add support for multi-byte characters To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add the requisite markers for multi-byte characters so that the expansion code can recognise them. Also allow wide blank characters to terminate words. 
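
For readers following along, the framing that getmbc emits for a word character can be sketched roughly as below. This is an illustration only, not code from the patch: frame_mbchar and the DEMO_* values are hypothetical stand-ins (the real CTLMBCHAR and CTLESC constants come from dash's own syntax tables), and the layout is inferred from the getmbc hunk in this patch: CTLMBCHAR, an optional CTLESC, the byte count, the raw bytes, the byte count again, and a closing CTLMBCHAR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder values; dash defines the real markers itself. */
enum { DEMO_CTLESC = -127, DEMO_CTLMBCHAR = -125 };

/*
 * Frame one already-validated multi-byte character of ml bytes as
 *   CTLMBCHAR [CTLESC] ml <bytes> ml CTLMBCHAR
 * and return the number of bytes written to out.  The caller must
 * provide room for ml + 5 bytes.
 */
static size_t frame_mbchar(char *out, const char *bytes, unsigned ml,
			   int escaped)
{
	char *const start = out;

	*out++ = (char)DEMO_CTLMBCHAR;
	if (escaped)
		*out++ = (char)DEMO_CTLESC;
	*out++ = (char)ml;		/* leading length byte */
	memcpy(out, bytes, ml);		/* raw bytes of the character */
	out += ml;
	*out++ = (char)ml;		/* trailing length byte */
	*out++ = (char)DEMO_CTLMBCHAR;

	return (size_t)(out - start);
}
```

For a two-byte character with escaped == 0 this writes six bytes. Having the length on both sides of the payload appears to be what lets the expansion code step over a character from either end.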
Signed-off-by: Herbert Xu --- src/expand.c | 19 +++++++ src/mktokens | 1 + src/parser.c | 136 +++++++++++++++++++++++++++++++++++++++++---------- 3 files changed, 129 insertions(+), 27 deletions(-) diff --git a/src/expand.c b/src/expand.c index a3b81d5..eedd69d 100644 --- a/src/expand.c +++ b/src/expand.c @@ -275,6 +275,7 @@ static char *argstr(char *p, int flag) CTLESC, CTLVAR, CTLBACKQ, + CTLMBCHAR, CTLARI, CTLENDARI, 0 @@ -299,6 +300,8 @@ tilde: start: startloc = expdest - (char *)stackblock(); for (;;) { + unsigned ml; + unsigned mb; int end; length += strcspn(p + length, reject); @@ -361,6 +364,22 @@ addquote: startloc++; } break; + case CTLMBCHAR: + c = (signed char)*p--; + mb = mbnext(p); + ml = (mb >> 8) - 2; + if (flag & QUOTES_ESC) { + length = (mb >> 8) + (mb & 0xff); + if (c == (char)CTLESC) + startloc += length; + break; + } + if (c == CTLESC) + startloc += ml; + p += mb & 0xff; + expdest = stnputs(p, ml, expdest); + p += mb >> 8; + break; case CTLESC: startloc++; length++; diff --git a/src/mktokens b/src/mktokens index 78055be..dcef676 100644 --- a/src/mktokens +++ b/src/mktokens @@ -41,6 +41,7 @@ cat > "${TMPDIR}"/ka$$ <<\! 
TEOF 1 end of file +TBLANK 0 blank TNL 0 newline TSEMI 0 ";" TBACKGND 0 "&" diff --git a/src/parser.c b/src/parser.c index 27611f0..71d61f3 100644 --- a/src/parser.c +++ b/src/parser.c @@ -36,7 +36,11 @@ #include #endif +#include +#include #include +#include +#include #include "shell.h" #include "parser.h" @@ -801,6 +805,8 @@ xxreadtoken(void) setprompt(2); } for (;;) { /* until token or start of word found */ + int tok; + c = pgetc_eatbnl(); switch (c) { case ' ': case '\t': @@ -834,9 +840,10 @@ xxreadtoken(void) case ')': RETURN(TRP); } - break; + tok = readtoken1(c, BASESYNTAX, (char *)NULL, 0); + if (tok != TBLANK) + return tok; } - return readtoken1(c, BASESYNTAX, (char *)NULL, 0); #undef RETURN } @@ -876,7 +883,53 @@ static void synstack_pop(struct synstack **stack) *stack = (*stack)->next; } +static unsigned getmbc(int c, char *out, int mode) +{ + char *const start = out; + mbstate_t mbst = {}; + unsigned ml = 0; + size_t ml2; + wchar_t wc; + char *mbc; + if (likely(c >= 0)) + return 0; + + mbc = (mode & 3) < 2 ? out + 2 + (mode == 1) : out; + mbc[ml] = c; + while ((ml2 = mbrtowc(&wc, mbc + ml++, 1, &mbst)) == -2) { + if (ml >= MB_LEN_MAX) + break; + c = pgetc_eoa(); + if (c == PEOA || c == PEOF) + break; + mbc[ml] = c; + } + + if (ml2 == 1 && ml > 1) { + if (mode == 4 && iswblank(wc)) + return 1; + + if ((mode & 3) < 2) { + USTPUTC(CTLMBCHAR, out); + if (mode == 1) + USTPUTC(CTLESC, out); + USTPUTC(ml, out); + } + STADJUST(ml, out); + if ((mode & 3) < 2) { + USTPUTC(ml, out); + USTPUTC(CTLMBCHAR, out); + } + + return out - start; + } + + if (ml > 1) + pungetn(ml - 1); + + return 0; +} /* * If eofmark is NULL, read a word or a redirection symbol. 
If eofmark @@ -929,12 +982,29 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) } #endif CHECKEND(); /* set c to PEOF if at end of here document */ - for (;;) { /* until end of line or end of word */ - CHECKSTRSPACE(4, out); /* permit 4 calls to USTPUTC */ + /* Until end of line or end of word */ + for (;; c = pgetc_top(synstack)) { + int fieldsplitting; + unsigned ml; + + /* Permit max(MB_LEN_MAX, 23) calls to USTPUTC. */ + CHECKSTRSPACE((MB_LEN_MAX > 16 ? MB_LEN_MAX : 16) + 7, + out); + fieldsplitting = synstack->syntax == BASESYNTAX && + !synstack->varnest ? 4 : 0; + ml = getmbc(c, out, fieldsplitting); + if (ml == 1) { + if (out == stackblock()) + return TBLANK; + c = pgetc(); + break; + } + out += ml; + if (ml) + continue; switch(synstack->syntax[c]) { case CNL: /* '\n' */ - if (synstack->syntax == BASESYNTAX && - !synstack->varnest) + if (fieldsplitting) goto endword; /* exit outer loop */ USTPUTC(c, out); nlprompt(); @@ -956,26 +1026,33 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) USTPUTC(CTLESC, out); USTPUTC('\\', out); pungetc(); - } else { - if ( - synstack->dblquote && - c != '\\' && c != '`' && - c != '$' && ( - c != '"' || - (eofmark != NULL && - !synstack->varnest) - ) && ( - c != '}' || - !synstack->varnest - ) - ) { - USTPUTC(CTLESC, out); - USTPUTC('\\', out); - } - USTPUTC(CTLESC, out); - USTPUTC(c, out); - quotef++; + break; } + + if ( + synstack->dblquote && + c != '\\' && c != '`' && + c != '$' && ( + c != '"' || + (eofmark != NULL && + !synstack->varnest) + ) && ( + c != '}' || + !synstack->varnest + ) + ) { + USTPUTC(CTLESC, out); + USTPUTC('\\', out); + } + quotef++; + + ml = getmbc(c, out, 1); + out += ml; + if (ml) + break; + + USTPUTC(CTLESC, out); + USTPUTC(c, out); break; case CSQUOTE: synstack->syntax = SQSYNTAX; @@ -1053,11 +1130,10 @@ toggledq: case CEOF: goto endword; /* exit outer loop */ default: - if (synstack->varnest == 0) + if (fieldsplitting) goto endword; /* exit 
outer loop */ USTPUTC(c, out); } - c = pgetc_top(synstack); } } endword: @@ -1384,6 +1460,7 @@ parsebackq: { size_t psavelen; size_t savelen; union node *n; + unsigned ml; char *pstr; char *str; @@ -1415,6 +1492,11 @@ parsebackq: { if (pc != '\\' && pc != '`' && pc != '$' && (!synstack->dblquote || pc != '"')) STPUTC('\\', pout); + CHECKSTRSPACE(MB_LEN_MAX, pout); + ml = getmbc(pc, pout, 2); + pout += ml; + if (ml) + continue; break; case PEOF: From patchwork Sun May 19 05:20:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667774 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0929933D5 for ; Sun, 19 May 2024 05:20:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096037; cv=none; b=hS7TSR66pEtZ2CftyZvTj4biWVFIwAc+j/r8JWqYxE8GxDtoc2yFmY8UwoRBCeeG2wHmQo3wugzwJRuxXaz6W9RbjcjNGaaBke2m7yqSCVBoYJKsPh/ED+gYNQqwCQ48QuxF5zre2YLBypFxt/Ia0/Y9MnwME5TAkU4r+/AyntQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096037; c=relaxed/simple; bh=a0/Bhd8zTCS6Pf4Cue7MBMpFXoAwS0W8kMwIM2hgVeM=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=r2ROTSbvKlhtdiUpjDSUd7ki0JR5JMe656Nb3cTupMMH0A4tOOzFa5OAXWm5OAvEBoCApuWVM8d4qQW7MMmAz5UNhsf03gpCw3PCDtNgkhXNG3rhpfwlqWsgBAwMRppe4YOMRum8rO2ItXaR6vEqny5yFl1vYGaqJVVl2ZGaxNs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=gondor.apana.org.au
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au
Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8YyF-00HGET-2k; Sun, 19 May 2024 13:20:32 +0800
Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:33 +0800
Date: Sun, 19 May 2024 13:20:33 +0800
Message-Id: <766de45e4c736df75181acf705d45d3fffd478dc.1716095868.git.herbert@gondor.apana.org.au>
In-Reply-To:
References:
From: Herbert Xu
Subject: [v4 PATCH 11/13] input: Always push in setinputfile
To: DASH Mailing List
Precedence: bulk
X-Mailing-List: dash@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:

Push the input file even in the case of "sh file". This is because
the base parsefile will be used for read(1).

Signed-off-by: Herbert Xu
---
 src/input.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/input.c b/src/input.c
index bedc581..1712e5f 100644
--- a/src/input.c
+++ b/src/input.c
@@ -61,6 +61,7 @@
 
 MKINIT struct parsefile basepf;	/* top level input file */
 MKINIT char basebuf[IBUFSIZ];	/* buffer for top level input file */
+MKINIT struct parsefile *toppf = &basepf;
 struct parsefile *parsefile = &basepf;	/* current input file */
 int whichprompt;	/* 1 == PS1, 2 == PS2 */
 
@@ -89,8 +90,8 @@ RESET {
 	popallfiles();
 
 	c = PEOF;
-	if (basepf.nextc - basebuf > basepf.unget)
-		c = basepf.nextc[-basepf.unget - 1];
+	if (toppf->nextc - toppf->buf > toppf->unget)
+		c = toppf->nextc[-toppf->unget - 1];
 	while (c != '\n' && c != PEOF && !int_pending())
 		c = pgetc();
 }
@@ -473,13 +474,11 @@ out:
 static void
 setinputfd(int fd, int push)
 {
-	if (push) {
-		pushfile();
-		parsefile->buf = 0;
-	}
+	pushfile();
+	if (!push)
+		toppf = parsefile;
 	parsefile->fd = fd;
-	if (parsefile->buf == NULL)
-		parsefile->nextc = parsefile->buf = ckmalloc(IBUFSIZ);
+	parsefile->nextc = parsefile->buf = ckmalloc(IBUFSIZ);
 	input_set_lleft(parsefile, parsefile->nleft = 0);
 	plinno = 1;
 }
@@ -560,5 +559,5 @@ void unwindfiles(struct parsefile *stop)
 void
 popallfiles(void)
 {
-	unwindfiles(&basepf);
+	unwindfiles(toppf);
 }

From patchwork Sun May 19 05:20:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Herbert Xu
X-Patchwork-Id: 13667775
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4D81633D5 for ; Sun, 19 May 2024 05:20:38 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096040; cv=none; b=cnlFXFAdCP8UuCwWuigZiADJePFOlJJoHE1NC1fH55LBezMR9vEjPxAA/Z0mVYSszLKgdHMB3B0N+BFouSP87HnMFwn2B/Xx7t5xBgmSxS7psYboQMvW4iujDXSfZNtuflCPyjwDKDEjeiPqyVUmaKn5nBvYzJbnT4n2lK0/VtU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096040; c=relaxed/simple; bh=Fc2IWehlzBuHjHueNdATIJlsfIzWttnfPNdiYfwlIIQ=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=exXTan1oohcwOPnmrld+/nYIHu7MXjllvcF7ogxrk5TY0UP8331Om+VorGV6qk3s39lZfmGF54r+c6QJFolPicTOcTR3jc2QKlbRNJDlRROWeZTyuBLfLAjDf8mpXfmFJeIQyfk9CBIT50XwhWSyVm05GwG1h6BM1AKS1ucWJLw=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au
Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id
1s8YyI-00HGF5-0R; Sun, 19 May 2024 13:20:35 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:35 +0800 Date: Sun, 19 May 2024 13:20:35 +0800 Message-Id: <9668fc2792493d1d5f14e8606c5f8bd96e086c93.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 12/13] builtin: Use pgetc in read(1) To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Use pgetc instead of read(2) in read(1). This allows any future buffering in the input layer to be used by read(1). This also allows read(1) to call helpers in the parser that may use the input layer. Signed-off-by: Herbert Xu --- src/input.c | 42 ++++++++++++++++++++++++++++-------------- src/input.h | 1 + src/miscbltin.c | 39 +++++++++++++++++++-------------------- 3 files changed, 48 insertions(+), 34 deletions(-) diff --git a/src/input.c b/src/input.c index 1712e5f..6779069 100644 --- a/src/input.c +++ b/src/input.c @@ -42,19 +42,20 @@ * This file implements the input routines used by the parser. 
*/ -#include "eval.h" -#include "shell.h" -#include "redir.h" -#include "syntax.h" -#include "input.h" -#include "output.h" -#include "options.h" -#include "memalloc.h" -#include "error.h" #include "alias.h" -#include "parser.h" +#include "error.h" +#include "eval.h" +#include "input.h" #include "main.h" +#include "memalloc.h" #include "myhistedit.h" +#include "options.h" +#include "output.h" +#include "parser.h" +#include "redir.h" +#include "shell.h" +#include "syntax.h" +#include "trap.h" #define IBUFSIZ (BUFSIZ + PUNGETC_MAX + 1) @@ -258,7 +259,7 @@ retry: } if (nr < 0) { - if (errno == EINTR) + if (errno == EINTR && !(basepf.prev && pending_sig)) goto retry; } return nr; @@ -522,6 +523,13 @@ pushfile(void) parsefile = pf; } +void pushstdin(void) +{ + INTOFF; + basepf.prev = parsefile; + parsefile = &basepf; + INTON; +} void popfile(void) @@ -529,6 +537,11 @@ popfile(void) struct parsefile *pf = parsefile; INTOFF; + parsefile = pf->prev; + pf->prev = NULL; + if (pf == &basepf) + goto out; + if (pf->fd >= 0) close(pf->fd); if (pf->buf) @@ -539,15 +552,16 @@ popfile(void) popstring(); freestrings(parsefile->spfree); } - parsefile = pf->prev; ckfree(pf); + +out: INTON; } -void unwindfiles(struct parsefile *stop) +void __attribute__((noinline)) unwindfiles(struct parsefile *stop) { - while (parsefile != stop) + while (basepf.prev || parsefile != stop) popfile(); } diff --git a/src/input.h b/src/input.h index 151b1c6..c59d784 100644 --- a/src/input.h +++ b/src/input.h @@ -109,6 +109,7 @@ void pungetn(int); void pushstring(char *, void *); int setinputfile(const char *, int); void setinputstring(char *); +void pushstdin(void); void popfile(void); void unwindfiles(struct parsefile *); void popallfiles(void); diff --git a/src/miscbltin.c b/src/miscbltin.c index 8a0ddf4..10d256e 100644 --- a/src/miscbltin.c +++ b/src/miscbltin.c @@ -46,18 +46,20 @@ #include #include -#include "shell.h" -#include "options.h" -#include "var.h" -#include "output.h" -#include "memalloc.h" 
#include "error.h" +#include "expand.h" +#include "input.h" +#include "memalloc.h" #include "miscbltin.h" #include "mystring.h" #include "main.h" -#include "expand.h" +#include "options.h" +#include "output.h" #include "parser.h" +#include "shell.h" +#include "syntax.h" #include "trap.h" +#include "var.h" #undef rflag @@ -115,14 +117,13 @@ readcmd_handle_line(char *s, int ac, char **ap) int readcmd(int argc, char **argv) { - char **ap; - char c; - int rflag; char *prompt; - char *p; int startloc; int newloc; int status; + char **ap; + int rflag; + char *p; int i; rflag = 0; @@ -145,19 +146,17 @@ readcmd(int argc, char **argv) status = 0; STARTSTACKSTR(p); + pushstdin(); + goto start; for (;;) { - switch (read(0, &c, 1)) { - case 1: - break; - default: - if (errno == EINTR && !pending_sig) - continue; - /* fall through */ - case 0: + int c; + + c = pgetc(); + if (c == PEOF) { status = 1; - goto out; + break; } if (c == '\0') continue; @@ -186,7 +185,7 @@ start: newloc = startloc - 1; } } -out: + popfile(); recordregion(startloc, p - (char *)stackblock(), 0); STACKSTRNUL(p); readcmd_handle_line(p + 1, argc - (ap - argv), ap); From patchwork Sun May 19 05:20:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Herbert Xu X-Patchwork-Id: 13667776 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8E3E933D5 for ; Sun, 19 May 2024 05:20:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=144.6.53.87 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096042; cv=none; 
b=rZU2aE4TG56NGcjgCvsy9GFSFaqjWrZUTETtwMNL9naYkq8KdBdY23vbujVTuywD5XPt8d+So1qUU1CPvxNkuadIhB0Z4qQycvDDq3tHv3hT9CgsrjgEmr0VNSliXMws7zMrE2qeZpXyGKP/x1KJvNttNnFhT8v7eqmO0zHVFAU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716096042; c=relaxed/simple; bh=hDxYgMZoW6xdz3Ij8l6UKInfYhu0ULqIOxuPdaYDmXM=; h=Date:Message-Id:In-Reply-To:References:From:Subject:To; b=fkWMjEhV8miFteSsAqgsPp8wtuSrxdsBk3C2PEA9uFZYd0BZ7l3pd39HNeoUh4Q9/eI+19yaKkJrvngxOhT9LgZXT9yIE/HE229XHbpREFhUUZ5jME5E2qYDzv7flm5ek67eU5rYQSwQyG8GT6m4CMKeD+ZGh6uacWZ+802VsJk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au; spf=pass smtp.mailfrom=gondor.apana.org.au; arc=none smtp.client-ip=144.6.53.87 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=gondor.apana.org.au Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gondor.apana.org.au Received: from loth.rohan.me.apana.org.au ([192.168.167.2]) by formenos.hmeau.com with smtp (Exim 4.96 #2 (Debian)) id 1s8YyK-00HGFz-1K; Sun, 19 May 2024 13:20:37 +0800 Received: by loth.rohan.me.apana.org.au (sSMTP sendmail emulation); Sun, 19 May 2024 13:20:37 +0800 Date: Sun, 19 May 2024 13:20:37 +0800 Message-Id: <45c43c58d29b88b96679da71ba94f7a956c0c7f3.1716095868.git.herbert@gondor.apana.org.au> In-Reply-To: References: From: Herbert Xu Subject: [v4 PATCH 13/13] builtin: Process multi-byte characters in read(1) To: DASH Mailing List Precedence: bulk X-Mailing-List: dash@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add support for multi-byte characters in read(1) by using getmbc from the parser. 
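
A minimal sketch of the byte-at-a-time decoding that read(1) picks up through getmbc, assuming a hypothetical byte-source callback in place of pgetc. The names collect_mbchar and next_byte_fn are made up for this illustration; only mbrtowc and MB_LEN_MAX are standard. It follows the same idea as the loop in getmbc: feed one byte at a time to mbrtowc until it stops returning (size_t)-2, and treat anything other than a completed character as a failure so the caller can push the extra bytes back. Unlike getmbc, the sketch also reports single-byte characters.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical byte source: returns the next byte (0..255) or -1 at end. */
typedef int (*next_byte_fn)(void *ctx);

/*
 * Collect one character that starts with byte `first` into buf, which
 * must hold at least MB_LEN_MAX bytes.  Returns the length of a complete
 * character in the current locale, or 0 if the sequence is invalid, is a
 * NUL byte, or the input ended early.  *used receives the number of bytes
 * consumed, so a caller can push back *used - 1 bytes after a failure.
 */
static size_t collect_mbchar(int first, next_byte_fn next, void *ctx,
			     char *buf, size_t *used)
{
	mbstate_t st;
	wchar_t wc;
	size_t n = 0;
	size_t r;
	int c = first;

	memset(&st, 0, sizeof(st));
	for (;;) {
		buf[n] = (char)c;
		r = mbrtowc(&wc, buf + n, 1, &st);
		n++;
		if (r != (size_t)-2)
			break;		/* complete, NUL, or invalid */
		if (n >= MB_LEN_MAX)
			break;		/* give up on overlong input */
		c = next(ctx);
		if (c < 0)
			break;		/* input ended mid-character */
	}

	*used = n;
	return r == 1 ? n : 0;
}
```

The conversion depends on the locale in effect, which is why the first patch of this series calls setlocale(LC_ALL, "") at startup.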
Signed-off-by: Herbert Xu --- src/miscbltin.c | 19 +++++++++++++------ src/parser.c | 2 +- src/parser.h | 1 + 3 files changed, 15 insertions(+), 7 deletions(-) diff --git a/src/miscbltin.c b/src/miscbltin.c index 10d256e..5aa2b24 100644 --- a/src/miscbltin.c +++ b/src/miscbltin.c @@ -36,15 +36,16 @@ * Miscelaneous builtins. */ +#include +#include +#include +#include #include /* quad_t */ #include /* BSD4_4 */ #include #include #include #include -#include -#include -#include #include "error.h" #include "expand.h" @@ -151,8 +152,10 @@ readcmd(int argc, char **argv) goto start; for (;;) { + unsigned ml; int c; + CHECKSTRSPACE((MB_LEN_MAX > 16 ? MB_LEN_MAX : 16) + 4, p); c = pgetc(); if (c == PEOF) { status = 1; @@ -160,9 +163,14 @@ readcmd(int argc, char **argv) } if (c == '\0') continue; + ml = getmbc(c, p, 0); + if (ml) { + p += ml; + goto record; + } if (newloc >= startloc) { if (c == '\n') - goto resetbs; + goto record; goto put; } if (!rflag && c == '\\') { @@ -172,13 +180,12 @@ readcmd(int argc, char **argv) if (c == '\n') break; put: - CHECKSTRSPACE(2, p); if (strchr(qchars, c)) USTPUTC(CTLESC, p); USTPUTC(c, p); +record: if (newloc >= startloc) { -resetbs: recordregion(startloc, newloc, 0); start: startloc = p - (char *)stackblock(); diff --git a/src/parser.c b/src/parser.c index 71d61f3..d368adc 100644 --- a/src/parser.c +++ b/src/parser.c @@ -883,7 +883,7 @@ static void synstack_pop(struct synstack **stack) *stack = (*stack)->next; } -static unsigned getmbc(int c, char *out, int mode) +unsigned getmbc(int c, char *out, int mode) { char *const start = out; mbstate_t mbst = {}; diff --git a/src/parser.h b/src/parser.h index 14bfc4f..7a9605b 100644 --- a/src/parser.h +++ b/src/parser.h @@ -95,6 +95,7 @@ const char *getprompt(void *); const char *const *findkwd(const char *); char *endofname(const char *); const char *expandstr(const char *); +unsigned getmbc(int c, char *out, int mode); static inline int goodname(const char *p)