diff mbox series

input: preadfd: read standard input byte-wise

Message ID 20221213221732.6mvv22u7ktdozrbx@tarta.nabijaczleweli.xyz (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show
Series input: preadfd: read standard input byte-wise | expand

Commit Message

наб Dec. 13, 2022, 10:17 p.m. UTC
POSIX Issue 7, XCU, sh, STDIN says:
  When the shell is using standard input and it invokes a command that
  also uses standard input, the shell shall ensure that the standard
  input file pointer points directly after the command it has read when
  the command begins execution. It shall not read ahead in such a manner
  that any characters intended to be read by the invoked command are
  consumed by the shell (whether interpreted by the shell or not) or
  that characters that are not read by the invoked command are not seen
  by the shell.

I.e.
  sh <<EOF
  id
  cat
  good!
  EOF
must execute id, then execute cat, then the cat must copy "good!"
to the standard output stream, and similarly
  sh <<"EOF"
  id
  read Q
  good!
  echo Q$Q
  EOF
must execute id, then read "good!" into Q, then echo "Qgood!".

Heretofor the output was as such:
  uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
  ./dash: 3: good!: not found
and as such (with -x):
  + id
  uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
  + read Q
  + good!
  sh: 3: good!: not found
  + echo Q
  Q
and a strace confirms:
  read(0, "id\ncat\ngood!\n", 8192)       = 13
  read(0, "id\nread Q\ngood!\necho Q$Q\n", 8192) = 25

Reading the standard input byte-by-byte is the obvious solution to this
issue, Just Works, and is how all other shells do it (we could,
theoretically, read regular files block-wise, then seek within them after
parsing, but the complexity out-weighs the rarity of running
sh < program; we could also do whole-line reads on teletypes in
icanon mode, but, again, the gain here is miniscule for an interactive
session, and the teletype mode can change at any time, so...).
Naturally, we keep reading block-wise for non-standard-input.

With this patch, we observe the correct
  uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
  good!
and
  + id
  uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
  + read Q
  + echo Qgood!
  Qgood!

Fixes: https://bugs.debian.org/862907
---
 src/input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Harald van Dijk Dec. 13, 2022, 10:37 p.m. UTC | #1
On 13/12/2022 22:17, наб wrote:
> Reading the standard input byte-by-byte is the obvious solution to this
> issue, Just Works, and is how all other shells do it (we could,
> theoretically, read regular files block-wise, then seek within them after
> parsing, but the complexity out-weighs the rarity of running
> sh < program; we could also do whole-line reads on teletypes in
> icanon mode, but, again, the gain here is miniscule for an interactive
> session, and the teletype mode can change at any time, so...).
> Naturally, we keep reading block-wise for non-standard-input.

There are a few things to consider here:

- Not all shells do it this way. bash does do the block-wise read,
   followed by a seek, when stdin is seekable. While I agree that it is
   not necessary and not worth it, you specifically say that other shells
   do not do this. That's simply not true.
- This will have no effect when running 'dash /dev/stdin'. I personally
   consider this acceptable, this doesn't work in other shells either.
- This patch breaks internal assumptions in dash that the buffer will
   contain a full line, affecting error recovery. See below.

> With this patch, we observe the correct
>    uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
>    good!
> and
>    + id
>    uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
>    + read Q
>    + echo Qgood!
>    Qgood!
> 
> Fixes: https://bugs.debian.org/862907
> ---
>   src/input.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/input.c b/src/input.c
> index ec075f5..6b6113e 100644
> --- a/src/input.c
> +++ b/src/input.c
> @@ -195,7 +195,7 @@ retry:
>   
>   	} else
>   #endif
> -		nr = read(parsefile->fd, buf, IBUFSIZ - 1);
> +		nr = read(parsefile->fd, buf, parsefile->fd == 0 ? 1 : IBUFSIZ - 1);
>   
>   
>   	if (nr < 0) {

With dash 0.5.12:

   $ | echo bug
   src/dash: 1: Syntax error: "|" unexpected
   $

With dash 0.5.12 + your patch:

   $ | echo bug
   src/dash: 1: Syntax error: "|" unexpected
   $ bug
   $

I had implemented the same change in my fork, see 
<https://github.com/hvdijk/gwsh/commit/d279523041c1c380d64b6dec7760feba20bbf6b5> 
for the additional changes I needed to get everything working.

Cheers,
Harald van Dijk
наб Dec. 14, 2022, 1:05 a.m. UTC | #2
Hi!

On Tue, Dec 13, 2022 at 10:37:12PM +0000, Harald van Dijk wrote:
> On 13/12/2022 22:17, наб wrote:
> > Reading the standard input byte-by-byte is the obvious solution to this
> > issue, Just Works, and is how all other shells do it (we could,
> > theoretically, read regular files block-wise, then seek within them after
> > parsing, but the complexity out-weighs the rarity of running
> > sh < program; we could also do whole-line reads on teletypes in
> > icanon mode, but, again, the gain here is miniscule for an interactive
> > session, and the teletype mode can change at any time, so...).
> > Naturally, we keep reading block-wise for non-standard-input.
> There are a few things to consider here:
> - Not all shells do it this way. bash does do the block-wise read,
>   followed by a seek, when stdin is seekable. While I agree that it is
>   not necessary and not worth it, you specifically say that other shells
>   do not do this. That's simply not true.

Yeah, I wrote the parenthetical way after the rest and didn't think to
reassess them in consort (it appears that ksh93 seeks, zsh doesn't).

> - This patch breaks internal assumptions in dash that the buffer will
>   contain a full line, affecting error recovery. See below.
> > --- a/src/input.c
> > +++ b/src/input.c
> > @@ -195,7 +195,7 @@ retry:
> >   	} else
> >   #endif
> > -		nr = read(parsefile->fd, buf, IBUFSIZ - 1);
> > +		nr = read(parsefile->fd, buf, parsefile->fd == 0 ? 1 : IBUFSIZ - 1);
> >   	if (nr < 0) {
> 
> With dash 0.5.12:
> 
>   $ | echo bug
>   src/dash: 1: Syntax error: "|" unexpected
>   $
> 
> With dash 0.5.12 + your patch:
> 
>   $ | echo bug
>   src/dash: 1: Syntax error: "|" unexpected
>   $ bug
>   $
> 
> I had implemented the same change in my fork, see <https://github.com/hvdijk/gwsh/commit/d279523041c1c380d64b6dec7760feba20bbf6b5>
> for the additional changes I needed to get everything working.

It's interesting to see that the line error bug appears to blame back to
start-of-git(?), and the partial line consumption is triggerable ‒
with vastly more pathological input, admittedly ‒ on unpatched dash as
well. I've imported the other fixes piece-meal, and the updated
series (in follow-up), passes all cases in your commit message as well.

Best,
наб
diff mbox series

Patch

diff --git a/src/input.c b/src/input.c
index ec075f5..6b6113e 100644
--- a/src/input.c
+++ b/src/input.c
@@ -195,7 +195,7 @@  retry:
 
 	} else
 #endif
-		nr = read(parsefile->fd, buf, IBUFSIZ - 1);
+		nr = read(parsefile->fd, buf, parsefile->fd == 0 ? 1 : IBUFSIZ - 1);
 
 
 	if (nr < 0) {