diff mbox series

[v2,2/3] parser: synerror: explicitly consume the entire invalid line

Message ID b62afca3e10b03c802276ace6e514f7ccb7a5134.1670979949.git.nabijaczleweli@nabijaczleweli.xyz (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show
Series [v2,1/3] parser: fixredir: invalid redirections are run-time, not syntax | expand

Commit Message

наб Dec. 14, 2022, 1:06 a.m. UTC
Interactively, sh_error() doesn't terminate, so
  echo "|$(printf %10000s)echo bug" | sh -i
would read the first 8KiB, see that it's invalid, then jump back to the
parser, which would then read and execute the rest of the line as-if
it were the next line.

The fix for this is to explicitly consume the rest of the invalid line,
so that the next line observed is /actually/ the next line.

This is difficult to trigger accidentally right now, since we consume
the entire icanon line buffer at once (provided it's <8k, which it
~always is interactively), so we always observe one line at a time,
but the next patch would make even "| echo bug" blow up.

Imported-from: https://github.com/hvdijk/gwsh/commit/d279523041c1c380d64b6dec7760feba20bbf6b5
---
 src/parser.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Herbert Xu Jan. 3, 2023, 1:53 a.m. UTC | #1
наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
>
> synerror(const char *msg)
> {
>        errlinno = plinno;
> +
> +       /* If we see a syntax error in a command, read the rest of the
> +        * line now before reporting the error. This ensures we get error
> +        * reporting that does not depend on buffering details. */
> +       skipline();

This is broken.  What if we already read a newline just before
the syntax error (e.g., synexpect(TDO))?

This needs to be dealt with in the input layer.

Cheers,
Harald van Dijk Jan. 3, 2023, 11:47 a.m. UTC | #2
On 03/01/2023 01:53, Herbert Xu wrote:
> наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
>>
>> synerror(const char *msg)
>> {
>>         errlinno = plinno;
>> +
>> +       /* If we see a syntax error in a command, read the rest of the
>> +        * line now before reporting the error. This ensures we get error
>> +        * reporting that does not depend on buffering details. */
>> +       skipline();
> 
> This is broken.  What if we already read a newline just before
> the syntax error (e.g., synexpect(TDO))?

In order for this to be a problem, we need something where the newline 
itself triggers a syntax error. For synexpect(TDO), is that possible? In 
all cases where synexpect(TDO) is called, a newline is permitted, and it 
is the token after the newline that the error will be reported on, no? 
In that situation, reading the rest of the line is correct.

I'm not going to rule out that there is a potential for problems, but if 
there is, it's not the situation you say.

> This needs to be dealt with in the input layer.

Handling it in the input layer after the error has already been reported 
means we get inconsistent error handling when running with dash -isv. 
The error may, depending on whether the rest of the line was in the 
buffer already, be reported either in the middle of the input line, or 
on the next line.

Just try changing sh -i to sh -iv in the reproducer and seeing what what 
happens with your patch.

Cheers,
Harald van Dijk
Herbert Xu Jan. 4, 2023, 9:51 a.m. UTC | #3
On Tue, Jan 03, 2023 at 11:47:05AM +0000, Harald van Dijk wrote:
>
> In order for this to be a problem, we need something where the newline
> itself triggers a syntax error. For synexpect(TDO), is that possible? In all
> cases where synexpect(TDO) is called, a newline is permitted, and it is the
> token after the newline that the error will be reported on, no? In that
> situation, reading the rest of the line is correct.

You're right, it isn't possible with TDO.

> I'm not going to rule out that there is a potential for problems, but if
> there is, it's not the situation you say.

However, the synexpect in parsefname would seem to qualify:

cat <<- <newline>

> Handling it in the input layer after the error has already been reported
> means we get inconsistent error handling when running with dash -isv. The
> error may, depending on whether the rest of the line was in the buffer
> already, be reported either in the middle of the input line, or on the next
> line.
> 
> Just try changing sh -i to sh -iv in the reproducer and seeing what what
> happens with your patch.

I don't think this is such a big deal.  You get the same effect
if you simply do input line buffering but only up to a certain
length.  I tried one million instead of ten thousand with the
reproducer and ksh93 prints out the error after only 65536
characters.

In any case, we could actually fix this by buffering stderr and
then making sh -v bypass the buffer and write directly to 2.

I tried it and it actually seems to work.  But all the extra flush
calls bloat up the code a bit so it needs a bit more work before
I'm happy to run with it.

Cheers,
Harald van Dijk Jan. 4, 2023, 11:25 a.m. UTC | #4
On 04/01/2023 09:51, Herbert Xu wrote:
> On Tue, Jan 03, 2023 at 11:47:05AM +0000, Harald van Dijk wrote:
>>
>> In order for this to be a problem, we need something where the newline
>> itself triggers a syntax error. For synexpect(TDO), is that possible? In all
>> cases where synexpect(TDO) is called, a newline is permitted, and it is the
>> token after the newline that the error will be reported on, no? In that
>> situation, reading the rest of the line is correct.
> 
> You're right, it isn't possible with TDO.
> 
>> I'm not going to rule out that there is a potential for problems, but if
>> there is, it's not the situation you say.
> 
> However, the synexpect in parsefname would seem to qualify:
> 
> cat <<- <newline>

You're right, that does show the problem, thanks.

Since the only case where special care is needed is when the problematic 
token was newline, it can be handled as simply as

   if (lasttoken != TNL)
     skipline();

I will do some testing to see if I am overlooking something, and follow 
up if I am.

>> Handling it in the input layer after the error has already been reported
>> means we get inconsistent error handling when running with dash -isv. The
>> error may, depending on whether the rest of the line was in the buffer
>> already, be reported either in the middle of the input line, or on the next
>> line.
>>
>> Just try changing sh -i to sh -iv in the reproducer and seeing what what
>> happens with your patch.
> 
> I don't think this is such a big deal.  You get the same effect
> if you simply do input line buffering but only up to a certain
> length.  I tried one million instead of ten thousand with the
> reproducer and ksh93 prints out the error after only 65536
> characters.
> 
> In any case, we could actually fix this by buffering stderr and
> then making sh -v bypass the buffer and write directly to 2.
> 
> I tried it and it actually seems to work.  But all the extra flush
> calls bloat up the code a bit so it needs a bit more work before
> I'm happy to run with it.

Interesting approach. Error messages are known to always fit in the 
buffer, so you can probably be sure it won't be forced to be emitted 
early. It could work.

Cheers,
Harald van Dijk
Herbert Xu Jan. 4, 2023, 2:10 p.m. UTC | #5
On Wed, Jan 04, 2023 at 11:25:09AM +0000, Harald van Dijk wrote:
> 
> Since the only case where special care is needed is when the problematic
> token was newline, it can be handled as simply as
> 
>   if (lasttoken != TNL)
>     skipline();

You're assuming that the only way of exiting the parser is
through synerror.  That's not the case.  We could also exit
through sh_error, e.g., if ckmalloc runs out of memory.

Cheers,
Harald van Dijk Jan. 4, 2023, 2:30 p.m. UTC | #6
On 04/01/2023 14:10, Herbert Xu wrote:
> On Wed, Jan 04, 2023 at 11:25:09AM +0000, Harald van Dijk wrote:
>>
>> Since the only case where special care is needed is when the problematic
>> token was newline, it can be handled as simply as
>>
>>    if (lasttoken != TNL)
>>      skipline();
> 
> You're assuming that the only way of exiting the parser is
> through synerror.  That's not the case.  We could also exit
> through sh_error, e.g., if ckmalloc runs out of memory.

In theory, yes. In practice, because of the default Linux overcommit 
behaviour, that is not going to happen, that's going to cause the OOM 
killer to kick in and forcibly kill the whole process without any way of 
handling it. Either that, or forcibly kill some other process that we 
get no indication of.

Even if we change the overcommit settings, I'm struggling to think of 
any situation where parsing fails because we run out of memory, but 
freeing the memory we've already allocated for parsing that one line of 
input frees enough to allow us to meaningfully continue.

Cheers,
Harald van Dijk
Herbert Xu Jan. 4, 2023, 2:41 p.m. UTC | #7
On Wed, Jan 04, 2023 at 02:30:00PM +0000, Harald van Dijk wrote:
>
> In theory, yes. In practice, because of the default Linux overcommit
> behaviour, that is not going to happen, that's going to cause the OOM killer
> to kick in and forcibly kill the whole process without any way of handling
> it. Either that, or forcibly kill some other process that we get no
> indication of.
> 
> Even if we change the overcommit settings, I'm struggling to think of any
> situation where parsing fails because we run out of memory, but freeing the
> memory we've already allocated for parsing that one line of input frees
> enough to allow us to meaningfully continue.

ckmalloc is just an example.  It could really come from anywhere.
For example, expandstr calls expandarg, which could trigger an
exception for SIGINT.  Indeed, getting SIGINT directly while in
the parser would also do the trick.

Cheers,
Harald van Dijk Jan. 4, 2023, 2:59 p.m. UTC | #8
On 04/01/2023 14:41, Herbert Xu wrote:
> On Wed, Jan 04, 2023 at 02:30:00PM +0000, Harald van Dijk wrote:
>>
>> In theory, yes. In practice, because of the default Linux overcommit
>> behaviour, that is not going to happen, that's going to cause the OOM killer
>> to kick in and forcibly kill the whole process without any way of handling
>> it. Either that, or forcibly kill some other process that we get no
>> indication of.
>>
>> Even if we change the overcommit settings, I'm struggling to think of any
>> situation where parsing fails because we run out of memory, but freeing the
>> memory we've already allocated for parsing that one line of input frees
>> enough to allow us to meaningfully continue.
> 
> ckmalloc is just an example.  It could really come from anywhere.
> For example, expandstr calls expandarg, which could trigger an
> exception for SIGINT.  Indeed, getting SIGINT directly while in
> the parser would also do the trick.

The only place expandstr() is called in the parser is in getprompt(), 
which is only called when we're at the start of a line and reading the 
next line of input is probably wrong.

As for a SIGINT during actual parsing, that's true. I'm not sure reading 
the rest of the input line is always the right thing to do in that case 
either. Imagine the user typing

   echo 1 <^C> echo 2 <return>

Normally we want this 'echo 2' to be interpreted as a new command. It 
seems odd to me to say that we interpret the 'echo 2' as a new command, 
*except* if the Ctrl-C is processed early, and the 'echo 2' is entered 
early too, in which case we discard it. Especially if this special 
exception only applies if "early" is so early that it cannot reliably be 
triggered. I would want this to be deterministic.

As long as we're talking about things like that, regardless of how it's 
implemented, we could also get another SIGINT during the skipping of the 
rest of the input line.

Cheers,
Harald van Dijk
diff mbox series

Patch

diff --git a/src/parser.c b/src/parser.c
index 8a06b9e..35fdbc3 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -761,6 +761,13 @@  static void nlnoprompt(void)
 	needprompt = doprompt;
 }
 
+static void
+skipline(void)
+{
+	int c;
+	while ((c = pgetc()) != '\n' && c != PEOF);
+}
+
 
 /*
  * Read the next input token.
@@ -798,7 +805,7 @@  xxreadtoken(void)
 		case ' ': case '\t':
 			continue;
 		case '#':
-			while ((c = pgetc()) != '\n' && c != PEOF);
+			skipline();
 			pungetc();
 			continue;
 		case '\n':
@@ -1526,6 +1533,12 @@  STATIC void
 synerror(const char *msg)
 {
 	errlinno = plinno;
+
+	/* If we see a syntax error in a command, read the rest of the
+	 * line now before reporting the error. This ensures we get error
+	 * reporting that does not depend on buffering details. */
+	skipline();
+
 	sh_error("Syntax error: %s", msg);
 	/* NOTREACHED */
 }