Message ID | b62afca3e10b03c802276ace6e514f7ccb7a5134.1670979949.git.nabijaczleweli@nabijaczleweli.xyz (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
Series | [v2,1/3] parser: fixredir: invalid redirections are run-time, not syntax | expand |
наб <nabijaczleweli@nabijaczleweli.xyz> wrote: > > synerror(const char *msg) > { > errlinno = plinno; > + > + /* If we see a syntax error in a command, read the rest of the > + * line now before reporting the error. This ensures we get error > + * reporting that does not depend on buffering details. */ > + skipline(); This is broken. What if we already read a newline just before the syntax error (e.g., synexpect(TDO))? This needs to be dealt with in the input layer. Cheers,
On 03/01/2023 01:53, Herbert Xu wrote: > наб <nabijaczleweli@nabijaczleweli.xyz> wrote: >> >> synerror(const char *msg) >> { >> errlinno = plinno; >> + >> + /* If we see a syntax error in a command, read the rest of the >> + * line now before reporting the error. This ensures we get error >> + * reporting that does not depend on buffering details. */ >> + skipline(); > > This is broken. What if we already read a newline just before > the syntax error (e.g., synexpect(TDO))? In order for this to be a problem, we need something where the newline itself triggers a syntax error. For synexpect(TDO), is that possible? In all cases where synexpect(TDO) is called, a newline is permitted, and it is the token after the newline that the error will be reported on, no? In that situation, reading the rest of the line is correct. I'm not going to rule out that there is a potential for problems, but if there is, it's not the situation you say. > This needs to be dealt with in the input layer. Handling it in the input layer after the error has already been reported means we get inconsistent error handling when running with dash -isv. The error may, depending on whether the rest of the line was in the buffer already, be reported either in the middle of the input line, or on the next line. Just try changing sh -i to sh -iv in the reproducer and seeing what what happens with your patch. Cheers, Harald van Dijk
On Tue, Jan 03, 2023 at 11:47:05AM +0000, Harald van Dijk wrote: > > In order for this to be a problem, we need something where the newline > itself triggers a syntax error. For synexpect(TDO), is that possible? In all > cases where synexpect(TDO) is called, a newline is permitted, and it is the > token after the newline that the error will be reported on, no? In that > situation, reading the rest of the line is correct. You're right, it isn't possible with TDO. > I'm not going to rule out that there is a potential for problems, but if > there is, it's not the situation you say. However, the synexpect in parsefname would seem to qualify: cat <<- <newline> > Handling it in the input layer after the error has already been reported > means we get inconsistent error handling when running with dash -isv. The > error may, depending on whether the rest of the line was in the buffer > already, be reported either in the middle of the input line, or on the next > line. > > Just try changing sh -i to sh -iv in the reproducer and seeing what what > happens with your patch. I don't think this is such a big deal. You get the same effect if you simply do input line buffering but only up to a certain length. I tried one million instead of ten thousand with the reproducer and ksh93 prints out the error after only 65536 characters. In any case, we could actually fix this by buffering stderr and then making sh -v bypass the buffer and write directly to 2. I tried it and it actually seems to work. But all the extra flush calls bloat up the code a bit so it needs a bit more work before I'm happy to run with it. Cheers,
On 04/01/2023 09:51, Herbert Xu wrote: > On Tue, Jan 03, 2023 at 11:47:05AM +0000, Harald van Dijk wrote: >> >> In order for this to be a problem, we need something where the newline >> itself triggers a syntax error. For synexpect(TDO), is that possible? In all >> cases where synexpect(TDO) is called, a newline is permitted, and it is the >> token after the newline that the error will be reported on, no? In that >> situation, reading the rest of the line is correct. > > You're right, it isn't possible with TDO. > >> I'm not going to rule out that there is a potential for problems, but if >> there is, it's not the situation you say. > > However, the synexpect in parsefname would seem to qualify: > > cat <<- <newline> You're right, that does show the problem, thanks. Since the only case where special care is needed is when the problematic token was newline, it can be handled as simply as if (lasttoken != TNL) skipline(); I will do some testing to see if I am overlooking something, and follow up if I am. >> Handling it in the input layer after the error has already been reported >> means we get inconsistent error handling when running with dash -isv. The >> error may, depending on whether the rest of the line was in the buffer >> already, be reported either in the middle of the input line, or on the next >> line. >> >> Just try changing sh -i to sh -iv in the reproducer and seeing what what >> happens with your patch. > > I don't think this is such a big deal. You get the same effect > if you simply do input line buffering but only up to a certain > length. I tried one million instead of ten thousand with the > reproducer and ksh93 prints out the error after only 65536 > characters. > > In any case, we could actually fix this by buffering stderr and > then making sh -v bypass the buffer and write directly to 2. > > I tried it and it actually seems to work. But all the extra flush > calls bloat up the code a bit so it needs a bit more work before > I'm happy to run with it. Interesting approach. Error messages are known to always fit in the buffer, so you can probably be sure it won't be forced to be emitted early. It could work. Cheers, Harald van Dijk
On Wed, Jan 04, 2023 at 11:25:09AM +0000, Harald van Dijk wrote: > > Since the only case where special care is needed is when the problematic > token was newline, it can be handled as simply as > > if (lasttoken != TNL) > skipline(); You're assuming that the only way of exiting the parser is through synerror. That's not the case. We could also exit through sh_error, e.g., if ckmalloc runs out of memory. Cheers,
On 04/01/2023 14:10, Herbert Xu wrote: > On Wed, Jan 04, 2023 at 11:25:09AM +0000, Harald van Dijk wrote: >> >> Since the only case where special care is needed is when the problematic >> token was newline, it can be handled as simply as >> >> if (lasttoken != TNL) >> skipline(); > > You're assuming that the only way of exiting the parser is > through synerror. That's not the case. We could also exit > through sh_error, e.g., if ckmalloc runs out of memory. In theory, yes. In practice, because of the default Linux overcommit behaviour, that is not going to happen, that's going to cause the OOM killer to kick in and forcibly kill the whole process without any way of handling it. Either that, or forcibly kill some other process that we get no indication of. Even if we change the overcommit settings, I'm struggling to think of any situation where parsing fails because we run out of memory, but freeing the memory we've already allocated for parsing that one line of input frees enough to allow us to meaningfully continue. Cheers, Harald van Dijk
On Wed, Jan 04, 2023 at 02:30:00PM +0000, Harald van Dijk wrote: > > In theory, yes. In practice, because of the default Linux overcommit > behaviour, that is not going to happen, that's going to cause the OOM killer > to kick in and forcibly kill the whole process without any way of handling > it. Either that, or forcibly kill some other process that we get no > indication of. > > Even if we change the overcommit settings, I'm struggling to think of any > situation where parsing fails because we run out of memory, but freeing the > memory we've already allocated for parsing that one line of input frees > enough to allow us to meaningfully continue. ckmalloc is just an example. It could really come from anywhere. For example, expandstr calls expandarg, which could trigger an exception for SIGINT. Indeed, getting SIGINT directly while in the parser would also do the trick. Cheers,
On 04/01/2023 14:41, Herbert Xu wrote: > On Wed, Jan 04, 2023 at 02:30:00PM +0000, Harald van Dijk wrote: >> >> In theory, yes. In practice, because of the default Linux overcommit >> behaviour, that is not going to happen, that's going to cause the OOM killer >> to kick in and forcibly kill the whole process without any way of handling >> it. Either that, or forcibly kill some other process that we get no >> indication of. >> >> Even if we change the overcommit settings, I'm struggling to think of any >> situation where parsing fails because we run out of memory, but freeing the >> memory we've already allocated for parsing that one line of input frees >> enough to allow us to meaningfully continue. > > ckmalloc is just an example. It could really come from anywhere. > For example, expandstr calls expandarg, which could trigger an > exception for SIGINT. Indeed, getting SIGINT directly while in > the parser would also do the trick. The only place expandstr() is called in the parser is in getprompt(), which is only called when we're at the start of a line and reading the next line of input is probably wrong. As for a SIGINT during actual parsing, that's true. I'm not sure reading the rest of the input line is always the right thing to do in that case either. Imagine the user typing echo 1 <^C> echo 2 <return> Normally we want this 'echo 2' to be interpreted as a new command. It seems odd to me to say that we interpret the 'echo 2' as a new command, *except* if the Ctrl-C is processed early, and the 'echo 2' is entered early too, in which case we discard it. Especially if this special exception only applies if "early" is so early that it cannot reliably be triggered. I would want this to be deterministic. As long as we're talking about things like that, regardless of how it's implemented, we could also get another SIGINT during the skipping of the rest of the input line. Cheers, Harald van Dijk
diff --git a/src/parser.c b/src/parser.c index 8a06b9e..35fdbc3 100644 --- a/src/parser.c +++ b/src/parser.c @@ -761,6 +761,13 @@ static void nlnoprompt(void) needprompt = doprompt; } +static void +skipline(void) +{ + int c; + while ((c = pgetc()) != '\n' && c != PEOF); +} + /* * Read the next input token. @@ -798,7 +805,7 @@ xxreadtoken(void) case ' ': case '\t': continue; case '#': - while ((c = pgetc()) != '\n' && c != PEOF); + skipline(); pungetc(); continue; case '\n': @@ -1526,6 +1533,12 @@ STATIC void synerror(const char *msg) { errlinno = plinno; + + /* If we see a syntax error in a command, read the rest of the + * line now before reporting the error. This ensures we get error + * reporting that does not depend on buffering details. */ + skipline(); + sh_error("Syntax error: %s", msg); /* NOTREACHED */ }