diff mbox series

[2/4] chainlint: tighten accuracy when consuming input stream

Message ID 31af383fd439c3c0a5003598961acfecfae4018c.1667934510.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit ca748f518358448efa46f01539a8bdce5cfc710f
Headers show
Series chainlint: improve annotated output | expand

Commit Message

Eric Sunshine Nov. 8, 2022, 7:08 p.m. UTC
From: Eric Sunshine <sunshine@sunshineco.com>

To extract the next token in the input stream, Lexer::scan_token() finds
the start of the token by skipping whitespace, then consumes characters
belonging to the token until it encounters a non-token character, such
as an operator, punctuation, or whitespace. In the case of an operator
or punctuation which ends a token, before returning the just-scanned
token, it pushes that operator or punctuation character back onto the
input stream to ensure that it will be the first character consumed by
the next call to scan_token().

However, scan_token() is intentionally lax when whitespace ends a token;
it doesn't bother pushing the whitespace character back onto the token
stream since it knows that the next call to scan_token() will, as its
first step, skip over whitespace anyhow when looking for the start of
the token.

Although such laxity is harmless for the proper functioning of the
lexical analyzer, it does make it difficult to precisely identify the
token's end position in the input stream. Accurate token position
information may be desirable, for instance, to annotate problems or
highlight other interesting facets of the input found during the parsing
phase. To accommodate such possibilities, tighten scan_token() by making
it push the token-ending whitespace character back onto the input
stream, just as it does for other token-ending characters.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 9908de6c758..1f66c03c593 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -179,7 +179,7 @@  RESTART:
 		# handle special characters
 		last unless $$b =~ /\G(.)/sgc;
 		my $c = $1;
-		last if $c =~ /^[ \t]$/; # whitespace ends token
+		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
 		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
 		$token .= $self->scan_sqstring(), next if $c eq "'";
 		$token .= $self->scan_dqstring(), next if $c eq '"';