allow show_token() on TOKEN_ZERO_IDENT

Message ID: 20220607125441.36757-1-lucvoo@kernel.org (mailing list archive)
State: Mainlined, archived
Series: allow show_token() on TOKEN_ZERO_IDENT

Headers

Return-Path: <linux-sparse-owner@kernel.org>
From: Luc Van Oostenryck <lucvoo@kernel.org>
To: linux-sparse@vger.kernel.org
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Subject: [PATCH] allow show_token() on TOKEN_ZERO_IDENT
Date: Tue,  7 Jun 2022 14:54:41 +0200
Message-Id: <20220607125441.36757-1-lucvoo@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Commit Message

Luc Van Oostenryck June 7, 2022, 12:54 p.m. UTC

From: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>

TOKEN_ZERO_IDENTs are created during the evaluation of pre-processor
expressions, but they are otherwise normal idents and were first tokenized
as TOKEN_IDENTs.

As such, they could perfectly be displayed by show_token() but are not.
So, in error messages they are displayed as "unhandled token type '4'",
which is not at all informative.

Fix this by letting show_token() process them like usual TOKEN_IDENTs.
Idem for quote_token().

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 tokenize.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Linus Torvalds June 7, 2022, 6:26 p.m. UTC | #1

On Tue, Jun 7, 2022 at 5:55 AM Luc Van Oostenryck <lucvoo@kernel.org> wrote:
>
> TOKEN_ZERO_IDENTs are created during the evaluation of pre-processor
> expressions, but they are otherwise normal idents and were first tokenized
> as TOKEN_IDENTs.
>
> As such, they could perfectly be displayed by show_token() but are not.
> So, in error messages they are displayed as "unhandled token type '4'",
> which is not at all informative.
>
> Fix this by letting show_token() process them like usual TOKEN_IDENTs.
> Idem for quote_token().

Ack.

I do wonder if it should be marked somehow as being that special case.
The main reason for 'show_token()' is debugging, after all, and
TOKEN_ZERO_IDENT does have magical properties in how it either
silently expands to the constant '0', or it generates a warning about
undefined preprocessor symbol.

But considering that we've apparently reported it as "unhandled token
type '4'" since 2005, I guess it's not exactly a big deal.

           Linus
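
[Editorial sketch, not part of the thread] The "magical properties" mentioned
above follow from standard C: an identifier that is still undefined after macro
expansion in an #if expression evaluates to 0, and a compiler may optionally
warn about it (GCC does so under -Wundef). A minimal illustration of that
two-way behavior, in which the function name, its parameters, and the warning
text are all hypothetical and not taken from sparse:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, NOT sparse's implementation: returns the value an
 * undefined identifier takes inside an #if expression. The value is always
 * 0; the only variation is whether a warning is emitted and counted.
 */
long sketch_eval_zero_ident(const char *name, bool warn_undef, int *warnings)
{
	assert(name != NULL);
	assert(warnings != NULL);

	if (warn_undef) {
		/* Warning text is illustrative only. */
		fprintf(stderr, "warning: undefined preprocessor identifier '%s'\n", name);
		(*warnings)++;
	}
	return 0;	/* silently (or noisily) becomes the constant 0 */
}
```

The identifier's spelling is still available in both cases, which is what
makes it possible to display such a token by name.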

Luc Van Oostenryck June 7, 2022, 8:48 p.m. UTC | #2

On Tue, Jun 07, 2022 at 11:26:36AM -0700, Linus Torvalds wrote:
> On Tue, Jun 7, 2022 at 5:55 AM Luc Van Oostenryck <lucvoo@kernel.org> wrote:
> >
> > TOKEN_ZERO_IDENTs are created during the evaluation of pre-processor
> > expressions, but they are otherwise normal idents and were first tokenized
> > as TOKEN_IDENTs.
> >
> > As such, they could perfectly be displayed by show_token() but are not.
> > So, in error messages they are displayed as "unhandled token type '4'",
> > which is not at all informative.
> >
> > Fix this by letting show_token() process them like usual TOKEN_IDENTs.
> > Idem for quote_token().
> 
> Ack.
> 
> I do wonder if it should be marked somehow as being that special case.
> The main reason for 'show_token()' is debugging, after all, and
> TOKEN_ZERO_IDENT does have magical properties in how it either
> silently expands to the constant '0', or it generates a warning about
> undefined preprocessor symbol.
> 
> But considering that we've apparently reported it as "unhandled token
> type '4'" since 2005, I guess it's not exactly a big deal.

Yes, I first thought to do so but then chose not to because I could not
convince myself that its special property was irrelevant in warning/error
messages. It looks to me more like an internal thing, more semantic than
lexical, and a non-faithful representation would be confusing in messages.

For context, the input text I had (from GCC's testsuite) was:
	#define empty
	#if empty#cpu(m68k)
	#endif
and the error message sparse issued was:
	error: garbage at end: #unhandled token type '4' (unhandled token type '4' )
with this patch it's:
	error: garbage at end: #cpu(m68k)
 
-- Luc

diff --git a/tokenize.c b/tokenize.c
index ea7105438270..fdaea370cc48 100644
--- a/tokenize.c
+++ b/tokenize.c
@@ -201,6 +201,7 @@  const char *show_token(const struct token *token)
 		return "end-of-input";
 
 	case TOKEN_IDENT:
+	case TOKEN_ZERO_IDENT:
 		return show_ident(token->ident);
 
 	case TOKEN_NUMBER:
@@ -259,6 +260,7 @@  const char *quote_token(const struct token *token)
 		return "syntax error";
 
 	case TOKEN_IDENT:
+	case TOKEN_ZERO_IDENT:
 		return show_ident(token->ident);
 
 	case TOKEN_NUMBER:
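
[Editorial sketch, not part of the patch] The effect of the two added case
labels can be reproduced in isolation. The reduced model below is an
assumption made for illustration: the enum, struct, and function names are
hypothetical, the enum values do not match sparse's, and a plain string
stands in for token->ident. It only shows how a shared case label routes the
zero-ident kind through the identifier path instead of the "unhandled"
fallback.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, reduced token kinds; sparse's real enum is larger. */
enum sketch_token_type {
	SK_TOKEN_EOF,
	SK_TOKEN_IDENT,
	SK_TOKEN_ZERO_IDENT,
	SK_TOKEN_NUMBER,
};

struct sketch_token {
	enum sketch_token_type type;
	const char *name;	/* stands in for token->ident */
};

const char *sketch_show_token(const struct sketch_token *token)
{
	static char buffer[64];

	assert(token != NULL);

	switch (token->type) {
	case SK_TOKEN_EOF:
		return "end-of-input";

	case SK_TOKEN_IDENT:
	case SK_TOKEN_ZERO_IDENT:	/* same spelling, same display path */
		return token->name;

	default:
		/* Kinds without a case end up here, as zero-idents did before. */
		snprintf(buffer, sizeof(buffer), "unhandled token type '%d'",
			 (int)token->type);
		return buffer;
	}
}
```

In this model, removing the SK_TOKEN_ZERO_IDENT label sends that kind to the
default branch, which mirrors the uninformative message quoted in the thread.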
