RFC: problems with cast optimization at evaluation time

Message ID

20210102100934.28459-1-luc.vanoostenryck@gmail.com (mailing list archive)

State

Superseded, archived

Headers

From: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
To: linux-sparse@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Subject: [PATCH] RFC: problems with cast optimization at evaluation time
Date: Sat,  2 Jan 2021 11:09:34 +0100
Message-Id: <20210102100934.28459-1-luc.vanoostenryck@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

RFC: problems with cast optimization at evaluation time

Commit Message

Luc Van Oostenryck Jan. 2, 2021, 10:09 a.m. UTC

The current code will, at evaluation time, optimize some casts.
For example, code like:
	int i  = <some value>;
	char c = ~i;
will be optimized to what would essentially be:
	int i = <some value>;
	char c = ~(char) i;

This is fine for plain integer types but it has at least two problems. With bools:
	int i = <some value>;
	bool b = ~i;
is not the same as the 'optimized' form (it would only be for 0 and -1):
	int i = <some value>;
	bool b = ~(bool)i;
Same with floats:
	int i = <some value>;
	float f = ~i;
in this case the 'optimized' form doesn't even make sense:
	int i = <some value>;
	float f = ~(float)i;
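
To make the bool case concrete, here is a minimal sketch (not part of the
patch). The helper names are hypothetical, and to_bool_then_not() models the
rewritten expression under the assumption that '~' ends up being performed
at the width of bool, which is not what plain C would do:

```c
#include <assert.h>
#include <stdbool.h>

/* Original semantics: '~' is evaluated on the int, then the result is converted. */
static bool not_then_to_bool(int i)
{
	return ~i;		/* true unless i == -1 */
}

/*
 * Model of the rewritten expression: the operand is converted first and
 * '~' is then performed at 1-bit width, which amounts to '!'.
 * This is an assumption about the rewritten tree, not standard C:
 * in plain C, ~(bool)i is promoted to int before the '~'.
 */
static bool to_bool_then_not(int i)
{
	bool b = i;
	return !b;		/* true only if i == 0 */
}

/* With a plain integer type, truncating before or after '~' gives the same bits. */
static bool char_forms_agree(int i)
{
	return (unsigned char)~i == (unsigned char)~(unsigned char)i;
}

/* Quick self-check of the three claims above. */
static void check_not_cast(void)
{
	assert(not_then_to_bool(0) == to_bool_then_not(0));	/* agree for 0 */
	assert(not_then_to_bool(-1) == to_bool_then_not(-1));	/* agree for -1 */
	assert(not_then_to_bool(1) != to_bool_then_not(1));	/* differ otherwise */
	assert(char_forms_agree(0x1234));
}
```

The two bool forms only agree for 0 and -1, while the char forms agree
for every input, which is why the rewrite is safe for 'true' integer
types only.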

It's easy enough to add a test to only allow this on 'true' integer types
like done in the patch.

However, the same problem also exists for the optimization of a
sequence of two implied casts (only that this is rare and I haven't
succeeded in reproducing a case where it is wrong). For example,
a conversion of an integer to a float can lose precision or overflow,
so a conversion like 'int -> float -> int' is not always a no-op.
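
A small sketch of such a round trip, assuming a 32-bit int and an IEEE-754
binary32 float (the helper name is hypothetical and this is plain C, not
sparse code):

```c
#include <assert.h>

/*
 * Convert an int to float and back. The volatile store forces the value
 * to be rounded to float precision before it is converted back.
 */
static int roundtrip_via_float(int i)
{
	volatile float f = (float)i;
	return (int)f;
}

static void check_roundtrip(void)
{
	/* Small values survive the round trip. */
	assert(roundtrip_via_float(123) == 123);
	/* 2^24 + 1 needs 25 significant bits; a binary32 float only has 24. */
	assert(roundtrip_via_float(16777217) != 16777217);
}
```

Values close to INT_MAX are worse: with round-to-nearest they become 2^31
as a float, and converting that back to int is undefined behaviour.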

Again, it's easy enough to add some checks, but I wonder if all this
is really worth it. My gut feeling is that this should not be done at
evaluation time and should only be done after linearization.
I just wonder if this simplification wasn't done for some specific
purpose other than avoiding an allocation and keeping the AST small?

-- Luc
---
 evaluate.c                       |  2 +-
 validation/eval/not-cast-bool.c  | 14 ++++++++++++++
 validation/eval/not-cast-float.c | 14 ++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 validation/eval/not-cast-bool.c
 create mode 100644 validation/eval/not-cast-float.c

Comments

Linus Torvalds Jan. 2, 2021, 7:52 p.m. UTC | #1

On Sat, Jan 2, 2021 at 2:09 AM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> It's easy enough to add a test to only allow this on 'true' integer types
> like done in the patch.

We already linearize casts to different ops for the different cases
(integer/pointer/fp/bool). So yeah, we should do it at that level.

> Again it's easy enough to add some checks but I wonder if all this
> is really worth it.

I agree. Remove it from the evaluation phase, and add it to the
linearized SSA format instead.

Note that even on RISC architectures that don't even _have_ truncated
operations (ie an "add" or "not" is always done on the whole
register), doing the truncation to minimal bits is worth it, because
it ends up being just an "upper bits don't matter" and will remove a
_lot_ of sign-extension or zero-extension crud.
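
A minimal illustration of the "upper bits don't matter" property, in plain
C with hypothetical helper names (unsigned types are used so that the
wrap-around is well defined):

```c
#include <assert.h>
#include <stdint.h>

/* Do the operation on the whole register, then keep the low 8 bits. */
static uint8_t add_then_truncate(uint32_t a, uint32_t b)
{
	return (uint8_t)(a + b);
}

static uint8_t not_then_truncate(uint32_t a)
{
	return (uint8_t)~a;
}

/* Drop the upper bits of the operands first, then do the operation. */
static uint8_t truncate_then_add(uint32_t a, uint32_t b)
{
	return (uint8_t)((uint8_t)a + (uint8_t)b);
}

static uint8_t truncate_then_not(uint32_t a)
{
	return (uint8_t)~(uint8_t)a;
}

static void check_low_bits(void)
{
	uint32_t a = 0x12345678, b = 0x9abcdef0;

	/* The low byte of the result only depends on the low bytes of the inputs. */
	assert(add_then_truncate(a, b) == truncate_then_add(a, b));
	assert(not_then_truncate(a) == truncate_then_not(a));
}
```

Since both orders give the same low bits, any extension of the operands
that only feeds such a truncated result is dead.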

So I think the simplification should be done, but I agree it's done in
the wrong place.

Once you do it at the instruction simplification level, all of the
confusion goes away, because a "cast to floating point" is a
fundamentally different operation than "zero-extend integer" at that
level.
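
As a rough sketch of why this is less confusing at that level: once every
kind of cast has its own opcode, the legality check becomes a test on the
opcode instead of a test on types. The enum and the function below are
hypothetical and only illustrate the idea; they are not sparse's actual
opcodes or API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cast opcodes, one per kind of conversion. */
enum cast_op {
	CAST_TRUNC,		/* integer -> narrower integer */
	CAST_ZEXT,		/* integer -> wider integer, zero-extended */
	CAST_INT_TO_FP,		/* integer -> floating point */
	CAST_FP_TO_INT,		/* floating point -> integer */
	CAST_TO_BOOL,		/* anything -> bool (compare against zero) */
};

/*
 * May "cast(~x)" be rewritten as "~cast(x)"?
 * This sketch is deliberately conservative and only accepts integer
 * truncation, the case the evaluation-time code was aiming for.
 */
static bool can_move_cast_below_not(enum cast_op op)
{
	return op == CAST_TRUNC;
}

static void check_cast_ops(void)
{
	assert(can_move_cast_below_not(CAST_TRUNC));
	assert(!can_move_cast_below_not(CAST_TO_BOOL));
	assert(!can_move_cast_below_not(CAST_INT_TO_FP));
}
```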

            Linus

diff --git a/evaluate.c b/evaluate.c
index 41871e18503a..843b56a0386a 100644
--- a/evaluate.c
+++ b/evaluate.c
@@ -337,7 +337,7 @@  static struct expression * cast_to(struct expression *old, struct symbol *type)
 	case EXPR_PREOP:
 		if (old->ctype->bit_size < type->bit_size)
 			break;
-		if (old->op == '~') {
+		if (old->op == '~' && is_int_type(type) && !is_bool_type(type)) {
 			old->ctype = type;
 			old->unop = cast_to(old->unop, type);
 			return old;
diff --git a/validation/eval/not-cast-bool.c b/validation/eval/not-cast-bool.c
new file mode 100644
index 000000000000..acd8bbf293db
--- /dev/null
+++ b/validation/eval/not-cast-bool.c
@@ -0,0 +1,14 @@ 
+static _Bool foo(void)
+{
+	unsigned char c = 1;
+	_Bool b = ~c;
+	return b;
+}
+
+/*
+ * check-name: not-cast-bool
+ * check-command: test-linearize -Wno-decl $file
+ *
+ * check-output-ignore
+ * check-output-returns: 1
+ */
diff --git a/validation/eval/not-cast-float.c b/validation/eval/not-cast-float.c
new file mode 100644
index 000000000000..d474d69bdda3
--- /dev/null
+++ b/validation/eval/not-cast-float.c
@@ -0,0 +1,14 @@ 
+static int foo(void)
+{
+	int i = 123;
+	float x = ~i;
+	return (x < 0);
+}
+
+/*
+ * check-name: not-cast-float
+ * check-command: test-linearize -Wno-decl $file
+ *
+ * check-output-ignore
+ * check-output-returns: 1
+ */
