diff mbox series

[v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

Message ID 20200511215720.303181-1-Jason@zx2c4.com (mailing list archive)
State New, archived
Series [v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

Commit Message

Jason A. Donenfeld May 11, 2020, 9:57 p.m. UTC
GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Cc: linux-kbuild@vger.kernel.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Cc: hjl.tools@gmail.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
 - [Oleksandr] Remove O3 dependency on ARC.

 init/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
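
As an illustration of the version test this patch adds, here is a minimal C sketch. It assumes that GCC_VERSION encodes the compiler version as major * 10000 + minor * 100 + patchlevel, and that it is 0 when the compiler is not gcc. The names version_code() and default_opt_level() are hypothetical and exist only for this sketch; the real selection is done by Kconfig, not by C code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Assumed encoding: major * 10000 + minor * 100 + patchlevel,
 * so gcc 10.0.0 becomes 100000 and gcc 9.3.0 becomes 90300.
 */
int version_code(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

/*
 * Hypothetical helper mirroring the two "default" lines in the patch:
 * the first matching condition wins, as in a Kconfig choice.
 */
const char *default_opt_level(int gcc_version, bool is_clang)
{
    if (gcc_version >= 100000)
        return "-O3";   /* CC_OPTIMIZE_FOR_PERFORMANCE_O3 */
    if (gcc_version < 100000 || is_clang)
        return "-O2";   /* CC_OPTIMIZE_FOR_PERFORMANCE */
    return "-O2";       /* not reached; kept for completeness */
}
```

Under these assumptions, default_opt_level(version_code(9, 3, 0), false) selects "-O2", while any gcc from 10.0.0 onward selects "-O3". A clang build passes 0 as the version and therefore falls through to "-O2".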

Comments

Linus Torvalds May 12, 2020, 12:04 a.m. UTC | #1
On Mon, May 11, 2020 at 2:57 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

I'm not convinced this is sensible.

-O3 historically does bad things with gcc. Including bad things for
performance. It traditionally makes code larger and often SLOWER.

And I don't mean slower to compile (although that's an issue). I mean
actually generating slower code.

Things like trying to unroll loops etc. make very little sense in the
kernel, where we very seldom have high loop counts for pretty much
anything.

There's a reason -O3 isn't even offered as an option.

Maybe things have changed, and maybe they've improved. But I'd like to
see actual numbers for something like this.

Not inlining as aggressively is not necessarily a bad thing. It can
be, of course. But I've actually also done gcc bugreports about gcc
inlining too much, and generating _worse_ code as a result (i.e.
inlining things that were behind an "if (unlikely())" test, and
causing the likely path to grow a stack frame and stack spills as a
result).

So just "O3 inlines more" is not a valid argument.

              Linus
Linus Torvalds May 12, 2020, 12:09 a.m. UTC | #2
On Mon, May 11, 2020 at 5:04 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bugreports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlinging things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack fram and stack spills as a
> result).

In case people care, the bugzilla case I mentioned is this one:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

with example code on why it's actively wrong to inline.

Obviously, in the kernel, we can fix the obvious cases with "noinline"
and "always_inline", but those take care of the outliers.  Having a
compiler that does reasonably well by default is a good thing, and
that very much includes *not* inlining mindlessly.

                  Linus
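
The pattern described above (a rarely taken branch whose body should stay out of line so the common path stays small) can be sketched in plain C. This is an illustrative userspace example, not kernel code: UNLIKELY, NOINLINE and ALWAYS_INLINE are stand-in macros built on GCC's __builtin_expect and function attributes, and report_error(), is_valid() and process() are hypothetical names.

```c
#include <stdio.h>

/* Stand-in annotations (assumed definitions built on GCC extensions). */
#define UNLIKELY(x)    __builtin_expect(!!(x), 0)
#define NOINLINE       __attribute__((noinline))
#define ALWAYS_INLINE  inline __attribute__((always_inline))

/*
 * Cold path: uses a sizeable stack buffer. Marking it NOINLINE keeps
 * that buffer out of the caller's stack frame.
 */
static NOINLINE int report_error(int code)
{
    char buf[256];
    int n = snprintf(buf, sizeof(buf), "error %d", code);

    return -n;  /* hypothetical convention: negated message length */
}

/* Tiny predicate that we always want folded into the caller. */
static ALWAYS_INLINE int is_valid(int v)
{
    return v >= 0;
}

/* Hot path: the common case only doubles the value. */
int process(int v)
{
    if (UNLIKELY(!is_valid(v)))
        return report_error(v);
    return v * 2;
}
```

If the compiler inlined report_error() into process(), the 256-byte buffer could end up in the frame of process() even when the error branch is never taken, which is the kind of cost described in the bug report. The annotations pin the behavior for this one case; they do not replace sensible compiler defaults.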
Jason A. Donenfeld May 12, 2020, 12:43 a.m. UTC | #3
On Mon, May 11, 2020 at 6:05 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> There's a reason -O3 isn't even offered as an option.
>
> Maybe things have changed, and maybe they've improved. But I'd like to
> see actual numbers for something like this.
>
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bugreports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlinging things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack fram and stack spills as a
> result).
>
> So just "O3 inlines more" is not a valid argument.

Alright. It might be possible to produce some benchmarks, and then
isolate the precise inlining parameter that makes the difference, and
include that for gcc-10. But in that old gcc bug report you argued
against going down the finicky rabbit hole of gcc inlining switches
that seem to change meaning between releases, and I find that
persuasive.

The other possibility would be that -O3 actually isn't as bad as it
used to be and that its codegen is markedly better, with some numbers
to back that up. I'm not presently making that argument and don't have
those numbers, but perhaps others who were interested in this patch
for other reasons do have strong arguments there and want to chime in.
Otherwise, no problem dropping this.
Richard Biener May 12, 2020, 8:44 a.m. UTC | #4
On Mon, 11 May 2020, Linus Torvalds wrote:

> On Mon, May 11, 2020 at 2:57 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > GCC 10 appears to have changed -O2 in order to make compilation time
> > faster when using -flto, seemingly at the expense of performance, in
> > particular with regards to how the inliner works. Since -O3 these days
> > shouldn't have the same set of bugs as 10 years ago, this commit
> > defaults new kernel compiles to -O3 when using gcc >= 10.
> 
> I'm not convinced this is sensible.

Note the real thing that changed for GCC 10 at -O2 is that -O2
now includes -finline-functions which means GCC considers inlining
of functions not marked with 'inline' at -O2.  To counter code-size
growth and tune that back to previous levels the inlining limits
in effect at -O2 have been lowered.

Note this has been done based on analyzing larger C++ code and obviously
not because the kernel would benefit (IIRC kernel folks like 'inline'
to behave as written and thus may rather dislike the change to default
to -finline-functions).

> -O3 historically does bad things with gcc. Including bad things for
> performance. It traditionally makes code larger and often SLOWER.
> 
> And I don't mean slower to compile (although that's an issue). I mean
> actually generating slower code.
> 
> Things like trying to unroll loops etc makes very little sense in the
> kernel, where we very seldom have high loop counts for pretty much
> anything.
> 
> There's a reason -O3 isn't even offered as an option.

And I think that's completely sensible.  I would not recommend
to use -O3 for the kernel.  Somehow feeding back profile data
might help - though getting such data at all and with enough
coverage is probably hard.

As you said in the follow-up, I wouldn't recommend tweaking GCC's
defaults for the various --param values affecting inlining.  Their
behavior is not consistent across releases.

Richard.

> Maybe things have changed, and maybe they've improved. But I'd like to
> see actual numbers for something like this.
> 
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bugreports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlinging things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack fram and stack spills as a
> result).
> 
> So just "O3 inlines more" is not a valid argument.
> 
>               Linus
>

Patch

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..f76ec3ccc883 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@  config BOOT_CONFIG
 
 choice
 	prompt "Compiler optimization level"
-	default CC_OPTIMIZE_FOR_PERFORMANCE
+	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 100000
+	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 100000 || CC_IS_CLANG)
 
 config CC_OPTIMIZE_FOR_PERFORMANCE
 	bool "Optimize for performance (-O2)"
@@ -1256,7 +1257,6 @@  config CC_OPTIMIZE_FOR_PERFORMANCE
 
 config CC_OPTIMIZE_FOR_PERFORMANCE_O3
 	bool "Optimize more for performance (-O3)"
-	depends on ARC
 	imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
 	help
 	  Choosing this option will pass "-O3" to your compiler to optimize