lto: Add __noreorder and mark initcalls __noreorder

Message ID 1428499058-8322-1-git-send-email-andi@firstfloor.org (mailing list archive)
State New, archived

Commit Message

Andi Kleen April 8, 2015, 1:17 p.m. UTC
From: Andi Kleen <ak@linux.intel.com>

gcc 5 has a new no_reorder attribute that prevents top level
reordering only for that symbol.

Kernels don't like any reordering of initcalls between files, as several
initcalls depend on each other. LTO previously needed to use
-fno-toplevel-reordering to prevent boot failures.

Add a __noreorder wrapper for the no_reorder attribute and use
it for initcalls.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/compiler-gcc5.h | 3 +++
 include/linux/compiler.h      | 4 ++++
 include/linux/init.h          | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

Comments

Andrew Morton April 8, 2015, 10:31 p.m. UTC | #1
On Wed,  8 Apr 2015 06:17:38 -0700 Andi Kleen <andi@firstfloor.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> gcc 5 has a new no_reorder attribute that prevents top level
> reordering only for that symbol.

I'm having trouble locating gcc documentation which explains all this
stuff.

> Kernels don't like any reordering of initcalls between files, as several
> initcalls depend on each other. LTO previously needed to use
> -fno-toplevel-reordering to prevent boot failures.

That's "-fno-toplevel-reorder", I believe?

> Add a __noreorder wrapper for the no_reorder attribute and use
> it for initcalls.

Head is spinning a bit.  As this all appears to be shiny new
added-by-andi gcc functionality, it would be useful if we could have a
few more words describing what it's all about.  Reordering of what with
respect to what and why and why is it bad.  Why is gcc reordering
things anyway, and what's the downside of preventing this.  Why is the
compiler reordering things rather than the linker. etc etc etc.

Please gently educate us ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andi Kleen April 8, 2015, 11:50 p.m. UTC | #2
Hi Andrew,

On Wed, Apr 08, 2015 at 03:31:12PM -0700, Andrew Morton wrote:
> On Wed,  8 Apr 2015 06:17:38 -0700 Andi Kleen <andi@firstfloor.org> wrote:
> 
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > gcc 5 has a new no_reorder attribute that prevents top level
> > reordering only for that symbol.
> 
> I'm having trouble locating gcc documentation which explains all this
> stuff.

The official manuals only have released versions, and gcc 5 is not
released yet, but it's here:

https://github.com/gcc-mirror/gcc/blob/master/gcc/doc/extend.texi#L3505

> > Kernels don't like any reordering of initcalls between files, as several
> > initcalls depend on each other. LTO previously needed to use
> > -fno-toplevel-reordering to prevent boot failures.
> 
> That's "-fno-toplevel-reorder", I believe?

Yes.

> 
> > Add a __noreorder wrapper for the no_reorder attribute and use
> > it for initcalls.
> 
> Head is spinning a bit.  As this all appears to be shiny new
> added-by-andi gcc functionality, it would be useful if we could have a
> few more words describing what it's all about.  Reordering of what with
> respect to what and why and why is it bad.  Why is gcc reordering
> things anyway, and what's the downside of preventing this.  Why is the
> compiler reordering things rather than the linker. etc etc etc.

Ok, let me try.

Originally, gcc compiled one function at a time: it read one function,
optimized it and wrote it out, then moved on to the next. Then gcc 3.x
added unit-at-a-time, where it reads one complete file, optimizes it
completely and writes it out. This has the advantage that it can make
better inlining decisions, it can remove unused statics, it can propagate
execution frequencies over the call tree before optimizing, and some
other things. Then it writes out the unit in call tree order,
which can also lead to better executable layout.  One side effect of
this is that the order of top level statements gets lost, unless you
specify -fno-toplevel-reorder.

We had to fix Linux for this sometime around late 2.4 or early 2.6. Most
problems were in top level asm() statements that assumed a defined
order relative to other variables. To still support programs doing that,
gcc added -fno-toplevel-reorder, which avoids such reordering but also
disables a small number of optimizations.

Then gcc 4.x added LTO, which takes unit-at-a-time one step further and
optimizes the complete program in the same way at link time. It does not
actually keep the whole program in memory all the time, but uses various
tricks to only look at it in pieces and distribute the work to multiple
cores. To do
that it uses partitioning, where the program is split into different
partitions based on its global call tree, and then each partition is
assigned to a compiler process.  The result is a changed order for
everything in the final program.

Modern Linux is generally fine with reordering, except for initcalls. We have
a lot of initcalls that assume that some other initcalls already ran
before them, without using priorities. The order is defined by the
Makefile's object file order for the linker. Linkers generally do not
reorder, unless told to. Unfortunately that order gets lost with LTO.
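
The implicit dependency described above can be modeled in plain C. This
is only a sketch: bus_init(), driver_init(), the two tables and
run_initcalls() are hypothetical names invented for illustration, and an
ordinary array stands in for the linker-ordered .initcall sections.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical state set by one initcall and required by another. */
static int bus_ready;

static int bus_init(void)
{
	bus_ready = 1;
	return 0;
}

static int driver_init(void)
{
	/* Implicit dependency: fails unless bus_init() already ran. */
	return bus_ready ? 0 : -1;
}

/* Order as the Makefile's object list would hand it to the linker. */
initcall_t link_order[] = { bus_init, driver_init };

/* The same entries after a hypothetical reordering. */
initcall_t reordered[] = { driver_init, bus_init };

/* Run every entry in table order and return the number of failures. */
int run_initcalls(initcall_t *table, size_t n)
{
	int failures = 0;
	size_t i;

	assert(table != NULL);
	bus_ready = 0;	/* start each run from a clean state */
	for (i = 0; i < n; i++)
		if (table[i]() != 0)
			failures++;
	return failures;
}
```

Running link_order gives no failures, while reordered makes
driver_init() fail even though neither function changed. Nothing in the
code states the dependency, which is why such cases are hard to find.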

When I started the LTO patchkit I tried to debug and fix some of these
initcalls, but it was hopeless. It was like a many-headed hydra.
So I needed to use -fno-toplevel-reorder for LTO. In LTO this both
gives worse partitioning (so the build is less balanced between
different cores) and also disables some optimizations, like eliminating
unused variables or some cross file optimizations.

gcc 5 finally gained a way to request the no-toplevel-reorder behavior
per symbol, through this new attribute. So it can be applied only to the
initcall symbols, with everything else left alone.
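
As a rough userspace illustration of marking individual symbols, the
sketch below applies no_reorder to two function pointer variables.
Detecting the attribute with __has_attribute is an assumption made for
this example (the patch instead defines __noreorder in compiler-gcc5.h),
and first_init(), second_init() and call_entries() are hypothetical
names. Without LTO the attribute has no visible effect here, so this
only shows the syntax and the empty fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: probe for the attribute instead of checking the gcc version. */
#if defined(__has_attribute)
# if __has_attribute(no_reorder)
#  define __noreorder __attribute__((no_reorder))
# endif
#endif
#ifndef __noreorder
# define __noreorder	/* unimplemented */
#endif

typedef int (*initcall_t)(void);

static int calls;

static int first_init(void)  { return ++calls; }
static int second_init(void) { return ++calls; }

/* Only these two symbols are pinned; everything else may still move. */
static initcall_t first_entry  __attribute__((used)) __noreorder = first_init;
static initcall_t second_entry __attribute__((used)) __noreorder = second_init;

/* Call both entries in source order and encode the results as two digits. */
int call_entries(void)
{
	int a, b;

	assert(first_entry != NULL && second_entry != NULL);
	calls = 0;
	a = first_entry();
	b = second_entry();
	return a * 10 + b;
}
```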

That is what this patch is about.

It's not needed without LTO, but I believe it's useful documentation even
without it.

-Andi
Andrew Morton April 10, 2015, 9:36 p.m. UTC | #3
On Thu, 9 Apr 2015 01:50:23 +0200 Andi Kleen <andi@firstfloor.org> wrote:

> > Head is spinning a bit.  As this all appears to be shiny new
> > added-by-andi gcc functionality, it would be useful if we could have a
> > few more words describing what it's all about.  Reordering of what with
> > respect to what and why and why is it bad.  Why is gcc reordering
> > things anyway, and what's the downside of preventing this.  Why is the
> > compiler reordering things rather than the linker. etc etc etc.
> 
> Ok, let me try.

That was super-useful, thanks.  I slurped it into the changelog -
maybe one day it will provide material for Documentation/lto-stuff.txt.

Big picture: do you have a feeling for how much benefit LTO will yield
in the kernel, if/when it's all completed?

Andi Kleen April 10, 2015, 11:02 p.m. UTC | #4
On Fri, Apr 10, 2015 at 02:36:29PM -0700, Andrew Morton wrote:
> On Thu, 9 Apr 2015 01:50:23 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> 
> > > Head is spinning a bit.  As this all appears to be shiny new
> > > added-by-andi gcc functionality, it would be useful if we could have a
> > > few more words describing what it's all about.  Reordering of what with
> > > respect to what and why and why is it bad.  Why is gcc reordering
> > > things anyway, and what's the downside of preventing this.  Why is the
> > > compiler reordering things rather than the linker. etc etc etc.
> > 
> > Ok, let me try.
> 
> That was super-useful, thanks.  I slurped it into the changelog -
> maybe one day it will provide material for Documentation/lto-stuff.txt.
> 
> Big picture: do you have a feeling for how much benefit LTO will yield
> in the kernel, if/when it's all completed?

Nothing I usually run seems to depend much on the kernel compiler for
performance. I think other people may benefit from it.
Just looking at the generated code, it is often a lot better.

We've had great results in code size reduction for small systems though.
I also found a range of bugs in the kernel which is good.

The merge is also nearly finished, with only a small number of patches
left. There are some future technologies which could benefit from it
too.

There is still some compile time penalty, although it got a lot better
with gcc 5. I wouldn't expect developers to use it day-to-day, but it can
be a good release mode.

I think it's a good thing to have now, just for the benefits for
shrinking kernels.

-Andi

Patch

diff --git a/include/linux/compiler-gcc5.h b/include/linux/compiler-gcc5.h
index efee493..9004c00 100644
--- a/include/linux/compiler-gcc5.h
+++ b/include/linux/compiler-gcc5.h
@@ -42,6 +42,9 @@ 
 /* Mark a function definition as prohibited from being cloned. */
 #define __noclone	__attribute__((__noclone__))
 
+/* Avoid reordering a top level statement */
+#define __noreorder	__attribute__((no_reorder))
+
 /*
  * Tell the optimizer that something else uses this function or variable.
  */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 1b45e4a..ac639a1 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -334,6 +334,10 @@  static __always_inline void __write_once_size(volatile void *p, void *res, int s
 #define noinline
 #endif
 
+#ifndef __noreorder
+#define __noreorder		/* unimplemented */
+#endif
+
 /*
  * Rather then using noinline to prevent stack consumption, use
  * noinline_for_stack instead.  For documentation reasons.
diff --git a/include/linux/init.h b/include/linux/init.h
index 2df8e8d..a0a1244 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -191,7 +191,7 @@  extern bool initcall_debug;
  */
 
 #define __define_initcall(fn, id) \
-	static initcall_t __initcall_##fn##id __used \
+	static initcall_t __initcall_##fn##id __used __noreorder \
 	__attribute__((__section__(".initcall" #id ".init"))) = fn; \
 	LTO_REFERENCE_INITCALL(__initcall_##fn##id)
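
To see what the changed macro expands to, here is a sketch that builds
outside the kernel tree. Several parts are assumptions made for this
example: the fallback definitions of __used and __noreorder, an empty
LTO_REFERENCE_INITCALL (its real definition is not part of this patch),
the __initcall_section() helper that limits the section attribute to ELF
targets, and the demo_init() and run_demo_initcall() names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Assumed fallbacks so the sketch compiles without kernel headers. */
#ifndef __used
#define __used __attribute__((__used__))
#endif
#ifndef __noreorder
#define __noreorder	/* unimplemented, like the compiler.h fallback */
#endif
#define LTO_REFERENCE_INITCALL(x)	/* assumed empty for a non-LTO build */

/* Hypothetical helper: section names in this form are only valid on ELF. */
#ifdef __ELF__
#define __initcall_section(id) \
	__attribute__((__section__(".initcall" #id ".init")))
#else
#define __initcall_section(id)
#endif

#define __define_initcall(fn, id) \
	static initcall_t __initcall_##fn##id __used __noreorder \
	__initcall_section(id) = fn; \
	LTO_REFERENCE_INITCALL(__initcall_##fn##id)

static int demo_init(void)
{
	return 0;
}

/* Expands to a variable named __initcall_demo_init6 that points at demo_init. */
__define_initcall(demo_init, 6)

/* Call the initcall through the generated table entry. */
int run_demo_initcall(void)
{
	assert(__initcall_demo_init6 == demo_init);
	return __initcall_demo_init6();
}
```

With a compiler that lacks the attribute, __noreorder expands to nothing
and the entry is declared exactly as before the patch.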