Message ID | 20200630173734.14057-19-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Allow architectures to override __READ_ONCE() | expand |
On Tue, Jun 30, 2020 at 7:39 PM Will Deacon <will@kernel.org> wrote: > +#define __READ_ONCE(x) \ > +({ \ > + int atomic = 1; \ > + union { __unqual_scalar_typeof(x) __val; char __c[1]; } __u; \ > + typeof(&(x)) __x = &(x); \ > + switch (sizeof(x)) { \ ... > + atomic ? (typeof(x))__u.__val : (*(volatile typeof(x) *)__x); \ > +}) This expands (x) nine times (five in __unqual_scala_typeof()), which can lead to significant code bloat after preprocessing if something passes a compound expression into READ_ONCE(). The compiler works it out eventually, but we've seen an actual slowdown in compile speed from this recently, especially on clang. I think if you move the typeof(&(x)) __x = &(x); line first, all other instances can use typeof(*__x) instead of typeof(x) and avoid this problem. Once we make gcc-4.9 the minimum version, this could be further improved to __auto_type __x = &(x); Arnd
On Tue, 30 Jun 2020 at 19:39, Will Deacon <will@kernel.org> wrote: > > When building with LTO, there is an increased risk of the compiler > converting an address dependency headed by a READ_ONCE() invocation > into a control dependency and consequently allowing for harmful > reordering by the CPU. > > Ensure that such transformations are harmless by overriding the generic > READ_ONCE() definition with one that provides acquire semantics when > building with LTO. > > Signed-off-by: Will Deacon <will@kernel.org> > --- > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > arch/arm64/kernel/vdso/Makefile | 2 +- > arch/arm64/kernel/vdso32/Makefile | 2 +- > 3 files changed, 65 insertions(+), 2 deletions(-) > create mode 100644 arch/arm64/include/asm/rwonce.h This seems reasonable, given we can't realistically tell the compiler about dependent loads. What (if any), is the performance impact? I guess this also heavily depends on the actual silicon. I do wonder, though, if there is some way to make the compiler do something better for us. Clearly, implementing real memory_order_consume hasn't worked out until today. But maybe the compiler could promote dependent loads to acquires if it recognizes it lost dependencies during optimizations. Just thinking out loud, it probably still has some weird corner case that will break. ;-) The other thing is that I'd be cautious blaming LTO, as I tried to summarize here: https://lore.kernel.org/kernel-hardening/20200630191931.GA884155@elver.google.com/ The main thing is that, yes, this might be something to be worried about, but if we are worried about it, we need to be worried about it in *all* builds (LTO or not). My guess is that's not acceptable. Would it be better to just guard the promotion of READ_ONCE() to acquire behind a config option like CONFIG_ACQUIRE_READ_DEPENDENCIES, and then make LTO select that (or maybe leave it optional?). In future, for very aggressive non-LTO compilers even, one may then also select that if there is substantiated worry things do actually break. Thanks, -- Marco
On Tue, Jun 30, 2020 at 09:47:30PM +0200, Marco Elver wrote: > I do wonder, though, if there is some way to make the compiler do > something better for us. Clearly, implementing real > memory_order_consume hasn't worked out until today. But maybe the > compiler could promote dependent loads to acquires if it recognizes it > lost dependencies during optimizations. Just thinking out loud, it > probably still has some weird corner case that will break. ;-) I'd be very hesitant to let the compiler upgrade the ordering for us, specifically because we're not using C11 crud and are using a lot of inline asm.
On Tue, Jun 30, 2020 at 12:47 PM Marco Elver <elver@google.com> wrote: > > On Tue, 30 Jun 2020 at 19:39, Will Deacon <will@kernel.org> wrote: > > > > When building with LTO, there is an increased risk of the compiler > > converting an address dependency headed by a READ_ONCE() invocation > > into a control dependency and consequently allowing for harmful > > reordering by the CPU. > > > > Ensure that such transformations are harmless by overriding the generic > > READ_ONCE() definition with one that provides acquire semantics when > > building with LTO. > > > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > arch/arm64/kernel/vdso/Makefile | 2 +- > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > 3 files changed, 65 insertions(+), 2 deletions(-) > > create mode 100644 arch/arm64/include/asm/rwonce.h > > This seems reasonable, given we can't realistically tell the compiler > about dependent loads. What (if any), is the performance impact? I > guess this also heavily depends on the actual silicon. > > I do wonder, though, if there is some way to make the compiler do > something better for us. Clearly, implementing real > memory_order_consume hasn't worked out until today. But maybe the > compiler could promote dependent loads to acquires if it recognizes it > lost dependencies during optimizations. Just thinking out loud, it > probably still has some weird corner case that will break. ;-) > > The other thing is that I'd be cautious blaming LTO, as I tried to > summarize here: > https://lore.kernel.org/kernel-hardening/20200630191931.GA884155@elver.google.com/ > > The main thing is that, yes, this might be something to be worried > about, but if we are worried about it, we need to be worried about it > in *all* builds (LTO or not). My guess is that's not acceptable. Would > it be better to just guard the promotion of READ_ONCE() to acquire > behind a config option like CONFIG_ACQUIRE_READ_DEPENDENCIES, and then > make LTO select that (or maybe leave it optional?). In future, for > very aggressive non-LTO compilers even, one may then also select that > if there is substantiated worry things do actually break. I agree, a separate config option would be better here. Also Will, the LTO patches use CONFIG_LTO_CLANG instead of CLANG_LTO. Sami
On Tue, Jun 30, 2020 at 09:25:03PM +0200, Arnd Bergmann wrote: > On Tue, Jun 30, 2020 at 7:39 PM Will Deacon <will@kernel.org> wrote: > > +#define __READ_ONCE(x) \ > > +({ \ > > + int atomic = 1; \ > > + union { __unqual_scalar_typeof(x) __val; char __c[1]; } __u; \ > > + typeof(&(x)) __x = &(x); \ > > + switch (sizeof(x)) { \ > ... > > + atomic ? (typeof(x))__u.__val : (*(volatile typeof(x) *)__x); \ > > +}) > > This expands (x) nine times (five in __unqual_scala_typeof()), which can > lead to significant code bloat after preprocessing if something passes a > compound expression into READ_ONCE(). > The compiler works it out eventually, but we've seen an actual slowdown > in compile speed from this recently, especially on clang. > > I think if you move the > > typeof(&(x)) __x = &(x); > > line first, all other instances can use typeof(*__x) instead of typeof(x) > and avoid this problem. Cheers, I was only thinking about side-effects when I wrote this, but bloating built time is very unpopular, so I'll go with your suggestion. > Once we make gcc-4.9 the minimum version, > this could be further improved to > > __auto_type __x = &(x); Is anybody working on moving to 4.9? I've seen the mails from Linus championing it, but I thought there was a RHEL in support that people might care about? Will
On Tue, Jun 30, 2020 at 09:47:30PM +0200, Marco Elver wrote: > On Tue, 30 Jun 2020 at 19:39, Will Deacon <will@kernel.org> wrote: > > > > When building with LTO, there is an increased risk of the compiler > > converting an address dependency headed by a READ_ONCE() invocation > > into a control dependency and consequently allowing for harmful > > reordering by the CPU. > > > > Ensure that such transformations are harmless by overriding the generic > > READ_ONCE() definition with one that provides acquire semantics when > > building with LTO. > > > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > arch/arm64/kernel/vdso/Makefile | 2 +- > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > 3 files changed, 65 insertions(+), 2 deletions(-) > > create mode 100644 arch/arm64/include/asm/rwonce.h > > This seems reasonable, given we can't realistically tell the compiler > about dependent loads. What (if any), is the performance impact? I > guess this also heavily depends on the actual silicon. Right, it depends both on the CPU micro-architecture and also the workload. When we ran some basic tests, the overhead wasn't greater than the benefit seen by enabling LTO, so it seems like a reasonable trade-off (given that LTO is a dependency for CFI, so it's not just about performance). Will
On Tue, Jun 30, 2020 at 03:57:54PM -0700, Sami Tolvanen wrote: > On Tue, Jun 30, 2020 at 12:47 PM Marco Elver <elver@google.com> wrote: > > > > On Tue, 30 Jun 2020 at 19:39, Will Deacon <will@kernel.org> wrote: > > > > > > When building with LTO, there is an increased risk of the compiler > > > converting an address dependency headed by a READ_ONCE() invocation > > > into a control dependency and consequently allowing for harmful > > > reordering by the CPU. > > > > > > Ensure that such transformations are harmless by overriding the generic > > > READ_ONCE() definition with one that provides acquire semantics when > > > building with LTO. > > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > --- > > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > > arch/arm64/kernel/vdso/Makefile | 2 +- > > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > > 3 files changed, 65 insertions(+), 2 deletions(-) > > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > This seems reasonable, given we can't realistically tell the compiler > > about dependent loads. What (if any), is the performance impact? I > > guess this also heavily depends on the actual silicon. > > > > I do wonder, though, if there is some way to make the compiler do > > something better for us. Clearly, implementing real > > memory_order_consume hasn't worked out until today. But maybe the > > compiler could promote dependent loads to acquires if it recognizes it > > lost dependencies during optimizations. Just thinking out loud, it > > probably still has some weird corner case that will break. ;-) > > > > The other thing is that I'd be cautious blaming LTO, as I tried to > > summarize here: > > https://lore.kernel.org/kernel-hardening/20200630191931.GA884155@elver.google.com/ > > > > The main thing is that, yes, this might be something to be worried > > about, but if we are worried about it, we need to be worried about it > > in *all* builds (LTO or not). My guess is that's not acceptable. Would > > it be better to just guard the promotion of READ_ONCE() to acquire > > behind a config option like CONFIG_ACQUIRE_READ_DEPENDENCIES, and then > > make LTO select that (or maybe leave it optional?). In future, for > > very aggressive non-LTO compilers even, one may then also select that > > if there is substantiated worry things do actually break. > > I agree, a separate config option would be better here. > > Also Will, the LTO patches use CONFIG_LTO_CLANG instead of CLANG_LTO. D'oh, sorry. I'll fix that (I had that #ifdef commented out for my testing). Will
On Wed, Jul 1, 2020 at 12:19 PM Will Deacon <will@kernel.org> wrote: > On Tue, Jun 30, 2020 at 09:25:03PM +0200, Arnd Bergmann wrote: > > On Tue, Jun 30, 2020 at 7:39 PM Will Deacon <will@kernel.org> wrote: > > Once we make gcc-4.9 the minimum version, > > this could be further improved to > > > > __auto_type __x = &(x); > > Is anybody working on moving to 4.9? I've seen the mails from Linus > championing it, but I thought there was a RHEL in support that people > might care about? I don't think there was a serious discussion about it so far, and we only just moved to gcc-4.8. I think moving to gnu11 (gcc-4.9 or clang) instead of gnu99 has other benefits as well, so we may well want to do it anyway when something else comes up. For __auto_type(), we could do it like #if (clang or gcc-4.9+) #define auto_typeof(x) __auto_type #else #define auto_typeof(x) typeof(x) #endif which could be used in a lot of macros. Arnd
On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > When building with LTO, there is an increased risk of the compiler > converting an address dependency headed by a READ_ONCE() invocation > into a control dependency and consequently allowing for harmful > reordering by the CPU. > > Ensure that such transformations are harmless by overriding the generic > READ_ONCE() definition with one that provides acquire semantics when > building with LTO. > > Signed-off-by: Will Deacon <will@kernel.org> > --- > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > arch/arm64/kernel/vdso/Makefile | 2 +- > arch/arm64/kernel/vdso32/Makefile | 2 +- > 3 files changed, 65 insertions(+), 2 deletions(-) > create mode 100644 arch/arm64/include/asm/rwonce.h > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > new file mode 100644 > index 000000000000..515e360b01a1 > --- /dev/null > +++ b/arch/arm64/include/asm/rwonce.h > @@ -0,0 +1,63 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2020 Google LLC. > + */ > +#ifndef __ASM_RWONCE_H > +#define __ASM_RWONCE_H > + > +#ifdef CONFIG_CLANG_LTO Don't we have a generic option for LTO that's not specific to Clang. Also, can you illustrate code that can only be unsafe with Clang LTO? [...] Cheers ---Dave
On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > When building with LTO, there is an increased risk of the compiler > > converting an address dependency headed by a READ_ONCE() invocation > > into a control dependency and consequently allowing for harmful > > reordering by the CPU. > > > > Ensure that such transformations are harmless by overriding the generic > > READ_ONCE() definition with one that provides acquire semantics when > > building with LTO. > > > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > arch/arm64/kernel/vdso/Makefile | 2 +- > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > 3 files changed, 65 insertions(+), 2 deletions(-) > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > new file mode 100644 > > index 000000000000..515e360b01a1 > > --- /dev/null > > +++ b/arch/arm64/include/asm/rwonce.h > > @@ -0,0 +1,63 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > +/* > > + * Copyright (C) 2020 Google LLC. > > + */ > > +#ifndef __ASM_RWONCE_H > > +#define __ASM_RWONCE_H > > + > > +#ifdef CONFIG_CLANG_LTO > > Don't we have a generic option for LTO that's not specific to Clang. /me looks at the LTO series some more Oh yeah, there's CONFIG_LTO which is selected by CONFIG_LTO_CLANG, which is the non-typoed version of the above. I can switch this to CONFIG_LTO. > Also, can you illustrate code that can only be unsafe with Clang LTO? I don't have a concrete example, but it's an ongoing concern over on the LTO thread [1], so I cooked this to show one way we could deal with it. The main concern is that the whole-program optimisations enabled by LTO may allow the compiler to enumerate possible values for a pointer at link time and replace an address dependency between two loads with a control dependency instead, defeating the dependency ordering within the CPU. We likely won't realise if/when this goes wrong, other than impossible to debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be able to detect this sort of thing happening at build time, and perhaps even prevent it with compiler options or annotations, but none of that is close to being available and I'm keen to progress the LTO patches in the meantime because they are a requirement for CFI. Will [1] https://lore.kernel.org/r/20200624203200.78870-1-samitolvanen@google.com
On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > > When building with LTO, there is an increased risk of the compiler > > > converting an address dependency headed by a READ_ONCE() invocation > > > into a control dependency and consequently allowing for harmful > > > reordering by the CPU. > > > > > > Ensure that such transformations are harmless by overriding the generic > > > READ_ONCE() definition with one that provides acquire semantics when > > > building with LTO. > > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > --- > > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > > arch/arm64/kernel/vdso/Makefile | 2 +- > > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > > 3 files changed, 65 insertions(+), 2 deletions(-) > > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > new file mode 100644 > > > index 000000000000..515e360b01a1 > > > --- /dev/null > > > +++ b/arch/arm64/include/asm/rwonce.h > > > @@ -0,0 +1,63 @@ > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > +/* > > > + * Copyright (C) 2020 Google LLC. > > > + */ > > > +#ifndef __ASM_RWONCE_H > > > +#define __ASM_RWONCE_H > > > + > > > +#ifdef CONFIG_CLANG_LTO > > > > Don't we have a generic option for LTO that's not specific to Clang. > > /me looks at the LTO series some more > > Oh yeah, there's CONFIG_LTO which is selected by CONFIG_LTO_CLANG, which is > the non-typoed version of the above. I can switch this to CONFIG_LTO. > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > I don't have a concrete example, but it's an ongoing concern over on the LTO > thread [1], so I cooked this to show one way we could deal with it. The main > concern is that the whole-program optimisations enabled by LTO may allow the > compiler to enumerate possible values for a pointer at link time and replace > an address dependency between two loads with a control dependency instead, > defeating the dependency ordering within the CPU. Why can't that happen without LTO? > We likely won't realise if/when this goes wrong, other than impossible to > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > able to detect this sort of thing happening at build time, and perhaps > even prevent it with compiler options or annotations, but none of that is > close to being available and I'm keen to progress the LTO patches in the > meantime because they are a requirement for CFI. My concern was not so much why LTO makes things dangerous, as why !LTO makes things safe... Cheers ---Dave
On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > When building with LTO, there is an increased risk of the compiler > converting an address dependency headed by a READ_ONCE() invocation > into a control dependency and consequently allowing for harmful > reordering by the CPU. > > Ensure that such transformations are harmless by overriding the generic > READ_ONCE() definition with one that provides acquire semantics when > building with LTO. > > Signed-off-by: Will Deacon <will@kernel.org> > --- > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > arch/arm64/kernel/vdso/Makefile | 2 +- > arch/arm64/kernel/vdso32/Makefile | 2 +- > 3 files changed, 65 insertions(+), 2 deletions(-) > create mode 100644 arch/arm64/include/asm/rwonce.h > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > new file mode 100644 > index 000000000000..515e360b01a1 > --- /dev/null > +++ b/arch/arm64/include/asm/rwonce.h > @@ -0,0 +1,63 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2020 Google LLC. > + */ > +#ifndef __ASM_RWONCE_H > +#define __ASM_RWONCE_H > + > +#ifdef CONFIG_CLANG_LTO > + > +#include <linux/compiler_types.h> > +#include <asm/alternative-macros.h> > + > +#ifndef BUILD_VDSO > + > +#ifdef CONFIG_AS_HAS_LDAPR > +#define __LOAD_RCPC(sfx, regs...) \ > + ALTERNATIVE( \ > + "ldar" #sfx "\t" #regs, \ ^ Should this be here? It seems that READ_ONCE() will actually read twice... even if that doesn't actually conflict with the required semantics of READ_ONCE(), it looks odd. Making a direct link between LTO and the memory model also seems highly spurious (as discussed in the other subthread) so can we have a comment explaining the reasoning? > + ".arch_extension rcpc\n" \ > + "ldapr" #sfx "\t" #regs, \ > + ARM64_HAS_LDAPR) > +#else > +#define __LOAD_RCPC(sfx, regs...) "ldar" #sfx "\t" #regs > +#endif /* CONFIG_AS_HAS_LDAPR */ [...] Cheers ---Dave
On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > > > When building with LTO, there is an increased risk of the compiler > > > > converting an address dependency headed by a READ_ONCE() invocation > > > > into a control dependency and consequently allowing for harmful > > > > reordering by the CPU. > > > > > > > > Ensure that such transformations are harmless by overriding the generic > > > > READ_ONCE() definition with one that provides acquire semantics when > > > > building with LTO. > > > > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > --- > > > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > > > arch/arm64/kernel/vdso/Makefile | 2 +- > > > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > > > 3 files changed, 65 insertions(+), 2 deletions(-) > > > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > > new file mode 100644 > > > > index 000000000000..515e360b01a1 > > > > --- /dev/null > > > > +++ b/arch/arm64/include/asm/rwonce.h > > > > @@ -0,0 +1,63 @@ > > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > > +/* > > > > + * Copyright (C) 2020 Google LLC. > > > > + */ > > > > +#ifndef __ASM_RWONCE_H > > > > +#define __ASM_RWONCE_H > > > > + > > > > +#ifdef CONFIG_CLANG_LTO > > > > > > Don't we have a generic option for LTO that's not specific to Clang. > > > > /me looks at the LTO series some more > > > > Oh yeah, there's CONFIG_LTO which is selected by CONFIG_LTO_CLANG, which is > > the non-typoed version of the above. I can switch this to CONFIG_LTO. > > > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > thread [1], so I cooked this to show one way we could deal with it. The main > > concern is that the whole-program optimisations enabled by LTO may allow the > > compiler to enumerate possible values for a pointer at link time and replace > > an address dependency between two loads with a control dependency instead, > > defeating the dependency ordering within the CPU. > > Why can't that happen without LTO? Because without LTO, the compiler cannot see all the pointers all at the same time due to their being in different translation units. But yes, if the compiler could see all the pointer values and further -know- that it was seeing all the pointer values, these optimizations could happen even without LTO. But it is quite easy to make sure that the compiler thinks that there are additional pointer values that it does not know about. > > We likely won't realise if/when this goes wrong, other than impossible to > > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > > able to detect this sort of thing happening at build time, and perhaps > > even prevent it with compiler options or annotations, but none of that is > > close to being available and I'm keen to progress the LTO patches in the > > meantime because they are a requirement for CFI. > > My concern was not so much why LTO makes things dangerous, as why !LTO > makes things safe... Because ignorant compilers are safe compilers! ;-) Thanx, Paul
On Mon, Jul 06, 2020 at 09:34:55AM -0700, Paul E. McKenney wrote: > On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > > > > When building with LTO, there is an increased risk of the compiler > > > > > converting an address dependency headed by a READ_ONCE() invocation > > > > > into a control dependency and consequently allowing for harmful > > > > > reordering by the CPU. > > > > > > > > > > Ensure that such transformations are harmless by overriding the generic > > > > > READ_ONCE() definition with one that provides acquire semantics when > > > > > building with LTO. > > > > > > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > > --- > > > > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > > > > arch/arm64/kernel/vdso/Makefile | 2 +- > > > > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > > > > 3 files changed, 65 insertions(+), 2 deletions(-) > > > > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > > > > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > > > new file mode 100644 > > > > > index 000000000000..515e360b01a1 > > > > > --- /dev/null > > > > > +++ b/arch/arm64/include/asm/rwonce.h > > > > > @@ -0,0 +1,63 @@ > > > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > > > +/* > > > > > + * Copyright (C) 2020 Google LLC. > > > > > + */ > > > > > +#ifndef __ASM_RWONCE_H > > > > > +#define __ASM_RWONCE_H > > > > > + > > > > > +#ifdef CONFIG_CLANG_LTO > > > > > > > > Don't we have a generic option for LTO that's not specific to Clang. > > > > > > /me looks at the LTO series some more > > > > > > Oh yeah, there's CONFIG_LTO which is selected by CONFIG_LTO_CLANG, which is > > > the non-typoed version of the above. I can switch this to CONFIG_LTO. > > > > > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > > thread [1], so I cooked this to show one way we could deal with it. The main > > > concern is that the whole-program optimisations enabled by LTO may allow the > > > compiler to enumerate possible values for a pointer at link time and replace > > > an address dependency between two loads with a control dependency instead, > > > defeating the dependency ordering within the CPU. > > > > Why can't that happen without LTO? > > Because without LTO, the compiler cannot see all the pointers all at > the same time due to their being in different translation units. > > But yes, if the compiler could see all the pointer values and further > -know- that it was seeing all the pointer values, these optimizations > could happen even without LTO. But it is quite easy to make sure that > the compiler thinks that there are additional pointer values that it > does not know about. Yes of course, but even without LTO the compiler can still apply this optimisation to everything visible in the translation unit, and that can drift as people refactor code over time. Convincing the compiler there are other possible values doesn't help. Even in int foo(int *p) { asm ("" : "+r" (p)); return *p; } Can't the compiler still generate something like this: switch (p) { case &foo: return foo; case &bar: return bar; default: return *p; } ...in which case we still have the same lost ordering guarantee that we were trying to enforce. If foo and bar already happen to be in registers and profiling shows that &foo and &bar are the most likely value of p then this might be a reasonable optimisation in some situations, irrespective of LTO. The underlying problem here seems to be that the necessary ordering rule is not part of what passes for the C memory model prior to C11. If we want to control the data flow, don't we have to wrap the entire dereference in a macro? > > > We likely won't realise if/when this goes wrong, other than impossible to > > > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > > > able to detect this sort of thing happening at build time, and perhaps > > > even prevent it with compiler options or annotations, but none of that is > > > close to being available and I'm keen to progress the LTO patches in the > > > meantime because they are a requirement for CFI. > > > > My concern was not so much why LTO makes things dangerous, as why !LTO > > makes things safe... > > Because ignorant compilers are safe compilers! ;-) AFAICT ignorance is no gurantee of ordering in general -- the compiler is free to speculatively invent knowledge any place that the language spec allows it to. !LTO doesn't stop this happening. Hopefully some of the knowledge I invented in my reply is valid... Cheers ---Dave
On Mon, Jul 06, 2020 at 06:05:57PM +0100, Dave Martin wrote: > On Mon, Jul 06, 2020 at 09:34:55AM -0700, Paul E. McKenney wrote: > > On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > > > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > > > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > > > > > When building with LTO, there is an increased risk of the compiler > > > > > > converting an address dependency headed by a READ_ONCE() invocation > > > > > > into a control dependency and consequently allowing for harmful > > > > > > reordering by the CPU. > > > > > > > > > > > > Ensure that such transformations are harmless by overriding the generic > > > > > > READ_ONCE() definition with one that provides acquire semantics when > > > > > > building with LTO. > > > > > > > > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > > > --- > > > > > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > > > > > arch/arm64/kernel/vdso/Makefile | 2 +- > > > > > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > > > > > 3 files changed, 65 insertions(+), 2 deletions(-) > > > > > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > > > > > > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > > > > new file mode 100644 > > > > > > index 000000000000..515e360b01a1 > > > > > > --- /dev/null > > > > > > +++ b/arch/arm64/include/asm/rwonce.h > > > > > > @@ -0,0 +1,63 @@ > > > > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > > > > +/* > > > > > > + * Copyright (C) 2020 Google LLC. > > > > > > + */ > > > > > > +#ifndef __ASM_RWONCE_H > > > > > > +#define __ASM_RWONCE_H > > > > > > + > > > > > > +#ifdef CONFIG_CLANG_LTO > > > > > > > > > > Don't we have a generic option for LTO that's not specific to Clang. > > > > > > > > /me looks at the LTO series some more > > > > > > > > Oh yeah, there's CONFIG_LTO which is selected by CONFIG_LTO_CLANG, which is > > > > the non-typoed version of the above. I can switch this to CONFIG_LTO. > > > > > > > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > > > thread [1], so I cooked this to show one way we could deal with it. The main > > > > concern is that the whole-program optimisations enabled by LTO may allow the > > > > compiler to enumerate possible values for a pointer at link time and replace > > > > an address dependency between two loads with a control dependency instead, > > > > defeating the dependency ordering within the CPU. > > > > > > Why can't that happen without LTO? > > > > Because without LTO, the compiler cannot see all the pointers all at > > the same time due to their being in different translation units. > > > > But yes, if the compiler could see all the pointer values and further > > -know- that it was seeing all the pointer values, these optimizations > > could happen even without LTO. But it is quite easy to make sure that > > the compiler thinks that there are additional pointer values that it > > does not know about. > > Yes of course, but even without LTO the compiler can still apply this > optimisation to everything visible in the translation unit, and that can > drift as people refactor code over time. > > Convincing the compiler there are other possible values doesn't help. > Even in > > int foo(int *p) > { > asm ("" : "+r" (p)); > return *p; > } > > Can't the compiler still generate something like this: > > switch (p) { > case &foo: > return foo; > > case &bar: > return bar; > > default: > return *p; > } > > ...in which case we still have the same lost ordering guarantee that > we were trying to enforce. > > If foo and bar already happen to be in registers and profiling shows > that &foo and &bar are the most likely value of p then this might be > a reasonable optimisation in some situations, irrespective of LTO. Agreed, the additional information from profile-driven optimization can be just as damaging as that from LTO. > The underlying problem here seems to be that the necessary ordering > rule is not part of what passes for the C memory model prior to C11. > If we want to control the data flow, don't we have to wrap the entire > dereference in a macro? Yes, exactly. Because we are relying on things that are not guaranteed by the C memory model, we need to pay attention to the implementations. As I have said elsewhere, the price of control dependencies is eternal vigilance. And this also applies, to a lesser extent, to address and data dependencies, which are also not well supported by the C standard. There is one important case in which the C memory model -does- support control dependencies, and that is when the dependent write is a normal C-language write that is not involved in a data race. In that case, if the compiler broke the control dependency, it might have introduced a data race, which it is forbidden to do. However, this rule can also be broken when the compiler knows too much, as it might be able to prove that breaking the dependency won't introduce a data race. In that case, according to the standard, it is free to break the dependency. > > > > We likely won't realise if/when this goes wrong, other than impossible to > > > > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > > > > able to detect this sort of thing happening at build time, and perhaps > > > > even prevent it with compiler options or annotations, but none of that is > > > > close to being available and I'm keen to progress the LTO patches in the > > > > meantime because they are a requirement for CFI. > > > > > > My concern was not so much why LTO makes things dangerous, as why !LTO > > > makes things safe... > > > > Because ignorant compilers are safe compilers! ;-) > > AFAICT ignorance is no gurantee of ordering in general -- the compiler > is free to speculatively invent knowledge any place that the language > spec allows it to. !LTO doesn't stop this happening. Agreed, according to the standard, the compiler has great freedom. We have two choices: (1) Restrict ourselves to live within the confines of the standard or (2) Pay continued close attention to the implementation. We have made different choices at different times, but for many ordering situations we have gone with door #2. Me, I have been working to get the standard to better support our use case. This is at best slow going. But don't take my word for it, ask Will. > Hopefully some of the knowledge I invented in my reply is valid... It is. It is just that there are multiple valid strategies, and the Linux kernel is currently taking a mixed-strategy approach. Thanx, Paul
On Mon, Jul 06, 2020 at 05:08:20PM +0100, Dave Martin wrote: > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > new file mode 100644 > > index 000000000000..515e360b01a1 > > --- /dev/null > > +++ b/arch/arm64/include/asm/rwonce.h > > @@ -0,0 +1,63 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > +/* > > + * Copyright (C) 2020 Google LLC. > > + */ > > +#ifndef __ASM_RWONCE_H > > +#define __ASM_RWONCE_H > > + > > +#ifdef CONFIG_CLANG_LTO > > + > > +#include <linux/compiler_types.h> > > +#include <asm/alternative-macros.h> > > + > > +#ifndef BUILD_VDSO > > + > > +#ifdef CONFIG_AS_HAS_LDAPR > > +#define __LOAD_RCPC(sfx, regs...) \ > > + ALTERNATIVE( \ > > + "ldar" #sfx "\t" #regs, \ > > ^ Should this be here? It seems that READ_ONCE() will actually read > twice... even if that doesn't actually conflict with the required > semantics of READ_ONCE(), it looks odd. It's patched at runtime, so it's either LDAR or LDAPR. > Making a direct link between LTO and the memory model also seems highly > spurious (as discussed in the other subthread) so can we have a comment > explaining the reasoning? Sure, although like I say, this is more about helping to progress that conversation. Will
On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > thread [1], so I cooked this to show one way we could deal with it. The main > > concern is that the whole-program optimisations enabled by LTO may allow the > > compiler to enumerate possible values for a pointer at link time and replace > > an address dependency between two loads with a control dependency instead, > > defeating the dependency ordering within the CPU. > > Why can't that happen without LTO? It could, but I'd argue that it's considerably less likely because there is less information available to the compiler to perform these sorts of optimisations. It also doesn't appear to be happening in practice. The current state of affairs is that, if/when we catch the compiler performing harmful optimistations, we look for a way to disable them. However, there are good reasons to enable LTO, so this is one way to do that without having to worry about the potential impact on dependency ordering. Will
On Mon, 6 Jul 2020 at 20:35, Will Deacon <will@kernel.org> wrote: > On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > > thread [1], so I cooked this to show one way we could deal with it. The main > > > concern is that the whole-program optimisations enabled by LTO may allow the > > > compiler to enumerate possible values for a pointer at link time and replace > > > an address dependency between two loads with a control dependency instead, > > > defeating the dependency ordering within the CPU. > > > > Why can't that happen without LTO? > > It could, but I'd argue that it's considerably less likely because there > is less information available to the compiler to perform these sorts of > optimisations. It also doesn't appear to be happening in practice. > > The current state of affairs is that, if/when we catch the compiler > performing harmful optimistations, we look for a way to disable them. > However, there are good reasons to enable LTO, so this is one way to > do that without having to worry about the potential impact on dependency > ordering. If it's of any help, I'll see if we can implement that warning in LLVM if data dependencies somehow disappear (although I don't have any cycles to pursue right now myself). Until then, short of manual inspection or encountering a bug in the wild, there is no proof any of this happens or doesn't happen. Also, as some anecdotal evidence it's extremely unlikely, even with LTO: looking at the passes that LLVM runs, there are a number of passes that seem to want to eliminate basic blocks, thereby getting rid of branches. Intuitively, it makes sense, because branches are expensive on most architectures (for GPU targets, I think it tries even harder to get rid of branches). If we extend our reasoning and assumptions of LTO's aggressiveness in that direction, we might actually end up with fewer branches. That might be beneficial for the data dependencies we worry about (but not so much for control dependencies we want to keep). Still, no point in speculating (no pun intended) until we have hard data what actually happens. :-) Thanks, -- Marco
On Mon, Jul 06, 2020 at 09:23:26PM +0200, Marco Elver wrote: > On Mon, 6 Jul 2020 at 20:35, Will Deacon <will@kernel.org> wrote: > > On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > > > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > > > thread [1], so I cooked this to show one way we could deal with it. The main > > > > concern is that the whole-program optimisations enabled by LTO may allow the > > > > compiler to enumerate possible values for a pointer at link time and replace > > > > an address dependency between two loads with a control dependency instead, > > > > defeating the dependency ordering within the CPU. > > > > > > Why can't that happen without LTO? > > > > It could, but I'd argue that it's considerably less likely because there > > is less information available to the compiler to perform these sorts of > > optimisations. It also doesn't appear to be happening in practice. > > > > The current state of affairs is that, if/when we catch the compiler > > performing harmful optimistations, we look for a way to disable them. > > However, there are good reasons to enable LTO, so this is one way to > > do that without having to worry about the potential impact on dependency > > ordering. > > If it's of any help, I'll see if we can implement that warning in LLVM > if data dependencies somehow disappear (although I don't have any > cycles to pursue right now myself). Until then, short of manual > inspection or encountering a bug in the wild, there is no proof any of > this happens or doesn't happen. > > Also, as some anecdotal evidence it's extremely unlikely, even with > LTO: looking at the passes that LLVM runs, there are a number of > passes that seem to want to eliminate basic blocks, thereby getting > rid of branches. Intuitively, it makes sense, because branches are > expensive on most architectures (for GPU targets, I think it tries > even harder to get rid of branches). If we extend our reasoning and > assumptions of LTO's aggressiveness in that direction, we might > actually end up with fewer branches. That might be beneficial for the > data dependencies we worry about (but not so much for control > dependencies we want to keep). Still, no point in speculating (no pun > intended) until we have hard data what actually happens. :-) Anything along these lines would be very welcome!!! Thanx, Paul
On Mon, Jul 06, 2020 at 07:35:11PM +0100, Will Deacon wrote: > On Mon, Jul 06, 2020 at 05:08:20PM +0100, Dave Martin wrote: > > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > new file mode 100644 > > > index 000000000000..515e360b01a1 > > > --- /dev/null > > > +++ b/arch/arm64/include/asm/rwonce.h > > > @@ -0,0 +1,63 @@ > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > +/* > > > + * Copyright (C) 2020 Google LLC. > > > + */ > > > +#ifndef __ASM_RWONCE_H > > > +#define __ASM_RWONCE_H > > > + > > > +#ifdef CONFIG_CLANG_LTO > > > + > > > +#include <linux/compiler_types.h> > > > +#include <asm/alternative-macros.h> > > > + > > > +#ifndef BUILD_VDSO > > > + > > > +#ifdef CONFIG_AS_HAS_LDAPR > > > +#define __LOAD_RCPC(sfx, regs...) \ > > > + ALTERNATIVE( \ > > > + "ldar" #sfx "\t" #regs, \ > > > > ^ Should this be here? It seems that READ_ONCE() will actually read > > twice... even if that doesn't actually conflict with the required > > semantics of READ_ONCE(), it looks odd. > > It's patched at runtime, so it's either LDAR or LDAPR. Agh ignore me, I somehow failed to sport the ALTERNATIVE(). For my understanding -- my background here is a bit shaky -- the LDAPR gives us load-to-load order even if there is just a control dependency? If so (possibly dumb question): why can't we just turn this on unconditionally? Is there a significant performance impact? I'm still confused (or ignorant) though. If both loads are READ_ONCE() then switching to LDAPR presumably helps, but otherwise, once the compiler has reduced the address dependency to a control dependency can't it then go one step further and reverse the order of the loads? LDAPR wouldn't rescue us from that. Or does the "memory" clobber in READ_ONCE() fix that for all important cases? I can't see this mattering for local variables (where it definitely won't work), but I wonder whether static variables might not count as "memory" in some situations. Discounting ridiculous things like static register variables, I think the only way for a static variable not to count as memory would be if there are no writes to it that are reachable from any translation unit entry point (possibly after dead code removal). If so, maybe that's enough. > > Making a direct link between LTO and the memory model also seems highly > > spurious (as discussed in the other subthread) so can we have a comment > > explaining the reasoning? > > Sure, although like I say, this is more about helping to progress that > conversation. That's fair enough, but when there is a consensus it would be good to see it documented in the code _especially_ if we know that the fix won't address all instances of the problem and in any case works partly by accident. That doesn't mean it's not a good practical compromise, but it could be very confusing to unpick later on. Cheers ---Dave
On Mon, Jul 06, 2020 at 10:36:28AM -0700, Paul E. McKenney wrote: > On Mon, Jul 06, 2020 at 06:05:57PM +0100, Dave Martin wrote: > > On Mon, Jul 06, 2020 at 09:34:55AM -0700, Paul E. McKenney wrote: > > > On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote: > > > > On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote: > > > > > On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote: > > > > > > On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > > > > > > > When building with LTO, there is an increased risk of the compiler > > > > > > > converting an address dependency headed by a READ_ONCE() invocation > > > > > > > into a control dependency and consequently allowing for harmful > > > > > > > reordering by the CPU. > > > > > > > > > > > > > > Ensure that such transformations are harmless by overriding the generic > > > > > > > READ_ONCE() definition with one that provides acquire semantics when > > > > > > > building with LTO. > > > > > > > > > > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > > > > --- > > > > > > > arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ > > > > > > > arch/arm64/kernel/vdso/Makefile | 2 +- > > > > > > > arch/arm64/kernel/vdso32/Makefile | 2 +- > > > > > > > 3 files changed, 65 insertions(+), 2 deletions(-) > > > > > > > create mode 100644 arch/arm64/include/asm/rwonce.h > > > > > > > > > > > > > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > > > > > > > new file mode 100644 > > > > > > > index 000000000000..515e360b01a1 > > > > > > > --- /dev/null > > > > > > > +++ b/arch/arm64/include/asm/rwonce.h > > > > > > > @@ -0,0 +1,63 @@ > > > > > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > > > > > +/* > > > > > > > + * Copyright (C) 2020 Google LLC. > > > > > > > + */ > > > > > > > +#ifndef __ASM_RWONCE_H > > > > > > > +#define __ASM_RWONCE_H > > > > > > > + > > > > > > > +#ifdef CONFIG_CLANG_LTO > > > > > > > > > > > > Don't we have a generic option for LTO that's not specific to Clang. > > > > > > > > > > /me looks at the LTO series some more > > > > > > > > > > Oh yeah, there's CONFIG_LTO which is selected by CONFIG_LTO_CLANG, which is > > > > > the non-typoed version of the above. I can switch this to CONFIG_LTO. > > > > > > > > > > > Also, can you illustrate code that can only be unsafe with Clang LTO? > > > > > > > > > > I don't have a concrete example, but it's an ongoing concern over on the LTO > > > > > thread [1], so I cooked this to show one way we could deal with it. The main > > > > > concern is that the whole-program optimisations enabled by LTO may allow the > > > > > compiler to enumerate possible values for a pointer at link time and replace > > > > > an address dependency between two loads with a control dependency instead, > > > > > defeating the dependency ordering within the CPU. > > > > > > > > Why can't that happen without LTO? > > > > > > Because without LTO, the compiler cannot see all the pointers all at > > > the same time due to their being in different translation units. > > > > > > But yes, if the compiler could see all the pointer values and further > > > -know- that it was seeing all the pointer values, these optimizations > > > could happen even without LTO. But it is quite easy to make sure that > > > the compiler thinks that there are additional pointer values that it > > > does not know about. > > > > Yes of course, but even without LTO the compiler can still apply this > > optimisation to everything visible in the translation unit, and that can > > drift as people refactor code over time. > > > > Convincing the compiler there are other possible values doesn't help. > > Even in > > > > int foo(int *p) > > { > > asm ("" : "+r" (p)); > > return *p; > > } > > > > Can't the compiler still generate something like this: > > > > switch (p) { > > case &foo: > > return foo; > > > > case &bar: > > return bar; > > > > default: > > return *p; > > } > > > > ...in which case we still have the same lost ordering guarantee that > > we were trying to enforce. > > > > If foo and bar already happen to be in registers and profiling shows > > that &foo and &bar are the most likely value of p then this might be > > a reasonable optimisation in some situations, irrespective of LTO. > > Agreed, the additional information from profile-driven optimization > can be just as damaging as that from LTO. > > > The underlying problem here seems to be that the necessary ordering > > rule is not part of what passes for the C memory model prior to C11. > > If we want to control the data flow, don't we have to wrap the entire > > dereference in a macro? > > Yes, exactly. Because we are relying on things that are not guaranteed > by the C memory model, we need to pay attention to the implementations. > As I have said elsewhere, the price of control dependencies is eternal > vigilance. > > And this also applies, to a lesser extent, to address and data > dependencies, which are also not well supported by the C standard. > > There is one important case in which the C memory model -does- support > control dependencies, and that is when the dependent write is a normal > C-language write that is not involved in a data race. In that case, > if the compiler broke the control dependency, it might have introduced > a data race, which it is forbidden to do. However, this rule can also > be broken when the compiler knows too much, as it might be able to prove > that breaking the dependency won't introduce a data race. In that case, > according to the standard, it is free to break the dependency. Which only matters because the C abstract machine may not match reality. LTO has no bearing on the abstract machine though. If specific compiler options etc. can be added to inhibit the problematic optimisations, that would be ideal. I guess that can't happen overnight though. > > > > > We likely won't realise if/when this goes wrong, other than impossible to > > > > > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > > > > > able to detect this sort of thing happening at build time, and perhaps > > > > > even prevent it with compiler options or annotations, but none of that is > > > > > close to being available and I'm keen to progress the LTO patches in the > > > > > meantime because they are a requirement for CFI. > > > > > > > > My concern was not so much why LTO makes things dangerous, as why !LTO > > > > makes things safe... > > > > > > Because ignorant compilers are safe compilers! ;-) > > > > AFAICT ignorance is no gurantee of ordering in general -- the compiler > > is free to speculatively invent knowledge any place that the language > > spec allows it to. !LTO doesn't stop this happening. > > Agreed, according to the standard, the compiler has great freedom. > > We have two choices: (1) Restrict ourselves to live within the confines of > the standard or (2) Pay continued close attention to the implementation. > We have made different choices at different times, but for many ordering > situations we have gone with door #2. > > Me, I have been working to get the standard to better support our > use case. This is at best slow going. But don't take my word for it, > ask Will. I can believe it. They want to enable optimisations rather than prevent them... > > Hopefully some of the knowledge I invented in my reply is valid... > > It is. It is just that there are multiple valid strategies, and the > Linux kernel is currently taking a mixed-strategy approach. Ack. The hope that there is a correct way to fix everything dies hard ;) Life was cosier before I started trying to reason about language specs. Cheers ---Dave
On Tue, Jul 07, 2020 at 11:29:15AM +0100, Dave Martin wrote: > On Mon, Jul 06, 2020 at 10:36:28AM -0700, Paul E. McKenney wrote: > > On Mon, Jul 06, 2020 at 06:05:57PM +0100, Dave Martin wrote: [ . . . ] > > > The underlying problem here seems to be that the necessary ordering > > > rule is not part of what passes for the C memory model prior to C11. > > > If we want to control the data flow, don't we have to wrap the entire > > > dereference in a macro? > > > > Yes, exactly. Because we are relying on things that are not guaranteed > > by the C memory model, we need to pay attention to the implementations. > > As I have said elsewhere, the price of control dependencies is eternal > > vigilance. > > > > And this also applies, to a lesser extent, to address and data > > dependencies, which are also not well supported by the C standard. > > > > There is one important case in which the C memory model -does- support > > control dependencies, and that is when the dependent write is a normal > > C-language write that is not involved in a data race. In that case, > > if the compiler broke the control dependency, it might have introduced > > a data race, which it is forbidden to do. However, this rule can also > > be broken when the compiler knows too much, as it might be able to prove > > that breaking the dependency won't introduce a data race. In that case, > > according to the standard, it is free to break the dependency. > > Which only matters because the C abstract machine may not match reality. > > LTO has no bearing on the abstract machine though. > > If specific compiler options etc. can be added to inhibit the > problematic optimisations, that would be ideal. I guess that can't > happen overnight though. Sadly, I must agree. > > > > > > We likely won't realise if/when this goes wrong, other than impossible to > > > > > > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > > > > > > able to detect this sort of thing happening at build time, and perhaps > > > > > > even prevent it with compiler options or annotations, but none of that is > > > > > > close to being available and I'm keen to progress the LTO patches in the > > > > > > meantime because they are a requirement for CFI. > > > > > > > > > > My concern was not so much why LTO makes things dangerous, as why !LTO > > > > > makes things safe... > > > > > > > > Because ignorant compilers are safe compilers! ;-) > > > > > > AFAICT ignorance is no gurantee of ordering in general -- the compiler > > > is free to speculatively invent knowledge any place that the language > > > spec allows it to. !LTO doesn't stop this happening. > > > > Agreed, according to the standard, the compiler has great freedom. > > > > We have two choices: (1) Restrict ourselves to live within the confines of > > the standard or (2) Pay continued close attention to the implementation. > > We have made different choices at different times, but for many ordering > > situations we have gone with door #2. > > > > Me, I have been working to get the standard to better support our > > use case. This is at best slow going. But don't take my word for it, > > ask Will. > > I can believe it. They want to enable optimisations rather than prevent > them... Right in one! ;-) > > > Hopefully some of the knowledge I invented in my reply is valid... > > > > It is. It is just that there are multiple valid strategies, and the > > Linux kernel is currently taking a mixed-strategy approach. > > Ack. The hope that there is a correct way to fix everything dies > hard ;) Either that, or one slowly degrades ones definition of "correct". :-/ > Life was cosier before I started trying to reason about language specs. Same here! Thanx, Paul
I'm trying to put together a Micro Conference for Linux Plumbers conference focused on "make LLVM slightly less shitty." Do you all plan on attending the conference? Would it be worthwhile to hold a session focused on discussing this (LTO and memory models) be worthwhile? On Tue, Jul 7, 2020 at 3:51 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jul 07, 2020 at 11:29:15AM +0100, Dave Martin wrote: > > On Mon, Jul 06, 2020 at 10:36:28AM -0700, Paul E. McKenney wrote: > > > On Mon, Jul 06, 2020 at 06:05:57PM +0100, Dave Martin wrote: > > [ . . . ] > > > > > The underlying problem here seems to be that the necessary ordering > > > > rule is not part of what passes for the C memory model prior to C11. > > > > If we want to control the data flow, don't we have to wrap the entire > > > > dereference in a macro? > > > > > > Yes, exactly. Because we are relying on things that are not guaranteed > > > by the C memory model, we need to pay attention to the implementations. > > > As I have said elsewhere, the price of control dependencies is eternal > > > vigilance. > > > > > > And this also applies, to a lesser extent, to address and data > > > dependencies, which are also not well supported by the C standard. > > > > > > There is one important case in which the C memory model -does- support > > > control dependencies, and that is when the dependent write is a normal > > > C-language write that is not involved in a data race. In that case, > > > if the compiler broke the control dependency, it might have introduced > > > a data race, which it is forbidden to do. However, this rule can also > > > be broken when the compiler knows too much, as it might be able to prove > > > that breaking the dependency won't introduce a data race. In that case, > > > according to the standard, it is free to break the dependency. > > > > Which only matters because the C abstract machine may not match reality. > > > > LTO has no bearing on the abstract machine though. > > > > If specific compiler options etc. can be added to inhibit the > > problematic optimisations, that would be ideal. I guess that can't > > happen overnight though. > > Sadly, I must agree. > > > > > > > > We likely won't realise if/when this goes wrong, other than impossible to > > > > > > > debug, subtle breakage that crops up seemingly randomly. Ideally, we'd be > > > > > > > able to detect this sort of thing happening at build time, and perhaps > > > > > > > even prevent it with compiler options or annotations, but none of that is > > > > > > > close to being available and I'm keen to progress the LTO patches in the > > > > > > > meantime because they are a requirement for CFI. > > > > > > > > > > > > My concern was not so much why LTO makes things dangerous, as why !LTO > > > > > > makes things safe... > > > > > > > > > > Because ignorant compilers are safe compilers! ;-) > > > > > > > > AFAICT ignorance is no gurantee of ordering in general -- the compiler > > > > is free to speculatively invent knowledge any place that the language > > > > spec allows it to. !LTO doesn't stop this happening. > > > > > > Agreed, according to the standard, the compiler has great freedom. > > > > > > We have two choices: (1) Restrict ourselves to live within the confines of > > > the standard or (2) Pay continued close attention to the implementation. > > > We have made different choices at different times, but for many ordering > > > situations we have gone with door #2. > > > > > > Me, I have been working to get the standard to better support our > > > use case. This is at best slow going. But don't take my word for it, > > > ask Will. > > > > I can believe it. They want to enable optimisations rather than prevent > > them... > > Right in one! ;-) > > > > > Hopefully some of the knowledge I invented in my reply is valid... > > > > > > It is. It is just that there are multiple valid strategies, and the > > > Linux kernel is currently taking a mixed-strategy approach. > > > > Ack. The hope that there is a correct way to fix everything dies > > hard ;) > > Either that, or one slowly degrades ones definition of "correct". :-/ > > > Life was cosier before I started trying to reason about language specs. > > Same here! > > Thanx, Paul
On Wed, 8 Jul 2020 at 01:01, Nick Desaulniers <ndesaulniers@google.com> wrote: > > I'm trying to put together a Micro Conference for Linux Plumbers > conference focused on "make LLVM slightly less shitty." Do you all > plan on attending the conference? Would it be worthwhile to hold a > session focused on discussing this (LTO and memory models) be > worthwhile? I would welcome sessions on LLVM, and would try to attend. Apart from general improvements to the LLVM ecosystem, we should also emphasize the benefits LLVM provides and how we can enable them (one reason we want LTO is to get CFI). Regarding LTO and memory models, I'm not sure. Given the current state of things, such a discussion needs to be carefully framed to not go in circles, because we're trying to figure out things at the intersection of architecture, what the compiler does, the C standard, and the kernel wants. And because some of these boxes are difficult to change (standard, arch, compiler) or difficult to precisely define behaviour (compiler), we might end up going in circles. From what I see there are efforts to fix the situation at the root (standard), and we might have means to get the compiler to tell us what it's doing. But these happen extremely slowly. So, if we do this, we need to be careful to not end up re-discussing what we discussed here, but rather try and make it a continuation that hopefully leads to some constructive output. Thanks, -- Marco
On Tue, Jul 07, 2020 at 04:01:28PM -0700, Nick Desaulniers wrote: > I'm trying to put together a Micro Conference for Linux Plumbers > conference focused on "make LLVM slightly less shitty." Do you all > plan on attending the conference? Would it be worthwhile to hold a > session focused on discussing this (LTO and memory models) be > worthwhile? I'd love to have a session about compilers and memory ordering with both GCC and CLANG in attendance. We need a solution for dependent-loads and control-dependencies for both toolchains.
On Wed, Jul 08, 2020 at 11:16:20AM +0200, Peter Zijlstra wrote: > On Tue, Jul 07, 2020 at 04:01:28PM -0700, Nick Desaulniers wrote: > > I'm trying to put together a Micro Conference for Linux Plumbers > > conference focused on "make LLVM slightly less shitty." Do you all > > plan on attending the conference? Would it be worthwhile to hold a > > session focused on discussing this (LTO and memory models) be > > worthwhile? > > I'd love to have a session about compilers and memory ordering with both > GCC and CLANG in attendance. > > We need a solution for dependent-loads and control-dependencies for both > toolchains. What Peter said! ;-) Thanx, Paul
diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h new file mode 100644 index 000000000000..515e360b01a1 --- /dev/null +++ b/arch/arm64/include/asm/rwonce.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2020 Google LLC. + */ +#ifndef __ASM_RWONCE_H +#define __ASM_RWONCE_H + +#ifdef CONFIG_CLANG_LTO + +#include <linux/compiler_types.h> +#include <asm/alternative-macros.h> + +#ifndef BUILD_VDSO + +#ifdef CONFIG_AS_HAS_LDAPR +#define __LOAD_RCPC(sfx, regs...) \ + ALTERNATIVE( \ + "ldar" #sfx "\t" #regs, \ + ".arch_extension rcpc\n" \ + "ldapr" #sfx "\t" #regs, \ + ARM64_HAS_LDAPR) +#else +#define __LOAD_RCPC(sfx, regs...) "ldar" #sfx "\t" #regs +#endif /* CONFIG_AS_HAS_LDAPR */ + +#define __READ_ONCE(x) \ +({ \ + int atomic = 1; \ + union { __unqual_scalar_typeof(x) __val; char __c[1]; } __u; \ + typeof(&(x)) __x = &(x); \ + switch (sizeof(x)) { \ + case 1: \ + asm volatile(__LOAD_RCPC(b, %w0, %1) \ + : "=r" (*(__u8 *)__u.__c) \ + : "Q" (*__x) : "memory"); \ + break; \ + case 2: \ + asm volatile(__LOAD_RCPC(h, %w0, %1) \ + : "=r" (*(__u16 *)__u.__c) \ + : "Q" (*__x) : "memory"); \ + break; \ + case 4: \ + asm volatile(__LOAD_RCPC(, %w0, %1) \ + : "=r" (*(__u32 *)__u.__c) \ + : "Q" (*__x) : "memory"); \ + break; \ + case 8: \ + asm volatile(__LOAD_RCPC(, %0, %1) \ + : "=r" (*(__u64 *)__u.__c) \ + : "Q" (*__x) : "memory"); \ + break; \ + default: \ + atomic = 0; \ + } \ + atomic ? (typeof(x))__u.__val : (*(volatile typeof(x) *)__x); \ +}) + +#endif /* !BUILD_VDSO */ +#endif /* CONFIG_CLANG_LTO */ + +#include <asm-generic/rwonce.h> + +#endif /* __ASM_RWONCE_H */ diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 45d5cfe46429..60df97f2e7de 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -28,7 +28,7 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ $(btildflags-y) -T ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 -ccflags-y += -DDISABLE_BRANCH_PROFILING +ccflags-y += -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) KBUILD_CFLAGS += $(DISABLE_LTO) diff --git a/arch/arm64/kernel/vdso32/Makefile b/arch/arm64/kernel/vdso32/Makefile index d88148bef6b0..4fdf3754a058 100644 --- a/arch/arm64/kernel/vdso32/Makefile +++ b/arch/arm64/kernel/vdso32/Makefile @@ -43,7 +43,7 @@ cc32-as-instr = $(call try-run,\ # As a result we set our own flags here. # KBUILD_CPPFLAGS and NOSTDINC_FLAGS from top-level Makefile -VDSO_CPPFLAGS := -D__KERNEL__ -nostdinc -isystem $(shell $(CC_COMPAT) -print-file-name=include) +VDSO_CPPFLAGS := -DBUILD_VDSO -D__KERNEL__ -nostdinc -isystem $(shell $(CC_COMPAT) -print-file-name=include) VDSO_CPPFLAGS += $(LINUXINCLUDE) # Common C and assembly flags
When building with LTO, there is an increased risk of the compiler converting an address dependency headed by a READ_ONCE() invocation into a control dependency and consequently allowing for harmful reordering by the CPU. Ensure that such transformations are harmless by overriding the generic READ_ONCE() definition with one that provides acquire semantics when building with LTO. Signed-off-by: Will Deacon <will@kernel.org> --- arch/arm64/include/asm/rwonce.h | 63 +++++++++++++++++++++++++++++++ arch/arm64/kernel/vdso/Makefile | 2 +- arch/arm64/kernel/vdso32/Makefile | 2 +- 3 files changed, 65 insertions(+), 2 deletions(-) create mode 100644 arch/arm64/include/asm/rwonce.h