[v2,18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG

Message ID	20171115213428.22559-19-samitolvanen@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Sami Tolvanen <samitolvanen@google.com> To: Alex Matveev <alxmtvv@gmail.com>, Andi Kleen <ak@linux.intel.com>, Ard Biesheuvel <ard.biesheuvel@linaro.org>, Greg Hackmann <ghackmann@google.com>, Kees Cook <keescook@chromium.org>, linux-arm-kernel@lists.infradead.org, linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, Mark Rutland <mark.rutland@arm.com>, Masahiro Yamada <yamada.masahiro@socionext.com>, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>, Michal Marek <michal.lkml@markovi.net>, Nick Desaulniers <ndesaulniers@google.com>, Yury Norov <ynorov@caviumnetworks.com>, Matthias Kaehlcke <mka@chromium.org> Subject: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG Date: Wed, 15 Nov 2017 13:34:28 -0800 Message-Id: <20171115213428.22559-19-samitolvanen@google.com> In-Reply-To: <20171115213428.22559-1-samitolvanen@google.com> References: <20171115213428.22559-1-samitolvanen@google.com> Precedence: list Cc: Sami Tolvanen <samitolvanen@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Sami Tolvanen Nov. 15, 2017, 9:34 p.m. UTC

Allow CONFIG_LTO_CLANG to be enabled for the architecture.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Will Deacon Nov. 16, 2017, 11:58 a.m. UTC | #1

Hi Sami,

On Wed, Nov 15, 2017 at 01:34:28PM -0800, Sami Tolvanen wrote:
> Allow CONFIG_LTO_CLANG to be enabled for the architecture.
> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3a70f763e18a..58504327b9f6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -40,6 +40,7 @@ config ARM64
>  	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
>  	select ARCH_USE_CMPXCHG_LOCKREF
>  	select ARCH_USE_QUEUED_RWLOCKS
> +	select ARCH_SUPPORTS_LTO_CLANG

I'll be honest with you: I'm absolutely terrified about enabling this.
How much testing has this seen?

The main thing that worries me is that this gives the toolchain a lot
more freedom to break dependency ordering with RCU, leading to subtle
concurrency issues that would actually break on arm64. Specifically,
I'm worried about value analysis that could potentially convert an
address dependency into a control dependency. Right now, the C standard
isn't on our side here and we're relying on the compiler not doing this
kind of thing. Can we continue to rely on that in the face of LTO?

Will

Sami Tolvanen Nov. 16, 2017, 4:17 p.m. UTC | #2

On Thu, Nov 16, 2017 at 11:58:11AM +0000, Will Deacon wrote:
> I'll be honest with you: I'm absolutely terrified about enabling this.

That's understandable, I wouldn't want to enable this by default
quite yet either. This patch doesn't enable LTO for arm64, just makes
it possible to enable the feature. I'm perfectly fine with marking
CONFIG_LTO_CLANG experimental if it makes people more comfortable.

> How much testing has this seen?

I've been running clang LTO kernels for a few months on a Pixel 2 device
without any issues. This is on a 4.4 kernel though.

> Right now, the C standard isn't on our side here and we're relying on
> the compiler not doing this kind of thing. Can we continue to rely on
> that in the face of LTO?

I'll have to check with our LLVM experts, but I have not run into these
issues with current compiler versions. Looking at Andi's old patches,
looks like gcc might be more aggressive in reordering things with LTO
than clang.

Sami

Peter Zijlstra Nov. 16, 2017, 4:30 p.m. UTC | #3

On Thu, Nov 16, 2017 at 08:17:31AM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 11:58:11AM +0000, Will Deacon wrote:
> > I'll be honest with you: I'm absolutely terrified about enabling this.
> 
> That's understandable, I wouldn't want to enable this by default
> quite yet either. This patch doesn't enable LTO for arm64, just makes
> it possible to enable the feature. I'm perfectly fine with marking
> CONFIG_LTO_CLANG experimental if it makes people more comfortable.
> 
> > How much testing has this seen?
> 
> I've been running clang LTO kernels for a few months on a Pixel 2 device
> without any issues. This is on a 4.4 kernel though.
> 
> > Right now, the C standard isn't on our side here and we're relying on
> > the compiler not doing this kind of thing. Can we continue to rely on
> > that in the face of LTO?
> 
> I'll have to check with our LLVM experts, but I have not run into these
> issues with current compiler versions. Looking at Andi's old patches,
> looks like gcc might be more aggressive in reordering things with LTO
> than clang.

Ideally we'd get the toolchain people to commit to supporting the kernel
memory model along side the C11 one. That would help a ton.

Nick Desaulniers Nov. 16, 2017, 4:50 p.m. UTC | #4

On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> Ideally we'd get the toolchain people to commit to supporting the kernel
> memory model along side the C11 one. That would help a ton.

Does anyone from the kernel side participate in the C standardization process?

Peter Zijlstra Nov. 16, 2017, 4:59 p.m. UTC | #5

On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Ideally we'd get the toolchain people to commit to supporting the kernel
> > memory model along side the C11 one. That would help a ton.
> 
> Does anyone from the kernel side participate in the C standardization process?

Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
reconciled though. From what I understand C11 (and onwards) are
incompatible with the kernel model on a number of subtle points.

Not to mention that there's people in the C11 process that strongly
argue for stuff that would break every single multi-threaded program
written since the 70s, which would include pretty much all OS kernels.

Nick Desaulniers Nov. 16, 2017, 5:16 p.m. UTC | #6

On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
>> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> > Ideally we'd get the toolchain people to commit to supporting the kernel
>> > memory model along side the C11 one. That would help a ton.
>>
>> Does anyone from the kernel side participate in the C standardization process?
>
> Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> reconciled though. From what I understand C11 (and onwards) are
> incompatible with the kernel model on a number of subtle points.

It would be good to have these incompatibilities written down, then
for the sake of argument, they can be cited both for discussions on
LKML and in the C standardization process.  For example, a running
list in Documentation/ or something would make it so that anyone could
understand and cite current issues with the latest C standard.

I don't understand why we'd block patches for enabling experimental
features.  We've been running this patch-set on actual devices for
months and would love to provide them to the community for further
testing.  If bugs are found, then there's more evidence to bring to
the C standards committee.  Otherwise we're shutting down feature
development for the sake of potential bugs in a C standard we're not
even using.

Peter Zijlstra Nov. 16, 2017, 5:34 p.m. UTC | #7

On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
> On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> > Ideally we'd get the toolchain people to commit to supporting the kernel
> >> > memory model along side the C11 one. That would help a ton.
> >>
> >> Does anyone from the kernel side participate in the C standardization process?
> >
> > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> > reconciled though. From what I understand C11 (and onwards) are
> > incompatible with the kernel model on a number of subtle points.
> 
> It would be good to have these incompatibilities written down, then
> for the sake of argument, they can be cited both for discussions on
> LKML and in the C standardization process.  For example, a running
> list in Documentation/ or something would make it so that anyone could
> understand and cite current issues with the latest C standard.

Will should be able to produce this list; I know he's done before, I
just can't find it -- my Google-foo isn't strong today.

> I don't understand why we'd block patches for enabling experimental
> features.  We've been running this patch-set on actual devices for
> months and would love to provide them to the community for further
> testing.  If bugs are found, then there's more evidence to bring to
> the C standards committee.  Otherwise we're shutting down feature
> development for the sake of potential bugs in a C standard we're not
> even using.

So the problem is that its very very hard (and painful) to find these
bugs. Getting the tools people to comment on these specific
optimizations would really help lots.

Paul E. McKenney Nov. 16, 2017, 5:48 p.m. UTC | #8

On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >>
> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
> > >> > memory model along side the C11 one. That would help a ton.
> > >>
> > >> Does anyone from the kernel side participate in the C standardization process?
> > >
> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> > > reconciled though. From what I understand C11 (and onwards) are
> > > incompatible with the kernel model on a number of subtle points.
> > 
> > It would be good to have these incompatibilities written down, then
> > for the sake of argument, they can be cited both for discussions on
> > LKML and in the C standardization process.  For example, a running
> > list in Documentation/ or something would make it so that anyone could
> > understand and cite current issues with the latest C standard.
> 
> Will should be able to produce this list; I know he's done before, I
> just can't find it -- my Google-foo isn't strong today.

Here you go:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html

> > I don't understand why we'd block patches for enabling experimental
> > features.  We've been running this patch-set on actual devices for
> > months and would love to provide them to the community for further
> > testing.  If bugs are found, then there's more evidence to bring to
> > the C standards committee.  Otherwise we're shutting down feature
> > development for the sake of potential bugs in a C standard we're not
> > even using.
> 
> So the problem is that its very very hard (and painful) to find these
> bugs. Getting the tools people to comment on these specific
> optimizations would really help lots.

It would be good to get something similar to LKMM into KTSAN, for
example.  There would probably be a few differences due to efficiency
concerns, but closer is better than less close.  ;-)

							Thanx, Paul

Nick Desaulniers Nov. 16, 2017, 6:16 p.m. UTC | #9

On Thu, Nov 16, 2017 at 9:48 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
>> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
>> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
>> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >>
>> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
>> > >> > memory model along side the C11 one. That would help a ton.
>> > >>
>> > >> Does anyone from the kernel side participate in the C standardization process?
>> > >
>> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
>> > > reconciled though. From what I understand C11 (and onwards) are
>> > > incompatible with the kernel model on a number of subtle points.
>> >
>> > It would be good to have these incompatibilities written down, then
>> > for the sake of argument, they can be cited both for discussions on
>> > LKML and in the C standardization process.  For example, a running
>> > list in Documentation/ or something would make it so that anyone could
>> > understand and cite current issues with the latest C standard.
>>
>> Will should be able to produce this list; I know he's done before, I
>> just can't find it -- my Google-foo isn't strong today.
>
> Here you go:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html

Great, thanks! Will take some time to digest, but happy to refer
others to this important work in the future.

I wonder if we have anything like a case study that shows specifically
how a compiler generated a subtle bug due to specifics of the memory
model, like a "if you do this, here's the problematic code that will
get generated, and why it's problematic due to the memory model."
That may be a good way to raise issues with toolchain developers with
concrete and actionable examples.

>> > I don't understand why we'd block patches for enabling experimental
>> > features.  We've been running this patch-set on actual devices for
>> > months and would love to provide them to the community for further
>> > testing.  If bugs are found, then there's more evidence to bring to
>> > the C standards committee.  Otherwise we're shutting down feature
>> > development for the sake of potential bugs in a C standard we're not
>> > even using.
>>
>> So the problem is that its very very hard (and painful) to find these
>> bugs. Getting the tools people to comment on these specific
>> optimizations would really help lots.

No doubt; I do not disagree with you.  Kernel developers have very
important use cases for the language.

But the core point I'm trying to make is "do we need to block this
patch set until issues with the C standards body in regards to the
kernels memory model are resolved?"  I would hope the two are
orthogonal and that we'd take them and then test them even more
extensively than the developer has in order to find out.

> It would be good to get something similar to LKMM into KTSAN, for
> example.  There would probably be a few differences due to efficiency
> concerns, but closer is better than less close.  ;-)

+ glider, who may be able to comment on the state of KTSAN.

Paul E. McKenney Nov. 16, 2017, 6:39 p.m. UTC | #10

On Thu, Nov 16, 2017 at 10:16:22AM -0800, Nick Desaulniers wrote:
> On Thu, Nov 16, 2017 at 9:48 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> >> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
> >> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> >> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > >>
> >> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
> >> > >> > memory model along side the C11 one. That would help a ton.
> >> > >>
> >> > >> Does anyone from the kernel side participate in the C standardization process?
> >> > >
> >> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> >> > > reconciled though. From what I understand C11 (and onwards) are
> >> > > incompatible with the kernel model on a number of subtle points.
> >> >
> >> > It would be good to have these incompatibilities written down, then
> >> > for the sake of argument, they can be cited both for discussions on
> >> > LKML and in the C standardization process.  For example, a running
> >> > list in Documentation/ or something would make it so that anyone could
> >> > understand and cite current issues with the latest C standard.
> >>
> >> Will should be able to produce this list; I know he's done before, I
> >> just can't find it -- my Google-foo isn't strong today.
> >
> > Here you go:
> >
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
> 
> Great, thanks! Will take some time to digest, but happy to refer
> others to this important work in the future.

Glad you like it!

> I wonder if we have anything like a case study that shows specifically
> how a compiler generated a subtle bug due to specifics of the memory
> model, like a "if you do this, here's the problematic code that will
> get generated, and why it's problematic due to the memory model."
> That may be a good way to raise issues with toolchain developers with
> concrete and actionable examples.

Well, the above is an official working paper from the C++ standards
committee.

The first priority is to fix memory_order_consume.  Which is getting a
bit more mindshare of late.  As Fedor Pikus said at CPPCON: "If you have
a large software project, you are probably already using RCU.  But you
don't know that you are using it, and you are probably doing it wrong."

> >> > I don't understand why we'd block patches for enabling experimental
> >> > features.  We've been running this patch-set on actual devices for
> >> > months and would love to provide them to the community for further
> >> > testing.  If bugs are found, then there's more evidence to bring to
> >> > the C standards committee.  Otherwise we're shutting down feature
> >> > development for the sake of potential bugs in a C standard we're not
> >> > even using.
> >>
> >> So the problem is that its very very hard (and painful) to find these
> >> bugs. Getting the tools people to comment on these specific
> >> optimizations would really help lots.
> 
> No doubt; I do not disagree with you.  Kernel developers have very
> important use cases for the language.
> 
> But the core point I'm trying to make is "do we need to block this
> patch set until issues with the C standards body in regards to the
> kernels memory model are resolved?"  I would hope the two are
> orthogonal and that we'd take them and then test them even more
> extensively than the developer has in order to find out.

Given that I have been working on getting the C and C++ standards to
correctly handle rcu_dereference() for more than ten years, I recommend
-against- waiting on standardization in the strongest possible terms.
And if you think that ten years is bad, the Java standards community has
been struggling with out-of-thin-air (OOTA) values for almost 20 years.
And the C and C++ standards haven't solved OOTA, either.

							Thanx, Paul

> > It would be good to get something similar to LKMM into KTSAN, for
> > example.  There would probably be a few differences due to efficiency
> > concerns, but closer is better than less close.  ;-)
> 
> + glider, who may be able to comment on the state of KTSAN.
>

Will Deacon Nov. 16, 2017, 6:45 p.m. UTC | #11

On Thu, Nov 16, 2017 at 10:39:51AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 16, 2017 at 10:16:22AM -0800, Nick Desaulniers wrote:
> > > On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> > >> So the problem is that its very very hard (and painful) to find these
> > >> bugs. Getting the tools people to comment on these specific
> > >> optimizations would really help lots.
> > 
> > No doubt; I do not disagree with you.  Kernel developers have very
> > important use cases for the language.
> > 
> > But the core point I'm trying to make is "do we need to block this
> > patch set until issues with the C standards body in regards to the
> > kernels memory model are resolved?"  I would hope the two are
> > orthogonal and that we'd take them and then test them even more
> > extensively than the developer has in order to find out.
> 
> Given that I have been working on getting the C and C++ standards to
> correctly handle rcu_dereference() for more than ten years, I recommend
> -against- waiting on standardization in the strongest possible terms.
> And if you think that ten years is bad, the Java standards community has
> been struggling with out-of-thin-air (OOTA) values for almost 20 years.
> And the C and C++ standards haven't solved OOTA, either.

The problem is, if we go ahead with this change, the compiler *will* break
some address dependencies and something will eventually go wrong. At that
point, what do we do? Turn off some random compiler optimisation? Add a
random barrier()?

We don't necessarily need standardisation, but we at least need guarantees
from the compiler implementation that LTO/PGO will respect source level
address dependencies. I don't think we have that today.

Will

Paul E. McKenney Nov. 16, 2017, 7:13 p.m. UTC | #12

On Thu, Nov 16, 2017 at 06:45:08PM +0000, Will Deacon wrote:
> On Thu, Nov 16, 2017 at 10:39:51AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 16, 2017 at 10:16:22AM -0800, Nick Desaulniers wrote:
> > > > On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> > > >> So the problem is that its very very hard (and painful) to find these
> > > >> bugs. Getting the tools people to comment on these specific
> > > >> optimizations would really help lots.
> > > 
> > > No doubt; I do not disagree with you.  Kernel developers have very
> > > important use cases for the language.
> > > 
> > > But the core point I'm trying to make is "do we need to block this
> > > patch set until issues with the C standards body in regards to the
> > > kernels memory model are resolved?"  I would hope the two are
> > > orthogonal and that we'd take them and then test them even more
> > > extensively than the developer has in order to find out.
> > 
> > Given that I have been working on getting the C and C++ standards to
> > correctly handle rcu_dereference() for more than ten years, I recommend
> > -against- waiting on standardization in the strongest possible terms.
> > And if you think that ten years is bad, the Java standards community has
> > been struggling with out-of-thin-air (OOTA) values for almost 20 years.
> > And the C and C++ standards haven't solved OOTA, either.
> 
> The problem is, if we go ahead with this change, the compiler *will* break
> some address dependencies and something will eventually go wrong. At that
> point, what do we do? Turn off some random compiler optimisation? Add a
> random barrier()?
> 
> We don't necessarily need standardisation, but we at least need guarantees
> from the compiler implementation that LTO/PGO will respect source level
> address dependencies. I don't think we have that today.

Ah, if "this patch set" meant "adding LTO", I stand corrected and I
apologize for my confusion.

I agree that we need LTO/PGO to be housebroken from an LKMM viewpoint
before it is enabled.

							Thanx, Paul

Sami Tolvanen Nov. 16, 2017, 8:17 p.m. UTC | #13

On Thu, Nov 16, 2017 at 11:13:07AM -0800, Paul E. McKenney wrote:
> Ah, if "this patch set" meant "adding LTO", I stand corrected and I
> apologize for my confusion.

Again, I'm not proposing for LTO to be enabled by default. These patches
just make it possible to enable it. Are you saying the possibility
that a future compiler update breaks something is a blocker even for
experimental features?

> I agree that we need LTO/PGO to be housebroken from an LKMM viewpoint
> before it is enabled.

Can you elaborate what's needed from clang before this can move
forward? For example, if you have specific test cases in mind, we can
always work on including them in the LLVM test suite.

Sami

Will Deacon Nov. 20, 2017, 6:05 p.m. UTC | #14

Hi Sami,

On Thu, Nov 16, 2017 at 12:17:01PM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 11:13:07AM -0800, Paul E. McKenney wrote:
> > Ah, if "this patch set" meant "adding LTO", I stand corrected and I
> > apologize for my confusion.
> 
> Again, I'm not proposing for LTO to be enabled by default. These patches
> just make it possible to enable it. Are you saying the possibility
> that a future compiler update breaks something is a blocker even for
> experimental features?

My opinion here is that it's a blocker for LTO on arm64, as the symptoms
aren't different to building with a compiler that has subtle code generation
issues. If you *really* want to support LTO, we could consider giving
acquire semantics to READ_ONCE for LTO builds, but that means trading
RCU read-side performance for whatever gains are provided by LTO.

Please don't confuse this with a reluctance to support clang; I'm keen to
see that supported, so let's focus on that for the moment (although I don't
see a good reason to support old clang builds with known issues).

> > I agree that we need LTO/PGO to be housebroken from an LKMM viewpoint
> > before it is enabled.
> 
> Can you elaborate what's needed from clang before this can move
> forward? For example, if you have specific test cases in mind, we can
> always work on including them in the LLVM test suite.

This is a thorny issue, but RCU (specifically rcu_dereference but probably
also some READ_ONCEs) relies on being able to utilise syntactic dependency
chains to order local accesses to shared variables. Paul has spent a
fair amount of time working to define these dependency chains here:

  https://wg21.link/P0190

Although the current direction of the C++ committee is to prefer
that dependencies are explicitly "marked", this is not deemed to be
acceptable for the kernel (in other words, everything is always considered
"marked").

If a compiler is capable of breaking unmarked dependency chains, then we
will run into subtle concurrency issues on arm64 because the hardware is
also capable of reordering independent accesses. My understanding is that
LTO makes these sort of optimisations far more likely because whole-program
analysis can enable value prediction, which can convert address dependencies
into control dependcies; the latter not ordering reads on arm64.

Will

Peter Zijlstra Nov. 20, 2017, 7:25 p.m. UTC | #15

On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> Please don't confuse this with a reluctance to support clang; I'm keen to
> see that supported, 

As an aside; as long as clang doesn't do asm-goto and asm-flag-output
(as examples of features that clang lacks and developers have at times
stated they would never support) I can't take them serious as a target
compiler.

Peter Zijlstra Nov. 20, 2017, 7:28 p.m. UTC | #16

On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> This is a thorny issue, but RCU (specifically rcu_dereference but probably
> also some READ_ONCEs) relies on being able to utilise syntactic dependency
> chains to order local accesses to shared variables.

Well, we used to have READ_ONCE() and smp_read_barrier_depends(), but
we recently munged them together, in the process getting rid of
lockless_dereference().

So for sure, READ_ONCE() must be able to do the address dependency
thing, otherwise tons of code comes apart.

Peter Zijlstra Nov. 20, 2017, 7:32 p.m. UTC | #17

On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> Although the current direction of the C++ committee is to prefer
> that dependencies are explicitly "marked", this is not deemed to be
> acceptable for the kernel (in other words, everything is always considered
> "marked").

Yeah, that is an attitude not compatible with existing code. Much like
the proposal to allow temporary/wide stores on everything not explicitly
declared atomic. Such stuff instantly breaks all extant code that does
multi-threading with no recourse.

Paul E. McKenney Nov. 20, 2017, 8:52 p.m. UTC | #18

On Mon, Nov 20, 2017 at 08:28:06PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > This is a thorny issue, but RCU (specifically rcu_dereference but probably
> > also some READ_ONCEs) relies on being able to utilise syntactic dependency
> > chains to order local accesses to shared variables.
> 
> Well, we used to have READ_ONCE() and smp_read_barrier_depends(), but
> we recently munged them together, in the process getting rid of
> lockless_dereference().
> 
> So for sure, READ_ONCE() must be able to do the address dependency
> thing, otherwise tons of code comes apart.

I do hope that they track the volatile accesses produced by READ_ONCE().
Otherwise, it would not be good to apply this to anything touching MMIO.

							Thanx, Paul

Paul E. McKenney Nov. 20, 2017, 8:53 p.m. UTC | #19

On Mon, Nov 20, 2017 at 08:32:56PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > Although the current direction of the C++ committee is to prefer
> > that dependencies are explicitly "marked", this is not deemed to be
> > acceptable for the kernel (in other words, everything is always considered
> > "marked").
> 
> Yeah, that is an attitude not compatible with existing code. Much like
> the proposal to allow temporary/wide stores on everything not explicitly
> declared atomic. Such stuff instantly breaks all extant code that does
> multi-threading with no recourse.

If someone suggests temporary/wide stores, even on non-atomics, tell
them that the standard does not permit them to introduce data races.

							Thanx, Paul

David Laight Nov. 21, 2017, 5:23 p.m. UTC | #20

From: Paul E. McKenney
> Sent: 20 November 2017 20:54
> 
> On Mon, Nov 20, 2017 at 08:32:56PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > > Although the current direction of the C++ committee is to prefer
> > > that dependencies are explicitly "marked", this is not deemed to be
> > > acceptable for the kernel (in other words, everything is always considered
> > > "marked").
> >
> > Yeah, that is an attitude not compatible with existing code. Much like
> > the proposal to allow temporary/wide stores on everything not explicitly
> > declared atomic. Such stuff instantly breaks all extant code that does
> > multi-threading with no recourse.
> 
> If someone suggests temporary/wide stores, even on non-atomics, tell
> them that the standard does not permit them to introduce data races.

The C standard doesn't say anything about multi-threading.

The x86 bis (bit set) family are well known for being problematic
because they always do a 32bit wide rmw cycle.

	David

Paul E. McKenney Nov. 21, 2017, 6:51 p.m. UTC | #21

On Tue, Nov 21, 2017 at 05:23:52PM +0000, David Laight wrote:
> From: Paul E. McKenney
> > Sent: 20 November 2017 20:54
> > 
> > On Mon, Nov 20, 2017 at 08:32:56PM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > > > Although the current direction of the C++ committee is to prefer
> > > > that dependencies are explicitly "marked", this is not deemed to be
> > > > acceptable for the kernel (in other words, everything is always considered
> > > > "marked").
> > >
> > > Yeah, that is an attitude not compatible with existing code. Much like
> > > the proposal to allow temporary/wide stores on everything not explicitly
> > > declared atomic. Such stuff instantly breaks all extant code that does
> > > multi-threading with no recourse.
> > 
> > If someone suggests temporary/wide stores, even on non-atomics, tell
> > them that the standard does not permit them to introduce data races.
> 
> The C standard doesn't say anything about multi-threading.

Actually, recent versions of the C standard really do cover
multi-threading, and have for some years.  For example, the June 2010
draft has this to say in section 5.1.2.4:

	Under a hosted implementation, a program can have more than one
	thread of execution (or thread) running concurrently.

Later, in paragraph 25 of this same section:

	The execution of a program contains a data race if it contains
	two conflicting actions in different threads, at least one of
	which is not atomic, and neither happens before the other. Any
	such data race results in undefined behavior.

Because the compiler is not allowed to introduce undefined behavior in a
program that does not already contain undefined behavior, the compiler
is absolutely forbidden from inventing stores unless it can prove that
doing so does not introduce a data race.

One (painful and annoying) case in which it can prove this is just before
a normal (non-volatile and non-atomic) store.

> The x86 bis (bit set) family are well known for being problematic
> because they always do a 32bit wide rmw cycle.

If the compiler is careful, it can invent atomic read-modify-write cycles
to uninvolved variables.  Here "is careful" includes ensuring that any
read from or write to one of those uninvolved variables acts just as it
would in the absence of the atomic read-modify-write cycle.

But I did say "store" above, not atomic read-modify-write operation.  ;-)

							Thanx, Paul

Alexander Potapenko Nov. 23, 2017, 1:42 p.m. UTC | #22

On Thu, Nov 16, 2017 at 7:16 PM, Nick Desaulniers
<ndesaulniers@google.com> wrote:
> On Thu, Nov 16, 2017 at 9:48 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>> On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
>>> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
>>> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
>>> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > >>
>>> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
>>> > >> > memory model along side the C11 one. That would help a ton.
>>> > >>
>>> > >> Does anyone from the kernel side participate in the C standardization process?
>>> > >
>>> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
>>> > > reconciled though. From what I understand C11 (and onwards) are
>>> > > incompatible with the kernel model on a number of subtle points.
>>> >
>>> > It would be good to have these incompatibilities written down, then
>>> > for the sake of argument, they can be cited both for discussions on
>>> > LKML and in the C standardization process.  For example, a running
>>> > list in Documentation/ or something would make it so that anyone could
>>> > understand and cite current issues with the latest C standard.
>>>
>>> Will should be able to produce this list; I know he's done before, I
>>> just can't find it -- my Google-foo isn't strong today.
>>
>> Here you go:
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
>
> Great, thanks! Will take some time to digest, but happy to refer
> others to this important work in the future.
>
> I wonder if we have anything like a case study that shows specifically
> how a compiler generated a subtle bug due to specifics of the memory
> model, like a "if you do this, here's the problematic code that will
> get generated, and why it's problematic due to the memory model."
> That may be a good way to raise issues with toolchain developers with
> concrete and actionable examples.
>
>>> > I don't understand why we'd block patches for enabling experimental
>>> > features.  We've been running this patch-set on actual devices for
>>> > months and would love to provide them to the community for further
>>> > testing.  If bugs are found, then there's more evidence to bring to
>>> > the C standards committee.  Otherwise we're shutting down feature
>>> > development for the sake of potential bugs in a C standard we're not
>>> > even using.
>>>
>>> So the problem is that its very very hard (and painful) to find these
>>> bugs. Getting the tools people to comment on these specific
>>> optimizations would really help lots.
>
> No doubt; I do not disagree with you.  Kernel developers have very
> important use cases for the language.
>
> But the core point I'm trying to make is "do we need to block this
> patch set until issues with the C standards body in regards to the
> kernels memory model are resolved?"  I would hope the two are
> orthogonal and that we'd take them and then test them even more
> extensively than the developer has in order to find out.
>
>> It would be good to get something similar to LKMM into KTSAN, for
>> example.  There would probably be a few differences due to efficiency
>> concerns, but closer is better than less close.  ;-)
>
> + glider, who may be able to comment on the state of KTSAN.
We haven't touched KTSAN for a while, so it's probably broken right now.
It should be possible to revive it, the question is how much code will
need to be annotated to prevent the tool from overwhelming the
developers with reports.
+Dima and Andrey, who should know better.

Dmitry Vyukov Nov. 24, 2017, 7:52 a.m. UTC | #23

On Thu, Nov 23, 2017 at 2:42 PM, Alexander Potapenko <glider@google.com> wrote:
>>>> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
>>>> > >> > memory model along side the C11 one. That would help a ton.
>>>> > >>
>>>> > >> Does anyone from the kernel side participate in the C standardization process?
>>>> > >
>>>> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
>>>> > > reconciled though. From what I understand C11 (and onwards) are
>>>> > > incompatible with the kernel model on a number of subtle points.
>>>> >
>>>> > It would be good to have these incompatibilities written down, then
>>>> > for the sake of argument, they can be cited both for discussions on
>>>> > LKML and in the C standardization process.  For example, a running
>>>> > list in Documentation/ or something would make it so that anyone could
>>>> > understand and cite current issues with the latest C standard.
>>>>
>>>> Will should be able to produce this list; I know he's done before, I
>>>> just can't find it -- my Google-foo isn't strong today.
>>>
>>> Here you go:
>>>
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
>>
>> Great, thanks! Will take some time to digest, but happy to refer
>> others to this important work in the future.
>>
>> I wonder if we have anything like a case study that shows specifically
>> how a compiler generated a subtle bug due to specifics of the memory
>> model, like a "if you do this, here's the problematic code that will
>> get generated, and why it's problematic due to the memory model."
>> That may be a good way to raise issues with toolchain developers with
>> concrete and actionable examples.
>>
>>>> > I don't understand why we'd block patches for enabling experimental
>>>> > features.  We've been running this patch-set on actual devices for
>>>> > months and would love to provide them to the community for further
>>>> > testing.  If bugs are found, then there's more evidence to bring to
>>>> > the C standards committee.  Otherwise we're shutting down feature
>>>> > development for the sake of potential bugs in a C standard we're not
>>>> > even using.
>>>>
>>>> So the problem is that its very very hard (and painful) to find these
>>>> bugs. Getting the tools people to comment on these specific
>>>> optimizations would really help lots.
>>
>> No doubt; I do not disagree with you.  Kernel developers have very
>> important use cases for the language.
>>
>> But the core point I'm trying to make is "do we need to block this
>> patch set until issues with the C standards body in regards to the
>> kernels memory model are resolved?"  I would hope the two are
>> orthogonal and that we'd take them and then test them even more
>> extensively than the developer has in order to find out.
>>
>>> It would be good to get something similar to LKMM into KTSAN, for
>>> example.  There would probably be a few differences due to efficiency
>>> concerns, but closer is better than less close.  ;-)
>>
>> + glider, who may be able to comment on the state of KTSAN.
> We haven't touched KTSAN for a while, so it's probably broken right now.
> It should be possible to revive it, the question is how much code will
> need to be annotated to prevent the tool from overwhelming the
> developers with reports.
> +Dima and Andrey, who should know better.

Hi,

KTSAN checks acquire/release pairs, and that's very useful. But as far
as I understand this thread is about more subtle things and areas of
kernel/compiler tension. I afraid this in this context KTSAN is in the
same boat as compiler. Because, well, we need to write code that
implements precise algorithms. And if control-dependencies are defined
as "extreme care is required to avoid control-dependency-destroying
compiler optimizations" (that is, code is correct if it does not break
against the current set of enabled optimizations in the current
compiler, what?) and data-dependencies are defined akin to C11
definition (read -- non-implementable, unicorns); then KTSAN can't
help.

When/if C provides implementable rules for data-dependencies
(_Dependency) and that's implemented in compilers and kernel sticks to
this model, then I guess it should be possible to extract that info
from compiler and check against it in KTSAN (e.g. 2 synchronization
domains, one for dependent accesses and one for everything else).
Kernel could as well define own model, and KTSAN could check against
it. But that model must be implemented in compilers first anyway.
Because (1) doing it just for KTSAN does not look reasonable, (2)
until compiler supports that model there is little point in checking
(the fact that compiler uses a different model is the major gaping
hole and we are aware of it without tooling help).

And, yes, I agree that we should not block this LTO patch. All
problems are already there and are orthogonal to LTO. Compiler sees
enough code already (large TUs, lots of code in headers) and we move
code. I also have not seen any special rules wrt rcu and translation
units, I have not seen developers doing any additional analysis re
TUs, move code to separate files, nor I seen comments says that this
code must be in separate TU than that code. From what I see usually
it's assumed that things will just work. If anything LTO will be
useful to shake out latent bugs that will pop up later.

[v2,18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG

Commit Message

Comments

Patch