Message ID: 20240322233838.868874-1-boqun.feng@gmail.com
Series: Memory model and atomic API in Rust
On Fri, Mar 22, 2024 at 04:38:35PM -0700, Boqun Feng wrote:
> Hi,
>
> Since I see more and more Rust code is coming in, I feel like this
> should be sent sooner rather than later, so here is a WIP to open the
> discussion and get feedback.
>
> One of the most important questions we need to answer is: which
> memory (ordering) model we should use when developing Rust in Linux
> kernel, given Rust has its own memory ordering model[1]. I had some
> discussion with the Rust language community to understand their position
> on this:
>
> https://github.com/rust-lang/unsafe-code-guidelines/issues/348#issuecomment-1218407557
> https://github.com/rust-lang/unsafe-code-guidelines/issues/476#issue-2001382992
>
> My takeaway from these discussions, along with other offline discussion,
> is that supporting two memory models is challenging for both correctness
> reasoning (someone needs to provide a model) and implementation (one
> model needs to be aware of the other model). So that's not wise to do
> (at least at the beginning). So the most reasonable option to me is:
>
> 	we only use LKMM for Rust code in kernel (i.e. avoid using
> 	Rust's own atomic).
>
> Because kernel developers are more familiar with LKMM, and when Rust code
> interacts with C code, it has to use the model that C code uses.

I wonder about that. The disadvantage of only supporting LKMM atomics is
that we'll be incompatible with third party code, and we don't want to
be rolling all of our own data structures forever.

Do we see a path towards eventually supporting the standard Rust model?
Perhaps LKMM atomics could be reworked to be a layer on top of C/C++
atomics.
When I last looked, they didn't look completely incompatible; rather, there is a common subset that both support with the same semantics, and each has some things that it supports and the other doesn't (e.g., LKMM atomics have smp_mb__after_atomic(); this is just a straightforward optimization to avoid an unnecessary barrier on architectures where the atomic already provided it).
On Fri, 22 Mar 2024 at 16:57, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > I wonder about that. The disadvantage of only supporting LKMM atomics is > that we'll be incompatible with third party code, and we don't want to > be rolling all of our own data structures forever. Honestly, having seen the shit-show that is language standards bodies and incomplete compiler support, I do not understand why people think that we wouldn't want to roll our own. The C++ memory model may be reliable in another decade. And then a decade after *that*, we can drop support for the pre-reliable compilers. People who think that compilers do things right just because they are automated simply don't know what they are talking about. It was just a couple of days ago that I was pointed at https://github.com/llvm/llvm-project/issues/64188 which is literally the compiler completely missing a C++ memory barrier. And when the compiler itself is fundamentally buggy, you're kind of screwed. When you roll your own, you can work around the bugs in compilers. And this is all doubly true when it is something that the kernel does, and very few other projects do. For example, we're often better off using inline asm over dubious builtins that have "native" compiler support for them, but little actual real coverage. It really is often a "ok, this builtin has actually been used for a decade, so it's hopefully stable now". We have years of examples of builtins either being completely broken (as in "immediate crash" broken), or simply generating crap code that is actively worse than using the inline asm. The memory ordering isn't going to be at all different. Moving it into the compiler doesn't solve problems. It creates them. Linus
On Fri, Mar 22, 2024 at 07:57:41PM -0400, Kent Overstreet wrote: > On Fri, Mar 22, 2024 at 04:38:35PM -0700, Boqun Feng wrote: > > Hi, > > > > Since I see more and more Rust code is comming in, I feel like this > > should be sent sooner rather than later, so here is a WIP to open the > > discussion and get feedback. > > > > One of the most important questions we need to answer is: which > > memory (ordering) model we should use when developing Rust in Linux > > kernel, given Rust has its own memory ordering model[1]. I had some > > discussion with Rust language community to understand their position > > on this: > > > > https://github.com/rust-lang/unsafe-code-guidelines/issues/348#issuecomment-1218407557 > > https://github.com/rust-lang/unsafe-code-guidelines/issues/476#issue-2001382992 > > > > My takeaway from these discussions, along with other offline discussion > > is that supporting two memory models is challenging for both correctness > > reasoning (some one needs to provide a model) and implementation (one > > model needs to be aware of the other model). So that's not wise to do > > (at least at the beginning). So the most reasonable option to me is: > > > > we only use LKMM for Rust code in kernel (i.e. avoid using > > Rust's own atomic). > > > > Because kernel developers are more familiar with LKMM and when Rust code > > interacts with C code, it has to use the model that C code uses. > > I wonder about that. The disadvantage of only supporting LKMM atomics is > that we'll be incompatible with third party code, and we don't want to > be rolling all of our own data structures forever. > A possible solution to that is a set of C++ memory model atomics implemented by LKMM atomics. That should be possible. > Do we see a path towards eventually supporting the standard Rust model? 
Things that the Rust/C++ memory model doesn't support but we use are at
least: mixed-size atomics (cmpxchg a u64, but read a u8 from another
thread) and dependencies (which we use a lot in fast paths), so it's not
trivial.

There are also issues like where one Rust thread does a store(..,
RELEASE) and a C thread does a rcu_dereference(); in practice it
probably works, but no one has worked out (and no one is likely to work
out) a model to describe such an interaction.

Regards,
Boqun

> Perhaps LKMM atomics could be reworked to be a layer on top of C/C++
> atomics. When I last looked, they didn't look completely incompatible;
> rather, there is a common subset that both support with the same
> semantics, and either has some things that it supports and the other
> doesn't (i.e., LKMM atomics have smp_mb__after_atomic(); this is just a
> straightforward optimization to avoid an unnecessary barrier on
> architectures where the atomic already provided it).
On Fri, Mar 22, 2024 at 05:12:29PM -0700, Linus Torvalds wrote: > On Fri, 22 Mar 2024 at 16:57, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > I wonder about that. The disadvantage of only supporting LKMM atomics is > > that we'll be incompatible with third party code, and we don't want to > > be rolling all of our own data structures forever. > > Honestly, having seen the shit-show that is language standards bodies > and incomplete compiler support, I do not understand why people think > that we wouldn't want to roll our own. > > The C++ memory model may be reliable in another decade. And then a > decade after *that*, we can drop support for the pre-reliable > compilers. > > People who think that compilers do things right just because they are > automated simply don't know what they are talking about. > > It was just a couple of days ago that I was pointed at > > https://github.com/llvm/llvm-project/issues/64188 Besides that there's cross arch support to think about - it's hard to imagine us ever ditching our own atomics. I was thinking about something more incremental - just an optional mode where our atomics were C atomics underneath. It'd probably give the compiler people a much more effective way to test their stuff than anything they have now.
On Fri, 22 Mar 2024 at 17:21, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > Besides that there's cross arch support to think about - it's hard to > imagine us ever ditching our own atomics. Well, that's one of the advantages of using compiler builtins - projects that do want cross-architecture support, but that aren't actually maintaining their _own_ architecture support. So I very much see the lure of compiler support for that kind of situation - to write portable code without having to know or care about architecture details. This is one reason I think the kernel is kind of odd and special - because in the kernel, we obviously very fundamentally have to care about the architecture details _anyway_, so then having the architecture also define things like atomics is just a pretty small (and relatively straightforward) detail. The same argument goes for compiler builtins vs inline asm. In the kernel, we have to have people who are intimately familiar with the architecture _anyway_, so inline asms and architecture-specific header files aren't some big pain-point: they'd be needed _anyway_. But in some random user level program, where all you want is an efficient way to do "find first bit"? Then using a compiler intrinsic makes a lot more sense. > I was thinking about something more incremental - just an optional mode > where our atomics were C atomics underneath. It'd probably give the > compiler people a much more effective way to test their stuff than > anything they have now. I suspect it might be painful, and some compiler people would throw their hands up in horror, because the C++ atomics model is based fairly solidly on atomic types, and the kernel memory model is much more fluid. 
Boqun already mentioned the "mixing access sizes", which is actually
quite fundamental in the kernel, where we play lots of games with that
(typically around locking, where you find patterns like unlock writing
a zero to a single byte, even though the whole lock data structure is a
word). And sometimes the access size games are very explicit (eg
lib/lockref.c).

But it actually goes deeper than that. While we do have "atomic_t" etc
for arithmetic atomics, and that probably would map fairly well to C++
atomics, in other cases we simply base our atomics not on _types_, but
on code.

IOW, we do things like "cmpxchg()", and the target of that atomic
access is just a regular data structure field.

It's kind of like our "volatile" usage. If you read the C (and C++)
standards, you'll find that you should use "volatile" on data types.
That's almost *never* what the kernel does. The kernel uses "volatile"
in _code_ (ie READ_ONCE() etc), and uses it by casting etc.

Compiler people don't tend to really like those kinds of things.

Linus
On Fri, Mar 22, 2024 at 05:15:08PM -0700, Boqun Feng wrote:
[...]
> >
> > I wonder about that. The disadvantage of only supporting LKMM atomics is
> > that we'll be incompatible with third party code, and we don't want to
> > be rolling all of our own data structures forever.
> >
>
> A possible solution to that is a set of C++ memory model atomics
> implemented by LKMM atomics. That should be possible.
>

Another possible "solution" works in the opposite direction: since the
folder rust/kernel/sync/atomic is quite stand-alone, we can export that
as a Rust crate (library), and third party code can support using LKMM
atomics instead of Rust's own atomics ;-) Of course, that only applies
if the project is supposed to work with the Linux kernel.

Regards,
Boqun
On Fri, Mar 22, 2024 at 05:49:23PM -0700, Boqun Feng wrote:
> On Fri, Mar 22, 2024 at 05:15:08PM -0700, Boqun Feng wrote:
> [...]
> > > > I wonder about that. The disadvantage of only supporting LKMM atomics is
> > > > that we'll be incompatible with third party code, and we don't want to
> > > > be rolling all of our own data structures forever.
> > >
> > A possible solution to that is a set of C++ memory model atomics
> > implemented by LKMM atomics. That should be possible.
>
> Another possible "solution" works in the opposite direction, since the
> folder rust/kernel/sync/atomic is quite stand-alone, we can export that
> as a Rust crate (library), and third party code can support using LKMM
> atomics instead of Rust own atomics ;-) Of course if the project is
> supposed to work with Linux kernel.

Not just from the Rust side; the C side would be useful as well. I've
got a quick, dirty, hacky version of that in bcachefs-tools, and I would
switch immediately if someone took that over and made it a real project;
I'm quite sure other projects would as well.
On Fri, Mar 22, 2024 at 05:36:00PM -0700, Linus Torvalds wrote:
> On Fri, 22 Mar 2024 at 17:21, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > Besides that there's cross arch support to think about - it's hard to
> > imagine us ever ditching our own atomics.
>
> Well, that's one of the advantages of using compiler builtins -
> projects that do want cross-architecture support, but that aren't
> actually maintaining their _own_ architecture support.
>
> So I very much see the lure of compiler support for that kind of
> situation - to write portable code without having to know or care
> about architecture details.
>
> This is one reason I think the kernel is kind of odd and special -
> because in the kernel, we obviously very fundamentally have to care
> about the architecture details _anyway_, so then having the
> architecture also define things like atomics is just a pretty small
> (and relatively straightforward) detail.
>
> The same argument goes for compiler builtins vs inline asm. In the
> kernel, we have to have people who are intimately familiar with the
> architecture _anyway_, so inline asms and architecture-specific header
> files aren't some big pain-point: they'd be needed _anyway_.
>
> But in some random user level program, where all you want is an
> efficient way to do "find first bit"? Then using a compiler intrinsic
> makes a lot more sense.

We've got a whole spectrum of kernel code though, and a lot of it is
code that, honestly, would be better off not being specific to the
kernel. rhashtable comes to mind; it's fully generic and excellent at
what it does, but it's had a number of annoyingly subtle bugs and sharp
edges over the years that are really just a result of it not having
enough users.

So I see some real value in regularizing things.

> > I was thinking about something more incremental - just an optional mode
> > where our atomics were C atomics underneath.
> > It'd probably give the compiler people a much more effective way to
> > test their stuff than anything they have now.
>
> I suspect it might be painful, and some compiler people would throw
> their hands up in horror, because the C++ atomics model is based
> fairly solidly on atomic types, and the kernel memory model is much
> more fluid.
>
> Boqun already mentioned the "mixing access sizes", which is actually
> quite fundamental in the kernel, where we play lots of games with that
> (typically around locking, where you find patterns like unlock writing
> a zero to a single byte, even though the whole lock data structure is
> a word). And sometimes the access size games are very explicit (eg
> lib/lockref.c).

I don't think mixing access sizes should be a real barrier. On the read
side we can obviously do that with a helper; the write side needs
compiler help, but "writing just a byte out of a word" is no different,
from a compiler POV, than "writing a single bit", and we can already mix
atomic_or() with atomic_add(), with both C atomics and LKMM atomics.

> But it actually goes deeper than that. While we do have "atomic_t" etc
> for arithmetic atomics, and that probably would map fairly well to C++
> atomics, in other cases we simply base our atomics not on _types_, but
> on code.
>
> IOW, we do things like "cmpxchg()", and the target of that atomic
> access is just a regular data structure field.

Well, some of that's historical cruft; cmpxchg() and atomic_cmpxchg()
have different orderings, and we can specify that more directly now. But
we definitely need the ability to cmpxchg() any struct of a size the
machine supports atomic access to.

Rust should be able to manage that more easily than C/C++, though;
they've got a type system that can sanely represent that.
On Fri, Mar 22, 2024 at 10:07:31PM -0400, Kent Overstreet wrote:
[...]
> > Boqun already mentioned the "mixing access sizes", which is actually
> > quite fundamental in the kernel, where we play lots of games with that
> > (typically around locking, where you find patterns line unlock writing
> > a zero to a single byte, even though the whole lock data structure is
> > a word). And sometimes the access size games are very explicit (eg
> > lib/lockref.c).
>
> I don't think mixing access sizes should be a real barrier. On the read

Well, it actually is, since mixing access sizes is, guess what,
undefined behavior:

(example from https://doc.rust-lang.org/std/sync/atomic/#memory-model-for-atomic-accesses)

	thread::scope(|s| {
	    // This is UB: using different-sized atomic accesses to the same data
	    s.spawn(|| atomic.store(1, Ordering::Relaxed));
	    s.spawn(|| unsafe {
	        let differently_sized = transmute::<&AtomicU16, &AtomicU8>(&atomic);
	        differently_sized.store(2, Ordering::Relaxed);
	    });
	});

Of course, you can say "I will just ignore the UB", but if you have to
ignore "compiler rules" to make your code work, why bother using
compiler builtins in the first place? Being UB means they are NOT
guaranteed to work.

> side we can obviously do that with a helper; the write side needs
> compiler help, but "writing just a byte out of a word" is no different
> from a compiler POV that "write a single bit", and we can already mix
> atomic_or() with atomic_add(), with both C atomics and LKMM atomics.
>

I totally agree with your reasoning here, but maybe the standard doesn't
;-)

Regards,
Boqun
On Fri, Mar 22, 2024 at 07:26:28PM -0700, Boqun Feng wrote:
> On Fri, Mar 22, 2024 at 10:07:31PM -0400, Kent Overstreet wrote:
> [...]
> > > Boqun already mentioned the "mixing access sizes", which is actually
> > > quite fundamental in the kernel, where we play lots of games with that
> > > (typically around locking, where you find patterns line unlock writing
> > > a zero to a single byte, even though the whole lock data structure is
> > > a word). And sometimes the access size games are very explicit (eg
> > > lib/lockref.c).
> >
> > I don't think mixing access sizes should be a real barrier. On the read
>
> Well, it actually is, since mixing access sizes is, guess what,
> an undefined behavior:
>
> (example in https://doc.rust-lang.org/std/sync/atomic/#memory-model-for-atomic-accesses)
>
> 	thread::scope(|s| {
> 	    // This is UB: using different-sized atomic accesses to the same data
> 	    s.spawn(|| atomic.store(1, Ordering::Relaxed));
> 	    s.spawn(|| unsafe {
> 	        let differently_sized = transmute::<&AtomicU16, &AtomicU8>(&atomic);
> 	        differently_sized.store(2, Ordering::Relaxed);
> 	    });
> 	});
>
> Of course, you can say "I will just ignore the UB", but if you have to
> ignore "compiler rules" to make your code work, why bother use compiler
> builtin in the first place? Being UB means they are NOT guaranteed to
> work.

That's not what I'm proposing - you'd need additional compiler support.
But the new intrinsic would be no different, semantics-wise, for the
compiler to model, than a "lock orb".
On Fri, Mar 22, 2024 at 10:33:13PM -0400, Kent Overstreet wrote: > On Fri, Mar 22, 2024 at 07:26:28PM -0700, Boqun Feng wrote: > > On Fri, Mar 22, 2024 at 10:07:31PM -0400, Kent Overstreet wrote: > > [...] > > > > Boqun already mentioned the "mixing access sizes", which is actually > > > > quite fundamental in the kernel, where we play lots of games with that > > > > (typically around locking, where you find patterns line unlock writing > > > > a zero to a single byte, even though the whole lock data structure is > > > > a word). And sometimes the access size games are very explicit (eg > > > > lib/lockref.c). > > > > > > I don't think mixing access sizes should be a real barrier. On the read > > > > Well, it actually is, since mixing access sizes is, guess what, > > an undefined behavior: > > > > (example in https://doc.rust-lang.org/std/sync/atomic/#memory-model-for-atomic-accesses) > > > > thread::scope(|s| { > > // This is UB: using different-sized atomic accesses to the same data > > s.spawn(|| atomic.store(1, Ordering::Relaxed)); > > s.spawn(|| unsafe { > > let differently_sized = transmute::<&AtomicU16, &AtomicU8>(&atomic); > > differently_sized.store(2, Ordering::Relaxed); > > }); > > }); > > > > Of course, you can say "I will just ignore the UB", but if you have to > > ignore "compiler rules" to make your code work, why bother use compiler > > builtin in the first place? Being UB means they are NOT guaranteed to > > work. > > That's not what I'm proposing - you'd need additional compiler support. Ah, OK. > but the new intrinsic would be no different, semantics wise for the > compiler to model, than a "lock orb". 
Be ready to be disappointed:

https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402078545
https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402082631

;-)

In fact, if you get a chance to read the previous discussion links I
shared, you will find I was just like you in the beginning: hoping we
could extend the model to support more kernel code properly. But my
overall feeling is that it's either very challenging, or there's a lack
of motivation to do it.

Regards,
Boqun
On Fri, Mar 22, 2024 at 07:57:20PM -0700, Boqun Feng wrote: > On Fri, Mar 22, 2024 at 10:33:13PM -0400, Kent Overstreet wrote: > > On Fri, Mar 22, 2024 at 07:26:28PM -0700, Boqun Feng wrote: > > > On Fri, Mar 22, 2024 at 10:07:31PM -0400, Kent Overstreet wrote: > > > [...] > > > > > Boqun already mentioned the "mixing access sizes", which is actually > > > > > quite fundamental in the kernel, where we play lots of games with that > > > > > (typically around locking, where you find patterns line unlock writing > > > > > a zero to a single byte, even though the whole lock data structure is > > > > > a word). And sometimes the access size games are very explicit (eg > > > > > lib/lockref.c). > > > > > > > > I don't think mixing access sizes should be a real barrier. On the read > > > > > > Well, it actually is, since mixing access sizes is, guess what, > > > an undefined behavior: > > > > > > (example in https://doc.rust-lang.org/std/sync/atomic/#memory-model-for-atomic-accesses) > > > > > > thread::scope(|s| { > > > // This is UB: using different-sized atomic accesses to the same data > > > s.spawn(|| atomic.store(1, Ordering::Relaxed)); > > > s.spawn(|| unsafe { > > > let differently_sized = transmute::<&AtomicU16, &AtomicU8>(&atomic); > > > differently_sized.store(2, Ordering::Relaxed); > > > }); > > > }); > > > > > > Of course, you can say "I will just ignore the UB", but if you have to > > > ignore "compiler rules" to make your code work, why bother use compiler > > > builtin in the first place? Being UB means they are NOT guaranteed to > > > work. > > > > That's not what I'm proposing - you'd need additional compiler support. > > Ah, OK. > > > but the new intrinsic would be no different, semantics wise for the > > compiler to model, than a "lock orb". 
> > Be ready to be disappointed:
> >
> > https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402078545
> > https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402082631
> >
> > ;-)
> >
> > In fact, if you get a chance to read the previous discussion links I
> > shared, you will find I was just like you in the beginning: hope we
> > could extend the model to support more kernel code properly. But my
> > overall feeling is that it's either very challenging or lack of
> > motivation to do.

That's casting - that doesn't work because compiler people hate
aliasing.

But intrinsics for e.g.

	__atomic32_read_u8(atomic_u32_t *a, unsigned byte)
	__atomic32_write_u8(atomic_u32_t *a, unsigned byte)

should be doable - that's perfectly fine for the compiler to model.

That would admittedly be ugly to use. But, if Rust ever allowed for
marking any struct up to word size as atomic (which we want anyways...),
it could use that under the hood for setting a member variable without
cmpxchg.
On Fri, Mar 22, 2024 at 11:10:36PM -0400, Kent Overstreet wrote: > On Fri, Mar 22, 2024 at 07:57:20PM -0700, Boqun Feng wrote: > > On Fri, Mar 22, 2024 at 10:33:13PM -0400, Kent Overstreet wrote: > > > On Fri, Mar 22, 2024 at 07:26:28PM -0700, Boqun Feng wrote: > > > > On Fri, Mar 22, 2024 at 10:07:31PM -0400, Kent Overstreet wrote: > > > > [...] > > > > > > Boqun already mentioned the "mixing access sizes", which is actually > > > > > > quite fundamental in the kernel, where we play lots of games with that > > > > > > (typically around locking, where you find patterns line unlock writing > > > > > > a zero to a single byte, even though the whole lock data structure is > > > > > > a word). And sometimes the access size games are very explicit (eg > > > > > > lib/lockref.c). > > > > > > > > > > I don't think mixing access sizes should be a real barrier. On the read > > > > > > > > Well, it actually is, since mixing access sizes is, guess what, > > > > an undefined behavior: > > > > > > > > (example in https://doc.rust-lang.org/std/sync/atomic/#memory-model-for-atomic-accesses) > > > > > > > > thread::scope(|s| { > > > > // This is UB: using different-sized atomic accesses to the same data > > > > s.spawn(|| atomic.store(1, Ordering::Relaxed)); > > > > s.spawn(|| unsafe { > > > > let differently_sized = transmute::<&AtomicU16, &AtomicU8>(&atomic); > > > > differently_sized.store(2, Ordering::Relaxed); > > > > }); > > > > }); > > > > > > > > Of course, you can say "I will just ignore the UB", but if you have to > > > > ignore "compiler rules" to make your code work, why bother use compiler > > > > builtin in the first place? Being UB means they are NOT guaranteed to > > > > work. > > > > > > That's not what I'm proposing - you'd need additional compiler support. > > > > Ah, OK. > > > > > but the new intrinsic would be no different, semantics wise for the > > > compiler to model, than a "lock orb". 
> > > > Be ready to be disappointed: > > > > https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402078545 > > https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402082631 > > > > ;-) > > > > In fact, if you get a chance to read the previous discussion links I > > shared, you will find I was just like you in the beginning: hope we > > could extend the model to support more kernel code properly. But my > > overall feeling is that it's either very challenging or lack of > > motivation to do. > > That's casting - that doesn't work because compiler people hate > aliasing. > > But intrinsics for e.g. > __atomic32_read_u8(atomic_u32_t *a, unsigned byte) > __atomic32_write_u8(atomic_u32_t a*, unsigned byte) > so "byte" here is the byte indexing in the u32? Hmm... I guess that'll work. But I really don't know whether LLVM/Rust will support such an intrinsic... Regards, Boqun > should be doable - that's perfectly fine for the compiler to model. > > That would admittedly be ugly to use. But, if Rust ever allowed for > marking any struct up to word size as atomic (which we want anyways...), > it could use that under the hood for setting a member variable without > cmpxchg.
On Fri, Mar 22, 2024 at 08:51:03PM -0700, Boqun Feng wrote: > On Fri, Mar 22, 2024 at 11:10:36PM -0400, Kent Overstreet wrote: > > On Fri, Mar 22, 2024 at 07:57:20PM -0700, Boqun Feng wrote: > > > On Fri, Mar 22, 2024 at 10:33:13PM -0400, Kent Overstreet wrote: > > > > On Fri, Mar 22, 2024 at 07:26:28PM -0700, Boqun Feng wrote: > > > > > On Fri, Mar 22, 2024 at 10:07:31PM -0400, Kent Overstreet wrote: > > > > > [...] > > > > > > > Boqun already mentioned the "mixing access sizes", which is actually > > > > > > > quite fundamental in the kernel, where we play lots of games with that > > > > > > > (typically around locking, where you find patterns line unlock writing > > > > > > > a zero to a single byte, even though the whole lock data structure is > > > > > > > a word). And sometimes the access size games are very explicit (eg > > > > > > > lib/lockref.c). > > > > > > > > > > > > I don't think mixing access sizes should be a real barrier. On the read > > > > > > > > > > Well, it actually is, since mixing access sizes is, guess what, > > > > > an undefined behavior: > > > > > > > > > > (example in https://doc.rust-lang.org/std/sync/atomic/#memory-model-for-atomic-accesses) > > > > > > > > > > thread::scope(|s| { > > > > > // This is UB: using different-sized atomic accesses to the same data > > > > > s.spawn(|| atomic.store(1, Ordering::Relaxed)); > > > > > s.spawn(|| unsafe { > > > > > let differently_sized = transmute::<&AtomicU16, &AtomicU8>(&atomic); > > > > > differently_sized.store(2, Ordering::Relaxed); > > > > > }); > > > > > }); > > > > > > > > > > Of course, you can say "I will just ignore the UB", but if you have to > > > > > ignore "compiler rules" to make your code work, why bother use compiler > > > > > builtin in the first place? Being UB means they are NOT guaranteed to > > > > > work. > > > > > > > > That's not what I'm proposing - you'd need additional compiler support. > > > > > > Ah, OK. 
> > > > > > > but the new intrinsic would be no different, semantics wise for the > > > > compiler to model, than a "lock orb". > > > > > > Be ready to be disappointed: > > > > > > https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402078545 > > > https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20atomic.20aliasing.20allowed.3F/near/402082631 > > > > > > ;-) > > > > > > In fact, if you get a chance to read the previous discussion links I > > > shared, you will find I was just like you in the beginning: hope we > > > could extend the model to support more kernel code properly. But my > > > overall feeling is that it's either very challenging or lack of > > > motivation to do. > > > > That's casting - that doesn't work because compiler people hate > > aliasing. > > > > But intrinsics for e.g. > > __atomic32_read_u8(atomic_u32_t *a, unsigned byte) > > __atomic32_write_u8(atomic_u32_t a*, unsigned byte) > > > > so "byte" here is the byte indexing in the u32? Hmm... I guess that'll > work. But I really don't know whether LLVM/Rust will support such an > intrinsic... They're going to need this eventually - really, entire structs that can be marked as atomic. Types aren't limited to the integers.
> There are also issues like where one Rust thread does a store(..,
> RELEASE), and a C thread does a rcu_dereference(), in practice, it
> probably works but no one works out (and no one would work out) a model
> to describe such an interaction.

Isn't that what Paul E. McKenney's litmus tests are all about?

	tools/memory-model/litmus-tests

Andrew
On Sat, Mar 23, 2024 at 03:29:11PM +0100, Andrew Lunn wrote:
> > There are also issues like where one Rust thread does a store(..,
> > RELEASE), and a C thread does a rcu_dereference(), in practice, it
> > probably works but no one works out (and no one would work out) a model
> > to describe such an interaction.
>
> Isn't that what Paul E. McKenney litmus tests are all about?
>

Litmus tests (or herd, or any other memory model tool) work for either
LKMM or the C++ memory model, but there is no model I'm aware of that
works for the communication between the two memory models. So for
example:

Rust thread:

	let mut foo: Box<Foo> = ...;
	foo.a = 1;
	let global_ptr: &AtomicPtr = ...;
	global_ptr.store(foo.leak() as _, RELEASE);

C thread:

	rcu_read_lock();

	foo = rcu_dereference(global_ptr);
	if (foo) {
		r1 = foo->a;
	}

	rcu_read_unlock();

No tool or model yet guarantees "r1" is 1, but yeah, in practice, for
the cases we care about, it's probably guaranteed. Still, having no tool
or model makes code reasoning challenging.

Regards,
Boqun

> tools/memory-model/litmus-test
>
> Andrew
On Sat, Mar 23, 2024 at 07:41:28AM -0700, Boqun Feng wrote: > On Sat, Mar 23, 2024 at 03:29:11PM +0100, Andrew Lunn wrote: > > > There are also issues like where one Rust thread does a store(.., > > > RELEASE), and a C thread does a rcu_deference(), in practice, it > > > probably works but no one works out (and no one would work out) a model > > > to describe such an interaction. > > > > Isn't that what Paul E. McKenney litmus tests are all about? > > > > Litmus tests (or herd, or any other memory model tools) works for either > LKMM or C++ memory model. But there is no model I'm aware of works for > the communication between two memory models. So for example: > > Rust thread: > > let mut foo: Box<Foo> = ...; > foo.a = 1; > let global_ptr: &AtomicPtr = ...; > global_ptr.store(foo.leak() as _, RELEASE); > > > C thread: > > rcu_read_lock(); > > foo = rcu_dereference(global_ptr); > if (foo) { > r1 = foo->a; > } > > rcu_read_unlock(); > > no tool or model yet to guarantee "r1" is 1, but yeah, in practice for > the case we care, it's probably guaranteed. But no tool or model means > challenging for code reasoning. > There are also cases where two similar APIs from C++ memory model and LKMM have different semantics, for example, a SeqCst atomic in C++ memory model doesn't imply a full barrier, while a fully ordered LKMM atomic does: Rust: a.store(1, RELAXED); x.fetch_add(1, SeqCst); b.store(2, RELAXED); // ^ writes to a and b are not ordered. C: WRITE_ONCE(*a, 1); atomic_fetch_add(x, 1); WRITE_ONCE(*b, 2); // ^ writes to a and b are ordered. So if you used to have two parts synchronizing each other with LKMM atomics, converting one side to Rust *and* using Rust atomics requires much caution. Regards, Boqun > Regards, > Boqun > > > tools/memory-model/litmus-test > > > > Andrew
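One hedged sketch of what the Rust side would have to write to approximate a fully ordered LKMM RMW is to bracket a relaxed RMW with SeqCst fences; the fences (unlike the SeqCst RMW alone) order the surrounding relaxed stores. Whether this is strictly equivalent to LKMM's full barrier on every architecture is exactly the kind of question that needs a worked-out model:

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering::{Relaxed, SeqCst}};

// Approximate LKMM's fully ordered atomic_fetch_add() by bracketing a
// relaxed RMW with SeqCst fences. The fences order the surrounding
// relaxed stores to a and b, which a lone SeqCst RMW does not.
fn fetch_add_full_barrier(x: &AtomicU32, v: u32) -> u32 {
    fence(SeqCst); // analogous to smp_mb__before_atomic()
    let old = x.fetch_add(v, Relaxed);
    fence(SeqCst); // analogous to smp_mb__after_atomic()
    old
}

fn main() {
    let (a, x, b) = (AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0));
    a.store(1, Relaxed);
    let old = fetch_add_full_barrier(&x, 1);
    b.store(2, Relaxed);
    assert_eq!((old, x.load(Relaxed)), (0, 1));
    println!("ok");
}
```

The fence comments referencing smp_mb__before_atomic()/smp_mb__after_atomic() are an analogy only, not a claim of formal equivalence between the two models.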
> On Mar 22, 2024, at 8:12 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > And when the compiler itself is fundamentally buggy, you're kind of > screwed. When you roll your own, you can work around the bugs in > compilers. That may be true, but the LLVM issue you cited isn’t a good example. In that issue, the function being miscompiled doesn’t actually use any barriers or atomics itself; only the scaffolding around it does. The same issue would happen even if the scaffolding used LKMM atomics. For anyone curious: The problematic optimization involves an allocation (‘p’) that is initially private to the function, but is returned at the end of the function. LLVM moves a non-atomic store to that allocation across an external function call (to ‘foo’). This reordering would be blatantly invalid if any other code could observe the contents of the allocation, but is valid if the allocation is private to the function. LLVM assumes the latter: after all, the pointer to it hasn’t escaped. Yet. Except that in a weak memory model, the escape can ‘time travel’...
On Sat, Mar 23, 2024 at 05:40:23PM -0400, comex wrote: > That may be true, but the LLVM issue you cited isn’t a good example. > In that issue, the function being miscompiled doesn’t actually use any > barriers or atomics itself; only the scaffolding around it does. The > same issue would happen even if the scaffolding used LKMM atomics. > > For anyone curious: The problematic optimization involves an > allocation (‘p’) that is initially private to the function, but is > returned at the end of the function. LLVM moves a non-atomic store to > that allocation across an external function call (to ‘foo’). This > reordering would be blatantly invalid if any other code could observe > the contents of the allocation, but is valid if the allocation is > private to the function. LLVM assumes the latter: after all, the > pointer to it hasn’t escaped. Yet. Except that in a weak memory > model, the escape can ‘time travel’... It's hard to understand exactly what you mean, but consider the following example: int *globalptr; int x; int *f() { int *p = kzalloc(sizeof(int)); L1: *p = 1; L2: foo(); return p; } void foo() { smp_store_release(&x, 2); } void thread0() { WRITE_ONCE(globalptr, f()); } void thread1() { int m, n; int *q; m = smp_load_acquire(&x); q = READ_ONCE(globalptr); if (m && q) n = *q; } (If you like, pretend each of these function definitions lives in a different source file -- it doesn't matter.) With no optimization, whenever thread1() reads *q it will always obtain 1, thanks to the store-release in foo() and the load-acquire() in thread1(). But if the compiler swaps L1 and L2 in f() then this is not guaranteed. On a weakly ordered architecture, thread1() could then get 0 from *q. I don't know if this is what you meant by "in a weak memory model, the escape can ‘time travel'". Regardless, it seems very clear that any compiler which swaps L1 and L2 in f() has a genuine bug. Alan Stern
> On Mar 24, 2024, at 11:22 AM, Alan Stern <stern@rowland.harvard.edu> wrote: > > I don't know if this is what you meant by "in a weak memory model, the > escape can ‘time travel'". Regardless, it seems very clear that any > compiler which swaps L1 and L2 in f() has a genuine bug. Yes, that’s what I meant. Clang thinks it’s valid to swap L1 and L2. Though, for it to actually happen, they would have to be in a loop, since the problematic optimization is “loop-invariant code motion". Here’s a modified version of your f() that shows the optimization in action: https://godbolt.org/z/bdaTjjvMs Anyway, my point is just that using LKMM doesn’t save you from the bug.
On Fri, Mar 22, 2024 at 04:38:35PM -0700, Boqun Feng wrote: > Hi, > > Since I see more and more Rust code is coming in, I feel like this > should be sent sooner rather than later, so here is a WIP to open the > discussion and get feedback. > > One of the most important questions we need to answer is: which > memory (ordering) model we should use when developing Rust in the Linux > kernel, given Rust has its own memory ordering model[1]. I had some > discussion with the Rust language community to understand their position > on this: > > https://github.com/rust-lang/unsafe-code-guidelines/issues/348#issuecomment-1218407557 > https://github.com/rust-lang/unsafe-code-guidelines/issues/476#issue-2001382992 > > My takeaway from these discussions, along with other offline discussions, > is that supporting two memory models is challenging for both correctness > reasoning (someone needs to provide a model) and implementation (one > model needs to be aware of the other model). So it's not wise to do > (at least at the beginning). So the most reasonable option to me is: > > we only use LKMM for Rust code in the kernel (i.e. avoid using > Rust's own atomics). > > Because kernel developers are more familiar with LKMM, and when Rust code > interacts with C code, it has to use the model that the C code uses. I think that makes sense; if nothing else it's consistent with how we handle the C atomics today. > And this patchset is the result of that option. I introduced an atomic > library to wrap and implement LKMM atomics (of course, given it's a WIP, > it's unfinished). Things to notice: > > * I know I could use Rust macros to generate the whole set of atomics, > but I chose not to in the beginning, as I want to make it easier to > review. > > * Very likely, we will only have AtomicI32, AtomicI64 and AtomicUsize > (i.e. no atomics for bool, u8, u16, etc), with limited support for > atomic load and store on 8/16 bits.
> > * I choose to re-implement atomics in Rust `asm` because we are still > figuring out how we can make it easy and maintainable for Rust to call > a C function _inlinely_ (Gary makes some progress [2]). Otherwise, > atomic primitives would be function calls, and that can be performance > bottleneck in a few cases. I don't think we want to maintain two copies of each architecture's atomics. This gets painful very quickly (e.g. as arm64's atomics get patched between LL/SC and LSE forms). Can we start off with out-of-line atomics, and see where the bottlenecks are? It's relatively easy to do that today, at least for the atomic*_*() APIs: https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=atomics/outlined&id=e0a77bfa63e7416d610769aa4ab62bc06993ce56 ... which IIUC covers the "AtomicI32, AtomicI64 and AtomicUsize" cases you mention above. Mark.
On Fri, 2024-03-22 at 17:36 -0700, Linus Torvalds wrote: > On Fri, 22 Mar 2024 at 17:21, Kent Overstreet > <kent.overstreet@linux.dev> wrote: > > > > Besides that there's cross arch support to think about - it's hard to > > imagine us ever ditching our own atomics. > > > [... SNIP ...] > > > > I was thinking about something more incremental - just an optional mode > > where our atomics were C atomics underneath. It'd probably give the > > compiler people a much more effective way to test their stuff than > > anything they have now. > > I suspect it might be painful, and some compiler people would throw > their hands up in horror, because the C++ atomics model is based > fairly solidly on atomic types, and the kernel memory model is much > more fluid. > > Boqun already mentioned the "mixing access sizes", which is actually > quite fundamental in the kernel, where we play lots of games with that > (typically around locking, where you find patterns like unlock writing > a zero to a single byte, even though the whole lock data structure is > a word). And sometimes the access size games are very explicit (eg > lib/lockref.c). > > But it actually goes deeper than that. While we do have "atomic_t" etc > for arithmetic atomics, and that probably would map fairly well to C++ > atomics, in other cases we simply base our atomics not on _types_, but > on code. > > IOW, we do things like "cmpxchg()", and the target of that atomic > access is just a regular data structure field. > > It's kind of like our "volatile" usage. If you read the C (and C++) > standards, you'll find that you should use "volatile" on data types. > That's almost *never* what the kernel does. The kernel uses "volatile" > in _code_ (ie READ_ONCE() etc), and uses it by casting etc. > > Compiler people don't tend to really like those kinds of things. Just for my understanding: Why don't they like it?
I guess since compiler people have to support volatile pointers anyway, temporarily casting something to such a volatile pointer shouldn't be a problem either – so they don't dislike it because it's more difficult to implement, but because it's more difficult to verify for correctness? P. > > Linus >
On Mon, 25 Mar 2024 at 06:57, Philipp Stanner <pstanner@redhat.com> wrote: > > On Fri, 2024-03-22 at 17:36 -0700, Linus Torvalds wrote: > > > > It's kind of like our "volatile" usage. If you read the C (and C++) > > standards, you'll find that you should use "volatile" on data types. > > That's almost *never* what the kernel does. The kernel uses > > "volatile" > > in _code_ (ie READ_ONCE() etc), and uses it by casting etc. > > > > Compiler people don't tend to really like those kinds of things. > > Just for my understanding: Why don't they like it? So I actually think most compiler people are perfectly fine with the kernel model of mostly doing 'volatile' not on the data structures themselves, but as accesses through casts. It's very traditional C, and there's actually nothing particularly odd about it. Not even from a compiler standpoint. In fact, I personally will argue that it is fundamentally wrong to think that the underlying data has to be volatile. A variable may be entirely stable in some cases (ie locks held), but not in others. So it's not the *variable* (aka "object") that is 'volatile', it's the *context* that makes a particular access volatile. That explains why the kernel has basically zero actual volatile objects, and 99% of all volatile accesses are done through accessor functions that use a cast to mark a particular access volatile. But I've had negative comments from compiler people who read the standards as language lawyers (which honestly, I despise - it's always possible to try to argue what the meaning of some wording is), and particularly C++ people used to be very very antsy about "volatile". They had some truly _serious_ problems with volatile. The C++ people spent absolutely insane amounts of time arguing about "volatile objects" vs "accesses", and how an access through a cast didn't make the underlying object volatile etc. 
There were endless discussions because an lvalue isn't supposed to be an access (an lvalue is something that is being acted on, and it shouldn't imply an access because an access will then cause other things in C++). So a statement expression that was just an lvalue shouldn't imply an access in C++ originally, but obviously when the thing was volatile it *had* to do so, and there was gnashing of teeth over this all. And all of it was purely semantic nitpicking about random wording. The C++ people finally tried to save face by claiming that it was always the C (not C++) rules that were unclear, and introduced the notion of "glvalue", and it's all good now, but there's literally decades of language lawyering and pointless nitpicking about the difference between "objects" and "accesses". Sane people didn't care, but if you reported a compiler bug about volatile use, you had better be ready to sit back and be flamed for how your volatile pointer cast wasn't an "object" and that the compiler that clearly generated wrong code was technically correct, and that your mother was a hamster. It's a bit like the NULL debacle. Another thing that took the C++ people a couple of decades to admit they were wrong all along, and that NULL isn't actually 'integer zero' in any sane language that claims to care deeply about types. [ And again, to save face, at no point did they say "ok, '(void *)0' is fine" - they introduced a new __nullptr thing just so that they wouldn't have to admit that their decades of arguing was just them being wrong. You'll find another decade of arguments explaining the finer details about _that_ difference ] It turns out that the people who are language-lawyering nitpickers are then happy to be "proven right" by adding some more pointless syntactic language-lawyering language. Which I guess makes sense, but to the rest of us it all looks a bit pointless. Linus
On Mon, Mar 25, 2024 at 10:44:43AM -0700, Linus Torvalds wrote: > On Mon, 25 Mar 2024 at 06:57, Philipp Stanner <pstanner@redhat.com> wrote: > > > > On Fri, 2024-03-22 at 17:36 -0700, Linus Torvalds wrote: > > > > > > It's kind of like our "volatile" usage. If you read the C (and C++) > > > standards, you'll find that you should use "volatile" on data types. > > > That's almost *never* what the kernel does. The kernel uses > > > "volatile" > > > in _code_ (ie READ_ONCE() etc), and uses it by casting etc. > > > > > > Compiler people don't tend to really like those kinds of things. > > > > Just for my understanding: Why don't they like it? > > So I actually think most compiler people are perfectly fine with the > kernel model of mostly doing 'volatile' not on the data structures > themselves, but as accesses through casts. > > It's very traditional C, and there's actually nothing particularly odd > about it. Not even from a compiler standpoint. > > In fact, I personally will argue that it is fundamentally wrong to > think that the underlying data has to be volatile. A variable may be > entirely stable in some cases (ie locks held), but not in others. > > So it's not the *variable* (aka "object") that is 'volatile', it's the > *context* that makes a particular access volatile. > > That explains why the kernel has basically zero actual volatile > objects, and 99% of all volatile accesses are done through accessor > functions that use a cast to mark a particular access volatile. > > But I've had negative comments from compiler people who read the > standards as language lawyers (which honestly, I despise - it's always > possible to try to argue what the meaning of some wording is), and > particularly C++ people used to be very very antsy about "volatile". > > They had some truly _serious_ problems with volatile. 
> > The C++ people spent absolutely insane amounts of time arguing about > "volatile objects" vs "accesses", and how an access through a cast > didn't make the underlying object volatile etc. To be fair, "volatile" dates from an era when we didn't have the haziest understanding of what a working memory model for C would look like or why we'd even want one. (Someone might want to think about deprecating volatile on objects and adding a compiler flag to disable it; I suspect it'd be a useful cleanup for the compiler guys if they could get rid of it.) The way the kernel uses volatile in e.g. READ_ONCE() is fully in line with modern thinking, just done with the tools available at the time. A more modern version would be just __atomic_load_n(ptr, __ATOMIC_RELAXED) Except that C atomics only support pointer and integer types, while READ_ONCE() supports anything up to machine word size - that's something the C people need to fix.
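To illustrate that last point: a READ_ONCE()-style access on a non-integer type is expressible with a volatile read, which (unlike __atomic_load_n()) is not restricted to integer and pointer types. A hedged sketch in Rust, since that's the language under discussion here; the names are made up for illustration:

```rust
use std::ptr;

// A non-integer type that still fits in a machine word; __atomic_load_n()
// couldn't load this, but a volatile read of the whole struct can.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Pair {
    a: u16,
    b: u16, // 4 bytes total
}

// READ_ONCE()-style read: the volatile access forbids the compiler from
// fusing, eliding, or re-materializing the load.
fn read_once<T: Copy>(src: &T) -> T {
    // SAFETY: src is a valid, aligned reference for the whole read.
    unsafe { ptr::read_volatile(src as *const T) }
}

fn main() {
    let p = Pair { a: 1, b: 2 };
    assert_eq!(read_once(&p), Pair { a: 1, b: 2 });
    assert_eq!(read_once(&42u32), 42);
    println!("ok");
}
```

Note this only demonstrates the type flexibility; per the discussion later in the thread, a racing read_volatile() is still formally a data race under current Rust rules.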
On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > To be fair, "volatile" dates from an era when we didn't have the haziest > understanding of what a working memory model for C would look like or > why we'd even want one. I don't disagree, but I find it very depressing that now that we *do* know about memory models etc, the C++ memory model basically doubled down on the same "object" model. > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > with modern thinking, just done with the tools available at the time. A > more modern version would be just > > __atomic_load_n(ptr, __ATOMIC_RELAXED) Yes. Again, that's the *right* model in many ways, where you mark the *access*, not the variable. You make it completely and utterly clear that this is a very explicit access to memory. But that's not what C++ actually did. They went down the same old "volatile object" road, and instead of marking the access, they mark the object, and the way you do the above is std::atomic_int value; and then you just access 'value' and magic happens. EXACTLY the same way that volatile int value; works, in other words. With exactly the same downsides. And yes, I think that model is a nice shorthand. But it should be a *shorthand*, not the basis of the model. I do find it annoying, because the C++ people literally started out with shorthands. The whole "pass by reference" is literally nothing but a shorthand for pointers (ooh, scary scary pointers), where the address-of is implied at the call site, and the 'dereference' operation is implied at use. So it's not that shorthands are wrong. And it's not that C++ isn't already very fundamentally used to them. But despite that, the C++ memory model is very much designed around the broken object model, and as already shown in this thread, it causes actual immediate problems. And it's not just C++. Rust is supposed to be the really modern thing. And it made the *SAME* fundamental design mistake.
IOW, the whole access size problem that Boqun described is *inherently* tied to the fact that the C++ and Rust memory model is badly designed from the wrong principles. Instead of designing it as a "this is an atomic object that you can do these operations on", it should have been "this is an atomic access, and you can use this simple object model to have the compiler generate the accesses for you". This is why I claim that LKMM is fundamentally better. It didn't start out from a bass-ackwards starting point of marking objects "atomic". And yes, the LKMM is a bit awkward, because we don't have the shorthands, so you have to write out "atomic_read()" and friends. Tough. It's better to be correct than to be simple. Linus
On Mon, Mar 25, 2024 at 10:44:45AM +0000, Mark Rutland wrote: [...] > > > > * I choose to re-implement atomics in Rust `asm` because we are still > > figuring out how we can make it easy and maintainable for Rust to call > > a C function _inlinely_ (Gary makes some progress [2]). Otherwise, > > atomic primitives would be function calls, and that can be performance > > bottleneck in a few cases. > > I don't think we want to maintain two copies of each architecture's atomics. > This gets painful very quickly (e.g. as arm64's atomics get patched between > LL/SC and LSE forms). > No argument here ;-) > Can we start off with out-of-line atomics, and see where the bottlenecks are? > > It's relatively easy to do that today, at least for the atomic*_*() APIs: > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=atomics/outlined&id=e0a77bfa63e7416d610769aa4ab62bc06993ce56 > > ... which IIUC covers the "AtomicI32, AtomicI64 and AtomicUsize" cases you > mention above. > Thanks! Yes, I know I should have checked with you before finalizing the implementation ;-) I will try to integrate that, but a few things to note: * For module usage, we need to EXPORT_SYMBOL_GPL() all the atomics. I'm OK with that, but I don't know how others feel about it. * Alice reported a performance gap between inline and out-of-line refcount operations in the Rust binder driver: https://github.com/Darksonn/linux/commit/b4be1bd6c44225bf7276a4666fd30b8da9cba517 I don't know how much worse it is, since I don't have the data, but that's one of the reasons I started with inline asm. That being said, I totally agree that we could start with out-of-line atomics, and maybe provide an inline version for performance-critical paths. The hope is that we can figure out how Rust can inline a C function eventually. Regards, Boqun > Mark.
On Mon, Mar 25, 2024 at 12:44:34PM -0700, Linus Torvalds wrote: > On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > To be fair, "volatile" dates from an era when we didn't have the haziest > > understanding of what a working memory model for C would look like or > > why we'd even want one. > > I don't disagree, but I find it very depressing that now that we *do* > know about memory models etc, the C++ memory model basically doubled > down on the same "object" model. > > > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > > with modern thinking, just done with the tools available at the time. A > > more modern version would be just > > > > __atomic_load_n(ptr, __ATOMIC_RELAXED) > > Yes. Again, that's the *right* model in many ways, where you mark the > *access*, not the variable. You make it completely and utterly clear > that this is a very explicit access to memory. > > But that's not what C++ actually did. They went down the same old > "volatile object" road, and instead of marking the access, they mark > the object, and the way you do the above is > > std::atomic_int value; > > and then you just access 'value' and magic happens. > > EXACTLY the same way that > > volatile int value; > > works, in other words. With exactly the same downsides. Yeah, that's crap. Unfortunate too, because this does need to be a type system thing, and we have all the tools to do it correctly now. What we need is for loads and stores to be explicit, and that absolutely can and should be a type system thing. In Rust terminology, what we want is Volatile<T> where T is any type that fits in a machine word, and the only operations it supports are get(), set(), xchg() and cmpxchg(). You DO NOT want it to be possible to transparently use Volatile<T> in place of a regular T - in exactly the same way as an atomic_t can't be used in place of a regular integer.
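A minimal sketch of that Volatile<T> in today's Rust, get()/set() only, since xchg()/cmpxchg() would need per-size atomic instructions underneath that plain volatile loads/stores can't provide. Note also that under the current Rust rules, racing volatile accesses are still formally a data race, which is its own problem:

```rust
use std::cell::UnsafeCell;

// Sketch of the proposed Volatile<T>: accesses are explicit, the value
// can never be used as a plain T, and it is Send + Sync.
pub struct Volatile<T>(UnsafeCell<T>);

// SAFETY (hand-waved for the sketch): callers accept that concurrent
// get()/set() race, exactly as with READ_ONCE()/WRITE_ONCE().
unsafe impl<T: Copy + Send> Send for Volatile<T> {}
unsafe impl<T: Copy + Send> Sync for Volatile<T> {}

impl<T: Copy> Volatile<T> {
    pub const fn new(v: T) -> Self {
        Volatile(UnsafeCell::new(v))
    }

    pub fn get(&self) -> T {
        // SAFETY: the pointer is valid and aligned for T.
        unsafe { self.0.get().read_volatile() }
    }

    pub fn set(&self, v: T) {
        // SAFETY: as above; the volatile write cannot be elided or fused.
        unsafe { self.0.get().write_volatile(v) }
    }
}

fn main() {
    let v: Volatile<u32> = Volatile::new(1);
    v.set(42);
    assert_eq!(v.get(), 42);
    println!("ok");
}
```

There is deliberately no Deref or plain accessor: the type cannot be used transparently as a T, which is the design point being argued for.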
On Mon, Mar 25, 2024 at 05:14:41PM -0400, Kent Overstreet wrote: > On Mon, Mar 25, 2024 at 12:44:34PM -0700, Linus Torvalds wrote: > > On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > To be fair, "volatile" dates from an era when we didn't have the haziest > > > understanding of what a working memory model for C would look like or > > > why we'd even want one. > > > > I don't disagree, but I find it very depressing that now that we *do* > > know about memory models etc, the C++ memory model basically doubled > > down on the same "object" model. > > > > > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > > > with modern thinking, just done with the tools available at the time. A > > > more modern version would be just > > > > > > __atomic_load_n(ptr, __ATOMIC_RELAXED) Note that Rust does have something similar: https://doc.rust-lang.org/std/ptr/fn.read_volatile.html pub unsafe fn read_volatile<T>(src: *const T) -> T (and also write_volatile()). So they made a good design decision in putting the volatile on the accesses rather than the type. However, per the current Rust memory model these two primitives are UB when data races happen :-( I mean, sure, if I use read_volatile() on an enum (whose valid values are only 0, 1, 2), and I get the value 3, and the compiler says "you have a logic bug and I refuse to compile the program correctly", I'm OK with that. But if I use read_volatile() to read something like a u32, and I know it's racy, so my program actually handles that, I don't know of any sane compiler that would mis-compile it, so I don't know why that has to be UB. > > > > Yes. Again, that's the *right* model in many ways, where you mark the > > *access*, not the variable. You make it completely and utterly clear > > that this is a very explicit access to memory. > > > > But that's not what C++ actually did.
They went down the same old > > "volatile object" road, and instead of marking the access, they mark > > the object, and the way you do the above is > > > > std::atomic_int value; > > > > and then you just access 'value' and magic happens. > > > > EXACTLY the same way that > > > > volatile int value; > > > > works, in other words. With exactly the same downsides. > > Yeah that's crap. Unfortunate too, because this does need to be a type > system thing and we have all the tools to do it correctly now. > > What we need is for loads and stores to be explict, and that absolutely > can and should be a type system thing. > > In Rust terminology, what we want is > > Volatile<T> > > where T is any type that fits in a machine word, and the only operations > it supports are get(), set(), xchg() and cmpxchG(). > > You DO NOT want it to be possible to transparantly use Volatile<T> in > place of a regular T - in exactly the same way as an atomic_t can't be > used in place of a regular integer. Yes, this is useful. But no it's not that useful, how could you use that to read another CPU's stack during some debug functions in a way you know it's racy? Regards, Boqun
On Mon, Mar 25, 2024 at 02:37:14PM -0700, Boqun Feng wrote: > On Mon, Mar 25, 2024 at 05:14:41PM -0400, Kent Overstreet wrote: > > On Mon, Mar 25, 2024 at 12:44:34PM -0700, Linus Torvalds wrote: > > > On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > To be fair, "volatile" dates from an era when we didn't have the haziest > > > > understanding of what a working memory model for C would look like or > > > > why we'd even want one. > > > > > > I don't disagree, but I find it very depressing that now that we *do* > > > know about memory models etc, the C++ memory model basically doubled > > > down on the same "object" model. > > > > > > > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > > > > with modern thinking, just done with the tools available at the time. A > > > > more modern version would be just > > > > > > > > __atomic_load_n(ptr, __ATOMIC_RELAXED) > > Note that Rust does have something similiar: > > https://doc.rust-lang.org/std/ptr/fn.read_volatile.html > > pub unsafe fn read_volatile<T>(src: *const T) -> T > > (and also write_volatile()). So they made a good design putting the > volatile on the accesses rather than the type. However, per the current > Rust memory model these two primitives will be UB when data races happen > :-( > > I mean, sure, if I use read_volatile() on an enum (whose valid values > are only 0, 1, 2), and I get a value 3, and the compiler says "you have > a logic bug and I refuse to compile the program correctly", I'm OK. But > if I use read_volatile() to read something like a u32, and I know it's > racy so my program actually handle that, I don't know any sane compiler > would miss-compile, so I don't know why that has to be a UB. Well, if T is too big to read/write atomically then you'll get torn reads, including potentially a bit representation that is not a valid T. Which is why the normal read_volatile<> or Volatile<> should disallow that. 
> > where T is any type that fits in a machine word, and the only operations > > it supports are get(), set(), xchg() and cmpxchg(). > > > > You DO NOT want it to be possible to transparently use Volatile<T> in > > place of a regular T - in exactly the same way as an atomic_t can't be > > used in place of a regular integer. > > Yes, this is useful. But no it's not that useful, how could you use that > to read another CPU's stack during some debug functions in a way you > know it's racy? That's a pretty difficult thing to do, because you don't know the _layout_ of the other CPU's stack, and even if you do, it's going to be changing underneath you without locking. So the races there are equivalent to a bad mem::transmute(), and that is very much UB. For a more typical usage of volatile, consider a ringbuffer with one thread producing and another thread consuming. Then you've got head and tail pointers, each written by one thread and read by another. You don't need any locking, just memory barriers and READ_ONCE()/WRITE_ONCE() to update the head and tail pointers. If you were writing this in Rust today the easy way would be an atomic integer, but that's not really correct - you're not doing atomic operations (locked arithmetic), just volatile reads and writes. Volatile<T> would be Send and Sync, just like atomic integers. You don't need locking if you're just working with single values that are small enough for the machine to read/write atomically.
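The ring buffer described above can be sketched with today's Rust atomics. The slots are made atomic here purely to keep the sketch free of `unsafe`; the point being illustrated is that the indices only ever need plain loads and stores plus ordering (Release to publish, Acquire to observe), never locked RMW arithmetic:

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering::{Acquire, Relaxed, Release}};
use std::sync::Arc;
use std::thread;

const CAP: usize = 8;

// SPSC ring: the producer owns `head`, the consumer owns `tail`; each
// side only ever *writes* its own index and *reads* the other's. The
// Release store of an index pairs with the other side's Acquire load,
// the moral equivalent of smp_store_release()/smp_load_acquire().
struct Ring {
    buf: [AtomicU32; CAP],
    head: AtomicUsize, // next slot to write (producer-owned)
    tail: AtomicUsize, // next slot to read (consumer-owned)
}

impl Ring {
    fn new() -> Self {
        Ring {
            buf: std::array::from_fn(|_| AtomicU32::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    fn push(&self, v: u32) -> bool {
        let h = self.head.load(Relaxed);
        if h.wrapping_sub(self.tail.load(Acquire)) == CAP {
            return false; // full
        }
        self.buf[h % CAP].store(v, Relaxed);
        self.head.store(h.wrapping_add(1), Release); // publish the slot
        true
    }

    fn pop(&self) -> Option<u32> {
        let t = self.tail.load(Relaxed);
        if t == self.head.load(Acquire) {
            return None; // empty
        }
        let v = self.buf[t % CAP].load(Relaxed);
        self.tail.store(t.wrapping_add(1), Release); // free the slot
        Some(v)
    }
}

fn main() {
    let r = Arc::new(Ring::new());
    let p = Arc::clone(&r);
    let producer = thread::spawn(move || {
        for i in 0..1000u32 {
            while !p.push(i) {} // spin until there is room
        }
    });
    let mut next = 0u32;
    while next < 1000 {
        if let Some(v) = r.pop() {
            assert_eq!(v, next); // FIFO order is preserved
            next += 1;
        }
    }
    producer.join().unwrap();
    println!("ok");
}
```

With the proposed Volatile<T>, `buf` would be Volatile slots and the index accesses would be get()/set() plus explicit barriers; the atomic spelling above is what today's Rust forces on you.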
On Mon, Mar 25, 2024 at 06:09:19PM -0400, Kent Overstreet wrote: > On Mon, Mar 25, 2024 at 02:37:14PM -0700, Boqun Feng wrote: > > On Mon, Mar 25, 2024 at 05:14:41PM -0400, Kent Overstreet wrote: > > > On Mon, Mar 25, 2024 at 12:44:34PM -0700, Linus Torvalds wrote: > > > > On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > To be fair, "volatile" dates from an era when we didn't have the haziest > > > > > understanding of what a working memory model for C would look like or > > > > > why we'd even want one. > > > > > > > > I don't disagree, but I find it very depressing that now that we *do* > > > > know about memory models etc, the C++ memory model basically doubled > > > > down on the same "object" model. > > > > > > > > > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > > > > > with modern thinking, just done with the tools available at the time. A > > > > > more modern version would be just > > > > > > > > > > __atomic_load_n(ptr, __ATOMIC_RELAXED) > > > > Note that Rust does have something similiar: > > > > https://doc.rust-lang.org/std/ptr/fn.read_volatile.html > > > > pub unsafe fn read_volatile<T>(src: *const T) -> T > > > > (and also write_volatile()). So they made a good design putting the > > volatile on the accesses rather than the type. However, per the current > > Rust memory model these two primitives will be UB when data races happen > > :-( > > > > I mean, sure, if I use read_volatile() on an enum (whose valid values > > are only 0, 1, 2), and I get a value 3, and the compiler says "you have > > a logic bug and I refuse to compile the program correctly", I'm OK. But > > if I use read_volatile() to read something like a u32, and I know it's > > racy so my program actually handle that, I don't know any sane compiler > > would miss-compile, so I don't know why that has to be a UB. 
> > Well, if T is too big to read/write atomically then you'll get torn > reads, including potentially a bit representation that is not a valid T. > > Which is why the normal read_volatile<> or Volatile<> should disallow > that. > Well, why a racy read_volatile<> is UB on a T that is valid for all bit representations is what I was complaining about ;-) > > > where T is any type that fits in a machine word, and the only operations > > > it supports are get(), set(), xchg() and cmpxchg(). > > > > > > You DO NOT want it to be possible to transparently use Volatile<T> in > > > place of a regular T - in exactly the same way as an atomic_t can't be > > > used in place of a regular integer. > > > > Yes, this is useful. But no it's not that useful, how could you use that > > to read another CPU's stack during some debug functions in a way you > > know it's racy? > > That's a pretty difficult thing to do, because you don't know the > _layout_ of the other CPU's stack, and even if you do it's going to be > changing underneath you without locking. > It's a debug function, I don't care whether the data is accurate, I just want to get as much information as possible. This kind of usage, along with cases where the algorithms are racy themselves, is one of the primary reasons for volatile _accesses_ instead of volatile _types_. For example, you want to read ahead of a counter protected by a lock: if (unlikely(READ_ONCE(cnt))) { spin_lock(lock); int c = cnt; // update of the cnt is protected by a lock. ... } because you want to skip the case where cnt == 0 in a hotpath, and you know someone is going to check this again in some slowpath, so inaccurate data doesn't matter. > So the races there are equivalent to a bad mem::transmute(), and that is > very much UB. > > For a more typical usage of volatile, consider a ringbuffer with one > thread producing and another thread consuming. Then you've got head and > tail pointers, each written by one thread and read by another.
> > You don't need any locking, just memory barriers and > READ_ONCE()/WRITE_ONCE() to update the head and tail pointers. If you > were writing this in Rust today the easy way would be an atomic integer, > but that's not really correct - you're not doing atomic operations > (locked arithmetic), just volatile reads and writes. > Confused, I don't see how Volatile<T> is better than just atomic in this case, since atomic_load() and atomic_store() are also not locked in any memory model if a lockless implementation is available. > Volatile<T> would be Send and Sync, just like atomic integers. You don't > need locking if you're just working with single values that are small > enough for the machine to read/write atomically. So to me Volatile<T> can help in the cases where we know some memory is "external", for example an MMIO address, or a ringbuffer between guests and hypervisor. But it doesn't really fix the missing functionality here: allow generating a plain "mov" instruction on x86 for example on _any valid memory_, and programmers can take care of the result. Regards, Boqun
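The SPSC ringbuffer Kent describes can be sketched in today's Rust. Since stable Rust has no race-tolerant plain load/store, atomic head/tail updates with Release/Acquire ordering stand in for the WRITE_ONCE()/READ_ONCE()-plus-barrier pairing of the C version. A minimal illustrative sketch, not kernel code; all names here are made up:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Minimal single-producer/single-consumer ring of fixed capacity.
// Release on the writer side paired with Acquire on the reader side
// plays the role of the memory barriers in the C version.
const CAP: usize = 8; // must be a power of two

struct Ring {
    buf: [UnsafeCell<u32>; CAP],
    head: AtomicUsize, // written only by the producer
    tail: AtomicUsize, // written only by the consumer
}

// Safe only under the SPSC discipline sketched here.
unsafe impl Sync for Ring {}

impl Ring {
    fn new() -> Self {
        Ring {
            buf: std::array::from_fn(|_| UnsafeCell::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    fn push(&self, v: u32) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head - tail == CAP {
            return false; // full
        }
        unsafe { *self.buf[head & (CAP - 1)].get() = v; }
        self.head.store(head + 1, Ordering::Release); // publish the slot
        true
    }

    fn pop(&self) -> Option<u32> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if head == tail {
            return None; // empty
        }
        let v = unsafe { *self.buf[tail & (CAP - 1)].get() };
        self.tail.store(tail + 1, Ordering::Release); // free the slot
        Some(v)
    }
}

fn main() {
    let r = Ring::new();
    assert!(r.push(1));
    assert!(r.push(2));
    assert_eq!(r.pop(), Some(1));
    assert_eq!(r.pop(), Some(2));
    assert_eq!(r.pop(), None);
}
```

The loads/stores compile to plain moves on x86, which is Kent's point: an "atomic" here is doing nothing an ordered volatile access wouldn't.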
On Mon, Mar 25, 2024 at 03:38:32PM -0700, Boqun Feng wrote: > On Mon, Mar 25, 2024 at 06:09:19PM -0400, Kent Overstreet wrote: > > On Mon, Mar 25, 2024 at 02:37:14PM -0700, Boqun Feng wrote: > > > On Mon, Mar 25, 2024 at 05:14:41PM -0400, Kent Overstreet wrote: > > > > On Mon, Mar 25, 2024 at 12:44:34PM -0700, Linus Torvalds wrote: > > > > > On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > > > To be fair, "volatile" dates from an era when we didn't have the haziest > > > > > > understanding of what a working memory model for C would look like or > > > > > > why we'd even want one. > > > > > > > > > > I don't disagree, but I find it very depressing that now that we *do* > > > > > know about memory models etc, the C++ memory model basically doubled > > > > > down on the same "object" model. > > > > > > > > > > > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > > > > > > with modern thinking, just done with the tools available at the time. A > > > > > > more modern version would be just > > > > > > > > > > > > __atomic_load_n(ptr, __ATOMIC_RELAXED) > > > > > > Note that Rust does have something similiar: > > > > > > https://doc.rust-lang.org/std/ptr/fn.read_volatile.html > > > > > > pub unsafe fn read_volatile<T>(src: *const T) -> T > > > > > > (and also write_volatile()). So they made a good design putting the > > > volatile on the accesses rather than the type. However, per the current > > > Rust memory model these two primitives will be UB when data races happen > > > :-( > > > > > > I mean, sure, if I use read_volatile() on an enum (whose valid values > > > are only 0, 1, 2), and I get a value 3, and the compiler says "you have > > > a logic bug and I refuse to compile the program correctly", I'm OK. 
But > > > if I use read_volatile() to read something like a u32, and I know it's > > > racy so my program actually handle that, I don't know any sane compiler > > > would miss-compile, so I don't know why that has to be a UB. > > > > Well, if T is too big to read/write atomically then you'll get torn > > reads, including potentially a bit representation that is not a valid T. > > > > Which is why the normal read_volatile<> or Volatile<> should disallow > > that. > > > > Well, why a racy read_volatile<> is UB on a T who is valid for all bit > representations is what I was complaining about ;-) yeah, that should not be considered UB; that should be an easy fix. Are you talking to Rust compiler people about this stuff? I've been meaning to make my own contacts there, but - sadly, busy as hell. > > > > where T is any type that fits in a machine word, and the only operations > > > > it supports are get(), set(), xchg() and cmpxchG(). > > > > > > > > You DO NOT want it to be possible to transparantly use Volatile<T> in > > > > place of a regular T - in exactly the same way as an atomic_t can't be > > > > used in place of a regular integer. > > > > > > Yes, this is useful. But no it's not that useful, how could you use that > > > to read another CPU's stack during some debug functions in a way you > > > know it's racy? > > > > That's a pretty difficult thing to do, because you don't know the > > _layout_ of the other CPU's stack, and even if you do it's going to be > > changing underneath you without locking. > > > > It's a debug function, I don't care whether the data is accurate, I just > want to get much information as possible. yeah, if you just want the typical backtrace functionality where you're just looking for instruction pointers - that's perfectly straightforward. > This kinda of usage, along > with cases where the alorigthms are racy themselves are the primary > reasons of volatile _accesses_ instead of volatile _types_. 
For example, > you want to read ahead of a counter protected by a lock: > > if (unlikely(READ_ONCE(cnt))) { > spin_lock(lock); > int c = cnt; // update of the cnt is protected by a lock. > ... > } > > because you want to skip the case where cnt == 0 in a hotpath, and you > know someone is going to check this again in some slowpath, so > inaccurate data doesn't matter. That's an interesting one because in Rust cnt is "inside" the lock, you can't access it at all without taking the lock - and usually that's exactly right. So to allow this we'd annotate in the type definition (with an attribute) which fields we allow read access to without the lock, then with some proc macro wizardry we'd get accessors that we can call without the lock held. So that probably wouldn't be a Volatile<T> thing, that'd be coupled with the lock implementation because that's where the accessors would hang off of and they'd internally probably just use mem::volatile_read(). > > So the races thare are equivalent to a bad mem::transmute(), and that is > > very much UB. > > > > For a more typical usage of volatile, consider a ringbuffer with one > > thread producing and another thread consuming. Then you've got head and > > tail pointers, each written by one thread and read by another. > > > > You don't need any locking, just memory barriers and > > READ_ONCE()/WRITE_ONCE() to update the head and tail pointers. If you > > were writing this in Rust today the easy way would be an atomic integer, > > but that's not really correct - you're not doing atomic operations > > (locked arithmetic), just volatile reads and writes. > > > > Confused, I don't see how Volatile<T> is better than just atomic in this > case, since atomc_load() and atomic_store() are also not locked in any > memory model if lockless implementation is available. It certainly compiles to the same code, yeah. But Volatile<T> really is the more primitive/generic concept, Atomic<T> is a specialization. 
> > Volatile<T> would be Send and Sync, just like atomic integers. You don't > > need locking if you're just working with single values that are small > > enough for the machine to read/write atomically. > > So to me Volatile<T> can help in the cases where we know some memory is > "external", for example a MMIO address, or ringbuffer between guests and > hypervisor. But it doesn't really fix the missing functionality here: > allow generating a plain "mov" instruction on x86 for example on _any > valid memory_, and programmers can take care of the result. You're talking about going completely outside the type system, though. There is a need for that, but it's very rare and something we really want to discourage. Usually, even with volatile access, you do know the type - and even if you don't, you have to treat it as _something_ so Volatile<ulong> is probably as good as anything.
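Kent's generated-accessor idea can be hand-rolled today; note the real primitive is std::ptr::read_volatile (there is no mem::volatile_read in std). A sketch, with invented names, of a counter that lives "inside" a lock but exposes one explicitly racy peek:

```rust
use std::cell::UnsafeCell;
use std::sync::Mutex;

// Hand-written version of what the proc macro would generate: `cnt` is
// protected by `lock` for all writes, but one accessor deliberately
// reads it without the lock. All names here are hypothetical.
struct LockedCounter {
    lock: Mutex<()>,
    cnt: UnsafeCell<u64>, // protected by `lock`, except for peek_racy()
}

// Sound only because writes happen under the lock and the lockless
// access is a volatile read that tolerates staleness.
unsafe impl Sync for LockedCounter {}

impl LockedCounter {
    fn new() -> Self {
        LockedCounter { lock: Mutex::new(()), cnt: UnsafeCell::new(0) }
    }

    fn add(&self, n: u64) {
        let _guard = self.lock.lock().unwrap();
        unsafe { *self.cnt.get() += n; }
    }

    // The "generated" accessor: a volatile read of the protected field
    // without taking the lock. Callers must tolerate a stale value.
    fn peek_racy(&self) -> u64 {
        unsafe { std::ptr::read_volatile(self.cnt.get()) }
    }
}

fn main() {
    let c = LockedCounter::new();
    assert_eq!(c.peek_racy(), 0);
    c.add(3);
    assert_eq!(c.peek_racy(), 3);
}
```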
On Mon, Mar 25, 2024 at 07:02:12PM -0400, Kent Overstreet wrote: > On Mon, Mar 25, 2024 at 03:38:32PM -0700, Boqun Feng wrote: > > On Mon, Mar 25, 2024 at 06:09:19PM -0400, Kent Overstreet wrote: > > > On Mon, Mar 25, 2024 at 02:37:14PM -0700, Boqun Feng wrote: > > > > On Mon, Mar 25, 2024 at 05:14:41PM -0400, Kent Overstreet wrote: > > > > > On Mon, Mar 25, 2024 at 12:44:34PM -0700, Linus Torvalds wrote: > > > > > > On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > > > > > > > To be fair, "volatile" dates from an era when we didn't have the haziest > > > > > > > understanding of what a working memory model for C would look like or > > > > > > > why we'd even want one. > > > > > > > > > > > > I don't disagree, but I find it very depressing that now that we *do* > > > > > > know about memory models etc, the C++ memory model basically doubled > > > > > > down on the same "object" model. > > > > > > > > > > > > > The way the kernel uses volatile in e.g. READ_ONCE() is fully in line > > > > > > > with modern thinking, just done with the tools available at the time. A > > > > > > > more modern version would be just > > > > > > > > > > > > > > __atomic_load_n(ptr, __ATOMIC_RELAXED) > > > > > > > > Note that Rust does have something similiar: > > > > > > > > https://doc.rust-lang.org/std/ptr/fn.read_volatile.html > > > > > > > > pub unsafe fn read_volatile<T>(src: *const T) -> T > > > > > > > > (and also write_volatile()). So they made a good design putting the > > > > volatile on the accesses rather than the type. However, per the current > > > > Rust memory model these two primitives will be UB when data races happen > > > > :-( > > > > > > > > I mean, sure, if I use read_volatile() on an enum (whose valid values > > > > are only 0, 1, 2), and I get a value 3, and the compiler says "you have > > > > a logic bug and I refuse to compile the program correctly", I'm OK. 
But > > > > if I use read_volatile() to read something like a u32, and I know it's > > > > racy so my program actually handle that, I don't know any sane compiler > > > > would miss-compile, so I don't know why that has to be a UB. > > > > > > Well, if T is too big to read/write atomically then you'll get torn > > > reads, including potentially a bit representation that is not a valid T. > > > > > > Which is why the normal read_volatile<> or Volatile<> should disallow > > > that. > > > > > > > Well, why a racy read_volatile<> is UB on a T who is valid for all bit > > representations is what I was complaining about ;-) > > yeah, that should not be considered UB; that should be an easy fix. Are > you talking to Rust compiler people about this stuff? I've been meaning Here you go: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/UB.20caused.20by.20races.20on.20.60.7Bread.2Cwrite.7D_volatile.60/near/399343771 but maybe instead of Rust, LLVM is the best one to talk with on this. Because my take from the communication with Rust folks is more like "it's more up to LLVM on this". > to make my own contacts there, but - sadly, busy as hell. > > > > > > where T is any type that fits in a machine word, and the only operations > > > > > it supports are get(), set(), xchg() and cmpxchG(). > > > > > > > > > > You DO NOT want it to be possible to transparantly use Volatile<T> in > > > > > place of a regular T - in exactly the same way as an atomic_t can't be > > > > > used in place of a regular integer. > > > > > > > > Yes, this is useful. But no it's not that useful, how could you use that > > > > to read another CPU's stack during some debug functions in a way you > > > > know it's racy? > > > > > > That's a pretty difficult thing to do, because you don't know the > > > _layout_ of the other CPU's stack, and even if you do it's going to be > > > changing underneath you without locking. 
> > > > > > It's a debug function, I don't care whether the data is accurate, I just > > want to get much information as possible. > > yeah, if you just want the typical backtrace functionality where you're > just looking for instruction pointers - that's perfectly > straightforward. > > > This kinda of usage, along > > with cases where the alorigthms are racy themselves are the primary > > reasons of volatile _accesses_ instead of volatile _types_. For example, > > you want to read ahead of a counter protected by a lock: > > > > if (unlikely(READ_ONCE(cnt))) { > > spin_lock(lock); > > int c = cnt; // update of the cnt is protected by a lock. > > ... > > } > > > > because you want to skip the case where cnt == 0 in a hotpath, and you > > know someone is going to check this again in some slowpath, so > > inaccurate data doesn't matter. > > That's an interesting one because in Rust cnt is "inside" the lock, you > can't access it at all without taking the lock - and usually that's > exactly right. > (Now you mention that, once I was trying to construct a deadlock case with some Rust code, but I couldn't since the locks are naturally hierarchical because of the types, therefore it's impossible to reverse the lock ordering. Yes, you still can have deadlocks in Rust, but those hierarchical type trees really help a lot). > So to allow this we'd annotate in the type definition (with an > attribute) which fields we allow read access to without the lock, then > with some proc macro wizardry we'd get accessors that we can call without > the lock held. > Right, that's field projection: https://internals.rust-lang.org/t/pre-rfc-field-projection/17383 > So that probably wouldn't be a Volatile<T> thing, that'd be coupled with > the lock implementation because that's where the accessors would hang > off of and they'd internally probably just use mem::volatile_read().
> So we can play the type game as deep as we can, and I'm sure it'll be helpful, but at the same time, having reasonable {read,write}_volatile() semantics around UB and races is also what we need. > > > So the races thare are equivalent to a bad mem::transmute(), and that is > > > very much UB. > > > > > > For a more typical usage of volatile, consider a ringbuffer with one > > > thread producing and another thread consuming. Then you've got head and > > > tail pointers, each written by one thread and read by another. > > > > > > You don't need any locking, just memory barriers and > > > READ_ONCE()/WRITE_ONCE() to update the head and tail pointers. If you > > > were writing this in Rust today the easy way would be an atomic integer, > > > but that's not really correct - you're not doing atomic operations > > > (locked arithmetic), just volatile reads and writes. > > > > > > > Confused, I don't see how Volatile<T> is better than just atomic in this > > case, since atomc_load() and atomic_store() are also not locked in any > > memory model if lockless implementation is available. > > It certainly compiles to the same code, yeah. But Volatile<T> really is > the more primitive/generic concept, Atomic<T> is a specialization.
> Usually, even with volatile access, you do know the Hey, in memory ordering model areas, we are supposed to work on these rare cases ;-) These are building blocks for high-level synchronization constructs, so I'm entitled to complain ;-) But yes, to your point, the type system can help a lot; however, there are still cases where we need to roll our own. Regards, Boqun > type - and even if you don't, you have to treat it as _something_ so > Volatile<ulong> is probably as good as anything.
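For reference, the READ_ONCE(cnt) fast-path pattern from the C snippet quoted above could be approximated in today's Rust by mirroring the lock-protected counter into an atomic and doing a Relaxed load as the racy peek. An illustrative sketch only, with invented names:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

// `cnt` mirrors the lock-protected counter so the hot path can peek
// at it without taking the lock; the Relaxed load is the closest
// stable-Rust analogue of READ_ONCE().
struct Counted {
    cnt: AtomicU32,    // racy fast-path view
    inner: Mutex<u32>, // authoritative value, updated under the lock
}

impl Counted {
    fn new() -> Self {
        Counted { cnt: AtomicU32::new(0), inner: Mutex::new(0) }
    }

    fn bump(&self) {
        let mut guard = self.inner.lock().unwrap();
        *guard += 1;
        // WRITE_ONCE() analogue: publish the new value for racy readers.
        self.cnt.store(*guard, Ordering::Relaxed);
    }

    // Hot path: skip the slow path entirely when the counter looks zero.
    // An inaccurate read is fine; the slow path rechecks under the lock.
    fn maybe_slow_path(&self) -> Option<u32> {
        if self.cnt.load(Ordering::Relaxed) == 0 {
            return None; // fast path, no lock taken
        }
        let guard = self.inner.lock().unwrap();
        Some(*guard)
    }
}

fn main() {
    let c = Counted::new();
    assert_eq!(c.maybe_slow_path(), None);
    c.bump();
    assert_eq!(c.maybe_slow_path(), Some(1));
}
```

The duplication of the counter is exactly the awkwardness being complained about: the type system has no way to say "this field may also be read racily", so the mirror field smuggles it in.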
* Linus Torvalds (torvalds@linux-foundation.org) wrote: <snip> > IOW, the whole access size problem that Boqun described is > *inherently* tied to the fact that the C++ and Rust memory model is > badly designed from the wrong principles. > > Instead of designing it as a "this is an atomic object that you can do > these operations on", it should have been "this is an atomic access, > and you can use this simple object model to have the compiler generate > the accesses for you". Isn't one of the aims of the Rust/C++ idea that you can't forget to access a shared piece of data atomically? If you want to have 'atomic accesses' explicitly, how do you tell the compiler what you can use them on, and when it should stop you mixing them with normal accesses on the same object? Dave > This is why I claim that LKMM is fundamentally better. It didn't start > out from a bass-ackwards starting point of marking objects "atomic". > > And yes, the LKMM is a bit awkward, because we don't have the > shorthands, so you have to write out "atomic_read()" and friends. > > Tough. It's better to be correct than to be simple. > > Linus >
On Tue, Mar 26, 2024 at 12:05:48AM +0000, Dr. David Alan Gilbert wrote: > * Linus Torvalds (torvalds@linux-foundation.org) wrote: > > <snip> > > > IOW, the whole access size problem that Boqun described is > > *inherently* tied to the fact that the C++ and Rust memory model is > > badly designed from the wrong principles. > > > > Instead of designing it as a "this is an atomic object that you can do > > these operations on", it should have been "this is an atomic access, > > and you can use this simple object model to have the compiler generate > > the accesses for you". > > Isn't one of the aims of the Rust/C++ idea that you can't forget to access > a shared piece of data atomically? > > If you want to have 'atomic accesses' explicitly, how do you tell the compiler > what you can use them on, and when it should stop you mixing them with > normal accesses on the same object? "can't forget to access data atomically" - that's only half of it. And atomic loads/stores are not a thing under the hood, they're just loads and stores (possibly, but not necessarily, with memory barriers). The other half is that at the _source_ level you don't want to treat accesses to volatiles/atomics like accesses to normal variables, you really want those to be explicit, and not look like normal variable accesses. std::atomic_int is way better than volatile in the sense that it's not a barely specified mess, but adding operator overloading was just gratuitous and unnecessary. This is a theme with C++ - they add a _ton_ of magic to make things concise and pretty, but you have to understand in intimate detail what all that magic is doing or you're totally fucked. std::atomic_int makes it such that just changing a single line of code in a single location in your program will change the semantics of your _entire_ program and the only observable result will be that it's faster - but it's a ticking time bomb because you just introduced a ton of races.
With Rust - I honestly haven't looked at whether they added operator overloading for their atomics, but it's _much_ less of a concern because changing the type to the non-atomic version means your program won't compile if it's now racy.
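For the record, Rust's std atomics have no operator overloading: every access names the operation and its ordering explicitly, and swapping the type back to a plain integer turns any cross-thread sharing into a compile error rather than a silent race. A small demonstration:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

fn main() {
    let a = AtomicU32::new(0);

    // Rust's atomics deliberately have no arithmetic operators:
    // `a += 1;` or `let x = a + 1;` simply do not compile. Every
    // access spells out both the operation and its memory ordering.
    a.fetch_add(1, Ordering::Relaxed);
    let v = a.load(Ordering::Relaxed);
    assert_eq!(v, 1);

    // Changing AtomicU32 to u32 above would make `a` an ordinary
    // value: sharing it mutably across threads would then be rejected
    // by the compiler, which is the point being made here.
}
```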
* Kent Overstreet (kent.overstreet@linux.dev) wrote: > On Tue, Mar 26, 2024 at 12:05:48AM +0000, Dr. David Alan Gilbert wrote: > > * Linus Torvalds (torvalds@linux-foundation.org) wrote: > > > > <snip> > > > > > IOW, the whole access size problem that Boqun described is > > > *inherently* tied to the fact that the C++ and Rust memory model is > > > badly designed from the wrong principles. > > > > > > Instead of designing it as a "this is an atomic object that you can do > > > these operations on", it should have been "this is an atomic access, > > > and you can use this simple object model to have the compiler generate > > > the accesses for you". > > > > Isn't one of the aims of the Rust/C++ idea that you can't forget to access > > a shared piece of data atomically? > > > > If you want to have 'atomic accesses' explicitly, how do you tell the compiler > > what you can use them on, and when it should stop you mixing them with > > normal accesses on the same object? > > "can't forget to access data atomically" - that's only half of it. And > atomic accesses loads/stores are not a thing under the hood, they're > just loads and stores (possibly, but not necessarily, with memory > barriers). That's quite architecturally specific isn't it? Or is this the distinction between accesses that are implicitly atomic (i.e. naturally aligned word) and things that are locked/exclusive? (either with a 'lock' on x86 or load-exclusive/store exclusive on some others)? Which are we talking about here? > The other half is at the _source_ level you don't want to treat accesses > to volatiles/atomics like accesses to normal variables, you really want > those to be explicit, and not look like normal variable accesses. > > std:atomic_int is way better than volatile in the sense that it's not a > barely specified mess, but adding operator overloading was just > gratuitious and unnecessary. 
> > This is a theme with C++ - they add a _ton_ of magic to make things > concise and pretty, but you have to understand in intimate detail what > all that magic is doing or you're totally fucked. > > std::atomic_int makes it such that just changing a single line of code > in a single location in your program will change the semantics of your > _entire_ program and the only obserable result will be that it's faster > but a ticking time bomb because you just introduced a ton of races. > > With Rust - I honestly haven't looked at whether they added operator > overlaoding for their atomics, but it's _much_ less of a concern because > changing the type to the non-atomic version means your program won't > compile if it's now racy. OK, so that's essentially the opposite worry of what I was saying; I was worrying about people forgetting to use an atomic access to a shared variable; I think you're worrying about people forgetting to mark a variable shared and since the accesses are the same nothing shouts? Dave
On Tue, Mar 26, 2024 at 12:05:48AM +0000, Dr. David Alan Gilbert wrote: > * Linus Torvalds (torvalds@linux-foundation.org) wrote: > > <snip> > > > IOW, the whole access size problem that Boqun described is > > *inherently* tied to the fact that the C++ and Rust memory model is > > badly designed from the wrong principles. > > > > Instead of designing it as a "this is an atomic object that you can do > > these operations on", it should have been "this is an atomic access, > > and you can use this simple object model to have the compiler generate > > the accesses for you". > > Isn't one of the aims of the Rust/C++ idea that you can't forget to access > a shared piece of data atomically? > > If you want to have 'atomic accesses' explicitly, how do you tell the compiler > what you can use them on, and when it should stop you mixing them with > normal accesses on the same object? > Well, you can just wrap it in your own atomic types, can't you? If the atomic primitives that a language provides are access-based, users can create their own atomic types, or the language can provide them via the standard library, but mixed usage is still allowed when it makes sense (debug functionality, low-level concurrent code that utilizes races, etc.) But if the atomic primitives that a language provides are type-based, then you're limited in what you can do. It might be totally fine, as Linus pointed out, if you just write a portable library, and don't want to care about architectural details. But that's not the case in the Linux kernel. Regards, Boqun > Dave > > > This is why I claim that LKMM is fundamentally better. It didn't start > > out from a bass-ackwards starting point of marking objects "atomic". > > > > And yes, the LKMM is a bit awkward, because we don't have the > > shorthands, so you have to write out "atomic_read()" and friends. > > > > Tough. It's better to be correct than to be simple. > > > > Linus > > > -- > -----Open up your eyes, open up your mind, open up your code ------- > / Dr.
David Alan Gilbert | Running GNU/Linux | Happy \ > \ dave @ treblig.org | | In Hex / > \ _________________________|_____ http://www.treblig.org |_______/
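Boqun's point - build your own atomic types on top of access-based primitives while keeping mixed access possible - might look like the following sketch. Illustrative only: Cell32 and its methods are invented, and the volatile accesses here carry no ordering guarantees.

```rust
use std::cell::UnsafeCell;
use std::ptr;

// Access-based design: the type just owns the storage; each access, not
// the type, decides the semantics. A safe-looking typed wrapper can be
// layered on top while the raw pointer stays available for the "mixed"
// uses (debug dumps, intentionally racy reads).
struct Cell32(UnsafeCell<u32>);

// Shareable because every access through it is explicit.
unsafe impl Sync for Cell32 {}

impl Cell32 {
    const fn new(v: u32) -> Self { Cell32(UnsafeCell::new(v)) }

    // Typed accessors built from volatile accesses...
    fn get(&self) -> u32 { unsafe { ptr::read_volatile(self.0.get()) } }
    fn set(&self, v: u32) { unsafe { ptr::write_volatile(self.0.get(), v) } }

    // ...while the raw escape hatch stays available when it makes sense.
    fn as_ptr(&self) -> *mut u32 { self.0.get() }
}

fn main() {
    let c = Cell32::new(7);
    assert_eq!(c.get(), 7);
    c.set(9);
    // Mixed usage: a plain load through the raw pointer.
    let raw = unsafe { ptr::read(c.as_ptr()) };
    assert_eq!(raw, 9);
}
```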
On Tue, Mar 26, 2024 at 01:35:46AM +0000, Dr. David Alan Gilbert wrote: > OK, so that's essentially the opposite worry of what I was saying; I was > worrying about people forgetting to use an atomic access to a shared > variable; I think you're worrying about people forgetting to mark > a variable shared and since the accesses are the same nothing shouts? In biological evolution, novel useful traits are generally not accessible via a single mutation; many neutral mutations are required first. Evolution is able to proceed quickly because there are a great many neutral mutations (that is, evolution quickly searches all possible paths to find accessible positive mutations), and because negative mutations are culled quickly - often before the first cell division. (The most common mutation being the addition or deletion of a base pair; but amino acids are coded for by groups of three base pairs, so that shifts everything on the chromosome after the mutation so that it codes for completely different amino acids. That cell won't live to divide again). Actual genetic diseases that significantly impair fitness are quite rare, and if they weren't we'd have a major problem. Programming at scale is million monkeys stuff - we're all hammering on our keyboards at random, the good programs survive and the bad programs are forgotten. Similarly to biological evolution, we want most edits to a program to result in a program that either still works, or fails immediately - fails to compile, or is caught immediately by basic testing. If edits can result in latent undefined behaviour or programs that _mostly_ work, and then explode in unpredictable ways weeks/months/years later - that's a huge problem. In the worst case, those bugs/negative mutations accumulate faster than they can be culled. Thank god we have source control.
Places where we're working with extremely loose synchronization - no locking, raw memory barriers - are the worst kind of hand grenade, they result in bugs that are impossible to cull quickly. In kernel programming, we're always walking around with live hand grenades. So what do we do? We slow down, we take every step slowly and intentionally while telling everyone not to bump us because we're holding a live hand grenade - raw atomics, raw unlocked variables, memory barriers, they all get special care and extra comments. And we all have fuck-tons of code we need to be able to understand, review, debug and maintain, so we always try to write our code in a style where, if it's wrong, we'll see that _locally_, without having to go through and remember how _everything_ out of the possibly thousands of relevant lines works. I'm personally responsible for over 100k LOC of highly intricate code with high consequences for failure, and regularly have to debug issues arising somewhere in north of a million LOC - and when something goes wrong I have to be able to fully debug it _quickly_. What C++ does is like taking those hand grenades, with the pin already out - and leaving one under the couch cushions, another in the silverware drawer, another in the laundry basket - and expecting you to remember where you put them. Going back to the C++ example, the really fatal thing with how they do it is how a change in one line of code can completely change the semantics of a bunch of different code, and no human reviewer can be expected to catch the bugs that might be introduced - and the compiler certainly won't. Now imagine multiple people working on the same code, at different times. Now imagine patches getting mixed up, reordered, one of them getting lost, merge conflicts - i.e. shit that happens all the time, and what happens if you're using C++ style atomics. Terrifying stuff.
On Mon, 25 Mar 2024 at 17:05, Dr. David Alan Gilbert <dave@treblig.org> wrote: > > Isn't one of the aims of the Rust/C++ idea that you can't forget to access > a shared piece of data atomically? If that is an aim, it's a really *bad* one. Really. It very much should never have been an aim, and I hope it wasn't. I think, and hope, that the source of the C++ and Rust bad decisions is cluelessness, not active malice. Take Rust - one big point of Rust is the whole "safe" thing, but it's very much not a straightjacket like Pascal was. There's a "safe" part to Rust, but equally importantly, there's also the "unsafe" part to Rust. The safe part is the one that most programmers are supposed to use. It's the one that allows you to not have to worry too much about things. It's the part that makes it much harder to screw up. But the *unsafe* part is what makes Rust powerful. It's the part that works behind the curtain. It's the part that may be needed to make the safe parts *work*. And yes, an application programmer might never actually need to use it, and in fact in many projects the rule might be that unsafe Rust is simply never even an option - but that doesn't mean that the unsafe parts don't exist. Because those unsafe parts are needed to make it all work in reality. And you should never EVER base your whole design around the "safe" part. Then you get a language that is a straight-jacket. So I'd very strongly argue that the core atomics should be done the "unsafe" way - allow people to specify exactly when they want exactly what access. Allow people to mix and match and have overlapping partial aliases, because if you implement things like locking, you *need* those partially aliasing accesses, and you need to make overlapping atomics where sometimes you access only one part of the field. And yes, that will be unsafe, and it might even be unportable, but it's exactly the kind of thing you need in order to avoid having to use assembly language to do your locking. 
And by all means, you should relegate that to the "unsafe corner" of the language. And maybe don't talk about the unsafe sharp edges in the first chapter of the book about the language. But you should _start_ the design of your language memory model around the unsafe "raw atomic access operations" model. Then you can use those strictly more powerful operations, and you create an object model *around* it. So you create safe objects like just an atomic counter. In *that* corner of the language, you have the "safe atomics" - they aren't the fundamental implementation, but they are the safe wrappers *around* the more powerful (but unsafe) core. With that "atomic counter" you cannot forget to do atomic accesses, because that safe corner of the world doesn't _have_ anything but the safe atomic accesses for every time you use the object. See? Having the capability to do powerful and maybe unsafe things does not force people to expose and use all that power. You can - and should - wrap the powerful model with safer and simpler interfaces. This isn't something specific to atomics. Not even remotely. This is quite fundamental. You often literally _cannot_ do interesting things using only safe interfaces. You want safe memory allocations - but to actually write the allocator itself, you want to have all those unsafe escape methods - all those raw pointers with arbitrary arithmetic etc. And if you don't have unsafe escapes, you end up doing what so many languages did: the libraries are written in something more powerful like C, because C literally can do things that other languages *cannot* do. Don't let people fool you with talk about Turing machines and similar smoke-and-mirror garbage. It's a bedtime story for first-year CS students. It's not true. Not all languages are created equal. Not all languages can do the same things. If your language doesn't have those unsafe escapes, your language is inherently weaker, and inherently worse for it. Linus
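Linus's layering - an unsafe, powerful core with a safe "atomic counter" object wrapped around it - can be sketched directly in Rust's own terms: the unsafe raw accesses live in one small private function, and the safe corner of the world only ever sees atomic operations. Names here are invented:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU64, Ordering};

// The safe wrapper: in this corner of the world you *cannot* forget to
// do atomic accesses, because no other accesses exist on the object.
pub struct Counter(AtomicU64);

impl Counter {
    pub const fn new() -> Self { Counter(AtomicU64::new(0)) }

    pub fn inc(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }

    // The unsafe escape into the raw-access world: exactly the kind of
    // partially-aliasing, "tear off the type system" operation the core
    // model has to permit, even though the safe API never needs it.
    //
    // Safety: caller must tolerate arbitrary racy results.
    pub unsafe fn raw_peek(&self) -> u64 {
        let cell: &UnsafeCell<u64> = std::mem::transmute(&self.0);
        std::ptr::read_volatile(cell.get())
    }
}

fn main() {
    let c = Counter::new();
    c.inc();
    c.inc();
    assert_eq!(c.get(), 2);
    // Single-threaded here, so the racy peek agrees with the safe read.
    assert_eq!(unsafe { c.raw_peek() }, 2);
}
```

The safe API is strictly weaker than the core it wraps, which is the design order being argued for: powerful unsafe primitives first, safe objects built around them.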
* Linus Torvalds (torvalds@linux-foundation.org) wrote: > On Mon, 25 Mar 2024 at 17:05, Dr. David Alan Gilbert <dave@treblig.org> wrote: > > > > Isn't one of the aims of the Rust/C++ idea that you can't forget to access > > a shared piece of data atomically? > > If that is an aim, it's a really *bad* one. > > Really. > > It very much should never have been an aim, and I hope it wasn't. I > think, and hope, that the source of the C++ and Rust bad decisions is > cluelessness, not active malice. Oh that hit a nerve :-) > Take Rust - one big point of Rust is the whole "safe" thing, but it's > very much not a straightjacket like Pascal was. There's a "safe" part > to Rust, but equally importantly, there's also the "unsafe" part to > Rust. > > The safe part is the one that most programmers are supposed to use. > It's the one that allows you to not have to worry too much about > things. It's the part that makes it much harder to screw up. > > But the *unsafe* part is what makes Rust powerful. It's the part that > works behind the curtain. It's the part that may be needed to make the > safe parts *work*. > > And yes, an application programmer might never actually need to use > it, and in fact in many projects the rule might be that unsafe Rust is > simply never even an option - but that doesn't mean that the unsafe > parts don't exist. > > Because those unsafe parts are needed to make it all work in reality. > > And you should never EVER base your whole design around the "safe" > part. Then you get a language that is a straight-jacket. > > So I'd very strongly argue that the core atomics should be done the > "unsafe" way - allow people to specify exactly when they want exactly > what access. Allow people to mix and match and have overlapping > partial aliases, because if you implement things like locking, you > *need* those partially aliasing accesses, and you need to make > overlapping atomics where sometimes you access only one part of the > field. 
> > And yes, that will be unsafe, and it might even be unportable, but > it's exactly the kind of thing you need in order to avoid having to > use assembly language to do your locking. > > And by all means, you should relegate that to the "unsafe corner" of > the language. And maybe don't talk about the unsafe sharp edges in the > first chapter of the book about the language. > > But you should _start_ the design of your language memory model around > the unsafe "raw atomic access operations" model. > > Then you can use those strictly more powerful operations, and you > create an object model *around* it. > > So you create safe objects like just an atomic counter. In *that* > corner of the language, you have the "safe atomics" - they aren't the > fundamental implementation, but they are the safe wrappers *around* > the more powerful (but unsafe) core. > > With that "atomic counter" you cannot forget to do atomic accesses, > because that safe corner of the world doesn't _have_ anything but the > safe atomic accesses for every time you use the object. > > See? Having the capability to do powerful and maybe unsafe things does > not force people to expose and use all that power. You can - and > should - wrap the powerful model with safer and simpler interfaces. I'd agree it's important to get the primitives right; but I'd argue that from a design point of view it's probably better to keep both in mind from early on; you need to create a safe interface which people can actually use most of the time, otherwise you're not getting any benefit; so yes get the bases right, but just keep a feel for how they'll get encapsulated so most of the more boring code can be safe. > This isn't something specific to atomics. Not even remotely. This is > quite fundamental. You often literally _cannot_ do interesting things > using only safe interfaces. 
You want safe memory allocations - but to > actually write the allocator itself, you want to have all those unsafe > escape methods - all those raw pointers with arbitrary arithmetic etc. > > And if you don't have unsafe escapes, you end up doing what so many > languages did: the libraries are written in something more powerful > like C, because C literally can do things that other languages > *cannot* do. Yeh that's fine, I'm not at all arguing against that; but it doesn't mean you shouldn't keep an eye on how the safe side should look; even in the kernel. Get it right and those unsafe escapes shouldn't be needed too commonly; get it wrong and you'll either have painful abstractions or end up with unsafes shotgunned all over the place. > Don't let people fool you with talk about Turing machines and similar > smoke-and-mirror garbage. It's a bedtime story for first-year CS > students. It's not true. My infinitely long tape is still on back order. Dave > things. If your language doesn't have those unsafe escapes, your > language is inherently weaker, and inherently worse for it. > > Linus >
On Mar 25, 2024, at 8:49 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> But you should _start_ the design of your language memory model around
> the unsafe "raw atomic access operations" model.
>
> Then you can use those strictly more powerful operations, and you
> create an object model *around* it.

To some extent Rust does this already, unlike C++.

C++ allows atomics to be implemented using locks. Partly for this reason, `std::atomic<T>` is documented as not necessarily having the same representation as `T` [1]. C++ also has strict aliasing, so even if those types do have the same representation, you still can't cast `T *` to `std::atomic<T> *`.

But Rust atomics are lower-level. First, they are guaranteed lock-free [2]. Second, they are documented as having "the same in-memory representation as the underlying" type [3]. (They also usually have the same alignment, except on x86 where u64 is only 4-byte aligned but AtomicU64 of course needs to be 8-byte aligned.) Meanwhile, Rust intentionally lacks strict aliasing.

Combined, this means it's perfectly legal in Rust to cast e.g. `&mut u32` to `&AtomicU32` and perform atomic accesses on it. Or the same with u64/AtomicU64 if you know the pointer is validly aligned. This is by design; the Atomic types' methods are considered the official way to perform atomic operations on arbitrary memory, making it unnecessary to also stabilize 'lower-level' intrinsics.

That said, there *are* currently some holes in Rust's atomics model, based on the fact that it's mostly inherited from C++. From the documentation:

> Since C++ does not support mixing atomic and non-atomic accesses, or
> non-synchronized different-sized accesses to the same data, Rust does not
> support those operations either. Note that both of those restrictions only
> apply if the accesses are non-synchronized.
https://doc.rust-lang.org/std/sync/atomic/index.html

There are some open issues around this:

- "How can we allow read-read races between atomic and non-atomic accesses?"
  https://github.com/rust-lang/unsafe-code-guidelines/issues/483

  > [..] I do think we should allow such code. However, then we have to change
  > the way we document our atomics [..]

- "What about: mixed-size atomic accesses"
  https://github.com/rust-lang/unsafe-code-guidelines/issues/345

  > Apparently the x86 manual says you "should" not do this [..] It is unclear
  > what "should" means (or what anything else here really means, operationally
  > speaking...)

[1] https://eel.is/c++draft/atomics#types.generic.general-3
[2] https://doc.rust-lang.org/std/sync/atomic/index.html#portability
[3] https://doc.rust-lang.org/nightly/std/sync/atomic/struct.AtomicU64.html
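[The `&mut u32` to `&AtomicU32` cast described above can be made concrete. A minimal sketch; `as_atomic` is a made-up helper name (nothing from the thread), relying only on the representation guarantee quoted in the message:]

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// A &mut u32 proves we have exclusive, valid, aligned u32 storage, and
// AtomicU32 is guaranteed to have the same in-memory representation as
// u32, so reinterpreting the reference is sound.
fn as_atomic(v: &mut u32) -> &AtomicU32 {
    // SAFETY: size, representation and (for u32 on mainstream targets)
    // alignment all match; the &mut rules out non-atomic concurrent use.
    unsafe { &*(v as *mut u32 as *const AtomicU32) }
}

fn main() {
    let mut x: u32 = 40;
    as_atomic(&mut x).fetch_add(2, Ordering::Relaxed);
    assert_eq!(x, 42);
}
```

(std also has `Atomic*::from_mut` doing the same thing, but last I checked it was still unstable behind a feature gate.)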
On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote:
> Meanwhile, Rust intentionally lacks strict aliasing.
I wasn't aware of this. Given that unrestricted pointers are a real
impediment to compiler optimization, I thought that with Rust we were
finally starting to nail down a concrete enough memory model to tackle
this safely. But I guess not?
On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote:
> > Meanwhile, Rust intentionally lacks strict aliasing.
>
> I wasn't aware of this. Given that unrestricted pointers are a real
> impediment to compiler optimization, I thought that with Rust we were
> finally starting to nail down a concrete enough memory model to tackle
> this safely. But I guess not?

Strict aliasing is a *horrible* mistake.

It's not even *remotely* "tackle this safely". It's the exact opposite. It's completely broken.

Anybody who thinks strict aliasing is a good idea either

 (a) doesn't understand what it means

 (b) has been brainwashed by incompetent compiler people.

it's a horrendous crock that was introduced by people who thought it was too complicated to write out "restrict" keywords, and that thought that "let's break old working programs and make it harder to write new programs" was a good idea.

Nobody should ever do it. The fact that Rust doesn't do the C strict aliasing is a good thing. Really.

I suspect you have been fooled by the name. Because "strict aliasing" sounds like a good thing. It sounds like "I know these strictly can't alias". But despite that name, it's the complete opposite of that, and means "I will ignore actual real aliasing even if it exists, because I will make aliasing decisions on entirely made-up grounds".

Just say no to strict aliasing. Thankfully, there's an actual compiler flag for that: -fno-strict-aliasing. It should absolutely have been the default.

Linus
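[For the record, what "Rust doesn't do C strict aliasing" buys you in practice: reading an object's bytes through a differently-typed pointer is simply allowed. A small sketch; `f32_bits` is a made-up helper (the safe std equivalent is `f32::to_bits`), and the C comparison assumes default `-fstrict-aliasing` compilation:]

```rust
// Rust has no type-based aliasing rules, so inspecting a float's bits
// through a differently-typed pointer is allowed (sizes and alignment
// match here). The equivalent cast in C is undefined behavior under
// strict aliasing unless built with -fno-strict-aliasing.
fn f32_bits(x: f32) -> u32 {
    let p = &x as *const f32 as *const u32;
    // SAFETY: f32 and u32 have identical size and alignment, and Rust
    // raw-pointer reads carry no type-based aliasing restrictions.
    unsafe { *p }
}

fn main() {
    // IEEE 754: 1.0f32 is sign 0, biased exponent 127, zero mantissa.
    assert_eq!(f32_bits(1.0), 0x3f80_0000);
    assert_eq!(f32_bits(1.0), f32::to_bits(1.0));
}
```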
On Wed, Mar 27, 2024 at 12:07:26PM -0700, Linus Torvalds wrote:
> On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote:
> > > Meanwhile, Rust intentionally lacks strict aliasing.
> >
> > I wasn't aware of this. Given that unrestricted pointers are a real
> > impediment to compiler optimization, I thought that with Rust we were
> > finally starting to nail down a concrete enough memory model to tackle
> > this safely. But I guess not?
>
> Strict aliasing is a *horrible* mistake.
>
> It's not even *remotely* "tackle this safely". It's the exact
> opposite. It's completely broken.
>
> Anybody who thinks strict aliasing is a good idea either
>
> (a) doesn't understand what it means
>
> (b) has been brainwashed by incompetent compiler people.
>
> it's a horrendous crock that was introduced by people who thought it
> was too complicated to write out "restrict" keywords, and that thought
> that "let's break old working programs and make it harder to write new
> programs" was a good idea.

Strict aliasing is crap in C and C++ because we started out with unrestricted pointers, and it just doesn't work in C and C++ with the realities of the kind of code we have to write, and we never got any kind of a model that would have made it workable. Never mind trying to graft that onto existing codebases...

(Restrict was crap too... no scoping, nothing but a single f*cking keyword? Who ever thought _that_ was going to work?)

_But_: the lack of any aliasing guarantees means that writing through any pointer can invalidate practically anything, and this is a real problem. A lot of C programmers have Stockholm syndrome when it comes to this; we end up writing a lot of code in weirdly baroque and artificial styles to partially work around this when we care about performance - saving things into locals because at least the _stack_ generally can't alias, to avoid forced reloads, or passing and returning things by reference instead of by value when that's _not the semantics we want_ because otherwise the compiler is going to do an unnecessary copy - again, that's fundamentally because of aliasing.
On Wed, 27 Mar 2024 at 12:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> _But_: the lack of any aliasing guarantees means that writing through
> any pointer can invalidate practically anything, and this is a real
> problem.

It's actually much less of a problem than you make it out to be.

A lot of common aliasing information is statically visible (offsets off the same struct pointer etc).

The big problems tend to be

 (a) old in-order hardware that really wants the compiler to schedule memory operations

 (b) vectorization and HPC

and honestly, (a) is irrelevant, and (b) is where 'restrict' and actual real vector extensions come in. In fact, the type-based aliasing often doesn't help (because you have arrays of the same FP types), and so you really just need to tell the compiler that your arrays are disjoint.

Yes, yes, possible aliasing means that the compiler won't generate nice-looking code in many situations and will end up reloading values from memory etc.

AND NONE OF THAT MATTERS IN REALITY.

Performance issues to a close approximation come from cache misses and branch mispredicts. The aliasing issue just isn't the horrendous issue people claim it is. It's most *definitely* not worth the absolute garbage that is C type-based aliasing.

And yes, I do think it might be nice to have a nicer 'restrict' model, because yes, I look at the generated asm and I see the silly code generation too. But describing aliasing sanely in general is just hard (both for humans _and_ for some sane machine interface), and it's very very seldom worth the pain.

Linus
On Wed, Mar 27, 2024 at 03:41:16PM -0400, Kent Overstreet wrote: > On Wed, Mar 27, 2024 at 12:07:26PM -0700, Linus Torvalds wrote: > > On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote: > > > > Meanwhile, Rust intentionally lacks strict aliasing. > > > > > > I wasn't aware of this. Given that unrestricted pointers are a real > > > impediment to compiler optimization, I thought that with Rust we were > > > finally starting to nail down a concrete enough memory model to tackle > > > this safely. But I guess not? > > > > Strict aliasing is a *horrible* mistake. > > > > It's not even *remotely* "tackle this safely". It's the exact > > opposite. It's completely broken. > > > > Anybody who thinks strict aliasing is a good idea either > > > > (a) doesn't understand what it means > > > > (b) has been brainwashed by incompetent compiler people. > > > > it's a horrendous crock that was introduced by people who thought it > > was too complicated to write out "restrict" keywords, and that thought > > that "let's break old working programs and make it harder to write new > > programs" was a good idea. > > Strict aliasing is crap in C and C++ because we started out with > unrestricetd pointers, and it just doesn't work in C and C++ with the > realities of the kind of code we have to write, and we never got any > kind of a model that would have made it workable. Never mind trying to > graft that onto existing codebases... > > (Restrict was crap too... no scoping, nothing but a single f*cking > keyword? Who ever thought _that_ was going to work?) 
> > _But_: the lack of any aliasing guarantees means that writing through
> > any pointer can invalidate practically anything, and this is a real

I don't know whether I'm 100% correct on this, but Rust has references, so things like "you have a unique reference to a part of memory, no one will touch it in the meanwhile" are represented by `&mut`; to get a `&mut` from a raw pointer, you need unsafe, where programmers can provide the reasoning for the safety of the accesses. More like "pointers can alias anyone but references cannot" to me.

Regards,
Boqun

> problem. A lot of C programmers have stockholm syndrome when it comes to
> this, we end up writing a lot of code in weirdly baroque and artificial
> styles to partially work around this when we care about performance -
> saving things into locals because at least the _stack_ generally can't
> alias to avoid forced reloads, or passing and returning things by
> reference instead of by value when that's _not the semantics we want_
> because otherwise the compiler is going to do an unnecessary copy -
> again, that's fundamentally because of aliasing.
On Wed, Mar 27, 2024 at 01:45:46PM -0700, Linus Torvalds wrote:
> On Wed, 27 Mar 2024 at 12:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > _But_: the lack of any aliasing guarantees means that writing through
> > any pointer can invalidate practically anything, and this is a real
> > problem.
>
> It's actually much less of a problem than you make it out to be.
>
> A lot of common aliasing information is statically visible (offsets
> off the same struct pointer etc).
>
> The big problems tend to be
>
> (a) old in-order hardware that really wants the compiler to schedule
> memory operations
>
> (b) vectorization and HPC

Yeah, I was being a bit dramatic given the current state of the world - OOO machines _mostly_ do a reasonable job of hiding the negative effects of this (although not always, when memory barriers are involved). But I think this is going to become important again in the future, for a couple of reasons.

On the hardware end, the Mill guys were pointing out years ago that register renaming is a big power bottleneck in modern processors; the actual functional units consume practically nothing compared to just moving data around. They never built anything, but there's at least one new startup I know of that's continuing their work. Yes, VLIW has been a bust repeatedly, but as we keep going wider and wider (because what else are we going to do?) I think it's going to happen eventually.

On the compiler side, I'm taking the long view: it's really not just about a few reloads here and there, it's that you fundamentally can't do that much with code that's mutating arbitrary state; analysis that optimizations depend on is impossible. In the compiler you really want to be working with code that's purely functional - then all your optimizations are just algebra. And internally, compilers do this as much as they can (SSA form...), but unrestricted pointers really put a limit on what they can do.

The beautiful thing about Rust to me is that we finally might have a path out of this; the rules of Rust references constrain mutability in such a way that's almost as good as being purely functional, yet it's something we can actually do systems programming with. I think this is going to lead to substantially better code optimization in the future, and it's based on a much more sound model, which means we won't constantly be fighting with compiler people because they're doing shady shit and breaking previously working code.

Based on my experiences with writing iterator-heavy code in both C and Rust, I think it already is.
On Wed, Mar 27, 2024 at 02:21:03PM -0700, Boqun Feng wrote: > On Wed, Mar 27, 2024 at 03:41:16PM -0400, Kent Overstreet wrote: > > On Wed, Mar 27, 2024 at 12:07:26PM -0700, Linus Torvalds wrote: > > > On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > > > On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote: > > > > > Meanwhile, Rust intentionally lacks strict aliasing. > > > > > > > > I wasn't aware of this. Given that unrestricted pointers are a real > > > > impediment to compiler optimization, I thought that with Rust we were > > > > finally starting to nail down a concrete enough memory model to tackle > > > > this safely. But I guess not? > > > > > > Strict aliasing is a *horrible* mistake. > > > > > > It's not even *remotely* "tackle this safely". It's the exact > > > opposite. It's completely broken. > > > > > > Anybody who thinks strict aliasing is a good idea either > > > > > > (a) doesn't understand what it means > > > > > > (b) has been brainwashed by incompetent compiler people. > > > > > > it's a horrendous crock that was introduced by people who thought it > > > was too complicated to write out "restrict" keywords, and that thought > > > that "let's break old working programs and make it harder to write new > > > programs" was a good idea. > > > > Strict aliasing is crap in C and C++ because we started out with > > unrestricetd pointers, and it just doesn't work in C and C++ with the > > realities of the kind of code we have to write, and we never got any > > kind of a model that would have made it workable. Never mind trying to > > graft that onto existing codebases... > > > > (Restrict was crap too... no scoping, nothing but a single f*cking > > keyword? Who ever thought _that_ was going to work?) 
> > > > _But_: the lack of any aliasing guarantees means that writing through > > any pointer can invalidate practically anything, and this is a real > > I don't know whether I'm 100% correct on this, but Rust has references, > so things like "you have a unique reference to a part of memory, no one > would touch it in the meanwhile" are represented by `&mut`, to get a > `&mut` from a raw pointer, you need unsafe, where programmers can > provide the reasoning of the safety of the accesses. More like "pointers > can alias anyone but references cannot" to me. That's not really a workable rule because in practice every data structure has unsafe Rust underneath. Strict aliasing would mean that unsafe Rust very much has to follow the aliasing rules too.
> On Mar 27, 2024, at 2:21 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
>
> I don't know whether I'm 100% correct on this, but Rust has references,
> so things like "you have a unique reference to a part of memory, no one
> would touch it in the meanwhile" are represented by `&mut`, to get a
> `&mut` from a raw pointer, you need unsafe, where programmers can
> provide the reasoning of the safety of the accesses. More like "pointers
> can alias anyone but references cannot" to me.

Right. When I said "strict aliasing" I meant type-based aliasing rules, which is what GCC calls "strict aliasing". But Rust does have stricter aliasing rules than C in a different way. Both mutable and immutable references are annotated with LLVM `noalias` by default, equivalent to C `restrict`.

For mutable references it's justified because those references should be unique. For immutable references it's justified because the memory behind the reference shouldn't be mutated at all. (There's an exception for types with 'interior mutability', where 'immutable' references actually can be used for mutation.)

The hope has always been that this gives Rust better overall optimizability than C or C++ and makes up for the losses from the lack of type-based aliasing rules. If there's any empirical data to justify or refute this, I'm not aware of it. But that's the hope, and by this point Rust is committed to the approach.

(Why only function parameters? Mainly because of limitations of what LLVM IR can express. From the perspective of the work-in-progress Rust memory model spec, function parameters are special in *some* ways, but many of the optimizations could apply to all uses of references. That's just not currently implemented.)
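[The 'interior mutability' exception mentioned above can be seen with std's `Cell`, which is built on `UnsafeCell`. A minimal sketch; `bump` is a made-up helper name:]

```rust
use std::cell::Cell;

// A shared reference `&T` normally promises the pointee is not mutated,
// which is what justifies the readonly/noalias assumptions. Types built
// on UnsafeCell, such as Cell, are the documented exception: mutation
// through a shared reference is allowed, and the compiler drops the
// "memory behind this & is never written" assumption for them.
fn bump(c: &Cell<u32>) {
    c.set(c.get() + 1); // mutation through &, no &mut required
}

fn main() {
    let c = Cell::new(1);
    bump(&c);
    bump(&c);
    assert_eq!(c.get(), 3);
}
```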
> On Mar 27, 2024, at 2:56 PM, comex <comexk@gmail.com> wrote:
>
> Right. When I said "strict aliasing" I meant type-based aliasing rules,
> which is what GCC calls "strict aliasing". But Rust does have stricter
> aliasing rules than C in a different way. Both mutable and immutable
> references are annotated with LLVM `noalias` by default, equivalent to C
> `restrict`.

…oops, this should say "reference-typed function parameters".

> On Mar 27, 2024, at 2:49 PM, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> That's not really a workable rule because in practice every data
> structure has unsafe Rust underneath. Strict aliasing would mean that
> unsafe Rust very much has to follow the aliasing rules too.

There have indeed been a lot of issues where some innocent-seeming piece of unsafe Rust turns out to violate the reference aliasing rules. Miri helps (it's a tool that can detect violations at runtime), and there have been attempts to loosen the rules where possible. But it is definitely a case where Rust's UB rules are more subtle than one would like. At least it only applies to unsafe code.
On Wed, Mar 27, 2024 at 05:49:41PM -0400, Kent Overstreet wrote:
[...]
> > > Strict aliasing is crap in C and C++ because we started out with
> > > unrestricted pointers, and it just doesn't work in C and C++ with the
> > > realities of the kind of code we have to write, and we never got any
> > > kind of a model that would have made it workable. Never mind trying to
> > > graft that onto existing codebases...
> > >
> > > (Restrict was crap too... no scoping, nothing but a single f*cking
> > > keyword? Who ever thought _that_ was going to work?)
> > >
> > > _But_: the lack of any aliasing guarantees means that writing through
> > > any pointer can invalidate practically anything, and this is a real
> >
> > I don't know whether I'm 100% correct on this, but Rust has references,
> > so things like "you have a unique reference to a part of memory, no one
> > would touch it in the meanwhile" are represented by `&mut`, to get a
> > `&mut` from a raw pointer, you need unsafe, where programmers can
> > provide the reasoning of the safety of the accesses. More like "pointers
> > can alias anyone but references cannot" to me.
>
> That's not really a workable rule because in practice every data
> structure has unsafe Rust underneath. Strict aliasing would mean that

I don't follow; a plain data structure like:

	struct Point { x: i64, y: i64 }

doesn't have any unsafe Rust underneath, I think.

> unsafe Rust very much has to follow the aliasing rules too.

The point I was trying to make is that the aliasing rules on Rust raw pointers are relatively relaxed compared to strict aliasing in C, since Rust has references which should carry more accurate information on aliasing. (But I'm not a language expert.)

Regards,
Boqun
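[The "pointers can alias, references cannot" rule can be sketched with the `Point` struct from the message above. This is an illustrative toy; `alias_demo` is a made-up name:]

```rust
// Raw pointers carry no aliasing claims: two *mut to the same location
// may coexist and both be used, as long as the accesses themselves are
// justified in unsafe code. References are the opposite: the compiler
// may assume a live &mut is unique, so manufacturing two overlapping
// &mut (e.g. from raw pointers) is undefined behavior.
struct Point {
    x: i64,
    y: i64,
}

fn alias_demo() -> (i64, i64) {
    let mut p = Point { x: 1, y: 2 };
    let a: *mut i64 = &mut p.x;
    let b: *mut i64 = a; // a freely aliasing copy: fine for raw pointers

    // SAFETY: both pointers target p.x, which is valid for this whole
    // block, and no reference to p is live while we write through them.
    unsafe {
        *a += 10;
        *b += 100;
    }

    (p.x, p.y)
}

fn main() {
    assert_eq!(alias_demo(), (111, 2));
}
```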
On Wed, 27 Mar 2024 at 14:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On the hardware end, the Mill guys were pointing out years ago that
> register renaming is a big power bottleneck in modern processors;

LOL.

The Mill guys took the arguments from the Itanium people, and turned the crazy up to 11, with "the belt" and seemingly trying to do a dataflow machine but not worrying over-much about memory accesses etc.

The whole "we'll deal with it in the compiler" is crazy talk.

In other words, I'll believe it when I see it. And I doubt we'll ever see it.

Linus
On Wed, Mar 27, 2024 at 03:57:12PM -0700, Linus Torvalds wrote:
> On Wed, 27 Mar 2024 at 14:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > On the hardware end, the Mill guys were pointing out years ago that
> > register renaming is a big power bottleneck in modern processors;
>
> LOL.
>
> The Mill guys took the arguments from the Itanium people, and turned
> the crazy up to 11, with "the belt" and seemingly trying to do a
> dataflow machine but not worrying over-much about memory accesses etc.
>
> The whole "we'll deal with it in the compiler" is crazy talk.

And Itanium did way better on Fortran, but all those spills and reloads due to aliasing that an OoO processor can hide are death when you're statically scheduled. Unrestricted pointers are fundamentally a real barrier to improved ILP.

It's not _all_ branches and cache effects, though I will grant you that cache effects do dominate. But I think there's hope for improvement on that one too. A _lot_ of kernel code defaults to lists instead of vectors even though we know that vectors are better for performance - because people are afraid of memory allocations and error paths. Rust makes it way harder to fuck up your error paths, and also makes it safer to refactor to improve your data structures.
On Wed, 2024-03-27 at 12:07 -0700, Linus Torvalds wrote:
> On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote:
> > > Meanwhile, Rust intentionally lacks strict aliasing.
> >
> > I wasn't aware of this. Given that unrestricted pointers are a real
> > impediment to compiler optimization, I thought that with Rust we were
> > finally starting to nail down a concrete enough memory model to tackle
> > this safely. But I guess not?
>
> Strict aliasing is a *horrible* mistake.
>
> It's not even *remotely* "tackle this safely". It's the exact
> opposite. It's completely broken.
>
> Anybody who thinks strict aliasing is a good idea either
>
> (a) doesn't understand what it means
>
> (b) has been brainwashed by incompetent compiler people.
>
> it's a horrendous crock that was introduced by people who thought it
> was too complicated to write out "restrict" keywords, and that thought
> that "let's break old working programs and make it harder to write new
> programs" was a good idea.
>
> Nobody should ever do it. The fact that Rust doesn't do the C strict
> aliasing is a good thing. Really.

Btw, for the interested, here is a nice article on strict aliasing:

https://blog.regehr.org/archives/1307

Dennis Ritchie, the Man Himself, back in the 1980s pushed back quite strongly on (different?) aliasing experiments:

https://www.yodaiken.com/2021/03/19/dennis-ritchie-on-alias-analysis-in-the-c-programming-language-1988/

No idea why they can't just leave C alone... It's not without reason that new languages like Zig and Hare want to freeze the language (standard) once they are released.

P.

> I suspect you have been fooled by the name. Because "strict aliasing"
> sounds like a good thing. It sounds like "I know these strictly can't
> alias". But despite that name, it's the complete opposite of that, and
> means "I will ignore actual real aliasing even if it exists, because I
> will make aliasing decisions on entirely made-up grounds".
>
> Just say no to strict aliasing. Thankfully, there's an actual compiler
> flag for that: -fno-strict-aliasing. It should absolutely have been
> the default.
>
> Linus
On Mon, Mar 25, 2024 at 10:44:43AM -0700, Linus Torvalds wrote:
> So I actually think most compiler people are perfectly fine with the
> kernel model of mostly doing 'volatile' not on the data structures
> themselves, but as accesses through casts.
>
> It's very traditional C, and there's actually nothing particularly odd
> about it. Not even from a compiler standpoint.
>
> In fact, I personally will argue that it is fundamentally wrong to
> think that the underlying data has to be volatile. A variable may be
> entirely stable in some cases (ie locks held), but not in others.
>
> So it's not the *variable* (aka "object") that is 'volatile', it's the
> *context* that makes a particular access volatile.
>
> That explains why the kernel has basically zero actual volatile
> objects, and 99% of all volatile accesses are done through accessor
> functions that use a cast to mark a particular access volatile.

What annoys me is that 'volatile' accesses have (at least) two distinct meanings:

 - Make this access untorn
 - Prevent various optimisations (code motion, common-subexpression-elimination, ...)

As an example, folio_migrate_flags() (in mm/migrate.c):

	if (folio_test_error(folio))
		folio_set_error(newfolio);
	if (folio_test_referenced(folio))
		folio_set_referenced(newfolio);
	if (folio_test_uptodate(folio))
		folio_mark_uptodate(newfolio);

... which becomes...
  1f:	f6 c4 04             	test   $0x4,%ah
  22:	74 05                	je     29 <folio_migrate_flags+0x19>
  24:	f0 80 4f 01 04       	lock orb $0x4,0x1(%rdi)
  29:	48 8b 03             	mov    (%rbx),%rax
  2c:	a8 04                	test   $0x4,%al
  2e:	74 05                	je     35 <folio_migrate_flags+0x25>
  30:	f0 80 4d 00 04       	lock orb $0x4,0x0(%rbp)
  35:	48 8b 03             	mov    (%rbx),%rax
  38:	a8 08                	test   $0x8,%al
  3a:	74 05                	je     41 <folio_migrate_flags+0x31>
  3c:	f0 80 4d 00 08       	lock orb $0x8,0x0(%rbp)

In my ideal world, the compiler would turn this into:

	newfolio->flags |= folio->flags & MIGRATE_MASK;

but because folio_test_foo() and folio_set_foo() contain all manner of volatile casts, the compiler is forced to do individual tests and sets.

Part of that is us being dumb; folio_set_foo() should be __folio_set_foo() because this folio is newly allocated and nobody else can be messing with its flags word yet. I failed to spot that at the time I was doing the conversion from SetPageFoo to folio_set_foo.

But if the compiler people could give us something a little more granular than "scary volatile access disable everything", that would be nice. Also hard, because now you have to figure out what this new thing interacts with and when is it safe to do what.
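[For the Rust side of this thread, the per-access-volatile idiom looks roughly like the following sketch; `read_once` is a made-up name mirroring the kernel's READ_ONCE, not an existing kernel-crate API:]

```rust
use std::ptr;

// The kernel marks individual accesses volatile via casts (READ_ONCE /
// WRITE_ONCE) rather than declaring the object volatile. A rough Rust
// analogue: the field is a plain u64, and only this accessor performs
// a volatile read, which the compiler must emit exactly where written
// and may not merge or elide. Plain accesses to the same field
// elsewhere remain fully optimizable.
// (Caveat: in Rust's memory model, volatile alone does not make racing
// accesses defined; kernel-style code would pair this with atomics.)
fn read_once(flags: &u64) -> u64 {
    // SAFETY: a shared reference guarantees a valid, aligned u64.
    unsafe { ptr::read_volatile(flags as *const u64) }
}

fn main() {
    let flags: u64 = 0b1100;
    assert_eq!(read_once(&flags), 0b1100);
}
```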
On Mon, Apr 08, 2024 at 05:02:37PM +0100, Matthew Wilcox wrote:
> On Mon, Mar 25, 2024 at 10:44:43AM -0700, Linus Torvalds wrote:
> > So I actually think most compiler people are perfectly fine with the
> > kernel model of mostly doing 'volatile' not on the data structures
> > themselves, but as accesses through casts.
> >
> > It's very traditional C, and there's actually nothing particularly odd
> > about it. Not even from a compiler standpoint.
> >
> > In fact, I personally will argue that it is fundamentally wrong to
> > think that the underlying data has to be volatile. A variable may be
> > entirely stable in some cases (ie locks held), but not in others.
> >
> > So it's not the *variable* (aka "object") that is 'volatile', it's the
> > *context* that makes a particular access volatile.
> >
> > That explains why the kernel has basically zero actual volatile
> > objects, and 99% of all volatile accesses are done through accessor
> > functions that use a cast to mark a particular access volatile.
>
> What annoys me is that 'volatile' accesses have (at least) two distinct
> meanings:
>
>  - Make this access untorn
>  - Prevent various optimisations (code motion,
>    common-subexpression-elimination, ...)
>
> As an example, folio_migrate_flags() (in mm/migrate.c):
>
> 	if (folio_test_error(folio))
> 		folio_set_error(newfolio);
> 	if (folio_test_referenced(folio))
> 		folio_set_referenced(newfolio);
> 	if (folio_test_uptodate(folio))
> 		folio_mark_uptodate(newfolio);
>
> ... which becomes...
>
>   1f:	f6 c4 04             	test   $0x4,%ah
>   22:	74 05                	je     29 <folio_migrate_flags+0x19>
>   24:	f0 80 4f 01 04       	lock orb $0x4,0x1(%rdi)
>   29:	48 8b 03             	mov    (%rbx),%rax
>   2c:	a8 04                	test   $0x4,%al
>   2e:	74 05                	je     35 <folio_migrate_flags+0x25>
>   30:	f0 80 4d 00 04       	lock orb $0x4,0x0(%rbp)
>   35:	48 8b 03             	mov    (%rbx),%rax
>   38:	a8 08                	test   $0x8,%al
>   3a:	74 05                	je     41 <folio_migrate_flags+0x31>
>   3c:	f0 80 4d 00 08       	lock orb $0x8,0x0(%rbp)
>
> In my ideal world, the compiler would turn this into:
>
> 	newfolio->flags |= folio->flags & MIGRATE_MASK;
>
> but because folio_test_foo() and folio_set_foo() contain all manner of
> volatile casts, the compiler is forced to do individual tests and sets.
>
> Part of that is us being dumb; folio_set_foo() should be __folio_set_foo()
> because this folio is newly allocated and nobody else can be messing
> with its flags word yet.  I failed to spot that at the time I was doing
> the conversion from SetPageFoo to folio_set_foo.
>
> But if the compiler people could give us something a little more
> granular than "scary volatile access disable everything", that would
> be nice.  Also hard, because now you have to figure out what this new
> thing interacts with and when is it safe to do what.

OK, I will bite...

Why not accumulate the changes in a mask, and then apply the mask the
one time?  (In situations where __folio_set_foo() need not apply.)

If it turns out that we really do need a not-quite-volatile, what
exactly does it do?  You clearly want it to be able to be optimized so
as to merge similar accesses.  Is there a limit to the number of
accesses that can be merged or to the region of code over which such
merging is permitted?  Either way, how is the compiler informed of
these limits?

(I admit that I am not crazy about this sort of proposal, but that
might have something to do with the difficulty of repeatedly convincing
people that volatile is necessary and must be retained...)

							Thanx, Paul
On Mon, 8 Apr 2024 at 09:02, Matthew Wilcox <willy@infradead.org> wrote:
>
> What annoys me is that 'volatile' accesses have (at least) two distinct
> meanings:
>
>  - Make this access untorn
>  - Prevent various optimisations (code motion,
>    common-subexpression-elimination, ...)

Oh, I'm not at all trying to say that "volatile" is great.

My argument was that the C (and C++, and Rust) model of attaching
memory ordering to objects is actively bad, and limiting. Because the
whole "the access rules are context-dependent" is really fundamental.
Anybody who designs an atomic model around the object is simply not
doing it right.

Now, the "volatile" rules actually make sense in a historical "hardware
access" context. So I do not think "volatile" is great, but I also
don't think K&R were incompetent. "volatile" makes perfect sense in the
historical setting of "direct hardware access".

It just so happens that there weren't other tools, so then you end up
using "volatile" for cached memory too when you want to get "access
once" semantics, and then it isn't great.

And then you have *too* many tools on the standards bodies, and they
don't understand atomics, and don't understand volatile, and they have
been told that "volatile" isn't great for atomics because it doesn't
have memory ordering semantics, but do not understand the actual
problem space.

So those people - who in some cases spent decades arguing about (and
sometimes against) "volatile" - think that despite all the problems,
the solution for atomics is to make the *same* mistake, and tie it to
the data and the type system, not the action.

Which is honestly just plain *stupid*. What made sense for 'volatile'
in a historical setting, absolutely does not make sense for atomics.
> As an example, folio_migrate_flags() (in mm/migrate.c):
>
> 	if (folio_test_error(folio))
> 		folio_set_error(newfolio);
> 	if (folio_test_referenced(folio))
> 		folio_set_referenced(newfolio);
> 	if (folio_test_uptodate(folio))
> 		folio_mark_uptodate(newfolio);
>
> ... which becomes...

[ individual load and store code generation removed ]

> In my ideal world, the compiler would turn this into:
>
> 	newfolio->flags |= folio->flags & MIGRATE_MASK;

Well, honestly, we should just write the code that way, and not expect
too much of the compiler. We don't currently have a "generic atomic or"
operation, but we probably should have one.

For our own historical reasons, while we have a few generic atomic
operations: bit operations, cmpxchg, etc, most of our arithmetic and
logical ops all rely on a special "atomic_t" type (later extended with
"atomic_long_t").

The reason? The garbage that is legacy Sparc atomics.

Sparc historically basically didn't have any atomics outside of the
'test and set byte' one, so if you wanted an atomic counter thing, and
you cared about sparc, you had to play games with "some bits of the
counter are the atomic byte lock".

And we do not care about that Sparc horror any *more*, but we used to.

End result: instead of having "do atomic ops on a normal type" - which
would be a lot more powerful - we have this model of "do atomic ops on
atomic_t".

We could fix that now. Instead of having architectures define

	arch_atomic_or(int i, atomic_t *v)

operations, we could - and should - make the 'arch' atomics be

	arch_atomic_or(int i, unsigned int *v)

and then we'd still keep the "atomic_t" around for type safety reasons,
but code that just wants to act on an "int" (or a "long") atomically
could just do so.

But in your case, I don't think you actually need it:

> Part of that is us being dumb; folio_set_foo() should be __folio_set_foo()
> because this folio is newly allocated and nobody else can be messing
> with its flags word yet.
> I failed to spot that at the time I was doing
> the conversion from SetPageFoo to folio_set_foo.

This is part of my "context matters" rant and why I do *not* think
atomics should be tied to the object, but to the operation.

The compiler generally doesn't know the context rules (insert "some
languages do know in some cases" here), which is why we as programmers
should just use different operations when we do.

In this case, since it's a new folio that hasn't been exposed to
anybody, you should just have done exactly that kind of

	newfolio->flags |= folio->flags & MIGRATE_MASK;

which we already do in the page initialization code when we know we own
the flags (set_page_zone, set_page_node, set_page_section).

We've generally avoided doing this in general, though - even the buddy
allocator seldom does it. The only case of manual "I know I own the
flags" I know of (apart from the initialization itself) is

	->flags &= ~PAGE_FLAGS_CHECK_AT_FREE;
	...
	->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;

kinds of things at free/alloc time.

> But if the compiler people could give us something a little more
> granular than "scary volatile access disable everything", that would
> be nice.  Also hard, because now you have to figure out what this new
> thing interacts with and when is it safe to do what.

I think it would be lovely to have some kind of "atomic access"
operations that the compiler could still combine when it can see that
"this is invisible at a cache access level".

But as things are now, we do have most of those in the kernel, and what
you ask for can either be done today, or could be done (like that
"arch_atomic_or()") with a bit of re-org.

		Linus
On Mon, Apr 08, 2024 at 09:55:23AM -0700, Paul E. McKenney wrote:
> On Mon, Apr 08, 2024 at 05:02:37PM +0100, Matthew Wilcox wrote:
> > In my ideal world, the compiler would turn this into:
> >
> > 	newfolio->flags |= folio->flags & MIGRATE_MASK;
>
> Why not accumulate the changes in a mask, and then apply the mask the
> one time?  (In situations where __folio_set_foo() need not apply.)

Yes, absolutely, we can, should and probably eventually will do this
when it gets to the top of somebody's todo list.  But it irks me that
we can't tell the compiler this is a safe transformation for it to make.
There are a number of places where similar things happen.

$ git grep folio_test.*folio_test

will find you 82 of them (where they happen to be on the same line)

	if (folio_test_dirty(folio) || folio_test_locked(folio) ||
	    folio_test_writeback(folio))
		break;

turns into:

    1f41:	48 8b 29             	mov    (%rcx),%rbp
    1f44:	48 c1 ed 04          	shr    $0x4,%rbp
    1f48:	83 e5 01             	and    $0x1,%ebp
    1f4b:	0f 85 d5 00 00 00    	jne    2026 <filemap_range_has_writeback+0x1a6>
    1f51:	48 8b 29             	mov    (%rcx),%rbp
    1f54:	83 e5 01             	and    $0x1,%ebp
    1f57:	0f 85 c9 00 00 00    	jne    2026 <filemap_range_has_writeback+0x1a6>
    1f5d:	48 8b 29             	mov    (%rcx),%rbp
    1f60:	48 d1 ed             	shr    $1,%rbp
    1f63:	83 e5 01             	and    $0x1,%ebp
    1f66:	0f 85 ba 00 00 00    	jne    2026 <filemap_range_has_writeback+0x1a6>

rather than _one_ load from rcx and a test against a mask.

> If it turns out that we really do need a not-quite-volatile, what exactly
> does it do?  You clearly want it to be able to be optimized so as to merge
> similar accesses.  Is there a limit to the number of accesses that can
> be merged or to the region of code over which such merging is permitted?
> Either way, how is the compiler informed of these limits?

Right, like I said, it's not going to be easy to define exactly what we
want.
On Mon, Apr 08, 2024 at 10:01:32AM -0700, Linus Torvalds wrote:
> For our own historical reasons, while we have a few generic atomic
> operations: bit operations, cmpxchg, etc, most of our arithmetic and
> logical ops all rely on a special "atomic_t" type (later extended with
> "atomic_long_t").
>
> The reason? The garbage that is legacy Sparc atomics.
>
> Sparc historically basically didn't have any atomics outside of the
> 'test and set byte' one, so if you wanted an atomic counter thing, and
> you cared about sparc, you had to play games with "some bits of the
> counter are the atomic byte lock".
>
> And we do not care about that Sparc horror any *more*, but we used to.

FWIW, PA-RISC is no better - the same "fetch and replace with constant"
kind of primitive as for sparc32, only the constant is (u32)0 instead
of (u8)~0.  And unlike sparc64, 64bit variant didn't get better.
On Mon, Apr 08, 2024 at 06:03:11PM +0100, Matthew Wilcox wrote:
> On Mon, Apr 08, 2024 at 09:55:23AM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 08, 2024 at 05:02:37PM +0100, Matthew Wilcox wrote:
> > > In my ideal world, the compiler would turn this into:
> > >
> > > 	newfolio->flags |= folio->flags & MIGRATE_MASK;
> >
> > Why not accumulate the changes in a mask, and then apply the mask the
> > one time?  (In situations where __folio_set_foo() need not apply.)
>
> Yes, absolutely, we can, should and probably eventually will do this
> when it gets to the top of somebody's todo list.  But it irks me that
> we can't tell the compiler this is a safe transformation for it to make.
> There are a number of places where similar things happen.
>
> $ git grep folio_test.*folio_test
>
> will find you 82 of them (where they happen to be on the same line)
>
> 	if (folio_test_dirty(folio) || folio_test_locked(folio) ||
> 	    folio_test_writeback(folio))
> 		break;
>
> turns into:
>
>     1f41:	48 8b 29             	mov    (%rcx),%rbp
>     1f44:	48 c1 ed 04          	shr    $0x4,%rbp
>     1f48:	83 e5 01             	and    $0x1,%ebp
>     1f4b:	0f 85 d5 00 00 00    	jne    2026 <filemap_range_has_writeback+0x1a6>
>     1f51:	48 8b 29             	mov    (%rcx),%rbp
>     1f54:	83 e5 01             	and    $0x1,%ebp
>     1f57:	0f 85 c9 00 00 00    	jne    2026 <filemap_range_has_writeback+0x1a6>
>     1f5d:	48 8b 29             	mov    (%rcx),%rbp
>     1f60:	48 d1 ed             	shr    $1,%rbp
>     1f63:	83 e5 01             	and    $0x1,%ebp
>     1f66:	0f 85 ba 00 00 00    	jne    2026 <filemap_range_has_writeback+0x1a6>
>
> rather than _one_ load from rcx and a test against a mask.

Agreed, it would be nice if we could convince the compiler to do this
for us, preferably without breaking anything.

> > If it turns out that we really do need a not-quite-volatile, what exactly
> > does it do?  You clearly want it to be able to be optimized so as to merge
> > similar accesses.  Is there a limit to the number of accesses that can
> > be merged or to the region of code over which such merging is permitted?
> > Either way, how is the compiler informed of these limits?
>
> Right, like I said, it's not going to be easy to define exactly what we
> want.

Or to convince the usual suspects that any definition we might come up
with is useful/implementable/teachable/...  :-/

							Thanx, Paul
On Mon, 8 Apr 2024 at 11:14, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, PA-RISC is no better - the same "fetch and replace with constant"
> kind of primitive as for sparc32, only the constant is (u32)0 instead
> of (u8)~0.  And unlike sparc64, 64bit variant didn't get better.

Heh. The thing about PA-RISC is that it is actually *so* much worse
that it was never useful for an arithmetic type.

IOW, the fact that sparc used just a byte meant that the atomic_t
hackery on sparc still gave us 24 useful bits in a 32-bit atomic_t. So
long ago, we used to have an arithmetic atomic_t that was 32-bit on
all sane architectures, but only had a 24-bit range on sparc.

And I know you know all this, I'm just explaining the horror for the
audience.

On PA-RISC you couldn't do that horrendous trick, so parisc just used
the "we use a hashed spinlock for all atomics", and "atomic_t" was a
regular full-sized integer type.

Anyway, the sparc 24-bit atomics were actually replaced by the PA-RISC
version back twenty years ago (almost to the day):

    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=373f1583c5c5

and while we still had some left-over of that horror in the git tree
up until 2011 (until commit 348738afe530: "sparc32: drop unused
atomic24 support") we probably should have made the "arch_atomic_xyz()"
ops work on generic types rather than "atomic_t" for a long long time,
so that you could use them on other things than "atomic_t" and friends.

You can see the casting horror here, for example:

    include/asm-generic/bitops/atomic.h

where we do that cast from "volatile unsigned long *p" to
"atomic_long_t *" just to use the raw_atomic_long_xyz() operations.

It would make more sense if the raw atomics took that "native" volatile
unsigned long pointer directly.

(And here that "volatile" is not because it's necessarily used as a
volatile - it is - but simply because it's the most permissive type of
pointer.
You can see other places using "const volatile unsigned long" pointers
for the same reason: passing in a non-const or non-volatile pointer is
perfectly fine).

		Linus
On Mon, Apr 08, 2024 at 06:03:11PM +0100, Matthew Wilcox wrote:
> On Mon, Apr 08, 2024 at 09:55:23AM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 08, 2024 at 05:02:37PM +0100, Matthew Wilcox wrote:
> > > In my ideal world, the compiler would turn this into:
> > >
> > > 	newfolio->flags |= folio->flags & MIGRATE_MASK;
> >
> > Why not accumulate the changes in a mask, and then apply the mask the
> > one time?  (In situations where __folio_set_foo() need not apply.)
>
> But it irks me that we can't tell the compiler this is a safe
> transformation for it to make.  There are a number of places where
> similar things happen.

Same thing comes up with bignum code - you really want to be able to
tell the compiler "you can apply x/y/z optimizations for these
functions", e.g. replace add(mul(a, b), c) with fma(a, b, c).

Compiler optimizations are just algebraic transformations, we just need
a way to tell the compiler what the algebraic properties of our
functions are.
On Mon, Apr 08, 2024 at 08:58:18PM -0400, Kent Overstreet wrote:
> On Mon, Apr 08, 2024 at 06:03:11PM +0100, Matthew Wilcox wrote:
> > On Mon, Apr 08, 2024 at 09:55:23AM -0700, Paul E. McKenney wrote:
> > > On Mon, Apr 08, 2024 at 05:02:37PM +0100, Matthew Wilcox wrote:
> > > > In my ideal world, the compiler would turn this into:
> > > >
> > > > 	newfolio->flags |= folio->flags & MIGRATE_MASK;
> > >
> > > Why not accumulate the changes in a mask, and then apply the mask the
> > > one time?  (In situations where __folio_set_foo() need not apply.)
> >
> > But it irks me that we can't tell the compiler this is a safe
> > transformation for it to make.  There are a number of places where
> > similar things happen.
>
> Same thing comes up with bignum code - you really want to be able to
> tell the compiler "you can apply x/y/z optimizations for these
> functions", e.g. replace add(mul(a, b), c) with fma(a, b, c).
>
> Compiler optimizations are just algebraic transformations, we just need
> a way to tell the compiler what the algebraic properties of our
> functions are.

That might indeed be more straightforward than doing this on a per-type
basis.  But the C++ guys would likely just start shouting "template
metaprogramming!!!"  ;-)

							Thanx, Paul
On Mon, Mar 25, 2024 at 01:59:55PM -0700, Boqun Feng wrote:
> On Mon, Mar 25, 2024 at 10:44:45AM +0000, Mark Rutland wrote:
> [...]
> > >
> > > * I choose to re-implement atomics in Rust `asm` because we are still
> > >   figuring out how we can make it easy and maintainable for Rust to call
> > >   a C function _inlinely_ (Gary makes some progress [2]). Otherwise,
> > >   atomic primitives would be function calls, and that can be performance
> > >   bottleneck in a few cases.
> >
> > I don't think we want to maintain two copies of each architecture's atomics.
> > This gets painful very quickly (e.g. as arm64's atomics get patched between
> > LL/SC and LSE forms).
> >
>
> No argument here ;-)

Didn't we talk about bindgen being able to convert inline C functions
into equivalent inline Rust functions?  ISTR that getting stuck on Rust
not having a useful inline asm.

But fixing all that in a hurry seems like the much saner path forward.
On Tue, Apr 09, 2024 at 12:50:15PM +0200, Peter Zijlstra wrote:
> On Mon, Mar 25, 2024 at 01:59:55PM -0700, Boqun Feng wrote:
> > On Mon, Mar 25, 2024 at 10:44:45AM +0000, Mark Rutland wrote:
> > [...]
> > > >
> > > > * I choose to re-implement atomics in Rust `asm` because we are still
> > > >   figuring out how we can make it easy and maintainable for Rust to call
> > > >   a C function _inlinely_ (Gary makes some progress [2]). Otherwise,
> > > >   atomic primitives would be function calls, and that can be performance
> > > >   bottleneck in a few cases.
> > >
> > > I don't think we want to maintain two copies of each architecture's atomics.
> > > This gets painful very quickly (e.g. as arm64's atomics get patched between
> > > LL/SC and LSE forms).
> > >
> >
> > No argument here ;-)
>
> Didn't we talk about bindgen being able to convert inline C functions
> into equivalent inline Rust functions?  ISTR that getting stuck on Rust

Yes, we did.

> not having a useful inline asm.
>

Mostly two features were missing: 1) asm goto and 2) memory operands.
#1 was implemented[1] by Gary, and should be available in Rust 1.78
(planned for release on May 2, 2024).  For #2, my understanding is that
arch-specific effort is needed (since different architectures may have
different constraints on memory operands); I haven't yet found anyone
working on this.
(Background explanation for the broader audience: in GCC's inline asm,
you can specify a memory location, rather than a register, as the input
or output of an asm block's operand[2], but current Rust inline asm
doesn't provide this functionality.  By default, without options such
as "pure" or "nomem", every asm block in Rust can be thought of as a C
asm block with a "memory" clobber.)

That being said, if you look at the link I shared or this gist from
Gary:

	https://gist.github.com/nbdd0121/d4bf7dd7f9b6d6b50fa18b1092f45a3c

there is another way (yeah, we probably also have discussed this
previously): basically what it does is compile the functions in a C
file as LLVM IR, so that Rust can call these functions at the LLVM IR
level.  This is in theory doing some local LTO, and I've tested that
it works for asm blocks.  We still need to tweak our build system to
make this work, but should it work, it would mean that Rust can call a
C function in a pretty efficient way.

> But fixing all that in a hurry seems like the much saner path forward.

So a sane plan to me is wiring our atomics into Rust functions via what
Mark has (i.e. starting off as FFI calls), then we can switch to the
"local LTO" approach when it's ready.  In case that "local LTO" needs
more work and we do have a performance need, we can always either 1)
manually implement some primitives in Rust asm, or 2) look into how
bindgen or other tools can translate simple C functions (asm blocks)
into Rust.

Regards,
Boqun

[1]: https://github.com/rust-lang/rust/pull/119365
[2]: https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#index-m-in-constraint