Message ID | 20160114222046.GH3818@linux.vnet.ibm.com |
---|---|
State | New, archived |
Paul,

On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> > It is not so simple, I mean "local ordering for address and data
> > dependencies". Local ordering is NOT enough. It happens that current
> > MIPS R6 doesn't require in your example smp_read_barrier_depends()
> > but in discussion it comes out that it may not. Because without
> > smp_read_barrier_depends() your example can be a part of Will's
> > WRC+addr+addr and we found some design which easily can bump into
> > this test. And that design actually performs "local ordering for
> > address and data dependencies" too.
>
> As noted in another email in this thread, I do not believe that
> WRC+addr+addr needs to be prohibited. Sounds like Will and I need to
> get our story straight, though.

I think you figured this out while I was sleeping, but just to confirm:

  1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
     to memory accesses appearing in *program-order* before the SYNC.

  2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
     also capture the store in P0 as being "before" the barrier. Leonid
     reckons it works, but his explanation [2] focussed on the address
     dependency in P2 as to why this works. If that is the case (i.e.
     address dependency provides global transitivity), then WRC+addr+addr
     should also work (even though it's not required).

  3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
     about WRC+sync+addr, because neither the architecture document nor
     Leonid's explanation tells me that it should be forbidden.

Will

[1] https://imgtec.com/?do-download=4302
[2] http://lkml.kernel.org/r/569565DA.2010903@imgtec.com (scroll to the end)
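As a point of reference, the WRC+sync+addr shape can be transcribed into portable C11. This is a sketch under stated assumptions, not a model of MIPS: the thread and function names are illustrative, atomic_thread_fence(memory_order_seq_cst) stands in for the SYNC in P1, and because compilers in practice implement memory_order_consume as acquire, P2 uses an acquire load in place of the address dependency. Under the C11 model the outcome r1 == 1 && r2 == 1 && r3 == 0 is forbidden (the fence synchronizes with P2's acquire load, so read-read coherence on x applies), so the count returned should be zero. A zero count from a test run says nothing about any particular hardware design.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* WRC shape: P0 writes x; P1 reads x, then writes y; P2 reads y, then x. */
static atomic_int x, y;
static int r1, r2, r3;	/* read by the caller only after pthread_join() */

static void *p0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	return NULL;
}

static void *p1(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	/* Assumed stand-in for SYNC: a full fence between the read and the write. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return NULL;
}

static void *p2(void *arg)
{
	(void)arg;
	/* Assumed stand-in for the address dependency: an acquire load. */
	r2 = atomic_load_explicit(&y, memory_order_acquire);
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Runs the test 'iterations' times; returns how often the C11-forbidden
 * outcome r1 == 1 && r2 == 1 && r3 == 0 was observed. */
static int wrc_sync_addr(int iterations)
{
	void *(*const fn[3])(void *) = { p0, p1, p2 };
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		for (int j = 0; j < 3; j++) {
			int rc = pthread_create(&t[j], NULL, fn[j], NULL);

			assert(rc == 0);
			(void)rc;
		}
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);
		if (r1 == 1 && r2 == 1 && r3 == 0)
			forbidden++;
	}
	return forbidden;
}
```

Called as wrc_sync_addr(200), it runs 200 iterations and returns how many ended in the forbidden outcome.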
On 01/15/2016 01:57 AM, Will Deacon wrote:
> Paul,
>
> I think you figured this out while I was sleeping, but just to confirm:
>
>   1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
>      to memory accesses appearing in *program-order* before the SYNC
>
>   2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
>      also capture the store in P0 as being "before" the barrier. Leonid
>      reckons it works, but his explanation [2] focussed on the address
>      dependency in P2 as to why this works. If that is the case (i.e.
>      address dependency provides global transitivity), then WRC+addr+addr
>      should also work (even though its not required).

No, that is not correct. There is one old design in which the two threads
of a core (thread0 + thread1) share write buffers, so a load in one thread
can read a store from the other thread before that store is visible to
other cores. This means that WRC+sync+addr passes because of the SYNC in
the writing thread and the register dependency inside the other thread,
but WRC+addr+addr may fail because another core may get stale data.

>
>   3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
>      about WRC+sync+addr, because neither the architecture document or
>      Leonid's explanation tell me that it should be forbidden.
>
> Will
>
> [1] https://imgtec.com/?do-download=4302
> [2] http://lkml.kernel.org/r/569565DA.2010903@imgtec.com (scroll to the end)
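Leonid's description of a core whose two threads share write buffers can be illustrated with a deliberately simplified, single-threaded toy model. Everything in it is a hypothetical assumption for illustration, not something taken from a MIPS document: loads on the core are forwarded from the shared buffer, buffered stores may reach memory in any order (the sketch picks the unlucky order, y before x), and the toy SYNC drains the entire shared buffer, including the sibling thread's stores. The names struct core, core_sync() and wrc_toy() are made up. wrc_toy(false) plays WRC+addr+addr and returns the stale r3 == 0; wrc_toy(true) adds the SYNC in P1 and returns r3 == 1.

```c
#include <assert.h>
#include <stdbool.h>

enum { X, Y, NLOC };

/* Hypothetical core: one write buffer shared by both hardware threads. */
struct core {
	int pending[NLOC];	/* buffered value for each location */
	bool valid[NLOC];	/* is a store to that location buffered? */
};

static int memory[NLOC];	/* what other cores can see */

static void core_store(struct core *c, int loc, int val)
{
	c->pending[loc] = val;
	c->valid[loc] = true;
}

/* Loads from either thread on this core are forwarded from the buffer. */
static int core_load(const struct core *c, int loc)
{
	return c->valid[loc] ? c->pending[loc] : memory[loc];
}

/* Write back one buffered store; the toy lets this happen in any order. */
static void drain_one(struct core *c, int loc)
{
	if (c->valid[loc]) {
		memory[loc] = c->pending[loc];
		c->valid[loc] = false;
	}
}

/* Toy SYNC: wait until every buffered store, including the sibling
 * thread's, has reached memory. */
static void core_sync(struct core *c)
{
	for (int loc = 0; loc < NLOC; loc++)
		drain_one(c, loc);
}

/* Returns r3, the value of x seen by P2 on another core after it saw y == 1. */
static int wrc_toy(bool p1_uses_sync)
{
	struct core core0 = { { 0 }, { false } };
	int r1, r2, r3;

	memory[X] = memory[Y] = 0;

	core_store(&core0, X, 1);	/* P0 (core0 thread0): x = 1 */
	r1 = core_load(&core0, X);	/* P1 (core0 thread1): forwarded, r1 = 1 */
	if (p1_uses_sync)
		core_sync(&core0);	/* P1: SYNC */
	core_store(&core0, Y, r1);	/* P1: dependent store y = r1 */

	drain_one(&core0, Y);		/* y reaches memory before x */
	r2 = memory[Y];			/* P2 (core1): reads y == 1 */
	r3 = memory[X];			/* P2: dependent read of x */
	core_sync(&core0);

	assert(r1 == 1 && r2 == 1);
	(void)r2;
	return r3;
}
```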
On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> > On 01/14/2016 12:48 PM, Paul E. McKenney wrote:
> > >
> > >So SYNC_RMB is intended to implement smp_rmb(), correct?
> > Yes.
> > >
> > >You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> > >smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> >
> > If smp_read_barrier_depends() is used to separate not only two reads
> > but read pointer and WRITE basing on that pointer (example below) -
> > yes. I just doesn't see any example of this in famous
> > Documentation/memory-barriers.txt and had no chance to know what you
> > use it in this way too.
>
> Well, Documentation/memory-barriers.txt was intended as a guide for Linux
> kernel hackers, and not for hardware architects.

Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
specification (I seem to keep repeating this).

> ------------------------------------------------------------------------
>
> commit 955720966e216b00613fcf60188d507c103f0e80
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date: Thu Jan 14 14:17:04 2016 -0800
>
>     documentation: Subsequent writes ordered by rcu_dereference()
>
>     The current memory-barriers.txt does not address the possibility of
>     a write to a dereferenced pointer. This should be rare,

How are these rare? Isn't:

	rcu_read_lock();
	obj = rcu_dereference(ptr);
	if (!atomic_inc_not_zero(&obj->ref))
		obj = NULL;
	rcu_read_unlock();

a _very_ common thing to do?
On Tue, Jan 26, 2016 at 11:24:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> > The current memory-barriers.txt does not address the possibility of
> > a write to a dereferenced pointer. This should be rare,
>
> How are these rare? Isn't:
>
> 	rcu_read_lock();
> 	obj = rcu_dereference(ptr);
> 	if (!atomic_inc_not_zero(&obj->ref))
> 		obj = NULL;
> 	rcu_read_unlock();
>
> a _very_ common thing to do?

It is, but atomic_inc_not_zero() provides its own barriers, so that
pattern does not need to rely on dependency ordering.

							Thanx, Paul
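To show where the ordering in Peter's pattern comes from, it can be approximated in user-space C11. This is a sketch with hypothetical names (struct obj, ptr, inc_not_zero(), get_ref()): it is not the kernel's atomic_inc_not_zero(), and it omits RCU entirely, so it assumes the object is never freed while a lookup runs. The increment is a compare-and-swap loop whose successful exchange is memory_order_seq_cst, approximating the full ordering of the kernel's value-returning atomics. The acquire load of ptr is stronger than the dependency ordering that rcu_dereference() relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_int ref;
	int payload;
};

/* Hypothetical published pointer (the "ptr" in the snippet above). */
static _Atomic(struct obj *) ptr;

/*
 * Sketch of atomic_inc_not_zero() semantics: add one to *v unless *v is 0.
 * The successful CAS is seq_cst, approximating a fully ordered RMW.
 */
static bool inc_not_zero(atomic_int *v)
{
	int old;

	assert(v != NULL);
	old = atomic_load_explicit(v, memory_order_relaxed);
	while (old != 0) {
		if (atomic_compare_exchange_weak_explicit(v, &old, old + 1,
							  memory_order_seq_cst,
							  memory_order_relaxed))
			return true;
		/* On failure, 'old' was reloaded with the current value. */
	}
	return false;
}

/* Lookup modelled on the snippet, with an acquire load in place of
 * rcu_dereference() and no RCU read-side critical section. */
static struct obj *get_ref(void)
{
	struct obj *obj = atomic_load_explicit(&ptr, memory_order_acquire);

	if (obj != NULL && !inc_not_zero(&obj->ref))
		obj = NULL;
	return obj;
}
```

For an object whose ref is 1, get_ref() returns the object and leaves ref at 2; once ref has dropped to 0, get_ref() returns NULL and leaves ref unchanged.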
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f49c15f7864f..c66ba46d8079 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -555,6 +555,30 @@ between the address load and the data load:
 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
 
+A data-dependency barrier must also order against dependent writes:
+
+        CPU 1                      CPU 2
+        ===============            ===============
+        { A == 1, B == 2, C == 3, P == &A, Q == &C }
+        B = 4;
+        <write barrier>
+        WRITE_ONCE(P, &B);
+                                   Q = READ_ONCE(P);
+                                   <data dependency barrier>
+                                   *Q = 5;
+
+The data-dependency barrier must order the read into Q with the store
+into *Q. This prohibits this outcome:
+
+        (Q == &B) && (B == 4)
+
+Please note that this pattern should be rare. After all, the whole point
+of dependency ordering is to -prevent- writes to the data structure, along
+with the expensive cache misses associated with those writes. This pattern
+can be used to record rare error conditions and the like, and the ordering
+prevents such records from being lost.
+
+
 [!] Note that this extremely counterintuitive situation arises most easily on
 machines with split caches, so that, for example, one cache bank processes
 even-numbered cache lines and the other bank processes odd-numbered cache
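To make the forbidden outcome concrete, the litmus test added by the patch can be transcribed into C11. This is a sketch under stated assumptions: cpu1(), cpu2() and dependent_write() are illustrative names; a release store stands in for <write barrier> followed by WRITE_ONCE(P, &B); and an acquire load stands in for READ_ONCE(P) plus the data-dependency barrier, which is stronger than what the kernel primitive promises. In C11, if CPU 2 reads &B from P, then B = 4 happens before *Q = 5, so write-write coherence leaves B == 5 and forbids (Q == &B) && (B == 4). The count returned should therefore be zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* { A == 1, B == 2, C == 3, P == &A, Q == &C } */
static atomic_int A, B, C;
static _Atomic(atomic_int *) P;
static atomic_int *Q;	/* CPU 2's result, read by the caller after join */

static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&B, 4, memory_order_relaxed);
	/* <write barrier> + WRITE_ONCE(P, &B), modelled as a release store. */
	atomic_store_explicit(&P, &B, memory_order_release);
	return NULL;
}

static void *cpu2(void *arg)
{
	(void)arg;
	/* Q = READ_ONCE(P) + <data dependency barrier>, modelled as acquire. */
	atomic_int *q = atomic_load_explicit(&P, memory_order_acquire);

	atomic_store_explicit(q, 5, memory_order_relaxed);	/* *Q = 5; */
	Q = q;
	return NULL;
}

/* Returns the number of runs that ended in (Q == &B) && (B == 4). */
static int dependent_write(int iterations)
{
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t1, t2;
		int rc;

		atomic_store(&A, 1);
		atomic_store(&B, 2);
		atomic_store(&C, 3);
		atomic_store(&P, &A);
		Q = &C;

		rc = pthread_create(&t1, NULL, cpu1, NULL);
		assert(rc == 0);
		rc = pthread_create(&t2, NULL, cpu2, NULL);
		assert(rc == 0);
		(void)rc;
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);

		if (Q == &B && atomic_load(&B) == 4)
			forbidden++;
	}
	return forbidden;
}
```

If CPU 2 instead reads the initial &A, it writes A = 5 and B stays 4, which is allowed because Q == &A.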