
[v2] docs/RCU: Bring smp_wmb() back

Message ID 20230617145346.29009-1-mmpgouride@gmail.com (mailing list archive)
State New, archived
Series [v2] docs/RCU: Bring smp_wmb() back

Commit Message

Alan Huang June 17, 2023, 2:53 p.m. UTC
There are two memory orderings required in the insertion algorithm:
we need to make sure obj->key is updated before both obj->obj_node.next
and obj->refcnt, and atomic_set_release() is not enough to provide the
required memory barrier.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
---
 Documentation/RCU/rculist_nulls.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox (Oracle) June 17, 2023, 5:23 p.m. UTC | #1
On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> There are two memory ordering required in the insertion algorithm,
> we need to make sure obj->key is updated before obj->obj_node.next
> and obj->refcnt, atomic_set_release is not enough to provide the
> required memory barrier.
> 
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> ---
>  Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> index e06ed40bb6..77244adbdf 100644
> --- a/Documentation/RCU/rculist_nulls.rst
> +++ b/Documentation/RCU/rculist_nulls.rst
> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>  ----------------------
>  
>  We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted

"at the same time" doesn't make a lot of sense to me.  CPUs don't do
anything "at the same time".  I think the way this is worded now is
fine; I tried coming up with a few variants of this, but none are as
clear and succinct as what is there now.

>  from a chain, and inserted into another chain. If new chain was empty
>  before the move, 'next' pointer is NULL, and lockless reader can not
>  detect the fact that it missed following items in original chain.
> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>    obj = kmem_cache_alloc(...);
>    lock_chain(); // typically a spin_lock()
>    obj->key = key;
> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> +  /*
> +  * we need to make sure obj->key is updated before obj->obj_node.next
> +  * and obj->refcnt
> +  */
> +  smp_wmb();
> +  atomic_set(&obj->refcnt, 1);

Perhaps this could be a little clearer for those of us who aren't as
deep into the memory model as others ... are you saying that the
atomic_set_release() would only order references to obj->refcount and
would not order accesses to obj->key?  Because reading this:

     The use of ACQUIRE and RELEASE operations generally precludes the need
     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
     -not- guaranteed to act as a full memory barrier.  However, after an
     ACQUIRE on a given variable, all memory accesses preceding any prior
     RELEASE on that same variable are guaranteed to be visible.  In other
     words, within a given variable's critical section, all accesses of all
     previous critical sections for that variable are guaranteed to have
     completed.

makes me think that this example is fine; anybody doing a load-acquire
on obj->refcount will see the update to obj->key that happened before
the store-release to obj->refcount.
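
A minimal sketch of the release/acquire pairing described above (the acquire-side
fragment is illustrative and is not the document's try_get_ref()):

  /* Updater, as in the current documentation example: */
  obj->key = key;
  atomic_set_release(&obj->refcnt, 1);   /* release: ->key is ordered before ->refcnt */

  /* Reader: if this acquire load observes the value written by the release
   * above, the ->key store that preceded the release is guaranteed to be
   * visible here as well.
   */
  refcnt = atomic_read_acquire(&obj->refcnt);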

I am not an expert and would welcome the opportunity to learn more here.
Alan Huang June 17, 2023, 5:55 p.m. UTC | #2
> On Jun 18, 2023, at 01:23, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
>> There are two memory ordering required in the insertion algorithm,
>> we need to make sure obj->key is updated before obj->obj_node.next
>> and obj->refcnt, atomic_set_release is not enough to provide the
>> required memory barrier.
>> 
>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> ---
>> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>> index e06ed40bb6..77244adbdf 100644
>> --- a/Documentation/RCU/rculist_nulls.rst
>> +++ b/Documentation/RCU/rculist_nulls.rst
>> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>> ----------------------
>> 
>> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
>> -and previous value of 'obj->key'. Otherwise, an item could be deleted
>> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
> 
> "at the same time" doesn't make a lot of sense to me.  CPUs don't do
> anything "at the same time".  I think the way this is worded now is
> fine; I tried coming up with a few variants of this, but none are as
> clear and succinct as what is there now.
> 
>> from a chain, and inserted into another chain. If new chain was empty
>> before the move, 'next' pointer is NULL, and lockless reader can not
>> detect the fact that it missed following items in original chain.
>> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>>   obj = kmem_cache_alloc(...);
>>   lock_chain(); // typically a spin_lock()
>>   obj->key = key;
>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>> +  /*
>> +  * we need to make sure obj->key is updated before obj->obj_node.next
>> +  * and obj->refcnt
>> +  */
>> +  smp_wmb();
>> +  atomic_set(&obj->refcnt, 1);
> 
> Perhaps this could be a little clearer for those of us who aren't as
> deep into the memory model as others ... are you saying that the
> atomic_set_release() would only order references to obj->refcount and
> would not order accesses to obj->key?  Because reading this:
> 
>     The use of ACQUIRE and RELEASE operations generally precludes the need
>     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
>     -not- guaranteed to act as a full memory barrier.  However, after an
>     ACQUIRE on a given variable, all memory accesses preceding any prior
>     RELEASE on that same variable are guaranteed to be visible.  In other
>     words, within a given variable's critical section, all accesses of all
>     previous critical sections for that variable are guaranteed to have
>     completed.
> 
> makes me think that this example is fine; anybody doing a load-acquire
> on obj->refcount will see the update to obj->key that happened before
> the store-release to obj->refcount.

Two memory orderings are required here; atomic_set_release() only provides one of them (the one you mentioned).

The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is:
	
		n->next = first;

in hlist_add_head_rcu, which modifies obj->obj_node.next.
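
For reference, hlist_add_head_rcu() looks roughly like this (abridged; the exact
code differs across kernel versions):

  static inline void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h)
  {
          struct hlist_node *first = h->first;

          n->next = first;               /* plain store to obj->obj_node.next */
          n->pprev = &h->first;
          rcu_assign_pointer(hlist_first_rcu(h), n);  /* publication, release semantics */
          if (first)
                  first->pprev = &n->next;
  }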

So, we must make sure obj->key is updated before obj->obj_node.next. Without smp_wmb(), a reader can
observe the new 'obj->obj_node.next' value and the previous value of 'obj->key' at the same time, and in
that case it cannot detect the movement of the object.

The following link might be helpful:

		https://patchwork.ozlabs.org/project/netdev/patch/491C282A.5050802@cosmosbay.com/


> 
> I am not an expert and would welcome the opportunity to learn more here.
Matthew Wilcox (Oracle) June 17, 2023, 6:07 p.m. UTC | #3
On Sun, Jun 18, 2023 at 01:55:10AM +0800, Alan Huang wrote:
> 
> > On Jun 18, 2023, at 01:23, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> >> There are two memory ordering required in the insertion algorithm,
> >> we need to make sure obj->key is updated before obj->obj_node.next
> >> and obj->refcnt, atomic_set_release is not enough to provide the
> >> required memory barrier.
> >> 
> >> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> >> ---
> >> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
> >> 1 file changed, 7 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> >> index e06ed40bb6..77244adbdf 100644
> >> --- a/Documentation/RCU/rculist_nulls.rst
> >> +++ b/Documentation/RCU/rculist_nulls.rst
> >> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
> >> ----------------------
> >> 
> >> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> >> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> >> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
> > 
> > "at the same time" doesn't make a lot of sense to me.  CPUs don't do
> > anything "at the same time".  I think the way this is worded now is
> > fine; I tried coming up with a few variants of this, but none are as
> > clear and succinct as what is there now.
> > 
> >> from a chain, and inserted into another chain. If new chain was empty
> >> before the move, 'next' pointer is NULL, and lockless reader can not
> >> detect the fact that it missed following items in original chain.
> >> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
> >>   obj = kmem_cache_alloc(...);
> >>   lock_chain(); // typically a spin_lock()
> >>   obj->key = key;
> >> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> >> +  /*
> >> +  * we need to make sure obj->key is updated before obj->obj_node.next
> >> +  * and obj->refcnt
> >> +  */
> >> +  smp_wmb();
> >> +  atomic_set(&obj->refcnt, 1);
> > 
> > Perhaps this could be a little clearer for those of us who aren't as
> > deep into the memory model as others ... are you saying that the
> > atomic_set_release() would only order references to obj->refcount and
> > would not order accesses to obj->key?  Because reading this:
> > 
> >     The use of ACQUIRE and RELEASE operations generally precludes the need
> >     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
> >     -not- guaranteed to act as a full memory barrier.  However, after an
> >     ACQUIRE on a given variable, all memory accesses preceding any prior
> >     RELEASE on that same variable are guaranteed to be visible.  In other
> >     words, within a given variable's critical section, all accesses of all
> >     previous critical sections for that variable are guaranteed to have
> >     completed.
> > 
> > makes me think that this example is fine; anybody doing a load-acquire
> > on obj->refcount will see the update to obj->key that happened before
> > the store-release to obj->refcount.
> 
> Two memory ordering required here, atomic_set_release only provides one of them (the one you mentioned)
> 
> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is:
> 	
> 		n->next = first;
> 
> in hlist_add_head_rcu, which modifies obj->obj_node.next.

But isn't that the bug?  ie this assignment should be WRITE_ONCE() or
smp_store_release() or rcu_assign_pointer() or whichever of these
similar functions is actually appropriate?

> So, we must make sure obj->key is updated before obj->obj_node.next, without smp_wmb(), we can read
> the new 'obj->obj_node.next’ value and previous value of 'obj->key’ at the same time, and in this case, we
> can not detect the movement of the object.
> 
> The following link might be helpful:
> 
> 		https://patchwork.ozlabs.org/project/netdev/patch/491C282A.5050802@cosmosbay.com/
> 
> 
> > 
> > I am not an expert and would welcome the opportunity to learn more here.
> 
>
Alan Huang June 17, 2023, 6:21 p.m. UTC | #4
> On Jun 18, 2023, at 02:07, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Sun, Jun 18, 2023 at 01:55:10AM +0800, Alan Huang wrote:
>> 
>>> On Jun 18, 2023, at 01:23, Matthew Wilcox <willy@infradead.org> wrote:
>>> 
>>> On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
>>>> There are two memory ordering required in the insertion algorithm,
>>>> we need to make sure obj->key is updated before obj->obj_node.next
>>>> and obj->refcnt, atomic_set_release is not enough to provide the
>>>> required memory barrier.
>>>> 
>>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>>>> ---
>>>> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>>>> index e06ed40bb6..77244adbdf 100644
>>>> --- a/Documentation/RCU/rculist_nulls.rst
>>>> +++ b/Documentation/RCU/rculist_nulls.rst
>>>> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>>>> ----------------------
>>>> 
>>>> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
>>>> -and previous value of 'obj->key'. Otherwise, an item could be deleted
>>>> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
>>> 
>>> "at the same time" doesn't make a lot of sense to me.  CPUs don't do
>>> anything "at the same time".  I think the way this is worded now is
>>> fine; I tried coming up with a few variants of this, but none are as
>>> clear and succinct as what is there now.
>>> 
>>>> from a chain, and inserted into another chain. If new chain was empty
>>>> before the move, 'next' pointer is NULL, and lockless reader can not
>>>> detect the fact that it missed following items in original chain.
>>>> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>>>>  obj = kmem_cache_alloc(...);
>>>>  lock_chain(); // typically a spin_lock()
>>>>  obj->key = key;
>>>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>>> +  /*
>>>> +  * we need to make sure obj->key is updated before obj->obj_node.next
>>>> +  * and obj->refcnt
>>>> +  */
>>>> +  smp_wmb();
>>>> +  atomic_set(&obj->refcnt, 1);
>>> 
>>> Perhaps this could be a little clearer for those of us who aren't as
>>> deep into the memory model as others ... are you saying that the
>>> atomic_set_release() would only order references to obj->refcount and
>>> would not order accesses to obj->key?  Because reading this:
>>> 
>>>    The use of ACQUIRE and RELEASE operations generally precludes the need
>>>    for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
>>>    -not- guaranteed to act as a full memory barrier.  However, after an
>>>    ACQUIRE on a given variable, all memory accesses preceding any prior
>>>    RELEASE on that same variable are guaranteed to be visible.  In other
>>>    words, within a given variable's critical section, all accesses of all
>>>    previous critical sections for that variable are guaranteed to have
>>>    completed.
>>> 
>>> makes me think that this example is fine; anybody doing a load-acquire
>>> on obj->refcount will see the update to obj->key that happened before
>>> the store-release to obj->refcount.
>> 
>> Two memory ordering required here, atomic_set_release only provides one of them (the one you mentioned)
>> 
>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is:
>> 
>> n->next = first;
>> 
>> in hlist_add_head_rcu, which modifies obj->obj_node.next.
> 
> But isn't that the bug?  ie this assignment should be WRITE_ONCE() or
> smp_store_release() or rcu_assign_pointer() or whichever of these
> similar functions is actually appropriate?

Without SLAB_TYPESAFE_BY_RCU, it's fine, because the assignment only modifies local data that has not been published yet.

But with SLAB_TYPESAFE_BY_RCU, it might be a bug; I'm not entirely sure.
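
A comment-only sketch of the distinction (the WRITE_ONCE() line is the store being
asked about, not an agreed-upon fix):

  /* Without SLAB_TYPESAFE_BY_RCU: obj is freshly allocated and unreachable
   * until rcu_assign_pointer() publishes it, so this plain store only
   * touches private data:
   */
  n->next = first;

  /* With SLAB_TYPESAFE_BY_RCU: obj may be a recycled object that an old
   * reader still holds a pointer to, so the same store races with that
   * reader's load of ->next, hence the question whether it should be:
   */
  WRITE_ONCE(n->next, first);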

> 
>> So, we must make sure obj->key is updated before obj->obj_node.next, without smp_wmb(), we can read
>> the new 'obj->obj_node.next’ value and previous value of 'obj->key’ at the same time, and in this case, we
>> can not detect the movement of the object.
>> 
>> The following link might be helpful:
>> 
>> https://patchwork.ozlabs.org/project/netdev/patch/491C282A.5050802@cosmosbay.com/
>> 
>> 
>>> 
>>> I am not an expert and would welcome the opportunity to learn more here.
Paul E. McKenney June 20, 2023, 10:33 p.m. UTC | #5
On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> There are two memory ordering required in the insertion algorithm,
> we need to make sure obj->key is updated before obj->obj_node.next
> and obj->refcnt, atomic_set_release is not enough to provide the
> required memory barrier.
> 
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> ---
>  Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> index e06ed40bb6..77244adbdf 100644
> --- a/Documentation/RCU/rculist_nulls.rst
> +++ b/Documentation/RCU/rculist_nulls.rst
> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>  ----------------------
>  
>  We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
>  from a chain, and inserted into another chain. If new chain was empty
>  before the move, 'next' pointer is NULL, and lockless reader can not
>  detect the fact that it missed following items in original chain.
> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>    obj = kmem_cache_alloc(...);
>    lock_chain(); // typically a spin_lock()
>    obj->key = key;
> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> +  /*
> +  * we need to make sure obj->key is updated before obj->obj_node.next
> +  * and obj->refcnt
> +  */
> +  smp_wmb();
> +  atomic_set(&obj->refcnt, 1);

You lost me on this one.

From what I can see, in the original the atomic_set_release() provided
ordering between the ->key assignment and the ->refcnt assignment,
and then the rcu_assign_pointer() within hlist_add_head_rcu() provided
ordering between both the ->key and the ->refcnt assignments on the one
hand and the list insertion on the other.
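
Annotating the original documentation example with that ordering (the comments are
explanatory only; the code itself is unchanged):

  obj->key = key;
  atomic_set_release(&obj->refcnt, 1);       /* release: ->key ordered before ->refcnt */
  hlist_add_head_rcu(&obj->obj_node, list);  /* rcu_assign_pointer() inside orders both
                                              * ->key and ->refcnt before the chain-head
                                              * update that makes obj reachable */
  unlock_chain();                            /* typically a spin_unlock() */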

What am I missing here?

							Thanx, Paul

>    hlist_add_head_rcu(&obj->obj_node, list);
>    unlock_chain(); // typically a spin_unlock()
>  
> -- 
> 2.34.1
>
Alan Huang June 21, 2023, 1:24 a.m. UTC | #6
> On Jun 21, 2023, at 06:33, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
>> There are two memory ordering required in the insertion algorithm,
>> we need to make sure obj->key is updated before obj->obj_node.next
>> and obj->refcnt, atomic_set_release is not enough to provide the
>> required memory barrier.
>> 
>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> ---
>> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>> index e06ed40bb6..77244adbdf 100644
>> --- a/Documentation/RCU/rculist_nulls.rst
>> +++ b/Documentation/RCU/rculist_nulls.rst
>> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>> ----------------------
>> 
>> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
>> -and previous value of 'obj->key'. Otherwise, an item could be deleted
>> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
>> from a chain, and inserted into another chain. If new chain was empty
>> before the move, 'next' pointer is NULL, and lockless reader can not
>> detect the fact that it missed following items in original chain.
>> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>>   obj = kmem_cache_alloc(...);
>>   lock_chain(); // typically a spin_lock()
>>   obj->key = key;
>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>> +  /*
>> +  * we need to make sure obj->key is updated before obj->obj_node.next
>> +  * and obj->refcnt
>> +  */
>> +  smp_wmb();
>> +  atomic_set(&obj->refcnt, 1);
> 
> You lost me on this one.
> 
> From what I can see, in the original the atomic_set_release() provided
> ordering between the ->key assignment and the ->refcnt assignment,
> and then the rcu_assign_pointer() within hlist_add_head_rcu() provided
> ordering between both the ->key and the ->refcnt assignments on the one
> hand and the list insertion on the other.
> 
> What am I missing here?

The objects are allocated with SLAB_TYPESAFE_BY_RCU, and hlist_add_head_rcu() performs
n->next = first before its rcu_assign_pointer(), which modifies obj->obj_node.next.
There may be readers in lockless_lookup() still holding a reference to obj, and when the
updater modifies ->next, those readers can see the change immediately because of
SLAB_TYPESAFE_BY_RCU.

So, the store n->next = first runs concurrently with the read of pos->next in lockless_lookup().

As you can see, the read of pos->next in lockless_lookup() is followed by an smp_rmb(), so
we need to make sure obj->key is updated before obj->obj_node.next (which is the
n->next = first in hlist_add_head_rcu()).

So, the smp_wmb() pairs with the smp_rmb() in lockless_lookup() and with the load-acquire
in try_get_ref().
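
Putting both sides next to each other (the lookup loop is a simplified rendering of
the smp_rmb() variant described in rculist_nulls.rst, not the verbatim code):

  /* Updater, with the smp_wmb() from this patch; because of SLAB_TYPESAFE_BY_RCU,
   * obj may still be reachable through an old chain:
   */
  obj->key = key;
  smp_wmb();                                  /* order ->key before ->refcnt and ->next */
  atomic_set(&obj->refcnt, 1);
  hlist_add_head_rcu(&obj->obj_node, list);   /* does n->next = first internally */

  /* Reader, simplified: */
  for (pos = rcu_dereference(head->first); pos; pos = rcu_dereference(next)) {
          next = pos->next;                   /* may observe the updater's n->next = first */
          smp_rmb();                          /* pairs with the updater's smp_wmb() */
          obj = hlist_entry(pos, struct obj, obj_node);
          if (obj->key == key)                /* seeing the new ->next implies seeing the new ->key */
                  return obj;
  }
  return NULL;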

> 
> Thanx, Paul
> 
>>   hlist_add_head_rcu(&obj->obj_node, list);
>>   unlock_chain(); // typically a spin_unlock()
>> 
>> -- 
>> 2.34.1
>>

Patch

diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
index e06ed40bb6..77244adbdf 100644
--- a/Documentation/RCU/rculist_nulls.rst
+++ b/Documentation/RCU/rculist_nulls.rst
@@ -98,7 +98,7 @@  Quoting Corey Minyard::
 ----------------------
 
 We need to make sure a reader cannot read the new 'obj->obj_node.next' value
-and previous value of 'obj->key'. Otherwise, an item could be deleted
+and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
 from a chain, and inserted into another chain. If new chain was empty
 before the move, 'next' pointer is NULL, and lockless reader can not
 detect the fact that it missed following items in original chain.
@@ -112,7 +112,12 @@  detect the fact that it missed following items in original chain.
   obj = kmem_cache_alloc(...);
   lock_chain(); // typically a spin_lock()
   obj->key = key;
-  atomic_set_release(&obj->refcnt, 1); // key before refcnt
+  /*
+  * we need to make sure obj->key is updated before obj->obj_node.next
+  * and obj->refcnt
+  */
+  smp_wmb();
+  atomic_set(&obj->refcnt, 1);
   hlist_add_head_rcu(&obj->obj_node, list);
   unlock_chain(); // typically a spin_unlock()