diff mbox series

[net-next] net/tls: Use RCU API to access tls_ctx->netdev

Message ID 20220801080053.21849-1-maximmi@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net/tls: Use RCU API to access tls_ctx->netdev | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 58 this patch: 61
netdev/cc_maintainers warning 2 maintainers not CCed: leon@kernel.org linux-rdma@vger.kernel.org
netdev/build_clang success Errors and warnings before: 10 this patch: 10
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 58 this patch: 61
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Maxim Mikityanskiy Aug. 1, 2022, 8 a.m. UTC
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.

Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.

Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ktls_tx.c     |  2 +-
 include/net/tls.h                             |  2 +-
 net/tls/tls_device.c                          | 38 ++++++++++++++-----
 net/tls/tls_device_fallback.c                 |  3 +-
 4 files changed, 33 insertions(+), 12 deletions(-)

Comments

Jakub Kicinski Aug. 1, 2022, 7:42 p.m. UTC | #1
On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote:
> @@ -1329,7 +1345,11 @@ static int tls_device_down(struct net_device *netdev)
>  
>  	spin_lock_irqsave(&tls_device_lock, flags);
>  	list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) {
> -		if (ctx->netdev != netdev ||
> +		struct net_device *ctx_netdev =
> +			rcu_dereference_protected(ctx->netdev,
> +						  lockdep_is_held(&device_offload_lock));
> +
> +		if (ctx_netdev != netdev ||
>  		    !refcount_inc_not_zero(&ctx->refcount))
>  			continue;

For cases like this where we don't actually hold onto the object, just
take a peek at the address of it we can save a handful of LoC by using
rcu_access_pointer().
Jakub Kicinski Aug. 1, 2022, 7:44 p.m. UTC | #2
On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote:
> Currently, tls_device_down synchronizes with tls_device_resync_rx using
> RCU, however, the pointer to netdev is stored using WRITE_ONCE and
> loaded using READ_ONCE.
> 
> Although such approach is technically correct (rcu_dereference is
> essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
> NULL), using special RCU helpers for pointers is more valid, as it
> includes additional checks and might change the implementation
> transparently to the callers.
> 
> Mark the netdev pointer as __rcu and use the correct RCU helpers to
> access it. For non-concurrent access pass the right conditions that
> guarantee safe access (locks taken, refcount value). Also use the
> correct helper in mlx5e, where even READ_ONCE was missing.

Oops, looks like we also got some new sparse warnings from this:

2 new warnings in drivers/net/bonding/bond_main.c
1 new warning  in drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
Maxim Mikityanskiy Aug. 2, 2022, 12:03 p.m. UTC | #3
On Mon, 2022-08-01 at 12:42 -0700, Jakub Kicinski wrote:
> On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote:
> > @@ -1329,7 +1345,11 @@ static int tls_device_down(struct net_device *netdev)
> >  
> >  	spin_lock_irqsave(&tls_device_lock, flags);
> >  	list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) {
> > -		if (ctx->netdev != netdev ||
> > +		struct net_device *ctx_netdev =
> > +			rcu_dereference_protected(ctx->netdev,
> > +						  lockdep_is_held(&device_offload_lock));
> > +
> > +		if (ctx_netdev != netdev ||
> >  		    !refcount_inc_not_zero(&ctx->refcount))
> >  			continue;
> 
> For cases like this where we don't actually hold onto the object, just
> take a peek at the address of it we can save a handful of LoC by using
> rcu_access_pointer(). 

The documentation of rcu_access_pointer says it shouldn't be used on
the update side, because we lose lockdep protection:

--cut--

Although rcu_access_pointer() may also be used in cases
where update-side locks prevent the value of the pointer from changing,
you should instead use rcu_dereference_protected() for this use case.

--cut--

Though, I can change the read side to use rcu_access_pointer, wherever
the pointer is not dereferenced. But it won't save lines of code :)
Maxim Mikityanskiy Aug. 2, 2022, 12:07 p.m. UTC | #4
On Mon, 2022-08-01 at 12:44 -0700, Jakub Kicinski wrote:
> On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote:
> > Currently, tls_device_down synchronizes with tls_device_resync_rx using
> > RCU, however, the pointer to netdev is stored using WRITE_ONCE and
> > loaded using READ_ONCE.
> > 
> > Although such approach is technically correct (rcu_dereference is
> > essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
> > NULL), using special RCU helpers for pointers is more valid, as it
> > includes additional checks and might change the implementation
> > transparently to the callers.
> > 
> > Mark the netdev pointer as __rcu and use the correct RCU helpers to
> > access it. For non-concurrent access pass the right conditions that
> > guarantee safe access (locks taken, refcount value). Also use the
> > correct helper in mlx5e, where even READ_ONCE was missing.
> 
> Oops, looks like we also got some new sparse warnings from this:
> 
> 2 new warnings in drivers/net/bonding/bond_main.c
> 1 new warning  in drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c

Looks like neither me, nor our internal CI built these files - sorry!
I'll fix these and look for the usages more carefully.

BTW, the bonding case misses even the READ_ONCE, so it's an existing
bug, exposed by the transition to the proper RCU API in my patch.
Jakub Kicinski Aug. 2, 2022, 3:37 p.m. UTC | #5
On Tue, 2 Aug 2022 12:03:32 +0000 Maxim Mikityanskiy wrote:
> > For cases like this where we don't actually hold onto the object, just
> > take a peek at the address of it we can save a handful of LoC by using
> > rcu_access_pointer().   
> 
> The documentation of rcu_access_pointer says it shouldn't be used on
> the update side, because we lose lockdep protection:
> 
> --cut--
> 
> Although rcu_access_pointer() may also be used in cases
> where update-side locks prevent the value of the pointer from changing,
> you should instead use rcu_dereference_protected() for this use case.

I think what this is trying to say is to not use the
rcu_access_pointer() as a hack against lockdep:


	lock(writer_lock);
	/* no need for rcu_dereference() because we have writer lock */
	ptr = rcu_access_pointer(obj->ptr);
	ptr->something = 1;
	unlock(writer_lock);

It's still perfectly fine to use access_pointer as intended on 
the write side, which is just checking the value of the pointer, 
not deferencing it:

	lock(writer_lock);
	if (rcu_access_pointer(obj->ptr) == target)
		so_something(obj);
	unlock(writer_lock);
Jakub Kicinski Aug. 2, 2022, 3:38 p.m. UTC | #6
On Tue, 2 Aug 2022 12:07:18 +0000 Maxim Mikityanskiy wrote:
> > Oops, looks like we also got some new sparse warnings from this:
> > 
> > 2 new warnings in drivers/net/bonding/bond_main.c
> > 1 new warning  in drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c  
> 
> Looks like neither me, nor our internal CI built these files - sorry!
> I'll fix these and look for the usages more carefully.
> 
> BTW, the bonding case misses even the READ_ONCE, so it's an existing
> bug, exposed by the transition to the proper RCU API in my patch.

Nice! You can slap a fixes tag on it, for accounting purposes, and stick
to net-next in the subject, tree doesn't matter right now.
kernel test robot Aug. 2, 2022, 4:26 p.m. UTC | #7
Hi Maxim,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Maxim-Mikityanskiy/net-tls-Use-RCU-API-to-access-tls_ctx-netdev/20220801-160258
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 63757225a93353bc2ce4499af5501eabdbbf23f9
config: mips-randconfig-s043-20220801 (https://download.01.org/0day-ci/archive/20220803/202208030026.UbhC51Ku-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/9971662af97683a6f0ea3d752495bba5ef6229dc
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Maxim-Mikityanskiy/net-tls-Use-RCU-API-to-access-tls_ctx-netdev/20220801-160258
        git checkout 9971662af97683a6f0ea3d752495bba5ef6229dc
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash drivers/net/bonding/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

sparse warnings: (new ones prefixed by >>)
   drivers/net/bonding/bond_main.c:2831:26: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:2837:20: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:2910:40: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __be16 [usertype] vlan_proto @@     got int @@
   drivers/net/bonding/bond_main.c:2910:40: sparse:     expected restricted __be16 [usertype] vlan_proto
   drivers/net/bonding/bond_main.c:2910:40: sparse:     got int
>> drivers/net/bonding/bond_main.c:5336:13: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct net_device *slave_dev @@     got struct net_device [noderef] __rcu *netdev @@
   drivers/net/bonding/bond_main.c:5336:13: sparse:     expected struct net_device *slave_dev
   drivers/net/bonding/bond_main.c:5336:13: sparse:     got struct net_device [noderef] __rcu *netdev
   drivers/net/bonding/bond_main.c:5337:75: sparse: sparse: incorrect type in argument 3 (different address spaces) @@     expected struct net_device *slave_dev @@     got struct net_device [noderef] __rcu *netdev @@
   drivers/net/bonding/bond_main.c:5337:75: sparse:     expected struct net_device *slave_dev
   drivers/net/bonding/bond_main.c:5337:75: sparse:     got struct net_device [noderef] __rcu *netdev

vim +5336 drivers/net/bonding/bond_main.c

007feb87fb1593 Tariq Toukan 2021-01-17  5331  
89df6a8104706f Tariq Toukan 2021-01-17  5332  #if IS_ENABLED(CONFIG_TLS_DEVICE)
89df6a8104706f Tariq Toukan 2021-01-17  5333  static netdev_tx_t bond_tls_device_xmit(struct bonding *bond, struct sk_buff *skb,
89df6a8104706f Tariq Toukan 2021-01-17  5334  					struct net_device *dev)
89df6a8104706f Tariq Toukan 2021-01-17  5335  {
89df6a8104706f Tariq Toukan 2021-01-17 @5336  	if (likely(bond_get_slave_by_dev(bond, tls_get_ctx(skb->sk)->netdev)))
89df6a8104706f Tariq Toukan 2021-01-17  5337  		return bond_dev_queue_xmit(bond, skb, tls_get_ctx(skb->sk)->netdev);
89df6a8104706f Tariq Toukan 2021-01-17  5338  	return bond_tx_drop(dev, skb);
89df6a8104706f Tariq Toukan 2021-01-17  5339  }
89df6a8104706f Tariq Toukan 2021-01-17  5340  #endif
89df6a8104706f Tariq Toukan 2021-01-17  5341
Maxim Mikityanskiy Aug. 3, 2022, 9:33 a.m. UTC | #8
On Tue, 2022-08-02 at 08:37 -0700, Jakub Kicinski wrote:
> On Tue, 2 Aug 2022 12:03:32 +0000 Maxim Mikityanskiy wrote:
> > > For cases like this where we don't actually hold onto the object, just
> > > take a peek at the address of it we can save a handful of LoC by using
> > > rcu_access_pointer().   
> > 
> > The documentation of rcu_access_pointer says it shouldn't be used on
> > the update side, because we lose lockdep protection:
> > 
> > --cut--
> > 
> > Although rcu_access_pointer() may also be used in cases
> > where update-side locks prevent the value of the pointer from changing,
> > you should instead use rcu_dereference_protected() for this use case.
> 
> I think what this is trying to say is to not use the
> rcu_access_pointer() as a hack against lockdep:

Well, maybe we understand it in different ways. This is how I parsed it
(the whole comment):

1. rcu_access_pointer is not for the read side. So, it's either for the
write side or for usage outside all locks.

2. It's not for dereferencing. So, it's for reading the pointer's value
on the write side or outside all locks.

3. Although it can be used on the write side, rcu_dereference_protected
should be used. So, it's for reading the pointer's value outside all
locks.

> 
> 
> 	lock(writer_lock);
> 	/* no need for rcu_dereference() because we have writer lock */
> 	ptr = rcu_access_pointer(obj->ptr);
> 	ptr->something = 1;
> 	unlock(writer_lock);

Here I totally agree with you, this isn't a valid usage, but I think
the documentation comment doesn't recommend any usage under the write
lock.

It's just a recommendation, though, so if you prefer rcu_access_pointer
to save a few lines of code, I can switch to it. We'll just lose extra
checks and add an unnecessary READ_ONCE under the hood.

> 
> It's still perfectly fine to use access_pointer as intended on 
> the write side, which is just checking the value of the pointer, 
> not deferencing it:

Sure, it's correct, but adds an extra READ_ONCE, which is not needed
under the lock, and skips the lockdep check, which is rather redundant
in this case:

down_write(&lock);
ptr = rcu_dereference_protected(rcu_ptr, lockdep_is_held(&lock));
...
up_write(&lock);

but is more meaningful in this piece of code:
	
/* Safe, because this is the destroy flow, refcount is 0, so
 * tls_device_down can't store this field in parallel.
 */
netdev = rcu_dereference_protected(ctx->netdev,
				   !refcount_read(&ctx->refcount));

async_cleanup = netdev && ctx->tx_conf == TLS_HW;

> 
> 	lock(writer_lock);
> 	if (rcu_access_pointer(obj->ptr) == target)
> 		so_something(obj);
> 	unlock(writer_lock);
Jakub Kicinski Aug. 3, 2022, 2:49 p.m. UTC | #9
On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote:
> > > The documentation of rcu_access_pointer says it shouldn't be used on
> > > the update side, because we lose lockdep protection:
> > > 
> > > --cut--
> > > 
> > > Although rcu_access_pointer() may also be used in cases
> > > where update-side locks prevent the value of the pointer from changing,
> > > you should instead use rcu_dereference_protected() for this use case.  
> > 
> > I think what this is trying to say is to not use the
> > rcu_access_pointer() as a hack against lockdep:  
> 
> Well, maybe we understand it in different ways. This is how I parsed it
> (the whole comment):
> 
> 1. rcu_access_pointer is not for the read side. So, it's either for the
> write side or for usage outside all locks.
> 
> 2. It's not for dereferencing. So, it's for reading the pointer's value
> on the write side or outside all locks.
> 
> 3. Although it can be used on the write side, rcu_dereference_protected
> should be used. So, it's for reading the pointer's value outside all
> locks.

Using rcu_deref* when we don't dereference the pointer does not compute
for me, but it's not a big deal. 

Let me CC Paul for clarification of the docs, as it may also be
confusing to others and therefore worth rewording. But our case is 
not that important so unless Paul chimes in clearly indicating one
interpretation is right - either way is fine by me for v2.
Paul E. McKenney Aug. 3, 2022, 4:34 p.m. UTC | #10
On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote:
> On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote:
> > > > The documentation of rcu_access_pointer says it shouldn't be used on
> > > > the update side, because we lose lockdep protection:
> > > > 
> > > > --cut--
> > > > 
> > > > Although rcu_access_pointer() may also be used in cases
> > > > where update-side locks prevent the value of the pointer from changing,
> > > > you should instead use rcu_dereference_protected() for this use case.  
> > > 
> > > I think what this is trying to say is to not use the
> > > rcu_access_pointer() as a hack against lockdep:  
> > 
> > Well, maybe we understand it in different ways. This is how I parsed it
> > (the whole comment):
> > 
> > 1. rcu_access_pointer is not for the read side. So, it's either for the
> > write side or for usage outside all locks.

RCU readers really are permitted to use rcu_access_pointer().  As is
pretty much any other code.

See for example Documentation/RCU/rcu_dereference.rst:

	Note that if checks for being within an RCU read-side critical
	section are not required and the pointer is never dereferenced,
	rcu_access_pointer() should be used in place of rcu_dereference().

OK, s/should be/can be/, but I will fix that.

Or, for that matter, the rcu_access_pointer() docbook header comment:

/**
 * rcu_access_pointer() - fetch RCU pointer with no dereferencing
 * @p: The pointer to read
 *
 * Return the value of the specified RCU-protected pointer, but omit the
 * lockdep checks for being in an RCU read-side critical section.  This is
 * useful when the value of this pointer is accessed, but the pointer is
 * not dereferenced, for example, when testing an RCU-protected pointer
 * against NULL.  Although rcu_access_pointer() may also be used in cases
 * where update-side locks prevent the value of the pointer from changing,
 * you should instead use rcu_dereference_protected() for this use case.
 *
 * It is also permissible to use rcu_access_pointer() when read-side
 * access to the pointer was removed at least one grace period ago, as
 * is the case in the context of the RCU callback that is freeing up
 * the data, or after a synchronize_rcu() returns.  This can be useful
 * when tearing down multi-linked structures after a grace period
 * has elapsed.
 */

So the restriction is that the pointer returned from rcu_access_pointer()
cannot be dereferenced or that the structure is beyond being updated.

So this is OK:

	// Not in an RCU reader.  Or even in an RCU updater.
	if (rcu_access_pointer(my_rcu_pointer))
		do_something();
	...

And so is this:

	p = xchg(&my_rcu_pointer, NULL);
	if (p) {
		synchronize_rcu();
		// No one else has access to this list!
		while (p) {
			q = rcu_access_pointer(p->next);
			kfree(p);
			q = p;
			// But why are you hand-crafting list???
			// Any why not use rcu_dereference_protected()?
		}
	}

But this is not:

	p = rcu_access_pointer(my_rcu_pointer);
	do_something_with(p->a); // BUG!!!  Even in an RCU reader.

In this second case, you must instead use rcu_dereference() or
similar.

> > 2. It's not for dereferencing. So, it's for reading the pointer's value
> > on the write side or outside all locks.

True enough, you are not permitted to dereference the value returned
from rcu_access_pointer().  Unless you have the only copy.

But it is just fine to check the value of the pointer, compare it, or
do arithmetic on it.  Just don't dereference it and don't dereference
any value computed from it.

> > 3. Although it can be used on the write side, rcu_dereference_protected
> > should be used. So, it's for reading the pointer's value outside all
> > locks.

Yes, if an RCU updater is going to dereference a pointer, it should
use rcu_dereference_protected() rather than rcu_access_pointer().

So rcu_access_pointer() does what it it says.  It permits the caller
to access the value of the pointer, and only to access that value.
Not dereference that value.

> Using rcu_deref* when we don't dereference the pointer does not compute
> for me, but it's not a big deal. 

It is OK to use rcu_dereference*() to access a pointer without
dereferencing it.

One key difference between rcu_dereference() and rcu_access_pointer()
is that rcu_access_pointer() can be used outside of an RCU reader.
For example:

	// Not in an RCU reader.  Or even in an RCU updater.
	if (rcu_access_pointer(my_rcu_pointer)) {
		rcu_read_lock();
		p = rcu_dereference(my_rcu_pointer);
		if (p)
			do_something_with(p);
		rcu_read_unlock();
	}

This example is silly because the overhead of the extra check might well
exceed that of the rcu_read_lock() and rcu_read_unlock() put together,
especially in CONFIG_PREEMPTION=n kernels.  A less-silly example might
schedule a workqueue or similar to handle the RCU-protected data, and
the overhead of the extra check might be very worthwhile in that case.

> Let me CC Paul for clarification of the docs, as it may also be
> confusing to others and therefore worth rewording. But our case is 
> not that important so unless Paul chimes in clearly indicating one
> interpretation is right - either way is fine by me for v2.

Hope this helps!

And please let me know if it does not.

							Thanx, Paul
Maxim Mikityanskiy Aug. 4, 2022, 8:08 a.m. UTC | #11
On Wed, 2022-08-03 at 09:34 -0700, Paul E. McKenney wrote:
> On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote:
> > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote:
> > > > > The documentation of rcu_access_pointer says it shouldn't be used on
> > > > > the update side, because we lose lockdep protection:
> > > > > 
> > > > > --cut--
> > > > > 
> > > > > Although rcu_access_pointer() may also be used in cases
> > > > > where update-side locks prevent the value of the pointer from changing,
> > > > > you should instead use rcu_dereference_protected() for this use case.  
> > > > 
> > > > I think what this is trying to say is to not use the
> > > > rcu_access_pointer() as a hack against lockdep:  
> > > 
> > > Well, maybe we understand it in different ways. This is how I parsed it
> > > (the whole comment):
> > > 
> > > 1. rcu_access_pointer is not for the read side. So, it's either for the
> > > write side or for usage outside all locks.
> 
> RCU readers really are permitted to use rcu_access_pointer().  As is
> pretty much any other code.
> 
> See for example Documentation/RCU/rcu_dereference.rst:
> 
> 	Note that if checks for being within an RCU read-side critical
> 	section are not required and the pointer is never dereferenced,
> 	rcu_access_pointer() should be used in place of rcu_dereference().
> 
> OK, s/should be/can be/, but I will fix that.
> 
> Or, for that matter, the rcu_access_pointer() docbook header comment:
> 
> /**
>  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
>  * @p: The pointer to read
>  *
>  * Return the value of the specified RCU-protected pointer, but omit the
>  * lockdep checks for being in an RCU read-side critical section.  This is
>  * useful when the value of this pointer is accessed, but the pointer is
>  * not dereferenced, for example, when testing an RCU-protected pointer
>  * against NULL.  Although rcu_access_pointer() may also be used in cases
>  * where update-side locks prevent the value of the pointer from changing,
>  * you should instead use rcu_dereference_protected() for this use case.
>  *
>  * It is also permissible to use rcu_access_pointer() when read-side
>  * access to the pointer was removed at least one grace period ago, as
>  * is the case in the context of the RCU callback that is freeing up
>  * the data, or after a synchronize_rcu() returns.  This can be useful
>  * when tearing down multi-linked structures after a grace period
>  * has elapsed.
>  */
> 
> So the restriction is that the pointer returned from rcu_access_pointer()
> cannot be dereferenced or that the structure is beyond being updated.
> 
> So this is OK:
> 
> 	// Not in an RCU reader.  Or even in an RCU updater.
> 	if (rcu_access_pointer(my_rcu_pointer))
> 		do_something();
> 	...
> 
> And so is this:
> 
> 	p = xchg(&my_rcu_pointer, NULL);
> 	if (p) {
> 		synchronize_rcu();
> 		// No one else has access to this list!
> 		while (p) {
> 			q = rcu_access_pointer(p->next);
> 			kfree(p);
> 			q = p;
> 			// But why are you hand-crafting list???
> 			// Any why not use rcu_dereference_protected()?
> 		}
> 	}
> 
> But this is not:
> 
> 	p = rcu_access_pointer(my_rcu_pointer);
> 	do_something_with(p->a); // BUG!!!  Even in an RCU reader.
> 
> In this second case, you must instead use rcu_dereference() or
> similar.
> 
> > > 2. It's not for dereferencing. So, it's for reading the pointer's value
> > > on the write side or outside all locks.
> 
> True enough, you are not permitted to dereference the value returned
> from rcu_access_pointer().  Unless you have the only copy.
> 
> But it is just fine to check the value of the pointer, compare it, or
> do arithmetic on it.  Just don't dereference it and don't dereference
> any value computed from it.
> 
> > > 3. Although it can be used on the write side, rcu_dereference_protected
> > > should be used. So, it's for reading the pointer's value outside all
> > > locks.
> 
> Yes, if an RCU updater is going to dereference a pointer, it should
> use rcu_dereference_protected() rather than rcu_access_pointer().
> 
> So rcu_access_pointer() does what it it says.  It permits the caller
> to access the value of the pointer, and only to access that value.
> Not dereference that value.
> 
> > Using rcu_deref* when we don't dereference the pointer does not compute
> > for me, but it's not a big deal. 
> 
> It is OK to use rcu_dereference*() to access a pointer without
> dereferencing it.
> 
> One key difference between rcu_dereference() and rcu_access_pointer()
> is that rcu_access_pointer() can be used outside of an RCU reader.
> For example:
> 
> 	// Not in an RCU reader.  Or even in an RCU updater.
> 	if (rcu_access_pointer(my_rcu_pointer)) {
> 		rcu_read_lock();
> 		p = rcu_dereference(my_rcu_pointer);
> 		if (p)
> 			do_something_with(p);
> 		rcu_read_unlock();
> 	}
> 
> This example is silly because the overhead of the extra check might well
> exceed that of the rcu_read_lock() and rcu_read_unlock() put together,
> especially in CONFIG_PREEMPTION=n kernels.  A less-silly example might
> schedule a workqueue or similar to handle the RCU-protected data, and
> the overhead of the extra check might be very worthwhile in that case.

That's essentially one of our cases, we check a pointer and queue a
work if it's not NULL, and the work dereferences it. But we don't use
rcu_read_lock there, because it's the teardown flow, and there are no
concurrent users anymore (refcount is 0).

> 
> > Let me CC Paul for clarification of the docs, as it may also be
> > confusing to others and therefore worth rewording. But our case is 
> > not that important so unless Paul chimes in clearly indicating one
> > interpretation is right - either way is fine by me for v2.
> 
> Hope this helps!

Thanks a lot for your detailed explanation! It's truly useful,
especially the examples are helpful.

As far as I understood, rcu_access_pointer can be used in any context,
including RCU updater, as long as we don't dereference the pointer. At
the same time, it's OK to use rcu_dereference* without dereferencing.
So, is there any preference, which helper to use, in the context where
it can't be changed concurrently, if we don't dereference it, but just
compare the value?

Specifically, we have two (simplified) cases:

1. We set the pointer to NULL under the write-lock, but only if it
matches another pointer.

 down_write(&lock);
 ctx_netdev = rcu_dereference_protected(ctx->netdev,
                                        lockdep_is_held(&lock));
 if (ctx_netdev == netdev) {
         rcu_assign_pointer(ctx->netdev, NULL);
         synchronize_rcu();
 }
 up_write(&lock);

2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore,
we tear down ctx and need to check whether ctx->netdev is NULL.

 if (!refcount_dec_and_test(&ctx->refcount))
         return;
 netdev = rcu_dereference_protected(ctx->netdev,
                                    !refcount_read(&ctx->refcount));
 if (netdev)
         queue_work(...);

It's somewhat similar to the "structure is beyond being updated" case,
but it's ensured by refcount, not by RCU (i.e. you example assigned
my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one
touches it, and I ensure that we are the only user of ctx by dropping
refcount to zero).

So, hoping that my understanding of your explanation is correct, both
cases can use any of rcu_access_pointer or rcu_dereference_protected.
Is there some rule of thumb which one to pick in such case?

Thanks,
Max

> 
> And please let me know if it does not.
> 
> 							Thanx, Paul
Jakub Kicinski Aug. 4, 2022, 4:18 p.m. UTC | #12
On Thu, 4 Aug 2022 08:08:37 +0000 Maxim Mikityanskiy wrote:
> 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore,
> we tear down ctx and need to check whether ctx->netdev is NULL.
> 
>  if (!refcount_dec_and_test(&ctx->refcount))
>          return;
>  netdev = rcu_dereference_protected(ctx->netdev,
>                                     !refcount_read(&ctx->refcount));
>  if (netdev)
>          queue_work(...);
> 
> It's somewhat similar to the "structure is beyond being updated" case,
> but it's ensured by refcount, not by RCU (i.e. you example assigned
> my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one
> touches it, and I ensure that we are the only user of ctx by dropping
> refcount to zero).
> 
> So, hoping that my understanding of your explanation is correct, both
> cases can use any of rcu_access_pointer or rcu_dereference_protected.
> Is there some rule of thumb which one to pick in such case?

IMHO, rcu_dereference_protected() documents why it's safe to
dereference the pointer outside of the rcu read section. 

We are only documenting the writer side locking. The fact that there
is a RCU pointer involved is coincidental - I think we could 
as well be checking the TLS_RX_DEV_DEGRADED bit here.

We can add asserts or comments to document the writer side locking.
Piggy backing on RCU seems coincidental. But again, I'm fine either 
way. I'm just saying this because I really doubt there is a rule of
thumb for this level of detail. It's most likely your call.
Paul E. McKenney Aug. 4, 2022, 6:40 p.m. UTC | #13
On Thu, Aug 04, 2022 at 08:08:37AM +0000, Maxim Mikityanskiy wrote:
> On Wed, 2022-08-03 at 09:34 -0700, Paul E. McKenney wrote:
> > On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote:
> > > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote:
> > > > > > The documentation of rcu_access_pointer says it shouldn't be used on
> > > > > > the update side, because we lose lockdep protection:
> > > > > > 
> > > > > > --cut--
> > > > > > 
> > > > > > Although rcu_access_pointer() may also be used in cases
> > > > > > where update-side locks prevent the value of the pointer from changing,
> > > > > > you should instead use rcu_dereference_protected() for this use case.  
> > > > > 
> > > > > I think what this is trying to say is to not use the
> > > > > rcu_access_pointer() as a hack against lockdep:  
> > > > 
> > > > Well, maybe we understand it in different ways. This is how I parsed it
> > > > (the whole comment):
> > > > 
> > > > 1. rcu_access_pointer is not for the read side. So, it's either for the
> > > > write side or for usage outside all locks.
> > 
> > RCU readers really are permitted to use rcu_access_pointer().  As is
> > pretty much any other code.
> > 
> > See for example Documentation/RCU/rcu_dereference.rst:
> > 
> > 	Note that if checks for being within an RCU read-side critical
> > 	section are not required and the pointer is never dereferenced,
> > 	rcu_access_pointer() should be used in place of rcu_dereference().
> > 
> > OK, s/should be/can be/, but I will fix that.
> > 
> > Or, for that matter, the rcu_access_pointer() docbook header comment:
> > 
> > /**
> >  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
> >  * @p: The pointer to read
> >  *
> >  * Return the value of the specified RCU-protected pointer, but omit the
> >  * lockdep checks for being in an RCU read-side critical section.  This is
> >  * useful when the value of this pointer is accessed, but the pointer is
> >  * not dereferenced, for example, when testing an RCU-protected pointer
> >  * against NULL.  Although rcu_access_pointer() may also be used in cases
> >  * where update-side locks prevent the value of the pointer from changing,
> >  * you should instead use rcu_dereference_protected() for this use case.
> >  *
> >  * It is also permissible to use rcu_access_pointer() when read-side
> >  * access to the pointer was removed at least one grace period ago, as
> >  * is the case in the context of the RCU callback that is freeing up
> >  * the data, or after a synchronize_rcu() returns.  This can be useful
> >  * when tearing down multi-linked structures after a grace period
> >  * has elapsed.
> >  */
> > 
> > So the restriction is that the pointer returned from rcu_access_pointer()
> > cannot be dereferenced or that the structure is beyond being updated.
> > 
> > So this is OK:
> > 
> > 	// Not in an RCU reader.  Or even in an RCU updater.
> > 	if (rcu_access_pointer(my_rcu_pointer))
> > 		do_something();
> > 	...
> > 
> > And so is this:
> > 
> > 	p = xchg(&my_rcu_pointer, NULL);
> > 	if (p) {
> > 		synchronize_rcu();
> > 		// No one else has access to this list!
> > 		while (p) {
> > 			q = rcu_access_pointer(p->next);
> > 			kfree(p);
> > 			q = p;
> > 			// But why are you hand-crafting list???
> > 			// Any why not use rcu_dereference_protected()?
> > 		}
> > 	}
> > 
> > But this is not:
> > 
> > 	p = rcu_access_pointer(my_rcu_pointer);
> > 	do_something_with(p->a); // BUG!!!  Even in an RCU reader.
> > 
> > In this second case, you must instead use rcu_dereference() or
> > similar.
> > 
> > > > 2. It's not for dereferencing. So, it's for reading the pointer's value
> > > > on the write side or outside all locks.
> > 
> > True enough, you are not permitted to dereference the value returned
> > from rcu_access_pointer().  Unless you have the only copy.
> > 
> > But it is just fine to check the value of the pointer, compare it, or
> > do arithmetic on it.  Just don't dereference it and don't dereference
> > any value computed from it.
> > 
> > > > 3. Although it can be used on the write side, rcu_dereference_protected
> > > > should be used. So, it's for reading the pointer's value outside all
> > > > locks.
> > 
> > Yes, if an RCU updater is going to dereference a pointer, it should
> > use rcu_dereference_protected() rather than rcu_access_pointer().
> > 
> > So rcu_access_pointer() does what it it says.  It permits the caller
> > to access the value of the pointer, and only to access that value.
> > Not dereference that value.
> > 
> > > Using rcu_deref* when we don't dereference the pointer does not compute
> > > for me, but it's not a big deal. 
> > 
> > It is OK to use rcu_dereference*() to access a pointer without
> > dereferencing it.
> > 
> > One key difference between rcu_dereference() and rcu_access_pointer()
> > is that rcu_access_pointer() can be used outside of an RCU reader.
> > For example:
> > 
> > 	// Not in an RCU reader.  Or even in an RCU updater.
> > 	if (rcu_access_pointer(my_rcu_pointer)) {
> > 		rcu_read_lock();
> > 		p = rcu_dereference(my_rcu_pointer);
> > 		if (p)
> > 			do_something_with(p);
> > 		rcu_read_unlock();
> > 	}
> > 
> > This example is silly because the overhead of the extra check might well
> > exceed that of the rcu_read_lock() and rcu_read_unlock() put together,
> > especially in CONFIG_PREEMPTION=n kernels.  A less-silly example might
> > schedule a workqueue or similar to handle the RCU-protected data, and
> > the overhead of the extra check might be very worthwhile in that case.
> 
> That's essentially one of our cases, we check a pointer and queue a
> work if it's not NULL, and the work dereferences it. But we don't use
> rcu_read_lock there, because it's the teardown flow, and there are no
> concurrent users anymore (refcount is 0).

Very good!

> > > Let me CC Paul for clarification of the docs, as it may also be
> > > confusing to others and therefore worth rewording. But our case is 
> > > not that important so unless Paul chimes in clearly indicating one
> > > interpretation is right - either way is fine by me for v2.
> > 
> > Hope this helps!
> 
> Thanks a lot for your detailed explanation! It's truly useful,
> especially the examples are helpful.

Would you like to nominate a simple use case for addition to the
official documentation?

> As far as I understood, rcu_access_pointer can be used in any context,
> including RCU updater, as long as we don't dereference the pointer. At
> the same time, it's OK to use rcu_dereference* without dereferencing.
> So, is there any preference, which helper to use, in the context where
> it can't be changed concurrently, if we don't dereference it, but just
> compare the value?

If you are in an RCU reader, there is little reason to use
rcu_access_pointer(), though it might be a good documentation aid.
It can also be helpful in code that is called both from an RCU reader
and from either and updater or an RCU outsider.

> Specifically, we have two (simplified) cases:
> 
> 1. We set the pointer to NULL under the write-lock, but only if it
> matches another pointer.
> 
>  down_write(&lock);
>  ctx_netdev = rcu_dereference_protected(ctx->netdev,
>                                         lockdep_is_held(&lock));
>  if (ctx_netdev == netdev) {
>          rcu_assign_pointer(ctx->netdev, NULL);
>          synchronize_rcu();
>  }
>  up_write(&lock);

I suggest keeping rcu_dereference_protected() here because it documents
the locking.  This might seem silly in this case because you just
acquired that lock, but code can grown functions and then be subject to
copy-pasta-induced bugs, and that call to rcu_dereference_protected()
can help locate such bugs.  In contrast, rcu_assign_pointer() will
cheerfully aid and abet the creation of this sort of bug.

> 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore,
> we tear down ctx and need to check whether ctx->netdev is NULL.
> 
>  if (!refcount_dec_and_test(&ctx->refcount))
>          return;
>  netdev = rcu_dereference_protected(ctx->netdev,
>                                     !refcount_read(&ctx->refcount));
>  if (netdev)
>          queue_work(...);
> 
> It's somewhat similar to the "structure is beyond being updated" case,
> but it's ensured by refcount, not by RCU (i.e. you example assigned
> my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one
> touches it, and I ensure that we are the only user of ctx by dropping
> refcount to zero).

If that refcount_dec_and_test() is in an RCU callback or if RCU
readers are otherwise guaranteed to no longer be accessing this
object, then rcu_access_pointer() would work.  But again, that
rcu_dereference_protected() has the advantage of protecting against
copy-pasta bugs.

Otherwise, I would need to better understand the example.

> So, hoping that my understanding of your explanation is correct, both
> cases can use any of rcu_access_pointer or rcu_dereference_protected.
> Is there some rule of thumb which one to pick in such case?

If you have something meaningful to put into the lockdep condition of
rcu_dereference_protected(), you should use rcu_dereference_protected().
If not, and if either:

a.	There are no concurrent updates possible, or

b.	There will be no dereferencing of the returned pointer,

then rcu_access_pointer() can be useful.

Also, rcu_access_pointer() can sometimes simplify common code.

> Thanks,
> Max
> 
> > 
> > And please let me know if it does not.
> > 
> > 							Thanx, Paul
>
Maxim Mikityanskiy Aug. 5, 2022, 10:59 a.m. UTC | #14
On Thu, 2022-08-04 at 09:18 -0700, Jakub Kicinski wrote:
> On Thu, 4 Aug 2022 08:08:37 +0000 Maxim Mikityanskiy wrote:
> > 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore,
> > we tear down ctx and need to check whether ctx->netdev is NULL.
> > 
> >  if (!refcount_dec_and_test(&ctx->refcount))
> >          return;
> >  netdev = rcu_dereference_protected(ctx->netdev,
> >                                     !refcount_read(&ctx->refcount));
> >  if (netdev)
> >          queue_work(...);
> > 
> > It's somewhat similar to the "structure is beyond being updated" case,
> > but it's ensured by refcount, not by RCU (i.e. you example assigned
> > my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one
> > touches it, and I ensure that we are the only user of ctx by dropping
> > refcount to zero).
> > 
> > So, hoping that my understanding of your explanation is correct, both
> > cases can use any of rcu_access_pointer or rcu_dereference_protected.
> > Is there some rule of thumb which one to pick in such case?
> 
> IMHO, rcu_dereference_protected() documents why it's safe to
> dereference the pointer outside of the rcu read section. 
> 
> We are only documenting the writer side locking. The fact that there
> is a RCU pointer involved is coincidental - I think we could 
> as well be checking the TLS_RX_DEV_DEGRADED bit here.

Yes, checking TLS_RX_DEV_DEGRADED would be equivalent in the current
implementation. But I don't want to give this flag extra
responsibilities (currently it's for RX resync only) if we can check
the netdev pointer, it should be more flexible in long term.

> We can add asserts or comments to document the writer side locking.
> Piggy backing on RCU seems coincidental. But again, I'm fine either 
> way. I'm just saying this because I really doubt there is a rule of
> thumb for this level of detail. It's most likely your call.

OK, I'd keep rcu_dereference_protected with its asserts, rather than
reinvent my own asserts.
Maxim Mikityanskiy Aug. 5, 2022, 10:59 a.m. UTC | #15
On Thu, 2022-08-04 at 11:40 -0700, Paul E. McKenney wrote:
> On Thu, Aug 04, 2022 at 08:08:37AM +0000, Maxim Mikityanskiy wrote:
> > On Wed, 2022-08-03 at 09:34 -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote:
> > > > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote:
> > > > > > > The documentation of rcu_access_pointer says it shouldn't be used on
> > > > > > > the update side, because we lose lockdep protection:
> > > > > > > 
> > > > > > > --cut--
> > > > > > > 
> > > > > > > Although rcu_access_pointer() may also be used in cases
> > > > > > > where update-side locks prevent the value of the pointer from changing,
> > > > > > > you should instead use rcu_dereference_protected() for this use case.  
> > > > > > 
> > > > > > I think what this is trying to say is to not use the
> > > > > > rcu_access_pointer() as a hack against lockdep:  
> > > > > 
> > > > > Well, maybe we understand it in different ways. This is how I parsed it
> > > > > (the whole comment):
> > > > > 
> > > > > 1. rcu_access_pointer is not for the read side. So, it's either for the
> > > > > write side or for usage outside all locks.
> > > 
> > > RCU readers really are permitted to use rcu_access_pointer().  As is
> > > pretty much any other code.
> > > 
> > > See for example Documentation/RCU/rcu_dereference.rst:
> > > 
> > > 	Note that if checks for being within an RCU read-side critical
> > > 	section are not required and the pointer is never dereferenced,
> > > 	rcu_access_pointer() should be used in place of rcu_dereference().
> > > 
> > > OK, s/should be/can be/, but I will fix that.
> > > 
> > > Or, for that matter, the rcu_access_pointer() docbook header comment:
> > > 
> > > /**
> > >  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
> > >  * @p: The pointer to read
> > >  *
> > >  * Return the value of the specified RCU-protected pointer, but omit the
> > >  * lockdep checks for being in an RCU read-side critical section.  This is
> > >  * useful when the value of this pointer is accessed, but the pointer is
> > >  * not dereferenced, for example, when testing an RCU-protected pointer
> > >  * against NULL.  Although rcu_access_pointer() may also be used in cases
> > >  * where update-side locks prevent the value of the pointer from changing,
> > >  * you should instead use rcu_dereference_protected() for this use case.
> > >  *
> > >  * It is also permissible to use rcu_access_pointer() when read-side
> > >  * access to the pointer was removed at least one grace period ago, as
> > >  * is the case in the context of the RCU callback that is freeing up
> > >  * the data, or after a synchronize_rcu() returns.  This can be useful
> > >  * when tearing down multi-linked structures after a grace period
> > >  * has elapsed.
> > >  */
> > > 
> > > So the restriction is that the pointer returned from rcu_access_pointer()
> > > cannot be dereferenced or that the structure is beyond being updated.
> > > 
> > > So this is OK:
> > > 
> > > 	// Not in an RCU reader.  Or even in an RCU updater.
> > > 	if (rcu_access_pointer(my_rcu_pointer))
> > > 		do_something();
> > > 	...
> > > 
> > > And so is this:
> > > 
> > > 	p = xchg(&my_rcu_pointer, NULL);
> > > 	if (p) {
> > > 		synchronize_rcu();
> > > 		// No one else has access to this list!
> > > 		while (p) {
> > > 			q = rcu_access_pointer(p->next);
> > > 			kfree(p);
> > > 			q = p;
> > > 			// But why are you hand-crafting list???
> > > 			// Any why not use rcu_dereference_protected()?
> > > 		}
> > > 	}
> > > 
> > > But this is not:
> > > 
> > > 	p = rcu_access_pointer(my_rcu_pointer);
> > > 	do_something_with(p->a); // BUG!!!  Even in an RCU reader.
> > > 
> > > In this second case, you must instead use rcu_dereference() or
> > > similar.
> > > 
> > > > > 2. It's not for dereferencing. So, it's for reading the pointer's value
> > > > > on the write side or outside all locks.
> > > 
> > > True enough, you are not permitted to dereference the value returned
> > > from rcu_access_pointer().  Unless you have the only copy.
> > > 
> > > But it is just fine to check the value of the pointer, compare it, or
> > > do arithmetic on it.  Just don't dereference it and don't dereference
> > > any value computed from it.
> > > 
> > > > > 3. Although it can be used on the write side, rcu_dereference_protected
> > > > > should be used. So, it's for reading the pointer's value outside all
> > > > > locks.
> > > 
> > > Yes, if an RCU updater is going to dereference a pointer, it should
> > > use rcu_dereference_protected() rather than rcu_access_pointer().
> > > 
> > > So rcu_access_pointer() does what it it says.  It permits the caller
> > > to access the value of the pointer, and only to access that value.
> > > Not dereference that value.
> > > 
> > > > Using rcu_deref* when we don't dereference the pointer does not compute
> > > > for me, but it's not a big deal. 
> > > 
> > > It is OK to use rcu_dereference*() to access a pointer without
> > > dereferencing it.
> > > 
> > > One key difference between rcu_dereference() and rcu_access_pointer()
> > > is that rcu_access_pointer() can be used outside of an RCU reader.
> > > For example:
> > > 
> > > 	// Not in an RCU reader.  Or even in an RCU updater.
> > > 	if (rcu_access_pointer(my_rcu_pointer)) {
> > > 		rcu_read_lock();
> > > 		p = rcu_dereference(my_rcu_pointer);
> > > 		if (p)
> > > 			do_something_with(p);
> > > 		rcu_read_unlock();
> > > 	}
> > > 
> > > This example is silly because the overhead of the extra check might well
> > > exceed that of the rcu_read_lock() and rcu_read_unlock() put together,
> > > especially in CONFIG_PREEMPTION=n kernels.  A less-silly example might
> > > schedule a workqueue or similar to handle the RCU-protected data, and
> > > the overhead of the extra check might be very worthwhile in that case.
> > 
> > That's essentially one of our cases, we check a pointer and queue a
> > work if it's not NULL, and the work dereferences it. But we don't use
> > rcu_read_lock there, because it's the teardown flow, and there are no
> > concurrent users anymore (refcount is 0).
> 
> Very good!
> 
> > > > Let me CC Paul for clarification of the docs, as it may also be
> > > > confusing to others and therefore worth rewording. But our case is 
> > > > not that important so unless Paul chimes in clearly indicating one
> > > > interpretation is right - either way is fine by me for v2.
> > > 
> > > Hope this helps!
> > 
> > Thanks a lot for your detailed explanation! It's truly useful,
> > especially the examples are helpful.
> 
> Would you like to nominate a simple use case for addition to the
> official documentation?

The example with freeing the list (xchg, synchronize_rcu,
rcu_access_pointer) would be a nice illustration to the last paragraph
of rcu_access_pointer's doc comment. Probably not in the comment
itself, but under Documentation/.

Also the example with rcu_access_pointer + queue_work, which takes
rcu_read_lock and calls rcu_dereference could illustrate the usefulness
of rcu_access_pointer outside all locks.

> 
> > As far as I understood, rcu_access_pointer can be used in any context,
> > including RCU updater, as long as we don't dereference the pointer. At
> > the same time, it's OK to use rcu_dereference* without dereferencing.
> > So, is there any preference, which helper to use, in the context where
> > it can't be changed concurrently, if we don't dereference it, but just
> > compare the value?
> 
> If you are in an RCU reader, there is little reason to use
> rcu_access_pointer(), though it might be a good documentation aid.
> It can also be helpful in code that is called both from an RCU reader
> and from either and updater or an RCU outsider.
> 
> > Specifically, we have two (simplified) cases:
> > 
> > 1. We set the pointer to NULL under the write-lock, but only if it
> > matches another pointer.
> > 
> >  down_write(&lock);
> >  ctx_netdev = rcu_dereference_protected(ctx->netdev,
> >                                         lockdep_is_held(&lock));
> >  if (ctx_netdev == netdev) {
> >          rcu_assign_pointer(ctx->netdev, NULL);
> >          synchronize_rcu();
> >  }
> >  up_write(&lock);
> 
> I suggest keeping rcu_dereference_protected() here because it documents
> the locking.  This might seem silly in this case because you just
> acquired that lock, but code can grown functions and then be subject to
> copy-pasta-induced bugs, and that call to rcu_dereference_protected()
> can help locate such bugs.  In contrast, rcu_assign_pointer() will
> cheerfully aid and abet the creation of this sort of bug.

OK, I'll keep it.

> 
> > 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore,
> > we tear down ctx and need to check whether ctx->netdev is NULL.
> > 
> >  if (!refcount_dec_and_test(&ctx->refcount))
> >          return;
> >  netdev = rcu_dereference_protected(ctx->netdev,
> >                                     !refcount_read(&ctx->refcount));
> >  if (netdev)
> >          queue_work(...);
> > 
> > It's somewhat similar to the "structure is beyond being updated" case,
> > but it's ensured by refcount, not by RCU (i.e. you example assigned
> > my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one
> > touches it, and I ensure that we are the only user of ctx by dropping
> > refcount to zero).
> 
> If that refcount_dec_and_test() is in an RCU callback or if RCU
> readers are otherwise guaranteed to no longer be accessing this
> object, then rcu_access_pointer() would work. 

It's the latter - no one else can access ctx after its refcount dropped
to zero.

>  But again, that
> rcu_dereference_protected() has the advantage of protecting against
> copy-pasta bugs.

OK then, I'll use that advantage.

> Otherwise, I would need to better understand the example.
> 
> > So, hoping that my understanding of your explanation is correct, both
> > cases can use any of rcu_access_pointer or rcu_dereference_protected.
> > Is there some rule of thumb which one to pick in such case?
> 
> If you have something meaningful to put into the lockdep condition of
> rcu_dereference_protected(), you should use rcu_dereference_protected().
> If not, and if either:
> 
> a.	There are no concurrent updates possible, or
> 
> b.	There will be no dereferencing of the returned pointer,
> 
> then rcu_access_pointer() can be useful.
> 
> Also, rcu_access_pointer() can sometimes simplify common code.

Thanks again for your advice and providing a detailed explanation! I'll
stick to this rule, and maybe it's a good idea to put it to some
prominent place in the documentation to simplify the choice for other
developers.

> 
> > Thanks,
> > Max
> > 
> > > 
> > > And please let me know if it does not.
> > > 
> > > 							Thanx, Paul
> >
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 6b6c7044b64a..d8dea2aa7ade 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -819,7 +819,7 @@  bool mlx5e_ktls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq,
 	mlx5e_tx_mpwqe_ensure_complete(sq);
 
 	tls_ctx = tls_get_ctx(skb->sk);
-	if (WARN_ON_ONCE(tls_ctx->netdev != netdev))
+	if (WARN_ON_ONCE(rcu_dereference_bh(tls_ctx->netdev) != netdev))
 		goto err_out;
 
 	priv_tx = mlx5e_get_ktls_tx_priv_ctx(tls_ctx);
diff --git a/include/net/tls.h b/include/net/tls.h
index b75b5727abdb..cb205f9d9473 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -237,7 +237,7 @@  struct tls_context {
 	void *priv_ctx_tx;
 	void *priv_ctx_rx;
 
-	struct net_device *netdev;
+	struct net_device __rcu *netdev;
 
 	/* rw cache line */
 	struct cipher_context tx;
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 18c7e5c6d228..351d8959d80f 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -71,7 +71,13 @@  static void tls_device_tx_del_task(struct work_struct *work)
 	struct tls_offload_context_tx *offload_ctx =
 		container_of(work, struct tls_offload_context_tx, destruct_work);
 	struct tls_context *ctx = offload_ctx->ctx;
-	struct net_device *netdev = ctx->netdev;
+	struct net_device *netdev;
+
+	/* Safe, because this is the destroy flow, refcount is 0, so
+	 * tls_device_down can't store this field in parallel.
+	 */
+	netdev = rcu_dereference_protected(ctx->netdev,
+					   !refcount_read(&ctx->refcount));
 
 	netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_TX);
 	dev_put(netdev);
@@ -81,6 +87,7 @@  static void tls_device_tx_del_task(struct work_struct *work)
 
 static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
 {
+	struct net_device *netdev;
 	unsigned long flags;
 	bool async_cleanup;
 
@@ -91,7 +98,14 @@  static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
 	}
 
 	list_del(&ctx->list); /* Remove from tls_device_list / tls_device_down_list */
-	async_cleanup = ctx->netdev && ctx->tx_conf == TLS_HW;
+
+	/* Safe, because this is the destroy flow, refcount is 0, so
+	 * tls_device_down can't store this field in parallel.
+	 */
+	netdev = rcu_dereference_protected(ctx->netdev,
+					   !refcount_read(&ctx->refcount));
+
+	async_cleanup = netdev && ctx->tx_conf == TLS_HW;
 	if (async_cleanup) {
 		struct tls_offload_context_tx *offload_ctx = tls_offload_ctx_tx(ctx);
 
@@ -229,7 +243,8 @@  static void tls_device_resync_tx(struct sock *sk, struct tls_context *tls_ctx,
 
 	trace_tls_device_tx_resync_send(sk, seq, rcd_sn);
 	down_read(&device_offload_lock);
-	netdev = tls_ctx->netdev;
+	netdev = rcu_dereference_protected(tls_ctx->netdev,
+					   lockdep_is_held(&device_offload_lock));
 	if (netdev)
 		err = netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq,
 							 rcd_sn,
@@ -710,7 +725,7 @@  static void tls_device_resync_rx(struct tls_context *tls_ctx,
 
 	trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type);
 	rcu_read_lock();
-	netdev = READ_ONCE(tls_ctx->netdev);
+	netdev = rcu_dereference(tls_ctx->netdev);
 	if (netdev)
 		netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn,
 						   TLS_OFFLOAD_CTX_DIR_RX);
@@ -1029,7 +1044,7 @@  static void tls_device_attach(struct tls_context *ctx, struct sock *sk,
 	if (sk->sk_destruct != tls_device_sk_destruct) {
 		refcount_set(&ctx->refcount, 1);
 		dev_hold(netdev);
-		ctx->netdev = netdev;
+		RCU_INIT_POINTER(ctx->netdev, netdev);
 		spin_lock_irq(&tls_device_lock);
 		list_add_tail(&ctx->list, &tls_device_list);
 		spin_unlock_irq(&tls_device_lock);
@@ -1300,7 +1315,8 @@  void tls_device_offload_cleanup_rx(struct sock *sk)
 	struct net_device *netdev;
 
 	down_read(&device_offload_lock);
-	netdev = tls_ctx->netdev;
+	netdev = rcu_dereference_protected(tls_ctx->netdev,
+					   lockdep_is_held(&device_offload_lock));
 	if (!netdev)
 		goto out;
 
@@ -1309,7 +1325,7 @@  void tls_device_offload_cleanup_rx(struct sock *sk)
 
 	if (tls_ctx->tx_conf != TLS_HW) {
 		dev_put(netdev);
-		tls_ctx->netdev = NULL;
+		rcu_assign_pointer(tls_ctx->netdev, NULL);
 	} else {
 		set_bit(TLS_RX_DEV_CLOSED, &tls_ctx->flags);
 	}
@@ -1329,7 +1345,11 @@  static int tls_device_down(struct net_device *netdev)
 
 	spin_lock_irqsave(&tls_device_lock, flags);
 	list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) {
-		if (ctx->netdev != netdev ||
+		struct net_device *ctx_netdev =
+			rcu_dereference_protected(ctx->netdev,
+						  lockdep_is_held(&device_offload_lock));
+
+		if (ctx_netdev != netdev ||
 		    !refcount_inc_not_zero(&ctx->refcount))
 			continue;
 
@@ -1346,7 +1366,7 @@  static int tls_device_down(struct net_device *netdev)
 		/* Stop the RX and TX resync.
 		 * tls_dev_resync must not be called after tls_dev_del.
 		 */
-		WRITE_ONCE(ctx->netdev, NULL);
+		rcu_assign_pointer(ctx->netdev, NULL);
 
 		/* Start skipping the RX resync logic completely. */
 		set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags);
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index 618cee704217..7dfc8023e0f1 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -426,7 +426,8 @@  struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
 				      struct net_device *dev,
 				      struct sk_buff *skb)
 {
-	if (dev == tls_get_ctx(sk)->netdev || netif_is_bond_master(dev))
+	if (dev == rcu_dereference_bh(tls_get_ctx(sk)->netdev) ||
+	    netif_is_bond_master(dev))
 		return skb;
 
 	return tls_sw_fallback(sk, skb);