Message ID | 20220801080053.21849-1-maximmi@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net/tls: Use RCU API to access tls_ctx->netdev | expand |
On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote: > @@ -1329,7 +1345,11 @@ static int tls_device_down(struct net_device *netdev) > > spin_lock_irqsave(&tls_device_lock, flags); > list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) { > - if (ctx->netdev != netdev || > + struct net_device *ctx_netdev = > + rcu_dereference_protected(ctx->netdev, > + lockdep_is_held(&device_offload_lock)); > + > + if (ctx_netdev != netdev || > !refcount_inc_not_zero(&ctx->refcount)) > continue; For cases like this where we don't actually hold onto the object, just take a peek at the address of it we can save a handful of LoC by using rcu_access_pointer().
On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote: > Currently, tls_device_down synchronizes with tls_device_resync_rx using > RCU, however, the pointer to netdev is stored using WRITE_ONCE and > loaded using READ_ONCE. > > Although such approach is technically correct (rcu_dereference is > essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store > NULL), using special RCU helpers for pointers is more valid, as it > includes additional checks and might change the implementation > transparently to the callers. > > Mark the netdev pointer as __rcu and use the correct RCU helpers to > access it. For non-concurrent access pass the right conditions that > guarantee safe access (locks taken, refcount value). Also use the > correct helper in mlx5e, where even READ_ONCE was missing. Oops, looks like we also got some new sparse warnings from this: 2 new warnings in drivers/net/bonding/bond_main.c 1 new warning in drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
On Mon, 2022-08-01 at 12:42 -0700, Jakub Kicinski wrote: > On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote: > > @@ -1329,7 +1345,11 @@ static int tls_device_down(struct net_device *netdev) > > > > spin_lock_irqsave(&tls_device_lock, flags); > > list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) { > > - if (ctx->netdev != netdev || > > + struct net_device *ctx_netdev = > > + rcu_dereference_protected(ctx->netdev, > > + lockdep_is_held(&device_offload_lock)); > > + > > + if (ctx_netdev != netdev || > > !refcount_inc_not_zero(&ctx->refcount)) > > continue; > > For cases like this where we don't actually hold onto the object, just > take a peek at the address of it we can save a handful of LoC by using > rcu_access_pointer(). The documentation of rcu_access_pointer says it shouldn't be used on the update side, because we lose lockdep protection: --cut-- Although rcu_access_pointer() may also be used in cases where update-side locks prevent the value of the pointer from changing, you should instead use rcu_dereference_protected() for this use case. --cut-- Though, I can change the read side to use rcu_access_pointer, wherever the pointer is not dereferenced. But it won't save lines of code :)
On Mon, 2022-08-01 at 12:44 -0700, Jakub Kicinski wrote: > On Mon, 1 Aug 2022 11:00:53 +0300 Maxim Mikityanskiy wrote: > > Currently, tls_device_down synchronizes with tls_device_resync_rx using > > RCU, however, the pointer to netdev is stored using WRITE_ONCE and > > loaded using READ_ONCE. > > > > Although such approach is technically correct (rcu_dereference is > > essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store > > NULL), using special RCU helpers for pointers is more valid, as it > > includes additional checks and might change the implementation > > transparently to the callers. > > > > Mark the netdev pointer as __rcu and use the correct RCU helpers to > > access it. For non-concurrent access pass the right conditions that > > guarantee safe access (locks taken, refcount value). Also use the > > correct helper in mlx5e, where even READ_ONCE was missing. > > Oops, looks like we also got some new sparse warnings from this: > > 2 new warnings in drivers/net/bonding/bond_main.c > 1 new warning in drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c Looks like neither me, nor our internal CI built these files - sorry! I'll fix these and look for the usages more carefully. BTW, the bonding case misses even the READ_ONCE, so it's an existing bug, exposed by the transition to the proper RCU API in my patch.
On Tue, 2 Aug 2022 12:03:32 +0000 Maxim Mikityanskiy wrote: > > For cases like this where we don't actually hold onto the object, just > > take a peek at the address of it we can save a handful of LoC by using > > rcu_access_pointer(). > > The documentation of rcu_access_pointer says it shouldn't be used on > the update side, because we lose lockdep protection: > > --cut-- > > Although rcu_access_pointer() may also be used in cases > where update-side locks prevent the value of the pointer from changing, > you should instead use rcu_dereference_protected() for this use case. I think what this is trying to say is to not use the rcu_access_pointer() as a hack against lockdep: lock(writer_lock); /* no need for rcu_dereference() because we have writer lock */ ptr = rcu_access_pointer(obj->ptr); ptr->something = 1; unlock(writer_lock); It's still perfectly fine to use access_pointer as intended on the write side, which is just checking the value of the pointer, not deferencing it: lock(writer_lock); if (rcu_access_pointer(obj->ptr) == target) so_something(obj); unlock(writer_lock);
On Tue, 2 Aug 2022 12:07:18 +0000 Maxim Mikityanskiy wrote: > > Oops, looks like we also got some new sparse warnings from this: > > > > 2 new warnings in drivers/net/bonding/bond_main.c > > 1 new warning in drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c > > Looks like neither me, nor our internal CI built these files - sorry! > I'll fix these and look for the usages more carefully. > > BTW, the bonding case misses even the READ_ONCE, so it's an existing > bug, exposed by the transition to the proper RCU API in my patch. Nice! You can slap a fixes tag on it, for accounting purposes, and stick to net-next in the subject, tree doesn't matter right now.
Hi Maxim,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Maxim-Mikityanskiy/net-tls-Use-RCU-API-to-access-tls_ctx-netdev/20220801-160258
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 63757225a93353bc2ce4499af5501eabdbbf23f9
config: mips-randconfig-s043-20220801 (https://download.01.org/0day-ci/archive/20220803/202208030026.UbhC51Ku-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-39-gce1a6720-dirty
# https://github.com/intel-lab-lkp/linux/commit/9971662af97683a6f0ea3d752495bba5ef6229dc
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Maxim-Mikityanskiy/net-tls-Use-RCU-API-to-access-tls_ctx-netdev/20220801-160258
git checkout 9971662af97683a6f0ea3d752495bba5ef6229dc
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash drivers/net/bonding/
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
drivers/net/bonding/bond_main.c:2831:26: sparse: sparse: restricted __be16 degrades to integer
drivers/net/bonding/bond_main.c:2837:20: sparse: sparse: restricted __be16 degrades to integer
drivers/net/bonding/bond_main.c:2910:40: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [usertype] vlan_proto @@ got int @@
drivers/net/bonding/bond_main.c:2910:40: sparse: expected restricted __be16 [usertype] vlan_proto
drivers/net/bonding/bond_main.c:2910:40: sparse: got int
>> drivers/net/bonding/bond_main.c:5336:13: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected struct net_device *slave_dev @@ got struct net_device [noderef] __rcu *netdev @@
drivers/net/bonding/bond_main.c:5336:13: sparse: expected struct net_device *slave_dev
drivers/net/bonding/bond_main.c:5336:13: sparse: got struct net_device [noderef] __rcu *netdev
drivers/net/bonding/bond_main.c:5337:75: sparse: sparse: incorrect type in argument 3 (different address spaces) @@ expected struct net_device *slave_dev @@ got struct net_device [noderef] __rcu *netdev @@
drivers/net/bonding/bond_main.c:5337:75: sparse: expected struct net_device *slave_dev
drivers/net/bonding/bond_main.c:5337:75: sparse: got struct net_device [noderef] __rcu *netdev
vim +5336 drivers/net/bonding/bond_main.c
007feb87fb1593 Tariq Toukan 2021-01-17 5331
89df6a8104706f Tariq Toukan 2021-01-17 5332 #if IS_ENABLED(CONFIG_TLS_DEVICE)
89df6a8104706f Tariq Toukan 2021-01-17 5333 static netdev_tx_t bond_tls_device_xmit(struct bonding *bond, struct sk_buff *skb,
89df6a8104706f Tariq Toukan 2021-01-17 5334 struct net_device *dev)
89df6a8104706f Tariq Toukan 2021-01-17 5335 {
89df6a8104706f Tariq Toukan 2021-01-17 @5336 if (likely(bond_get_slave_by_dev(bond, tls_get_ctx(skb->sk)->netdev)))
89df6a8104706f Tariq Toukan 2021-01-17 5337 return bond_dev_queue_xmit(bond, skb, tls_get_ctx(skb->sk)->netdev);
89df6a8104706f Tariq Toukan 2021-01-17 5338 return bond_tx_drop(dev, skb);
89df6a8104706f Tariq Toukan 2021-01-17 5339 }
89df6a8104706f Tariq Toukan 2021-01-17 5340 #endif
89df6a8104706f Tariq Toukan 2021-01-17 5341
On Tue, 2022-08-02 at 08:37 -0700, Jakub Kicinski wrote: > On Tue, 2 Aug 2022 12:03:32 +0000 Maxim Mikityanskiy wrote: > > > For cases like this where we don't actually hold onto the object, just > > > take a peek at the address of it we can save a handful of LoC by using > > > rcu_access_pointer(). > > > > The documentation of rcu_access_pointer says it shouldn't be used on > > the update side, because we lose lockdep protection: > > > > --cut-- > > > > Although rcu_access_pointer() may also be used in cases > > where update-side locks prevent the value of the pointer from changing, > > you should instead use rcu_dereference_protected() for this use case. > > I think what this is trying to say is to not use the > rcu_access_pointer() as a hack against lockdep: Well, maybe we understand it in different ways. This is how I parsed it (the whole comment): 1. rcu_access_pointer is not for the read side. So, it's either for the write side or for usage outside all locks. 2. It's not for dereferencing. So, it's for reading the pointer's value on the write side or outside all locks. 3. Although it can be used on the write side, rcu_dereference_protected should be used. So, it's for reading the pointer's value outside all locks. > > > lock(writer_lock); > /* no need for rcu_dereference() because we have writer lock */ > ptr = rcu_access_pointer(obj->ptr); > ptr->something = 1; > unlock(writer_lock); Here I totally agree with you, this isn't a valid usage, but I think the documentation comment doesn't recommend any usage under the write lock. It's just a recommendation, though, so if you prefer rcu_access_pointer to save a few lines of code, I can switch to it. We'll just lose extra checks and add an unnecessary READ_ONCE under the hood. > > It's still perfectly fine to use access_pointer as intended on > the write side, which is just checking the value of the pointer, > not deferencing it: Sure, it's correct, but adds an extra READ_ONCE, which is not needed under the lock, and skips the lockdep check, which is rather redundant in this case: down_write(&lock); ptr = rcu_dereference_protected(rcu_ptr, lockdep_is_held(&lock)); ... up_write(&lock); but is more meaningful in this piece of code: /* Safe, because this is the destroy flow, refcount is 0, so * tls_device_down can't store this field in parallel. */ netdev = rcu_dereference_protected(ctx->netdev, !refcount_read(&ctx->refcount)); async_cleanup = netdev && ctx->tx_conf == TLS_HW; > > lock(writer_lock); > if (rcu_access_pointer(obj->ptr) == target) > so_something(obj); > unlock(writer_lock);
On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote: > > > The documentation of rcu_access_pointer says it shouldn't be used on > > > the update side, because we lose lockdep protection: > > > > > > --cut-- > > > > > > Although rcu_access_pointer() may also be used in cases > > > where update-side locks prevent the value of the pointer from changing, > > > you should instead use rcu_dereference_protected() for this use case. > > > > I think what this is trying to say is to not use the > > rcu_access_pointer() as a hack against lockdep: > > Well, maybe we understand it in different ways. This is how I parsed it > (the whole comment): > > 1. rcu_access_pointer is not for the read side. So, it's either for the > write side or for usage outside all locks. > > 2. It's not for dereferencing. So, it's for reading the pointer's value > on the write side or outside all locks. > > 3. Although it can be used on the write side, rcu_dereference_protected > should be used. So, it's for reading the pointer's value outside all > locks. Using rcu_deref* when we don't dereference the pointer does not compute for me, but it's not a big deal. Let me CC Paul for clarification of the docs, as it may also be confusing to others and therefore worth rewording. But our case is not that important so unless Paul chimes in clearly indicating one interpretation is right - either way is fine by me for v2.
On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote: > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote: > > > > The documentation of rcu_access_pointer says it shouldn't be used on > > > > the update side, because we lose lockdep protection: > > > > > > > > --cut-- > > > > > > > > Although rcu_access_pointer() may also be used in cases > > > > where update-side locks prevent the value of the pointer from changing, > > > > you should instead use rcu_dereference_protected() for this use case. > > > > > > I think what this is trying to say is to not use the > > > rcu_access_pointer() as a hack against lockdep: > > > > Well, maybe we understand it in different ways. This is how I parsed it > > (the whole comment): > > > > 1. rcu_access_pointer is not for the read side. So, it's either for the > > write side or for usage outside all locks. RCU readers really are permitted to use rcu_access_pointer(). As is pretty much any other code. See for example Documentation/RCU/rcu_dereference.rst: Note that if checks for being within an RCU read-side critical section are not required and the pointer is never dereferenced, rcu_access_pointer() should be used in place of rcu_dereference(). OK, s/should be/can be/, but I will fix that. Or, for that matter, the rcu_access_pointer() docbook header comment: /** * rcu_access_pointer() - fetch RCU pointer with no dereferencing * @p: The pointer to read * * Return the value of the specified RCU-protected pointer, but omit the * lockdep checks for being in an RCU read-side critical section. This is * useful when the value of this pointer is accessed, but the pointer is * not dereferenced, for example, when testing an RCU-protected pointer * against NULL. Although rcu_access_pointer() may also be used in cases * where update-side locks prevent the value of the pointer from changing, * you should instead use rcu_dereference_protected() for this use case. * * It is also permissible to use rcu_access_pointer() when read-side * access to the pointer was removed at least one grace period ago, as * is the case in the context of the RCU callback that is freeing up * the data, or after a synchronize_rcu() returns. This can be useful * when tearing down multi-linked structures after a grace period * has elapsed. */ So the restriction is that the pointer returned from rcu_access_pointer() cannot be dereferenced or that the structure is beyond being updated. So this is OK: // Not in an RCU reader. Or even in an RCU updater. if (rcu_access_pointer(my_rcu_pointer)) do_something(); ... And so is this: p = xchg(&my_rcu_pointer, NULL); if (p) { synchronize_rcu(); // No one else has access to this list! while (p) { q = rcu_access_pointer(p->next); kfree(p); q = p; // But why are you hand-crafting list??? // Any why not use rcu_dereference_protected()? } } But this is not: p = rcu_access_pointer(my_rcu_pointer); do_something_with(p->a); // BUG!!! Even in an RCU reader. In this second case, you must instead use rcu_dereference() or similar. > > 2. It's not for dereferencing. So, it's for reading the pointer's value > > on the write side or outside all locks. True enough, you are not permitted to dereference the value returned from rcu_access_pointer(). Unless you have the only copy. But it is just fine to check the value of the pointer, compare it, or do arithmetic on it. Just don't dereference it and don't dereference any value computed from it. > > 3. Although it can be used on the write side, rcu_dereference_protected > > should be used. So, it's for reading the pointer's value outside all > > locks. Yes, if an RCU updater is going to dereference a pointer, it should use rcu_dereference_protected() rather than rcu_access_pointer(). So rcu_access_pointer() does what it it says. It permits the caller to access the value of the pointer, and only to access that value. Not dereference that value. > Using rcu_deref* when we don't dereference the pointer does not compute > for me, but it's not a big deal. It is OK to use rcu_dereference*() to access a pointer without dereferencing it. One key difference between rcu_dereference() and rcu_access_pointer() is that rcu_access_pointer() can be used outside of an RCU reader. For example: // Not in an RCU reader. Or even in an RCU updater. if (rcu_access_pointer(my_rcu_pointer)) { rcu_read_lock(); p = rcu_dereference(my_rcu_pointer); if (p) do_something_with(p); rcu_read_unlock(); } This example is silly because the overhead of the extra check might well exceed that of the rcu_read_lock() and rcu_read_unlock() put together, especially in CONFIG_PREEMPTION=n kernels. A less-silly example might schedule a workqueue or similar to handle the RCU-protected data, and the overhead of the extra check might be very worthwhile in that case. > Let me CC Paul for clarification of the docs, as it may also be > confusing to others and therefore worth rewording. But our case is > not that important so unless Paul chimes in clearly indicating one > interpretation is right - either way is fine by me for v2. Hope this helps! And please let me know if it does not. Thanx, Paul
On Wed, 2022-08-03 at 09:34 -0700, Paul E. McKenney wrote: > On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote: > > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote: > > > > > The documentation of rcu_access_pointer says it shouldn't be used on > > > > > the update side, because we lose lockdep protection: > > > > > > > > > > --cut-- > > > > > > > > > > Although rcu_access_pointer() may also be used in cases > > > > > where update-side locks prevent the value of the pointer from changing, > > > > > you should instead use rcu_dereference_protected() for this use case. > > > > > > > > I think what this is trying to say is to not use the > > > > rcu_access_pointer() as a hack against lockdep: > > > > > > Well, maybe we understand it in different ways. This is how I parsed it > > > (the whole comment): > > > > > > 1. rcu_access_pointer is not for the read side. So, it's either for the > > > write side or for usage outside all locks. > > RCU readers really are permitted to use rcu_access_pointer(). As is > pretty much any other code. > > See for example Documentation/RCU/rcu_dereference.rst: > > Note that if checks for being within an RCU read-side critical > section are not required and the pointer is never dereferenced, > rcu_access_pointer() should be used in place of rcu_dereference(). > > OK, s/should be/can be/, but I will fix that. > > Or, for that matter, the rcu_access_pointer() docbook header comment: > > /** > * rcu_access_pointer() - fetch RCU pointer with no dereferencing > * @p: The pointer to read > * > * Return the value of the specified RCU-protected pointer, but omit the > * lockdep checks for being in an RCU read-side critical section. This is > * useful when the value of this pointer is accessed, but the pointer is > * not dereferenced, for example, when testing an RCU-protected pointer > * against NULL. Although rcu_access_pointer() may also be used in cases > * where update-side locks prevent the value of the pointer from changing, > * you should instead use rcu_dereference_protected() for this use case. > * > * It is also permissible to use rcu_access_pointer() when read-side > * access to the pointer was removed at least one grace period ago, as > * is the case in the context of the RCU callback that is freeing up > * the data, or after a synchronize_rcu() returns. This can be useful > * when tearing down multi-linked structures after a grace period > * has elapsed. > */ > > So the restriction is that the pointer returned from rcu_access_pointer() > cannot be dereferenced or that the structure is beyond being updated. > > So this is OK: > > // Not in an RCU reader. Or even in an RCU updater. > if (rcu_access_pointer(my_rcu_pointer)) > do_something(); > ... > > And so is this: > > p = xchg(&my_rcu_pointer, NULL); > if (p) { > synchronize_rcu(); > // No one else has access to this list! > while (p) { > q = rcu_access_pointer(p->next); > kfree(p); > q = p; > // But why are you hand-crafting list??? > // Any why not use rcu_dereference_protected()? > } > } > > But this is not: > > p = rcu_access_pointer(my_rcu_pointer); > do_something_with(p->a); // BUG!!! Even in an RCU reader. > > In this second case, you must instead use rcu_dereference() or > similar. > > > > 2. It's not for dereferencing. So, it's for reading the pointer's value > > > on the write side or outside all locks. > > True enough, you are not permitted to dereference the value returned > from rcu_access_pointer(). Unless you have the only copy. > > But it is just fine to check the value of the pointer, compare it, or > do arithmetic on it. Just don't dereference it and don't dereference > any value computed from it. > > > > 3. Although it can be used on the write side, rcu_dereference_protected > > > should be used. So, it's for reading the pointer's value outside all > > > locks. > > Yes, if an RCU updater is going to dereference a pointer, it should > use rcu_dereference_protected() rather than rcu_access_pointer(). > > So rcu_access_pointer() does what it it says. It permits the caller > to access the value of the pointer, and only to access that value. > Not dereference that value. > > > Using rcu_deref* when we don't dereference the pointer does not compute > > for me, but it's not a big deal. > > It is OK to use rcu_dereference*() to access a pointer without > dereferencing it. > > One key difference between rcu_dereference() and rcu_access_pointer() > is that rcu_access_pointer() can be used outside of an RCU reader. > For example: > > // Not in an RCU reader. Or even in an RCU updater. > if (rcu_access_pointer(my_rcu_pointer)) { > rcu_read_lock(); > p = rcu_dereference(my_rcu_pointer); > if (p) > do_something_with(p); > rcu_read_unlock(); > } > > This example is silly because the overhead of the extra check might well > exceed that of the rcu_read_lock() and rcu_read_unlock() put together, > especially in CONFIG_PREEMPTION=n kernels. A less-silly example might > schedule a workqueue or similar to handle the RCU-protected data, and > the overhead of the extra check might be very worthwhile in that case. That's essentially one of our cases, we check a pointer and queue a work if it's not NULL, and the work dereferences it. But we don't use rcu_read_lock there, because it's the teardown flow, and there are no concurrent users anymore (refcount is 0). > > > Let me CC Paul for clarification of the docs, as it may also be > > confusing to others and therefore worth rewording. But our case is > > not that important so unless Paul chimes in clearly indicating one > > interpretation is right - either way is fine by me for v2. > > Hope this helps! Thanks a lot for your detailed explanation! It's truly useful, especially the examples are helpful. As far as I understood, rcu_access_pointer can be used in any context, including RCU updater, as long as we don't dereference the pointer. At the same time, it's OK to use rcu_dereference* without dereferencing. So, is there any preference, which helper to use, in the context where it can't be changed concurrently, if we don't dereference it, but just compare the value? Specifically, we have two (simplified) cases: 1. We set the pointer to NULL under the write-lock, but only if it matches another pointer. down_write(&lock); ctx_netdev = rcu_dereference_protected(ctx->netdev, lockdep_is_held(&lock)); if (ctx_netdev == netdev) { rcu_assign_pointer(ctx->netdev, NULL); synchronize_rcu(); } up_write(&lock); 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore, we tear down ctx and need to check whether ctx->netdev is NULL. if (!refcount_dec_and_test(&ctx->refcount)) return; netdev = rcu_dereference_protected(ctx->netdev, !refcount_read(&ctx->refcount)); if (netdev) queue_work(...); It's somewhat similar to the "structure is beyond being updated" case, but it's ensured by refcount, not by RCU (i.e. you example assigned my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one touches it, and I ensure that we are the only user of ctx by dropping refcount to zero). So, hoping that my understanding of your explanation is correct, both cases can use any of rcu_access_pointer or rcu_dereference_protected. Is there some rule of thumb which one to pick in such case? Thanks, Max > > And please let me know if it does not. > > Thanx, Paul
On Thu, 4 Aug 2022 08:08:37 +0000 Maxim Mikityanskiy wrote: > 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore, > we tear down ctx and need to check whether ctx->netdev is NULL. > > if (!refcount_dec_and_test(&ctx->refcount)) > return; > netdev = rcu_dereference_protected(ctx->netdev, > !refcount_read(&ctx->refcount)); > if (netdev) > queue_work(...); > > It's somewhat similar to the "structure is beyond being updated" case, > but it's ensured by refcount, not by RCU (i.e. you example assigned > my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one > touches it, and I ensure that we are the only user of ctx by dropping > refcount to zero). > > So, hoping that my understanding of your explanation is correct, both > cases can use any of rcu_access_pointer or rcu_dereference_protected. > Is there some rule of thumb which one to pick in such case? IMHO, rcu_dereference_protected() documents why it's safe to dereference the pointer outside of the rcu read section. We are only documenting the writer side locking. The fact that there is a RCU pointer involved is coincidental - I think we could as well be checking the TLS_RX_DEV_DEGRADED bit here. We can add asserts or comments to document the writer side locking. Piggy backing on RCU seems coincidental. But again, I'm fine either way. I'm just saying this because I really doubt there is a rule of thumb for this level of detail. It's most likely your call.
On Thu, Aug 04, 2022 at 08:08:37AM +0000, Maxim Mikityanskiy wrote: > On Wed, 2022-08-03 at 09:34 -0700, Paul E. McKenney wrote: > > On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote: > > > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote: > > > > > > The documentation of rcu_access_pointer says it shouldn't be used on > > > > > > the update side, because we lose lockdep protection: > > > > > > > > > > > > --cut-- > > > > > > > > > > > > Although rcu_access_pointer() may also be used in cases > > > > > > where update-side locks prevent the value of the pointer from changing, > > > > > > you should instead use rcu_dereference_protected() for this use case. > > > > > > > > > > I think what this is trying to say is to not use the > > > > > rcu_access_pointer() as a hack against lockdep: > > > > > > > > Well, maybe we understand it in different ways. This is how I parsed it > > > > (the whole comment): > > > > > > > > 1. rcu_access_pointer is not for the read side. So, it's either for the > > > > write side or for usage outside all locks. > > > > RCU readers really are permitted to use rcu_access_pointer(). As is > > pretty much any other code. > > > > See for example Documentation/RCU/rcu_dereference.rst: > > > > Note that if checks for being within an RCU read-side critical > > section are not required and the pointer is never dereferenced, > > rcu_access_pointer() should be used in place of rcu_dereference(). > > > > OK, s/should be/can be/, but I will fix that. > > > > Or, for that matter, the rcu_access_pointer() docbook header comment: > > > > /** > > * rcu_access_pointer() - fetch RCU pointer with no dereferencing > > * @p: The pointer to read > > * > > * Return the value of the specified RCU-protected pointer, but omit the > > * lockdep checks for being in an RCU read-side critical section. This is > > * useful when the value of this pointer is accessed, but the pointer is > > * not dereferenced, for example, when testing an RCU-protected pointer > > * against NULL. Although rcu_access_pointer() may also be used in cases > > * where update-side locks prevent the value of the pointer from changing, > > * you should instead use rcu_dereference_protected() for this use case. > > * > > * It is also permissible to use rcu_access_pointer() when read-side > > * access to the pointer was removed at least one grace period ago, as > > * is the case in the context of the RCU callback that is freeing up > > * the data, or after a synchronize_rcu() returns. This can be useful > > * when tearing down multi-linked structures after a grace period > > * has elapsed. > > */ > > > > So the restriction is that the pointer returned from rcu_access_pointer() > > cannot be dereferenced or that the structure is beyond being updated. > > > > So this is OK: > > > > // Not in an RCU reader. Or even in an RCU updater. > > if (rcu_access_pointer(my_rcu_pointer)) > > do_something(); > > ... > > > > And so is this: > > > > p = xchg(&my_rcu_pointer, NULL); > > if (p) { > > synchronize_rcu(); > > // No one else has access to this list! > > while (p) { > > q = rcu_access_pointer(p->next); > > kfree(p); > > q = p; > > // But why are you hand-crafting list??? > > // Any why not use rcu_dereference_protected()? > > } > > } > > > > But this is not: > > > > p = rcu_access_pointer(my_rcu_pointer); > > do_something_with(p->a); // BUG!!! Even in an RCU reader. > > > > In this second case, you must instead use rcu_dereference() or > > similar. > > > > > > 2. It's not for dereferencing. So, it's for reading the pointer's value > > > > on the write side or outside all locks. > > > > True enough, you are not permitted to dereference the value returned > > from rcu_access_pointer(). Unless you have the only copy. > > > > But it is just fine to check the value of the pointer, compare it, or > > do arithmetic on it. Just don't dereference it and don't dereference > > any value computed from it. > > > > > > 3. Although it can be used on the write side, rcu_dereference_protected > > > > should be used. So, it's for reading the pointer's value outside all > > > > locks. > > > > Yes, if an RCU updater is going to dereference a pointer, it should > > use rcu_dereference_protected() rather than rcu_access_pointer(). > > > > So rcu_access_pointer() does what it it says. It permits the caller > > to access the value of the pointer, and only to access that value. > > Not dereference that value. > > > > > Using rcu_deref* when we don't dereference the pointer does not compute > > > for me, but it's not a big deal. > > > > It is OK to use rcu_dereference*() to access a pointer without > > dereferencing it. > > > > One key difference between rcu_dereference() and rcu_access_pointer() > > is that rcu_access_pointer() can be used outside of an RCU reader. > > For example: > > > > // Not in an RCU reader. Or even in an RCU updater. > > if (rcu_access_pointer(my_rcu_pointer)) { > > rcu_read_lock(); > > p = rcu_dereference(my_rcu_pointer); > > if (p) > > do_something_with(p); > > rcu_read_unlock(); > > } > > > > This example is silly because the overhead of the extra check might well > > exceed that of the rcu_read_lock() and rcu_read_unlock() put together, > > especially in CONFIG_PREEMPTION=n kernels. A less-silly example might > > schedule a workqueue or similar to handle the RCU-protected data, and > > the overhead of the extra check might be very worthwhile in that case. > > That's essentially one of our cases, we check a pointer and queue a > work if it's not NULL, and the work dereferences it. But we don't use > rcu_read_lock there, because it's the teardown flow, and there are no > concurrent users anymore (refcount is 0). Very good! > > > Let me CC Paul for clarification of the docs, as it may also be > > > confusing to others and therefore worth rewording. But our case is > > > not that important so unless Paul chimes in clearly indicating one > > > interpretation is right - either way is fine by me for v2. > > > > Hope this helps! > > Thanks a lot for your detailed explanation! It's truly useful, > especially the examples are helpful. Would you like to nominate a simple use case for addition to the official documentation? > As far as I understood, rcu_access_pointer can be used in any context, > including RCU updater, as long as we don't dereference the pointer. At > the same time, it's OK to use rcu_dereference* without dereferencing. > So, is there any preference, which helper to use, in the context where > it can't be changed concurrently, if we don't dereference it, but just > compare the value? If you are in an RCU reader, there is little reason to use rcu_access_pointer(), though it might be a good documentation aid. It can also be helpful in code that is called both from an RCU reader and from either and updater or an RCU outsider. > Specifically, we have two (simplified) cases: > > 1. We set the pointer to NULL under the write-lock, but only if it > matches another pointer. > > down_write(&lock); > ctx_netdev = rcu_dereference_protected(ctx->netdev, > lockdep_is_held(&lock)); > if (ctx_netdev == netdev) { > rcu_assign_pointer(ctx->netdev, NULL); > synchronize_rcu(); > } > up_write(&lock); I suggest keeping rcu_dereference_protected() here because it documents the locking. This might seem silly in this case because you just acquired that lock, but code can grown functions and then be subject to copy-pasta-induced bugs, and that call to rcu_dereference_protected() can help locate such bugs. In contrast, rcu_assign_pointer() will cheerfully aid and abet the creation of this sort of bug. > 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore, > we tear down ctx and need to check whether ctx->netdev is NULL. > > if (!refcount_dec_and_test(&ctx->refcount)) > return; > netdev = rcu_dereference_protected(ctx->netdev, > !refcount_read(&ctx->refcount)); > if (netdev) > queue_work(...); > > It's somewhat similar to the "structure is beyond being updated" case, > but it's ensured by refcount, not by RCU (i.e. you example assigned > my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one > touches it, and I ensure that we are the only user of ctx by dropping > refcount to zero). If that refcount_dec_and_test() is in an RCU callback or if RCU readers are otherwise guaranteed to no longer be accessing this object, then rcu_access_pointer() would work. But again, that rcu_dereference_protected() has the advantage of protecting against copy-pasta bugs. Otherwise, I would need to better understand the example. > So, hoping that my understanding of your explanation is correct, both > cases can use any of rcu_access_pointer or rcu_dereference_protected. > Is there some rule of thumb which one to pick in such case? If you have something meaningful to put into the lockdep condition of rcu_dereference_protected(), you should use rcu_dereference_protected(). If not, and if either: a. There are no concurrent updates possible, or b. There will be no dereferencing of the returned pointer, then rcu_access_pointer() can be useful. Also, rcu_access_pointer() can sometimes simplify common code. > Thanks, > Max > > > > > And please let me know if it does not. > > > > Thanx, Paul >
On Thu, 2022-08-04 at 09:18 -0700, Jakub Kicinski wrote: > On Thu, 4 Aug 2022 08:08:37 +0000 Maxim Mikityanskiy wrote: > > 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore, > > we tear down ctx and need to check whether ctx->netdev is NULL. > > > > if (!refcount_dec_and_test(&ctx->refcount)) > > return; > > netdev = rcu_dereference_protected(ctx->netdev, > > !refcount_read(&ctx->refcount)); > > if (netdev) > > queue_work(...); > > > > It's somewhat similar to the "structure is beyond being updated" case, > > but it's ensured by refcount, not by RCU (i.e. you example assigned > > my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one > > touches it, and I ensure that we are the only user of ctx by dropping > > refcount to zero). > > > > So, hoping that my understanding of your explanation is correct, both > > cases can use any of rcu_access_pointer or rcu_dereference_protected. > > Is there some rule of thumb which one to pick in such case? > > IMHO, rcu_dereference_protected() documents why it's safe to > dereference the pointer outside of the rcu read section. > > We are only documenting the writer side locking. The fact that there > is a RCU pointer involved is coincidental - I think we could > as well be checking the TLS_RX_DEV_DEGRADED bit here. Yes, checking TLS_RX_DEV_DEGRADED would be equivalent in the current implementation. But I don't want to give this flag extra responsibilities (currently it's for RX resync only) if we can check the netdev pointer, it should be more flexible in long term. > We can add asserts or comments to document the writer side locking. > Piggy backing on RCU seems coincidental. But again, I'm fine either > way. I'm just saying this because I really doubt there is a rule of > thumb for this level of detail. It's most likely your call. OK, I'd keep rcu_dereference_protected with its asserts, rather than reinvent my own asserts.
On Thu, 2022-08-04 at 11:40 -0700, Paul E. McKenney wrote: > On Thu, Aug 04, 2022 at 08:08:37AM +0000, Maxim Mikityanskiy wrote: > > On Wed, 2022-08-03 at 09:34 -0700, Paul E. McKenney wrote: > > > On Wed, Aug 03, 2022 at 07:49:57AM -0700, Jakub Kicinski wrote: > > > > On Wed, 3 Aug 2022 09:33:48 +0000 Maxim Mikityanskiy wrote: > > > > > > > The documentation of rcu_access_pointer says it shouldn't be used on > > > > > > > the update side, because we lose lockdep protection: > > > > > > > > > > > > > > --cut-- > > > > > > > > > > > > > > Although rcu_access_pointer() may also be used in cases > > > > > > > where update-side locks prevent the value of the pointer from changing, > > > > > > > you should instead use rcu_dereference_protected() for this use case. > > > > > > > > > > > > I think what this is trying to say is to not use the > > > > > > rcu_access_pointer() as a hack against lockdep: > > > > > > > > > > Well, maybe we understand it in different ways. This is how I parsed it > > > > > (the whole comment): > > > > > > > > > > 1. rcu_access_pointer is not for the read side. So, it's either for the > > > > > write side or for usage outside all locks. > > > > > > RCU readers really are permitted to use rcu_access_pointer(). As is > > > pretty much any other code. > > > > > > See for example Documentation/RCU/rcu_dereference.rst: > > > > > > Note that if checks for being within an RCU read-side critical > > > section are not required and the pointer is never dereferenced, > > > rcu_access_pointer() should be used in place of rcu_dereference(). > > > > > > OK, s/should be/can be/, but I will fix that. > > > > > > Or, for that matter, the rcu_access_pointer() docbook header comment: > > > > > > /** > > > * rcu_access_pointer() - fetch RCU pointer with no dereferencing > > > * @p: The pointer to read > > > * > > > * Return the value of the specified RCU-protected pointer, but omit the > > > * lockdep checks for being in an RCU read-side critical section. This is > > > * useful when the value of this pointer is accessed, but the pointer is > > > * not dereferenced, for example, when testing an RCU-protected pointer > > > * against NULL. Although rcu_access_pointer() may also be used in cases > > > * where update-side locks prevent the value of the pointer from changing, > > > * you should instead use rcu_dereference_protected() for this use case. > > > * > > > * It is also permissible to use rcu_access_pointer() when read-side > > > * access to the pointer was removed at least one grace period ago, as > > > * is the case in the context of the RCU callback that is freeing up > > > * the data, or after a synchronize_rcu() returns. This can be useful > > > * when tearing down multi-linked structures after a grace period > > > * has elapsed. > > > */ > > > > > > So the restriction is that the pointer returned from rcu_access_pointer() > > > cannot be dereferenced or that the structure is beyond being updated. > > > > > > So this is OK: > > > > > > // Not in an RCU reader. Or even in an RCU updater. > > > if (rcu_access_pointer(my_rcu_pointer)) > > > do_something(); > > > ... > > > > > > And so is this: > > > > > > p = xchg(&my_rcu_pointer, NULL); > > > if (p) { > > > synchronize_rcu(); > > > // No one else has access to this list! > > > while (p) { > > > q = rcu_access_pointer(p->next); > > > kfree(p); > > > q = p; > > > // But why are you hand-crafting list??? > > > // Any why not use rcu_dereference_protected()? > > > } > > > } > > > > > > But this is not: > > > > > > p = rcu_access_pointer(my_rcu_pointer); > > > do_something_with(p->a); // BUG!!! Even in an RCU reader. > > > > > > In this second case, you must instead use rcu_dereference() or > > > similar. > > > > > > > > 2. It's not for dereferencing. So, it's for reading the pointer's value > > > > > on the write side or outside all locks. > > > > > > True enough, you are not permitted to dereference the value returned > > > from rcu_access_pointer(). Unless you have the only copy. > > > > > > But it is just fine to check the value of the pointer, compare it, or > > > do arithmetic on it. Just don't dereference it and don't dereference > > > any value computed from it. > > > > > > > > 3. Although it can be used on the write side, rcu_dereference_protected > > > > > should be used. So, it's for reading the pointer's value outside all > > > > > locks. > > > > > > Yes, if an RCU updater is going to dereference a pointer, it should > > > use rcu_dereference_protected() rather than rcu_access_pointer(). > > > > > > So rcu_access_pointer() does what it it says. It permits the caller > > > to access the value of the pointer, and only to access that value. > > > Not dereference that value. > > > > > > > Using rcu_deref* when we don't dereference the pointer does not compute > > > > for me, but it's not a big deal. > > > > > > It is OK to use rcu_dereference*() to access a pointer without > > > dereferencing it. > > > > > > One key difference between rcu_dereference() and rcu_access_pointer() > > > is that rcu_access_pointer() can be used outside of an RCU reader. > > > For example: > > > > > > // Not in an RCU reader. Or even in an RCU updater. > > > if (rcu_access_pointer(my_rcu_pointer)) { > > > rcu_read_lock(); > > > p = rcu_dereference(my_rcu_pointer); > > > if (p) > > > do_something_with(p); > > > rcu_read_unlock(); > > > } > > > > > > This example is silly because the overhead of the extra check might well > > > exceed that of the rcu_read_lock() and rcu_read_unlock() put together, > > > especially in CONFIG_PREEMPTION=n kernels. A less-silly example might > > > schedule a workqueue or similar to handle the RCU-protected data, and > > > the overhead of the extra check might be very worthwhile in that case. > > > > That's essentially one of our cases, we check a pointer and queue a > > work if it's not NULL, and the work dereferences it. But we don't use > > rcu_read_lock there, because it's the teardown flow, and there are no > > concurrent users anymore (refcount is 0). > > Very good! > > > > > Let me CC Paul for clarification of the docs, as it may also be > > > > confusing to others and therefore worth rewording. But our case is > > > > not that important so unless Paul chimes in clearly indicating one > > > > interpretation is right - either way is fine by me for v2. > > > > > > Hope this helps! > > > > Thanks a lot for your detailed explanation! It's truly useful, > > especially the examples are helpful. > > Would you like to nominate a simple use case for addition to the > official documentation? The example with freeing the list (xchg, synchronize_rcu, rcu_access_pointer) would be a nice illustration to the last paragraph of rcu_access_pointer's doc comment. Probably not in the comment itself, but under Documentation/. Also the example with rcu_access_pointer + queue_work, which takes rcu_read_lock and calls rcu_dereference could illustrate the usefulness of rcu_access_pointer outside all locks. > > > As far as I understood, rcu_access_pointer can be used in any context, > > including RCU updater, as long as we don't dereference the pointer. At > > the same time, it's OK to use rcu_dereference* without dereferencing. > > So, is there any preference, which helper to use, in the context where > > it can't be changed concurrently, if we don't dereference it, but just > > compare the value? > > If you are in an RCU reader, there is little reason to use > rcu_access_pointer(), though it might be a good documentation aid. > It can also be helpful in code that is called both from an RCU reader > and from either and updater or an RCU outsider. > > > Specifically, we have two (simplified) cases: > > > > 1. We set the pointer to NULL under the write-lock, but only if it > > matches another pointer. > > > > down_write(&lock); > > ctx_netdev = rcu_dereference_protected(ctx->netdev, > > lockdep_is_held(&lock)); > > if (ctx_netdev == netdev) { > > rcu_assign_pointer(ctx->netdev, NULL); > > synchronize_rcu(); > > } > > up_write(&lock); > > I suggest keeping rcu_dereference_protected() here because it documents > the locking. This might seem silly in this case because you just > acquired that lock, but code can grown functions and then be subject to > copy-pasta-induced bugs, and that call to rcu_dereference_protected() > can help locate such bugs. In contrast, rcu_assign_pointer() will > cheerfully aid and abet the creation of this sort of bug. OK, I'll keep it. > > > 2. ctx->refcount goes down to 0, no one can access ctx->netdev anymore, > > we tear down ctx and need to check whether ctx->netdev is NULL. > > > > if (!refcount_dec_and_test(&ctx->refcount)) > > return; > > netdev = rcu_dereference_protected(ctx->netdev, > > !refcount_read(&ctx->refcount)); > > if (netdev) > > queue_work(...); > > > > It's somewhat similar to the "structure is beyond being updated" case, > > but it's ensured by refcount, not by RCU (i.e. you example assigned > > my_rcu_pointer = NULL and called synchronize_rcu() to ensure no one > > touches it, and I ensure that we are the only user of ctx by dropping > > refcount to zero). > > If that refcount_dec_and_test() is in an RCU callback or if RCU > readers are otherwise guaranteed to no longer be accessing this > object, then rcu_access_pointer() would work. It's the latter - no one else can access ctx after its refcount dropped to zero. > But again, that > rcu_dereference_protected() has the advantage of protecting against > copy-pasta bugs. OK then, I'll use that advantage. > Otherwise, I would need to better understand the example. > > > So, hoping that my understanding of your explanation is correct, both > > cases can use any of rcu_access_pointer or rcu_dereference_protected. > > Is there some rule of thumb which one to pick in such case? > > If you have something meaningful to put into the lockdep condition of > rcu_dereference_protected(), you should use rcu_dereference_protected(). > If not, and if either: > > a. There are no concurrent updates possible, or > > b. There will be no dereferencing of the returned pointer, > > then rcu_access_pointer() can be useful. > > Also, rcu_access_pointer() can sometimes simplify common code. Thanks again for your advice and providing a detailed explanation! I'll stick to this rule, and maybe it's a good idea to put it to some prominent place in the documentation to simplify the choice for other developers. > > > Thanks, > > Max > > > > > > > > And please let me know if it does not. > > > > > > Thanx, Paul > >
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c index 6b6c7044b64a..d8dea2aa7ade 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -819,7 +819,7 @@ bool mlx5e_ktls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq, mlx5e_tx_mpwqe_ensure_complete(sq); tls_ctx = tls_get_ctx(skb->sk); - if (WARN_ON_ONCE(tls_ctx->netdev != netdev)) + if (WARN_ON_ONCE(rcu_dereference_bh(tls_ctx->netdev) != netdev)) goto err_out; priv_tx = mlx5e_get_ktls_tx_priv_ctx(tls_ctx); diff --git a/include/net/tls.h b/include/net/tls.h index b75b5727abdb..cb205f9d9473 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -237,7 +237,7 @@ struct tls_context { void *priv_ctx_tx; void *priv_ctx_rx; - struct net_device *netdev; + struct net_device __rcu *netdev; /* rw cache line */ struct cipher_context tx; diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 18c7e5c6d228..351d8959d80f 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -71,7 +71,13 @@ static void tls_device_tx_del_task(struct work_struct *work) struct tls_offload_context_tx *offload_ctx = container_of(work, struct tls_offload_context_tx, destruct_work); struct tls_context *ctx = offload_ctx->ctx; - struct net_device *netdev = ctx->netdev; + struct net_device *netdev; + + /* Safe, because this is the destroy flow, refcount is 0, so + * tls_device_down can't store this field in parallel. + */ + netdev = rcu_dereference_protected(ctx->netdev, + !refcount_read(&ctx->refcount)); netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_TX); dev_put(netdev); @@ -81,6 +87,7 @@ static void tls_device_tx_del_task(struct work_struct *work) static void tls_device_queue_ctx_destruction(struct tls_context *ctx) { + struct net_device *netdev; unsigned long flags; bool async_cleanup; @@ -91,7 +98,14 @@ static void tls_device_queue_ctx_destruction(struct tls_context *ctx) } list_del(&ctx->list); /* Remove from tls_device_list / tls_device_down_list */ - async_cleanup = ctx->netdev && ctx->tx_conf == TLS_HW; + + /* Safe, because this is the destroy flow, refcount is 0, so + * tls_device_down can't store this field in parallel. + */ + netdev = rcu_dereference_protected(ctx->netdev, + !refcount_read(&ctx->refcount)); + + async_cleanup = netdev && ctx->tx_conf == TLS_HW; if (async_cleanup) { struct tls_offload_context_tx *offload_ctx = tls_offload_ctx_tx(ctx); @@ -229,7 +243,8 @@ static void tls_device_resync_tx(struct sock *sk, struct tls_context *tls_ctx, trace_tls_device_tx_resync_send(sk, seq, rcd_sn); down_read(&device_offload_lock); - netdev = tls_ctx->netdev; + netdev = rcu_dereference_protected(tls_ctx->netdev, + lockdep_is_held(&device_offload_lock)); if (netdev) err = netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn, @@ -710,7 +725,7 @@ static void tls_device_resync_rx(struct tls_context *tls_ctx, trace_tls_device_rx_resync_send(sk, seq, rcd_sn, rx_ctx->resync_type); rcu_read_lock(); - netdev = READ_ONCE(tls_ctx->netdev); + netdev = rcu_dereference(tls_ctx->netdev); if (netdev) netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn, TLS_OFFLOAD_CTX_DIR_RX); @@ -1029,7 +1044,7 @@ static void tls_device_attach(struct tls_context *ctx, struct sock *sk, if (sk->sk_destruct != tls_device_sk_destruct) { refcount_set(&ctx->refcount, 1); dev_hold(netdev); - ctx->netdev = netdev; + RCU_INIT_POINTER(ctx->netdev, netdev); spin_lock_irq(&tls_device_lock); list_add_tail(&ctx->list, &tls_device_list); spin_unlock_irq(&tls_device_lock); @@ -1300,7 +1315,8 @@ void tls_device_offload_cleanup_rx(struct sock *sk) struct net_device *netdev; down_read(&device_offload_lock); - netdev = tls_ctx->netdev; + netdev = rcu_dereference_protected(tls_ctx->netdev, + lockdep_is_held(&device_offload_lock)); if (!netdev) goto out; @@ -1309,7 +1325,7 @@ void tls_device_offload_cleanup_rx(struct sock *sk) if (tls_ctx->tx_conf != TLS_HW) { dev_put(netdev); - tls_ctx->netdev = NULL; + rcu_assign_pointer(tls_ctx->netdev, NULL); } else { set_bit(TLS_RX_DEV_CLOSED, &tls_ctx->flags); } @@ -1329,7 +1345,11 @@ static int tls_device_down(struct net_device *netdev) spin_lock_irqsave(&tls_device_lock, flags); list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) { - if (ctx->netdev != netdev || + struct net_device *ctx_netdev = + rcu_dereference_protected(ctx->netdev, + lockdep_is_held(&device_offload_lock)); + + if (ctx_netdev != netdev || !refcount_inc_not_zero(&ctx->refcount)) continue; @@ -1346,7 +1366,7 @@ static int tls_device_down(struct net_device *netdev) /* Stop the RX and TX resync. * tls_dev_resync must not be called after tls_dev_del. */ - WRITE_ONCE(ctx->netdev, NULL); + rcu_assign_pointer(ctx->netdev, NULL); /* Start skipping the RX resync logic completely. */ set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags); diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c index 618cee704217..7dfc8023e0f1 100644 --- a/net/tls/tls_device_fallback.c +++ b/net/tls/tls_device_fallback.c @@ -426,7 +426,8 @@ struct sk_buff *tls_validate_xmit_skb(struct sock *sk, struct net_device *dev, struct sk_buff *skb) { - if (dev == tls_get_ctx(sk)->netdev || netif_is_bond_master(dev)) + if (dev == rcu_dereference_bh(tls_get_ctx(sk)->netdev) || + netif_is_bond_master(dev)) return skb; return tls_sw_fallback(sk, skb);
Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> --- .../mellanox/mlx5/core/en_accel/ktls_tx.c | 2 +- include/net/tls.h | 2 +- net/tls/tls_device.c | 38 ++++++++++++++----- net/tls/tls_device_fallback.c | 3 +- 4 files changed, 33 insertions(+), 12 deletions(-)