Message ID | 20220721091127.3209661-1-maximmi@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | f6336724a4d4220c89a4ec38bca84b03b178b1a3 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net/tls: Remove the context from the list in tls_device_down | expand |
On Thu, 21 Jul 2022 12:11:27 +0300 Maxim Mikityanskiy wrote: > tls_device_down takes a reference on all contexts it's going to move to > the degraded state (software fallback). If sk_destruct runs afterwards, > it can reduce the reference counter back to 1 and return early without > destroying the context. Then tls_device_down will release the reference > it took and call tls_device_free_ctx. However, the context will still > stay in tls_device_down_list forever. The list will contain an item, > memory for which is released, making a memory corruption possible. > > Fix the above bug by properly removing the context from all lists before > any call to tls_device_free_ctx. SGTM. The tls_device_down_list has no use, tho, is the plan to remove it later as a cleanup or your upcoming patches make use of it? We can delete it now if you don't have a preference, either way the fix is small.
Hello: This patch was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Thu, 21 Jul 2022 12:11:27 +0300 you wrote: > tls_device_down takes a reference on all contexts it's going to move to > the degraded state (software fallback). If sk_destruct runs afterwards, > it can reduce the reference counter back to 1 and return early without > destroying the context. Then tls_device_down will release the reference > it took and call tls_device_free_ctx. However, the context will still > stay in tls_device_down_list forever. The list will contain an item, > memory for which is released, making a memory corruption possible. > > [...] Here is the summary with links: - [net] net/tls: Remove the context from the list in tls_device_down https://git.kernel.org/netdev/net/c/f6336724a4d4 You are awesome, thank you!
On Fri, 2022-07-22 at 15:04 -0700, Jakub Kicinski wrote: > On Thu, 21 Jul 2022 12:11:27 +0300 Maxim Mikityanskiy wrote: > > tls_device_down takes a reference on all contexts it's going to move to > > the degraded state (software fallback). If sk_destruct runs afterwards, > > it can reduce the reference counter back to 1 and return early without > > destroying the context. Then tls_device_down will release the reference > > it took and call tls_device_free_ctx. However, the context will still > > stay in tls_device_down_list forever. The list will contain an item, > > memory for which is released, making a memory corruption possible. > > > > Fix the above bug by properly removing the context from all lists before > > any call to tls_device_free_ctx. > > SGTM. The tls_device_down_list has no use, tho, is the plan to remove > it later as a cleanup or your upcoming patches make use of it? I don't plan to remove it. Right, we never iterate over it, so instead of moving the context to tls_device_down_list, we can remove it from list, as long as we check to not remove it second time on destruction. However, this way we don't gain anything, but lose a debugging opportunity: for example, when list debugging is enabled, double list_del will be detected. So, it doesn't make sense to me to remove this list, but if you still want to do it, Tariq has a patch for this. > > We can delete it now if you don't have a preference, either way the fix > is small.
On Mon, 25 Jul 2022 14:35:08 +0000 Maxim Mikityanskiy wrote: > On Fri, 2022-07-22 at 15:04 -0700, Jakub Kicinski wrote: > > On Thu, 21 Jul 2022 12:11:27 +0300 Maxim Mikityanskiy wrote: > > > tls_device_down takes a reference on all contexts it's going to move to > > > the degraded state (software fallback). If sk_destruct runs afterwards, > > > it can reduce the reference counter back to 1 and return early without > > > destroying the context. Then tls_device_down will release the reference > > > it took and call tls_device_free_ctx. However, the context will still > > > stay in tls_device_down_list forever. The list will contain an item, > > > memory for which is released, making a memory corruption possible. > > > > > > Fix the above bug by properly removing the context from all lists before > > > any call to tls_device_free_ctx. > > > > SGTM. The tls_device_down_list has no use, tho, is the plan to remove > > it later as a cleanup or your upcoming patches make use of it? > > I don't plan to remove it. Right, we never iterate over it, so instead > of moving the context to tls_device_down_list, we can remove it from > list, as long as we check to not remove it second time on destruction. > > However, this way we don't gain anything, but lose a debugging > opportunity: for example, when list debugging is enabled, double > list_del will be detected. I see. I haven't actually checked if list_del_init() would do as well here. > So, it doesn't make sense to me to remove this list, but if you still > want to do it, Tariq has a patch for this. Fine either way, thanks for the explanation.
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 879b9024678e..9975df34d9c2 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -1376,8 +1376,13 @@ static int tls_device_down(struct net_device *netdev) * by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW. * Now release the ref taken above. */ - if (refcount_dec_and_test(&ctx->refcount)) + if (refcount_dec_and_test(&ctx->refcount)) { + /* sk_destruct ran after tls_device_down took a ref, and + * it returned early. Complete the destruction here. + */ + list_del(&ctx->list); tls_device_free_ctx(ctx); + } } up_write(&device_offload_lock);