Message ID   | 20220715084216.4778-1-tariqt@nvidia.com (mailing list archive)
State        | Accepted
Commit       | f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b
Delegated to | Netdev Maintainers
Series       | [net] net/tls: Fix race in TLS device down flow
On Fri, 15 Jul 2022 11:42:16 +0300 Tariq Toukan wrote:
> Socket destruction flow and tls_device_down function sync against each
> other using tls_device_lock and the context refcount, to guarantee the
> device resources are freed via tls_dev_del() by the end of
> tls_device_down.
>
> In the following unfortunate flow, this won't happen:
> - refcount is decreased to zero in tls_device_sk_destruct.
> - tls_device_down starts, skips the context as refcount is zero, going
>   all the way until it flushes the gc work, and returns without freeing
>   the device resources.
> - only then, tls_device_queue_ctx_destruction is called, queues the gc
>   work and frees the context's device resources.
>
> Solve it by decreasing the refcount in the socket's destruction flow
> under the tls_device_lock, for perfect synchronization. This does not
> slow down the common likely destructor flow, in which both the refcount
> is decreased and the spinlock is acquired, anyway.
>
> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Oh, so it was already racy? Sad this has missed the PR, another delay
for your -next patches :S

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
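To make the interleaving concrete, the race can be sketched as a two-CPU
timeline (an illustration reconstructed from the description above, not
code from the tree):

/*
 * CPU 0 (socket destruction)            CPU 1 (tls_device_down)
 * --------------------------            -----------------------
 * tls_device_sk_destruct()
 *   refcount_dec_and_test()
 *     -> refcount drops to zero
 *                                        sees refcount == 0,
 *                                        skips the context,
 *                                        flushes the gc work,
 *                                        returns without freeing
 *                                        the device resources
 *   tls_device_queue_ctx_destruction()
 *     queues the gc work; the context's
 *     device resources are freed only
 *     now, after tls_device_down returned
 */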
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 15 Jul 2022 11:42:16 +0300 you wrote:
> Socket destruction flow and tls_device_down function sync against each
> other using tls_device_lock and the context refcount, to guarantee the
> device resources are freed via tls_dev_del() by the end of
> tls_device_down.
>
> In the following unfortunate flow, this won't happen:
> - refcount is decreased to zero in tls_device_sk_destruct.
> - tls_device_down starts, skips the context as refcount is zero, going
>   all the way until it flushes the gc work, and returns without freeing
>   the device resources.
> - only then, tls_device_queue_ctx_destruction is called, queues the gc
>   work and frees the context's device resources.
>
> [...]

Here is the summary with links:
  - [net] net/tls: Fix race in TLS device down flow
    https://git.kernel.org/netdev/net/c/f08d8c1bb97c

You are awesome, thank you!
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index ce827e79c66a..879b9024678e 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -97,13 +97,16 @@ static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
 	unsigned long flags;
 
 	spin_lock_irqsave(&tls_device_lock, flags);
+	if (unlikely(!refcount_dec_and_test(&ctx->refcount)))
+		goto unlock;
+
 	list_move_tail(&ctx->list, &tls_device_gc_list);
 
 	/* schedule_work inside the spinlock
 	 * to make sure tls_device_down waits for that work.
 	 */
 	schedule_work(&tls_device_gc_work);
-
+unlock:
 	spin_unlock_irqrestore(&tls_device_lock, flags);
 }
 
@@ -194,8 +197,7 @@ void tls_device_sk_destruct(struct sock *sk)
 		clean_acked_data_disable(inet_csk(sk));
 	}
 
-	if (refcount_dec_and_test(&tls_ctx->refcount))
-		tls_device_queue_ctx_destruction(tls_ctx);
+	tls_device_queue_ctx_destruction(tls_ctx);
 }
 EXPORT_SYMBOL_GPL(tls_device_sk_destruct);
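Applying the two hunks, the post-patch paths read roughly as below
(reconstructed from the diff; surrounding code elided, and the comment on
the new early-exit branch added here for illustration):

static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
{
	unsigned long flags;

	spin_lock_irqsave(&tls_device_lock, flags);
	/* If the refcount does not hit zero here, a reference is still
	 * held elsewhere; its holder completes the destruction.
	 */
	if (unlikely(!refcount_dec_and_test(&ctx->refcount)))
		goto unlock;

	list_move_tail(&ctx->list, &tls_device_gc_list);

	/* schedule_work inside the spinlock
	 * to make sure tls_device_down waits for that work.
	 */
	schedule_work(&tls_device_gc_work);
unlock:
	spin_unlock_irqrestore(&tls_device_lock, flags);
}

void tls_device_sk_destruct(struct sock *sk)
{
	/* ... */
	tls_device_queue_ctx_destruction(tls_ctx);
}

The key invariant after the patch: the refcount can only reach zero while
tls_device_lock is held, and the gc work is scheduled under that same lock,
so tls_device_down, which takes the lock and flushes the work, can no
longer observe a zero refcount without the corresponding cleanup having
been queued. As the commit message notes, the likely destructor path
already both decremented the refcount and took the spinlock, so folding
the decrement under the lock costs nothing in the common case.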