Message ID | 20210812053056.1699-1-longpeng2@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | [resend] vsock/virtio: avoid potential deadlock when vsock device remove |
On Thu, Aug 12, 2021 at 01:30:56PM +0800, Longpeng(Mike) wrote:
>There's a potential deadlock case when removing the vsock device or
>processing the RESET event:
>
>  vsock_for_each_connected_socket:
>      spin_lock_bh(&vsock_table_lock) ----------- (1)
>      ...
>          virtio_vsock_reset_sock:
>              lock_sock(sk) --------------------- (2)
>      ...
>      spin_unlock_bh(&vsock_table_lock)
>
>lock_sock() may schedule if the 'sk' is owned by another thread at the
>same time, so we would receive a "scheduling while atomic" warning.
>
>Even worse, if the next task (selected by the scheduler) tries to
>release a 'sk', it needs to acquire vsock_table_lock and the deadlock
>occurs, causing the system to enter a softlockup state.
>  Call trace:
>   queued_spin_lock_slowpath
>   vsock_remove_bound
>   vsock_remove_sock
>   virtio_transport_release
>   __vsock_release
>   vsock_release
>   __sock_release
>   sock_close
>   __fput
>   ____fput
>
>So we should not take sk_lock in this case, just like the behavior in
>vhost_vsock or vmci.

The difference with vhost_vsock is that here we call it also when we
receive an event in the event queue (for example because we are
migrating the VM).

I think the idea of this lock was to prevent concurrency with the RX
loop, but actually if a socket is connected, it can only change state
to TCP_CLOSING/TCP_CLOSE. I don't think there is any problem in not
taking the lock; at most we could take the rx_lock in
virtio_vsock_event_handle(), but I'm not sure it's necessary.

>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Cc: Stefano Garzarella <sgarzare@redhat.com>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>

We should add:

Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")

>Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
>---
> net/vmw_vsock/virtio_transport.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index e0c2c99..4f7c99d 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -357,11 +357,14 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
>
> static void virtio_vsock_reset_sock(struct sock *sk)
> {
>-        lock_sock(sk);
>+        /* vmci_transport.c doesn't take sk_lock here either. At least we're
>+         * under vsock_table_lock so the sock cannot disappear while we're
>+         * executing.
>+         */
>+
>         sk->sk_state = TCP_CLOSE;
>         sk->sk_err = ECONNRESET;
>         sk_error_report(sk);
>-        release_sock(sk);
> }
>
> static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
>--
>1.8.3.1
>

With the Fixes tag added:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
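As a purely illustrative aside on the optional rx_lock idea mentioned above, here is a rough sketch of what it might look like, assuming the v5.14-era virtio_vsock_event_handle() and the existing rx_lock mutex in struct virtio_vsock. This is not part of the applied patch, and the thread suggests it is probably unnecessary.

```c
/* Hypothetical sketch only -- NOT part of the applied patch.
 * Assumes the v5.14-era net/vmw_vsock/virtio_transport.c layout, where
 * struct virtio_vsock already contains an rx_lock mutex.
 */
static void virtio_vsock_event_handle(struct virtio_vsock *vsock,
                                      struct virtio_vsock_event *event)
{
        switch (le32_to_cpu(event->id)) {
        case VIRTIO_VSOCK_EVENT_TRANSPORT_RESET:
                /* Serialize the reset handling against the RX loop,
                 * as optionally suggested in the review above.
                 */
                mutex_lock(&vsock->rx_lock);
                virtio_vsock_update_guest_cid(vsock);
                vsock_for_each_connected_socket(virtio_vsock_reset_sock);
                mutex_unlock(&vsock->rx_lock);
                break;
        }
}
```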
On Thu, 12 Aug 2021 10:03:32 +0200 Stefano Garzarella wrote:
> On Thu, Aug 12, 2021 at 01:30:56PM +0800, Longpeng(Mike) wrote:
> >There's a potential deadlock case when removing the vsock device or
> >processing the RESET event:
> >
> >  vsock_for_each_connected_socket:
> >      spin_lock_bh(&vsock_table_lock) ----------- (1)
> >      ...
> >          virtio_vsock_reset_sock:
> >              lock_sock(sk) --------------------- (2)
> >      ...
> >      spin_unlock_bh(&vsock_table_lock)
> >
> >lock_sock() may schedule if the 'sk' is owned by another thread at the
> >same time, so we would receive a "scheduling while atomic" warning.
> >
> >Even worse, if the next task (selected by the scheduler) tries to
> >release a 'sk', it needs to acquire vsock_table_lock and the deadlock
> >occurs, causing the system to enter a softlockup state.
> >  Call trace:
> >   queued_spin_lock_slowpath
> >   vsock_remove_bound
> >   vsock_remove_sock
> >   virtio_transport_release
> >   __vsock_release
> >   vsock_release
> >   __sock_release
> >   sock_close
> >   __fput
> >   ____fput
> >
> >So we should not take sk_lock in this case, just like the behavior in
> >vhost_vsock or vmci.
>
> We should add:
> Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")

Added.

> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

And applied, thanks!
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e0c2c99..4f7c99d 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -357,11 +357,14 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
 
 static void virtio_vsock_reset_sock(struct sock *sk)
 {
-        lock_sock(sk);
+        /* vmci_transport.c doesn't take sk_lock here either. At least we're
+         * under vsock_table_lock so the sock cannot disappear while we're
+         * executing.
+         */
+
         sk->sk_state = TCP_CLOSE;
         sk->sk_err = ECONNRESET;
         sk_error_report(sk);
-        release_sock(sk);
 }
 
 static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
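For context on the comment added above: the callback runs inside vsock_for_each_connected_socket(), which holds vsock_table_lock with bottom halves disabled for the whole walk. A simplified sketch of that caller as it looked around the time of this patch (based on net/vmw_vsock/af_vsock.c, trimmed and not verbatim) shows both why the callback must not sleep and why the sock cannot disappear while it runs:

```c
/* Simplified sketch based on net/vmw_vsock/af_vsock.c around the time of
 * this patch; trimmed for illustration, not a verbatim copy.
 */
void vsock_for_each_connected_socket(void (*fn)(struct sock *sk))
{
        int i;

        spin_lock_bh(&vsock_table_lock);        /* atomic context: no sleeping */

        for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++) {
                struct vsock_sock *vsk;

                list_for_each_entry(vsk, &vsock_connected_table[i],
                                    connected_table)
                        /* e.g. virtio_vsock_reset_sock(); the entry cannot be
                         * removed from the table while the lock is held.
                         */
                        fn(sk_vsock(vsk));
        }

        spin_unlock_bh(&vsock_table_lock);
}
```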
There's a potential deadlock case when removing the vsock device or
processing the RESET event:

  vsock_for_each_connected_socket:
      spin_lock_bh(&vsock_table_lock) ----------- (1)
      ...
          virtio_vsock_reset_sock:
              lock_sock(sk) --------------------- (2)
      ...
      spin_unlock_bh(&vsock_table_lock)

lock_sock() may schedule if the 'sk' is owned by another thread at the
same time, so we would receive a "scheduling while atomic" warning.

Even worse, if the next task (selected by the scheduler) tries to
release a 'sk', it needs to acquire vsock_table_lock and the deadlock
occurs, causing the system to enter a softlockup state.

  Call trace:
   queued_spin_lock_slowpath
   vsock_remove_bound
   vsock_remove_sock
   virtio_transport_release
   __vsock_release
   vsock_release
   __sock_release
   sock_close
   __fput
   ____fput

So we should not take sk_lock in this case, just like the behavior in
vhost_vsock or vmci.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 net/vmw_vsock/virtio_transport.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
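To make the "scheduling while atomic" step concrete: when the socket is already owned by another context, lock_sock() falls back to a slow path that sleeps. A rough sketch of that path, based on net/core/sock.c (simplified, not verbatim):

```c
/* Rough sketch of lock_sock()'s slow path, based on net/core/sock.c;
 * simplified and not verbatim.
 */
static void __lock_sock(struct sock *sk)
{
        DEFINE_WAIT(wait);

        for (;;) {
                prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
                                          TASK_UNINTERRUPTIBLE);
                spin_unlock_bh(&sk->sk_lock.slock);
                /* Sleeping here while the caller still holds
                 * vsock_table_lock (BHs disabled) is what triggers
                 * "scheduling while atomic".
                 */
                schedule();
                spin_lock_bh(&sk->sk_lock.slock);
                if (!sock_owned_by_user(sk))
                        break;
        }
        finish_wait(&sk->sk_lock.wq, &wait);
}

void lock_sock_nested(struct sock *sk, int subclass)
{
        might_sleep();
        spin_lock_bh(&sk->sk_lock.slock);
        if (sk->sk_lock.owned)
                __lock_sock(sk);        /* slow path: may sleep */
        sk->sk_lock.owned = 1;
        spin_unlock_bh(&sk->sk_lock.slock);
}
```

If the task that gets scheduled in then tries to release a socket, it spins on vsock_table_lock (still held by the first task), producing the softlockup shown in the call trace above.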