
[jgg-for-next] RDMA/rxe: Fix spinlock recursion deadlock on requester

Message ID: 20230418090642.1849358-1-matsuda-daisuke@fujitsu.com (mailing list archive)
State: Accepted
Delegated to: Jason Gunthorpe
Series: [jgg-for-next] RDMA/rxe: Fix spinlock recursion deadlock on requester

Commit Message

Daisuke Matsuda (Fujitsu) April 18, 2023, 9:06 a.m. UTC
After applying commit f605f26ea196, the following deadlock is observed:
 Call Trace:
  <IRQ>
  _raw_spin_lock_bh+0x29/0x30
  check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe]
  rxe_rcv+0x173/0x3d0 [rdma_rxe]
  rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe]
  ? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe]
  udp_queue_rcv_one_skb+0x258/0x520
  udp_unicast_rcv_skb+0x75/0x90
  __udp4_lib_rcv+0x364/0x5c0
  ip_protocol_deliver_rcu+0xa7/0x160
  ip_local_deliver_finish+0x73/0xa0
  ip_sublist_rcv_finish+0x80/0x90
  ip_sublist_rcv+0x191/0x220
  ip_list_rcv+0x132/0x160
  __netif_receive_skb_list_core+0x297/0x2c0
  netif_receive_skb_list_internal+0x1c5/0x300
  napi_complete_done+0x6f/0x1b0
  virtnet_poll+0x1f4/0x2d0 [virtio_net]
  __napi_poll+0x2c/0x1b0
  net_rx_action+0x293/0x350
  ? __napi_schedule+0x79/0x90
  __do_softirq+0xcb/0x2ab
  __irq_exit_rcu+0xb9/0xf0
  common_interrupt+0x80/0xa0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x22/0x40
  RIP: 0010:_raw_spin_lock+0x17/0x30
  rxe_requester+0xe4/0x8f0 [rdma_rxe]
  ? xas_load+0x9/0xa0
  ? xa_load+0x70/0xb0
  do_task+0x64/0x1f0 [rdma_rxe]
  rxe_post_send+0x54/0x110 [rdma_rxe]
  ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs]
  ? netif_receive_skb_list_internal+0x1e3/0x300
  ib_uverbs_write+0x3c8/0x500 [ib_uverbs]
  vfs_write+0xc5/0x3b0
  ksys_write+0xab/0xe0
  ? syscall_trace_enter.constprop.0+0x126/0x1a0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
  </TASK>

The deadlock is easily reproducible with perftest. Fix it by disabling
softirq when acquiring the lock in process context.

Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock")
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
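
The failing pattern, in outline: the requester takes qp->state_lock with plain spin_lock() in process context, a softirq fires on the same CPU, and the receive path then spins on a lock its own CPU already holds. A minimal sketch of the bug, using hypothetical requester_path()/receive_path() helpers that stand in for the rxe call chains in the trace above:

 #include <linux/spinlock.h>

 static DEFINE_SPINLOCK(state_lock);	/* stands in for qp->state_lock */

 /* Process context, e.g. rxe_post_send() -> rxe_requester() */
 static void requester_path(void)
 {
 	spin_lock(&state_lock);		/* BUG: softirqs stay enabled */
 	/* a softirq may preempt this CPU while the lock is held */
 	spin_unlock(&state_lock);
 }

 /* Softirq context, e.g. NAPI -> rxe_udp_encap_recv() -> rxe_rcv() */
 static void receive_path(void)
 {
 	spin_lock_bh(&state_lock);	/* spins forever if it interrupted
 					 * requester_path() on this CPU */
 	spin_unlock_bh(&state_lock);
 }

With the fix, requester_path() takes the lock with spin_lock_bh()/spin_unlock_bh() as well, so softirqs cannot run on the local CPU while the lock is held and the recursion cannot occur.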

Comments

Zhu Yanjun April 18, 2023, 1:43 p.m. UTC | #1
On Tue, Apr 18, 2023 at 5:07 PM Daisuke Matsuda
<matsuda-daisuke@fujitsu.com> wrote:
>
> After applying commit f605f26ea196, the following deadlock is observed:
> [...]
>
> The deadlock is easily reproducible with perftest. Fix it by disabling
> softirq when acquiring the lock in process context.

I am fine with this patch. Thanks.

Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>

Zhu Yanjun

Jason Gunthorpe April 21, 2023, 3:39 p.m. UTC | #2
On Tue, Apr 18, 2023 at 06:06:42PM +0900, Daisuke Matsuda wrote:
> After applying commit f605f26ea196, the following deadlock is observed:
> [...]
>
> The deadlock is easily reproducible with perftest. Fix it by disabling
> softirq when acquiring the lock in process context.
> 
> Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock")
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>

Applied to for-next, thanks

Jason

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 8e50d116d273..65134a9aefe7 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -180,13 +180,13 @@ static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
 	if (wqe == NULL)
 		return NULL;
 
-	spin_lock(&qp->state_lock);
+	spin_lock_bh(&qp->state_lock);
 	if (unlikely((qp_state(qp) == IB_QPS_SQD) &&
 		     (wqe->state != wqe_state_processing))) {
-		spin_unlock(&qp->state_lock);
+		spin_unlock_bh(&qp->state_lock);
 		return NULL;
 	}
-	spin_unlock(&qp->state_lock);
+	spin_unlock_bh(&qp->state_lock);
 
 	wqe->mask = wr_opcode_mask(wqe->wr.opcode, qp);
 	return wqe;
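
For reference, the _bh variants close the window because they disable bottom-half (softirq) processing on the local CPU before taking the raw lock. A simplified sketch of the idea, not the real implementation (the actual helpers in include/linux/spinlock.h also handle preemption and lockdep bookkeeping):

 #include <linux/bottom_half.h>
 #include <linux/spinlock.h>

 static inline void sketch_spin_lock_bh(spinlock_t *lock)
 {
 	local_bh_disable();	/* softirqs can no longer preempt this CPU */
 	spin_lock(lock);
 }

 static inline void sketch_spin_unlock_bh(spinlock_t *lock)
 {
 	spin_unlock(lock);
 	local_bh_enable();	/* pending softirqs may run at this point */
 }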