
[v3,for-rc] RDMA/qedr: qedr crash while running rdma-tool

Message ID 20211027184329.18454-1-palok@marvell.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Series [v3,for-rc] RDMA/qedr: qedr crash while running rdma-tool

Commit Message

Alok Prasad Oct. 27, 2021, 6:43 p.m. UTC
This patch fixes a crash caused by querying a QP.
    It also corrects the reported state of the GSI QP.

    The call trace below is generated when running the iproute2 command
    "rdma res show -dd qp" on an RDMA interface.
    ======================================================================
    [  302.569794] BUG: kernel NULL pointer dereference, address: 0000000000000034
    ..
    [  302.570378] Hardware name: Dell Inc. PowerEdge R720/0M1GCR, BIOS 1.2.6 05/10/2012
    [  302.570500] RIP: 0010:qed_rdma_query_qp+0x33/0x1a0 [qed]
    [  302.570861] RSP: 0018:ffffba560a08f580 EFLAGS: 00010206
    [  302.570979] RAX: 0000000200000000 RBX: ffffba560a08f5b8 RCX: 0000000000000000
    [  302.571100] RDX: ffffba560a08f5b8 RSI: 0000000000000000 RDI: ffff9807ee458090
    [  302.571221] RBP: ffffba560a08f5a0 R08: 0000000000000000 R09: ffff9807890e7048
    [  302.571342] R10: ffffba560a08f658 R11: 0000000000000000 R12: 0000000000000000
    [  302.571462] R13: ffff9807ee458090 R14: ffff9807f0afb000 R15: ffffba560a08f7ec
    [  302.571583] FS:  00007fbbf8bfe740(0000) GS:ffff980aafa00000(0000) knlGS:0000000000000000
    [  302.571729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  302.571847] CR2: 0000000000000034 CR3: 00000001720ba001 CR4: 00000000000606f0
    [  302.571968] Call Trace:
    [  302.572083]  qedr_query_qp+0x82/0x360 [qedr]
    [  302.572211]  ib_query_qp+0x34/0x40 [ib_core]
    [  302.572361]  ? ib_query_qp+0x34/0x40 [ib_core]
    [  302.572503]  fill_res_qp_entry_query.isra.26+0x47/0x1d0 [ib_core]
    [  302.572670]  ? __nla_put+0x20/0x30
    [  302.572788]  ? nla_put+0x33/0x40
    [  302.572901]  fill_res_qp_entry+0xe3/0x120 [ib_core]
    [  302.573058]  res_get_common_dumpit+0x3f8/0x5d0 [ib_core]
    [  302.573213]  ? fill_res_cm_id_entry+0x1f0/0x1f0 [ib_core]
    [  302.573377]  nldev_res_get_qp_dumpit+0x1a/0x20 [ib_core]
    [  302.573529]  netlink_dump+0x156/0x2f0
    [  302.573648]  __netlink_dump_start+0x1ab/0x260
    [  302.573765]  rdma_nl_rcv+0x1de/0x330 [ib_core]
    [  302.573918]  ? nldev_res_get_cm_id_dumpit+0x20/0x20 [ib_core]
    [  302.574074]  netlink_unicast+0x1b8/0x270
    [  302.574191]  netlink_sendmsg+0x33e/0x470
    [  302.574307]  sock_sendmsg+0x63/0x70
    [  302.574421]  __sys_sendto+0x13f/0x180
    [  302.574536]  ? setup_sgl.isra.12+0x70/0xc0
    [  302.574655]  __x64_sys_sendto+0x28/0x30
    [  302.574769]  do_syscall_64+0x3a/0xb0
    [  302.574884]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    =====================================================================
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
---
v3:(from [2]):
- Call the query QP callback only for non-GSI QPs, as
           suggested by Kamal.
  [2] https://patchwork.kernel.org/project/linux-rdma/patch/20211023164557.7921-1-palok@marvell.com/

v2:(from [1]):
         - Changed the description.
         - Corrected enum type.
  [1] https://patchwork.kernel.org/project/linux-rdma/patch/20210821074339.16614-1-palok@marvell.com/
---
 drivers/infiniband/hw/qedr/verbs.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
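A note on reading the trace: on x86-64 the second function argument is passed in RSI, and here RSI is 0 while CR2 (the faulting address) is 0x34, a small offset from NULL. That is consistent with qedr_query_qp() handing a NULL qp->qed_qp to qed_rdma_query_qp() for the GSI QP, which fits the v3 change of skipping the query for GSI QPs. This register reading is an interpretation of the dump, not something the patch states. The user-space sketch below models the fixed control flow; every type and function in it (model_qp, model_hw_query(), model_query_qp()) is a hypothetical stand-in, not the real qedr/qed API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real qedr/qed definitions. */
enum model_qp_type { MODEL_QPT_RC, MODEL_QPT_GSI };
enum model_qp_state { MODEL_QPS_RESET = 0, MODEL_QPS_INIT, MODEL_QPS_RTR, MODEL_QPS_RTS };

struct model_hw_qp {              /* plays the role of qp->qed_qp */
	enum model_qp_state state;
};

struct model_qp {
	enum model_qp_type type;
	struct model_hw_qp *hw;   /* assumed NULL for a GSI QP */
};

/* Stand-in for dev->ops->rdma_query_qp(): it dereferences hw
 * unconditionally, so it must never be called with NULL. */
static int model_hw_query(const struct model_hw_qp *hw, enum model_qp_state *state)
{
	*state = hw->state;
	return 0;
}

/* Mirrors the v3 control flow: only non-GSI QPs reach the hardware query. */
static int model_query_qp(const struct model_qp *qp, enum model_qp_state *out)
{
	assert(qp != NULL && out != NULL);

	if (qp->type != MODEL_QPT_GSI) {
		int rc = model_hw_query(qp->hw, out);

		if (rc)
			return rc;
	} else {
		/* No hardware object to ask; report RTS as the patch does. */
		*out = MODEL_QPS_RTS;
	}
	return 0;
}
```

Without the type check, calling model_query_qp() on a GSI QP would dereference a NULL hw pointer, which is the user-space analogue of the oops above.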

Comments

Leon Romanovsky Oct. 28, 2021, 6:22 a.m. UTC | #1
On Wed, Oct 27, 2021 at 06:43:29PM +0000, Alok Prasad wrote:
>     This patch fixes crash caused by querying qp.
>     Also corrects the state of gsi qp.

Please don't add extra space in front of every line.

> 
>     Below call trace is generated while using iproute2 utility
>     "rdma res show -dd qp" on rdma interface.
>     ======================================================================
>     [...]
>     =====================================================================

Extra line here.

> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Alok Prasad <palok@marvell.com>
> ---
> v3:(from [2]):
>          - Call query qp callback only in case of non-gsi qp as
>            suggested by Kamal.
>   [2] https://patchwork.kernel.org/project/linux-rdma/patch/20211023164557.7921-1-palok@marvell.com/
> 
> v2:(from [1]):
>          - Change description.
>          - Corrected enum type.
>   [1] https://patchwork.kernel.org/project/linux-rdma/patch/20210821074339.16614-1-palok@marvell.com/
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
> index dcb3653db72d..3d4e4a766574 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -2744,15 +2744,18 @@ int qedr_query_qp(struct ib_qp *ibqp,
>  	int rc = 0;
>  
>  	memset(&params, 0, sizeof(params));
> -
> -	rc = dev->ops->rdma_query_qp(dev->rdma_ctx, qp->qed_qp, &params);
> -	if (rc)
> -		goto err;
> -
>  	memset(qp_attr, 0, sizeof(*qp_attr));
>  	memset(qp_init_attr, 0, sizeof(*qp_init_attr));
>  
> -	qp_attr->qp_state = qedr_get_ibqp_state(params.state);
> +	if (qp->qp_type != IB_QPT_GSI) {
> +		rc = dev->ops->rdma_query_qp(dev->rdma_ctx, qp->qed_qp, &params);
> +		if (rc)
> +			goto err;
> +		qp_attr->qp_state = qedr_get_ibqp_state(params.state);
> +	} else {
> +		qp_attr->qp_state = qedr_get_ibqp_state(QED_ROCE_QP_STATE_RTS);

This line makes me wonder if any qp_attr assignments below are correct.
For example, cur_qp_state will remain QED_ROCE_QP_STATE_RESET.

Thanks

> +	}
> +
>  	qp_attr->cur_qp_state = qedr_get_ibqp_state(params.state);
>  	qp_attr->path_mtu = ib_mtu_int_to_enum(params.mtu);
>  	qp_attr->path_mig_state = IB_MIG_MIGRATED;
> -- 
> 2.17.1
>
Jason Gunthorpe Oct. 29, 2021, 3:06 p.m. UTC | #2
On Wed, Oct 27, 2021 at 06:43:29PM +0000, Alok Prasad wrote:
> This patch fixes crash caused by querying qp.
>     Also corrects the state of gsi qp.

Applied to for-rc; I added a Fixes line.

Jason

Patch

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index dcb3653db72d..3d4e4a766574 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2744,15 +2744,18 @@  int qedr_query_qp(struct ib_qp *ibqp,
 	int rc = 0;
 
 	memset(&params, 0, sizeof(params));
-
-	rc = dev->ops->rdma_query_qp(dev->rdma_ctx, qp->qed_qp, &params);
-	if (rc)
-		goto err;
-
 	memset(qp_attr, 0, sizeof(*qp_attr));
 	memset(qp_init_attr, 0, sizeof(*qp_init_attr));
 
-	qp_attr->qp_state = qedr_get_ibqp_state(params.state);
+	if (qp->qp_type != IB_QPT_GSI) {
+		rc = dev->ops->rdma_query_qp(dev->rdma_ctx, qp->qed_qp, &params);
+		if (rc)
+			goto err;
+		qp_attr->qp_state = qedr_get_ibqp_state(params.state);
+	} else {
+		qp_attr->qp_state = qedr_get_ibqp_state(QED_ROCE_QP_STATE_RTS);
+	}
+
 	qp_attr->cur_qp_state = qedr_get_ibqp_state(params.state);
 	qp_attr->path_mtu = ib_mtu_int_to_enum(params.mtu);
 	qp_attr->path_mig_state = IB_MIG_MIGRATED;
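Leon's review point can be made concrete with a small model. In the applied patch, params is zeroed by memset() and, for a GSI QP, never filled in, so cur_qp_state is computed from QED_ROCE_QP_STATE_RESET (value 0) while qp_state reports RTS. The sketch below contrasts that with one possible alternative, computing the state once and using it for both fields. It is a hypothetical user-space model: the enum values and function names are illustrative only, and the alternative is not what was applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; values chosen only for illustration. */
enum model_qp_type { MODEL_QPT_RC, MODEL_QPT_GSI };
enum model_qp_state { MODEL_QPS_RESET = 0, MODEL_QPS_INIT, MODEL_QPS_RTR, MODEL_QPS_RTS };

struct model_qp_attr {
	enum model_qp_state qp_state;
	enum model_qp_state cur_qp_state;
};

/* Models the applied v3 logic; hw_state is what the hardware would report. */
static void fill_attr_v3(enum model_qp_type type, enum model_qp_state hw_state,
			 struct model_qp_attr *attr)
{
	enum model_qp_state params_state = MODEL_QPS_RESET; /* memset(&params, 0, ...) */

	if (type != MODEL_QPT_GSI) {
		params_state = hw_state;            /* query fills params */
		attr->qp_state = params_state;
	} else {
		attr->qp_state = MODEL_QPS_RTS;
	}
	attr->cur_qp_state = params_state;          /* still RESET for GSI */
}

/* Alternative: decide the state once and use it for both fields. */
static void fill_attr_single_state(enum model_qp_type type, enum model_qp_state hw_state,
				   struct model_qp_attr *attr)
{
	enum model_qp_state state = (type == MODEL_QPT_GSI) ? MODEL_QPS_RTS : hw_state;

	assert(attr != NULL);
	attr->qp_state = state;
	attr->cur_qp_state = state;
}
```

In the model, a GSI QP gets qp_state == RTS but cur_qp_state == RESET from fill_attr_v3(), and RTS for both from fill_attr_single_state(). Other fields that the driver derives from params would need the same scrutiny for GSI QPs, as the review suggests.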