Message ID | 20210821074339.16614-1-palok@marvell.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Leon Romanovsky |
Series | [for-rc] RDMA/qedr: qedr crash while running rdma-tool. |
On Sat, Aug 21, 2021 at 07:43:39AM +0000, Alok Prasad wrote:
> This patch fixes a crash caused by querying a QP.
> This is due to the fact that when no traffic is running,
> rdma_create_qp hasn't created any QP, hence qp->qed_qp is NULL.

This description is not correct. In all QP creation flows
dev->ops->rdma_create_qp() is called, and if qedr_create_qp() succeeds,
we will have a valid qp->qed_qp pointer.

>
> The call trace below is generated while using the iproute2 utility
> "rdma res show -dd qp" on an rdma interface.
>
> ==========================================================================
> [ 302.569794] BUG: kernel NULL pointer dereference, address: 0000000000000034
> ..
> [ 302.570378] Hardware name: Dell Inc. PowerEdge R720/0M1GCR, BIOS 1.2.6 05/10/2012
> [ 302.570500] RIP: 0010:qed_rdma_query_qp+0x33/0x1a0 [qed]
> [ 302.570861] RSP: 0018:ffffba560a08f580 EFLAGS: 00010206
> [ 302.570979] RAX: 0000000200000000 RBX: ffffba560a08f5b8 RCX: 0000000000000000
> [ 302.571100] RDX: ffffba560a08f5b8 RSI: 0000000000000000 RDI: ffff9807ee458090
> [ 302.571221] RBP: ffffba560a08f5a0 R08: 0000000000000000 R09: ffff9807890e7048
> [ 302.571342] R10: ffffba560a08f658 R11: 0000000000000000 R12: 0000000000000000
> [ 302.571462] R13: ffff9807ee458090 R14: ffff9807f0afb000 R15: ffffba560a08f7ec
> [ 302.571583] FS: 00007fbbf8bfe740(0000) GS:ffff980aafa00000(0000) knlGS:0000000000000000
> [ 302.571729] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 302.571847] CR2: 0000000000000034 CR3: 00000001720ba001 CR4: 00000000000606f0
> [ 302.571968] Call Trace:
> [ 302.572083] qedr_query_qp+0x82/0x360 [qedr]
> [ 302.572211] ib_query_qp+0x34/0x40 [ib_core]
> [ 302.572361] ? ib_query_qp+0x34/0x40 [ib_core]
> [ 302.572503] fill_res_qp_entry_query.isra.26+0x47/0x1d0 [ib_core]
> [ 302.572670] ? __nla_put+0x20/0x30
> [ 302.572788] ? nla_put+0x33/0x40
> [ 302.572901] fill_res_qp_entry+0xe3/0x120 [ib_core]
> [ 302.573058] res_get_common_dumpit+0x3f8/0x5d0 [ib_core]
> [ 302.573213] ? fill_res_cm_id_entry+0x1f0/0x1f0 [ib_core]
> [ 302.573377] nldev_res_get_qp_dumpit+0x1a/0x20 [ib_core]
> [ 302.573529] netlink_dump+0x156/0x2f0
> [ 302.573648] __netlink_dump_start+0x1ab/0x260
> [ 302.573765] rdma_nl_rcv+0x1de/0x330 [ib_core]
> [ 302.573918] ? nldev_res_get_cm_id_dumpit+0x20/0x20 [ib_core]
> [ 302.574074] netlink_unicast+0x1b8/0x270
> [ 302.574191] netlink_sendmsg+0x33e/0x470
> [ 302.574307] sock_sendmsg+0x63/0x70
> [ 302.574421] __sys_sendto+0x13f/0x180
> [ 302.574536] ? setup_sgl.isra.12+0x70/0xc0
> [ 302.574655] __x64_sys_sendto+0x28/0x30
> [ 302.574769] do_syscall_64+0x3a/0xb0
> [ 302.574884] entry_SYSCALL_64_after_hwframe+0x44/0xae
> ==========================================================================
>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Alok Prasad <palok@marvell.com>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
> index fdc47ef7d861..79603e3fe2db 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -2758,15 +2758,18 @@ int qedr_query_qp(struct ib_qp *ibqp,
>  	int rc = 0;
>
>  	memset(&params, 0, sizeof(params));
> -
> -	rc = dev->ops->rdma_query_qp(dev->rdma_ctx, qp->qed_qp, &params);
> -	if (rc)
> -		goto err;
> -

At that point, QP should be valid.

>  	memset(qp_attr, 0, sizeof(*qp_attr));
>  	memset(qp_init_attr, 0, sizeof(*qp_init_attr));
>
> -	qp_attr->qp_state = qedr_get_ibqp_state(params.state);
> +	if (qp->qed_qp)
> +		rc = dev->ops->rdma_query_qp(dev->rdma_ctx,
> +					     qp->qed_qp, &params);
> +
> +	if (qp->qp_type == IB_QPT_GSI)
> +		qp_attr->qp_state = QED_ROCE_QP_STATE_RTS;
> +	else
> +		qp_attr->qp_state = qedr_get_ibqp_state(params.state);
> +
>  	qp_attr->cur_qp_state = qedr_get_ibqp_state(params.state);
>  	qp_attr->path_mtu = ib_mtu_int_to_enum(params.mtu);
>  	qp_attr->path_mig_state = IB_MIG_MIGRATED;
> @@ -2810,8 +2813,6 @@ int qedr_query_qp(struct ib_qp *ibqp,
>
>  	DP_DEBUG(dev, QEDR_MSG_QP, "QEDR_QUERY_QP: max_inline_data=%d\n",
>  		 qp_attr->cap.max_inline_data);
> -
> -err:
>  	return rc;
>  }
>
> --
> 2.17.1
>
Hi Alok,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on rdma/for-next]
[also build test WARNING on v5.14-rc6 next-20210820]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alok-Prasad/RDMA-qedr-qedr-crash-while-running-rdma-tool/20210821-154459
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f9b6462f18a87caead9b362d4cdd049504ac3c62
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alok-Prasad/RDMA-qedr-qedr-crash-while-running-rdma-tool/20210821-154459
        git checkout f9b6462f18a87caead9b362d4cdd049504ac3c62
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/infiniband/hw/qedr/verbs.c: In function 'qedr_query_qp':
>> drivers/infiniband/hw/qedr/verbs.c:2754:35: warning: implicit conversion from 'enum qed_roce_qp_state' to 'enum ib_qp_state' [-Wenum-conversion]
    2754 |                 qp_attr->qp_state = QED_ROCE_QP_STATE_RTS;
         |                                   ^

vim +2754 drivers/infiniband/hw/qedr/verbs.c

  2735	
  2736	int qedr_query_qp(struct ib_qp *ibqp,
  2737			  struct ib_qp_attr *qp_attr,
  2738			  int attr_mask, struct ib_qp_init_attr *qp_init_attr)
  2739	{
  2740		struct qed_rdma_query_qp_out_params params;
  2741		struct qedr_qp *qp = get_qedr_qp(ibqp);
  2742		struct qedr_dev *dev = qp->dev;
  2743		int rc = 0;
  2744	
  2745		memset(&params, 0, sizeof(params));
  2746		memset(qp_attr, 0, sizeof(*qp_attr));
  2747		memset(qp_init_attr, 0, sizeof(*qp_init_attr));
  2748	
  2749		if (qp->qed_qp)
  2750			rc = dev->ops->rdma_query_qp(dev->rdma_ctx,
  2751						     qp->qed_qp, &params);
  2752	
  2753		if (qp->qp_type == IB_QPT_GSI)
> 2754			qp_attr->qp_state = QED_ROCE_QP_STATE_RTS;
  2755		else
  2756			qp_attr->qp_state = qedr_get_ibqp_state(params.state);
  2757	
  2758		qp_attr->cur_qp_state = qedr_get_ibqp_state(params.state);
  2759		qp_attr->path_mtu = ib_mtu_int_to_enum(params.mtu);
  2760		qp_attr->path_mig_state = IB_MIG_MIGRATED;
  2761		qp_attr->rq_psn = params.rq_psn;
  2762		qp_attr->sq_psn = params.sq_psn;
  2763		qp_attr->dest_qp_num = params.dest_qp;
  2764	
  2765		qp_attr->qp_access_flags = qedr_to_ib_qp_acc_flags(&params);
  2766	
  2767		qp_attr->cap.max_send_wr = qp->sq.max_wr;
  2768		qp_attr->cap.max_recv_wr = qp->rq.max_wr;
  2769		qp_attr->cap.max_send_sge = qp->sq.max_sges;
  2770		qp_attr->cap.max_recv_sge = qp->rq.max_sges;
  2771		qp_attr->cap.max_inline_data = dev->attr.max_inline;
  2772		qp_init_attr->cap = qp_attr->cap;
  2773	
  2774		qp_attr->ah_attr.type = RDMA_AH_ATTR_TYPE_ROCE;
  2775		rdma_ah_set_grh(&qp_attr->ah_attr, NULL,
  2776				params.flow_label, qp->sgid_idx,
  2777				params.hop_limit_ttl, params.traffic_class_tos);
  2778		rdma_ah_set_dgid_raw(&qp_attr->ah_attr, &params.dgid.bytes[0]);
  2779		rdma_ah_set_port_num(&qp_attr->ah_attr, 1);
  2780		rdma_ah_set_sl(&qp_attr->ah_attr, 0);
  2781		qp_attr->timeout = params.timeout;
  2782		qp_attr->rnr_retry = params.rnr_retry;
  2783		qp_attr->retry_cnt = params.retry_cnt;
  2784		qp_attr->min_rnr_timer = params.min_rnr_nak_timer;
  2785		qp_attr->pkey_index = params.pkey_index;
  2786		qp_attr->port_num = 1;
  2787		rdma_ah_set_path_bits(&qp_attr->ah_attr, 0);
  2788		rdma_ah_set_static_rate(&qp_attr->ah_attr, 0);
  2789		qp_attr->alt_pkey_index = 0;
  2790		qp_attr->alt_port_num = 0;
  2791		qp_attr->alt_timeout = 0;
  2792		memset(&qp_attr->alt_ah_attr, 0, sizeof(qp_attr->alt_ah_attr));
  2793	
  2794		qp_attr->sq_draining = (params.state == QED_ROCE_QP_STATE_SQD) ? 1 : 0;
  2795		qp_attr->max_dest_rd_atomic = params.max_dest_rd_atomic;
  2796		qp_attr->max_rd_atomic = params.max_rd_atomic;
  2797		qp_attr->en_sqd_async_notify = (params.sqd_async) ? 1 : 0;
  2798	
  2799		DP_DEBUG(dev, QEDR_MSG_QP, "QEDR_QUERY_QP: max_inline_data=%d\n",
  2800			 qp_attr->cap.max_inline_data);
  2801		return rc;
  2802	}
  2803	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
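The -Wenum-conversion warning reported above comes from assigning a constant of the driver-private state enum to a field typed with the core verbs state enum. The following is a minimal userspace sketch of one way to keep the two types apart, not the actual v2 of the patch. Every demo_/DEMO_ name is a hypothetical stand-in introduced for illustration; the enumerator order is an assumption and is not copied from the kernel headers. demo_hw_to_ib_state() plays the role that qedr_get_ibqp_state() plays in the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the core verbs QP state enum. */
enum demo_ib_qp_state {
	DEMO_IB_QPS_RESET,
	DEMO_IB_QPS_INIT,
	DEMO_IB_QPS_RTR,
	DEMO_IB_QPS_RTS,
	DEMO_IB_QPS_SQD,
	DEMO_IB_QPS_SQE,
	DEMO_IB_QPS_ERR,
};

/* Hypothetical stand-in for the driver-private QP state enum. */
enum demo_hw_qp_state {
	DEMO_HW_QP_STATE_RESET,
	DEMO_HW_QP_STATE_INIT,
	DEMO_HW_QP_STATE_RTR,
	DEMO_HW_QP_STATE_RTS,
	DEMO_HW_QP_STATE_SQD,
	DEMO_HW_QP_STATE_ERR,
	DEMO_HW_QP_STATE_SQE,
};

/* Hypothetical attribute struct; only the field under discussion. */
struct demo_qp_attr {
	enum demo_ib_qp_state qp_state;
};

/* Explicit translation between the two enums, one case per state. */
enum demo_ib_qp_state demo_hw_to_ib_state(enum demo_hw_qp_state s)
{
	switch (s) {
	case DEMO_HW_QP_STATE_RESET: return DEMO_IB_QPS_RESET;
	case DEMO_HW_QP_STATE_INIT:  return DEMO_IB_QPS_INIT;
	case DEMO_HW_QP_STATE_RTR:   return DEMO_IB_QPS_RTR;
	case DEMO_HW_QP_STATE_RTS:   return DEMO_IB_QPS_RTS;
	case DEMO_HW_QP_STATE_SQD:   return DEMO_IB_QPS_SQD;
	case DEMO_HW_QP_STATE_ERR:   return DEMO_IB_QPS_ERR;
	case DEMO_HW_QP_STATE_SQE:   return DEMO_IB_QPS_SQE;
	}
	return DEMO_IB_QPS_ERR;	/* unknown value: report the error state */
}

/*
 * Fill the reported state. The GSI branch uses a constant of the
 * destination enum type, so no implicit enum conversion takes place.
 */
void demo_fill_qp_state(struct demo_qp_attr *attr, int is_gsi,
			enum demo_hw_qp_state hw_state)
{
	assert(attr != NULL);

	if (is_gsi)
		attr->qp_state = DEMO_IB_QPS_RTS;
	else
		attr->qp_state = demo_hw_to_ib_state(hw_state);
}
```

Even where two enumerators happen to share a numeric value, the compiler still flags the assignment because the types differ, which is why the sketch writes the GSI case with the destination type's own constant.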
Hi Leon,

> On Sat, Aug 21, 2021 at 07:43:39AM +0000, Alok Prasad wrote:
> > This patch fixes a crash caused by querying a QP.
> > This is due to the fact that when no traffic is running,
> > rdma_create_qp hasn't created any QP, hence qp->qed_qp is NULL.
>
> This description is not correct. In all QP creation flows
> dev->ops->rdma_create_qp() is called, and if qedr_create_qp() succeeds,
> we will have a valid qp->qed_qp pointer.
>

In qedr_create_qp(), the first QP we create is the GSI QP, and the
function returns immediately after creating gsi_qp. Neither
qedr_create_user_qp() nor qedr_create_kernel_qp() is called, and those
are the functions that would in turn have called
dev->ops->rdma_create_qp(). Hence qp->qed_qp is NULL here.

Anyway, I will send a v2, as the kernel test robot reported an
enum warning.
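The flow described in this reply can be modelled in a few lines of userspace C. This is a simplified sketch under stated assumptions, not the driver code: all demo_ names are hypothetical, the hw_qp member stands in for qp->qed_qp, and the "backing" parameter stands in for whatever dev->ops->rdma_create_qp() would return. It shows the GSI create path returning early with a NULL hardware pointer, and a query path that dereferences the pointer only when it is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical QP types; only the distinction GSI vs. other matters here. */
enum demo_qp_type { DEMO_QPT_GSI, DEMO_QPT_RC };

/* Hypothetical hardware-side QP object. */
struct demo_hw_qp {
	int state;
};

/* Hypothetical driver-side QP; hw_qp stands in for qp->qed_qp. */
struct demo_qp {
	enum demo_qp_type qp_type;
	struct demo_hw_qp *hw_qp;
};

/*
 * Create path: the GSI case returns before any hardware QP is attached,
 * so hw_qp stays NULL for that QP.
 */
int demo_create_qp(struct demo_qp *qp, enum demo_qp_type type,
		   struct demo_hw_qp *backing)
{
	assert(qp != NULL);

	qp->qp_type = type;
	qp->hw_qp = NULL;

	if (type == DEMO_QPT_GSI)
		return 0;	/* early return, no hardware QP created */

	qp->hw_qp = backing;	/* stand-in for the hardware create call */
	return 0;
}

/*
 * Query path: read from the hardware QP only when one exists.
 * Without the check, the GSI QP would cause a NULL dereference.
 */
int demo_query_qp(const struct demo_qp *qp, int *state_out)
{
	assert(qp != NULL && state_out != NULL);

	*state_out = 0;
	if (qp->hw_qp)
		*state_out = qp->hw_qp->state;

	return 0;
}
```

In this model, querying a QP created with DEMO_QPT_GSI leaves the output at its zeroed default instead of faulting, which mirrors the intent of the `if (qp->qed_qp)` guard in the quoted patch.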
On 8/24/21 09:19, Alok Prasad wrote:
> Hi Leon,
>
>> On Sat, Aug 21, 2021 at 07:43:39AM +0000, Alok Prasad wrote:
>>> This patch fixes a crash caused by querying a QP.
>>> This is due to the fact that when no traffic is running,
>>> rdma_create_qp hasn't created any QP, hence qp->qed_qp is NULL.
>>
>> This description is not correct. In all QP creation flows
>> dev->ops->rdma_create_qp() is called, and if qedr_create_qp() succeeds,
>> we will have a valid qp->qed_qp pointer.
>>
>
> In qedr_create_qp(), the first QP we create is the GSI QP, and the
> function returns immediately after creating gsi_qp. Neither
> qedr_create_user_qp() nor qedr_create_kernel_qp() is called, and those
> are the functions that would in turn have called
> dev->ops->rdma_create_qp(). Hence qp->qed_qp is NULL here.
>
> Anyway, I will send a v2, as the kernel test robot reported an
> enum warning.

Hi Alok,

Could you please tell us when you plan to send a v2 for this patch?

We need this patch to get accepted in order to fix the distribution
version of the qedr driver.

Thanks,
Kamal
> -----Original Message-----
> From: Kamal Heib <kheib@redhat.com>
> Sent: 22 October 2021 21:20
> To: Alok Prasad <palok@marvell.com>
> Cc: jgg@ziepe.ca; dledford@redhat.com; Michal Kalderon <mkalderon@marvell.com>; Ariel
> Elior <aelior@marvell.com>; Shai Malin <smalin@marvell.com>; linux-rdma@vger.kernel.org;
> Leon Romanovsky <leon@kernel.org>
> Subject: [EXT] Re: [for-rc] RDMA/qedr: qedr crash while running rdma-tool.
>
> On 8/24/21 09:19, Alok Prasad wrote:
> > Anyway, I will send a v2, as the kernel test robot reported an
> > enum warning.
>
> Hi Alok,
>
> Could you please tell us when you plan to send a v2 for this patch?
>
> We need this patch to get accepted in order to fix the distribution
> version of the qedr driver.
>
> Thanks,
> Kamal

Just sent! Thanks for the reminder.

Regards,
Alok
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index fdc47ef7d861..79603e3fe2db 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2758,15 +2758,18 @@ int qedr_query_qp(struct ib_qp *ibqp,
 	int rc = 0;
 
 	memset(&params, 0, sizeof(params));
-
-	rc = dev->ops->rdma_query_qp(dev->rdma_ctx, qp->qed_qp, &params);
-	if (rc)
-		goto err;
-
 	memset(qp_attr, 0, sizeof(*qp_attr));
 	memset(qp_init_attr, 0, sizeof(*qp_init_attr));
 
-	qp_attr->qp_state = qedr_get_ibqp_state(params.state);
+	if (qp->qed_qp)
+		rc = dev->ops->rdma_query_qp(dev->rdma_ctx,
+					     qp->qed_qp, &params);
+
+	if (qp->qp_type == IB_QPT_GSI)
+		qp_attr->qp_state = QED_ROCE_QP_STATE_RTS;
+	else
+		qp_attr->qp_state = qedr_get_ibqp_state(params.state);
+
 	qp_attr->cur_qp_state = qedr_get_ibqp_state(params.state);
 	qp_attr->path_mtu = ib_mtu_int_to_enum(params.mtu);
 	qp_attr->path_mig_state = IB_MIG_MIGRATED;
@@ -2810,8 +2813,6 @@ int qedr_query_qp(struct ib_qp *ibqp,
 
 	DP_DEBUG(dev, QEDR_MSG_QP, "QEDR_QUERY_QP: max_inline_data=%d\n",
 		 qp_attr->cap.max_inline_data);
-
-err:
 	return rc;
 }