Message ID | 1616420132-31005-1-git-send-email-haakon.bugge@oracle.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Series | [for-rc] RDMA/core: Fix corrupted SL on passive side |
On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
> Subnet Local is zero.
>
> In cm_req_handler(), the cm_process_routed_req() function is
> called. Since the Primary Subnet Local value is zero in the request,
> and since this is RoCE (Primary Local LID is permissive), the
> following statement will be executed:
>
>       IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>
> This corrupts SL in req_msg if it was different from zero. In other
> words, a request to setup a connection using an SL != zero, will not
> be honored, and a connection using SL zero will be created instead.
>
> Fixed by not calling cm_process_routed_req() on RoCE systems.
>
> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>  drivers/infiniband/core/cm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 3d194bb..6adbaea 100644
> +++ b/drivers/infiniband/core/cm.c
> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>  		goto destroy;
>  	}
>
> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);

why use ah_attr.type when a few lines below we have:

	if (gid_attr &&
	    rdma_protocol_roce(work->port->cm_dev->ib_device,
			       work->port->port_num)) {

?

I suspect you can just move this into the else?

Jason
> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>> Subnet Local is zero.
>>
>> In cm_req_handler(), the cm_process_routed_req() function is
>> called. Since the Primary Subnet Local value is zero in the request,
>> and since this is RoCE (Primary Local LID is permissive), the
>> following statement will be executed:
>>
>>       IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>
>> This corrupts SL in req_msg if it was different from zero. In other
>> words, a request to setup a connection using an SL != zero, will not
>> be honored, and a connection using SL zero will be created instead.
>>
>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>
>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>  drivers/infiniband/core/cm.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index 3d194bb..6adbaea 100644
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>  		goto destroy;
>>  	}
>>
>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>
> why use ah_attr.type when a few lines below we have:
>
> 	if (gid_attr &&
> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> 			       work->port->port_num)) {
>
> ?
>
> I suspect you can just move this into the else?

I can counter that by saying ah_attr.type is used ~10 lines further down in the conditional call to sa_path_set_dmac() ;-)


Further, in

> 	if (gid_attr &&
> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> 			       work->port->port_num)) {

I cannot really see how gid_attr could be null. If ib_init_ah_attr_from_wc() succeeds, it is set after the call to cm_init_av_for_response() above. May be using ah_attr.type in this test instead, for uniformity and readability?

I have no strong opinion.

Let me know your preference.


Thxs, Håkon
> On 24 Mar 2021, at 15:34, Håkon Bugge <haakon.bugge@oracle.com> wrote:
>
>
>
>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>> Subnet Local is zero.
>>>
>>> In cm_req_handler(), the cm_process_routed_req() function is
>>> called. Since the Primary Subnet Local value is zero in the request,
>>> and since this is RoCE (Primary Local LID is permissive), the
>>> following statement will be executed:
>>>
>>>       IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>
>>> This corrupts SL in req_msg if it was different from zero. In other
>>> words, a request to setup a connection using an SL != zero, will not
>>> be honored, and a connection using SL zero will be created instead.
>>>
>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>>
>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>  drivers/infiniband/core/cm.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index 3d194bb..6adbaea 100644
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>>  		goto destroy;
>>>  	}
>>>
>>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>
>> why use ah_attr.type when a few lines below we have:
>>
>> 	if (gid_attr &&
>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>> 			       work->port->port_num)) {
>>
>> ?
>>
>> I suspect you can just move this into the else?
>
> I can counter that by saying ah_attr.type is used ~10 lines further down in the conditional call to sa_path_set_dmac() ;-)
>
>
> Further, in
>
>> 	if (gid_attr &&
>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>> 			       work->port->port_num)) {
>
> I cannot really see how gid_attr could be null. If ib_init_ah_attr_from_wc() succeeds, it is set after the call to cm_init_av_for_response() above. May be using ah_attr.type in this test instead, for uniformity and readability?
>
> I have no strong opinion.
>
> Let me know your preference.

A gentle ping.


Thxs, Håkon

>
>
> Thxs, Håkon
On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
>
>
> > On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
> >> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
> >> Subnet Local is zero.
> >>
> >> In cm_req_handler(), the cm_process_routed_req() function is
> >> called. Since the Primary Subnet Local value is zero in the request,
> >> and since this is RoCE (Primary Local LID is permissive), the
> >> following statement will be executed:
> >>
> >>       IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
> >>
> >> This corrupts SL in req_msg if it was different from zero. In other
> >> words, a request to setup a connection using an SL != zero, will not
> >> be honored, and a connection using SL zero will be created instead.
> >>
> >> Fixed by not calling cm_process_routed_req() on RoCE systems.
> >>
> >> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
> >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >>  drivers/infiniband/core/cm.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> >> index 3d194bb..6adbaea 100644
> >> +++ b/drivers/infiniband/core/cm.c
> >> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
> >>  		goto destroy;
> >>  	}
> >>
> >> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> >> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> >> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> >
> > why use ah_attr.type when a few lines below we have:
> >
> > 	if (gid_attr &&
> > 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> > 			       work->port->port_num)) {
> >
> > ?
> >
> > I suspect you can just move this into the else?
>
> I can counter that by saying ah_attr.type is used ~10 lines further
> down in the conditional call to sa_path_set_dmac() ;-)

Hum, OK. Please send an additional patch to unify everything around
av.ah_attr.type

> > 	if (gid_attr &&
> > 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> > 			       work->port->port_num)) {
>
> I cannot really see how gid_attr could be null. If
> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
> cm_init_av_for_response() above. May be using ah_attr.type in this
> test instead, for uniformity and readability?

The GRH is optional, ib_init_ah_attr_from_wc() only sets it
conditionally.

Applied to for-next

Thanks,
Jason
> On 1 Apr 2021, at 17:04, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
>>
>>
>>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>
>>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>>> Subnet Local is zero.
>>>>
>>>> In cm_req_handler(), the cm_process_routed_req() function is
>>>> called. Since the Primary Subnet Local value is zero in the request,
>>>> and since this is RoCE (Primary Local LID is permissive), the
>>>> following statement will be executed:
>>>>
>>>>       IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>>
>>>> This corrupts SL in req_msg if it was different from zero. In other
>>>> words, a request to setup a connection using an SL != zero, will not
>>>> be honored, and a connection using SL zero will be created instead.
>>>>
>>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>>>
>>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>  drivers/infiniband/core/cm.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>> index 3d194bb..6adbaea 100644
>>>> +++ b/drivers/infiniband/core/cm.c
>>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>>>  		goto destroy;
>>>>  	}
>>>>
>>>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>>
>>> why use ah_attr.type when a few lines below we have:
>>>
>>> 	if (gid_attr &&
>>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> 			       work->port->port_num)) {
>>>
>>> ?
>>>
>>> I suspect you can just move this into the else?
>>
>> I can counter that by saying ah_attr.type is used ~10 lines further
>> down in the conditional call to sa_path_set_dmac() ;-)
>
> Hum, OK. Please send an additional patch to unify everything around
> av.ah_attr.type

Will do.

>>> 	if (gid_attr &&
>>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> 			       work->port->port_num)) {
>>
>> I cannot really see how gid_attr could be null. If
>> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
>> cm_init_av_for_response() above. May be using ah_attr.type in this
>> test instead, for uniformity and readability?
>
> The GRH is optional, ib_init_ah_attr_from_wc() only sets it
> conditionally.

True. But one of the conditions to set sgid_attr is rdma_protocol_roce(). Hence the first term in:

	if (gid_attr && rdma_protocol_roce())

is superfluous. This is because gid_attr cannot be NULL on RoCE systems, since it is dereferenced in:

	cm_init_av_for_response()
		ib_init_ah_attr_from_wc()
			rdma_move_grh_sgid_attr()

I'll send the patch with the gid_attr term and let you decide.


Thxs, Håkon

>
> Applied to for-next
>
> Thanks,
> Jason
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 3d194bb..6adbaea 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
 		goto destroy;
 	}
 
-	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
+	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
+		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 
 	memset(&work->path[0], 0, sizeof(work->path[0]));
 	if (cm_req_has_alt_path(req_msg))
On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.

In cm_req_handler(), the cm_process_routed_req() function is
called. Since the Primary Subnet Local value is zero in the request,
and since this is RoCE (Primary Local LID is permissive), the
following statement will be executed:

      IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);

This corrupts SL in req_msg if it was different from zero. In other
words, a request to setup a connection using an SL != zero, will not
be honored, and a connection using SL zero will be created instead.

Fixed by not calling cm_process_routed_req() on RoCE systems.

Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
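
The failure mode described in the commit message can be illustrated with a small user-space model. This is only a sketch, not kernel code: every `model_*` / `MODEL_*` name is a hypothetical stand-in for the corresponding CM REQ field, work-completion field, or address-handle type, and the routed-request step is reduced to the single SL assignment quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not kernel definitions. */
#define MODEL_LID_PERMISSIVE 0xFFFFu

enum model_ah_type { MODEL_AH_IB, MODEL_AH_ROCE };

struct model_req {
	bool primary_subnet_local;  /* zero in a RoCE CM REQ */
	uint16_t primary_local_lid; /* permissive on RoCE */
	uint8_t primary_sl;         /* SL requested by the active side */
};

struct model_wc {
	uint8_t sl;                 /* SL reported by the work completion */
};

/* Reduced model of the routed-request step: only the SL overwrite is kept. */
static void model_process_routed_req(struct model_req *req,
				     const struct model_wc *wc)
{
	if (!req->primary_subnet_local &&
	    req->primary_local_lid == MODEL_LID_PERMISSIVE)
		req->primary_sl = wc->sl;
}

/*
 * Returns the SL the passive side ends up with. With patched == true the
 * routed-request step is skipped for RoCE, mirroring the check in the diff.
 */
static uint8_t model_req_handler(struct model_req req, struct model_wc wc,
				 enum model_ah_type type, bool patched)
{
	if (!patched || type != MODEL_AH_ROCE)
		model_process_routed_req(&req, &wc);
	return req.primary_sl;
}
```

Assuming the work completion reports SL 0, as the commit message implies, a request with `primary_sl = 3`, `primary_subnet_local = false` and a permissive local LID comes back from `model_req_handler()` with SL 0 when `patched` is false, and with SL 3 when `patched` is true and the type is `MODEL_AH_ROCE`.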