diff mbox

iwarp kernel mode applications are broken with commit f35faa4ba

Message ID VI1PR0502MB3008273F9B13095A781D8764D18D0@VI1PR0502MB3008.eurprd05.prod.outlook.com (mailing list archive)
State Deferred
Headers show

Commit Message

Parav Pandit April 27, 2018, 3:36 a.m. UTC
Hi Shiraz, Raju,

> -----Original Message-----
> From: Shiraz Saleem [mailto:shiraz.saleem@intel.com]
> Sent: Thursday, April 26, 2018 8:48 PM
> To: Raju Rangoju <rajur@chelsio.com>
> Cc: Parav Pandit <parav@mellanox.com>; linux-rdma@vger.kernel.org; SWise
> OGC <swise@opengridcomputing.com>; sean.hefty@intel.com
> Subject: Re: iwarp kernel mode applications are broken with commit f35faa4ba
> 
> On Thu, Apr 26, 2018 at 07:46:38PM +0000, Raju  Rangoju wrote:
> > Hi Parav,
> >
> > The following commit f35faa4ba broke iWARP kernel mode applications.
> >
> > commit f35faa4ba9568138eea1c58abb92e8ef415dce41
> > Author: Parav Pandit <parav@mellanox.com>
> > Date:   Sun Apr 1 15:08:20 2018 +0300
> >
> >     IB/core: Simplify ib_query_gid to always refer to cache
> >
> > [root@bhumthang]# nvme discover -t rdma -a 102.1.1.17 Failed to write
> > to /dev/nvme-fabrics: Invalid argument
> >
> > [root@bhumthang]# dmesg
> > [55961.151787] nvme nvme0: rdma_connect failed (-22).
> > [55961.151971] nvme nvme0: rdma connection establishment failed (-22)
> >
> > ------------
> > iser
> > -------------
> > [54714.834984] iw_cxgb4: Chelsio T4/T5 RDMA Driver - version 0.1
> > [54714.834987] iw_cxgb4: 0000:04:00.4: Up [54714.834987] iw_cxgb4:
> > 0000:04:00.4: On-Chip Queues not supported on this device
> > [54714.855963] ib_srpt MAD registration failed for cxgb4_0-1.
> > [54714.855972] ib_srpt srpt_add_one(cxgb4_0) failed.
> > [54715.123119] iw_cxgb4: 0000:07:00.4: Up [54715.123121] iw_cxgb4:
> > 0000:07:00.4: On-Chip Queues not supported on this device
> > [54715.125977] cxgb4 0000:07:00.4 enp7s0f4: port module unplugged
> > [54715.166076] ib_srpt MAD registration failed for cxgb4_1-1.
> > [54715.166080] ib_srpt srpt_add_one(cxgb4_1) failed.
> >  [54834.322675] iser: iser_route_handler: failure connecting: -22
> > [54835.326918] iser: iser_route_handler: failure connecting: -22
> > [54836.331221] iser: iser_route_handler: failure connecting: -22
> > [54837.335625] iser: iser_route_handler: failure connecting: -22
> > [54838.339980] iser: iser_route_handler: failure connecting: -22
> > [54839.343882] iser: iser_route_handler: failure connecting: -22
> >
> 
> 
> My validation team reported the same issue on i40iw with 4.17-rc kernels.
> 
> Some more data. Looks like the failure is because we can't find the cached gid
> due to the gid idx being wrong in query_gid.
> 
> rdma_connect
>    cma_connect_iw
> 	cma_modify_qp_rtr
> 	    ib_query_gid
> 		ib_get_cached_gid
> 		    __ib_cache_gid_get (EINVAL)
> 
This call trace is helpful.
Can you please run ibv_devinfo -v | grep GID and see that you are getting the expected GID.
For iWarp we have only one entry GID table. So want to make sure that GID table is build correctly.

From the above call trace, is appears that,
cma_modify_qp_rtr() contains,
struct ib_qp_attr qp_attr;

ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and iwcm_init_qp_init_attr().
Before my fix, gid_index was always ignored by the query_gid() callback such as i40iw_query_gid() and c4iw_query_gid().
So it used to work.
Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is likely failing.

So can you please try below hunk and see if ib_query_gid() progresses for you?
If it works, I will send the proper patch shortly.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Raju Rangoju April 27, 2018, 9:07 a.m. UTC | #1
On Friday, April 04/27/18, 2018 at 03:36:18 +0000, Parav Pandit wrote:

Parav,

> Hi Shiraz, Raju,
> 
> > -----Original Message-----
> > My validation team reported the same issue on i40iw with 4.17-rc kernels.
> > 
> > Some more data. Looks like the failure is because we can't find the cached gid
> > due to the gid idx being wrong in query_gid.
> > 
> > rdma_connect
> >    cma_connect_iw
> > 	cma_modify_qp_rtr
> > 	    ib_query_gid
> > 		ib_get_cached_gid
> > 		    __ib_cache_gid_get (EINVAL)
> > 
> This call trace is helpful.
> Can you please run ibv_devinfo -v | grep GID and see that you are getting the expected GID.
> For iWarp we have only one entry GID table. So want to make sure that GID table is build correctly.
> 
> From the above call trace, is appears that,
> cma_modify_qp_rtr() contains,
> struct ib_qp_attr qp_attr;
> 
> ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and iwcm_init_qp_init_attr().
> Before my fix, gid_index was always ignored by the query_gid() callback such as i40iw_query_gid() and c4iw_query_gid().
> So it used to work.
> Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is likely failing.
> 
> So can you please try below hunk and see if ib_query_gid() progresses for you?
> If it works, I will send the proper patch shortly.
>

Its working. Issue is not seen with the below patch.


> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 8512f63..e119cff 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
>  static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
>                              struct rdma_conn_param *conn_param)
>  {
> -       struct ib_qp_attr qp_attr;
> +       struct ib_qp_attr qp_attr = {};
>         int qp_attr_mask, ret;
>         union ib_gid sgid;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Parav Pandit April 27, 2018, 4:35 p.m. UTC | #2
Hi Raju,

> -----Original Message-----
> From: Raju Rangoju [mailto:rajur@chelsio.com]
> Sent: Friday, April 27, 2018 4:08 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Shiraz Saleem <shiraz.saleem@intel.com>; linux-rdma@vger.kernel.org;
> SWise OGC <swise@opengridcomputing.com>; sean.hefty@intel.com
> Subject: Re: iwarp kernel mode applications are broken with commit f35faa4ba
> 
> On Friday, April 04/27/18, 2018 at 03:36:18 +0000, Parav Pandit wrote:
> 
> Parav,
> 
> > Hi Shiraz, Raju,
> >
> > > -----Original Message-----
> > > My validation team reported the same issue on i40iw with 4.17-rc kernels.
> > >
> > > Some more data. Looks like the failure is because we can't find the
> > > cached gid due to the gid idx being wrong in query_gid.
> > >
> > > rdma_connect
> > >    cma_connect_iw
> > > 	cma_modify_qp_rtr
> > > 	    ib_query_gid
> > > 		ib_get_cached_gid
> > > 		    __ib_cache_gid_get (EINVAL)
> > >
> > This call trace is helpful.
> > Can you please run ibv_devinfo -v | grep GID and see that you are getting the
> expected GID.
> > For iWarp we have only one entry GID table. So want to make sure that GID
> table is build correctly.
> >
> > From the above call trace, is appears that,
> > cma_modify_qp_rtr() contains,
> > struct ib_qp_attr qp_attr;
> >
> > ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and
> iwcm_init_qp_init_attr().
> > Before my fix, gid_index was always ignored by the query_gid() callback such
> as i40iw_query_gid() and c4iw_query_gid().
> > So it used to work.
> > Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is
> likely failing.
> >
> > So can you please try below hunk and see if ib_query_gid() progresses for you?
> > If it works, I will send the proper patch shortly.
> >
> 
> Its working. Issue is not seen with the below patch.

Thanks a lot for the quick test.
I will send the patch through Leon's queue for for-rc with your Tested-by tag.

> 
> 
> > diff --git a/drivers/infiniband/core/cma.c
> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
> >                              struct rdma_conn_param *conn_param)  {
> > -       struct ib_qp_attr qp_attr;
> > +       struct ib_qp_attr qp_attr = {};
> >         int qp_attr_mask, ret;
> >         union ib_gid sgid;
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shiraz Saleem April 27, 2018, 6:26 p.m. UTC | #3
>Subject: RE: iwarp kernel mode applications are broken with commit f35faa4ba
>
>>
>> > diff --git a/drivers/infiniband/core/cma.c
>> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644
>> > --- a/drivers/infiniband/core/cma.c
>> > +++ b/drivers/infiniband/core/cma.c
>> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
>> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
>> >                              struct rdma_conn_param *conn_param)  {
>> > -       struct ib_qp_attr qp_attr;
>> > +       struct ib_qp_attr qp_attr = {};
>> >         int qp_attr_mask, ret;
>> >         union ib_gid sgid;
>> >
>--

Hi Parav - The patch resolves the issue.
But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr() altogether? What is its purpose in that function?

Shiraz
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Raju Rangoju April 27, 2018, 6:35 p.m. UTC | #4
On Friday, April 04/27/18, 2018 at 16:35:32 +0000, Parav Pandit wrote:
> Hi Raju,
> 
> > -----Original Message-----
> > From: Raju Rangoju [mailto:rajur@chelsio.com]
> > Sent: Friday, April 27, 2018 4:08 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Shiraz Saleem <shiraz.saleem@intel.com>; linux-rdma@vger.kernel.org;
> > SWise OGC <swise@opengridcomputing.com>; sean.hefty@intel.com
> > Subject: Re: iwarp kernel mode applications are broken with commit f35faa4ba
> > 
> > On Friday, April 04/27/18, 2018 at 03:36:18 +0000, Parav Pandit wrote:
> > 
> > Parav,
> > 
> > > Hi Shiraz, Raju,
> > >
> > > > -----Original Message-----
> > > > My validation team reported the same issue on i40iw with 4.17-rc kernels.
> > > >
> > > > Some more data. Looks like the failure is because we can't find the
> > > > cached gid due to the gid idx being wrong in query_gid.
> > > >
> > > > rdma_connect
> > > >    cma_connect_iw
> > > > 	cma_modify_qp_rtr
> > > > 	    ib_query_gid
> > > > 		ib_get_cached_gid
> > > > 		    __ib_cache_gid_get (EINVAL)
> > > >
> > > This call trace is helpful.
> > > Can you please run ibv_devinfo -v | grep GID and see that you are getting the
> > expected GID.
> > > For iWarp we have only one entry GID table. So want to make sure that GID
> > table is build correctly.
> > >
> > > From the above call trace, is appears that,
> > > cma_modify_qp_rtr() contains,
> > > struct ib_qp_attr qp_attr;
> > >
> > > ah_attr from above qp_attr remains uninitialized by iw_cm_init_qp_attr() and
> > iwcm_init_qp_init_attr().
> > > Before my fix, gid_index was always ignored by the query_gid() callback such
> > as i40iw_query_gid() and c4iw_query_gid().
> > > So it used to work.
> > > Now my fix expects all values to be correct; due to uninitialized ib_qp_attr it is
> > likely failing.
> > >
> > > So can you please try below hunk and see if ib_query_gid() progresses for you?
> > > If it works, I will send the proper patch shortly.
> > >
> > 
> > Its working. Issue is not seen with the below patch.
> 
> Thanks a lot for the quick test.
> I will send the patch through Leon's queue for for-rc with your Tested-by tag.
> 

Sure. Thanks Parav.

> > 
> > 
> > > diff --git a/drivers/infiniband/core/cma.c
> > > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644
> > > --- a/drivers/infiniband/core/cma.c
> > > +++ b/drivers/infiniband/core/cma.c
> > > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
> > > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
> > >                              struct rdma_conn_param *conn_param)  {
> > > -       struct ib_qp_attr qp_attr;
> > > +       struct ib_qp_attr qp_attr = {};
> > >         int qp_attr_mask, ret;
> > >         union ib_gid sgid;
> > >
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Parav Pandit April 27, 2018, 7:12 p.m. UTC | #5
Hi Shiraz,

> -----Original Message-----
> From: Saleem, Shiraz [mailto:shiraz.saleem@intel.com]
> Sent: Friday, April 27, 2018 1:26 PM
> To: Parav Pandit <parav@mellanox.com>; Raju Rangoju <rajur@chelsio.com>
> Cc: linux-rdma@vger.kernel.org; SWise OGC <swise@opengridcomputing.com>;
> Hefty, Sean <sean.hefty@intel.com>
> Subject: RE: iwarp kernel mode applications are broken with commit f35faa4ba
> 
> >Subject: RE: iwarp kernel mode applications are broken with commit
> >f35faa4ba
> >
> >>
> >> > diff --git a/drivers/infiniband/core/cma.c
> >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644
> >> > --- a/drivers/infiniband/core/cma.c
> >> > +++ b/drivers/infiniband/core/cma.c
> >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
> >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
> >> >                              struct rdma_conn_param *conn_param)  {
> >> > -       struct ib_qp_attr qp_attr;
> >> > +       struct ib_qp_attr qp_attr = {};
> >> >         int qp_attr_mask, ret;
> >> >         union ib_gid sgid;
> >> >
> >--
> 
> Hi Parav - The patch resolves the issue.
> But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr() altogether?
> What is its purpose in that function?
I agree it is not needed. I looked at the commit dd5f03beb4f7 from Matan where sgid was used for smac address.
But now the code has improved a lot over it and this query_gid should have been removed when smac thing was refactored, but it was not.
For this rc cycle I think we just keep it and only make qp_attr {} which is anyway good thing to do, just trying to avoid too many changes in rc now.
But I am fine either ways.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise April 27, 2018, 7:20 p.m. UTC | #6
> Hi Shiraz,
> 
> > -----Original Message-----
> > From: Saleem, Shiraz [mailto:shiraz.saleem@intel.com]
> > Sent: Friday, April 27, 2018 1:26 PM
> > To: Parav Pandit <parav@mellanox.com>; Raju Rangoju
> <rajur@chelsio.com>
> > Cc: linux-rdma@vger.kernel.org; SWise OGC
> <swise@opengridcomputing.com>;
> > Hefty, Sean <sean.hefty@intel.com>
> > Subject: RE: iwarp kernel mode applications are broken with commit
> f35faa4ba
> >
> > >Subject: RE: iwarp kernel mode applications are broken with commit
> > >f35faa4ba
> > >
> > >>
> > >> > diff --git a/drivers/infiniband/core/cma.c
> > >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644
> > >> > --- a/drivers/infiniband/core/cma.c
> > >> > +++ b/drivers/infiniband/core/cma.c
> > >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
> > >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
> > >> >                              struct rdma_conn_param *conn_param)  {
> > >> > -       struct ib_qp_attr qp_attr;
> > >> > +       struct ib_qp_attr qp_attr = {};
> > >> >         int qp_attr_mask, ret;
> > >> >         union ib_gid sgid;
> > >> >
> > >--
> >
> > Hi Parav - The patch resolves the issue.
> > But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr()
> altogether?
> > What is its purpose in that function?
> I agree it is not needed. I looked at the commit dd5f03beb4f7 from Matan
> where sgid was used for smac address.
> But now the code has improved a lot over it and this query_gid should have
> been removed when smac thing was refactored, but it was not.
> For this rc cycle I think we just keep it and only make qp_attr {} which
is
> anyway good thing to do, just trying to avoid too many changes in rc now.
> But I am fine either ways.

Let's just initialize qp_attr for this -rc.  Please post a patch for Doug to
merge.  And tag it for -stable if that is needed.

Thanks,

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Parav Pandit April 27, 2018, 7:31 p.m. UTC | #7
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Friday, April 27, 2018 2:21 PM
> To: Parav Pandit <parav@mellanox.com>; 'Saleem, Shiraz'
> <shiraz.saleem@intel.com>; 'Raju Rangoju' <rajur@chelsio.com>
> Cc: linux-rdma@vger.kernel.org; 'Hefty, Sean' <sean.hefty@intel.com>
> Subject: RE: iwarp kernel mode applications are broken with commit f35faa4ba
> 
> > Hi Shiraz,
> >
> > > -----Original Message-----
> > > From: Saleem, Shiraz [mailto:shiraz.saleem@intel.com]
> > > Sent: Friday, April 27, 2018 1:26 PM
> > > To: Parav Pandit <parav@mellanox.com>; Raju Rangoju
> > <rajur@chelsio.com>
> > > Cc: linux-rdma@vger.kernel.org; SWise OGC
> > <swise@opengridcomputing.com>;
> > > Hefty, Sean <sean.hefty@intel.com>
> > > Subject: RE: iwarp kernel mode applications are broken with commit
> > f35faa4ba
> > >
> > > >Subject: RE: iwarp kernel mode applications are broken with commit
> > > >f35faa4ba
> > > >
> > > >>
> > > >> > diff --git a/drivers/infiniband/core/cma.c
> > > >> > b/drivers/infiniband/core/cma.c index 8512f63..e119cff 100644
> > > >> > --- a/drivers/infiniband/core/cma.c
> > > >> > +++ b/drivers/infiniband/core/cma.c
> > > >> > @@ -863,7 +863,7 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
> > > >> > static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
> > > >> >                              struct rdma_conn_param *conn_param)  {
> > > >> > -       struct ib_qp_attr qp_attr;
> > > >> > +       struct ib_qp_attr qp_attr = {};
> > > >> >         int qp_attr_mask, ret;
> > > >> >         union ib_gid sgid;
> > > >> >
> > > >--
> > >
> > > Hi Parav - The patch resolves the issue.
> > > But shouldn't we remove ib_query_gid() from cma_modify_qp_rtr()
> > altogether?
> > > What is its purpose in that function?
> > I agree it is not needed. I looked at the commit dd5f03beb4f7 from
> > Matan where sgid was used for smac address.
> > But now the code has improved a lot over it and this query_gid should
> > have been removed when smac thing was refactored, but it was not.
> > For this rc cycle I think we just keep it and only make qp_attr {}
> > which
> is
> > anyway good thing to do, just trying to avoid too many changes in rc now.
> > But I am fine either ways.
> 
> Let's just initialize qp_attr for this -rc.  Please post a patch for Doug to merge.
> And tag it for -stable if that is needed.

ok Steve, we usually let Leon post it.
I have already enqueued to Leon's internal tree, so mostly Sun/Mon it should be out to the ML.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8512f63..e119cff 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -863,7 +863,7 @@  void rdma_destroy_qp(struct rdma_cm_id *id)
 static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
                             struct rdma_conn_param *conn_param)
 {
-       struct ib_qp_attr qp_attr;
+       struct ib_qp_attr qp_attr = {};
        int qp_attr_mask, ret;
        union ib_gid sgid;