Message ID | 1525197913.11756.129.camel@redhat.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
On Tue, May 01, 2018 at 02:05:13PM -0400, Doug Ledford wrote: > On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > > From: Parav Pandit <parav@mellanox.com> > > > > > > Commit given in fixes tag introduced an accurate check to validate > > > device, port, index by referring to the cache layer for querying GIDs > > > for all link layers (IB, RoCE and iWarp). > > > > > > When rdmacm tries to modify the QP to RTR state for kernel consumers, > > > qp and ah attributes are uninitialized. Each transport layer (IB/iWarp) > > > initializes them depending on transport type. > > > However qp ah_attr are not used for iWarp and remained uninitialized, > > > which is further used in ib_query_gid() call. This results into a > > > failure to query the GID due to an invalid GID index coming from > > > the uninitialized stack memory. > > > This is reported and discussed in thread [1]. > > > > What is ib_query_gid supposed to do for iWarp? > > Return the sole entry in the GID table. But, you pass the index, so you > have to know to pass index 0. In this case, because of uninitialized > memory in the ah_attr, it's being told to try and return another entry > that is invalid. Ok, but what is in the 0 index of the GID table for iWarp? > However, I see no benefit to doing this bandaid fix while leaving the > real bug: that we are calling ib_query_gid() for no purpose in this > function now a days. Just clear out the bad usage and move on. To that > end, I've written a patch for this: > > [dledford@haswell-e linus (k.o/wip/dl-for-rc *)]$ git diff > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 8364223422d0..a693fcd4c513 100644 > +++ b/drivers/infiniband/core/cma.c > @@ -868,7 +868,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > { > struct ib_qp_attr qp_attr; > int qp_attr_mask, ret; > - union ib_gid sgid; > > mutex_lock(&id_priv->qp_mutex); > if (!id_priv->id.qp) { > @@ -891,12 +890,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > if (ret) > goto out; > > - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, > - rdma_ah_read_grh(&qp_attr.ah_attr)->sgid_index, > - &sgid, NULL); > - if (ret) > - goto out; > - Well, this was serving to ensure that the sgid index is present in the table for protocols that use gids. Presumably modify_qp will check this too though? > I think we should also be at a point where we can remove that > BUG_ON()... Yes Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, > > - rdma_ah_read_grh(&qp_attr.ah_attr)- > >sgid_index, > > - &sgid, NULL); > > - if (ret) > > - goto out; > > - > > Well, this was serving to ensure that the sgid index is present in the > table for protocols that use gids. > > Presumably modify_qp will check this too though? The protocols exchange gids, so there should have been a prior conversion from the gid to get the index... But if a check is needed, I think it needs to be done within modify_qp to avoid potential races. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@ziepe.ca] > Sent: Tuesday, May 01, 2018 1:19 PM > To: Doug Ledford <dledford@redhat.com> > Cc: Leon Romanovsky <leon@kernel.org>; Parav Pandit > <parav@mellanox.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; Raju > Rangoju <rajur@chelsio.com>; Leon Romanovsky <leonro@mellanox.com> > Subject: Re: [PATCH rdma-rc] RDMA/cma: Zero out qp and ah attribute > > On Tue, May 01, 2018 at 02:05:13PM -0400, Doug Ledford wrote: > > On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > > > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > > > From: Parav Pandit <parav@mellanox.com> > > > > > > > > Commit given in fixes tag introduced an accurate check to validate > > > > device, port, index by referring to the cache layer for querying > > > > GIDs for all link layers (IB, RoCE and iWarp). > > > > > > > > When rdmacm tries to modify the QP to RTR state for kernel > > > > consumers, qp and ah attributes are uninitialized. Each transport > > > > layer (IB/iWarp) initializes them depending on transport type. > > > > However qp ah_attr are not used for iWarp and remained > > > > uninitialized, which is further used in ib_query_gid() call. This > > > > results into a failure to query the GID due to an invalid GID > > > > index coming from the uninitialized stack memory. > > > > This is reported and discussed in thread [1]. > > > > > > What is ib_query_gid supposed to do for iWarp? > > > > Return the sole entry in the GID table. But, you pass the index, so > > you have to know to pass index 0. In this case, because of > > uninitialized memory in the ah_attr, it's being told to try and return > > another entry that is invalid. > > Ok, but what is in the 0 index of the GID table for iWarp? It's a GID derived based on mac address of netdev. > > > However, I see no benefit to doing this bandaid fix while leaving the > > real bug: that we are calling ib_query_gid() for no purpose in this > > function now a days. Just clear out the bad usage and move on. To > > that end, I've written a patch for this: > > > > [dledford@haswell-e linus (k.o/wip/dl-for-rc *)]$ git diff diff --git > > a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index > > 8364223422d0..a693fcd4c513 100644 > > +++ b/drivers/infiniband/core/cma.c > > @@ -868,7 +868,6 @@ static int cma_modify_qp_rtr(struct > > rdma_id_private *id_priv, { > > struct ib_qp_attr qp_attr; > > int qp_attr_mask, ret; > > - union ib_gid sgid; > > > > mutex_lock(&id_priv->qp_mutex); > > if (!id_priv->id.qp) { > > @@ -891,12 +890,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private > *id_priv, > > if (ret) > > goto out; > > > > - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, > > - rdma_ah_read_grh(&qp_attr.ah_attr)->sgid_index, > > - &sgid, NULL); > > - if (ret) > > - goto out; > > - > > Well, this was serving to ensure that the sgid index is present in the table for > protocols that use gids. > > Presumably modify_qp will check this too though? Yes. Ah_attr creation from wc and path records now have good checks for kernel QPs. User QP checks will be done anyway in modify_qp(). A patch to remove ib_query_gid() is already in Leon's queue with right commit ids. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 01, 2018 at 02:05:13PM -0400, Doug Ledford wrote: > On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > > From: Parav Pandit <parav@mellanox.com> > > > > > > Commit given in fixes tag introduced an accurate check to validate > > > device, port, index by referring to the cache layer for querying GIDs > > > for all link layers (IB, RoCE and iWarp). > > > > > > When rdmacm tries to modify the QP to RTR state for kernel consumers, > > > qp and ah attributes are uninitialized. Each transport layer (IB/iWarp) > > > initializes them depending on transport type. > > > However qp ah_attr are not used for iWarp and remained uninitialized, > > > which is further used in ib_query_gid() call. This results into a > > > failure to query the GID due to an invalid GID index coming from > > > the uninitialized stack memory. > > > This is reported and discussed in thread [1]. > > > > What is ib_query_gid supposed to do for iWarp? > > Return the sole entry in the GID table. But, you pass the index, so you > have to know to pass index 0. In this case, because of uninitialized > memory in the ah_attr, it's being told to try and return another entry > that is invalid. > > However, I see no benefit to doing this bandaid fix while leaving the > real bug: that we are calling ib_query_gid() for no purpose in this > function now a days. Just clear out the bad usage and move on. To that > end, I've written a patch for this: > > [dledford@haswell-e linus (k.o/wip/dl-for-rc *)]$ git diff > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 8364223422d0..a693fcd4c513 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -868,7 +868,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > { > struct ib_qp_attr qp_attr; > int qp_attr_mask, ret; > - union ib_gid sgid; > > mutex_lock(&id_priv->qp_mutex); > if (!id_priv->id.qp) { > @@ -891,12 +890,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > if (ret) > goto out; > > - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, > - rdma_ah_read_grh(&qp_attr.ah_attr)->sgid_index, > - &sgid, NULL); > - if (ret) > - goto out; > - > BUG_ON(id_priv->cma_dev->device != id_priv->id.device); > > if (conn_param) > --- > Steve/Shiraz, can either of you test this real quick to make sure it > still works in your specific situation (I can't imagine it not working, > but a test would be nice)? Yes. It works. > I think we should also be at a point where we can remove that > BUG_ON()... > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 01, 2018 at 06:26:09PM +0000, Parav Pandit wrote: > > On Tue, May 01, 2018 at 02:05:13PM -0400, Doug Ledford wrote: > > > On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > > > > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > > > > From: Parav Pandit <parav@mellanox.com> > > > > > > > > > > Commit given in fixes tag introduced an accurate check to validate > > > > > device, port, index by referring to the cache layer for querying > > > > > GIDs for all link layers (IB, RoCE and iWarp). > > > > > > > > > > When rdmacm tries to modify the QP to RTR state for kernel > > > > > consumers, qp and ah attributes are uninitialized. Each transport > > > > > layer (IB/iWarp) initializes them depending on transport type. > > > > > However qp ah_attr are not used for iWarp and remained > > > > > uninitialized, which is further used in ib_query_gid() call. This > > > > > results into a failure to query the GID due to an invalid GID > > > > > index coming from the uninitialized stack memory. > > > > > This is reported and discussed in thread [1]. > > > > > > > > What is ib_query_gid supposed to do for iWarp? > > > > > > Return the sole entry in the GID table. But, you pass the index, so > > > you have to know to pass index 0. In this case, because of > > > uninitialized memory in the ah_attr, it's being told to try and return > > > another entry that is invalid. > > > > Ok, but what is in the 0 index of the GID table for iWarp? > It's a GID derived based on mac address of netdev. It is not used for anything right? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@ziepe.ca] > Sent: Tuesday, May 01, 2018 1:44 PM > To: Parav Pandit <parav@mellanox.com> > Cc: Doug Ledford <dledford@redhat.com>; Leon Romanovsky > <leon@kernel.org>; RDMA mailing list <linux-rdma@vger.kernel.org>; Raju > Rangoju <rajur@chelsio.com>; Leon Romanovsky <leonro@mellanox.com> > Subject: Re: [PATCH rdma-rc] RDMA/cma: Zero out qp and ah attribute > > On Tue, May 01, 2018 at 06:26:09PM +0000, Parav Pandit wrote: > > > On Tue, May 01, 2018 at 02:05:13PM -0400, Doug Ledford wrote: > > > > On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > > > > > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > > > > > From: Parav Pandit <parav@mellanox.com> > > > > > > > > > > > > Commit given in fixes tag introduced an accurate check to > > > > > > validate device, port, index by referring to the cache layer > > > > > > for querying GIDs for all link layers (IB, RoCE and iWarp). > > > > > > > > > > > > When rdmacm tries to modify the QP to RTR state for kernel > > > > > > consumers, qp and ah attributes are uninitialized. Each > > > > > > transport layer (IB/iWarp) initializes them depending on transport type. > > > > > > However qp ah_attr are not used for iWarp and remained > > > > > > uninitialized, which is further used in ib_query_gid() call. > > > > > > This results into a failure to query the GID due to an invalid > > > > > > GID index coming from the uninitialized stack memory. > > > > > > This is reported and discussed in thread [1]. > > > > > > > > > > What is ib_query_gid supposed to do for iWarp? > > > > > > > > Return the sole entry in the GID table. But, you pass the index, > > > > so you have to know to pass index 0. In this case, because of > > > > uninitialized memory in the ah_attr, it's being told to try and > > > > return another entry that is invalid. > > > > > > Ok, but what is in the 0 index of the GID table for iWarp? > > It's a GID derived based on mac address of netdev. > > It is not used for anything right? > Right. Please go through commit log. It is in Leon's queue. https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=rdma-next&id=8858e62f3ff9b6cd0242edeb89b47e61a2c24a5f -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 01, 2018 at 07:32:10PM +0000, Parav Pandit wrote: > Right. > Please go through commit log. It is in Leon's queue. > https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=rdma-next&id=8858e62f3ff9b6cd0242edeb89b47e61a2c24a5f Okay. We should get rid of that BUG_ON in a preceeding patch in case Linus looks at the patches :( Leon, what do you think about just sending a patch to replace the BUG_ON's with WARN_ON ? $ git grep BUG_ON drivers/infiniband/ | wc --l 136 :( Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@ziepe.ca] > Sent: Tuesday, May 01, 2018 2:41 PM > To: Parav Pandit <parav@mellanox.com> > Cc: Doug Ledford <dledford@redhat.com>; Leon Romanovsky > <leon@kernel.org>; RDMA mailing list <linux-rdma@vger.kernel.org>; Raju > Rangoju <rajur@chelsio.com>; Leon Romanovsky <leonro@mellanox.com> > Subject: Re: [PATCH rdma-rc] RDMA/cma: Zero out qp and ah attribute > > On Tue, May 01, 2018 at 07:32:10PM +0000, Parav Pandit wrote: > > > Right. > > Please go through commit log. It is in Leon's queue. > > https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/co > > mmit/?h=rdma-next&id=8858e62f3ff9b6cd0242edeb89b47e61a2c24a5f > > Okay. > > We should get rid of that BUG_ON in a preceeding patch in case Linus looks at > the patches :( > Yes. The right fix is to move up the validation cma_modify_qp_rtr() like QP check and return error because cma_dev binding is done before that. > Leon, what do you think about just sending a patch to replace the BUG_ON's > with WARN_ON ? > > $ git grep BUG_ON drivers/infiniband/ | wc --l > 136 > > :( > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 01, 2018 at 01:40:46PM -0600, Jason Gunthorpe wrote: > On Tue, May 01, 2018 at 07:32:10PM +0000, Parav Pandit wrote: > > > Right. > > Please go through commit log. It is in Leon's queue. > > https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=rdma-next&id=8858e62f3ff9b6cd0242edeb89b47e61a2c24a5f > > Okay. In offline discussion prior to submission, me and Parav decided that current patch (zero out) is good enough for -rc, but removal is better to be sent to -next. > > We should get rid of that BUG_ON in a preceeding patch in case Linus > looks at the patches :( > > Leon, what do you think about just sending a patch to replace the > BUG_ON's with WARN_ON ? I'm not fan of such dumb replacement, hope that I'll catch some BUG_ONs which for sure not possible and remove them. > > $ git grep BUG_ON drivers/infiniband/ | wc --l > 136 You can immediately remove 33 occurrences by simply adding "-w" in git grep line :) ➜ git grep -w BUG_ON drivers/infiniband/ | wc -l 103 Thanks > > :( > > Jason > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 8364223422d0..a693fcd4c513 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -868,7 +868,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; - union ib_gid sgid; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { @@ -891,12 +890,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, if (ret) goto out; - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, - rdma_ah_read_grh(&qp_attr.ah_attr)->sgid_index, - &sgid, NULL); - if (ret) - goto out; - BUG_ON(id_priv->cma_dev->device != id_priv->id.device); if (conn_param)