diff mbox

crash in 4.14-rc1 with IPoIB

Message ID 7aac2d78-462b-c9ad-4443-9ec670a27b74@grimberg.me (mailing list archive)
State Not Applicable
Headers show

Commit Message

Sagi Grimberg Sept. 20, 2017, 10:37 a.m. UTC
> Hi folks,
> 
> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
> setup and merged it into 4.14-rc1. On bootup I stumbled upon that splat and no
> RDMA operation was possible:

...

> is_valid_mcast_lid.isra.23+0xfb/0x110
> 
> (gdb) l *(is_valid_mcast_lid+0xfb)
> 0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
> 1644		/* If QP state >= init, it is assigned to a port and we can check this
> 1645		 * port only.
> 1646		 */
> 1647		if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
> 1648			if (attr.qp_state >= IB_QPS_INIT) {
> 1649				if (qp->device->get_link_layer(qp->device, attr.port_num) !=
> 1650				    IB_LINK_LAYER_INFINIBAND)
> 1651					return true;
> 1652				goto lid_check;
> 1653			}
> (gdb)

Why isn't ipoib uses the generic rdma_port_get_link_layer?

Does this help?
--
attr.port_num) !=
                             IB_LINK_LAYER_INFINIBAND)
                                 return true;
                         goto lid_check;
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Johannes Thumshirn Sept. 20, 2017, 10:57 a.m. UTC | #1
On Wed, Sep 20, 2017 at 01:37:28PM +0300, Sagi Grimberg wrote:
> --
> diff --git a/drivers/infiniband/core/verbs.c
> b/drivers/infiniband/core/verbs.c
> index ee9e27dc799b..f2c70afea238 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp, u16
> lid)
>          */
>         if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
>                 if (attr.qp_state >= IB_QPS_INIT) {
> -                       if (qp->device->get_link_layer(qp->device,
> attr.port_num) !=
> +                       if (rdma_port_get_link_layer(qp->device,
> attr.port_num) !=
>                             IB_LINK_LAYER_INFINIBAND)
>                                 return true;
>                         goto lid_check;
> --

w00000! You're my hero.

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Hal Rosenstock Sept. 20, 2017, 11:35 a.m. UTC | #2
On 9/20/2017 6:37 AM, Sagi Grimberg wrote:
>> Hi folks,
>>
>> I wanted to try out Christoph's NVMe multipathing patchset on my NVMe
>> OmniPath
>> setup and merged it into 4.14-rc1. On bootup I stumbled upon that
>> splat and no
>> RDMA operation was possible:
> 
> ...
> 
>> is_valid_mcast_lid.isra.23+0xfb/0x110
>>
>> (gdb) l *(is_valid_mcast_lid+0xfb)
>> 0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
>> 1644        /* If QP state >= init, it is assigned to a port and we
>> can check this
>> 1645         * port only.
>> 1646         */
>> 1647        if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT,
>> &init_attr)) {
>> 1648            if (attr.qp_state >= IB_QPS_INIT) {
>> 1649                if (qp->device->get_link_layer(qp->device,
>> attr.port_num) !=
>> 1650                    IB_LINK_LAYER_INFINIBAND)
>> 1651                    return true;
>> 1652                goto lid_check;
>> 1653            }
>> (gdb)
> 
> Why isn't ipoib uses the generic rdma_port_get_link_layer?
> 
> Does this help?
> -- 
> diff --git a/drivers/infiniband/core/verbs.c
> b/drivers/infiniband/core/verbs.c
> index ee9e27dc799b..f2c70afea238 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1646,7 +1646,7 @@ static bool is_valid_mcast_lid(struct ib_qp *qp,
> u16 lid)
>          */
>         if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT,
> &init_attr)) {
>                 if (attr.qp_state >= IB_QPS_INIT) {
> -                       if (qp->device->get_link_layer(qp->device,
> attr.port_num) !=
> +                       if (rdma_port_get_link_layer(qp->device,
> attr.port_num) !=
>                             IB_LINK_LAYER_INFINIBAND)
>                                 return true;
>                         goto lid_check;

There's another occurrence of qp->device->get_link_layer in that routine
just below this. Shouldn't that be replaced by rdma_port_get_link_layer
too ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Sept. 20, 2017, 11:51 a.m. UTC | #3
Hey Hal! :)

> There's another occurrence of qp->device->get_link_layer in that routine
> just below this. Shouldn't that be replaced by rdma_port_get_link_layer
> too ?

You're absolutely correct!

Sending a formal patch now.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/verbs.c 
b/drivers/infiniband/core/verbs.c
index ee9e27dc799b..f2c70afea238 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1646,7 +1646,7 @@  static bool is_valid_mcast_lid(struct ib_qp *qp, 
u16 lid)
          */
         if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, 
&init_attr)) {
                 if (attr.qp_state >= IB_QPS_INIT) {
-                       if (qp->device->get_link_layer(qp->device, 
attr.port_num) !=
+                       if (rdma_port_get_link_layer(qp->device,