Message ID | 20221016061925.1831180-1-yanjun.zhu@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [1/1] RDMA/mlx5: Make mlx5 device work with ib_device_get_by_netdev | expand |
On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > From: Zhu Yanjun <yanjun.zhu@linux.dev> > > Before mlx5 ib device is registered, the function ib_device_set_netdev > is not called to map the mlx5 ib device with the related net device. > > As such, when the function ib_device_get_by_netdev is called to get ib > device, NULL is returned. > > Other ib devices, such as irdma, rxe and so on, the function > ib_device_get_by_netdev can get ib device from the related net device. Ohh, you opened Pandora box, everything around it looks half-backed. mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have .get_netdev() callback. This callback is not an easy task to eliminate and many internal attempts failed to eliminate them. This caused to very questionable ksmbd_rdma_capable_netdev() implementation where ksmbd first checked internal ib_dev callback and tried to use ib_device_get_by_netdev(). And to smc_ib, which didn't even bother to use ib_device_get_by_netdev(). > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > --- > drivers/infiniband/hw/mlx5/main.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c > index 883d7c60143e..6899c3f73509 100644 > --- a/drivers/infiniband/hw/mlx5/main.c > +++ b/drivers/infiniband/hw/mlx5/main.c > @@ -168,6 +168,7 @@ static int mlx5_netdev_event(struct notifier_block *this, > u32 port_num = roce->native_port_num; > struct mlx5_core_dev *mdev; > struct mlx5_ib_dev *ibdev; > + int ret = 0; > > ibdev = roce->dev; > mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL); > @@ -183,6 +184,14 @@ static int mlx5_netdev_event(struct notifier_block *this, This is part of the problem, as you are setting netdev for IB representors, and not for simple RoCE flow. There is more cumbersome multiport flow which needs special logic too. Thanks > if (ndev->dev.parent == mdev->device) > roce->netdev = ndev; > write_unlock(&roce->netdev_lock); > + if (ndev->dev.parent == mdev->device) { > + ret = ib_device_set_netdev(&ibdev->ib_dev, ndev, port_num); > + if (ret) { > + pr_warn("func: %s, error: %d\n", __func__, ret); > + goto done; > + } > + } > + > break; > > case NETDEV_UNREGISTER: > @@ -191,6 +200,15 @@ static int mlx5_netdev_event(struct notifier_block *this, > if (roce->netdev == ndev) > roce->netdev = NULL; > write_unlock(&roce->netdev_lock); > + > + if (roce->netdev == ndev) { > + ret = ib_device_set_netdev(&ibdev->ib_dev, NULL, port_num); > + if (ret) { > + pr_warn("func: %s, error: %d\n", __func__, ret); > + goto done; > + } > + } > + > break; > > case NETDEV_CHANGE: > -- > 2.27.0 >
On Tue, Oct 18, 2022 at 11:24:43AM +0300, Leon Romanovsky wrote: > On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > > From: Zhu Yanjun <yanjun.zhu@linux.dev> > > > > Before mlx5 ib device is registered, the function ib_device_set_netdev > > is not called to map the mlx5 ib device with the related net device. > > > > As such, when the function ib_device_get_by_netdev is called to get ib > > device, NULL is returned. > > > > Other ib devices, such as irdma, rxe and so on, the function > > ib_device_get_by_netdev can get ib device from the related net device. > > Ohh, you opened Pandora box, everything around it looks half-backed. > > mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have > .get_netdev() callback. This callback is not an easy task to eliminate > and many internal attempts failed to eliminate them. > > This caused to very questionable ksmbd_rdma_capable_netdev() > implementation where ksmbd first checked internal ib_dev callback > and tried to use ib_device_get_by_netdev(). And to smc_ib, which > didn't even bother to use ib_device_get_by_netdev(). Oh really? Those APIs were only for driver use, not ULP :( Jason
October 18, 2022 4:24 PM, "Leon Romanovsky" <leon@kernel.org> wrote: > On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > >> From: Zhu Yanjun <yanjun.zhu@linux.dev> >> >> Before mlx5 ib device is registered, the function ib_device_set_netdev >> is not called to map the mlx5 ib device with the related net device. >> >> As such, when the function ib_device_get_by_netdev is called to get ib >> device, NULL is returned. >> >> Other ib devices, such as irdma, rxe and so on, the function >> ib_device_get_by_netdev can get ib device from the related net device. > > Ohh, you opened Pandora box, everything around it looks half-backed. > > mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have > .get_netdev() callback. This callback is not an easy task to eliminate > and many internal attempts failed to eliminate them. > > This caused to very questionable ksmbd_rdma_capable_netdev() > implementation where ksmbd first checked internal ib_dev callback > and tried to use ib_device_get_by_netdev(). And to smc_ib, which > didn't even bother to use ib_device_get_by_netdev(). > >> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> >> --- >> drivers/infiniband/hw/mlx5/main.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c >> index 883d7c60143e..6899c3f73509 100644 >> --- a/drivers/infiniband/hw/mlx5/main.c >> +++ b/drivers/infiniband/hw/mlx5/main.c >> @@ -168,6 +168,7 @@ static int mlx5_netdev_event(struct notifier_block *this, >> u32 port_num = roce->native_port_num; >> struct mlx5_core_dev *mdev; >> struct mlx5_ib_dev *ibdev; >> + int ret = 0; >> >> ibdev = roce->dev; >> mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL); >> @@ -183,6 +184,14 @@ static int mlx5_netdev_event(struct notifier_block *this, > > This is part of the problem, as you are setting netdev for IB > representors, and not for simple RoCE flow. There is more cumbersome > multiport flow which needs special logic too. In this event, if net device is registered, the snippet will map ib device and net device. So for simple RoCE flow or IB representors, the snippet should work. In my test environment, I just use a mlx5 device to make tests. And this commit works well. So to a simple RoCE flow, this should work. I can not get "more cumbersome multiport flow which needs special logic too" Can you explain it? From my perspective, if the net device is registered, a net event NETDEV_REGISTER will send out. This commit will be called, so the ib device and net device should be mapped. Thanks and Regards Zhu Yanjun > > Thanks > >> if (ndev->dev.parent == mdev->device) >> roce->netdev = ndev; >> write_unlock(&roce->netdev_lock); >> + if (ndev->dev.parent == mdev->device) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, ndev, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_UNREGISTER: >> @@ -191,6 +200,15 @@ static int mlx5_netdev_event(struct notifier_block *this, >> if (roce->netdev == ndev) >> roce->netdev = NULL; >> write_unlock(&roce->netdev_lock); >> + >> + if (roce->netdev == ndev) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, NULL, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_CHANGE: >> -- >> 2.27.0
October 18, 2022 4:24 PM, "Leon Romanovsky" <leon@kernel.org> wrote: > On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > >> From: Zhu Yanjun <yanjun.zhu@linux.dev> >> >> Before mlx5 ib device is registered, the function ib_device_set_netdev >> is not called to map the mlx5 ib device with the related net device. >> >> As such, when the function ib_device_get_by_netdev is called to get ib >> device, NULL is returned. >> >> Other ib devices, such as irdma, rxe and so on, the function >> ib_device_get_by_netdev can get ib device from the related net device. > > Ohh, you opened Pandora box, everything around it looks half-backed. > > mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have > .get_netdev() callback. This callback is not an easy task to eliminate > and many internal attempts failed to eliminate them. > > This caused to very questionable ksmbd_rdma_capable_netdev() > implementation where ksmbd first checked internal ib_dev callback > and tried to use ib_device_get_by_netdev(). And to smc_ib, which > didn't even bother to use ib_device_get_by_netdev(). Thanks. I read the function ksmbd_rdma_capable_netdev carefully. Mlx5 and mlx4 do not call ib_device_set_netdev to map net device and ib devices. This brings a lot of problems. When ib devices are needed from net devices, the callers will handle mlx5/4 and other ib devices differently. Zhu Yanjun > >> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> >> --- >> drivers/infiniband/hw/mlx5/main.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c >> index 883d7c60143e..6899c3f73509 100644 >> --- a/drivers/infiniband/hw/mlx5/main.c >> +++ b/drivers/infiniband/hw/mlx5/main.c >> @@ -168,6 +168,7 @@ static int mlx5_netdev_event(struct notifier_block *this, >> u32 port_num = roce->native_port_num; >> struct mlx5_core_dev *mdev; >> struct mlx5_ib_dev *ibdev; >> + int ret = 0; >> >> ibdev = roce->dev; >> mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL); >> @@ -183,6 +184,14 @@ static int mlx5_netdev_event(struct notifier_block *this, > > This is part of the problem, as you are setting netdev for IB > representors, and not for simple RoCE flow. There is more cumbersome > multiport flow which needs special logic too. > > Thanks > >> if (ndev->dev.parent == mdev->device) >> roce->netdev = ndev; >> write_unlock(&roce->netdev_lock); >> + if (ndev->dev.parent == mdev->device) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, ndev, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_UNREGISTER: >> @@ -191,6 +200,15 @@ static int mlx5_netdev_event(struct notifier_block *this, >> if (roce->netdev == ndev) >> roce->netdev = NULL; >> write_unlock(&roce->netdev_lock); >> + >> + if (roce->netdev == ndev) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, NULL, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_CHANGE: >> -- >> 2.27.0
On Wed, Oct 19, 2022 at 09:08:14AM +0000, yanjun.zhu@linux.dev wrote: > October 18, 2022 4:24 PM, "Leon Romanovsky" <leon@kernel.org> wrote: > > > On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > > > >> From: Zhu Yanjun <yanjun.zhu@linux.dev> > >> > >> Before mlx5 ib device is registered, the function ib_device_set_netdev > >> is not called to map the mlx5 ib device with the related net device. > >> > >> As such, when the function ib_device_get_by_netdev is called to get ib > >> device, NULL is returned. > >> > >> Other ib devices, such as irdma, rxe and so on, the function > >> ib_device_get_by_netdev can get ib device from the related net device. > > > > Ohh, you opened Pandora box, everything around it looks half-backed. > > > > mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have > > .get_netdev() callback. This callback is not an easy task to eliminate > > and many internal attempts failed to eliminate them. > > > > This caused to very questionable ksmbd_rdma_capable_netdev() > > implementation where ksmbd first checked internal ib_dev callback > > and tried to use ib_device_get_by_netdev(). And to smc_ib, which > > didn't even bother to use ib_device_get_by_netdev(). > > Thanks. > > I read the function ksmbd_rdma_capable_netdev carefully. > Mlx5 and mlx4 do not call ib_device_set_netdev to map net device and ib devices. > This brings a lot of problems. ULPs are not allowed to use these interfaces, they are for driver implementations. It is an error that ksmbd_rdma_capable_netdev() calls it in the first place. Jason
在 2022/10/19 19:55, Jason Gunthorpe 写道: > On Wed, Oct 19, 2022 at 09:08:14AM +0000, yanjun.zhu@linux.dev wrote: >> October 18, 2022 4:24 PM, "Leon Romanovsky" <leon@kernel.org> wrote: >> >>> On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: >>> >>>> From: Zhu Yanjun <yanjun.zhu@linux.dev> >>>> >>>> Before mlx5 ib device is registered, the function ib_device_set_netdev >>>> is not called to map the mlx5 ib device with the related net device. >>>> >>>> As such, when the function ib_device_get_by_netdev is called to get ib >>>> device, NULL is returned. >>>> >>>> Other ib devices, such as irdma, rxe and so on, the function >>>> ib_device_get_by_netdev can get ib device from the related net device. >>> Ohh, you opened Pandora box, everything around it looks half-backed. >>> >>> mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have >>> .get_netdev() callback. This callback is not an easy task to eliminate >>> and many internal attempts failed to eliminate them. >>> >>> This caused to very questionable ksmbd_rdma_capable_netdev() >>> implementation where ksmbd first checked internal ib_dev callback >>> and tried to use ib_device_get_by_netdev(). And to smc_ib, which >>> didn't even bother to use ib_device_get_by_netdev(). >> Thanks. >> >> I read the function ksmbd_rdma_capable_netdev carefully. >> Mlx5 and mlx4 do not call ib_device_set_netdev to map net device and ib devices. >> This brings a lot of problems. > ULPs are not allowed to use these interfaces, they are for driver > implementations. > > It is an error that ksmbd_rdma_capable_netdev() calls it in the first > place. Got it. The following function should complete the same job. Maybe we can use this function to implement ksmbd_rdma_capable_netdev again. int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb, struct netlink_callback *cb) Thanks and Regards, Zhu Yanjun > > Jason
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 883d7c60143e..6899c3f73509 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -168,6 +168,7 @@ static int mlx5_netdev_event(struct notifier_block *this, u32 port_num = roce->native_port_num; struct mlx5_core_dev *mdev; struct mlx5_ib_dev *ibdev; + int ret = 0; ibdev = roce->dev; mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL); @@ -183,6 +184,14 @@ static int mlx5_netdev_event(struct notifier_block *this, if (ndev->dev.parent == mdev->device) roce->netdev = ndev; write_unlock(&roce->netdev_lock); + if (ndev->dev.parent == mdev->device) { + ret = ib_device_set_netdev(&ibdev->ib_dev, ndev, port_num); + if (ret) { + pr_warn("func: %s, error: %d\n", __func__, ret); + goto done; + } + } + break; case NETDEV_UNREGISTER: @@ -191,6 +200,15 @@ static int mlx5_netdev_event(struct notifier_block *this, if (roce->netdev == ndev) roce->netdev = NULL; write_unlock(&roce->netdev_lock); + + if (roce->netdev == ndev) { + ret = ib_device_set_netdev(&ibdev->ib_dev, NULL, port_num); + if (ret) { + pr_warn("func: %s, error: %d\n", __func__, ret); + goto done; + } + } + break; case NETDEV_CHANGE: