Message ID | 20220926024033.284341-1-yanjun.zhu@linux.dev (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Leon Romanovsky |
Series | rdma: not display the rdma link in other net namespace |
On 2022/9/26 10:40, yanjun.zhu@linux.dev wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> When the net devices are moved to another net namespace, the command
> "rdma link" should not display the rdma link about this net device.
>
> For example, when the net device eno12399 is moved to net namespace net0
> from init_net, the rdma link of eno12399 should not display in init_net.
>
> Before this change:
>
> Init_net:
>
> link roceo12399/1 state DOWN physical_state DISABLED <---should not display
> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>
> net0:
>
> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
>
> After this change
>
> Init_net:
>
> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>
> net0:
>
> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>
> Fixes: da990ab40a92 ("rdma: Add link object")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Hi Leon,

This commit fixes a problem in "rdma link show" of iproute2. I do not
know the mailing list for iproute2, so I am sending the commit here.
If you know the mailing list, please let me know.

Thanks and Regards,
Zhu Yanjun

> ---
> rdma/link.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/rdma/link.c b/rdma/link.c
> index bf24b849..449a7636 100644
> --- a/rdma/link.c
> +++ b/rdma/link.c
> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		return MNL_CB_ERROR;
>  	}
>
> +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
> +		return MNL_CB_OK;
> +
>  	idx = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
>  	port = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
>  	name = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_DEV_NAME]);
On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> When the net devices are moved to another net namespace, the command
> "rdma link" should not display the rdma link about this net device.

<...>

> +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
> +		return MNL_CB_OK;
> +

Regarding your question where it should go in addition to RDMA, the answer
is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
should be CCed.

Regarding the change, I don't think that it is right. User space tool is
a simple viewer of data returned from the kernel. It is not a mistake to
return device without netdev.

Thanks
On 2022/9/27 18:34, Leon Romanovsky wrote:
> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>

<...>

> Regarding your question where it should go in addition to RDMA, the answer
> is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
> should be CCed.

Thanks. I will also send it to the netdev ML and CC the maintainers.

> Regarding the change, I don't think that it is right. User space tool is
> a simple viewer of data returned from the kernel. It is not a mistake to
> return device without netdev.

Normally an RDMA link based on RoCEv2 is backed by a NIC. This NIC sends
and receives UDP packets. With Mellanox/Intel NICs, the net device also
does more work than sending and receiving packets.

From this perspective, an RDMA link depends on a net device.

In this case, the net device has been moved to another net namespace, so
it cannot be obtained, and the RDMA link cannot work in this net namespace
either.

So this RDMA link should not appear in this net namespace. Otherwise, it
would confuse the user.

In fact, net namespace is a concept in the TCP/IP stack, and it does not
exist in the RDMA stack. But an RDMA link based on RoCEv2 depends on a
network device, so when the net device is moved to another net namespace,
the RDMA link should move to that net namespace as well.

Zhu Yanjun

> Thanks
On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> On 2022/9/27 18:34, Leon Romanovsky wrote:
> > On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> > > From: Zhu Yanjun <yanjun.zhu@linux.dev>

<...>

> In fact, net namespace is a concept in the TCP/IP stack, and it does not
> exist in the RDMA stack.

RDMA has two different net namespace modes: shared and exclusive.

In shared mode, the IB devices are shared across all net namespaces, and
"moving" a net device into a different namespace just "hides" it but
doesn't disconnect it.

See the comments around the various usages of the ib_devices_shared_netns
variable.

Thanks
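To check which of the two modes a system is in without rdmatool, one option is to read the ib_core module parameter directly. The sketch below is illustrative only: it assumes the parameter is exposed as a boolean at /sys/module/ib_core/parameters/netns_mode (reading back "Y" for shared and "N" for exclusive), and the helper names netns_mode_name() and read_netns_mode() are hypothetical, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: map the text of ib_core's netns_mode parameter
 * to a mode name. Assumption: the parameter is a kernel bool, which
 * reads back as "Y" or "N", and "Y" means shared mode.
 */
const char *netns_mode_name(const char *param_text)
{
	assert(param_text != NULL);

	switch (param_text[0]) {
	case 'Y': case 'y': case '1':
		return "shared";
	case 'N': case 'n': case '0':
		return "exclusive";
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper: read the parameter file at `path` and classify it.
 * Returns "unknown" if the file cannot be read, for example when ib_core
 * is not loaded. The sysfs path passed in is an assumption, see above.
 */
const char *read_netns_mode(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return "unknown";
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	return netns_mode_name(buf);
}
```

Calling read_netns_mode("/sys/module/ib_core/parameters/netns_mode") from a small main() would then report the mode for the whole system, which matches the point that the setting is global rather than per link.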
On 2022/9/28 14:04, Leon Romanovsky wrote:
> On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
>> On 2022/9/27 18:34, Leon Romanovsky wrote:
>>> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>

<...>

>> In fact, net namespace is a concept in the TCP/IP stack, and it does not
>> exist in the RDMA stack.
>
> RDMA has two different net namespace modes: shared and exclusive.

This is different from net namespaces in networking.

> In shared mode, the IB devices are shared across all net namespaces, and
> "moving" a net device into a different namespace just "hides" it but
> doesn't disconnect it.

In exclusive mode, the net device is also hidden. It is the same as in
shared mode.

# rdma system
netns exclusive copy-on-fork on
# rdma link
link roceo12399/1 state DOWN physical_state DISABLED
link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1

Is it better to append "exclusive" or "shared" at the end of the line?

For example,

Exclusive mode:

# rdma system
netns exclusive copy-on-fork on
# rdma link
link roceo12399/1 state DOWN physical_state DISABLED exclusive

Shared mode:

# rdma system
netns shared copy-on-fork on
# rdma link
link roceo12399/1 state DOWN physical_state DISABLED shared

Thanks and Regards,
Zhu Yanjun

> See the comments around the various usages of the ib_devices_shared_netns
> variable.
>
> Thanks
On Fri, Sep 30, 2022 at 03:25:00PM +0800, Yanjun Zhu wrote:
> On 2022/9/28 14:04, Leon Romanovsky wrote:
> > On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> > > On 2022/9/27 18:34, Leon Romanovsky wrote:
> > > > On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> > > > > From: Zhu Yanjun <yanjun.zhu@linux.dev>

<...>

> Is it better to append "exclusive" or "shared" at the end of the line?

No, exclusive/shared is a global property, applied to all links.

Thanks
On 2022/10/6 20:53, Leon Romanovsky wrote:
> On Fri, Sep 30, 2022 at 03:25:00PM +0800, Yanjun Zhu wrote:

<...>

>> Is it better to append "exclusive" or "shared" at the end of the line?
>
> No, exclusive/shared is a global property, applied to all links.

OK.

When running "rdma link show", the output is the same in shared and
exclusive modes.

Is that acceptable?

And in exclusive mode, an RDMA link that cannot be accessed in net
namespace A still appears when running "rdma link show" in net
namespace A.

This differs from how other objects behave in net namespaces.

For example, if net device NIC0 is moved from net namespace A to net
namespace B, NIC0 no longer appears when running "ip link" in net
namespace A.

Is it a problem?

Zhu Yanjun

> Thanks
On Thu, Oct 06, 2022 at 10:26:33PM +0800, Yanjun Zhu wrote:
> On 2022/10/6 20:53, Leon Romanovsky wrote:

<...>

> > > Is it better to append "exclusive" or "shared" at the end of the line?
> >
> > No, exclusive/shared is a global property, applied to all links.
>
> OK.
>
> When running "rdma link show", the output is the same in shared and
> exclusive modes.
>
> Is that acceptable?

exclusive/shared is an ib_core module parameter, so either all links are
shared or all links are exclusive. They can't be both at the same time.

> And in exclusive mode, an RDMA link that cannot be accessed in net
> namespace A still appears when running "rdma link show" in net
> namespace A.
>
> This differs from how other objects behave in net namespaces.
>
> For example, if net device NIC0 is moved from net namespace A to net
> namespace B, NIC0 no longer appears when running "ip link" in net
> namespace A.
>
> Is it a problem?

rdmatool presents IB devices. It has no logic that decides if that
device is usable/operable or not.

Thanks

> Zhu Yanjun
On Thu, Oct 06, 2022 at 07:21:39PM +0300, Leon Romanovsky wrote:

> > And in exclusive mode, an RDMA link that cannot be accessed in net
> > namespace A still appears when running "rdma link show" in net
> > namespace A.
> >
> > This differs from how other objects behave in net namespaces.
> >
> > For example, if net device NIC0 is moved from net namespace A to net
> > namespace B, NIC0 no longer appears when running "ip link" in net
> > namespace A.
> >
> > Is it a problem?
>
> rdmatool presents IB devices. It has no logic that decides if that
> device is usable/operable or not.

It should really not report an IB device that is not in the net
namespace.. I'm surprised this hasn't been noticed because it will
break verbs.

Jason
On Thu, Oct 06, 2022 at 01:23:05PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 07:21:39PM +0300, Leon Romanovsky wrote:

<...>

> > rdmatool presents IB devices. It has no logic that decides if that
> > device is usable/operable or not.
>
> It should really not report an IB device that is not in the net
> namespace..

It is the kernel's (nldev.c) job to hide such IB devices, and it seems
like it does: for devices with the help of ib_enum_all_devs(), and for
links with ib_device_get_by_index().

They both check the netns via rdma_dev_access_netns(device, net).

Thanks
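The visibility rule described in this reply can be modeled in a few lines of user space C. This is only a sketch of the rule as stated in the thread (shared mode exposes every IB device to every namespace, exclusive mode exposes a device only to the namespace it belongs to); it is not the kernel code. The struct model_ib_dev, the integer namespace ids, and the model_* function names are all hypothetical stand-ins, and the namespace assignments in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an IB device: a name and the net namespace
 * the IB device itself is bound to (not the namespace of its netdev). */
struct model_ib_dev {
	const char *name;
	int netns_id;	/* hypothetical integer id in place of struct net * */
};

/* Model of the access check: in shared mode every namespace may see the
 * device; in exclusive mode only the device's own namespace may. */
bool model_dev_access_netns(bool shared_mode,
			    const struct model_ib_dev *dev,
			    int caller_netns_id)
{
	assert(dev != NULL);
	return shared_mode || dev->netns_id == caller_netns_id;
}

/* Count how many of the n devices a caller in caller_netns_id would see. */
size_t model_count_visible(bool shared_mode,
			   const struct model_ib_dev *devs, size_t n,
			   int caller_netns_id)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++) {
		if (model_dev_access_netns(shared_mode, &devs[i],
					   caller_netns_id))
			visible++;
	}
	return visible;
}
```

With two devices bound to namespaces 1 and 0, a caller in namespace 0 sees both in shared mode and only one in exclusive mode. Note that the model keys on the namespace of the IB device, so moving only the net device does not change what is listed.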
On 2022/10/7 14:21, Leon Romanovsky wrote:
> On Thu, Oct 06, 2022 at 01:23:05PM -0300, Jason Gunthorpe wrote:
>> On Thu, Oct 06, 2022 at 07:21:39PM +0300, Leon Romanovsky wrote:

<...>

>>> rdmatool presents IB devices. It has no logic that decides if that
>>> device is usable/operable or not.
>>
>> It should really not report an IB device that is not in the net
>> namespace..
>
> It is the kernel's (nldev.c) job to hide such IB devices, and it seems
> like it does: for devices with the help of ib_enum_all_devs(), and for
> links with ib_device_get_by_index().

Thanks, Jason and Leon.

I am working on it. I will send the patch out very soon.

Zhu Yanjun

> They both check the netns via rdma_dev_access_netns(device, net).
>
> Thanks
September 28, 2022 2:04 PM, "Leon Romanovsky" <leon@kernel.org> wrote:

> On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:

<...>

> RDMA has two different net namespace modes: shared and exclusive.
>
> In shared mode, the IB devices are shared across all net namespaces, and
> "moving" a net device into a different namespace just "hides" it but
> doesn't disconnect it.

Hi Leon,

About RDMA shared and exclusive mode, I am confused about this scenario:

In shared mode, IB device A is in net namespace A1 while netdev device B
is in net namespace B1. IB device A depends on netdev device B. How can
tests be run in this scenario? Both rping and perftest need an IP address
to work, but now the IP address is in net namespace B1 while IB device A
is in net namespace A1.

In a production environment, does the above scenario exist?

Thanks and Regards,
Zhu Yanjun

> See the comments around the various usages of the ib_devices_shared_netns
> variable.
>
> Thanks
On Sun, Oct 09, 2022 at 10:20:53AM +0000, yanjun.zhu@linux.dev wrote:
> September 28, 2022 2:04 PM, "Leon Romanovsky" <leon@kernel.org> wrote:

<...>

> > RDMA has two different net namespace modes: shared and exclusive.
> >
> > In shared mode, the IB devices are shared across all net namespaces, and
> > "moving" a net device into a different namespace just "hides" it but
> > doesn't disconnect it.
>
> Hi Leon,
>
> About RDMA shared and exclusive mode, I am confused about this scenario:
>
> In shared mode, IB device A is in net namespace A1 while netdev device B
> is in net namespace B1. IB device A depends on netdev device B. How can
> tests be run in this scenario? Both rping and perftest need an IP address
> to work, but now the IP address is in net namespace B1 while IB device A
> is in net namespace A1.
>
> In a production environment, does the above scenario exist?

Yes and no at the same time.

Yes: The whole net namespace support is needed for containers. In old
versions of rdma-core, libibverbs relied on the /sys/class/infiniband/
structure. This is why we need "shared" mode, where IB exists without
relation to netdev.

No: Like you said, it won't work for RoCE and iWARP.

Thanks

> Thanks and Regards,
> Zhu Yanjun
diff --git a/rdma/link.c b/rdma/link.c
index bf24b849..449a7636 100644
--- a/rdma/link.c
+++ b/rdma/link.c
@@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
 		return MNL_CB_ERROR;
 	}
 
+	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
+		return MNL_CB_OK;
+
 	idx = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
 	port = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
 	name = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_DEV_NAME]);
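For readers who want to see the effect of the proposed check in isolation, here is a rough stand-alone model of it. It does not use libmnl or the real netlink attribute table: struct model_link and the model_* functions are hypothetical, and the patch's two attribute checks (NDEV_NAME and NDEV_INDEX) are collapsed into a single nullable netdev name. As the review in this thread points out, filtering in the user space viewer was not considered the right place for this logic, so treat the sketch as an illustration of the patch's behavior and not as a recommended design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for one parsed "rdma link" entry. */
struct model_link {
	const char *dev_name;
	unsigned int port;
	const char *netdev_name;	/* NULL when no netdev was reported */
};

/* Models the added check: a link without netdev information is skipped. */
bool model_link_should_print(const struct model_link *link)
{
	assert(link != NULL);
	return link->netdev_name != NULL;
}

/* Print the links that pass the filter; returns how many were printed. */
size_t model_print_links(const struct model_link *links, size_t n, FILE *out)
{
	size_t printed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_link_should_print(&links[i]))
			continue;	/* analogous to the early return in the patch */
		fprintf(out, "link %s/%u netdev %s\n", links[i].dev_name,
			links[i].port, links[i].netdev_name);
		printed++;
	}
	return printed;
}
```

Feeding it roceo12399/1 with no netdev and roceo12409/1 with netdev eno12409 prints only the second entry, which mirrors the "After this change" listing in the commit message.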