rdma: not display the rdma link in other net namespace

Message ID: 20220926024033.284341-1-yanjun.zhu@linux.dev
State: Changes Requested
Delegated to: Leon Romanovsky

Commit Message

Zhu Yanjun Sept. 26, 2022, 2:40 a.m. UTC
From: Zhu Yanjun <yanjun.zhu@linux.dev>

When a net device is moved to another net namespace, the command
"rdma link" should not display the rdma link for this net device.

For example, when the net device eno12399 is moved from init_net to net
namespace net0, the rdma link of eno12399 should not be displayed in init_net.

Before this change:

Init_net:

link roceo12399/1 state DOWN physical_state DISABLED  <---should not display
link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1

net0:

link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
link roceo12409/1 state DOWN physical_state DISABLED <---should not display
link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display

After this change:

Init_net:

link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1

net0:

link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399

Fixes: da990ab40a92 ("rdma: Add link object")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 rdma/link.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Zhu Yanjun Sept. 25, 2022, 10:22 a.m. UTC | #1
On 2022/9/26 10:40, yanjun.zhu@linux.dev wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> When the net devices are moved to another net namespace, the command
> "rdma link" should not display the rdma link about this net device.
> 
> For example, when the net device eno12399 is moved to net namespace net0
> from init_net, the rdma link of eno12399 should not display in init_net.
> 
> Before this change:
> 
> Init_net:
> 
> link roceo12399/1 state DOWN physical_state DISABLED  <---should not display
> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> 
> net0:
> 
> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
> 
> After this change
> 
> Init_net:
> 
> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> 
> net0:
> 
> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> 
> Fixes: da990ab40a92 ("rdma: Add link object")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Hi Leon,

This commit fixes a problem in "rdma link show" in iproute2.
I do not know the mailing list for iproute2, so I am sending the commit here.
If you know the mailing list, please let me know.

Thanks and Regards,
Zhu Yanjun

> ---
>   rdma/link.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/rdma/link.c b/rdma/link.c
> index bf24b849..449a7636 100644
> --- a/rdma/link.c
> +++ b/rdma/link.c
> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
>   		return MNL_CB_ERROR;
>   	}
>   
> +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
> +		return MNL_CB_OK;
> +
>   	idx = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
>   	port = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
>   	name = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_DEV_NAME]);
Leon Romanovsky Sept. 27, 2022, 10:34 a.m. UTC | #2
On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> When the net devices are moved to another net namespace, the command
> "rdma link" should not display the rdma link about this net device.
> 
> For example, when the net device eno12399 is moved to net namespace net0
> from init_net, the rdma link of eno12399 should not display in init_net.
> 
> Before this change:
> 
> Init_net:
> 
> link roceo12399/1 state DOWN physical_state DISABLED  <---should not display
> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> 
> net0:
> 
> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
> 
> After this change
> 
> Init_net:
> 
> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> 
> net0:
> 
> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> 
> Fixes: da990ab40a92 ("rdma: Add link object")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>  rdma/link.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/rdma/link.c b/rdma/link.c
> index bf24b849..449a7636 100644
> --- a/rdma/link.c
> +++ b/rdma/link.c
> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
>  		return MNL_CB_ERROR;
>  	}
>  
> +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
> +		return MNL_CB_OK;
> +

Regarding your question about where it should go in addition to RDMA, the
answer is the netdev ML. The rdmatool is part of iproute2 and the relevant
maintainers should be CCed.

Regarding the change, I don't think that it is right. The user space tool is
a simple viewer of the data returned from the kernel. It is not a mistake to
return a device without a netdev.

Thanks
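
Illustration of the "simple viewer" behavior described above: a minimal
sketch, not the actual rdmatool code. The struct link_info type and the
format_link() helper are hypothetical names introduced only for this example.
It shows a link line being produced whether or not the kernel reported a
netdev for that link.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one link as reported by the kernel. */
struct link_info {
	const char *dev_name;   /* e.g. "roceo12399" */
	unsigned int port;      /* e.g. 1 */
	const char *state;      /* e.g. "DOWN" */
	const char *phys_state; /* e.g. "DISABLED" */
	const char *netdev;     /* NULL when no netdev was reported */
};

/*
 * Format one output line. The netdev field is optional: when it is absent
 * the link is still printed, only without the trailing "netdev <name>".
 * Returns the number of characters snprintf() reported.
 */
static int format_link(char *buf, size_t len, const struct link_info *li)
{
	int n;

	assert(buf != NULL && li != NULL);

	n = snprintf(buf, len, "link %s/%u state %s physical_state %s",
		     li->dev_name, li->port, li->state, li->phys_state);
	if (n < 0 || (size_t)n >= len || li->netdev == NULL)
		return n;

	return n + snprintf(buf + n, len - (size_t)n, " netdev %s", li->netdev);
}
```

With netdev set to NULL the helper yields a line like the first "Before this
change" entry for init_net; with netdev set it yields the full line.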
Zhu Yanjun Sept. 27, 2022, 10:58 a.m. UTC | #3
On 2022/9/27 18:34, Leon Romanovsky wrote:
> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> When the net devices are moved to another net namespace, the command
>> "rdma link" should not display the rdma link about this net device.
>>
>> For example, when the net device eno12399 is moved to net namespace net0
>> from init_net, the rdma link of eno12399 should not display in init_net.
>>
>> Before this change:
>>
>> Init_net:
>>
>> link roceo12399/1 state DOWN physical_state DISABLED  <---should not display
>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>>
>> net0:
>>
>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
>> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
>>
>> After this change
>>
>> Init_net:
>>
>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>>
>> net0:
>>
>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>>
>> Fixes: da990ab40a92 ("rdma: Add link object")
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>> ---
>>   rdma/link.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/rdma/link.c b/rdma/link.c
>> index bf24b849..449a7636 100644
>> --- a/rdma/link.c
>> +++ b/rdma/link.c
>> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
>>   		return MNL_CB_ERROR;
>>   	}
>>   
>> +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
>> +		return MNL_CB_OK;
>> +
> Regarding your question where it should go in addition to RDMA, the answer
> is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
> should be CCed.
Thanks. I will also send it to netdev ML and CC the maintainers.
>
> Regarding the change, I don't think that it is right. User space tool is
> a simple viewer of data returned from the kernel. It is not a mistake to
> return device without netdev.

Normally an rdma link based on RoCEv2 is backed by a NIC. This NIC sends
and receives UDP packets. With a Mellanox/Intel NIC, the net device also
does more work than sending and receiving packets.

From this perspective, an rdma link depends on a net device.

In this case, the net device has been moved to another net namespace, so
it cannot be obtained, and the rdma link cannot work in this net
namespace either.

So this rdma link should not appear in this net namespace. Otherwise it
would confuse the user.

In fact, the net namespace is a concept of the TCP/IP stack; it does not
exist in the rdma stack.

But an rdma link based on RoCEv2 depends on a network device. So when the
net device is moved to another net namespace, the rdma link should also
move to that net namespace.

Zhu Yanjun

>
> Thanks
Leon Romanovsky Sept. 28, 2022, 6:04 a.m. UTC | #4
On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> 
> On 2022/9/27 18:34, Leon Romanovsky wrote:
> > On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> > > 
> > > When the net devices are moved to another net namespace, the command
> > > "rdma link" should not display the rdma link about this net device.
> > > 
> > > For example, when the net device eno12399 is moved to net namespace net0
> > > from init_net, the rdma link of eno12399 should not display in init_net.
> > > 
> > > Before this change:
> > > 
> > > Init_net:
> > > 
> > > link roceo12399/1 state DOWN physical_state DISABLED  <---should not display
> > > link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> > > link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> > > link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> > > 
> > > net0:
> > > 
> > > link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> > > link roceo12409/1 state DOWN physical_state DISABLED <---should not display
> > > link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
> > > link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
> > > 
> > > After this change
> > > 
> > > Init_net:
> > > 
> > > link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> > > link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> > > link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> > > 
> > > net0:
> > > 
> > > link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> > > 
> > > Fixes: da990ab40a92 ("rdma: Add link object")
> > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> > > ---
> > >   rdma/link.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/rdma/link.c b/rdma/link.c
> > > index bf24b849..449a7636 100644
> > > --- a/rdma/link.c
> > > +++ b/rdma/link.c
> > > @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
> > >   		return MNL_CB_ERROR;
> > >   	}
> > > +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
> > > +		return MNL_CB_OK;
> > > +
> > Regarding your question where it should go in addition to RDMA, the answer
> > is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
> > should be CCed.
> Thanks. I will also send it to netdev ML and CC the maintainers.
> > 
> > Regarding the change, I don't think that it is right. User space tool is
> > a simple viewer of data returned from the kernel. It is not a mistake to
> > return device without netdev.
> 
> Normally an rdma link based on RoCEv2 is backed by a NIC. This NIC sends
> and receives UDP packets. With a Mellanox/Intel NIC, the net device also
> does more work than sending and receiving packets.
> 
> From this perspective, an rdma link depends on a net device.
> 
> In this case, the net device has been moved to another net namespace, so
> it cannot be obtained, and the rdma link cannot work in this net
> namespace either.
> 
> So this rdma link should not appear in this net namespace. Otherwise it
> would confuse the user.
> 
> In fact, the net namespace is a concept of the TCP/IP stack; it does not
> exist in the rdma stack.

RDMA has two different net namespace modes: shared and exclusive.

In shared mode, the IB devices are shared across all net namespaces, and
"moving" a net device into a different namespace just "hides" it but doesn't
disconnect it.

See the comments around the various usages of the ib_devices_shared_netns
variable.

Thanks
Zhu Yanjun Sept. 30, 2022, 7:25 a.m. UTC | #5
On 2022/9/28 14:04, Leon Romanovsky wrote:
> On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
>>
>> On 2022/9/27 18:34, Leon Romanovsky wrote:
>>> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>>
>>>> When the net devices are moved to another net namespace, the command
>>>> "rdma link" should not display the rdma link about this net device.
>>>>
>>>> For example, when the net device eno12399 is moved to net namespace net0
>>>> from init_net, the rdma link of eno12399 should not display in init_net.
>>>>
>>>> Before this change:
>>>>
>>>> Init_net:
>>>>
>>>> link roceo12399/1 state DOWN physical_state DISABLED  <---should not display
>>>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
>>>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
>>>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>>>>
>>>> net0:
>>>>
>>>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>>>> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
>>>> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
>>>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
>>>>
>>>> After this change
>>>>
>>>> Init_net:
>>>>
>>>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
>>>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
>>>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>>>>
>>>> net0:
>>>>
>>>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>>>>
>>>> Fixes: da990ab40a92 ("rdma: Add link object")
>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>> ---
>>>>    rdma/link.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/rdma/link.c b/rdma/link.c
>>>> index bf24b849..449a7636 100644
>>>> --- a/rdma/link.c
>>>> +++ b/rdma/link.c
>>>> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
>>>>    		return MNL_CB_ERROR;
>>>>    	}
>>>> +	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
>>>> +		return MNL_CB_OK;
>>>> +
>>> Regarding your question where it should go in addition to RDMA, the answer
>>> is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
>>> should be CCed.
>> Thanks. I will also send it to netdev ML and CC the maintainers.
>>>
>>> Regarding the change, I don't think that it is right. User space tool is
>>> a simple viewer of data returned from the kernel. It is not a mistake to
>>> return device without netdev.
>>
>> Normally an rdma link based on RoCEv2 is backed by a NIC. This NIC sends
>> and receives UDP packets. With a Mellanox/Intel NIC, the net device also
>> does more work than sending and receiving packets.
>>
>> From this perspective, an rdma link depends on a net device.
>>
>> In this case, the net device has been moved to another net namespace, so
>> it cannot be obtained, and the rdma link cannot work in this net
>> namespace either.
>>
>> So this rdma link should not appear in this net namespace. Otherwise it
>> would confuse the user.
>>
>> In fact, the net namespace is a concept of the TCP/IP stack; it does not
>> exist in the rdma stack.
> 
> RDMA has two different net namespace mode: shared and exclusive.

This is different from the net namespace in networking.

> 
> In shared mode, the IB devices are shared across all net namespaces and
> "moving" net device into different namespace just "hides" it, but don't
> disconnect.

In exclusive mode, the net device is also hidden. It is the same as in
shared mode.

# rdma system
netns exclusive copy-on-fork on
# rdma link
link roceo12399/1 state DOWN physical_state DISABLED
link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1

Would it be better to append "exclusive" or "shared" at the end of the line?

For example,

Exclusive mode:

# rdma system
netns exclusive copy-on-fork on
# rdma link
link roceo12399/1 state DOWN physical_state DISABLED exclusive

Shared mode:

# rdma system
netns shared copy-on-fork on
# rdma link
link roceo12399/1 state DOWN physical_state DISABLED shared

Thanks and Regards
Zhu Yanjun
> 
> See comments around various usages of ib_devices_shared_netns variable.
> 
> Thanks
Leon Romanovsky Oct. 6, 2022, 12:53 p.m. UTC | #6
On Fri, Sep 30, 2022 at 03:25:00PM +0800, Yanjun Zhu wrote:
> On 2022/9/28 14:04, Leon Romanovsky wrote:
> > On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> > > 
> > > On 2022/9/27 18:34, Leon Romanovsky wrote:
> > > > On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> > > > > From: Zhu Yanjun <yanjun.zhu@linux.dev>

<...>

> Is it better to append "exclusive" or "shared" in the end of the line?

No, exclusive/shared is a global property, applied to all links.

Thanks
Zhu Yanjun Oct. 6, 2022, 2:26 p.m. UTC | #7
On 2022/10/6 20:53, Leon Romanovsky wrote:
> On Fri, Sep 30, 2022 at 03:25:00PM +0800, Yanjun Zhu wrote:
>> On 2022/9/28 14:04, Leon Romanovsky wrote:
>>> On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
>>>> On 2022/9/27 18:34, Leon Romanovsky wrote:
>>>>> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
>>>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> <...>
>
>> Is it better to append "exclusive" or "shared" in the end of the line?
> No, exclusive/shared is global property, applied to all links.

OK.

When running "rdma link show", there is no difference between shared and
exclusive mode.

Is that acceptable?

And in exclusive mode, an rdma link that cannot be accessed in net
namespace A still appears in net namespace A when running
"rdma link show" in net namespace A.

This is different from other objects in a net namespace.

For example, if net device NIC0 is moved from net namespace A to net
namespace B, NIC0 will not appear in net namespace A when running the
"ip link" command in net namespace A.

Is this a problem?


Zhu Yanjun

>
> Thanks
Leon Romanovsky Oct. 6, 2022, 4:21 p.m. UTC | #8
On Thu, Oct 06, 2022 at 10:26:33PM +0800, Yanjun Zhu wrote:
> 
> On 2022/10/6 20:53, Leon Romanovsky wrote:
> > On Fri, Sep 30, 2022 at 03:25:00PM +0800, Yanjun Zhu wrote:
> > > On 2022/9/28 14:04, Leon Romanovsky wrote:
> > > > On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> > > > > On 2022/9/27 18:34, Leon Romanovsky wrote:
> > > > > > On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> > > > > > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> > <...>
> > 
> > > Is it better to append "exclusive" or "shared" in the end of the line?
> > No, exclusive/shared is global property, applied to all links.
> 
> OK.
> 
> When running "rdma link show", there is no difference between shared and
> exclusive.
> 
> Is it acceptable?

exclusive/shared is an ib_core module parameter, so either all links are
shared or all links are exclusive. They can't be both at the same time.

> 
> 
> And in exclusive mode, an rdma link that cannot be accessed in net
> namespace A still appears in net namespace A when running
> "rdma link show" in net namespace A.
> 
> This is different from other objects in a net namespace.
> 
> For example, if net device NIC0 is moved from net namespace A to net
> namespace B, NIC0 will not appear in net namespace A when running the
> "ip link" command in net namespace A.
> 
> Is it a problem?

rdmatool presents IB devices. It has no logic that decides if that
device is usable/operable or not.

Thanks

> 
> 
> Zhu Yanjun
> 
> > 
> > Thanks
Jason Gunthorpe Oct. 6, 2022, 4:23 p.m. UTC | #9
On Thu, Oct 06, 2022 at 07:21:39PM +0300, Leon Romanovsky wrote:

> > And in exclusive mode, an rdma link that cannot be accessed in net
> > namespace A still appears in net namespace A when running
> > "rdma link show" in net namespace A.
> > 
> > This is different from other objects in a net namespace.
> > 
> > For example, if net device NIC0 is moved from net namespace A to net
> > namespace B, NIC0 will not appear in net namespace A when running the
> > "ip link" command in net namespace A.
> > 
> > Is it a problem?
> 
> rdmatool presents IB devices. It has no logic that decides if that
> device is usable/operable or not.

It should really not report an IB device that is not in the net
namespace.

I'm surprised this hasn't been noticed because it will break verbs.

Jason
Leon Romanovsky Oct. 7, 2022, 6:21 a.m. UTC | #10
On Thu, Oct 06, 2022 at 01:23:05PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 07:21:39PM +0300, Leon Romanovsky wrote:
> 
> > > And in exclusive mode, an rdma link that cannot be accessed in net
> > > namespace A still appears in net namespace A when running
> > > "rdma link show" in net namespace A.
> > > 
> > > This is different from other objects in a net namespace.
> > > 
> > > For example, if net device NIC0 is moved from net namespace A to net
> > > namespace B, NIC0 will not appear in net namespace A when running the
> > > "ip link" command in net namespace A.
> > > 
> > > Is it a problem?
> > 
> > rdmatool presents IB devices. It has no logic that decides if that
> > device is usable/operable or not.
> 
> It should really not report an IB device that is not in the net
> namespace..

It is the kernel's (nldev.c) job to hide such IB devices, and it seems like
it does: for devices with the help of ib_enum_all_devs(), and for links with
ib_device_get_by_index().

They both check the netns with rdma_dev_access_netns(device, net).

Thanks
Zhu Yanjun Oct. 7, 2022, 6:56 a.m. UTC | #11
On 2022/10/7 14:21, Leon Romanovsky wrote:
> On Thu, Oct 06, 2022 at 01:23:05PM -0300, Jason Gunthorpe wrote:
>> On Thu, Oct 06, 2022 at 07:21:39PM +0300, Leon Romanovsky wrote:
>>
>>>> And in exclusive mode, an rdma link that cannot be accessed in net
>>>> namespace A still appears in net namespace A when running
>>>> "rdma link show" in net namespace A.
>>>>
>>>> This is different from other objects in a net namespace.
>>>>
>>>> For example, if net device NIC0 is moved from net namespace A to net
>>>> namespace B, NIC0 will not appear in net namespace A when running the
>>>> "ip link" command in net namespace A.
>>>>
>>>> Is it a problem?
>>> rdmatool presents IB devices. It has no logic that decides if that
>>> device is usable/operable or not.
>> It should really not report an IB device that is not in the net
>> namespace..
> It is kernel (nldev.c) job to hide such IB devices and it seems like it
> does. For devices with help of ib_enum_all_devs() and for links with
> ib_device_get_by_index().

Thanks, Jason and Leon

I am working on it. I will send the patch out very soon.

Zhu Yanjun

>
> They both checks netns - rdma_dev_access_netns(device, net).
>
> Thanks
Zhu Yanjun Oct. 9, 2022, 10:20 a.m. UTC | #12
September 28, 2022 2:04 PM, "Leon Romanovsky" <leon@kernel.org> wrote:

> On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> 
>> On 2022/9/27 18:34, Leon Romanovsky wrote:
>> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>> 
>>> When the net devices are moved to another net namespace, the command
>>> "rdma link" should not display the rdma link about this net device.
>>> 
>>> For example, when the net device eno12399 is moved to net namespace net0
>>> from init_net, the rdma link of eno12399 should not display in init_net.
>>> 
>>> Before this change:
>>> 
>>> Init_net:
>>> 
>>> link roceo12399/1 state DOWN physical_state DISABLED <---should not display
>>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
>>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
>>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>>> 
>>> net0:
>>> 
>>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>>> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
>>> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
>>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
>>> 
>>> After this change
>>> 
>>> Init_net:
>>> 
>>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
>>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
>>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
>>> 
>>> net0:
>>> 
>>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
>>> 
>>> Fixes: da990ab40a92 ("rdma: Add link object")
>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>>> ---
>>> rdma/link.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/rdma/link.c b/rdma/link.c
>>> index bf24b849..449a7636 100644
>>> --- a/rdma/link.c
>>> +++ b/rdma/link.c
>>> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
>>> return MNL_CB_ERROR;
>>> }
>>> + if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
>>> + return MNL_CB_OK;
>>> +
>> Regarding your question where it should go in addition to RDMA, the answer
>> is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
>> should be CCed.
>> Thanks. I will also send it to netdev ML and CC the maintainers.
>> 
>> Regarding the change, I don't think that it is right. User space tool is
>> a simple viewer of data returned from the kernel. It is not a mistake to
>> return device without netdev.
>> 
>> Normally an rdma link based on RoCEv2 is backed by a NIC. This NIC sends
>> and receives UDP packets. With a Mellanox/Intel NIC, the net device also
>> does more work than sending and receiving packets.
>> 
>> From this perspective, an rdma link depends on a net device.
>> 
>> In this case, the net device has been moved to another net namespace, so
>> it cannot be obtained, and the rdma link cannot work in this net
>> namespace either.
>> 
>> So this rdma link should not appear in this net namespace. Otherwise it
>> would confuse the user.
>> 
>> In fact, the net namespace is a concept of the TCP/IP stack; it does not
>> exist in the rdma stack.
> 
> RDMA has two different net namespace mode: shared and exclusive.
> 
> In shared mode, the IB devices are shared across all net namespaces and
> "moving" net device into different namespace just "hides" it, but don't
> disconnect.

Hi, Leon

Regarding the RDMA shared and exclusive modes, I am confused about this scenario:

In shared mode, IB device A is in net namespace A1 while net device B is in
net namespace B1, and IB device A depends on net device B. How can tests be
run in this scenario? Both rping and perftest need an IP address to work, but
the IP address is in net namespace B1 while IB device A is in net namespace A1.

Does this scenario exist in production environments?

Thanks and Regards,
Zhu Yanjun

> 
> See comments around various usages of ib_devices_shared_netns variable.
> 
> Thanks
Leon Romanovsky Oct. 11, 2022, 9:49 a.m. UTC | #13
On Sun, Oct 09, 2022 at 10:20:53AM +0000, yanjun.zhu@linux.dev wrote:
> September 28, 2022 2:04 PM, "Leon Romanovsky" <leon@kernel.org> wrote:
> 
> > On Tue, Sep 27, 2022 at 06:58:50PM +0800, Yanjun Zhu wrote:
> > 
> >> On 2022/9/27 18:34, Leon Romanovsky wrote:
> >> On Sun, Sep 25, 2022 at 10:40:33PM -0400, yanjun.zhu@linux.dev wrote:
> >>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> >>> 
> >>> When the net devices are moved to another net namespace, the command
> >>> "rdma link" should not display the rdma link about this net device.
> >>> 
> >>> For example, when the net device eno12399 is moved to net namespace net0
> >>> from init_net, the rdma link of eno12399 should not display in init_net.
> >>> 
> >>> Before this change:
> >>> 
> >>> Init_net:
> >>> 
> >>> link roceo12399/1 state DOWN physical_state DISABLED <---should not display
> >>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> >>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> >>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> >>> 
> >>> net0:
> >>> 
> >>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> >>> link roceo12409/1 state DOWN physical_state DISABLED <---should not display
> >>> link rocep202s0f0/1 state DOWN physical_state DISABLED <---should not display
> >>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP <---should not display
> >>> 
> >>> After this change
> >>> 
> >>> Init_net:
> >>> 
> >>> link roceo12409/1 state DOWN physical_state DISABLED netdev eno12409
> >>> link rocep202s0f0/1 state DOWN physical_state DISABLED netdev ens7f0
> >>> link rocep202s0f1/1 state ACTIVE physical_state LINK_UP netdev ens7f1
> >>> 
> >>> net0:
> >>> 
> >>> link roceo12399/1 state DOWN physical_state DISABLED netdev eno12399
> >>> 
> >>> Fixes: da990ab40a92 ("rdma: Add link object")
> >>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> >>> ---
> >>> rdma/link.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>> 
> >>> diff --git a/rdma/link.c b/rdma/link.c
> >>> index bf24b849..449a7636 100644
> >>> --- a/rdma/link.c
> >>> +++ b/rdma/link.c
> >>> @@ -238,6 +238,9 @@ static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
> >>> return MNL_CB_ERROR;
> >>> }
> >>> + if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
> >>> + return MNL_CB_OK;
> >>> +
> >> Regarding your question where it should go in addition to RDMA, the answer
> >> is netdev ML. The rdmatool is part of iproute2 and the relevant maintainers
> >> should be CCed.
> >> Thanks. I will also send it to netdev ML and CC the maintainers.
> >> 
> >> Regarding the change, I don't think that it is right. User space tool is
> >> a simple viewer of data returned from the kernel. It is not a mistake to
> >> return device without netdev.
> >> 
> >> Normally an rdma link based on RoCEv2 is backed by a NIC. This NIC sends
> >> and receives UDP packets. With a Mellanox/Intel NIC, the net device also
> >> does more work than sending and receiving packets.
> >> 
> >> From this perspective, an rdma link depends on a net device.
> >> 
> >> In this case, the net device has been moved to another net namespace, so
> >> it cannot be obtained, and the rdma link cannot work in this net
> >> namespace either.
> >> 
> >> So this rdma link should not appear in this net namespace. Otherwise it
> >> would confuse the user.
> >> 
> >> In fact, the net namespace is a concept of the TCP/IP stack; it does not
> >> exist in the rdma stack.
> > 
> > RDMA has two different net namespace mode: shared and exclusive.
> > 
> > In shared mode, the IB devices are shared across all net namespaces and
> > "moving" net device into different namespace just "hides" it, but don't
> > disconnect.
> 
> Hi, Leon
> 
> Regarding the RDMA shared and exclusive modes, I am confused about this scenario:
> 
> In shared mode, IB device A is in net namespace A1 while net device B is in
> net namespace B1, and IB device A depends on net device B. How can tests be
> run in this scenario? Both rping and perftest need an IP address to work, but
> the IP address is in net namespace B1 while IB device A is in net namespace A1.
> 
> Does this scenario exist in production environments?

Yes and no at the same time.

Yes:
The whole net namespace support is needed for containers. In old
versions of rdma-core, libibverbs relied on /sys/class/infiniband/
structure. This is why we need "shared" mode, where IB exists without
relation to netdev.

No:
Like you said, it won't work for RoCE and iWARP.

Thanks

> 
> Thanks and Regards,
> Zhu Yanjun
> 
> > 
> > See comments around various usages of ib_devices_shared_netns variable.
> > 
> > Thanks
Patch

diff --git a/rdma/link.c b/rdma/link.c
index bf24b849..449a7636 100644
--- a/rdma/link.c
+++ b/rdma/link.c
@@ -238,6 +238,9 @@  static int link_parse_cb(const struct nlmsghdr *nlh, void *data)
 		return MNL_CB_ERROR;
 	}
 
+	if (!tb[RDMA_NLDEV_ATTR_NDEV_NAME] || !tb[RDMA_NLDEV_ATTR_NDEV_INDEX])
+		return MNL_CB_OK;
+
 	idx = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
 	port = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
 	name = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_DEV_NAME]);
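
The effect of the added check can be sketched outside of libmnl. This is a
minimal model, not the rdmatool implementation: the attribute indices in the
enum, struct link_msg, link_has_netdev() and count_displayed() are all
hypothetical names, and a NULL table slot stands for an attribute the kernel
did not send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute indices; the real ones come from the rdma netlink
 * uapi header and have different values. */
enum { ATTR_DEV_NAME, ATTR_PORT_INDEX, ATTR_NDEV_INDEX, ATTR_NDEV_NAME, ATTR_MAX };

/* One parsed message: a NULL slot means the attribute was not present. */
struct link_msg { const void *tb[ATTR_MAX]; };

/* Mirrors the condition in the patch: both netdev attributes must exist. */
static int link_has_netdev(const struct link_msg *m)
{
	assert(m != NULL);
	return m->tb[ATTR_NDEV_NAME] != NULL && m->tb[ATTR_NDEV_INDEX] != NULL;
}

/* Count how many links would be printed, with or without the filter. */
static size_t count_displayed(const struct link_msg *msgs, size_t n, int filter)
{
	size_t shown = 0;

	for (size_t i = 0; i < n; i++) {
		if (filter && !link_has_netdev(&msgs[i]))
			continue; /* the patch returns MNL_CB_OK here and prints nothing */
		shown++;
	}
	return shown;
}
```

Note that such a filter drops every link reported without the netdev
attributes, whatever the reason, which is the behavior questioned in the
review comments.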