diff mbox series

[v1] RDMA/core: Handle ARPHRD_NONE devices for iWARP

Message ID 168616682205.2099.4473975057644323224.stgit@oracle-102.nfsv4bat.org (mailing list archive)
State Superseded
Headers show
Series [v1] RDMA/core: Handle ARPHRD_NONE devices for iWARP | expand

Commit Message

Chuck Lever June 7, 2023, 7:43 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

We would like to enable the use of siw on top of a VPN that is
constructed and managed via a tun device. That hasn't worked up
until now because ARPHRD_NONE devices (such as tun devices) have
no GID for the RDMA/core to look up.

But it turns out that the egress device has already been picked for
us. addr_handler() just has to do the right thing with it.

Tested with siw and qedr, both initiator and target.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/core/cma.c |    3 +++
 1 file changed, 3 insertions(+)

This of course needs broader testing, but it seems to work, and it's
a little nicer than "if (dev_type == ARPHRD_NONE)".

One thing I discovered is that the NFS/RDMA server implementation
does not deal at all with more than one RDMA device on the system.
I will address that with an ib_client; SunRPC patches forthcoming.

Comments

Chuck Lever III June 8, 2023, 3:30 p.m. UTC | #1
> On Jun 7, 2023, at 3:43 PM, Chuck Lever <cel@kernel.org> wrote:
> 
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> We would like to enable the use of siw on top of a VPN that is
> constructed and managed via a tun device. That hasn't worked up
> until now because ARPHRD_NONE devices (such as tun devices) have
> no GID for the RDMA/core to look up.
> 
> But it turns out that the egress device has already been picked for
> us. addr_handler() just has to do the right thing with it.
> 
> Tested with siw and qedr, both initiator and target.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> drivers/infiniband/core/cma.c |    3 +++
> 1 file changed, 3 insertions(+)
> 
> This of course needs broader testing, but it seems to work, and it's
> a little nicer than "if (dev_type == ARPHRD_NONE)".
> 
> One thing I discovered is that the NFS/RDMA server implementation
> does not deal at all with more than one RDMA device on the system.
> I will address that with an ib_client; SunRPC patches forthcoming.

Or maybe not.

I'm looking at cma_iw_acquire_dev(). Where is the
listen_id_priv->id.port_num field supposed to be initialized?


> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 56e568fcd32b..c9a2bdb49e3c 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
> if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
> return ERR_PTR(-ENODEV);
> 
> + if (rdma_protocol_iwarp(device, port))
> + return rdma_get_gid_attr(device, port, 0);
> +
> if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port))
> return ERR_PTR(-ENODEV);
> 
> 
> 

--
Chuck Lever
Tom Talpey June 9, 2023, 8:49 p.m. UTC | #2
On 6/7/2023 3:43 PM, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> We would like to enable the use of siw on top of a VPN that is
> constructed and managed via a tun device. That hasn't worked up
> until now because ARPHRD_NONE devices (such as tun devices) have
> no GID for the RDMA/core to look up.
> 
> But it turns out that the egress device has already been picked for
> us. addr_handler() just has to do the right thing with it.
> 
> Tested with siw and qedr, both initiator and target.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   drivers/infiniband/core/cma.c |    3 +++
>   1 file changed, 3 insertions(+)
> 
> This of course needs broader testing, but it seems to work, and it's
> a little nicer than "if (dev_type == ARPHRD_NONE)".
> 
> One thing I discovered is that the NFS/RDMA server implementation
> does not deal at all with more than one RDMA device on the system.
> I will address that with an ib_client; SunRPC patches forthcoming.
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 56e568fcd32b..c9a2bdb49e3c 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>   	if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>   		return ERR_PTR(-ENODEV);
>   
> +	if (rdma_protocol_iwarp(device, port))
> +		return rdma_get_gid_attr(device, port, 0);

This might work, but I'm skeptical of the conditional. There's nothing
special about iWARP that says a GID should come from the outgoing device
mac. And, other protocols without IB GID addressing won't benefit.

Wouldn't it be better to detect a missing GID, or infer the need from
some other transport attribute?

Tom.

> +
>   	if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port))
>   		return ERR_PTR(-ENODEV);
>   
> 
> 
>
Zhu Yanjun June 10, 2023, 12:04 a.m. UTC | #3
在 2023/6/10 4:49, Tom Talpey 写道:
> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> We would like to enable the use of siw on top of a VPN that is
>> constructed and managed via a tun device. That hasn't worked up
>> until now because ARPHRD_NONE devices (such as tun devices) have
>> no GID for the RDMA/core to look up.
>>
>> But it turns out that the egress device has already been picked for
>> us. addr_handler() just has to do the right thing with it.
>>
>> Tested with siw and qedr, both initiator and target.
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>   drivers/infiniband/core/cma.c |    3 +++
>>   1 file changed, 3 insertions(+)
>>
>> This of course needs broader testing, but it seems to work, and it's
>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>
>> One thing I discovered is that the NFS/RDMA server implementation
>> does not deal at all with more than one RDMA device on the system.
>> I will address that with an ib_client; SunRPC patches forthcoming.
>>
>> diff --git a/drivers/infiniband/core/cma.c 
>> b/drivers/infiniband/core/cma.c
>> index 56e568fcd32b..c9a2bdb49e3c 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>       if (!rdma_dev_access_netns(device, 
>> id_priv->id.route.addr.dev_addr.net))
>>           return ERR_PTR(-ENODEV);
>> +    if (rdma_protocol_iwarp(device, port))
>> +        return rdma_get_gid_attr(device, port, 0);
> 
> This might work, but I'm skeptical of the conditional. There's nothing
> special about iWARP that says a GID should come from the outgoing device
> mac. And, other protocols without IB GID addressing won't benefit.

Agree with you. Other protocols, such as RXE, also need be handled.
So a better solution than this needs.

Zhu Yanjun

> 
> Wouldn't it be better to detect a missing GID, or infer the need from
> some other transport attribute?
> 
> Tom.
> 
>> +
>>       if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, 
>> port))
>>           return ERR_PTR(-ENODEV);
>>
>>
>>
Jason Gunthorpe June 10, 2023, 12:05 p.m. UTC | #4
On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
> On 6/7/2023 3:43 PM, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > We would like to enable the use of siw on top of a VPN that is
> > constructed and managed via a tun device. That hasn't worked up
> > until now because ARPHRD_NONE devices (such as tun devices) have
> > no GID for the RDMA/core to look up.
> > 
> > But it turns out that the egress device has already been picked for
> > us. addr_handler() just has to do the right thing with it.
> > 
> > Tested with siw and qedr, both initiator and target.
> > 
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >   drivers/infiniband/core/cma.c |    3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > This of course needs broader testing, but it seems to work, and it's
> > a little nicer than "if (dev_type == ARPHRD_NONE)".
> > 
> > One thing I discovered is that the NFS/RDMA server implementation
> > does not deal at all with more than one RDMA device on the system.
> > I will address that with an ib_client; SunRPC patches forthcoming.
> > 
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > index 56e568fcd32b..c9a2bdb49e3c 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
> >   	if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
> >   		return ERR_PTR(-ENODEV);
> > +	if (rdma_protocol_iwarp(device, port))
> > +		return rdma_get_gid_attr(device, port, 0);
> 
> This might work, but I'm skeptical of the conditional. There's nothing
> special about iWARP that says a GID should come from the outgoing device
> mac. And, other protocols without IB GID addressing won't benefit.

The GID represents the source address, so it better come from the
outgoing device or something is really wrong.

iWARP gets a conditional because iwarp always has a single GID, other
devices do not work that way.

Jason
Tom Talpey June 10, 2023, 4:38 p.m. UTC | #5
On 6/10/2023 8:05 AM, Jason Gunthorpe wrote:
> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> We would like to enable the use of siw on top of a VPN that is
>>> constructed and managed via a tun device. That hasn't worked up
>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>> no GID for the RDMA/core to look up.
>>>
>>> But it turns out that the egress device has already been picked for
>>> us. addr_handler() just has to do the right thing with it.
>>>
>>> Tested with siw and qedr, both initiator and target.
>>>
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>    drivers/infiniband/core/cma.c |    3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> This of course needs broader testing, but it seems to work, and it's
>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>>
>>> One thing I discovered is that the NFS/RDMA server implementation
>>> does not deal at all with more than one RDMA device on the system.
>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>>
>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>> --- a/drivers/infiniband/core/cma.c
>>> +++ b/drivers/infiniband/core/cma.c
>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>    	if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>    		return ERR_PTR(-ENODEV);
>>> +	if (rdma_protocol_iwarp(device, port))
>>> +		return rdma_get_gid_attr(device, port, 0);
>>
>> This might work, but I'm skeptical of the conditional. There's nothing
>> special about iWARP that says a GID should come from the outgoing device
>> mac. And, other protocols without IB GID addressing won't benefit.
> 
> The GID represents the source address, so it better come from the
> outgoing device or something is really wrong.
> 
> iWARP gets a conditional because iwarp always has a single GID, other
> devices do not work that way.

Not sure I follow. TCP is routable so it can use multiple egress ports.
That same routing means an incoming packet bearing an appropriate local
address will be accepted on any port.

So still, I don't think this an iWARP property per se. It's coming from
the transport and its addressing. I'd hope that this can be gleaned from
something other than "it's iWARP, cma needs to do ...".

Tom.
Chuck Lever III June 10, 2023, 4:43 p.m. UTC | #6
> On Jun 9, 2023, at 8:04 PM, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> 
> 在 2023/6/10 4:49, Tom Talpey 写道:
>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> We would like to enable the use of siw on top of a VPN that is
>>> constructed and managed via a tun device. That hasn't worked up
>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>> no GID for the RDMA/core to look up.
>>> 
>>> But it turns out that the egress device has already been picked for
>>> us. addr_handler() just has to do the right thing with it.
>>> 
>>> Tested with siw and qedr, both initiator and target.
>>> 
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>   drivers/infiniband/core/cma.c |    3 +++
>>>   1 file changed, 3 insertions(+)
>>> 
>>> This of course needs broader testing, but it seems to work, and it's
>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>> 
>>> One thing I discovered is that the NFS/RDMA server implementation
>>> does not deal at all with more than one RDMA device on the system.
>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>> 
>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>> --- a/drivers/infiniband/core/cma.c
>>> +++ b/drivers/infiniband/core/cma.c
>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>       if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>           return ERR_PTR(-ENODEV);
>>> +    if (rdma_protocol_iwarp(device, port))
>>> +        return rdma_get_gid_attr(device, port, 0);
>> This might work, but I'm skeptical of the conditional. There's nothing
>> special about iWARP that says a GID should come from the outgoing device
>> mac. And, other protocols without IB GID addressing won't benefit.

Hi Zhu, thanks for having a look.


> Agree with you. Other protocols, such as RXE, also need be handled.
> So a better solution than this needs.

I assume you mean, in particular, RoCEv2 with rxe on a non-Ethernet
device? Tom and I discussed that possibility privately while I
developed this patch.

I do not object to that idea. But I would feel more comfortable if
that was justified and tested via a separate patch. I expect that
the logic for those cases will look different to what I'm adding here.


> Zhu Yanjun
> 
>> Wouldn't it be better to detect a missing GID, or infer the need from
>> some other transport attribute?
>> Tom.
>>> +
>>>       if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port))
>>>           return ERR_PTR(-ENODEV);


--
Chuck Lever
Chuck Lever III June 10, 2023, 5:11 p.m. UTC | #7
> On Jun 10, 2023, at 12:38 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 6/10/2023 8:05 AM, Jason Gunthorpe wrote:
>> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
>>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>> 
>>>> We would like to enable the use of siw on top of a VPN that is
>>>> constructed and managed via a tun device. That hasn't worked up
>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>> no GID for the RDMA/core to look up.
>>>> 
>>>> But it turns out that the egress device has already been picked for
>>>> us. addr_handler() just has to do the right thing with it.
>>>> 
>>>> Tested with siw and qedr, both initiator and target.
>>>> 
>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>   drivers/infiniband/core/cma.c |    3 +++
>>>>   1 file changed, 3 insertions(+)
>>>> 
>>>> This of course needs broader testing, but it seems to work, and it's
>>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>>> 
>>>> One thing I discovered is that the NFS/RDMA server implementation
>>>> does not deal at all with more than one RDMA device on the system.
>>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>>> 
>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>>> --- a/drivers/infiniband/core/cma.c
>>>> +++ b/drivers/infiniband/core/cma.c
>>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>    if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>>    return ERR_PTR(-ENODEV);
>>>> + if (rdma_protocol_iwarp(device, port))
>>>> + return rdma_get_gid_attr(device, port, 0);
>>> 
>>> This might work, but I'm skeptical of the conditional. There's nothing
>>> special about iWARP that says a GID should come from the outgoing device
>>> mac. And, other protocols without IB GID addressing won't benefit.
>> The GID represents the source address, so it better come from the
>> outgoing device or something is really wrong.
>> iWARP gets a conditional because iwarp always has a single GID, other
>> devices do not work that way.
> 
> Not sure I follow. TCP is routable so it can use multiple egress ports.
> That same routing means an incoming packet bearing an appropriate local
> address will be accepted on any port.

An siw virtual device, for example, is paired with one particular network
i/f. I'm not sure that "multiple egress ports" is applicable. I would
think this is also the case with a device-offloaded iWARP implementation:
the egress device is always the i/f on the offload card.


> So still, I don't think this an iWARP property per se.

Agreed, it's not particular to the iWARP protocol. But it does seem to
be particular to the entire class of iWARP protocol implementations on
Linux.

Adding a comment about this distinction would certainly be apropos.


> It's coming from
> the transport and its addressing. I'd hope that this can be gleaned from
> something other than "it's iWARP, cma needs to do ...".

I could add another flag somewhere to indicate this property of a
device. I expect only Linux iWARP devices would set it, though, so
would it really add much value?

Do you have a recommendation for where to home such a flag? Would it
be a fixed device property, or something that would be set dynamically
during earlier steps of address resolution? Again, it feels like
something that would be set iff rdma_protocol_iwarp() is true...

--
Chuck Lever
Zhu Yanjun June 11, 2023, 12:27 a.m. UTC | #8
在 2023/6/11 0:43, Chuck Lever III 写道:
>
>> On Jun 9, 2023, at 8:04 PM, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>>
>> 在 2023/6/10 4:49, Tom Talpey 写道:
>>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> We would like to enable the use of siw on top of a VPN that is
>>>> constructed and managed via a tun device. That hasn't worked up
>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>> no GID for the RDMA/core to look up.
>>>>
>>>> But it turns out that the egress device has already been picked for
>>>> us. addr_handler() just has to do the right thing with it.
>>>>
>>>> Tested with siw and qedr, both initiator and target.
>>>>
>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>    drivers/infiniband/core/cma.c |    3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> This of course needs broader testing, but it seems to work, and it's
>>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>>>
>>>> One thing I discovered is that the NFS/RDMA server implementation
>>>> does not deal at all with more than one RDMA device on the system.
>>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>>>
>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>>> --- a/drivers/infiniband/core/cma.c
>>>> +++ b/drivers/infiniband/core/cma.c
>>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>        if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>>            return ERR_PTR(-ENODEV);
>>>> +    if (rdma_protocol_iwarp(device, port))
>>>> +        return rdma_get_gid_attr(device, port, 0);
>>> This might work, but I'm skeptical of the conditional. There's nothing
>>> special about iWARP that says a GID should come from the outgoing device
>>> mac. And, other protocols without IB GID addressing won't benefit.
> Hi Zhu, thanks for having a look.
>
>
>> Agree with you. Other protocols, such as RXE, also need be handled.
>> So a better solution than this needs.
> I assume you mean, in particular, RoCEv2 with rxe on a non-Ethernet
> device? Tom and I discussed that possibility privately while I
> developed this patch.

I am not sure if I can get you correctly or not. I mean, if RXE is 
attached to a loopback device,

the symptopms are similar to iWARP.

So if you want to fix this problem on iWARP, it is better to also fix 
the similar problem on RXE.

That is , your patch should fix the similar problems on all the rdma 
devices, not just for iWARP.

I am not sure if this request is appropriate to you or not.

Zhu Yanjun

>
> I do not object to that idea. But I would feel more comfortable if
> that was justified and tested via a separate patch. I expect that
> the logic for those cases will look different to what I'm adding here.
>
>
>> Zhu Yanjun
>>
>>> Wouldn't it be better to detect a missing GID, or infer the need from
>>> some other transport attribute?
>>> Tom.
>>>> +
>>>>        if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port))
>>>>            return ERR_PTR(-ENODEV);
>
> --
> Chuck Lever
>
>
Zhu Yanjun June 11, 2023, 12:33 a.m. UTC | #9
在 2023/6/10 20:05, Jason Gunthorpe 写道:
> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> We would like to enable the use of siw on top of a VPN that is
>>> constructed and managed via a tun device. That hasn't worked up
>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>> no GID for the RDMA/core to look up.
>>>
>>> But it turns out that the egress device has already been picked for
>>> us. addr_handler() just has to do the right thing with it.
>>>
>>> Tested with siw and qedr, both initiator and target.
>>>
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>    drivers/infiniband/core/cma.c |    3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> This of course needs broader testing, but it seems to work, and it's
>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>>
>>> One thing I discovered is that the NFS/RDMA server implementation
>>> does not deal at all with more than one RDMA device on the system.
>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>>
>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>> --- a/drivers/infiniband/core/cma.c
>>> +++ b/drivers/infiniband/core/cma.c
>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>    	if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>    		return ERR_PTR(-ENODEV);
>>> +	if (rdma_protocol_iwarp(device, port))
>>> +		return rdma_get_gid_attr(device, port, 0);
>>
>> This might work, but I'm skeptical of the conditional. There's nothing
>> special about iWARP that says a GID should come from the outgoing device
>> mac. And, other protocols without IB GID addressing won't benefit.
> 
> The GID represents the source address, so it better come from the
> outgoing device or something is really wrong.
> 
> iWARP gets a conditional because iwarp always has a single GID, other
> devices do not work that way.


Thanks. If I get this problem correctly, this problem is about "GID can 
not be generated correctly if a iWARP device is attached to a loopback 
device". This patch will fix this problem. So iWARP device can be 
managed via this unique GID.

Do you mean that RXE does not have the similar problem?
But if a RXE device is attached to a loopback device, the similar 
problem occurs.

Zhu Yanjun

> 
> Jason
Jason Gunthorpe June 11, 2023, 1:08 a.m. UTC | #10
On Sat, Jun 10, 2023 at 12:38:53PM -0400, Tom Talpey wrote:
> On 6/10/2023 8:05 AM, Jason Gunthorpe wrote:
> > On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
> > > On 6/7/2023 3:43 PM, Chuck Lever wrote:
> > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > 
> > > > We would like to enable the use of siw on top of a VPN that is
> > > > constructed and managed via a tun device. That hasn't worked up
> > > > until now because ARPHRD_NONE devices (such as tun devices) have
> > > > no GID for the RDMA/core to look up.
> > > > 
> > > > But it turns out that the egress device has already been picked for
> > > > us. addr_handler() just has to do the right thing with it.
> > > > 
> > > > Tested with siw and qedr, both initiator and target.
> > > > 
> > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > ---
> > > >    drivers/infiniband/core/cma.c |    3 +++
> > > >    1 file changed, 3 insertions(+)
> > > > 
> > > > This of course needs broader testing, but it seems to work, and it's
> > > > a little nicer than "if (dev_type == ARPHRD_NONE)".
> > > > 
> > > > One thing I discovered is that the NFS/RDMA server implementation
> > > > does not deal at all with more than one RDMA device on the system.
> > > > I will address that with an ib_client; SunRPC patches forthcoming.
> > > > 
> > > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > > > index 56e568fcd32b..c9a2bdb49e3c 100644
> > > > --- a/drivers/infiniband/core/cma.c
> > > > +++ b/drivers/infiniband/core/cma.c
> > > > @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
> > > >    	if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
> > > >    		return ERR_PTR(-ENODEV);
> > > > +	if (rdma_protocol_iwarp(device, port))
> > > > +		return rdma_get_gid_attr(device, port, 0);
> > > 
> > > This might work, but I'm skeptical of the conditional. There's nothing
> > > special about iWARP that says a GID should come from the outgoing device
> > > mac. And, other protocols without IB GID addressing won't benefit.
> > 
> > The GID represents the source address, so it better come from the
> > outgoing device or something is really wrong.
> > 
> > iWARP gets a conditional because iwarp always has a single GID, other
> > devices do not work that way.
> 
> Not sure I follow. TCP is routable so it can use multiple egress ports.
> That same routing means an incoming packet bearing an appropriate local
> address will be accepted on any port.

sgid is about the outgoing device (and usually IP, but not for iwarp),
and RDMA is always bound to a single outgoing netdevice

> So still, I don't think this an iWARP property per se. It's coming from
> the transport and its addressing. I'd hope that this can be gleaned from
> something other than "it's iWARP, cma needs to do ...".

Well, we are supposed to match the sgid based  on the outgoing route
selected, but the iwarp drivers don't setup their sgid's properly..

You are right it should be fixed better, but I'm not sure anyone is
left familiar enough with all the iwarp drivers to do the change.

Jason
Chuck Lever III June 12, 2023, 12:19 a.m. UTC | #11
> On Jun 10, 2023, at 8:05 AM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> We would like to enable the use of siw on top of a VPN that is
>>> constructed and managed via a tun device. That hasn't worked up
>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>> no GID for the RDMA/core to look up.
>>> 
>>> But it turns out that the egress device has already been picked for
>>> us. addr_handler() just has to do the right thing with it.
>>> 
>>> Tested with siw and qedr, both initiator and target.
>>> 
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>  drivers/infiniband/core/cma.c |    3 +++
>>>  1 file changed, 3 insertions(+)
>>> 
>>> This of course needs broader testing, but it seems to work, and it's
>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>> 
>>> One thing I discovered is that the NFS/RDMA server implementation
>>> does not deal at all with more than one RDMA device on the system.
>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>> 
>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>> --- a/drivers/infiniband/core/cma.c
>>> +++ b/drivers/infiniband/core/cma.c
>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>   if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>   return ERR_PTR(-ENODEV);
>>> + 	if (rdma_protocol_iwarp(device, port))
>>> + 		return rdma_get_gid_attr(device, port, 0);
>> 
>> This might work, but I'm skeptical of the conditional. There's nothing
>> special about iWARP that says a GID should come from the outgoing device
>> mac. And, other protocols without IB GID addressing won't benefit.
> 
> The GID represents the source address, so it better come from the
> outgoing device or something is really wrong.
> 
> iWARP gets a conditional because iwarp always has a single GID, other
> devices do not work that way.

If rdma_get_gid_attr() works for rxe, perhaps we could change
the conditional to match on only software-emulated devices.


--
Chuck Lever
Zhu Yanjun June 12, 2023, 2:33 a.m. UTC | #12
在 2023/6/12 8:19, Chuck Lever III 写道:
> 
> 
>> On Jun 10, 2023, at 8:05 AM, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Fri, Jun 09, 2023 at 04:49:49PM -0400, Tom Talpey wrote:
>>> On 6/7/2023 3:43 PM, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> We would like to enable the use of siw on top of a VPN that is
>>>> constructed and managed via a tun device. That hasn't worked up
>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>> no GID for the RDMA/core to look up.
>>>>
>>>> But it turns out that the egress device has already been picked for
>>>> us. addr_handler() just has to do the right thing with it.
>>>>
>>>> Tested with siw and qedr, both initiator and target.
>>>>
>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>   drivers/infiniband/core/cma.c |    3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> This of course needs broader testing, but it seems to work, and it's
>>>> a little nicer than "if (dev_type == ARPHRD_NONE)".
>>>>
>>>> One thing I discovered is that the NFS/RDMA server implementation
>>>> does not deal at all with more than one RDMA device on the system.
>>>> I will address that with an ib_client; SunRPC patches forthcoming.
>>>>
>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>> index 56e568fcd32b..c9a2bdb49e3c 100644
>>>> --- a/drivers/infiniband/core/cma.c
>>>> +++ b/drivers/infiniband/core/cma.c
>>>> @@ -694,6 +694,9 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>    if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
>>>>    return ERR_PTR(-ENODEV);
>>>> + 	if (rdma_protocol_iwarp(device, port))
>>>> + 		return rdma_get_gid_attr(device, port, 0);
>>>
>>> This might work, but I'm skeptical of the conditional. There's nothing
>>> special about iWARP that says a GID should come from the outgoing device
>>> mac. And, other protocols without IB GID addressing won't benefit.
>>
>> The GID represents the source address, so it better come from the
>> outgoing device or something is really wrong.
>>
>> iWARP gets a conditional because iwarp always has a single GID, other
>> devices do not work that way.
> 
> If rdma_get_gid_attr() works for rxe, perhaps we could change
> the conditional to match on only software-emulated devices.


drivers/infiniband/sw/rxe/rxe_net.c:

453 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
454                                 int paylen, struct rxe_pkt_info *pkt)
455 {
456         unsigned int hdr_len;
457         struct sk_buff *skb = NULL;
458         struct net_device *ndev;
459         const struct ib_gid_attr *attr;
460         const int port_num = 1;
461
462         attr = rdma_get_gid_attr(&rxe->ib_dev, port_num, 
av->grh.sgid_index);
463         if (IS_ERR(attr))
464                 return NULL;
465
...

Zhu Yanjun

> 
> 
> --
> Chuck Lever
> 
>
diff mbox series

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 56e568fcd32b..c9a2bdb49e3c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -694,6 +694,9 @@  cma_validate_port(struct ib_device *device, u32 port,
 	if (!rdma_dev_access_netns(device, id_priv->id.route.addr.dev_addr.net))
 		return ERR_PTR(-ENODEV);
 
+	if (rdma_protocol_iwarp(device, port))
+		return rdma_get_gid_attr(device, port, 0);
+
 	if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port))
 		return ERR_PTR(-ENODEV);