diff mbox series

[v5,4/4] RDMA/cma: Avoid GID lookups on iWARP devices

Message ID 168805181129.164730.67097436102991889.stgit@manet.1015granger.net (mailing list archive)
State Superseded
Headers show
Series Handle ARPHRD_NONE devices for siw | expand

Commit Message

Chuck Lever June 29, 2023, 3:16 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

We would like to enable the use of siw on top of a VPN that is
constructed and managed via a tun device. That hasn't worked up
until now because ARPHRD_NONE devices (such as tun devices) have
no GID for the RDMA/core to look up.

But it turns out that the egress device has already been picked for
us -- no GID is necessary. addr_handler() just has to do the right
thing with it.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/core/cma.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Tom Talpey July 1, 2023, 4:24 p.m. UTC | #1
On 6/29/2023 11:16 AM, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> We would like to enable the use of siw on top of a VPN that is
> constructed and managed via a tun device. That hasn't worked up
> until now because ARPHRD_NONE devices (such as tun devices) have
> no GID for the RDMA/core to look up.
> 
> But it turns out that the egress device has already been picked for
> us -- no GID is necessary. addr_handler() just has to do the right
> thing with it.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   drivers/infiniband/core/cma.c |   15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 889b3e4ea980..07bb5ac4019d 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
>   	if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
>   		goto out;
>   
> +	/* Linux iWARP devices have but one port */

I don't believe this comment is correct, or necessary. In-tree drivers
exist for several multi-port iWARP devices, and the port bnumber passed
to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?

The code looks correct otherwise...

Tom.

> +	if (rdma_protocol_iwarp(device, port)) {
> +		sgid_attr = rdma_get_gid_attr(device, port, 0);
> +		if (IS_ERR(sgid_attr))
> +			goto out;
> +
> +		rcu_read_lock();
> +		ndev = rcu_dereference(sgid_attr->ndev);
> +		if (!net_eq(dev_net(ndev), dev_addr->net) ||
> +		    ndev->ifindex != bound_if_index)
> +			sgid_attr = ERR_PTR(-ENODEV);
> +		rcu_read_unlock();
> +		goto out;
> +	}
> +
>   	if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
>   		ndev = dev_get_by_index(dev_addr->net, bound_if_index);
>   		if (!ndev)
> 
> 
>
Chuck Lever III July 1, 2023, 4:27 p.m. UTC | #2
> On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 6/29/2023 11:16 AM, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> We would like to enable the use of siw on top of a VPN that is
>> constructed and managed via a tun device. That hasn't worked up
>> until now because ARPHRD_NONE devices (such as tun devices) have
>> no GID for the RDMA/core to look up.
>> But it turns out that the egress device has already been picked for
>> us -- no GID is necessary. addr_handler() just has to do the right
>> thing with it.
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  drivers/infiniband/core/cma.c |   15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>> index 889b3e4ea980..07bb5ac4019d 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
>>   if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
>>   goto out;
>>  + /* Linux iWARP devices have but one port */
> 
> I don't believe this comment is correct, or necessary. In-tree drivers
> exist for several multi-port iWARP devices, and the port bnumber passed
> to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?

Then I must have misunderstood what Jason said about the reason
for the rdma_protocol_iwarp() check. He said that we are able to
do this kind of GID lookup because iWARP devices have only a
single port.

Jason?


> The code looks correct otherwise...
> 
> Tom.
> 
>> + if (rdma_protocol_iwarp(device, port)) {
>> + sgid_attr = rdma_get_gid_attr(device, port, 0);
>> + if (IS_ERR(sgid_attr))
>> + goto out;
>> +
>> + rcu_read_lock();
>> + ndev = rcu_dereference(sgid_attr->ndev);
>> + if (!net_eq(dev_net(ndev), dev_addr->net) ||
>> +     ndev->ifindex != bound_if_index)
>> + sgid_attr = ERR_PTR(-ENODEV);
>> + rcu_read_unlock();
>> + goto out;
>> + }
>> +
>>   if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
>>   ndev = dev_get_by_index(dev_addr->net, bound_if_index);
>>   if (!ndev)


--
Chuck Lever
Jason Gunthorpe July 3, 2023, 9:07 p.m. UTC | #3
On Sat, Jul 01, 2023 at 04:27:23PM +0000, Chuck Lever III wrote:
> 
> 
> > On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
> > 
> > On 6/29/2023 11:16 AM, Chuck Lever wrote:
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >> We would like to enable the use of siw on top of a VPN that is
> >> constructed and managed via a tun device. That hasn't worked up
> >> until now because ARPHRD_NONE devices (such as tun devices) have
> >> no GID for the RDMA/core to look up.
> >> But it turns out that the egress device has already been picked for
> >> us -- no GID is necessary. addr_handler() just has to do the right
> >> thing with it.
> >> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >>  drivers/infiniband/core/cma.c |   15 +++++++++++++++
> >>  1 file changed, 15 insertions(+)
> >> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> >> index 889b3e4ea980..07bb5ac4019d 100644
> >> --- a/drivers/infiniband/core/cma.c
> >> +++ b/drivers/infiniband/core/cma.c
> >> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
> >>   if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
> >>   goto out;
> >>  + /* Linux iWARP devices have but one port */
> > 
> > I don't believe this comment is correct, or necessary. In-tree drivers
> > exist for several multi-port iWARP devices, and the port bnumber passed
> > to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?
> 
> Then I must have misunderstood what Jason said about the reason
> for the rdma_protocol_iwarp() check. He said that we are able to
> do this kind of GID lookup because iWARP devices have only a
> single port.
> 
> Jason?

I don't know alot about iwarp - tom does iwarp really have multiported
*struct ib_device* models? This is different from multiport hw.

If it is multiport how do the gid tables work across the ports?

Jason
Tom Talpey July 4, 2023, 2:23 p.m. UTC | #4
On 7/3/2023 5:07 PM, Jason Gunthorpe wrote:
> On Sat, Jul 01, 2023 at 04:27:23PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
>>>
>>> On 6/29/2023 11:16 AM, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>> We would like to enable the use of siw on top of a VPN that is
>>>> constructed and managed via a tun device. That hasn't worked up
>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>> no GID for the RDMA/core to look up.
>>>> But it turns out that the egress device has already been picked for
>>>> us -- no GID is necessary. addr_handler() just has to do the right
>>>> thing with it.
>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>   drivers/infiniband/core/cma.c |   15 +++++++++++++++
>>>>   1 file changed, 15 insertions(+)
>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>> index 889b3e4ea980..07bb5ac4019d 100644
>>>> --- a/drivers/infiniband/core/cma.c
>>>> +++ b/drivers/infiniband/core/cma.c
>>>> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>    if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
>>>>    goto out;
>>>>   + /* Linux iWARP devices have but one port */
>>>
>>> I don't believe this comment is correct, or necessary. In-tree drivers
>>> exist for several multi-port iWARP devices, and the port bnumber passed
>>> to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?
>>
>> Then I must have misunderstood what Jason said about the reason
>> for the rdma_protocol_iwarp() check. He said that we are able to
>> do this kind of GID lookup because iWARP devices have only a
>> single port.
>>
>> Jason?
> 
> I don't know alot about iwarp - tom does iwarp really have multiported
> *struct ib_device* models? This is different from multiport hw.

I don't see how the iWARP protocol impacts this, but I believe the
cxgb4 provider implements multiport. It sets the ibdev.phys_port_cnt
anyway. Perhaps this is incorrect.

> If it is multiport how do the gid tables work across the ports?

Again, not sure how to respond. iWARP doesn't express the gid as a
protocol element. And the iw_cm really doesn't either, although it
does implement a gid-type API I guess. That's local behavior though,
not something that goes on the wire directly.

Maybe I should ask... what does the "Linux iWARP devices have but one
port" actually mean in the comment? Would the code below it not work
if that were not the case? All I'm saying is that the comment seems
to be unnecessary, and confusing.

Tom.
Chuck Lever III July 4, 2023, 2:54 p.m. UTC | #5
> On Jul 4, 2023, at 10:23 AM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 7/3/2023 5:07 PM, Jason Gunthorpe wrote:
>> On Sat, Jul 01, 2023 at 04:27:23PM +0000, Chuck Lever III wrote:
>>> 
>>> 
>>>> On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
>>>> 
>>>> On 6/29/2023 11:16 AM, Chuck Lever wrote:
>>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>> We would like to enable the use of siw on top of a VPN that is
>>>>> constructed and managed via a tun device. That hasn't worked up
>>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>>> no GID for the RDMA/core to look up.
>>>>> But it turns out that the egress device has already been picked for
>>>>> us -- no GID is necessary. addr_handler() just has to do the right
>>>>> thing with it.
>>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>> ---
>>>>>  drivers/infiniband/core/cma.c |   15 +++++++++++++++
>>>>>  1 file changed, 15 insertions(+)
>>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>>> index 889b3e4ea980..07bb5ac4019d 100644
>>>>> --- a/drivers/infiniband/core/cma.c
>>>>> +++ b/drivers/infiniband/core/cma.c
>>>>> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>>   if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
>>>>>   goto out;
>>>>>  + /* Linux iWARP devices have but one port */
>>>> 
>>>> I don't believe this comment is correct, or necessary. In-tree drivers
>>>> exist for several multi-port iWARP devices, and the port bnumber passed
>>>> to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?
>>> 
>>> Then I must have misunderstood what Jason said about the reason
>>> for the rdma_protocol_iwarp() check. He said that we are able to
>>> do this kind of GID lookup because iWARP devices have only a
>>> single port.
>>> 
>>> Jason?
>> I don't know alot about iwarp - tom does iwarp really have multiported
>> *struct ib_device* models? This is different from multiport hw.
> 
> I don't see how the iWARP protocol impacts this, but I believe the
> cxgb4 provider implements multiport. It sets the ibdev.phys_port_cnt
> anyway. Perhaps this is incorrect.
> 
>> If it is multiport how do the gid tables work across the ports?
> 
> Again, not sure how to respond. iWARP doesn't express the gid as a
> protocol element. And the iw_cm really doesn't either, although it
> does implement a gid-type API I guess. That's local behavior though,
> not something that goes on the wire directly.
> 
> Maybe I should ask... what does the "Linux iWARP devices have but one
> port" actually mean in the comment? Would the code below it not work
> if that were not the case? All I'm saying is that the comment seems
> to be unnecessary, and confusing.

It replaces a code comment you complained about in an earlier review
regarding the use of "if (rdma_protocol_iwarp())". As far as I
understand, /in Linux/ each iWARP endpoint gets its own ib_device
and that device has exactly one port.

So for example, a physical device that has two ports would appear
as two ib_devices each with a single port. Is that not how it
works? It's certainly possible I've misunderstood something.

Again, this is not about the protocol, it's about how Linux
implements it. I'm open to specific suggestions to improve the
comment, or I can remove it if the code is sufficiently clear.


--
Chuck Lever
Jason Gunthorpe July 10, 2023, 5:06 p.m. UTC | #6
On Tue, Jul 04, 2023 at 02:54:59PM +0000, Chuck Lever III wrote:
> 
> 
> > On Jul 4, 2023, at 10:23 AM, Tom Talpey <tom@talpey.com> wrote:
> > 
> > On 7/3/2023 5:07 PM, Jason Gunthorpe wrote:
> >> On Sat, Jul 01, 2023 at 04:27:23PM +0000, Chuck Lever III wrote:
> >>> 
> >>> 
> >>>> On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
> >>>> 
> >>>> On 6/29/2023 11:16 AM, Chuck Lever wrote:
> >>>>> From: Chuck Lever <chuck.lever@oracle.com>
> >>>>> We would like to enable the use of siw on top of a VPN that is
> >>>>> constructed and managed via a tun device. That hasn't worked up
> >>>>> until now because ARPHRD_NONE devices (such as tun devices) have
> >>>>> no GID for the RDMA/core to look up.
> >>>>> But it turns out that the egress device has already been picked for
> >>>>> us -- no GID is necessary. addr_handler() just has to do the right
> >>>>> thing with it.
> >>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> >>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>>> ---
> >>>>>  drivers/infiniband/core/cma.c |   15 +++++++++++++++
> >>>>>  1 file changed, 15 insertions(+)
> >>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> >>>>> index 889b3e4ea980..07bb5ac4019d 100644
> >>>>> --- a/drivers/infiniband/core/cma.c
> >>>>> +++ b/drivers/infiniband/core/cma.c
> >>>>> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
> >>>>>   if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
> >>>>>   goto out;
> >>>>>  + /* Linux iWARP devices have but one port */
> >>>> 
> >>>> I don't believe this comment is correct, or necessary. In-tree drivers
> >>>> exist for several multi-port iWARP devices, and the port bnumber passed
> >>>> to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?
> >>> 
> >>> Then I must have misunderstood what Jason said about the reason
> >>> for the rdma_protocol_iwarp() check. He said that we are able to
> >>> do this kind of GID lookup because iWARP devices have only a
> >>> single port.
> >>> 
> >>> Jason?
> >> I don't know alot about iwarp - tom does iwarp really have multiported
> >> *struct ib_device* models? This is different from multiport hw.
> > 
> > I don't see how the iWARP protocol impacts this, but I believe the
> > cxgb4 provider implements multiport. It sets the ibdev.phys_port_cnt
> > anyway. Perhaps this is incorrect.
> > 
> >> If it is multiport how do the gid tables work across the ports?
> > 
> > Again, not sure how to respond. iWARP doesn't express the gid as a
> > protocol element. And the iw_cm really doesn't either, although it
> > does implement a gid-type API I guess. That's local behavior though,
> > not something that goes on the wire directly.
> > 
> > Maybe I should ask... what does the "Linux iWARP devices have but one
> > port" actually mean in the comment? Would the code below it not work
> > if that were not the case? All I'm saying is that the comment seems
> > to be unnecessary, and confusing.
> 
> It replaces a code comment you complained about in an earlier review
> regarding the use of "if (rdma_protocol_iwarp())". As far as I
> understand, /in Linux/ each iWARP endpoint gets its own ib_device
> and that device has exactly one port.
>
> So for example, a physical device that has two ports would appear
> as two ib_devices each with a single port. Is that not how it
> works? It's certainly possible I've misunderstood something.

That is how I would expect it to work. Multi-port ib_device is really
only something that exists to support IB's APM, and iWarp doesn't have
that.

Otherwise verbs says a QP is bound to a single IB device's port and a
single GID of that port. It should not float around between multiple
ports.

So, I don't know what the iwarp drivers did here.

As for rthe comment, I don't think it is quite right, this code
already knows what ib_device port it is supposed to be using somehow,
so it doesn't matter.

I think it should be more like:

 In iWarp mode we need to have a sgid entry to be able to locate the
 netdevice. iWarp drivers are not allowed to associate more than one
 net device with their gid tables, so returning the first entry is
 sufficient. iWarp will ignore the GID entries actual GID, and the
 passed in GID may not even be present in the GID table for tunnels
 and other non-ethernet netdevices.

Jason
Chuck Lever III July 11, 2023, 10:49 p.m. UTC | #7
> On Jul 10, 2023, at 1:06 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Tue, Jul 04, 2023 at 02:54:59PM +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Jul 4, 2023, at 10:23 AM, Tom Talpey <tom@talpey.com> wrote:
>>> 
>>> On 7/3/2023 5:07 PM, Jason Gunthorpe wrote:
>>>> On Sat, Jul 01, 2023 at 04:27:23PM +0000, Chuck Lever III wrote:
>>>>> 
>>>>> 
>>>>>> On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>>> 
>>>>>> On 6/29/2023 11:16 AM, Chuck Lever wrote:
>>>>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>>>> We would like to enable the use of siw on top of a VPN that is
>>>>>>> constructed and managed via a tun device. That hasn't worked up
>>>>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>>>>> no GID for the RDMA/core to look up.
>>>>>>> But it turns out that the egress device has already been picked for
>>>>>>> us -- no GID is necessary. addr_handler() just has to do the right
>>>>>>> thing with it.
>>>>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>>> ---
>>>>>>> drivers/infiniband/core/cma.c |   15 +++++++++++++++
>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>>>>> index 889b3e4ea980..07bb5ac4019d 100644
>>>>>>> --- a/drivers/infiniband/core/cma.c
>>>>>>> +++ b/drivers/infiniband/core/cma.c
>>>>>>> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>>>>  if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
>>>>>>>  goto out;
>>>>>>> + /* Linux iWARP devices have but one port */
>>>>>> 
>>>>>> I don't believe this comment is correct, or necessary. In-tree drivers
>>>>>> exist for several multi-port iWARP devices, and the port bnumber passed
>>>>>> to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?
>>>>> 
>>>>> Then I must have misunderstood what Jason said about the reason
>>>>> for the rdma_protocol_iwarp() check. He said that we are able to
>>>>> do this kind of GID lookup because iWARP devices have only a
>>>>> single port.
>>>>> 
>>>>> Jason?
>>>> I don't know alot about iwarp - tom does iwarp really have multiported
>>>> *struct ib_device* models? This is different from multiport hw.
>>> 
>>> I don't see how the iWARP protocol impacts this, but I believe the
>>> cxgb4 provider implements multiport. It sets the ibdev.phys_port_cnt
>>> anyway. Perhaps this is incorrect.
>>> 
>>>> If it is multiport how do the gid tables work across the ports?
>>> 
>>> Again, not sure how to respond. iWARP doesn't express the gid as a
>>> protocol element. And the iw_cm really doesn't either, although it
>>> does implement a gid-type API I guess. That's local behavior though,
>>> not something that goes on the wire directly.
>>> 
>>> Maybe I should ask... what does the "Linux iWARP devices have but one
>>> port" actually mean in the comment? Would the code below it not work
>>> if that were not the case? All I'm saying is that the comment seems
>>> to be unnecessary, and confusing.
>> 
>> It replaces a code comment you complained about in an earlier review
>> regarding the use of "if (rdma_protocol_iwarp())". As far as I
>> understand, /in Linux/ each iWARP endpoint gets its own ib_device
>> and that device has exactly one port.
>> 
>> So for example, a physical device that has two ports would appear
>> as two ib_devices each with a single port. Is that not how it
>> works? It's certainly possible I've misunderstood something.
> 
> That is how I would expect it to work. Multi-port ib_device is really
> only something that exists to support IB's APM, and iWarp doesn't have
> that.
> 
> Otherwise verbs says a QP is bound to a single IB device's port and a
> single GID of that port. It should not float around between multiple
> ports.
> 
> So, I don't know what the iwarp drivers did here.
> 
> As for rthe comment, I don't think it is quite right, this code
> already knows what ib_device port it is supposed to be using somehow,
> so it doesn't matter.
> 
> I think it should be more like:
> 
> In iWarp mode we need to have a sgid entry to be able to locate the
> netdevice. iWarp drivers are not allowed to associate more than one
> net device with their gid tables, so returning the first entry is
> sufficient. iWarp will ignore the GID entries actual GID, and the
> passed in GID may not even be present in the GID table for tunnels
> and other non-ethernet netdevices.

I can make that change and post a refresh. I'd like
to hear from Tom first.



--
Chuck Lever
Tom Talpey July 12, 2023, 5:28 p.m. UTC | #8
On 7/11/2023 6:49 PM, Chuck Lever III wrote:
> 
> 
>> On Jul 10, 2023, at 1:06 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Tue, Jul 04, 2023 at 02:54:59PM +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On Jul 4, 2023, at 10:23 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>
>>>> On 7/3/2023 5:07 PM, Jason Gunthorpe wrote:
>>>>> On Sat, Jul 01, 2023 at 04:27:23PM +0000, Chuck Lever III wrote:
>>>>>>
>>>>>>
>>>>>>> On Jul 1, 2023, at 12:24 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>
>>>>>>> On 6/29/2023 11:16 AM, Chuck Lever wrote:
>>>>>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>>>>> We would like to enable the use of siw on top of a VPN that is
>>>>>>>> constructed and managed via a tun device. That hasn't worked up
>>>>>>>> until now because ARPHRD_NONE devices (such as tun devices) have
>>>>>>>> no GID for the RDMA/core to look up.
>>>>>>>> But it turns out that the egress device has already been picked for
>>>>>>>> us -- no GID is necessary. addr_handler() just has to do the right
>>>>>>>> thing with it.
>>>>>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>>>> ---
>>>>>>>> drivers/infiniband/core/cma.c |   15 +++++++++++++++
>>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>>>>>>> index 889b3e4ea980..07bb5ac4019d 100644
>>>>>>>> --- a/drivers/infiniband/core/cma.c
>>>>>>>> +++ b/drivers/infiniband/core/cma.c
>>>>>>>> @@ -700,6 +700,21 @@ cma_validate_port(struct ib_device *device, u32 port,
>>>>>>>>   if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
>>>>>>>>   goto out;
>>>>>>>> + /* Linux iWARP devices have but one port */
>>>>>>>
>>>>>>> I don't believe this comment is correct, or necessary. In-tree drivers
>>>>>>> exist for several multi-port iWARP devices, and the port bnumber passed
>>>>>>> to rdma_protocol_iwarp() and rdma_get_gid_attr() will follow, no?
>>>>>>
>>>>>> Then I must have misunderstood what Jason said about the reason
>>>>>> for the rdma_protocol_iwarp() check. He said that we are able to
>>>>>> do this kind of GID lookup because iWARP devices have only a
>>>>>> single port.
>>>>>>
>>>>>> Jason?
>>>>> I don't know alot about iwarp - tom does iwarp really have multiported
>>>>> *struct ib_device* models? This is different from multiport hw.
>>>>
>>>> I don't see how the iWARP protocol impacts this, but I believe the
>>>> cxgb4 provider implements multiport. It sets the ibdev.phys_port_cnt
>>>> anyway. Perhaps this is incorrect.
>>>>
>>>>> If it is multiport how do the gid tables work across the ports?
>>>>
>>>> Again, not sure how to respond. iWARP doesn't express the gid as a
>>>> protocol element. And the iw_cm really doesn't either, although it
>>>> does implement a gid-type API I guess. That's local behavior though,
>>>> not something that goes on the wire directly.
>>>>
>>>> Maybe I should ask... what does the "Linux iWARP devices have but one
>>>> port" actually mean in the comment? Would the code below it not work
>>>> if that were not the case? All I'm saying is that the comment seems
>>>> to be unnecessary, and confusing.
>>>
>>> It replaces a code comment you complained about in an earlier review
>>> regarding the use of "if (rdma_protocol_iwarp())". As far as I
>>> understand, /in Linux/ each iWARP endpoint gets its own ib_device
>>> and that device has exactly one port.
>>>
>>> So for example, a physical device that has two ports would appear
>>> as two ib_devices each with a single port. Is that not how it
>>> works? It's certainly possible I've misunderstood something.
>>
>> That is how I would expect it to work. Multi-port ib_device is really
>> only something that exists to support IB's APM, and iWarp doesn't have
>> that.
>>
>> Otherwise verbs says a QP is bound to a single IB device's port and a
>> single GID of that port. It should not float around between multiple
>> ports.
>>
>> So, I don't know what the iwarp drivers did here.
>>
>> As for rthe comment, I don't think it is quite right, this code
>> already knows what ib_device port it is supposed to be using somehow,
>> so it doesn't matter.
>>
>> I think it should be more like:
>>
>> In iWarp mode we need to have a sgid entry to be able to locate the
>> netdevice. iWarp drivers are not allowed to associate more than one
>> net device with their gid tables, so returning the first entry is
>> sufficient. iWarp will ignore the GID entries actual GID, and the
>> passed in GID may not even be present in the GID table for tunnels
>> and other non-ethernet netdevices.
> 
> I can make that change and post a refresh. I'd like
> to hear from Tom first.

The extra detail is helpful, but I'm still queasy about tying this to
iWARP. The situation seems much more general. But the breadcrumbs are
here so if the situation arises in another transport, there's a trail
for others to follow.

Tom.
diff mbox series

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 889b3e4ea980..07bb5ac4019d 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -700,6 +700,21 @@  cma_validate_port(struct ib_device *device, u32 port,
 	if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
 		goto out;
 
+	/* Linux iWARP devices have but one port */
+	if (rdma_protocol_iwarp(device, port)) {
+		sgid_attr = rdma_get_gid_attr(device, port, 0);
+		if (IS_ERR(sgid_attr))
+			goto out;
+
+		rcu_read_lock();
+		ndev = rcu_dereference(sgid_attr->ndev);
+		if (!net_eq(dev_net(ndev), dev_addr->net) ||
+		    ndev->ifindex != bound_if_index)
+			sgid_attr = ERR_PTR(-ENODEV);
+		rcu_read_unlock();
+		goto out;
+	}
+
 	if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
 		ndev = dev_get_by_index(dev_addr->net, bound_if_index);
 		if (!ndev)