[for-next,00/16] New hfi1 feature: Accelerated IP

Message ID 20200210131223.87776.21339.stgit@awfm-01.aw.intel.com (mailing list archive)

Dennis Dalessandro Feb. 10, 2020, 1:18 p.m. UTC
This patch series implements an accelerated ipoib using the rdma netdev
mechanism already present in ipoib. A new device capability bit,
IB_DEVICE_RDMA_NETDEV_OPA, triggers ipoib to create a datagram QP using the
IB_QP_CREATE_NETDEV_USE flag.
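
As a rough illustration of the gating described above, here is a minimal
sketch in plain C. The SKETCH_* names and bit positions are hypothetical
and chosen only for this example; they are not the values from
include/rdma/ib_verbs.h, and the helper is not a function from the series.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. The real
 * definitions live in include/rdma/ib_verbs.h and are not copied here. */
#define SKETCH_DEVICE_RDMA_NETDEV_OPA (1ULL << 0)
#define SKETCH_QP_CREATE_NETDEV_USE   (1U << 0)

/* Return the create flags a datagram QP would be given for a device
 * with the supplied capability flags. */
static uint32_t sketch_ud_qp_create_flags(uint64_t device_cap_flags)
{
	uint32_t create_flags = 0;

	/* Only devices advertising the capability get the netdev-use flag */
	if (device_cap_flags & SKETCH_DEVICE_RDMA_NETDEV_OPA)
		create_flags |= SKETCH_QP_CREATE_NETDEV_USE;

	return create_flags;
}
```

A device without the capability bit gets an ordinary datagram QP in this
sketch, which matches the intent that nothing changes for other hardware.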

The highlights include:
- Shares send and receive resources with VNIC
- Allows switching between connected mode and datagram mode
- Increases the maximum datagram MTU for opa devices to 10k

The same spreading capability exploited by VNIC is used here to vary
the receive context that receives the packet.
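
The general idea of spreading can be sketched as below. This is only an
assumption-laden illustration: the flow key, the mixing step, and the
function name are hypothetical and do not describe the hfi1 RSM rule
format or the hardware mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Pick a receive context for a packet from a per-flow key.
 * Hypothetical: both the key and the mixing are illustrative only. */
static unsigned int sketch_pick_rx_context(uint32_t flow_key,
					   unsigned int num_contexts)
{
	uint32_t mixed;

	assert(num_contexts > 0);

	/* Multiplicative mix to spread nearby keys over the contexts */
	mixed = flow_key * 2654435761u;
	mixed ^= mixed >> 16;

	return mixed % num_contexts;
}
```

The same key always maps to the same context here, so packets of one
flow stay in order while different flows can be processed in parallel.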

The patches are fully bisectable and stepwise implement the capability.

---

Gary Leshner (6):
      IB/hfi1: Add functions to transmit datagram ipoib packets
      IB/hfi1: Add the transmit side of a datagram ipoib RDMA netdev
      IB/hfi1: Remove module parameter for KDETH qpns
      IB/{rdmavt,hfi1}: Implement creation of accelerated UD QPs
      IB/{hfi1,ipoib,rdma}: Broadcast ping sent packets which exceeded mtu size
      IB/ipoib: Add capability to switch between datagram and connected mode

Grzegorz Andrejczuk (7):
      IB/hfi1: RSM rules for AIP
      IB/hfi1: Rename num_vnic_contexts as num_netdev_contexts
      IB/hfi1: Add functions to receive accelerated ipoib packets
      IB/hfi1: Add interrupt handler functions for accelerated ipoib
      IB/hfi1: Add rx functions for dummy netdev
      IB/hfi1: Activate the dummy netdev
      IB/hfi1: Add packet histogram trace event

Kaike Wan (1):
      IB/hfi1: Add accelerated IP capability bit

Piotr Stankiewicz (1):
      IB/hfi1: Enable the transmit side of the datagram ipoib netdev

Sadanand Warrier (1):
      IB/ipoib: Increase ipoib Datagram mode MTU's upper limit


 drivers/infiniband/hw/hfi1/Makefile            |    4 
 drivers/infiniband/hw/hfi1/affinity.c          |   12 
 drivers/infiniband/hw/hfi1/affinity.h          |    3 
 drivers/infiniband/hw/hfi1/chip.c              |  303 ++++++---
 drivers/infiniband/hw/hfi1/chip.h              |    5 
 drivers/infiniband/hw/hfi1/common.h            |   13 
 drivers/infiniband/hw/hfi1/driver.c            |  231 ++++++-
 drivers/infiniband/hw/hfi1/file_ops.c          |    4 
 drivers/infiniband/hw/hfi1/hfi.h               |   38 -
 drivers/infiniband/hw/hfi1/init.c              |   14 
 drivers/infiniband/hw/hfi1/ipoib.h             |  171 +++++
 drivers/infiniband/hw/hfi1/ipoib_main.c        |  309 +++++++++
 drivers/infiniband/hw/hfi1/ipoib_rx.c          |   95 +++
 drivers/infiniband/hw/hfi1/ipoib_tx.c          |  828 ++++++++++++++++++++++++
 drivers/infiniband/hw/hfi1/msix.c              |   36 +
 drivers/infiniband/hw/hfi1/msix.h              |    7 
 drivers/infiniband/hw/hfi1/netdev.h            |  118 +++
 drivers/infiniband/hw/hfi1/netdev_rx.c         |  481 ++++++++++++++
 drivers/infiniband/hw/hfi1/qp.c                |   18 -
 drivers/infiniband/hw/hfi1/tid_rdma.c          |    4 
 drivers/infiniband/hw/hfi1/trace.c             |   42 +
 drivers/infiniband/hw/hfi1/trace_ctxts.h       |   11 
 drivers/infiniband/hw/hfi1/verbs.c             |   13 
 drivers/infiniband/hw/hfi1/vnic.h              |    5 
 drivers/infiniband/hw/hfi1/vnic_main.c         |  318 ++-------
 drivers/infiniband/sw/rdmavt/qp.c              |   24 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   25 -
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   12 
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |    3 
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    3 
 include/rdma/ib_verbs.h                        |   65 ++
 include/rdma/opa_port_info.h                   |   10 
 include/rdma/opa_vnic.h                        |    4 
 include/rdma/rdmavt_qp.h                       |   29 +
 include/uapi/rdma/hfi/hfi1_user.h              |    3 
 35 files changed, 2768 insertions(+), 493 deletions(-)
 create mode 100644 drivers/infiniband/hw/hfi1/ipoib.h
 create mode 100644 drivers/infiniband/hw/hfi1/ipoib_main.c
 create mode 100644 drivers/infiniband/hw/hfi1/ipoib_rx.c
 create mode 100644 drivers/infiniband/hw/hfi1/ipoib_tx.c
 create mode 100644 drivers/infiniband/hw/hfi1/netdev.h
 create mode 100644 drivers/infiniband/hw/hfi1/netdev_rx.c

--
-Denny

Jason Gunthorpe Feb. 10, 2020, 1:31 p.m. UTC | #1
On Mon, Feb 10, 2020 at 08:18:05AM -0500, Dennis Dalessandro wrote:
> This patch series is an accelerated ipoib using the rdma netdev mechanism
> already present in ipoib. A new device capability bit,
> IB_DEVICE_RDMA_NETDEV_OPA, triggers ipoib to create a datagram QP using the
> IB_QP_CREATE_NETDEV_USE.
> 
> The highlights include:
> - Sharing send and receive resources with VNIC
> - Allows for switching between connected mode and datagram mode

There is still value in connected mode?

> - Increases the maximum datagram MTU for opa devices to 10k
> 
> The same spreading capability exploited by VNIC is used here to vary
> the receive context that receives the packet.
> 
> The patches are fully bisectable and stepwise implement the capability.

This is a lot of code to send without a performance
justification.. What is it? Is it worthwhile?

> Gary Leshner (6):
>       IB/hfi1: Add functions to transmit datagram ipoib packets
>       IB/hfi1: Add the transmit side of a datagram ipoib RDMA netdev
>       IB/hfi1: Remove module parameter for KDETH qpns
>       IB/{rdmavt,hfi1}: Implement creation of accelerated UD QPs
>       IB/{hfi1,ipoib,rdma}: Broadcast ping sent packets which exceeded mtu size
>       IB/ipoib: Add capability to switch between datagram and connected mode
> 
> Grzegorz Andrejczuk (7):
>       IB/hfi1: RSM rules for AIP
>       IB/hfi1: Rename num_vnic_contexts as num_netdev_contexts
>       IB/hfi1: Add functions to receive accelerated ipoib packets
>       IB/hfi1: Add interrupt handler functions for accelerated ipoib
>       IB/hfi1: Add rx functions for dummy netdev

This dummy netdev thing seemed very strange

Jason

Dennis Dalessandro Feb. 10, 2020, 5:36 p.m. UTC | #2
On 2/10/2020 8:31 AM, Jason Gunthorpe wrote:
> On Mon, Feb 10, 2020 at 08:18:05AM -0500, Dennis Dalessandro wrote:
>> This patch series is an accelerated ipoib using the rdma netdev mechanism
>> already present in ipoib. A new device capability bit,
>> IB_DEVICE_RDMA_NETDEV_OPA, triggers ipoib to create a datagram QP using the
>> IB_QP_CREATE_NETDEV_USE.
>>
>> The highlights include:
>> - Sharing send and receive resources with VNIC
>> - Allows for switching between connected mode and datagram mode
> 
> There is still value in connected mode?

It's really a compatibility thing. If someone wants to change modes that 
will work. There won't be any benefit to connected mode though. The goal 
is just to not break.

>> - Increases the maximum datagram MTU for opa devices to 10k
>>
>> The same spreading capability exploited by VNIC is used here to vary
>> the receive context that receives the packet.
>>
>> The patches are fully bisectable and stepwise implement the capability.
> 
> This is a lot of code to send without a performance
> justification.. What is it? Is it worthwhile?

It avoids the scalability problem of connected mode, namely the number of
QPs. Incoming packets are spread across multiple receive contexts,
increasing parallelism. The MTU is increased to allow 10K. It also
reduces/removes the verbs TX overhead by allowing packets to be sent
through the SDMA engines directly.

>> Gary Leshner (6):
>>        IB/hfi1: Add functions to transmit datagram ipoib packets
>>        IB/hfi1: Add the transmit side of a datagram ipoib RDMA netdev
>>        IB/hfi1: Remove module parameter for KDETH qpns
>>        IB/{rdmavt,hfi1}: Implement creation of accelerated UD QPs
>>        IB/{hfi1,ipoib,rdma}: Broadcast ping sent packets which exceeded mtu size
>>        IB/ipoib: Add capability to switch between datagram and connected mode
>>
>> Grzegorz Andrejczuk (7):
>>        IB/hfi1: RSM rules for AIP
>>        IB/hfi1: Rename num_vnic_contexts as num_netdev_contexts
>>        IB/hfi1: Add functions to receive accelerated ipoib packets
>>        IB/hfi1: Add interrupt handler functions for accelerated ipoib
>>        IB/hfi1: Add rx functions for dummy netdev
> 
> This dummy netdev thing seemed very strange

One of the existing uses of dummy netdev seems to be to tie multiple
hardware interfaces together. We are using a similar concept for two
software interfaces, VNIC and AIP. The dummy netdev here will own the
shared receive resources.
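
A minimal sketch of that ownership idea, assuming a simple user count:
the first interface to come up allocates the receive side and the last
one to go away releases it. The struct and function names are
hypothetical and are not taken from the patches.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the receive resources shared by two
 * software interfaces (VNIC and AIP in the discussion above). */
struct sketch_shared_rx {
	int users;       /* number of interfaces currently using rx */
	bool allocated;  /* whether the receive resources exist */
};

/* Called when an interface starts using the shared receive side. */
static void sketch_rx_get(struct sketch_shared_rx *rx)
{
	if (rx->users++ == 0)
		rx->allocated = true;   /* first user allocates */
}

/* Called when an interface stops using the shared receive side. */
static void sketch_rx_put(struct sketch_shared_rx *rx)
{
	if (rx->users > 0 && --rx->users == 0)
		rx->allocated = false;  /* last user releases */
}
```

Neither interface needs to know whether the other is active; the owner
object alone decides when the resources are set up or torn down.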


-Denny

Jason Gunthorpe Feb. 10, 2020, 6:32 p.m. UTC | #3
On Mon, Feb 10, 2020 at 12:36:02PM -0500, Dennis Dalessandro wrote:
> On 2/10/2020 8:31 AM, Jason Gunthorpe wrote:
> > On Mon, Feb 10, 2020 at 08:18:05AM -0500, Dennis Dalessandro wrote:
> > > This patch series is an accelerated ipoib using the rdma netdev mechanism
> > > already present in ipoib. A new device capability bit,
> > > IB_DEVICE_RDMA_NETDEV_OPA, triggers ipoib to create a datagram QP using the
> > > IB_QP_CREATE_NETDEV_USE.
> > > 
> > > The highlights include:
> > > - Sharing send and receive resources with VNIC
> > > - Allows for switching between connected mode and datagram mode
> > 
> > There is still value in connected mode?
> 
> It's really a compatibility thing. If someone wants to change modes that
> will work. There won't be any benefit to connected mode though. The goal is
> just to not break.

I am a bit confused by this.. I thought the mlx5 implementation
already could select connected mode?

Why were core ipoib changes needed?

> > > The patches are fully bisectable and stepwise implement the capability.
> > 
> > This is a lot of code to send without a performance
> > justification.. What is it? Is it worthwhile?
> 
> It avoids the scalability problem of connected mode, the number of QPs.
> Incoming packets are spread into multiple receive contexts increasing
> parallelism. The MTU is increased to allow 10K. It also reduces/removes the
> verbs TX overhead by allowing packets to be sent through the SDMA engines
> directly.

No numbers to share?

Jason

Dennis Dalessandro Feb. 11, 2020, 9:58 p.m. UTC | #4
On 2/10/2020 1:32 PM, Jason Gunthorpe wrote:
> On Mon, Feb 10, 2020 at 12:36:02PM -0500, Dennis Dalessandro wrote:
>> On 2/10/2020 8:31 AM, Jason Gunthorpe wrote:
>>> On Mon, Feb 10, 2020 at 08:18:05AM -0500, Dennis Dalessandro wrote:
>>>> This patch series is an accelerated ipoib using the rdma netdev mechanism
>>>> already present in ipoib. A new device capability bit,
>>>> IB_DEVICE_RDMA_NETDEV_OPA, triggers ipoib to create a datagram QP using the
>>>> IB_QP_CREATE_NETDEV_USE.
>>>>
>>>> The highlights include:
>>>> - Sharing send and receive resources with VNIC
>>>> - Allows for switching between connected mode and datagram mode
>>>
>>> There is still value in connected mode?
>>
>> It's really a compatibility thing. If someone wants to change modes that
>> will work. There won't be any benefit to connected mode though. The goal is
>> just to not break.
> 
> I am a bit confused by this.. I thought the mlx5 implementation
> already could select connected mode?
> 
> Why were core ipoib changes needed?

I don't think so. Patch 15/16 seemed to be necessary to get connected
mode to work with the rdma netdev.

> 
>>>> The patches are fully bisectable and stepwise implement the capability.
>>>
>>> This is a lot of code to send without a performance
>>> justification.. What is it? Is it worthwhile?
>>
>> It avoids the scalability problem of connected mode, the number of QPs.
>> Incoming packets are spread into multiple receive contexts increasing
>> parallelism. The MTU is increased to allow 10K. It also reduces/removes the
>> verbs TX overhead by allowing packets to be sent through the SDMA engines
>> directly.
> 
> No numbers to share?

No numbers to share directly, but I can say that AIP enables line-rate
performance between two nodes in Datagram Mode. It also provides
IPoFabric latency improvements relative to standard Datagram Mode without
AIP.

-Denny