Message ID | 20250306230203.1550314-1-nikolay@enfabrica.net (mailing list archive) |
---|---|
Series | Ultra Ethernet driver introduction |
On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> Hi all,

<...>

> Ultra Ethernet is a new RDMA transport.

Awesome, and now please explain why a new subsystem is needed when
drivers/infiniband already supports at least 5 different RDMA transports
(OmniPath, iWARP, InfiniBand, RoCE v1 and RoCE v2).

Maybe after this discussion it will be very clear that a new subsystem is
needed, but at least it needs to be stated clearly.

And please CC RDMA maintainers on any Ultra Ethernet related discussions, as
it is more RDMA than Ethernet.

Thanks
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Sunday, March 9, 2025 12:17 AM
>
> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > Hi all,
>
> <...>
>
> > Ultra Ethernet is a new RDMA transport.
>
> Awesome, and now please explain why new subsystem is needed when
> drivers/infiniband already supports at least 5 different RDMA transports
> (OmniPath, iWARP, Infiniband, RoCE v1 and RoCE v2).
>

The 6th transport is drivers/infiniband/hw/efa (SRD).

> Maybe after this discussion it will be very clear that new subsystem is
> needed, but at least it needs to be stated clearly.
>
> And please CC RDMA maintainers to any Ultra Ethernet related discussions
> as it is more RDMA than Ethernet.
>
> Thanks
> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Sunday, March 9, 2025 4:22 AM
> To: Leon Romanovsky <leon@kernel.org>; Nikolay Aleksandrov
> <nikolay@enfabrica.net>
> Cc: netdev@vger.kernel.org; shrijeet@enfabrica.net;
> alex.badea@keysight.com; eric.davis@broadcom.com; rip.sohan@amd.com;
> dsahern@kernel.org; Bernard Metzler <BMT@zurich.ibm.com>;
> roland@enfabrica.net; winston.liu@keysight.com;
> dan.mihailescu@keysight.com; Kamal Heib <kheib@redhat.com>;
> parth.v.parikh@keysight.com; Dave Miller <davem@redhat.com>;
> ian.ziemba@hpe.com; andrew.tauferner@cornelisnetworks.com;
> welch@hpe.com; rakhahari.bhunia@keysight.com;
> kingshuk.mandal@keysight.com; linux-rdma@vger.kernel.org;
> kuba@kernel.org; Paolo Abeni <pabeni@redhat.com>; Jason Gunthorpe
> <jgg@nvidia.com>
> Subject: [EXTERNAL] RE: [RFC PATCH 00/13] Ultra Ethernet driver
> introduction
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Sunday, March 9, 2025 12:17 AM
> >
> > On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > > Hi all,
> >
> > <...>
> >
> > > Ultra Ethernet is a new RDMA transport.
> >
> > Awesome, and now please explain why new subsystem is needed when
> > drivers/infiniband already supports at least 5 different RDMA
> > transports (OmniPath, iWARP, Infiniband, RoCE v1 and RoCE v2).
> >
> 6th transport is drivers/infiniband/hw/efa (srd).
>
> > Maybe after this discussion it will be very clear that new subsystem
> > is needed, but at least it needs to be stated clearly.

I am not sure that a new subsystem is what this RFC calls for; rather, it
calls for a discussion about the proper integration of a new RDMA transport
into the Linux kernel.

Ultra Ethernet Transport is probably not just another transport up for easy
integration into the current RDMA subsystem. First of all, its design does
not follow the well-known RDMA verbs model inherited from InfiniBand, which
has largely shaped the current structure of the RDMA subsystem. While it has
send, receive and completion queues (and completion counters) to steer
message exchange, there is no concept of a queue pair. Endpoints can span
multiple queues and can have multiple peer addresses. Communication resource
sharing is controlled in a different way than within protection domains.
Connections are ephemeral, created and released by the provider as needed.
There are more differences. In a nutshell, the UET communication model is
trimmed for extreme scalability. Its API semantics follow libfabric, not
RDMA verbs.

I think Nik gave us a first, still incomplete, look at the UET protocol
engine to help us understand some of the specifics. It's just the lower part
(packet delivery). The implementation of the upper part (resource
management, communication semantics, job management) may largely depend on
the environment we all choose.

IMO, integrating UET with the current RDMA subsystem would require extending
that subsystem to expose all of UET's intended functionality, probably
starting with a more generic RDMA device model than the current ib_device.

The different API semantics of UET may further call for either extending
verbs to cover it as well, or exposing a new non-verbs API (libfabric), or
both.

Thanks,
Bernard.

> >
> > And please CC RDMA maintainers to any Ultra Ethernet related
> > discussions as it is more RDMA than Ethernet.
> >
> > Thanks
On Tue, Mar 11, 2025 at 02:20:07PM +0000, Bernard Metzler wrote:
>
> > -----Original Message-----
> > From: Parav Pandit <parav@nvidia.com>
> > Sent: Sunday, March 9, 2025 4:22 AM
> > To: Leon Romanovsky <leon@kernel.org>; Nikolay Aleksandrov
> > <nikolay@enfabrica.net>
> > Cc: netdev@vger.kernel.org; shrijeet@enfabrica.net;
> > alex.badea@keysight.com; eric.davis@broadcom.com; rip.sohan@amd.com;
> > dsahern@kernel.org; Bernard Metzler <BMT@zurich.ibm.com>;
> > roland@enfabrica.net; winston.liu@keysight.com;
> > dan.mihailescu@keysight.com; Kamal Heib <kheib@redhat.com>;
> > parth.v.parikh@keysight.com; Dave Miller <davem@redhat.com>;
> > ian.ziemba@hpe.com; andrew.tauferner@cornelisnetworks.com;
> > welch@hpe.com; rakhahari.bhunia@keysight.com;
> > kingshuk.mandal@keysight.com; linux-rdma@vger.kernel.org;
> > kuba@kernel.org; Paolo Abeni <pabeni@redhat.com>; Jason Gunthorpe
> > <jgg@nvidia.com>
> > Subject: [EXTERNAL] RE: [RFC PATCH 00/13] Ultra Ethernet driver
> > introduction
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Sunday, March 9, 2025 12:17 AM
> > >
> > > On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > > > Hi all,
> > >
> > > <...>
> > >
> > > > Ultra Ethernet is a new RDMA transport.
> > >
> > > Awesome, and now please explain why new subsystem is needed when
> > > drivers/infiniband already supports at least 5 different RDMA
> > > transports (OmniPath, iWARP, Infiniband, RoCE v1 and RoCE v2).
> > >
> > 6th transport is drivers/infiniband/hw/efa (srd).
> >
> > > Maybe after this discussion it will be very clear that new subsystem
> > > is needed, but at least it needs to be stated clearly.
>
> I am not sure if a new subsystem is what this RFC calls
> for, but rather a discussion about the proper integration of
> a new RDMA transport into the Linux kernel.
<...>

> The different API semantics of UET may further call
> for either extending verbs to cover it as well, or exposing a
> new non-verbs API (libfabrics), or both.

So you should start from there (UAPI) by presenting the device model and how
the verbs API needs to be extended, so that it will be possible to evaluate
how to fit that model into the existing Linux kernel codebase.

The RDMA subsystem provides multiple types of QPs and operational models.
Some of them do indeed follow the IB style, but not all of them (SRD, DC,
etc.).

Thanks

>
> Thanks,
> Bernard.
>
> > >
> > > An please CC RDMA maintainers to any Ultra Ethernet related
> > > discussions as it is more RDMA than Ethernet.
> > >
> > > Thanks
>
> I am not sure if a new subsystem is what this RFC calls for, but rather a
> discussion about the proper integration of a new RDMA transport into the
> Linux kernel.
>
> Ultra Ethernet Transport is probably not just another transport up for easy
> integration into the current RDMA subsystem.
> First of all, its design does not follow the well-known RDMA verbs model
> inherited from InfiniBand, which has largely shaped the current structure of
> the RDMA subsystem. While having send, receive and completion queues (and
> completion counters) to steer message exchange, there is no concept of a
> queue pair. Endpoints can span multiple queues, can have multiple peer
> addresses.
> Communication resources sharing is controlled in a different way than within
> protection domains. Connections are ephemeral, created and released by the
> provider as needed. There are more differences. In a nutshell, the UET
> communication model is trimmed for extreme scalability. Its API semantics
> follow libfabrics, not RDMA verbs.
>
> I think Nik gave us a first still incomplete look at the UET protocol engine to
> help us understand some of the specifics.
> It's just the lower part (packet delivery). The implementation of the upper part
> (resource management, communication semantics, job management) may
> largely depend on the environment we all choose.
>
> IMO, integrating UET with the current RDMA subsystem would ask for its
> extension to allow exposing all of UETs intended functionality, probably
> starting with a more generic RDMA device model than current ib_device.
>
> The different API semantics of UET may further call for either extending verbs
> to cover it as well, or exposing a new non-verbs API (libfabrics), or both.

Reading through the submissions, what I found lacking is a description of
some higher-level plan. I don't easily see how to relate this series to NICs
that may implement UET in HW.
Should the PDS be viewed as a partial implementation of a SW UET 'device',
similar to soft RoCE or iWARP? If so, having a description of a proposed
device model seems like a necessary first step.

If, instead, the PDS should be viewed more along the lines of a partial
RDS-like path, then that changes the uapi.

Or, am I not viewing this series as intended at all?

It is almost guaranteed that there will be NICs which will support both RoCE
and UET, and it's not far-fetched to think that an app may use both
simultaneously. IMO, a common device model is ideal, assuming exposing a
device model is the intent.

I agree that different transport models should not be forced together
unnaturally, but I think that's solvable. In the end, the application
developer is exposed to libfabric naming anyway. Besides, even a repurposed
RDMA name is still better than the naming used within OpenMPI. :)

- Sean