Message ID | 20230602121653.80017-25-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add vfio_device cdev for iommufd support |
On Fri, 2 Jun 2023 05:16:53 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This gives notes for userspace applications on device cdev usage.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index 363e12c90b87..f00c9b86bda0 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,130 @@ group and can access them as follows::
>      /* Gratuitous device reset and go... */
>      ioctl(device, VFIO_DEVICE_RESET);
>
> +IOMMUFD and vfio_iommu_type1
> +----------------------------
> +
> +IOMMUFD is the new user API to manage I/O page tables from userspace.
> +It intends to be the portal of delivering advanced userspace DMA
> +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> +cases. Eventually the vfio_iommu_type1 driver, as well as the legacy
> +vfio container and group model is intended to be deprecated.
> +
> +The IOMMUFD backwards compatibility interface can be enabled two ways.
> +In the first method, the kernel can be configured with
> +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> +transparently provides the entire infrastructure for the VFIO
> +container and IOMMU backend interfaces. The compatibility mode can
> +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> +simply symlink'd to /dev/iommu. Note that at the time of writing, the
> +compatibility mode is not entirely feature complete relative to
> +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface. Therefore
> +it is not generally advisable at this time to switch from native VFIO
> +implementations to the IOMMUFD compatibility interfaces.
> +
> +Long term, VFIO users should migrate to device access through the cdev
> +interface described below, and native access through the IOMMUFD
> +provided interfaces.
> +
> +VFIO Device cdev
> +----------------
> +
> +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> +in a VFIO group.
> +
> +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> +by directly opening a character device /dev/vfio/devices/vfioX where
> +"X" is the number allocated uniquely by VFIO for registered devices.
> +cdev interface does not support noiommu, so user should use the legacy
> +group interface if noiommu is needed.
> +
> +The cdev only works with IOMMUFD. Both VFIO drivers and applications
> +must adapt to the new cdev security model which requires using
> +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> +actually use the device. Once BIND succeeds then a VFIO device can
> +be fully accessed by the user.
> +
> +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> +Hence those modules can be fully compiled out in an environment
> +where no legacy VFIO application exists.
> +
> +So far SPAPR does not support IOMMUFD yet. So it cannot support device
> +cdev neither.

s/neither/either/

Unless I missed it, we've not described that vfio device cdev access is
still bound by IOMMU group semantics, ie. there can be one DMA owner
for the group. That's a pretty common failure point for multi-function
consumer device use cases, so the why, where, and how it fails should
be well covered.

In general there's been a lot of cross collaboration to get the series
this far. I see an abundance of Tested-by, but unfortunately not a lot
of Reviewed-by beyond about the first 1/3rd of the series. Thanks,

Alex

> +
> +Device cdev Example
> +-------------------
> +
> +Assume user wants to access PCI device 0000:6a:01.0::
> +
> +    $ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> +    vfio0
> +
> +This device is therefore represented as vfio0. The user can verify
> +its existence::
> +
> +    $ ls -l /dev/vfio/devices/vfio0
> +    crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> +    $ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> +    511:0
> +    $ ls -l /dev/char/511\:0
> +    lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> +
> +Then provide the user with access to the device if unprivileged
> +operation is desired::
> +
> +    $ chown user:user /dev/vfio/devices/vfio0
> +
> +Finally the user could get cdev fd by::
> +
> +    cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> +
> +An opened cdev_fd doesn't give the user any permission of accessing
> +the device except binding the cdev_fd to an iommufd. After that point
> +then the device is fully accessible including attaching it to an
> +IOMMUFD IOAS/HWPT to enable userspace DMA::
> +
> +    struct vfio_device_bind_iommufd bind = {
> +        .argsz = sizeof(bind),
> +        .flags = 0,
> +    };
> +    struct iommu_ioas_alloc alloc_data = {
> +        .size = sizeof(alloc_data),
> +        .flags = 0,
> +    };
> +    struct vfio_device_attach_iommufd_pt attach_data = {
> +        .argsz = sizeof(attach_data),
> +        .flags = 0,
> +    };
> +    struct iommu_ioas_map map = {
> +        .size = sizeof(map),
> +        .flags = IOMMU_IOAS_MAP_READABLE |
> +                 IOMMU_IOAS_MAP_WRITEABLE |
> +                 IOMMU_IOAS_MAP_FIXED_IOVA,
> +        .__reserved = 0,
> +    };
> +
> +    iommufd = open("/dev/iommu", O_RDWR);
> +
> +    bind.iommufd = iommufd;
> +    ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> +
> +    ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> +    attach_data.pt_id = alloc_data.out_ioas_id;
> +    ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> +
> +    /* Allocate some space and setup a DMA mapping */
> +    map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> +                                MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +    map.iova = 0; /* 1MB starting at 0x0 from device view */
> +    map.length = 1024 * 1024;
> +    map.ioas_id = alloc_data.out_ioas_id;;
> +
> +    ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> +
> +    /* Other device operations as stated in "VFIO Usage Example" */
> +
>  VFIO User API
>  -------------------------------------------------------------------------------
>
> @@ -566,3 +690,11 @@ This implementation has some specifics:
>      \-0d.1
>
>      00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> +
> +.. [5] Nested translation is an IOMMU feature which supports two stage
> +   address translations. This improves the address translation efficiency
> +   in IOMMU virtualization.
> +
> +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> +   Express. It is a prerequisite for Shared Virtual Addressing (SVA)
> +   and Scalable I/O Virtualization (Scalable IOV).
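An aside on the quoted example: it stores the mmap() result in map.user_va through an int64_t cast, passes 0 as the file descriptor, and does not check for MAP_FAILED. A minimal sketch of a more defensive allocation follows. The helper name alloc_dma_buffer() is hypothetical and not part of the patch, and the choice to return 0 on failure is an assumption made for illustration.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): allocate an anonymous buffer
 * and return its address as a 64-bit integer, suitable for storing in a
 * 64-bit field such as map.user_va. Returns 0 if the mapping fails.
 */
static uint64_t alloc_dma_buffer(size_t len)
{
	void *buf;

	assert(len > 0);	/* mmap() rejects zero-length mappings */

	/* Pass fd = -1: some systems require it together with MAP_ANONYMOUS */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 0;

	/* Convert via uintptr_t so the pointer-to-integer cast is well defined */
	return (uint64_t)(uintptr_t)buf;
}
```

With such a helper the example would read map.user_va = alloc_dma_buffer(1024 * 1024); followed by a check for 0 before issuing IOMMU_IOAS_MAP.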
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 7:06 AM
>
> On Fri, 2 Jun 2023 05:16:53 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
>
> > +So far SPAPR does not support IOMMUFD yet. So it cannot support device
> > +cdev neither.
>
> s/neither/either/

Got it.

> Unless I missed it, we've not described that vfio device cdev access is
> still bound by IOMMU group semantics, ie. there can be one DMA owner
> for the group. That's a pretty common failure point for multi-function
> consumer device use cases, so the why, where, and how it fails should
> be well covered.

Yes. this needs to be documented. How about below words:

vfio device cdev access is still bound by IOMMU group semantics, ie. there
can be only one DMA owner for the group. Devices belonging to the same
group can not be bound to multiple iommufd_ctx. The users that try to bind
such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
which is the start point to get full access for the device.

> In general there's been a lot of cross collaboration to get the series
> this far. I see an abundance of Tested-by, but unfortunately not a lot
> of Reviewed-by beyond about the first 1/3rd of the series. Thanks,

Yeah. The rest 2/3rd part has back and forth changes since v8.

Regards,
Yi Liu
On Tue, 13 Jun 2023 12:01:51 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > Unless I missed it, we've not described that vfio device cdev access is
> > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > for the group. That's a pretty common failure point for multi-function
> > consumer device use cases, so the why, where, and how it fails should
> > be well covered.
>
> Yes. this needs to be documented. How about below words:
>
> vfio device cdev access is still bound by IOMMU group semantics, ie. there
> can be only one DMA owner for the group. Devices belonging to the same
> group can not be bound to multiple iommufd_ctx.

... or shared between native kernel and vfio drivers.

> The users that try to bind
> such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
> which is the start point to get full access for the device.

"A violation of this ownership requirement will fail at the
VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access."

Thanks,
Alex
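Since the ownership violation surfaces as a failed VFIO_DEVICE_BIND_IOMMUFD ioctl, userspace has to check the return value, which the example in the patch does not do. The following is a minimal sketch of such a check. The wrapper name checked_ioctl() is hypothetical, and the thread does not say which errno value the bind reports on a conflict, so the sketch only prints whatever the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical wrapper (not from the patch): issue an ioctl and report a
 * failure with the name of the operation. Returns 0 on success or the
 * negated errno value on failure.
 */
static int checked_ioctl(int fd, unsigned long request, void *arg,
			 const char *what)
{
	assert(what != NULL);

	if (ioctl(fd, request, arg) < 0) {
		int err = errno;	/* save before fprintf() can clobber it */

		fprintf(stderr, "%s failed: %s\n", what, strerror(err));
		return -err;
	}
	return 0;
}
```

In the cdev flow this could be used as checked_ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind, "VFIO_DEVICE_BIND_IOMMUFD"), stopping before the attach and map steps when it returns a negative value.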
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:24 PM
>
> On Tue, 13 Jun 2023 12:01:51 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > can be only one DMA owner for the group. Devices belonging to the same
> > group can not be bound to multiple iommufd_ctx.
>
> ... or shared between native kernel and vfio drivers.

I suppose you mean the devices in one group are bound to different
drivers. right?

> > The users that try to bind
> > such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
> > which is the start point to get full access for the device.
>
> "A violation of this ownership requirement will fail at the
> VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access."

Got it.

Regards,
Yi Liu
On Tue, 13 Jun 2023 14:48:02 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > > can be only one DMA owner for the group. Devices belonging to the same
> > > group can not be bound to multiple iommufd_ctx.
> >
> > ... or shared between native kernel and vfio drivers.
>
> I suppose you mean the devices in one group are bound to different
> drivers. right?

Essentially, but we need to be careful that we're developing multiple
vfio drivers for a given bus now, which is why I try to distinguish
between the two sets of drivers. Thanks,

Alex
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 11:04 PM
>
> > > > > Unless I missed it, we've not described that vfio device cdev access is
> > > > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > > > for the group. That's a pretty common failure point for multi-function
> > > > > consumer device use cases, so the why, where, and how it fails should
> > > > > be well covered.
> > > >
> > > > Yes. this needs to be documented. How about below words:
> > > >
> > > > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > > > can be only one DMA owner for the group. Devices belonging to the same
> > > > group can not be bound to multiple iommufd_ctx.
> > >
> > > ... or shared between native kernel and vfio drivers.
> >
> > I suppose you mean the devices in one group are bound to different
> > drivers. right?
>
> Essentially, but we need to be careful that we're developing multiple
> vfio drivers for a given bus now, which is why I try to distinguish
> between the two sets of drivers. Thanks,

Indeed. There are a set of vfio drivers. Even pci-stub can be considered
in this set? Perhaps, it is more precise to say : or shared between drivers
that set the struct pci_driver::driver_managed_dma flag and the drivers
that do not.

Regards,
Yi Liu
On Tue, 13 Jun 2023 15:11:06 +0000 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > From: Alex Williamson <alex.williamson@redhat.com> > > Sent: Tuesday, June 13, 2023 11:04 PM > > > > > > > > > > > > > > > > > > > Unless I missed it, we've not described that vfio device cdev access is > > > > > > still bound by IOMMU group semantics, ie. there can be one DMA owner > > > > > > for the group. That's a pretty common failure point for multi-function > > > > > > consumer device use cases, so the why, where, and how it fails should > > > > > > be well covered. > > > > > > > > > > Yes. this needs to be documented. How about below words: > > > > > > > > > > vfio device cdev access is still bound by IOMMU group semantics, ie. there > > > > > can be only one DMA owner for the group. Devices belonging to the same > > > > > group can not be bound to multiple iommufd_ctx. > > > > > > > > ... or shared between native kernel and vfio drivers. > > > > > > I suppose you mean the devices in one group are bound to different > > > drivers. right? > > > > Essentially, but we need to be careful that we're developing multiple > > vfio drivers for a given bus now, which is why I try to distinguish > > between the two sets of drivers. Thanks, > > Indeed. There are a set of vfio drivers. Even pci-stub can be considered > in this set? Perhaps, it is more precise to say : or shared between drivers > that set the struct pci_driver::driver_managed_dma flag and the drivers > that do not. Yeah, I wish there was a less technical way to describe this. This is essentially why we have the VIABLE flag on VFIO_GROUP_GET_STATUS in the legacy interface, which is what QEMU uses to generate the warning specific to binding all devices to vfio bus drivers. 
Technically there are some exceptions, like pci-stub or "no driver" that
can be used to prevent direct access to devices within the group, but
except for that narrow use case a vfio driver is generally recommended,
and is currently required for certain things like the dev_set test
during hot-reset.

If we want to be accurate without being too pedantic, perhaps it would
be something like "vfio bus driver or other driver supporting the
driver_managed_dma flag". Note the flag is supported for several drivers
other than pci_driver. Thanks,

Alex
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 363e12c90b87..f00c9b86bda0 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,130 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It intends to be the portal of delivering advanced userspace DMA
+features (nested translation [5]_, PASID [6]_, etc.) while also providing
+a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
+cases. Eventually the vfio_iommu_type1 driver, as well as the legacy
+vfio container and group model is intended to be deprecated.
+
+The IOMMUFD backwards compatibility interface can be enabled two ways.
+In the first method, the kernel can be configured with
+CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
+transparently provides the entire infrastructure for the VFIO
+container and IOMMU backend interfaces. The compatibility mode can
+also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
+simply symlink'd to /dev/iommu. Note that at the time of writing, the
+compatibility mode is not entirely feature complete relative to
+VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
+provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface. Therefore
+it is not generally advisable at this time to switch from native VFIO
+implementations to the IOMMUFD compatibility interfaces.
+
+Long term, VFIO users should migrate to device access through the cdev
+interface described below, and native access through the IOMMUFD
+provided interfaces.
+
+VFIO Device cdev
+----------------
+
+Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+cdev interface does not support noiommu, so user should use the legacy
+group interface if noiommu is needed.
+
+The cdev only works with IOMMUFD. Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device. Once BIND succeeds then a VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+So far SPAPR does not support IOMMUFD yet. So it cannot support device
+cdev either.
+
+Device cdev Example
+-------------------
+
+Assume user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0. The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally the user could get cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd doesn't give the user any permission of accessing
+the device except binding the cdev_fd to an iommufd. After that point
+then the device is fully accessible including attaching it to an
+IOMMUFD IOAS/HWPT to enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd;
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +690,11 @@ This implementation has some specifics:
 	\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two stage
+   address translations. This improves the address translation efficiency
+   in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+   Express. It is a prerequisite for Shared Virtual Addressing (SVA)
+   and Scalable I/O Virtualization (Scalable IOV).
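
The sysfs lookup from the "Device cdev Example" in the patch (listing the
vfio-dev directory of a device to learn its vfioX name) can also be done
programmatically. The following is a minimal sketch using only POSIX
directory calls. The function name vfio_cdev_path() is hypothetical and
not part of any VFIO or IOMMUFD API. It assumes the first entry whose
name starts with "vfio" under vfio-dev is the one wanted.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the vfioX entry under <sysfs_dev_dir>/vfio-dev and write the
 * matching character device path into @out.
 * Returns 0 on success, -1 if no entry is found or @out is too small.
 */
static int vfio_cdev_path(const char *sysfs_dev_dir, char *out, size_t len)
{
	char dir[512];
	struct dirent *de;
	int ret = -1;
	int n;
	DIR *d;

	n = snprintf(dir, sizeof(dir), "%s/vfio-dev", sysfs_dev_dir);
	if (n < 0 || (size_t)n >= sizeof(dir))
		return -1;

	d = opendir(dir);
	if (!d)
		return -1;

	while ((de = readdir(d)) != NULL) {
		/* Skip ".", ".." and anything that is not a vfioX entry */
		if (strncmp(de->d_name, "vfio", 4) != 0)
			continue;

		n = snprintf(out, len, "/dev/vfio/devices/%s", de->d_name);
		if (n >= 0 && (size_t)n < len)
			ret = 0;
		break;
	}

	closedir(d);
	return ret;
}
```

On the system shown in the example, calling
vfio_cdev_path("/sys/bus/pci/devices/0000:6a:01.0", path, sizeof(path))
would be expected to fill path with /dev/vfio/devices/vfio0, which can
then be passed to open() as in the patch.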