Message ID | 20230616093946.68711-23-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add vfio_device cdev for iommufd support |
On Fri, 16 Jun 2023 02:39:46 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This gives notes for userspace applications on device cdev usage.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/driver-api/vfio.rst | 139 ++++++++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index 363e12c90b87..633d11c7fa71 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,137 @@ group and can access them as follows::
>  	/* Gratuitous device reset and go... */
>  	ioctl(device, VFIO_DEVICE_RESET);
>
> +IOMMUFD and vfio_iommu_type1
> +----------------------------
> +
> +IOMMUFD is the new user API to manage I/O page tables from userspace.
> +It intends to be the portal of delivering advanced userspace DMA
> +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> +vfio container and group model is intended to be deprecated.
> +
> +The IOMMUFD backwards compatibility interface can be enabled two ways.
> +In the first method, the kernel can be configured with
> +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> +transparently provides the entire infrastructure for the VFIO
> +container and IOMMU backend interfaces.  The compatibility mode can
> +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> +compatibility mode is not entirely feature complete relative to
> +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> +it is not generally advisable at this time to switch from native VFIO
> +implementations to the IOMMUFD compatibility interfaces.
> +
> +Long term, VFIO users should migrate to device access through the cdev
> +interface described below, and native access through the IOMMUFD
> +provided interfaces.
> +
> +VFIO Device cdev
> +----------------
> +
> +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> +in a VFIO group.
> +
> +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> +by directly opening a character device /dev/vfio/devices/vfioX where
> +"X" is the number allocated uniquely by VFIO for registered devices.
> +cdev interface does not support noiommu devices, so user should use
> +the legacy group interface if noiommu is wanted.
> +
> +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> +must adapt to the new cdev security model which requires using
> +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> +actually use the device.  Once BIND succeeds then a VFIO device can
> +be fully accessed by the user.
> +
> +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> +Hence those modules can be fully compiled out in an environment
> +where no legacy VFIO application exists.
> +
> +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> +cdev either.

Why isn't this enforced via Kconfig?  At the vfio level we could simply
add the following in patch 17/:

config VFIO_DEVICE_CDEV
        bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
        depends on IOMMUFD && !SPAPR_TCE_IOMMU
                           ^^^^^^^^^^^^^^^^^^^

Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
the existing Kconfig options would exclude it.  If we know it doesn't
work, let's not put the burden on the user to figure that out.  A
follow-up patch for this would be fine if there's no other reason to
respin the series.

Otherwise the series is looking pretty good to me.  It still requires
some reviews/acks in the iommufd space and it would be good to see more
reviews for the remainder given the amount of collaboration here.

I'm out for the rest of the week, but I'll leave open accepting this
and the hot-reset series next week for the merge window.  Thanks,

Alex

> +
> +vfio device cdev access is still bound by IOMMU group semantics, ie. there
> +can be only one DMA owner for the group.  Devices belonging to the same
> +group can not be bound to multiple iommufd_ctx or shared between native
> +kernel and vfio bus driver or other driver supporting the driver_managed_dma
> +flag.  A violation of this ownership requirement will fail at the
> +VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access.
> +
> +Device cdev Example
> +-------------------
> +
> +Assume user wants to access PCI device 0000:6a:01.0::
> +
> +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> +	vfio0
> +
> +This device is therefore represented as vfio0.  The user can verify
> +its existence::
> +
> +	$ ls -l /dev/vfio/devices/vfio0
> +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> +	511:0
> +	$ ls -l /dev/char/511\:0
> +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> +
> +Then provide the user with access to the device if unprivileged
> +operation is desired::
> +
> +	$ chown user:user /dev/vfio/devices/vfio0
> +
> +Finally the user could get cdev fd by::
> +
> +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> +
> +An opened cdev_fd doesn't give the user any permission of accessing
> +the device except binding the cdev_fd to an iommufd.  After that point
> +then the device is fully accessible including attaching it to an
> +IOMMUFD IOAS/HWPT to enable userspace DMA::
> +
> +	struct vfio_device_bind_iommufd bind = {
> +		.argsz = sizeof(bind),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_alloc alloc_data  = {
> +		.size = sizeof(alloc_data),
> +		.flags = 0,
> +	};
> +	struct vfio_device_attach_iommufd_pt attach_data = {
> +		.argsz = sizeof(attach_data),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_map map = {
> +		.size = sizeof(map),
> +		.flags = IOMMU_IOAS_MAP_READABLE |
> +			 IOMMU_IOAS_MAP_WRITEABLE |
> +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> +		.__reserved = 0,
> +	};
> +
> +	iommufd = open("/dev/iommu", O_RDWR);
> +
> +	bind.iommufd = iommufd;
> +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> +
> +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> +	attach_data.pt_id = alloc_data.out_ioas_id;
> +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> +
> +	/* Allocate some space and setup a DMA mapping */
> +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> +	map.length = 1024 * 1024;
> +	map.ioas_id = alloc_data.out_ioas_id;;
> +
> +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> +
> +	/* Other device operations as stated in "VFIO Usage Example" */
> +
>  VFIO User API
>  -------------------------------------------------------------------------------
>
> @@ -566,3 +697,11 @@ This implementation has some specifics:
>  	\-0d.1
>
>  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> +
> +.. [5] Nested translation is an IOMMU feature which supports two stage
> +       address translations.  This improves the address translation efficiency
> +       in IOMMU virtualization.
> +
> +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> +       Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> +       and Scalable I/O Virtualization (Scalable IOV).
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, June 22, 2023 5:54 AM
>
> On Fri, 16 Jun 2023 02:39:46 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:

> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > +cdev either.
>
> Why isn't this enforced via Kconfig?  At the vfio level we could simply
> add the following in patch 17/:
>
> config VFIO_DEVICE_CDEV
>         bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
>         depends on IOMMUFD && !SPAPR_TCE_IOMMU
>                            ^^^^^^^^^^^^^^^^^^^
>
> Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> the existing Kconfig options would exclude it.  If we know it doesn't
> work, let's not put the burden on the user to figure that out.  A
> follow-up patch for this would be fine if there's no other reason to
> respin the series.

@Jason,
How about your opinion? Seems reasonable to make IOMMUFD
depend on !SPAPR_TCE_IOMMU. Is it?

> Otherwise the series is looking pretty good to me.  It still requires
> some reviews/acks in the iommufd space and it would be good to see more
> reviews for the remainder given the amount of collaboration here.
>
> I'm out for the rest of the week, but I'll leave open accepting this
> and the hot-reset series next week for the merge window.  Thanks,

@Alex,
Given Jason's remarks on cdev v12, I've already got a new version as below.
I can post it once the above kconfig open is closed.

https://github.com/yiliu1765/iommufd/tree/wip/vfio_device_cdev_v14

Regards,
Yi Liu
On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, June 22, 2023 5:54 AM
> >
> > On Fri, 16 Jun 2023 02:39:46 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
>
> > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > +Hence those modules can be fully compiled out in an environment
> > > +where no legacy VFIO application exists.
> > > +
> > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > +cdev either.
> >
> > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > add the following in patch 17/:
> >
> > config VFIO_DEVICE_CDEV
> >         bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> >         depends on IOMMUFD && !SPAPR_TCE_IOMMU
> >                            ^^^^^^^^^^^^^^^^^^^
> >
> > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > the existing Kconfig options would exclude it.  If we know it doesn't
> > work, let's not put the burden on the user to figure that out.  A
> > follow-up patch for this would be fine if there's no other reason to
> > respin the series.
>
> @Jason,
> How about your opinion? Seems reasonable to make IOMMUFD
> depend on !SPAPR_TCE_IOMMU. Is it?

The right kconfig would be to list all the iommu drivers that can
support iommufd and allow it to be selected if any of them are
enabled.

This seems too complex to bother with, so I like Alex's version above..

> > Otherwise the series is looking pretty good to me.  It still requires
> > some reviews/acks in the iommufd space and it would be good to see more
> > reviews for the remainder given the amount of collaboration here.
> >
> > I'm out for the rest of the week, but I'll leave open accepting this
> > and the hot-reset series next week for the merge window.  Thanks,
>
> @Alex,
> Given Jason's remarks on cdev v12, I've already got a new version as below.
> I can post it once the above kconfig open is closed.

I think we don't need to bend the rules, Linus would not be happy to
see 30 major patches that never hit linux-next at all.

I'm happy if we put it on a branch at RC1 and merge it to the vfio &
iommufd trees, it is functionally the same outcome in the same time
frame.

Jason
On Tue, 27 Jun 2023 13:12:14 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Thursday, June 22, 2023 5:54 AM
> > >
> > > On Fri, 16 Jun 2023 02:39:46 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> >
> > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > +Hence those modules can be fully compiled out in an environment
> > > > +where no legacy VFIO application exists.
> > > > +
> > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > +cdev either.
> > >
> > > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > > add the following in patch 17/:
> > >
> > > config VFIO_DEVICE_CDEV
> > >         bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > >         depends on IOMMUFD && !SPAPR_TCE_IOMMU
> > >                            ^^^^^^^^^^^^^^^^^^^
> > >
> > > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > > the existing Kconfig options would exclude it.  If we know it doesn't
> > > work, let's not put the burden on the user to figure that out.  A
> > > follow-up patch for this would be fine if there's no other reason to
> > > respin the series.
> >
> > @Jason,
> > How about your opinion? Seems reasonable to make IOMMUFD
> > depend on !SPAPR_TCE_IOMMU. Is it?
>
> The right kconfig would be to list all the iommu drivers that can
> support iommufd and allow it to be selected if any of them are
> enabled.
>
> This seems too complex to bother with, so I like Alex's version above..
>
> > > Otherwise the series is looking pretty good to me.  It still requires
> > > some reviews/acks in the iommufd space and it would be good to see more
> > > reviews for the remainder given the amount of collaboration here.
> > >
> > > I'm out for the rest of the week, but I'll leave open accepting this
> > > and the hot-reset series next week for the merge window.  Thanks,
> >
> > @Alex,
> > Given Jason's remarks on cdev v12, I've already got a new version as below.
> > I can post it once the above kconfig open is closed.
>
> I think we don't need to bend the rules, Linus would not be happy to
> see 30 major patches that never hit linux-next at all.
>
> I'm happy if we put it on a branch at RC1 and merge it to the vfio &
> iommufd trees, it is functionally the same outcome in the same time
> frame.

Not sure I'm clear on the plan.  My intention would have been to apply
v14 to my next branch, make sure it did see linux-next exposure,
and send a pull request for rc1 next week.

Are you suggesting a post-merge-window pull request for v6.5 (also
frowned on) or are you suggesting that it simmers in both our next
branches until v6.6?  Thanks,

Alex
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 28, 2023 12:12 AM
>
> On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Thursday, June 22, 2023 5:54 AM
> > >
> > > On Fri, 16 Jun 2023 02:39:46 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> >
> > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > +Hence those modules can be fully compiled out in an environment
> > > > +where no legacy VFIO application exists.
> > > > +
> > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > +cdev either.
> > >
> > > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > > add the following in patch 17/:
> > >
> > > config VFIO_DEVICE_CDEV
> > >         bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > >         depends on IOMMUFD && !SPAPR_TCE_IOMMU
> > >                            ^^^^^^^^^^^^^^^^^^^
> > >

Proposal A.

> > > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > > the existing Kconfig options would exclude it.  If we know it doesn't
> > > work, let's not put the burden on the user to figure that out.  A
> > > follow-up patch for this would be fine if there's no other reason to
> > > respin the series.

Proposal B.

> >
> > @Jason,
> > How about your opinion? Seems reasonable to make IOMMUFD
> > depend on !SPAPR_TCE_IOMMU. Is it?
>
> The right kconfig would be to list all the iommu drivers that can
> support iommufd and allow it to be selected if any of them are
> enabled.
>
> This seems too complex to bother with, so I like Alex's version above..

Sorry, I'm not quite clear. Alex has two proposals above (A and B). Which
one do you mean? It looks like you prefer A. is it? :-)

Regards,
Yi Liu
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, June 28, 2023 1:35 AM

[The Cc list gets broken in the reply from Alex to Jason, here I reply to
Alex's email with the Cc list fixed. @Alex, seems like the same symptom
with last time, do you have any idea on it?]

> On Tue, 27 Jun 2023 13:12:14 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Thursday, June 22, 2023 5:54 AM
> > > >
> > > > On Fri, 16 Jun 2023 02:39:46 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > > +Hence those modules can be fully compiled out in an environment
> > > > > +where no legacy VFIO application exists.
> > > > > +
> > > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > > +cdev either.
> > > >
> > > > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > > > add the following in patch 17/:
> > > >
> > > > config VFIO_DEVICE_CDEV
> > > >         bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > > >         depends on IOMMUFD && !SPAPR_TCE_IOMMU
> > > >                            ^^^^^^^^^^^^^^^^^^^
> > > >
> > > > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > > > the existing Kconfig options would exclude it.  If we know it doesn't
> > > > work, let's not put the burden on the user to figure that out.  A
> > > > follow-up patch for this would be fine if there's no other reason to
> > > > respin the series.
> > >
> > > @Jason,
> > > How about your opinion? Seems reasonable to make IOMMUFD
> > > depend on !SPAPR_TCE_IOMMU. Is it?
> >
> > The right kconfig would be to list all the iommu drivers that can
> > support iommufd and allow it to be selected if any of them are
> > enabled.
> >
> > This seems too complex to bother with, so I like Alex's version above..
> >
> > > > Otherwise the series is looking pretty good to me.  It still requires
> > > > some reviews/acks in the iommufd space and it would be good to see more
> > > > reviews for the remainder given the amount of collaboration here.
> > > >
> > > > I'm out for the rest of the week, but I'll leave open accepting this
> > > > and the hot-reset series next week for the merge window.  Thanks,
> > >
> > > @Alex,
> > > Given Jason's remarks on cdev v12, I've already got a new version as below.
> > > I can post it once the above kconfig open is closed.
> >
> > I think we don't need to bend the rules, Linus would not be happy to
> > see 30 major patches that never hit linux-next at all.
> >
> > I'm happy if we put it on a branch at RC1 and merge it to the vfio &
> > iommufd trees, it is functionally the same outcome in the same time
> > frame.
>
> Not sure I'm clear on the plan.  My intention would have been to apply
> v14 to my next branch, make sure it did see linux-next exposure,
> and send a pull request for rc1 next week.
>
> Are you suggesting a post-merge-window pull request for v6.5 (also
> frowned on) or are you suggesting that it simmers in both our next
> branches until v6.6?  Thanks,

It appears to me the latter one. When 6.5-rc1 is released, we immediately
apply the hot-reset and cdev series onto it and put it in a shared tree to
assist the other iommufd feature development (e.g. nesting). Jason, is it?

Regards,
Yi Liu
On Wed, Jun 28, 2023 at 12:56:40AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, June 28, 2023 12:12 AM
> >
> > On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Thursday, June 22, 2023 5:54 AM
> > > >
> > > > On Fri, 16 Jun 2023 02:39:46 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > > +Hence those modules can be fully compiled out in an environment
> > > > > +where no legacy VFIO application exists.
> > > > > +
> > > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > > +cdev either.
> > > >
> > > > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > > > add the following in patch 17/:
> > > >
> > > > config VFIO_DEVICE_CDEV
> > > >         bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > > >         depends on IOMMUFD && !SPAPR_TCE_IOMMU
> > > >                            ^^^^^^^^^^^^^^^^^^^
> > > >
>
> Proposal A.
>
> > > > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > > > the existing Kconfig options would exclude it.  If we know it doesn't
> > > > work, let's not put the burden on the user to figure that out.  A
> > > > follow-up patch for this would be fine if there's no other reason to
> > > > respin the series.
>
> Proposal B.
>
> > >
> > > @Jason,
> > > How about your opinion? Seems reasonable to make IOMMUFD
> > > depend on !SPAPR_TCE_IOMMU. Is it?
> >
> > The right kconfig would be to list all the iommu drivers that can
> > support iommufd and allow it to be selected if any of them are
> > enabled.
> >
> > This seems too complex to bother with, so I like Alex's version above..
>
> Sorry, I'm not quite clear. Alex has two proposals above (A and B). Which
> one do you mean? It looks like you prefer A. is it? :-)

A

Jason
On Wed, Jun 28, 2023 at 01:10:11AM +0000, Liu, Yi L wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, June 28, 2023 1:35 AM
>
> [The Cc list gets broken in the reply from Alex to Jason, here I reply to
> Alex's email with the Cc list fixed. @Alex, seems like the same symptom
> with last time, do you have any idea on it?]

It is weird...

> > Are you suggesting a post-merge-window pull request for v6.5 (also
> > frowned on) or are you suggesting that it simmers in both our next
> > branches until v6.6?  Thanks,
>
> It appears to me the latter one. When 6.5-rc1 is released, we immediately
> apply the hot-reset and cdev series onto it and put it in a shared tree to
> assist the other iommufd feature development (e.g. nesting). Jason, is it?

Yes, no reason to try to bend the rules with Linus at this point

Jason
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 363e12c90b87..633d11c7fa71 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,137 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It intends to be the portal of delivering advanced userspace DMA
+features (nested translation [5]_, PASID [6]_, etc.) while also providing
+a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
+cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
+vfio container and group model is intended to be deprecated.
+
+The IOMMUFD backwards compatibility interface can be enabled two ways.
+In the first method, the kernel can be configured with
+CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
+transparently provides the entire infrastructure for the VFIO
+container and IOMMU backend interfaces.  The compatibility mode can
+also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
+simply symlink'd to /dev/iommu.  Note that at the time of writing, the
+compatibility mode is not entirely feature complete relative to
+VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
+provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
+it is not generally advisable at this time to switch from native VFIO
+implementations to the IOMMUFD compatibility interfaces.
+
+Long term, VFIO users should migrate to device access through the cdev
+interface described below, and native access through the IOMMUFD
+provided interfaces.
+
+VFIO Device cdev
+----------------
+
+Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+cdev interface does not support noiommu devices, so user should use
+the legacy group interface if noiommu is wanted.
+
+The cdev only works with IOMMUFD.  Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device.  Once BIND succeeds then a VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+So far SPAPR does not support IOMMUFD yet.  So it cannot support device
+cdev either.
+
+vfio device cdev access is still bound by IOMMU group semantics, ie. there
+can be only one DMA owner for the group.  Devices belonging to the same
+group can not be bound to multiple iommufd_ctx or shared between native
+kernel and vfio bus driver or other driver supporting the driver_managed_dma
+flag.  A violation of this ownership requirement will fail at the
+VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access.
+
+Device cdev Example
+-------------------
+
+Assume user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0.  The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally the user could get cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd doesn't give the user any permission of accessing
+the device except binding the cdev_fd to an iommufd.  After that point
+then the device is fully accessible including attaching it to an
+IOMMUFD IOAS/HWPT to enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data  = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd;
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +697,11 @@ This implementation has some specifics:
 	\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two stage
+       address translations.  This improves the address translation efficiency
+       in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+       Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
+       and Scalable I/O Virtualization (Scalable IOV).
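
The "Device cdev Example" in the patch finds the vfioX name by listing the device's vfio-dev directory in sysfs by hand. The following is a minimal sketch, not part of the patch, of how an application might do the same lookup programmatically. It assumes the sysfs layout shown in the example above; the function name vfio_cdev_path and the sysfs_root parameter are hypothetical and exist only so the lookup can be pointed at a scratch directory instead of /sys/bus/pci/devices.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the vfioX entry under
 * <sysfs_root>/<bdf>/vfio-dev/ and write the matching
 * /dev/vfio/devices/vfioX path into out.
 *
 * sysfs_root would normally be "/sys/bus/pci/devices".
 * Returns 0 on success, -1 if the directory is missing, holds no
 * vfio* entry, or a path does not fit in its buffer.
 */
int vfio_cdev_path(const char *sysfs_root, const char *bdf,
                   char *out, size_t outlen)
{
    char dirpath[512];
    struct dirent *de;
    DIR *dir;
    int n, ret = -1;

    n = snprintf(dirpath, sizeof(dirpath), "%s/%s/vfio-dev", sysfs_root, bdf);
    if (n < 0 || (size_t)n >= sizeof(dirpath))
        return -1;

    dir = opendir(dirpath);
    if (!dir)
        return -1;      /* no vfio-dev directory for this device */

    while ((de = readdir(dir)) != NULL) {
        if (strncmp(de->d_name, "vfio", 4) != 0)
            continue;   /* skips "." and ".." as well */
        n = snprintf(out, outlen, "/dev/vfio/devices/%s", de->d_name);
        if (n >= 0 && (size_t)n < outlen)
            ret = 0;
        break;          /* use the first vfio* entry found */
    }
    closedir(dir);
    return ret;
}
```

For the device in the example, calling vfio_cdev_path("/sys/bus/pci/devices", "0000:6a:01.0", path, sizeof(path)) would be expected to fill path with /dev/vfio/devices/vfio0, which can then be passed to open() as in the patch.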