Message ID | 20230712072528.275577-1-zhenzhong.duan@intel.com (mailing list archive) |
---|---|
Series | vfio: Adopt iommufd |
Ping, any comments or suggestions are appreciated.

Thanks
Zhenzhong

>-----Original Message-----
>From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Sent: Wednesday, July 12, 2023 3:25 PM
>To: qemu-devel@nongnu.org
>Cc: alex.williamson@redhat.com; clg@redhat.com; jgg@nvidia.com;
>nicolinc@nvidia.com; eric.auger@redhat.com; peterx@redhat.com;
>jasonwang@redhat.com; Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L
><yi.l.liu@intel.com>; Sun, Yi Y <yi.y.sun@intel.com>; Peng, Chao P
><chao.p.peng@intel.com>; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Subject: [RFC PATCH v4 00/24] vfio: Adopt iommufd
>
>With the introduction of iommufd, the Linux kernel provides a generic
>interface for userspace drivers to propagate their DMA mappings to kernel
>for assigned devices. This series ports the VFIO devices onto the
>/dev/iommu uapi and lets it coexist with the legacy implementation.
>
>This QEMU integration is the result of collaborative work between
>Yi Liu, Yi Sun, Nicolin Chen and Eric Auger.
>
>At QEMU level, interactions with /dev/iommu are abstracted by a new
>iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
>Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
>linked with an iommufd object. In this series, the vfio-pci device is
>granted such capability (other VFIO devices are not yet ready):
>
>It gets a new optional parameter named iommufd which allows passing
>an iommufd object:
>
>    -object iommufd,id=iommufd0
>    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
>Note the /dev/iommu and vfio cdev can be externally opened by a
>management layer. In such a case the fd is passed:
>
>    -object iommufd,id=iommufd0,fd=22
>    -device vfio-pci,iommufd=iommufd0,fd=23
>
>If the fd parameter is not passed, the fd is opened by QEMU.
>See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
>for a detailed discussion of this requirement.
>
>If no iommufd option is passed to the vfio-pci device, iommufd is not
>used and the end-user gets the behavior based on the legacy vfio iommu
>interfaces:
>
>    -device vfio-pci,host=0000:02:00.0
>
>While the legacy kernel interface is group-centric, the new iommufd
>interface is device-centric, relying on device fd and iommufd.
>
>To support both interfaces in the QEMU VFIO device we reworked the vfio
>container abstraction so that the generic VFIO code can use either
>backend.
>
>The VFIOContainer object becomes a base object derived into
>a) the legacy VFIO container and
>b) the new iommufd based container.
>
>The base object implements generic code such as code related to
>memory_listener and address space management whereas the derived
>objects implement callbacks specific to either BE, legacy and
>iommufd. Indeed each backend has its own way to set up the secure context
>and DMA management interface. The diagram below shows how it looks
>with both BEs.
>
>                    VFIO                           AddressSpace/Memory
>    +-------+  +----------+  +-----+  +-----+
>    |  pci  |  | platform |  |  ap |  | ccw |
>    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>        |           |           |        |        |   AddressSpace       |
>        |           |           |        |        +------------+---------+
>    +---V-----------V-----------V--------V----+               /
>    |           VFIOAddressSpace              | <------------+
>    |                  |                      |  MemoryListener
>    |          VFIOContainer list             |
>    +-------+----------------------------+----+
>            |                            |
>            |                            |
>    +-------V------+            +--------V----------+
>    |   iommufd    |            |    vfio legacy    |
>    |  container   |            |     container     |
>    +-------+------+            +--------+----------+
>            |                            |
>            | /dev/iommu                 | /dev/vfio/vfio
>            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
>Userspace   |                            |
>============+============================+===========================
>Kernel      |  device fd                 |
>            +---------------+            | group/container fd
>            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>            |  ATTACH_IOAS) |            | device fd
>            |               |            |
>            |       +-------V------------V-----------------+
>    iommufd |       |                vfio                  |
>(map/unmap  |       +---------+--------------------+-------+
>ioas_copy)  |                 |                    | map/unmap
>            |                 |                    |
>     +------V------+    +-----V------+      +------V--------+
>     |iommufd core |    |  device    |      |  vfio iommu   |
>     +-------------+    +------------+      +---------------+
>
>[Secure Context setup]
>- iommufd BE: uses device fd and iommufd to setup secure context
>              (bind_iommufd, attach_ioas)
>- vfio legacy BE: uses group fd and container fd to setup secure context
>                  (set_container, set_iommu)
>[Device access]
>- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>- vfio legacy BE: device fd is retrieved from group fd ioctl
>[DMA Mapping flow]
>1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>2. VFIO populates DMA map/unmap via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses container fd
>
>This series depends on Yi's kernel series:
>"[PATCH v14 00/26] Add vfio_device cdev for iommufd support"
>https://lore.kernel.org/all/20230711025928.6438-1-yi.l.liu@intel.com/
>and
>"[PATCH v9 00/10] Enhance vfio PCI hot reset for vfio cdev device"
>https://lore.kernel.org/kvm/20230711023126.5531-1-yi.l.liu@intel.com/
>
>which can be found at:
>https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v14
>
>This qemu series can be found at:
>https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_rfcv4
>
>Test done:
>- PCI devices were tested
>- platform, ccw and ap were only compile-tested
>- FD passing and hot reset with some trick.
>- device hotplug test with legacy and iommufd backends (limited tests)
>- vIOMMU test run for both legacy and iommufd backends (limited tests)
>
>
>Given some iommufd kernel limitations, the iommufd backend is
>not yet fully on par with the legacy backend w.r.t. features like:
>- p2p mappings (you will see related error traces)
>- live migration
>- etc.
>
>About TODOs in rfcv3:
>- Add DMA alias check for iommufd BE (group level)
>attach_ioas will fail for aliased device, so I think that's not a problem.
>
>- Make pci.c to be BE agnostic. Needs kernel change as well to fix the
>  VFIO_DEVICE_PCI_HOT_RESET gap
>I didn't make pci.c fully group agnostic because pci device reset is a
>device scope operation; force mapping it to a container scope callback
>isn't a good idea. Instead I added iommufd code in pci.c and fixed the
>VFIO_DEVICE_PCI_HOT_RESET gap there.
>
>- Cleanup the VFIODevice fields as it's used in both backends
>- Replace list with g_tree
>This TODO is not viable due to iterator callback depending on list element.
>
>- Add locks
>I think it's not necessary as BQL already ensures that.
>
>base-commit: 887cba855b
>
>Change log:
>v3 -> v4:
>- rebase on top of v8.0.3
>- Add one patch from Yi which is about vfio device add in kvm
>- Remove IOAS_COPY optimization and focus on functions in this patchset
>- Fix wrong name issue reported and fix suggested by Matthew
>- Fix compilation issue reported and fix suggested by Nicolin
>- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
>granularity
>- Add dev_iter_next() callback to avoid adding so many callback
>  at container scope, add VFIODevice.hwpt to support that
>- Restore all functions back to common from container whenever possible,
>  mainly migration and reset related functions
>- Add --enable/disable-iommufd config option, enabled by default in linux
>- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
>- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
>- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
>redundant code
>- Add FD passing support for vfio device backed by IOMMUFD
>- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
>- Fix FD leak in vfio_get_devicefd()
>
>v3: https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg07189.html
>
>v2 -> v3:
>- rebase on top of v7.2.0
>- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
>  VFIO backends
>- Fix use after free in error path, reported by Alister
>- Split common.c in several steps to ease the review
>
>v1 -> v2:
>- remove the first three patches of rfcv1
>- add open cdev helper suggested by Jason
>- remove the QOMification of the VFIOContainer and simply use standard ops
>(David)
>- add "-object iommufd" suggested by Alex
>
>v1: https://lore.kernel.org/qemu-devel/20220414104710.28534-1-yi.l.liu@intel.com/
>
>Thanks,
>Yi, Yi, Eric, Zhenzhong
>
>Eric Auger (9):
>  scripts/update-linux-headers: Add iommufd.h
>  vfio/common: Introduce vfio_container_add|del_section_window()
>  vfio/container: Introduce vfio_[attach/detach]_device
>  vfio/platform: Use vfio_[attach/detach]_device
>  vfio/ap: Use vfio_[attach/detach]_device
>  vfio/ccw: Use vfio_[attach/detach]_device
>  vfio/container-base: Introduce [attach/detach]_device container
>    callbacks
>  backends/iommufd: Introduce the iommufd object
>  vfio/as: Allow the selection of a given iommu backend
>
>Yi Liu (6):
>  vfio/common: Move IOMMU agnostic helpers to a separate file
>  vfio/common: Move legacy VFIO backend code into separate container.c
>  vfio/common: Rename into as.c
>  vfio: Add base container
>  util/char_dev: Add open_cdev()
>  vfio/iommufd: Implement the iommufd backend
>
>Zhenzhong Duan (9):
>  Update linux-header per VFIO device cdev v14
>  vfio/common: Extract out vfio_kvm_device_[add/del]_fd
>  vfio/common: Add a vfio device iterator
>  vfio/common: Refactor vfio_viommu_preset() to be group agnostic
>  vfio/as: Simplify vfio_viommu_preset()
>  Add iommufd configure option
>  vfio/as: Add vfio device iterator callback for iommufd
>  vfio/pci: Adapt vfio pci hot reset support with iommufd BE
>  vfio/iommufd: Make vfio cdev pre-openable by passing a file handle
>
> MAINTAINERS                           |   13 +
> backends/Kconfig                      |    4 +
> backends/iommufd.c                    |  268 +++
> backends/meson.build                  |    3 +
> backends/trace-events                 |   12 +
> hw/vfio/ap.c                          |   66 +-
> hw/vfio/as.c                          | 1555 +++++++++++++
> hw/vfio/ccw.c                         |  122 +-
> hw/vfio/common.c                      | 3078 -------------------------
> hw/vfio/container-base.c              |  146 ++
> hw/vfio/container.c                   | 1218 ++++++++++
> hw/vfio/helpers.c                     |  598 +++++
> hw/vfio/iommufd.c                     |  546 +++++
> hw/vfio/meson.build                   |    8 +-
> hw/vfio/pci.c                         |  354 ++-
> hw/vfio/platform.c                    |   43 +-
> hw/vfio/spapr.c                       |   22 +-
> hw/vfio/trace-events                  |   16 +-
> include/hw/vfio/vfio-common.h         |  109 +-
> include/hw/vfio/vfio-container-base.h |  158 ++
> include/qemu/char_dev.h               |   16 +
> include/sysemu/iommufd.h              |   47 +
> linux-headers/linux/iommufd.h         |  347 +++
> linux-headers/linux/kvm.h             |   13 +-
> linux-headers/linux/vfio.h            |  142 +-
> meson.build                           |    6 +
> meson_options.txt                     |    2 +
> qapi/qom.json                         |   18 +-
> qemu-options.hx                       |   13 +
> scripts/meson-buildoptions.sh         |    3 +
> scripts/update-linux-headers.sh       |    3 +-
> util/chardev_open.c                   |   61 +
> util/meson.build                      |    1 +
> 33 files changed, 5601 insertions(+), 3410 deletions(-)
> create mode 100644 backends/iommufd.c
> create mode 100644 hw/vfio/as.c
> delete mode 100644 hw/vfio/common.c
> create mode 100644 hw/vfio/container-base.c
> create mode 100644 hw/vfio/container.c
> create mode 100644 hw/vfio/helpers.c
> create mode 100644 hw/vfio/iommufd.c
> create mode 100644 include/hw/vfio/vfio-container-base.h
> create mode 100644 include/qemu/char_dev.h
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 linux-headers/linux/iommufd.h
> create mode 100644 util/chardev_open.c
>
>--
>2.34.1

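
For readers following the thread, the fd-passing usage described in the quoted cover letter can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the series: the helper names `build_vfio_args` and `launch_with_passed_fds`, the QEMU binary name and the cdev path argument are all hypothetical, while the `-object iommufd` and `-device vfio-pci` option strings are the ones given in the cover letter.

```python
import os
import subprocess


def build_vfio_args(host=None, iommufd_fd=None, cdev_fd=None):
    """Build the -object/-device arguments quoted in the cover letter.

    Hypothetical helper: without fds, QEMU opens /dev/iommu and the vfio
    cdev itself; with fds, both were pre-opened by a management layer.
    """
    if host is None and cdev_fd is None:
        raise ValueError("need either a host PCI address or a cdev fd")

    obj = "iommufd,id=iommufd0"
    if iommufd_fd is not None:
        obj += f",fd={iommufd_fd}"

    if cdev_fd is not None:
        dev = f"vfio-pci,iommufd=iommufd0,fd={cdev_fd}"
    else:
        dev = f"vfio-pci,host={host},iommufd=iommufd0"

    return ["-object", obj, "-device", dev]


def launch_with_passed_fds(qemu_binary, cdev_path, extra_args=()):
    """Hypothetical management-layer flow: open both nodes, hand them over.

    qemu_binary and cdev_path (a /dev/vfio/devices/vfioX node) are supplied
    by the caller; nothing here is taken from the patch series itself.
    """
    iommufd_fd = os.open("/dev/iommu", os.O_RDWR)
    try:
        cdev_fd = os.open(cdev_path, os.O_RDWR)
        try:
            args = [qemu_binary, *extra_args,
                    *build_vfio_args(iommufd_fd=iommufd_fd, cdev_fd=cdev_fd)]
            # pass_fds keeps the descriptors open in the child under the
            # same numbers, so the fd= values on the command line stay valid.
            return subprocess.Popen(args, pass_fds=(iommufd_fd, cdev_fd))
        finally:
            os.close(cdev_fd)      # the parent no longer needs its copy
    finally:
        os.close(iommufd_fd)
```

Calling `build_vfio_args(host="0000:02:00.0")` yields the first form from the cover letter, and `build_vfio_args(iommufd_fd=22, cdev_fd=23)` yields the fd-passing form.
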
On Tue, Aug 01, 2023 at 08:28:01AM +0000, Duan, Zhenzhong wrote:
> Ping, any comments or suggestions are appreciated.
Zhenzhong, I'd love to, yet haven't got the chance to go through
this series. I think that most of us are quite occupied at this
moment by the kernel side of the changes.
I plan to take a close look and run some tests next week.
Thanks
Nicolin
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [RFC PATCH v4 00/24] vfio: Adopt iommufd
>
>On Tue, Aug 01, 2023 at 08:28:01AM +0000, Duan, Zhenzhong wrote:
>
>> Ping, any comments or suggestions are appreciated.
>
>Zhenzhong, I'd love to, yet haven't got the chance to go through
>this series. I think that most of us are quite occupied at this
>moment by the kernel side of the changes.

Oh, I see.

>
>I plan to take a close look and run some tests next week.

Much appreciated, thanks Nicolin.

BRs,
Zhenzhong