mbox series

[v7,00/27] vfio: Adopt iommufd

Message ID 20231121084426.1286987-1-zhenzhong.duan@intel.com (mailing list archive)
Headers show
Series vfio: Adopt iommufd | expand

Message

Duan, Zhenzhong Nov. 21, 2023, 8:43 a.m. UTC
Hi,

Thanks all for giving guides and comments on previous series, this is
the remaining part of the iommufd support.

Besides suggested changes in v6, I'd like to highlight two changes
for final review:
1. Instantiate can_be_deleted callback to fix race where iommufd object
   can be deleted before vfio device
2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5,
   remove is_ioas check which indeed looks heavy just for tracepoint.
   In fact we can get corresponding info by looking over trace context.

PATCH 1: Introduce iommufd object
PATCH 2-9: add IOMMUFD container and cdev support
PATCH 10-17: fd passing for cdev and linking to IOMMUFD
PATCH 18: make VFIOContainerBase parameter const
PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
PATCH 22-26: vfio device init code cleanup
PATCH 27: add iommufd doc


We have done wide test with different combinations, e.g:
- PCI device were tested
- FD passing and hot reset with some trick.
- device hotplug test with legacy and iommufd backends
- with or without vIOMMU for legacy and iommufd backends
- divices linked to different iommufds
- VFIO migration with a E800 net card(no dirty sync support) passthrough
- platform, ccw and ap were only compile-tested due to environment limit
- test mdev pass through with mtty and mix with real device and different BE
- test iommufd object hotplug/unplug and mix with vfio device plug/unplug

Given some iommufd kernel limitations, the iommufd backend is
not yet fully on par with the legacy backend w.r.t. features like:
- p2p mappings (you will see related error traces)
- dirty page sync
- and etc.


qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7
Based on vfio-next, commit id: c487fb8a50

--------------------------------------------------------------------------

Below are some background and graph about the design:

With the introduction of iommufd, the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to kernel
for assigned devices. This series does the porting of the VFIO devices
onto the /dev/iommu uapi and let it coexist with the legacy implementation.

At QEMU level, interactions with the /dev/iommu are abstracted by a new
iommufd object (compiled in with the CONFIG_IOMMUFD option).

Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
linked with an iommufd object. In this series, the vfio-pci device is
granted with such capability (other VFIO devices are not yet ready):

It gets a new optional parameter named iommufd which allows to pass
an iommufd object:

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note the /dev/iommu and vfio cdev can be externally opened by a
management layer. In such a case the fd is passed:

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the fd parameter is not passed, the fd is opened by QEMU.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
for detailed discuss on this requirement.

If no iommufd option is passed to the vfio-pci device, iommufd is not
used and the end-user gets the behavior based on the legacy vfio iommu
interfaces:

    -device vfio-pci,host=0000:02:00.0

While the legacy kernel interface is group-centric, the new iommufd
interface is device-centric, relying on device fd and iommufd.

To support both interfaces in the QEMU VFIO device we reworked the vfio
container abstraction so that the generic VFIO code can use either
backend.

The VFIOContainer object becomes a base object derived into
a) the legacy VFIO container and
b) the new iommufd based container.

The base object implements generic code such as code related to
memory_listener and address space management whereas the derived
objects implement callbacks specific to either BE, legacy and
iommufd. Indeed each backend has its own way to setup secure context
and dma management interface. The below diagram shows how it looks
like with both BEs.

                    VFIO                           AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
        |           |           |        |        |   AddressSpace       |
        |           |           |        |        +------------+---------+
    +---V-----------V-----------V--------V----+               /
    |           VFIOAddressSpace              | <------------+
    |                  |                      |  MemoryListener
    |          VFIOContainer list             |
    +-------+----------------------------+----+
            |                            |
            |                            |
    +-------V------+            +--------V----------+
    |   iommufd    |            |    vfio legacy    |
    |  container   |            |     container     |
    +-------+------+            +--------+----------+
            |                            |
            | /dev/iommu                 | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
Userspace   |                            |
============+============================+===========================
Kernel      |  device fd                 |
            +---------------+            | group/container fd
            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |            | device fd
            |               |            |
            |       +-------V------------V-----------------+
    iommufd |       |                vfio                  |
(map/unmap  |       +---------+--------------------+-------+
ioas_copy)  |                 |                    | map/unmap
            |                 |                    |
     +------V------+    +-----V------+      +------V--------+
     | iommfd core |    |  device    |      |  vfio iommu   |
     +-------------+    +------------+      +---------------+

[Secure Context setup]
- iommufd BE: uses device fd and iommufd to setup secure context
              (bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to setup secure context
                  (set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
2. VFIO populates DMA map/unmap via the container BEs
   *) iommufd BE: uses iommufd
   *) vfio legacy BE: uses container fd


Changelog:
v7:
- Add iommufd doc (Cedric)
- Abstract iommufd_cdev_get_realized_vpdev in hot reset code (Eric)
- Add docs for vfio_device_get_name and return bool (Philippe)
- Remove is_ioas parameter (Nicolin)
- Include the general cleanup patchset in this series
- Instantiate can_be_deleted callback
- qom.json (Since: 8.2) -> (Since: 9.0)
- Add new R-b, T-b from Eric

v6:
- simplify CONFIG_IOMMUFD checking code further (Cédric)
- check iommufd_cdev_kvm_device_add return value (Cédric)
- dirrectory -> directory (Cédric)
- propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric)
- introduce a helper vfio_device_set_fd (Cédric)
- Move #include "sysemu/iommufd.h" in platform.c (Cédric)
- simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas
- Remove CONFIG_IOMMUFD check in meson.build and treat iommufd.c general
- Add Matthew and Cédric's RB

v5:
- Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric)
- Add (uintptr_t) to info->allowed_iovas (Cédric)
- Switch to IOAS attach/detach and hide hwpt (Jason)
- move chardev_open.[h|c] under the IOMMUFD entry (Cédric)
- Move vfio_legacy_pci_hot_reset into container.c (Cédric)
- Add missed pgsizes initialization in vfio_get_info_iova_range
- split linking iommufd patch into three to be cleaner
- Fix comments on PCI BAR unmap

v4:
- add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
- add doc for default case without fd (Markus)
- Fix build issue reported by Markus and Cédric
- Simply use SPDX identifier in new file (Cédric)
- make vfio_container_init/destroy helper a seperate patch (Cédric)
- make vrdl_list movement a seperate patch (Cédric)
- add const for some callback parameters (Cédric)
- add g_assert in VFIOIOMMUOps callback (Cédric)
- introduce pci_hot_reset callback (Cédric)
- remove VFIOIOMMUSpaprOps (Cédric)
- initialize g_autofree to NULL (Cédric)
- adjust func name prefix and trace event in iommufd.c (Cédric)
- add RB

v3:
- Rename base container as VFIOContainerBase and legacy container as container (Cédric)
- Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
- Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
- Introduce vfio_iommu_spapr_ops (Cédric)
- Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
- patch19 and patch21 are splitted to two parts to facilitate review

v2:
- patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
- add fd passing to platform/ap/ccw vfio device
- add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
- rename char_dev.h to chardev_open.h for same naming scheme per Daniel
- add full copyright per Daniel and Jason


Note changelog below are from full IOMMUFD series:

v1:
- Alloc hwpt instead of using auto hwpt
- elaborate iommufd code per Nicolin
- consolidate two patches and drop as.c
- typo error fix and function rename

rfcv4:
- rebase on top of v8.0.3
- Add one patch from Yi which is about vfio device add in kvm
- Remove IOAS_COPY optimization and focus on functions in this patchset
- Fix wrong name issue reported and fix suggested by Matthew
- Fix compilation issue reported and fix sugggsted by Nicolin
- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
granularity
- Add dev_iter_next() callback to avoid adding so many callback
  at container scope, add VFIODevice.hwpt to support that
- Restore all functions back to common from container whenever possible,
  mainly migration and reset related functions
- Add --enable/disable-iommufd config option, enabled by default in linux
- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
redundant code
- Add FD passing support for vfio device backed by IOMMUFD
- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
- Fix FD leak in vfio_get_devicefd()

rfcv3:
- rebase on top of v7.2.0
- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
  VFIO backends
- Fix use after free in error path, reported by Alister
- Split common.c in several steps to ease the review

rfcv2:
- remove the first three patches of rfcv1
- add open cdev helper suggested by Jason
- remove the QOMification of the VFIOContainer and simply use standard ops
(David)
- add "-object iommufd" suggested by Alex

Thanks
Zhenzhong


Cédric Le Goater (3):
  hw/arm: Activate IOMMUFD for virt machines
  kconfig: Activate IOMMUFD for s390x machines
  hw/i386: Activate IOMMUFD for q35 machines

Eric Auger (2):
  backends/iommufd: Introduce the iommufd object
  vfio/pci: Allow the selection of a given iommu backend

Yi Liu (2):
  util/char_dev: Add open_cdev()
  vfio/iommufd: Implement the iommufd backend

Zhenzhong Duan (20):
  vfio/common: return early if space isn't empty
  vfio/iommufd: Relax assert check for iommufd backend
  vfio/iommufd: Add support for iova_ranges and pgsizes
  vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
  vfio/pci: Introduce a vfio pci hot reset interface
  vfio/iommufd: Enable pci hot reset through iommufd cdev interface
  vfio/pci: Make vfio cdev pre-openable by passing a file handle
  vfio/platform: Allow the selection of a given iommu backend
  vfio/platform: Make vfio cdev pre-openable by passing a file handle
  vfio/ap: Allow the selection of a given iommu backend
  vfio/ap: Make vfio cdev pre-openable by passing a file handle
  vfio/ccw: Allow the selection of a given iommu backend
  vfio/ccw: Make vfio cdev pre-openable by passing a file handle
  vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
    callbacks
  vfio/pci: Move VFIODevice initializations in vfio_instance_init
  vfio/platform: Move VFIODevice initializations in
    vfio_platform_instance_init
  vfio/ap: Move VFIODevice initializations in vfio_ap_instance_init
  vfio/ccw: Move VFIODevice initializations in vfio_ccw_instance_init
  vfio: Introduce a helper function to initialize VFIODevice
  docs/devel: Add VFIO iommufd backend documentation

 MAINTAINERS                           |  11 +
 docs/devel/index-internals.rst        |   1 +
 docs/devel/vfio-iommufd.rst           | 166 +++++++
 qapi/qom.json                         |  19 +
 hw/vfio/pci.h                         |   6 +
 include/hw/vfio/vfio-common.h         |  29 +-
 include/hw/vfio/vfio-container-base.h |  15 +-
 include/qemu/chardev_open.h           |  16 +
 include/sysemu/iommufd.h              |  38 ++
 backends/iommufd.c                    | 245 ++++++++++
 hw/vfio/ap.c                          |  47 +-
 hw/vfio/ccw.c                         |  53 ++-
 hw/vfio/common.c                      |  24 +-
 hw/vfio/container-base.c              |   6 +-
 hw/vfio/container.c                   | 208 ++++++++-
 hw/vfio/helpers.c                     |  54 +++
 hw/vfio/iommufd.c                     | 630 ++++++++++++++++++++++++++
 hw/vfio/pci.c                         | 218 ++-------
 hw/vfio/platform.c                    |  44 +-
 util/chardev_open.c                   |  81 ++++
 backends/Kconfig                      |   4 +
 backends/meson.build                  |   1 +
 backends/trace-events                 |  10 +
 hw/arm/Kconfig                        |   1 +
 hw/i386/Kconfig                       |   1 +
 hw/s390x/Kconfig                      |   1 +
 hw/vfio/meson.build                   |   3 +
 hw/vfio/trace-events                  |  11 +
 qemu-options.hx                       |  12 +
 util/meson.build                      |   1 +
 30 files changed, 1706 insertions(+), 250 deletions(-)
 create mode 100644 docs/devel/vfio-iommufd.rst
 create mode 100644 include/qemu/chardev_open.h
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd.c
 create mode 100644 hw/vfio/iommufd.c
 create mode 100644 util/chardev_open.c

Comments

Cédric Le Goater Nov. 21, 2023, 5:22 p.m. UTC | #1
Hello Zhenzhong

On 11/21/23 09:43, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for giving guides and comments on previous series, this is
> the remaining part of the iommufd support.
> 
> Besides suggested changes in v6, I'd like to highlight two changes
> for final review:
> 1. Instantiate can_be_deleted callback to fix race where iommufd object
>     can be deleted before vfio device
> 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5,
>     remove is_ioas check which indeed looks heavy just for tracepoint.
>     In fact we can get corresponding info by looking over trace context.
> 
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
> PATCH 22-26: vfio device init code cleanup
> PATCH 27: add iommufd doc
> 
> 
> We have done wide test with different combinations, e.g:
> - PCI device were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - divices linked to different iommufds
> - VFIO migration with a E800 net card(no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limit
> - test mdev pass through with mtty and mix with real device and different BE
> - test iommufd object hotplug/unplug and mix with vfio device plug/unplug
> 
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - and etc.
> 
> 
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7
> Based on vfio-next, commit id: c487fb8a50

The series is pushed on top of vfio-next in the vfio-8.2 tree :

   https://github.com/legoater/qemu/commits/vfio-8.2

with a little extra to deal with a PPC build failure.

Thanks,

C.
Nicolin Chen Nov. 21, 2023, 10:56 p.m. UTC | #2
On Tue, Nov 21, 2023 at 04:43:59PM +0800, Zhenzhong Duan wrote:
 
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7
> Based on vfio-next, commit id: c487fb8a50

I've tested with an aarch64-softmmu build using both legacy VFIO
passthrough and IOMMUFD+cdev, although this might be similar to
what Eric tested.

Also, tried rebasing our nesting changes on top and ran some
2-stage translation sanity using this branch:
https://github.com/nicolinc/qemu/tree/wip/iommufd_nesting-11212023-cdev-v7
(Note that the nesting branch is WIP with no stability guarantee)

I'll do more tests with vSVA cases in the next days, yet FWIW:

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Duan, Zhenzhong Nov. 22, 2023, 3:21 a.m. UTC | #3
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, November 22, 2023 1:23 AM
>Subject: Re: [PATCH v7 00/27] vfio: Adopt iommufd
>
>Hello Zhenzhong
>
>On 11/21/23 09:43, Zhenzhong Duan wrote:
>> Hi,
>>
>> Thanks all for giving guides and comments on previous series, this is
>> the remaining part of the iommufd support.
>>
>> Besides suggested changes in v6, I'd like to highlight two changes
>> for final review:
>> 1. Instantiate can_be_deleted callback to fix race where iommufd object
>>     can be deleted before vfio device
>> 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5,
>>     remove is_ioas check which indeed looks heavy just for tracepoint.
>>     In fact we can get corresponding info by looking over trace context.
>>
>> PATCH 1: Introduce iommufd object
>> PATCH 2-9: add IOMMUFD container and cdev support
>> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
>> PATCH 18: make VFIOContainerBase parameter const
>> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>> PATCH 22-26: vfio device init code cleanup
>> PATCH 27: add iommufd doc
>>
>>
>> We have done wide test with different combinations, e.g:
>> - PCI device were tested
>> - FD passing and hot reset with some trick.
>> - device hotplug test with legacy and iommufd backends
>> - with or without vIOMMU for legacy and iommufd backends
>> - divices linked to different iommufds
>> - VFIO migration with a E800 net card(no dirty sync support) passthrough
>> - platform, ccw and ap were only compile-tested due to environment limit
>> - test mdev pass through with mtty and mix with real device and different BE
>> - test iommufd object hotplug/unplug and mix with vfio device plug/unplug
>>
>> Given some iommufd kernel limitations, the iommufd backend is
>> not yet fully on par with the legacy backend w.r.t. features like:
>> - p2p mappings (you will see related error traces)
>> - dirty page sync
>> - and etc.
>>
>>
>> qemu code:
>https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7
>> Based on vfio-next, commit id: c487fb8a50
>
>The series is pushed on top of vfio-next in the vfio-8.2 tree :
>
>   https://github.com/legoater/qemu/commits/vfio-8.2
>
>with a little extra to deal with a PPC build failure.

Thanks Cédric. That's strange I didn't see this failure on my env
which has CONFIG_VFIO_PCI enabled by default for PPC.

BRs.
Zhenzhong
Duan, Zhenzhong Nov. 22, 2023, 3:32 a.m. UTC | #4
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Sent: Wednesday, November 22, 2023 6:56 AM
>Subject: Re: [PATCH v7 00/27] vfio: Adopt iommufd
>
>On Tue, Nov 21, 2023 at 04:43:59PM +0800, Zhenzhong Duan wrote:
>
>> qemu code:
>https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7
>> Based on vfio-next, commit id: c487fb8a50
>
>I've tested with an aarch64-softmmu build using both legacy VFIO
>passthrough and IOMMUFD+cdev, although this might be similar to
>what Eric tested.
>
>Also, tried rebasing our nesting changes on top and ran some
>2-stage translation sanity using this branch:
>https://github.com/nicolinc/qemu/tree/wip/iommufd_nesting-11212023-cdev-v7
>(Note that the nesting branch is WIP with no stability guarantee)
>
>I'll do more tests with vSVA cases in the next days, yet FWIW:
>
>Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Thanks Nicolin.

BRs.
Zhenzhong
Cédric Le Goater Nov. 22, 2023, 8:06 a.m. UTC | #5
>> The series is pushed on top of vfio-next in the vfio-8.2 tree :
>>
>>    https://github.com/legoater/qemu/commits/vfio-8.2
>>
>> with a little extra to deal with a PPC build failure.
> 
> Thanks Cédric. That's strange I didn't see this failure on my env
> which has CONFIG_VFIO_PCI enabled by default for PPC.


The compile issue shows with --without-default-devices.

VFIO is always selected (see ppc/Kconfig) but VFIO_PCI is not when
--without-default-devices is used. Hence the compile failure because
of the common vfio-pci routines exported in pci.c.

Ideally, we should use an 'imply VFIO' in ppc/Kconfig because VFIO
is not a required subsystem for the pseries machine. If that was
the case, we wouldn't compile the VFIO EEH hooks defined in
hw/ppc/spapr_pci_vfio.c :

   bool spapr_phb_eeh_available(SpaprPhbState *sphb);
   int spapr_phb_vfio_eeh_set_option(SpaprPhbState *sphb,
                                     unsigned int addr, int option);
   int spapr_phb_vfio_eeh_get_state(SpaprPhbState *sphb, int *state);
   int spapr_phb_vfio_eeh_reset(SpaprPhbState *sphb, int option);
   int spapr_phb_vfio_eeh_configure(SpaprPhbState *sphb);
   void spapr_phb_vfio_reset(DeviceState *qdev);

It is not that simple to fix. The simpler approach is to force compile
of VFIO PCI in ppc/Kconfig with a 'select VFIO_PCI'. we should improve
that.

Thanks,

C.
Duan, Zhenzhong Nov. 22, 2023, 11:49 a.m. UTC | #6
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, November 22, 2023 4:07 PM
>Subject: Re: [PATCH v7 00/27] vfio: Adopt iommufd
>
>
>>> The series is pushed on top of vfio-next in the vfio-8.2 tree :
>>>
>>>    https://github.com/legoater/qemu/commits/vfio-8.2
>>>
>>> with a little extra to deal with a PPC build failure.
>>
>> Thanks Cédric. That's strange I didn't see this failure on my env
>> which has CONFIG_VFIO_PCI enabled by default for PPC.
>
>
>The compile issue shows with --without-default-devices.
>
>VFIO is always selected (see ppc/Kconfig) but VFIO_PCI is not when
>--without-default-devices is used. Hence the compile failure because
>of the common vfio-pci routines exported in pci.c.

Clear, thanks

>
>Ideally, we should use an 'imply VFIO' in ppc/Kconfig because VFIO
>is not a required subsystem for the pseries machine. If that was
>the case, we wouldn't compile the VFIO EEH hooks defined in
>hw/ppc/spapr_pci_vfio.c :
>
>   bool spapr_phb_eeh_available(SpaprPhbState *sphb);
>   int spapr_phb_vfio_eeh_set_option(SpaprPhbState *sphb,
>                                     unsigned int addr, int option);
>   int spapr_phb_vfio_eeh_get_state(SpaprPhbState *sphb, int *state);
>   int spapr_phb_vfio_eeh_reset(SpaprPhbState *sphb, int option);
>   int spapr_phb_vfio_eeh_configure(SpaprPhbState *sphb);
>   void spapr_phb_vfio_reset(DeviceState *qdev);

Indeed, I reproduced same after some try out.
Maybe need a stub file for those functions.

>
>It is not that simple to fix. The simpler approach is to force compile
>of VFIO PCI in ppc/Kconfig with a 'select VFIO_PCI'. we should improve
>that.

Yes, not a blocking issue after your fix. I can have a try when I'm idle.

Thanks
Zhenzhong
Joao Martins Nov. 22, 2023, 1:48 p.m. UTC | #7
On 21/11/2023 08:43, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for giving guides and comments on previous series, this is
> the remaining part of the iommufd support.
> 
> Besides suggested changes in v6, I'd like to highlight two changes
> for final review:
> 1. Instantiate can_be_deleted callback to fix race where iommufd object
>    can be deleted before vfio device
> 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5,
>    remove is_ioas check which indeed looks heavy just for tracepoint.
>    In fact we can get corresponding info by looking over trace context.
> 
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
> PATCH 22-26: vfio device init code cleanup
> PATCH 27: add iommufd doc
> 
> 
> We have done wide test with different combinations, e.g:
> - PCI device were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - divices linked to different iommufds
> - VFIO migration with a E800 net card(no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limit
> - test mdev pass through with mtty and mix with real device and different BE
> - test iommufd object hotplug/unplug and mix with vfio device plug/unplug
> 
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync

Just as a quick update: I intend to follow-up with this (alongside my other
debt) when I am back from vacation ~2weeks from now. The one thing needed before
dirty tracking enabling is a userspace autodomains that's equivalent to the kernel.

I don't know what to do about mdevs tbh, considering there's no hw domain going
on it's just telling iommufd that 'hey I am accessing these pages' (IIUC I could
be missing something). Right now it just falls back to IOAS attach instead of
hwpt-id attach if it all fails;

Here's a partial snip below that is trying to match kernel
iommufd_device_auto_get_domain(). It's not that far from Zhenzhong v5
implementation, except that we handle the -EINVAL from the attach to allocate a
new hwpt for said device, and if that fails then just bail out like the kernel.

 static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
                                          VFIOIOMMUFDContainer *container,
                                          Error **errp)
 {
-    return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+    uint32_t pt_id;
+
+    if (!iommufd_cdev_autodomains_get(vbasedev, container, &pt_id, errp)) {
+        return 0;
+    }
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
+        if (ret) {
+            /* -EINVAL means the domain is incompatible with the device. */
+            if (ret == -EINVAL) {
+                continue;
+            }
+            return ret;
+        } else {
+            vbasedev->hwpt = hwpt;
+            *hwpt_id = hwpt->hwpt_id;
+            return 0;
+        }
+    }
+
+    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
+                                     container->ioas_id, 0, 0, 0,
+                                     NULL, hwpt_id);
+    if (ret) {
+        error_setg_errno(errp, errno, "error alloc shadow hwpt");
+        return ret;
+    }
+
+    hwpt = g_malloc0(sizeof(*hwpt));
+    hwpt->hwpt_id = *hwpt_id;
+    QLIST_INIT(&hwpt->device_list);
+
+    ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err);
+    if (ret) {
+        iommufd_backend_free_id(vbasedev->iommufd, hwpt->hwpt_id);
+        g_free(hwpt);
+        return ret;
+    }
+
+    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+    return 0;
+}
+
 static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
                                          VFIOIOMMUFDContainer *container,
                                          Error **errp)
 {
-    return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+    uint32_t pt_id;
+
+    if (!iommufd_cdev_autodomains_get(vbasedev, container, &pt_id, errp)) {
+        return 0;
+    }
+
+    pt_id = container->ioas_id;
+    return iommufd_cdev_attach_ioas_hwpt(vbasedev, pt_id, errp);
 }
Cédric Le Goater Nov. 28, 2023, 5:10 p.m. UTC | #8
On 11/21/23 09:43, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for giving guides and comments on previous series, this is
> the remaining part of the iommufd support.
> 
> Besides suggested changes in v6, I'd like to highlight two changes
> for final review:
> 1. Instantiate can_be_deleted callback to fix race where iommufd object
>     can be deleted before vfio device
> 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5,
>     remove is_ioas check which indeed looks heavy just for tracepoint.
>     In fact we can get corresponding info by looking over trace context.
> 
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
> PATCH 22-26: vfio device init code cleanup
> PATCH 27: add iommufd doc



Applied to vfio-next.

Thanks,

C.