Message ID | 20231121084426.1286987-1-zhenzhong.duan@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | vfio: Adopt iommufd | expand |
Hello Zhenzhong On 11/21/23 09:43, Zhenzhong Duan wrote: > Hi, > > Thanks all for giving guides and comments on previous series, this is > the remaining part of the iommufd support. > > Besides suggested changes in v6, I'd like to highlight two changes > for final review: > 1. Instantiate can_be_deleted callback to fix race where iommufd object > can be deleted before vfio device > 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5, > remove is_ioas check which indeed looks heavy just for tracepoint. > In fact we can get corresponding info by looking over trace context. > > PATCH 1: Introduce iommufd object > PATCH 2-9: add IOMMUFD container and cdev support > PATCH 10-17: fd passing for cdev and linking to IOMMUFD > PATCH 18: make VFIOContainerBase parameter const > PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 > PATCH 22-26: vfio device init code cleanup > PATCH 27: add iommufd doc > > > We have done wide test with different combinations, e.g: > - PCI device were tested > - FD passing and hot reset with some trick. > - device hotplug test with legacy and iommufd backends > - with or without vIOMMU for legacy and iommufd backends > - divices linked to different iommufds > - VFIO migration with a E800 net card(no dirty sync support) passthrough > - platform, ccw and ap were only compile-tested due to environment limit > - test mdev pass through with mtty and mix with real device and different BE > - test iommufd object hotplug/unplug and mix with vfio device plug/unplug > > Given some iommufd kernel limitations, the iommufd backend is > not yet fully on par with the legacy backend w.r.t. features like: > - p2p mappings (you will see related error traces) > - dirty page sync > - and etc. > > > qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7 > Based on vfio-next, commit id: c487fb8a50 The series is pushed on top of vfio-next in the vfio-8.2 tree : https://github.com/legoater/qemu/commits/vfio-8.2 with a little extra to deal with a PPC build failure. Thanks, C.
On Tue, Nov 21, 2023 at 04:43:59PM +0800, Zhenzhong Duan wrote: > qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7 > Based on vfio-next, commit id: c487fb8a50 I've tested with an aarch64-softmmu build using both legacy VFIO passthrough and IOMMUFD+cdev, although this might be similar to what Eric tested. Also, tried rebasing our nesting changes on top and ran some 2-stage translation sanity using this branch: https://github.com/nicolinc/qemu/tree/wip/iommufd_nesting-11212023-cdev-v7 (Note that the nesting branch is WIP with no stability guarantee) I'll do more tests with vSVA cases in the next days, yet FWIW: Tested-by: Nicolin Chen <nicolinc@nvidia.com>
>-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Wednesday, November 22, 2023 1:23 AM >Subject: Re: [PATCH v7 00/27] vfio: Adopt iommufd > >Hello Zhenzhong > >On 11/21/23 09:43, Zhenzhong Duan wrote: >> Hi, >> >> Thanks all for giving guides and comments on previous series, this is >> the remaining part of the iommufd support. >> >> Besides suggested changes in v6, I'd like to highlight two changes >> for final review: >> 1. Instantiate can_be_deleted callback to fix race where iommufd object >> can be deleted before vfio device >> 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5, >> remove is_ioas check which indeed looks heavy just for tracepoint. >> In fact we can get corresponding info by looking over trace context. >> >> PATCH 1: Introduce iommufd object >> PATCH 2-9: add IOMMUFD container and cdev support >> PATCH 10-17: fd passing for cdev and linking to IOMMUFD >> PATCH 18: make VFIOContainerBase parameter const >> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 >> PATCH 22-26: vfio device init code cleanup >> PATCH 27: add iommufd doc >> >> >> We have done wide test with different combinations, e.g: >> - PCI device were tested >> - FD passing and hot reset with some trick. >> - device hotplug test with legacy and iommufd backends >> - with or without vIOMMU for legacy and iommufd backends >> - divices linked to different iommufds >> - VFIO migration with a E800 net card(no dirty sync support) passthrough >> - platform, ccw and ap were only compile-tested due to environment limit >> - test mdev pass through with mtty and mix with real device and different BE >> - test iommufd object hotplug/unplug and mix with vfio device plug/unplug >> >> Given some iommufd kernel limitations, the iommufd backend is >> not yet fully on par with the legacy backend w.r.t. features like: >> - p2p mappings (you will see related error traces) >> - dirty page sync >> - and etc. >> >> >> qemu code: >https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7 >> Based on vfio-next, commit id: c487fb8a50 > >The series is pushed on top of vfio-next in the vfio-8.2 tree : > > https://github.com/legoater/qemu/commits/vfio-8.2 > >with a little extra to deal with a PPC build failure. Thanks Cédric. That's strange I didn't see this failure on my env which has CONFIG_VFIO_PCI enabled by default for PPC. BRs. Zhenzhong
>-----Original Message----- >From: Nicolin Chen <nicolinc@nvidia.com> >Sent: Wednesday, November 22, 2023 6:56 AM >Subject: Re: [PATCH v7 00/27] vfio: Adopt iommufd > >On Tue, Nov 21, 2023 at 04:43:59PM +0800, Zhenzhong Duan wrote: > >> qemu code: >https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v7 >> Based on vfio-next, commit id: c487fb8a50 > >I've tested with an aarch64-softmmu build using both legacy VFIO >passthrough and IOMMUFD+cdev, although this might be similar to >what Eric tested. > >Also, tried rebasing our nesting changes on top and ran some >2-stage translation sanity using this branch: >https://github.com/nicolinc/qemu/tree/wip/iommufd_nesting-11212023-cdev-v7 >(Note that the nesting branch is WIP with no stability guarantee) > >I'll do more tests with vSVA cases in the next days, yet FWIW: > >Tested-by: Nicolin Chen <nicolinc@nvidia.com> Thanks Nicolin. BRs. Zhenzhong
>> The series is pushed on top of vfio-next in the vfio-8.2 tree : >> >> https://github.com/legoater/qemu/commits/vfio-8.2 >> >> with a little extra to deal with a PPC build failure. > > Thanks Cédric. That's strange I didn't see this failure on my env > which has CONFIG_VFIO_PCI enabled by default for PPC. The compile issue shows with --without-default-devices. VFIO is always selected (see ppc/Kconfig) but VFIO_PCI is not when --without-default-devices is used. Hence the compile failure because of the common vfio-pci routines exported in pci.c. Ideally, we should use an 'imply VFIO' in ppc/Kconfig because VFIO is not a required subsystem for the pseries machine. If that was the case, we wouldn't compile the VFIO EEH hooks defined in hw/ppc/spapr_pci_vfio.c : bool spapr_phb_eeh_available(SpaprPhbState *sphb); int spapr_phb_vfio_eeh_set_option(SpaprPhbState *sphb, unsigned int addr, int option); int spapr_phb_vfio_eeh_get_state(SpaprPhbState *sphb, int *state); int spapr_phb_vfio_eeh_reset(SpaprPhbState *sphb, int option); int spapr_phb_vfio_eeh_configure(SpaprPhbState *sphb); void spapr_phb_vfio_reset(DeviceState *qdev); It is not that simple to fix. The simpler approach is to force compile of VFIO PCI in ppc/Kconfig with a 'select VFIO_PCI'. we should improve that. Thanks, C.
>-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Wednesday, November 22, 2023 4:07 PM >Subject: Re: [PATCH v7 00/27] vfio: Adopt iommufd > > >>> The series is pushed on top of vfio-next in the vfio-8.2 tree : >>> >>> https://github.com/legoater/qemu/commits/vfio-8.2 >>> >>> with a little extra to deal with a PPC build failure. >> >> Thanks Cédric. That's strange I didn't see this failure on my env >> which has CONFIG_VFIO_PCI enabled by default for PPC. > > >The compile issue shows with --without-default-devices. > >VFIO is always selected (see ppc/Kconfig) but VFIO_PCI is not when >--without-default-devices is used. Hence the compile failure because >of the common vfio-pci routines exported in pci.c. Clear, thanks > >Ideally, we should use an 'imply VFIO' in ppc/Kconfig because VFIO >is not a required subsystem for the pseries machine. If that was >the case, we wouldn't compile the VFIO EEH hooks defined in >hw/ppc/spapr_pci_vfio.c : > > bool spapr_phb_eeh_available(SpaprPhbState *sphb); > int spapr_phb_vfio_eeh_set_option(SpaprPhbState *sphb, > unsigned int addr, int option); > int spapr_phb_vfio_eeh_get_state(SpaprPhbState *sphb, int *state); > int spapr_phb_vfio_eeh_reset(SpaprPhbState *sphb, int option); > int spapr_phb_vfio_eeh_configure(SpaprPhbState *sphb); > void spapr_phb_vfio_reset(DeviceState *qdev); Indeed, I reproduced same after some try out. Maybe need a stub file for those functions. > >It is not that simple to fix. The simpler approach is to force compile >of VFIO PCI in ppc/Kconfig with a 'select VFIO_PCI'. we should improve >that. Yes, not a blocking issue after your fix. I can have a try when I'm idle. Thanks Zhenzhong
On 21/11/2023 08:43, Zhenzhong Duan wrote: > Hi, > > Thanks all for giving guides and comments on previous series, this is > the remaining part of the iommufd support. > > Besides suggested changes in v6, I'd like to highlight two changes > for final review: > 1. Instantiate can_be_deleted callback to fix race where iommufd object > can be deleted before vfio device > 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5, > remove is_ioas check which indeed looks heavy just for tracepoint. > In fact we can get corresponding info by looking over trace context. > > PATCH 1: Introduce iommufd object > PATCH 2-9: add IOMMUFD container and cdev support > PATCH 10-17: fd passing for cdev and linking to IOMMUFD > PATCH 18: make VFIOContainerBase parameter const > PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 > PATCH 22-26: vfio device init code cleanup > PATCH 27: add iommufd doc > > > We have done wide test with different combinations, e.g: > - PCI device were tested > - FD passing and hot reset with some trick. > - device hotplug test with legacy and iommufd backends > - with or without vIOMMU for legacy and iommufd backends > - divices linked to different iommufds > - VFIO migration with a E800 net card(no dirty sync support) passthrough > - platform, ccw and ap were only compile-tested due to environment limit > - test mdev pass through with mtty and mix with real device and different BE > - test iommufd object hotplug/unplug and mix with vfio device plug/unplug > > Given some iommufd kernel limitations, the iommufd backend is > not yet fully on par with the legacy backend w.r.t. features like: > - p2p mappings (you will see related error traces) > - dirty page sync Just as a quick update: I intend to follow-up with this (alongside my other debt) when I am back from vacation ~2weeks from now. The one thing needed before dirty tracking enabling is a userspace autodomains that's equivalent to the kernel. I don't know what to do about mdevs tbh, considering there's no hw domain going on it's just telling iommufd that 'hey I am accessing these pages' (IIUC I could be missing something). Right now it just falls back to IOAS attach instead of hwpt-id attach if it all fails; Here's a partial snip below that is trying to match kernel iommufd_device_auto_get_domain(). It's not that far from Zhenzhong v5 implementation, except that we handle the -EINVAL from the attach to allocate a new hwpt for said device, and if that fails then just bail out like the kernel. static int iommufd_cdev_attach_container(VFIODevice *vbasedev, VFIOIOMMUFDContainer *container, Error **errp) { - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp); + uint32_t pt_id; + + if (!iommufd_cdev_autodomains_get(vbasedev, container, &pt_id, errp)) { + return 0; + } + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err); + if (ret) { + /* -EINVAL means the domain is incompatible with the device. */ + if (ret == -EINVAL) { + continue; + } + return ret; + } else { + vbasedev->hwpt = hwpt; + *hwpt_id = hwpt->hwpt_id; + return 0; + } + } + + ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, + container->ioas_id, 0, 0, 0, + NULL, hwpt_id); + if (ret) { + error_setg_errno(errp, errno, "error alloc shadow hwpt"); + return ret; + } + + hwpt = g_malloc0(sizeof(*hwpt)); + hwpt->hwpt_id = *hwpt_id; + QLIST_INIT(&hwpt->device_list); + + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err); + if (ret) { + iommufd_backend_free_id(vbasedev->iommufd, hwpt->hwpt_id); + g_free(hwpt); + return ret; + } + + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); + return 0; +} + static int iommufd_cdev_attach_container(VFIODevice *vbasedev, VFIOIOMMUFDContainer *container, Error **errp) { - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp); + uint32_t pt_id; + + if (!iommufd_cdev_autodomains_get(vbasedev, container, &pt_id, errp)) { + return 0; + } + + pt_id = container->ioas_id; + return iommufd_cdev_attach_ioas_hwpt(vbasedev, pt_id, errp); }
On 11/21/23 09:43, Zhenzhong Duan wrote: > Hi, > > Thanks all for giving guides and comments on previous series, this is > the remaining part of the iommufd support. > > Besides suggested changes in v6, I'd like to highlight two changes > for final review: > 1. Instantiate can_be_deleted callback to fix race where iommufd object > can be deleted before vfio device > 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5, > remove is_ioas check which indeed looks heavy just for tracepoint. > In fact we can get corresponding info by looking over trace context. > > PATCH 1: Introduce iommufd object > PATCH 2-9: add IOMMUFD container and cdev support > PATCH 10-17: fd passing for cdev and linking to IOMMUFD > PATCH 18: make VFIOContainerBase parameter const > PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 > PATCH 22-26: vfio device init code cleanup > PATCH 27: add iommufd doc Applied to vfio-next. Thanks, C.