[v5,00/19] Add vfio_device cdev for iommufd support

Message ID 20230227111135.61728-1-yi.l.liu@intel.com (mailing list archive)

Message

Yi Liu Feb. 27, 2023, 11:11 a.m. UTC
Existing VFIO provides group-centric user APIs. Userspace must first open
/dev/vfio/$group_id and only then get the device fd, which gives it access
to the device. This is not the desired model for iommufd. Per the conclusion
of the community discussion[1], iommufd provides device-centric kAPIs and
requires its consumers (like VFIO) to provide device-centric user APIs. Such
user APIs are used to associate a device with an iommufd and with the I/O
address spaces managed by that iommufd.
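
For reference, the legacy group-centric flow mentioned above can be sketched
from userspace as below. This is only an illustrative sketch and not part of
the series: it uses the existing ioctls from <linux/vfio.h> with the type1
IOMMU backend, and the helper names, the group number and the device address
are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Build "/dev/vfio/$group_id"; returns what snprintf() returns. */
static int vfio_group_path(char *buf, size_t len, int group_id)
{
	return snprintf(buf, len, "/dev/vfio/%d", group_id);
}

/*
 * Legacy group-centric flow: container -> group -> device fd.
 * Returns the device fd, or a negative errno on failure.
 * On success the container and group fds are left open on purpose;
 * a real program would keep track of them and close them later.
 */
static int vfio_legacy_get_device_fd(int group_id, const char *bdf)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	char path[64];
	int container, group, device, ret;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0)
		return -errno;

	/* The group has to be opened before any device can be reached. */
	vfio_group_path(path, sizeof(path), group_id);
	group = open(path, O_RDWR);
	if (group < 0) {
		ret = -errno;
		goto out_container;
	}

	if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0) {
		ret = -errno;
		goto out_group;
	}
	if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
		ret = -EPERM;	/* error code chosen for this sketch */
		goto out_group;
	}

	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0) {
		ret = -errno;
		goto out_group;
	}

	/* Only now can userspace obtain the device fd. */
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, bdf);
	if (device < 0) {
		ret = -errno;
		goto out_group;
	}
	return device;

out_group:
	close(group);
out_container:
	close(container);
	return ret;
}
```

A caller would use it as vfio_legacy_get_device_fd(26, "0000:06:0d.0"). The
point is that the device fd is only reachable through the group fd, which is
the dependency the device cdev path removes.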

This series first introduces a per-device file structure to prepare for
further enhancement, and refactors the kvm-vfio code to prepare for
accepting a device file from userspace. It then refactors vfio so that it
can handle iommufd binding. This refactoring includes the mechanism for
blocking device access before iommufd bind and for making device_open
exclusive between the group path and the cdev path. Finally, it adds cdev
support for vfio devices and makes the group infrastructure optional, as it
is not needed when vfio device cdev is compiled.

This is also a prerequisite for iommu nesting for vfio device[2].

The complete code can be found in the branch below. Simple tests were done
with the legacy group path and the cdev path. A draft QEMU branch can be
found at [3].

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

base-commit: 63777bd2daa3625da6eada88bd9081f047664dad

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/
[3] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv3 (it is based on Eric's
    QEMU iommufd rfcv3 (https://lore.kernel.org/kvm/20230131205305.2726330-1-eric.auger@redhat.com/)
    plus two commits to align with vfio_device_cdev v3/v4/v5)

Change log:

v5:
 - Add r-b from Kevin on patch 08, 13, 14, 15 and 17.
 - Rename patch 02 to limit the change for KVM facing kAPIs. The vfio pci
   hot reset path only accepts group file until patch 09. (Kevin)
 - Update comment around smp_load_acquire(&df->access_granted) (Yan)
 - Adopt Jason's suggestion on the vfio pci hot reset path, passing zero-length
   fd array to indicate using bound iommufd_ctx as ownership check. (Jason, Kevin)
 - Direct read df->access_granted value in vfio_device_cdev_close() (Kevin, Yan, Jason)
 - Wrap the iommufd get/put into a helper to refine the error path of
   vfio_device_ioctl_bind_iommufd(). (Yan)
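
To illustrate the zero-length fd array item above, a userspace request could
be built roughly as below. This is a sketch only: struct vfio_pci_hot_reset
and VFIO_DEVICE_PCI_HOT_RESET come from the existing <linux/vfio.h>, the
helper names are hypothetical, and whether a kernel accepts count == 0
depends on it carrying this change.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill a hot reset request that carries no fds at all (count == 0). */
static void vfio_hot_reset_init_empty(struct vfio_pci_hot_reset *hr)
{
	memset(hr, 0, sizeof(*hr));
	hr->argsz = sizeof(*hr);	/* header only, no trailing fd array */
	hr->count = 0;			/* zero-length fd array */
}

/*
 * Issue the reset on a device fd that is assumed to be bound already.
 * Returns 0 on success or a negative errno.
 */
static int vfio_hot_reset_no_fds(int device_fd)
{
	struct vfio_pci_hot_reset hr;

	vfio_hot_reset_init_empty(&hr);
	if (ioctl(device_fd, VFIO_DEVICE_PCI_HOT_RESET, &hr) < 0)
		return -errno;
	return 0;
}
```

With no fds passed in, the ownership check has nothing to compare against
except the iommufd_ctx the device is already bound to, which is the
behaviour described in the item above.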

v4: https://lore.kernel.org/kvm/20230221034812.138051-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 09/10
 - Add a line in devices/vfio.rst to emphasize that the user should add the
   group/device to KVM prior to invoking the open_device op, which may be
   called in the VFIO_GROUP_GET_DEVICE_FD or VFIO_DEVICE_BIND_IOMMUFD ioctl.
 - Modify VFIO_GROUP/VFIO_DEVICE_CDEV Kconfig dependency (Alex)
 - Select VFIO_GROUP for SPAPR (Jason)
 - Check device fully-opened in PCI hotreset path for device fd (Jason)
 - Set df->access_granted in the caller of vfio_device_open() since
   the caller may fail in other operations, but df->access_granted
   does not allow a true to false change. So it should be set only when
   the open path is really done successfully. (Yan, Kevin)
 - Fix missing iommufd_ctx_put() in the cdev path (Yan)
 - Fix an issue found in testing exclusion between group and cdev path.
   vfio_device_cdev_close() should check df->access_granted before heading
   to other operations.
 - Update vfio.rst for iommufd/cdev
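
The df->access_granted items above follow a publish-once pattern. A
userspace model of it with C11 atomics might look like the sketch below. The
struct, the function names and the error codes are hypothetical stand-ins;
the kernel code uses smp_store_release()/smp_load_acquire() rather than
<stdatomic.h>.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-file state (struct vfio_device_file). */
struct device_file_model {
	int open_state;			/* data set up by the open path */
	atomic_bool access_granted;	/* only goes false -> true */
};

/*
 * Stand-in for the caller of vfio_device_open(). fail_late simulates an
 * error in a later step of the same open path.
 */
static int model_open(struct device_file_model *df, bool fail_late)
{
	df->open_state = 1;		/* the "device open" step succeeded */
	if (fail_late) {
		df->open_state = 0;	/* unwind; the flag was never set */
		return -EIO;
	}
	/* Publish only once the whole open path is really done. */
	atomic_store_explicit(&df->access_granted, true,
			      memory_order_release);
	return 0;
}

/* Stand-in for the ioctl/read/write/mmap entry points. */
static int model_access(struct device_file_model *df)
{
	/* Pairs with the release store in model_open(). */
	if (!atomic_load_explicit(&df->access_granted,
				  memory_order_acquire))
		return -EINVAL;
	return df->open_state;
}
```

Setting the flag in the caller, after every step that can fail, is what
avoids a true to false transition: a late failure simply never sets it.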

v3: https://lore.kernel.org/kvm/20230213151348.56451-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 03, 06, 07, 08.
 - Refine the group and cdev path exclusion. Remove vfio_device::single_open;
   add vfio_group::cdev_device_open_cnt to achieve exclusion between group
   path and cdev path (Kevin, Jason)
 - Fix a bug in the error handling path (Yan Zhao)
 - Address misc remarks from Kevin
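
The exclusion via vfio_group::cdev_device_open_cnt can be modelled in plain
C as below. This is a toy model under assumptions, not the code in the
series: the struct, the function names and the -EBUSY return values are
hypothetical, and the kernel would do these checks under the group lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the group state relevant to the exclusion. */
struct group_model {
	bool group_file_opened;			/* group file is open */
	unsigned int cdev_device_open_cnt;	/* devices opened via cdev */
};

/* Group path: refused while any device of the group is open via cdev. */
static int model_group_open(struct group_model *g)
{
	if (g->cdev_device_open_cnt || g->group_file_opened)
		return -EBUSY;
	g->group_file_opened = true;
	return 0;
}

static void model_group_close(struct group_model *g)
{
	g->group_file_opened = false;
}

/* Cdev path: refused while the group file is open. */
static int model_cdev_open(struct group_model *g)
{
	if (g->group_file_opened)
		return -EBUSY;
	g->cdev_device_open_cnt++;
	return 0;
}

static void model_cdev_close(struct group_model *g)
{
	g->cdev_device_open_cnt--;
}
```

A counter rather than a flag is used on the cdev side because several
devices of one group may be opened via cdev at the same time, and the group
path must stay blocked until the last of them is closed.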

v2: https://lore.kernel.org/kvm/20230206090532.95598-1-yi.l.liu@intel.com/
 - Add r-b from Kevin and Eric on patches 01, 02 and 04.
 - Split "kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()"
   from this series; it got applied. (Alex, Kevin, Jason, Matthew)
 - Add kvm_ref_lock to protect vfio_device_file->kvm instead of reusing
   dev_set->lock, as a deadlock is observed with vfio-ap, which would try to
   acquire kvm_lock. This is the opposite lock order to kvm_device_release(),
   which holds kvm_lock first and then takes dev_set->lock. (Kevin)
 - Use a separate ioctl for detaching IOAS. (Alex)
 - Rename vfio_device_file::single_open to be is_cdev_device (Kevin, Alex)
 - Move the vfio device cdev code into device_cdev.c and add a VFIO_DEVICE_CDEV
   kconfig for it. (Kevin, Jason)

v1: https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
 - Fix the circular refcount between kvm struct and device file reference. (JasonG)
 - Address comments from KevinT
 - Keep the ioctl for detach; whether it stays is up to Alex's taste
   (https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/20221219084718.9342-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Yi Liu (19):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs for KVM
  vfio: Accept vfio device file in the KVM facing kAPI
  kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
    fd
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Allow passing zero-length fd array in
    VFIO_DEVICE_PCI_HOT_RESET
  vfio: Add infrastructure for bind_iommufd from userspace
  vfio-iommufd: Add detach_ioas support for physical VFIO devices
  vfio-iommufd: Add detach_ioas for emulated VFIO devices
  vfio: Add cdev_device_open_cnt to vfio_group
  vfio: Make vfio_device_open() single open for device cdev path
  vfio: Add cdev for vfio_device
  vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  vfio: Compile group optionally
  docs: vfio: Add vfio device cdev description

 Documentation/driver-api/vfio.rst             | 133 +++++++-
 Documentation/virt/kvm/devices/vfio.rst       |  52 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c              |   1 +
 drivers/s390/cio/vfio_ccw_ops.c               |   1 +
 drivers/s390/crypto/vfio_ap_ops.c             |   1 +
 drivers/vfio/Kconfig                          |  26 ++
 drivers/vfio/Makefile                         |   3 +-
 drivers/vfio/device_cdev.c                    | 285 ++++++++++++++++++
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   1 +
 drivers/vfio/group.c                          | 139 ++++++---
 drivers/vfio/iommufd.c                        |  59 +++-
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   2 +
 drivers/vfio/pci/mlx5/main.c                  |   1 +
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/pci/vfio_pci_core.c              | 116 +++++--
 drivers/vfio/platform/vfio_amba.c             |   1 +
 drivers/vfio/platform/vfio_platform.c         |   1 +
 drivers/vfio/vfio.h                           | 192 +++++++++++-
 drivers/vfio/vfio_main.c                      | 244 +++++++++++++--
 include/linux/iommufd.h                       |   6 +
 include/linux/vfio.h                          |  40 ++-
 include/uapi/linux/kvm.h                      |  16 +-
 include/uapi/linux/vfio.h                     | 102 +++++++
 virt/kvm/vfio.c                               | 141 ++++-----
 24 files changed, 1348 insertions(+), 216 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

Comments

Jason Gunthorpe Feb. 27, 2023, 7:21 p.m. UTC | #1
On Mon, Feb 27, 2023 at 03:11:16AM -0800, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. Then refactors the vfio to be
> able to handle iommufd binding. This refactor includes the mechanism of
> blocking device access before iommufd bind, making the device_open exclusive.
> between the group path and the cdev path. Eventually, adds the cdev support
> for vfio device, and makes group infrastructure optional as it is not needed
> when vfio device cdev is compiled.
> 
> This is also a prerequisite for iommu nesting for vfio device[2].
> 
> The complete code can be found in below branch, simple test done with the
> legacy group path and the cdev path. Draft QEMU branch can be found at[3]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> 
> base-commit: 63777bd2daa3625da6eada88bd9081f047664dad

This needs to be rebased onto a clean v6.3-rc1 when it comes out

Jason
Yi Liu Feb. 28, 2023, 3:03 a.m. UTC | #2
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 3:21 AM
> 
> On Mon, Feb 27, 2023 at 03:11:16AM -0800, Yi Liu wrote:
> > Existing VFIO provides group-centric user APIs for userspace. Userspace
> > opens the /dev/vfio/$group_id first before getting device fd and hence
> > getting access to device. This is not the desired model for iommufd. Per
> > the conclusion of community discussion[1], iommufd provides device-
> centric
> > kAPIs and requires its consumer (like VFIO) to be device-centric user
> > APIs. Such user APIs are used to associate device with iommufd and also
> > the I/O address spaces managed by the iommufd.
> >
> > This series first introduces a per device file structure to be prepared
> > for further enhancement and refactors the kvm-vfio code to be prepared
> > for accepting device file from userspace. Then refactors the vfio to be
> > able to handle iommufd binding. This refactor includes the mechanism of
> > blocking device access before iommufd bind, making the device_open
> exclusive.
> > between the group path and the cdev path. Eventually, adds the cdev
> support
> > for vfio device, and makes group infrastructure optional as it is not needed
> > when vfio device cdev is compiled.
> >
> > This is also a prerequisite for iommu nesting for vfio device[2].
> >
> > The complete code can be found in below branch, simple test done with
> the
> > legacy group path and the cdev path. Draft QEMU branch can be found
> at[3]
> >
> > https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
> > (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> >
> > base-commit: 63777bd2daa3625da6eada88bd9081f047664dad
> 
> This needs to be rebased onto a clean v6.3-rc1 when it comes out

Yes, I'll rebase and send one more version when v6.3-rc1 comes out.
For now I just tried to stay close to the vfio code in Alex's next
branch.

Regards,
Yi Liu
Xu, Terrence Feb. 28, 2023, 4:58 p.m. UTC | #3
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 28, 2023 11:03 AM
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 3:21 AM
> >
> > On Mon, Feb 27, 2023 at 03:11:16AM -0800, Yi Liu wrote:
> > > Existing VFIO provides group-centric user APIs for userspace.
> > > Userspace opens the /dev/vfio/$group_id first before getting device
> > > fd and hence getting access to device. This is not the desired model
> > > for iommufd. Per the conclusion of community discussion[1], iommufd
> > > provides device-
> > centric
> > > kAPIs and requires its consumer (like VFIO) to be device-centric
> > > user APIs. Such user APIs are used to associate device with iommufd
> > > and also the I/O address spaces managed by the iommufd.
> > >
> > > This series first introduces a per device file structure to be
> > > prepared for further enhancement and refactors the kvm-vfio code to
> > > be prepared for accepting device file from userspace. Then refactors
> > > the vfio to be able to handle iommufd binding. This refactor
> > > includes the mechanism of blocking device access before iommufd
> > > bind, making the device_open
> > exclusive.
> > > between the group path and the cdev path. Eventually, adds the cdev
> > support
> > > for vfio device, and makes group infrastructure optional as it is
> > > not needed when vfio device cdev is compiled.
> > >
> > > This is also a prerequisite for iommu nesting for vfio device[2].
> > >
> > > The complete code can be found in below branch, simple test done
> > > with
> > the
> > > legacy group path and the cdev path. Draft QEMU branch can be found
> > at[3]
> > >
> > > https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
> > > (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> > >
> > > base-commit: 63777bd2daa3625da6eada88bd9081f047664dad
> >
> > This needs to be rebased onto a clean v6.3-rc1 when it comes out
> 
> Yes, I'll send rebase and send one more version when v6.3-rc1 comes. Here
> just try to be near to the vfio code in Alex's next branch.
> 
> Regards,
> Yi Liu

Verified this series with the "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
Both passed in VFIO legacy mode, compat mode and cdev mode, including negative tests.

Tested-by: Terrence Xu <terrence.xu@intel.com>
Nicolin Chen March 1, 2023, 2:29 a.m. UTC | #4
On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:

> Verified this series by "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
> Both passed VFIO legacy mode / compat mode / cdev mode, including negative tests.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>

Sanity-tested this series on ARM64 with my wip branch:
https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
(Covering new iommufd and vfio-compat)

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Yi Liu March 1, 2023, 3:44 a.m. UTC | #5
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 1, 2023 10:29 AM
> 
> On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:
> 
> > Verified this series by "Intel GVT-g GPU device mediated passthrough"
> and "Intel GVT-d GPU device direct passthrough" technologies.
> > Both passed VFIO legacy mode / compat mode / cdev mode, including
> negative tests.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Sanity-tested this series on ARM64 with my wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
> (Covering new iommufd and vfio-compat)
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Thanks.

Regards,
Yi Liu
Shameerali Kolothum Thodi March 2, 2023, 9:43 a.m. UTC | #6
> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 01 March 2023 02:29
> To: Xu, Terrence <terrence.xu@intel.com>
> Cc: Liu, Yi L <yi.l.liu@intel.com>; Jason Gunthorpe <jgg@nvidia.com>;
> alex.williamson@redhat.com; Tian, Kevin <kevin.tian@intel.com>;
> joro@8bytes.org; robin.murphy@arm.com; cohuck@redhat.com;
> eric.auger@redhat.com; kvm@vger.kernel.org; mjrosato@linux.ibm.com;
> chao.p.peng@linux.intel.com; yi.y.sun@linux.intel.com; peterx@redhat.com;
> jasowang@redhat.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; lulu@redhat.com;
> suravee.suthikulpanit@amd.com; intel-gvt-dev@lists.freedesktop.org;
> intel-gfx@lists.freedesktop.org; linux-s390@vger.kernel.org; Hao, Xudong
> <xudong.hao@intel.com>; Zhao, Yan Y <yan.y.zhao@intel.com>
> Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 
> On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:
> 
> > Verified this series by "Intel GVT-g GPU device mediated passthrough" and
> "Intel GVT-d GPU device direct passthrough" technologies.
> > Both passed VFIO legacy mode / compat mode / cdev mode, including
> negative tests.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Sanity-tested this series on ARM64 with my wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
> (Covering new iommufd and vfio-compat)

Hi Nicolin,

Thanks for the latest ARM64 branch. Do you have a working QEMU branch corresponding to the
above one?

I tried https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
but for some reason I was not able to launch the guest.

Please let me know.

Thanks,
Shameer
Nicolin Chen March 2, 2023, 11:51 p.m. UTC | #7
On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi wrote:
 
> Hi Nicolin,
> 
> Thanks for the latest ARM64 branch. Do you have a working Qemu branch corresponding to the
> above one?
> 
> I tried the https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
> but for some reason not able to launch the Guest.
> 
> Please let me know.

I do use that branch. It might not be that robust though, as it
went through a big rebase. Can you try with the following?

--trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*" --trace "msi_*" --trace "nvme_*"

Thanks
Nicolin
Shameerali Kolothum Thodi March 3, 2023, 3:01 p.m. UTC | #8
> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 02 March 2023 23:51
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> lulu@redhat.com; suravee.suthikulpanit@amd.com;
> intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> Yan Y <yan.y.zhao@intel.com>
> Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 
> On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi
> wrote:
> 
> > Hi Nicolin,
> >
> > Thanks for the latest ARM64 branch. Do you have a working Qemu branch
> corresponding to the
> > above one?
> >
> > I tried the
> https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> smmuv3
> > but for some reason not able to launch the Guest.
> >
> > Please let me know.
> 
> I do use that branch. It might not be that robust though as it
> went through a big rebase.

OK. The issue seems to be quite random in nature and only happens when there
are multiple vCPUs. It also doesn't look related to VFIO device assignment,
as I can reproduce the guest hang without it, with only nested-smmuv3 and an
iommufd object.

./qemu-system-aarch64-iommuf -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
-enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
-object iommufd,id=iommufd0 \
-bios QEMU_EFI.fd \
-kernel Image-6.2-iommufd \
-initrd rootfs-iperf.cpio \
-net none \
-nographic \
-append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
-trace events=events \
-D trace_iommufd 

When the issue happens, there is no output on the terminal, as if QEMU is in a locked state.

> Can you try with the followings?
> 
> --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*" --trace
> "msi_*" --trace "nvme_*"

The only trace events with the above are these:

iommufd_backend_connect fd=22 owned=1 users=1 (0)
smmu_add_mr smmuv3-iommu-memory-region-0-0

I haven't debugged this further. Please let me know if the issue is reproducible
with multiple vCPUs at your end. For now I will focus on VFIO device specific tests.

Thanks,
Shameer
Matthew Rosato March 3, 2023, 9:29 p.m. UTC | #9
On 2/28/23 9:29 PM, Nicolin Chen wrote:
> On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:
> 
>> Verified this series by "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
>> Both passed VFIO legacy mode / compat mode / cdev mode, including negative tests.
>>
>> Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Sanity-tested this series on ARM64 with my wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
> (Covering new iommufd and vfio-compat)
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Tested a few different flavors of this series on s390 (I grabbed the most recent v6 copy from github):

legacy (IOMMUFD=n): vfio-pci, vfio-ccw, vfio-ap
compat (CONFIG_IOMMUFD_VFIO_CONTAINER=y): vfio-pci, vfio-ccw, vfio-ap
compat+cdev+group (VFIO_DEVICE_CDEV=y && VFIO_GROUP=y): vfio-pci (over cdev using Yi's qemu branch as well as via group), vfio-ccw and vfio-ap via group
compat+cdev-only (VFIO_DEVICE_CDEV=y && VFIO_GROUP=n): vfio-pci using Yi's qemu branch

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Nicolin Chen March 4, 2023, 7 a.m. UTC | #10
On Fri, Mar 03, 2023 at 03:01:03PM +0000, Shameerali Kolothum Thodi wrote:
> External email: Use caution opening links or attachments
> 
> 
> > -----Original Message-----
> > From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> > Sent: 02 March 2023 23:51
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> > Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> > Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> > cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> > mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> > yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> > lulu@redhat.com; suravee.suthikulpanit@amd.com;
> > intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> > linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> > Yan Y <yan.y.zhao@intel.com>
> > Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> >
> > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi
> > wrote:
> >
> > > Hi Nicolin,
> > >
> > > Thanks for the latest ARM64 branch. Do you have a working Qemu branch
> > corresponding to the
> > > above one?
> > >
> > > I tried the
> > https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> > smmuv3
> > > but for some reason not able to launch the Guest.
> > >
> > > Please let me know.
> >
> > I do use that branch. It might not be that robust though as it
> > went through a big rebase.
> 
> Ok. The issue seems to be quite random in nature and only happens when there
> are multiple vCPUs. Also doesn't look like related to VFIO device assignment
> as I can reproduce Guest hang without it by only having nested-smmuv3 and
> iommufd object.
> 
> ./qemu-system-aarch64-iommuf -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image-6.2-iommufd \
> -initrd rootfs-iperf.cpio \
> -net none \
> -nographic \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> -trace events=events \
> -D trace_iommufd
> 
> When the issue happens, no output on terminal as if Qemu is in a locked state.
> 
>  Can you try with the followings?
> >
> > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*" --trace
> > "msi_*" --trace "nvme_*"
> 
> The only trace events with above are this,
> 
> iommufd_backend_connect fd=22 owned=1 users=1 (0)
> smmu_add_mr smmuv3-iommu-memory-region-0-0
> 
> I haven't debugged this further. Please let me know if issue is reproducible
> with multiple vCPUs at your end. For now will focus on VFIO dev specific tests.

Oh. My test environment has been a single-core vCPU. So that
doesn't happen to me. Can you try a vanilla QEMU branch that
our nesting branch is rebased on? I took a branch from Yi as
the baseline, while he might take from Eric for the rfcv3.

I am guessing that it might be an issue in the common tree.

Thanks
Nicolin
Yi Liu March 4, 2023, 8:22 a.m. UTC | #11
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, March 4, 2023 3:01 PM
> 
> Oh. My test environment has been a single-core vCPU. So that
> doesn't happen to me. Can you try a vanilla QEMU branch that
> our nesting branch is rebased on? I took a branch from Yi as
> the baseline, while he might take from Eric for the rfcv3.

Yes, I took the qemu from Eric's rfcv3, just plus two commits to align the
uapi.

Regards
Yi Liu
Shameerali Kolothum Thodi March 8, 2023, 3:54 p.m. UTC | #12
> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 04 March 2023 07:01
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> lulu@redhat.com; suravee.suthikulpanit@amd.com;
> intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> Yan Y <yan.y.zhao@intel.com>
> Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 
> On Fri, Mar 03, 2023 at 03:01:03PM +0000, Shameerali Kolothum Thodi
> wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > > -----Original Message-----
> > > From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> > > Sent: 02 March 2023 23:51
> > > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > > Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L
> > > <yi.l.liu@intel.com>; Jason Gunthorpe <jgg@nvidia.com>;
> > > alex.williamson@redhat.com; Tian, Kevin <kevin.tian@intel.com>;
> > > joro@8bytes.org; robin.murphy@arm.com; cohuck@redhat.com;
> > > eric.auger@redhat.com; kvm@vger.kernel.org; mjrosato@linux.ibm.com;
> > > chao.p.peng@linux.intel.com; yi.y.sun@linux.intel.com;
> > > peterx@redhat.com; jasowang@redhat.com; lulu@redhat.com;
> > > suravee.suthikulpanit@amd.com; intel-gvt-dev@lists.freedesktop.org;
> > > intel-gfx@lists.freedesktop.org; linux-s390@vger.kernel.org; Hao,
> > > Xudong <xudong.hao@intel.com>; Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd
> > > support
> > >
> > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi
> > > wrote:
> > >
> > > > Hi Nicolin,
> > > >
> > > > Thanks for the latest ARM64 branch. Do you have a working Qemu
> > > > branch
> > > corresponding to the
> > > > above one?
> > > >
> > > > I tried the
> > >
> https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> > > smmuv3
> > > > but for some reason not able to launch the Guest.
> > > >
> > > > Please let me know.
> > >
> > > I do use that branch. It might not be that robust though as it went
> > > through a big rebase.
> >
> > Ok. The issue seems to be quite random in nature and only happens when
> > there are multiple vCPUs. Also doesn't look like related to VFIO
> > device assignment as I can reproduce Guest hang without it by only
> > having nested-smmuv3 and iommufd object.
> >
> > ./qemu-system-aarch64-iommuf -machine
> > virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> -enable-kvm
> > -cpu host -m 1G -smp cpus=8,maxcpus=8 \ -object iommufd,id=iommufd0
> \
> > -bios QEMU_EFI.fd \ -kernel Image-6.2-iommufd \ -initrd
> > rootfs-iperf.cpio \ -net none \ -nographic \ -append "rdinit=init
> > console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \ -trace
> > events=events \ -D trace_iommufd
> >
> > When the issue happens, no output on terminal as if Qemu is in a locked
> state.
> >
> >  Can you try with the followings?
> > >
> > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > --trace "msi_*" --trace "nvme_*"
> >
> > The only trace events with above are this,
> >
> > iommufd_backend_connect fd=22 owned=1 users=1 (0) smmu_add_mr
> > smmuv3-iommu-memory-region-0-0
> >
> > I haven't debugged this further. Please let me know if issue is
> > reproducible with multiple vCPUs at your end. For now will focus on VFIO
> dev specific tests.
> 
> Oh. My test environment has been a single-core vCPU. So that doesn't
> happen to me. Can you try a vanilla QEMU branch that our nesting branch is
> rebased on? I took a branch from Yi as the baseline, while he might take
> from Eric for the rfcv3.
> 
> I am guessing that it might be an issue in the common tree.

Yes, that looks like the case.
I tried with:
 commit 13356edb8750("Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging")

And the issue is still there. So hopefully it will go away once we rebase everything.

Thanks,
Shameer
Shameerali Kolothum Thodi March 14, 2023, 11:38 a.m. UTC | #13
> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 08 March 2023 15:55
> To: 'Nicolin Chen' <nicolinc@nvidia.com>
> Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> lulu@redhat.com; suravee.suthikulpanit@amd.com;
> intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> Yan Y <yan.y.zhao@intel.com>
> Subject: RE: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 

[...]
> > > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum
> > > > Thodi
> > > > wrote:
> > > >
> > > > > Hi Nicolin,
> > > > >
> > > > > Thanks for the latest ARM64 branch. Do you have a working Qemu
> > > > > branch
> > > > corresponding to the
> > > > > above one?
> > > > >
> > > > > I tried the
> > > >
> >
> https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> > > > smmuv3
> > > > > but for some reason not able to launch the Guest.
> > > > >
> > > > > Please let me know.
> > > >
> > > > I do use that branch. It might not be that robust though as it
> > > > went through a big rebase.
> > >
> > > Ok. The issue seems to be quite random in nature and only happens
> > > when there are multiple vCPUs. Also doesn't look like related to
> > > VFIO device assignment as I can reproduce Guest hang without it by
> > > only having nested-smmuv3 and iommufd object.
> > >
> > > ./qemu-system-aarch64-iommuf -machine
> > > virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> > -enable-kvm
> > > -cpu host -m 1G -smp cpus=8,maxcpus=8 \ -object
> iommufd,id=iommufd0
> > \
> > > -bios QEMU_EFI.fd \ -kernel Image-6.2-iommufd \ -initrd
> > > rootfs-iperf.cpio \ -net none \ -nographic \ -append "rdinit=init
> > > console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \ -trace
> > > events=events \ -D trace_iommufd
> > >
> > > > When the issue happens, no output on terminal as if Qemu is in a
> > > > locked state.
> > > >
> > > > > Can you try with the followings?
> > > >
> > > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > > --trace "msi_*" --trace "nvme_*"
> > >
> > > The only trace events with above are this,
> > >
> > > iommufd_backend_connect fd=22 owned=1 users=1 (0) smmu_add_mr
> > > smmuv3-iommu-memory-region-0-0
> > >
> > > > I haven't debugged this further. Please let me know if issue is
> > > > reproducible with multiple vCPUs at your end. For now will focus on
> > > > VFIO dev specific tests.
> >
> > Oh. My test environment has been a single-core vCPU. So that doesn't
> > happen to me. Can you try a vanilla QEMU branch that our nesting
> > branch is rebased on? I took a branch from Yi as the baseline, while
> > he might take from Eric for the rfcv3.
> >
> > I am guessing that it might be an issue in the common tree.
> 
> Yes, that looks like the case.
> I tried with:
>  commit 13356edb8750("Merge tag 'block-pull-request' of
> https://gitlab.com/stefanha/qemu into staging")
> 
> And issue is still there. So hopefully once we rebase everything it will go
> away.

Hi Nicolin,

I rebased your latest QEMU branch[1] on top of v7.2.0 and have not observed
the above issue so far. However, I noticed a couple of other issues when
hot adding/removing devices.

(qemu) device_del net1
qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy

Ignoring the MMIO UNMAP errors, it looks like objects are not freed
properly on the device removal path. I have a few quick fixes for this
here:
https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting
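
As a reading aid for the device_del log above, here is a minimal sketch
that decodes the two vfio_dma_unmap lines into an IOVA range and a
symbolic errno name. The regular expression and the decode_unmap helper
are hypothetical, and the argument order (container pointer, IOVA, size)
is an assumption inferred from the log layout; this is not code from
either QEMU branch.

```python
import errno
import re

# Pattern for the vfio_dma_unmap lines quoted above. The argument order
# (container pointer, IOVA, size) is an assumption based on the log layout.
UNMAP_RE = re.compile(
    r"vfio_dma_unmap\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\) = (-?\d+)"
)


def decode_unmap(line):
    """Return (iova_start, iova_end, errno_name) for one log line, or None."""
    m = UNMAP_RE.search(line)
    if m is None:
        return None
    iova = int(m.group(2), 16)
    size = int(m.group(3), 16)
    ret = int(m.group(4))
    # The return value is a negative errno, so negate it before the lookup.
    name = errno.errorcode.get(-ret, "UNKNOWN")
    return iova, iova + size, name


# The two unmap failures copied from the log above.
LOG = [
    "qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, "
    "0x8000101000, 0xf000) = -2 (No such file or directory)",
    "qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, "
    "0x8000000000, 0x100000) = -2 (No such file or directory)",
]

for line in LOG:
    start, end, name = decode_unmap(line)
    print(f"[{start:#x}, {end:#x}) -> {name}")
```

Both unmap failures decode to ENOENT. On Linux, the two "Failed to free
id" messages carry the strerror() text for ENOTTY ("Inappropriate ioctl
for device") and EBUSY ("Device or resource busy") respectively.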

With the above, the HWPT/IOAS objects seem to be destroyed properly
on the device detach path. But when the device is added back, QEMU
segfaults, and so far I have no clue why that happens.

(qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
./qemu_run-iommufd-nested: line 13:  7041 Segmentation fault
(core dumped) ./qemu-system-aarch64-iommufd
-machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0
-enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object
iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel
Image-iommufd -initrd rootfs-iperf.cpio -device
ioh3420,id=rp1 -device
vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append
"rdinit=init console=ttyAMA0 root=/dev/vda rw
earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D
trace_iommufd

There is no kernel log or crash, and there are not many useful traces when
this happens. I understand these are early days and it is not robust in any
way, but please let me know if you suspect anything. I will continue
debugging and will update if I find anything.

Thanks,
Shameer

[1] https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
Nicolin Chen March 15, 2023, 11:22 p.m. UTC | #14
On Tue, Mar 14, 2023 at 11:38:11AM +0000, Shameerali Kolothum Thodi wrote:

> Hi Nicolin,
> 
> I rebased your latest Qemu branch[1] on top of v7.2.0 and not observed
> the above issue so far. However noticed couple of other issues when
> we try to hot add/remove devices.
> 
> (qemu) device_del net1
> qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
> qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
> qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
> qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
> qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
> qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy
> 
> Ignoring the MMIO UNMAP errors, it looks like the object free is
> not proper on dev removal path. I have few quick fixes here
> for this,
> https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting

The smmuv3 change looks good to me. I will let Yi check the
iommufd change.

Yi, I wonder if this is the hot reset case that you asked me
for, a couple of weeks ago.

> With the above, it seems the HWPT/IOAS objects are destroyed properly
> on dev detach path. But when the dev is added back, gets a Qemu seg fault
> and so far I have no clue why that happens.
>
> (qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
> ./qemu_run-iommufd-nested: line 13:  7041 Segmentation fault
> (core dumped) ./qemu-system-aarch64-iommufd
> -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0
> -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object
> iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel
> Image-iommufd -initrd rootfs-iperf.cpio -device
> ioh3420,id=rp1 -device
> vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append
> "rdinit=init console=ttyAMA0 root=/dev/vda rw
> earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D
> trace_iommufd
> 
> There are no kernel log/crash and not much useful traces while this happens.
> Understand these are early days and it is not robust in anyway, but please
> let me know if you suspect anything. I will continue debugging and will update
> if anything.

Thanks! That'd be very helpful.

Nicolin
Yi Liu March 16, 2023, 7:39 a.m. UTC | #15
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, March 16, 2023 7:23 AM
>
> On Tue, Mar 14, 2023 at 11:38:11AM +0000, Shameerali Kolothum Thodi
> wrote:
> 
> > Hi Nicolin,
> >
> > I rebased your latest Qemu branch[1] on top of v7.2.0 and not observed
> > the above issue so far. However noticed couple of other issues when
> > we try to hot add/remove devices.
> >
> > (qemu) device_del net1
> > qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
> > qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
> > qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
> > qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
> > qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
> > qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy
> >
> > Ignoring the MMIO UNMAP errors, it looks like the object free is
> > not proper on dev removal path. I have few quick fixes here
> > for this,
> > https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting
> 
> The smmuv3 change looks good to me. I will let Yi check the
> iommufd change.
> 
> Yi, I wonder if this is the hot reset case that you asked me
> for, a couple of weeks ago.

Aha, not really. What Thodi is doing is hot removal, which emulates
hot-unplugging a physical device from a PCI slot. It may trigger a hot
reset, though, since a reset needs to be done as part of removal. However,
it is not the focused test I asked for weeks ago.