mbox series

[v2,00/10] Introduce new methods for verifying ownership in vfio PCI hot reset

Message ID 20230327093458.44939-1-yi.l.liu@intel.com (mailing list archive)
Headers show
Series Introduce new methods for verifying ownership in vfio PCI hot reset | expand

Message

Yi Liu March 27, 2023, 9:34 a.m. UTC
VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds
to prove that it owns all devices affected by resetting the calling
device. This series introduces several extensions to allow the ownership
check better aligned with iommufd and coming vfio device cdev support.

First, resetting an unopened device is always safe given nobody is using
it. So relax the check to allow such devices not covered by group fd
array. [1]

When iommufd is used we can simply verify that all affected devices are
bound to a same iommufd then no need for the user to provide extra fd
information. This is enabled by the user passing a zero-length fd array
and moving forward this should be the preferred way for hot reset. [2]

However the iommufd method has difficulty working with noiommu devices
since those devices don't have a valid iommufd, unless the noiommu device
is in a singleton dev_set hence no ownership check is required. [3]

For noiommu backward compatibility a 3rd method is introduced by allowing
the user to pass an array of device fds to prove ownership. [4]

As suggested by Jason [5], we have this series to introduce the above
stuffs to the vfio PCI hot reset. Per the dicussion in [6], this series
also adds a new _INFO ioctl to get hot reset scope for given device.

[1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
[2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
[3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
[4] https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19@DS0PR11MB7529.namprd11.prod.outlook.com/#t
[5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
[6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/

Change log:

v2:
 - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
 - Add r-b from Kevin and Jason
 - Add patch 10 to introduce a new _INFO ioctl for the usage of device
   fd passing usage in cdev path (Jason, Alex)

v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (10):
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Only check ownership of opened devices in hot reset
  vfio/pci: Move the existing hot reset logic to be a helper
  vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
    vfio_device
  vfio/pci: Allow passing zero-length fd array in
    VFIO_DEVICE_PCI_HOT_RESET
  vfio: Refine vfio file kAPIs for vfio PCI hot reset
  vfio: Accpet device file from vfio PCI hot reset path
  vfio/pci: Renaming for accepting device fd in hot reset path
  vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
  vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO

 drivers/iommu/iommufd/device.c   |  12 ++
 drivers/vfio/group.c             |  32 ++--
 drivers/vfio/iommufd.c           |  16 ++
 drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
 drivers/vfio/vfio.h              |   2 +
 drivers/vfio/vfio_main.c         |  44 ++++++
 include/linux/iommufd.h          |   3 +
 include/linux/vfio.h             |  14 ++
 include/uapi/linux/vfio.h        |  65 +++++++-
 9 files changed, 364 insertions(+), 68 deletions(-)

Comments

Jiang, Yanting March 31, 2023, 3:14 a.m. UTC | #1
> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. This series
> introduces several extensions to allow the ownership check better aligned with
> iommufd and coming vfio device cdev support.
> 
> First, resetting an unopened device is always safe given nobody is using it. So
> relax the check to allow such devices not covered by group fd array. [1]
> 
> When iommufd is used we can simply verify that all affected devices are bound
> to a same iommufd then no need for the user to provide extra fd information.
> This is enabled by the user passing a zero-length fd array and moving forward
> this should be the preferred way for hot reset. [2]
> 
> However the iommufd method has difficulty working with noiommu devices
> since those devices don't have a valid iommufd, unless the noiommu device is in
> a singleton dev_set hence no ownership check is required. [3]
> 
> For noiommu backward compatibility a 3rd method is introduced by allowing the
> user to pass an array of device fds to prove ownership. [4]
> 
> As suggested by Jason [5], we have this series to introduce the above stuffs to
> the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new _INFO
> ioctl to get hot reset scope for given device.
> 
Tested NIC passthrough on Intel platform.
Result looks good hence, 
Tested by: Jiang, Yanting <yanting.jiang@intel.com>

Thanks,
Yanting
Jiang, Yanting March 31, 2023, 5:01 a.m. UTC | #2
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. This series
> introduces several extensions to allow the ownership check better aligned with
> iommufd and coming vfio device cdev support.
> 
> First, resetting an unopened device is always safe given nobody is using it. So
> relax the check to allow such devices not covered by group fd array. [1]
> 
> When iommufd is used we can simply verify that all affected devices are bound
> to a same iommufd then no need for the user to provide extra fd information.
> This is enabled by the user passing a zero-length fd array and moving forward
> this should be the preferred way for hot reset. [2]
> 
> However the iommufd method has difficulty working with noiommu devices
> since those devices don't have a valid iommufd, unless the noiommu device is in
> a singleton dev_set hence no ownership check is required. [3]
> 
> For noiommu backward compatibility a 3rd method is introduced by allowing the
> user to pass an array of device fds to prove ownership. [4]
> 
> As suggested by Jason [5], we have this series to introduce the above stuffs to
> the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new _INFO
> ioctl to get hot reset scope for given device.
> 

Tested-by: Yanting Jiang <yanting.jiang@intel.com>

Thanks,
Yanting
Alex Williamson March 31, 2023, 1:24 p.m. UTC | #3
On Fri, 31 Mar 2023 03:14:23 +0000
"Jiang, Yanting" <yanting.jiang@intel.com> wrote:

> > 
> > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> > prove that it owns all devices affected by resetting the calling device. This series
> > introduces several extensions to allow the ownership check better aligned with
> > iommufd and coming vfio device cdev support.
> > 
> > First, resetting an unopened device is always safe given nobody is using it. So
> > relax the check to allow such devices not covered by group fd array. [1]
> > 
> > When iommufd is used we can simply verify that all affected devices are bound
> > to a same iommufd then no need for the user to provide extra fd information.
> > This is enabled by the user passing a zero-length fd array and moving forward
> > this should be the preferred way for hot reset. [2]
> > 
> > However the iommufd method has difficulty working with noiommu devices
> > since those devices don't have a valid iommufd, unless the noiommu device is in
> > a singleton dev_set hence no ownership check is required. [3]
> > 
> > For noiommu backward compatibility a 3rd method is introduced by allowing the
> > user to pass an array of device fds to prove ownership. [4]
> > 
> > As suggested by Jason [5], we have this series to introduce the above stuffs to
> > the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new _INFO
> > ioctl to get hot reset scope for given device.
> >   
> Tested NIC passthrough on Intel platform.
> Result looks good hence, 
> Tested by: Jiang, Yanting <yanting.jiang@intel.com>

I'm not aware of any userspace that exercises this reset ioctl in cdev
mode.  Is this regression testing only?  Thanks,

Alex
Xu, Terrence March 31, 2023, 5:27 p.m. UTC | #4
> -----Original Message-----
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, March 27, 2023 5:35 PM
> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. This
> series introduces several extensions to allow the ownership check better
> aligned with iommufd and coming vfio device cdev support.
> 
> First, resetting an unopened device is always safe given nobody is using it. So
> relax the check to allow such devices not covered by group fd array. [1]
> 
> When iommufd is used we can simply verify that all affected devices are
> bound to a same iommufd then no need for the user to provide extra fd
> information. This is enabled by the user passing a zero-length fd array and
> moving forward this should be the preferred way for hot reset. [2]
> 
> However the iommufd method has difficulty working with noiommu devices
> since those devices don't have a valid iommufd, unless the noiommu device
> is in a singleton dev_set hence no ownership check is required. [3]
> 
> For noiommu backward compatibility a 3rd method is introduced by allowing
> the user to pass an array of device fds to prove ownership. [4]
> 
> As suggested by Jason [5], we have this series to introduce the above stuffs
> to the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new
> _INFO ioctl to get hot reset scope for given device.
> 
> [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> [4]
> https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19
> @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> 
> Change log:
> 
> v2:
>  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
>  - Add r-b from Kevin and Jason
>  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
>    fd passing usage in cdev path (Jason, Alex)
> 
> v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/
> 
> Regards,
> 	Yi Liu
> 
> Yi Liu (10):
>   vfio/pci: Update comment around group_fd get in
>     vfio_pci_ioctl_pci_hot_reset()
>   vfio/pci: Only check ownership of opened devices in hot reset
>   vfio/pci: Move the existing hot reset logic to be a helper
>   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
>     vfio_device
>   vfio/pci: Allow passing zero-length fd array in
>     VFIO_DEVICE_PCI_HOT_RESET
>   vfio: Refine vfio file kAPIs for vfio PCI hot reset
>   vfio: Accpet device file from vfio PCI hot reset path
>   vfio/pci: Renaming for accepting device fd in hot reset path
>   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
>   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> 
>  drivers/iommu/iommufd/device.c   |  12 ++
>  drivers/vfio/group.c             |  32 ++--
>  drivers/vfio/iommufd.c           |  16 ++
>  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
>  drivers/vfio/vfio.h              |   2 +
>  drivers/vfio/vfio_main.c         |  44 ++++++
>  include/linux/iommufd.h          |   3 +
>  include/linux/vfio.h             |  14 ++
>  include/uapi/linux/vfio.h        |  65 +++++++-
>  9 files changed, 364 insertions(+), 68 deletions(-)
> 
> --
> 2.34.1

Verified this series by "Intel GVT-g GPU device mediated passthrough".
Passed VFIO legacy mode / compat mode / cdev mode basic functionality and GPU force reset test.

Tested-by: Terrence Xu <terrence.xu@intel.com>
Alex Williamson March 31, 2023, 5:49 p.m. UTC | #5
On Fri, 31 Mar 2023 17:27:27 +0000
"Xu, Terrence" <terrence.xu@intel.com> wrote:

> > -----Original Message-----
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, March 27, 2023 5:35 PM
> > 
> > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> > prove that it owns all devices affected by resetting the calling device. This
> > series introduces several extensions to allow the ownership check better
> > aligned with iommufd and coming vfio device cdev support.
> > 
> > First, resetting an unopened device is always safe given nobody is using it. So
> > relax the check to allow such devices not covered by group fd array. [1]
> > 
> > When iommufd is used we can simply verify that all affected devices are
> > bound to a same iommufd then no need for the user to provide extra fd
> > information. This is enabled by the user passing a zero-length fd array and
> > moving forward this should be the preferred way for hot reset. [2]
> > 
> > However the iommufd method has difficulty working with noiommu devices
> > since those devices don't have a valid iommufd, unless the noiommu device
> > is in a singleton dev_set hence no ownership check is required. [3]
> > 
> > For noiommu backward compatibility a 3rd method is introduced by allowing
> > the user to pass an array of device fds to prove ownership. [4]
> > 
> > As suggested by Jason [5], we have this series to introduce the above stuffs
> > to the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new
> > _INFO ioctl to get hot reset scope for given device.
> > 
> > [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> > [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> > [4]
> > https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19
> > @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> > [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> > [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> > 
> > Change log:
> > 
> > v2:
> >  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
> >  - Add r-b from Kevin and Jason
> >  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
> >    fd passing usage in cdev path (Jason, Alex)
> > 
> > v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/
> > 
> > Regards,
> > 	Yi Liu
> > 
> > Yi Liu (10):
> >   vfio/pci: Update comment around group_fd get in
> >     vfio_pci_ioctl_pci_hot_reset()
> >   vfio/pci: Only check ownership of opened devices in hot reset
> >   vfio/pci: Move the existing hot reset logic to be a helper
> >   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
> >     vfio_device
> >   vfio/pci: Allow passing zero-length fd array in
> >     VFIO_DEVICE_PCI_HOT_RESET
> >   vfio: Refine vfio file kAPIs for vfio PCI hot reset
> >   vfio: Accpet device file from vfio PCI hot reset path
> >   vfio/pci: Renaming for accepting device fd in hot reset path
> >   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
> >   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> > 
> >  drivers/iommu/iommufd/device.c   |  12 ++
> >  drivers/vfio/group.c             |  32 ++--
> >  drivers/vfio/iommufd.c           |  16 ++
> >  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
> >  drivers/vfio/vfio.h              |   2 +
> >  drivers/vfio/vfio_main.c         |  44 ++++++
> >  include/linux/iommufd.h          |   3 +
> >  include/linux/vfio.h             |  14 ++
> >  include/uapi/linux/vfio.h        |  65 +++++++-
> >  9 files changed, 364 insertions(+), 68 deletions(-)
> > 
> > --
> > 2.34.1  
> 
> Verified this series by "Intel GVT-g GPU device mediated passthrough".
> Passed VFIO legacy mode / compat mode / cdev mode basic functionality and GPU force reset test.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>

Seems like only this "GPU force reset test" is relevant to the new
functionality of this series, GVT-g does not and has no reason to
support the HOT_RESET ioctls used here.  Can you provide more details
of the force-reset test?  What userspace driver is being used?  Thanks,

Alex
Xu, Terrence April 1, 2023, 9:15 a.m. UTC | #6
> -----Original Message-----
> From: intel-gvt-dev <intel-gvt-dev-bounces@lists.freedesktop.org> On
> Behalf Of Alex Williamson
> Sent: Saturday, April 1, 2023 1:50 AM
> 
> On Fri, 31 Mar 2023 17:27:27 +0000
> "Xu, Terrence" <terrence.xu@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, March 27, 2023 5:35 PM
> > >
> > > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group
> > > fds to prove that it owns all devices affected by resetting the
> > > calling device. This series introduces several extensions to allow
> > > the ownership check better aligned with iommufd and coming vfio device
> cdev support.
> > >
> > > First, resetting an unopened device is always safe given nobody is
> > > using it. So relax the check to allow such devices not covered by
> > > group fd array. [1]
> > >
> > > When iommufd is used we can simply verify that all affected devices
> > > are bound to a same iommufd then no need for the user to provide
> > > extra fd information. This is enabled by the user passing a
> > > zero-length fd array and moving forward this should be the preferred
> > > way for hot reset. [2]
> > >
> > > However the iommufd method has difficulty working with noiommu
> > > devices since those devices don't have a valid iommufd, unless the
> > > noiommu device is in a singleton dev_set hence no ownership check is
> > > required. [3]
> > >
> > > For noiommu backward compatibility a 3rd method is introduced by
> > > allowing the user to pass an array of device fds to prove ownership.
> > > [4]
> > >
> > > As suggested by Jason [5], we have this series to introduce the
> > > above stuffs to the vfio PCI hot reset. Per the dicussion in [6],
> > > this series also adds a new _INFO ioctl to get hot reset scope for given
> device.
> > >
> > > [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> > > [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> > > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> > > [4]
> > >
> https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19
> > > @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> > > [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> > > [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> > >
> > > Change log:
> > >
> > > v2:
> > >  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
> > >  - Add r-b from Kevin and Jason
> > >  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
> > >    fd passing usage in cdev path (Jason, Alex)
> > >
> > > v1:
> > > https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.co
> > > m/
> > >
> > > Regards,
> > > 	Yi Liu
> > >
> > > Yi Liu (10):
> > >   vfio/pci: Update comment around group_fd get in
> > >     vfio_pci_ioctl_pci_hot_reset()
> > >   vfio/pci: Only check ownership of opened devices in hot reset
> > >   vfio/pci: Move the existing hot reset logic to be a helper
> > >   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
> > >     vfio_device
> > >   vfio/pci: Allow passing zero-length fd array in
> > >     VFIO_DEVICE_PCI_HOT_RESET
> > >   vfio: Refine vfio file kAPIs for vfio PCI hot reset
> > >   vfio: Accpet device file from vfio PCI hot reset path
> > >   vfio/pci: Renaming for accepting device fd in hot reset path
> > >   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
> > >   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> > >
> > >  drivers/iommu/iommufd/device.c   |  12 ++
> > >  drivers/vfio/group.c             |  32 ++--
> > >  drivers/vfio/iommufd.c           |  16 ++
> > >  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++----
> ---
> > >  drivers/vfio/vfio.h              |   2 +
> > >  drivers/vfio/vfio_main.c         |  44 ++++++
> > >  include/linux/iommufd.h          |   3 +
> > >  include/linux/vfio.h             |  14 ++
> > >  include/uapi/linux/vfio.h        |  65 +++++++-
> > >  9 files changed, 364 insertions(+), 68 deletions(-)
> > >
> > > --
> > > 2.34.1
> >
> > Verified this series by "Intel GVT-g GPU device mediated passthrough".
> > Passed VFIO legacy mode / compat mode / cdev mode basic functionality
> and GPU force reset test.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Seems like only this "GPU force reset test" is relevant to the new
> functionality of this series, GVT-g does not and has no reason to support the
> HOT_RESET ioctls used here.  Can you provide more details of the force-reset
> test?  What userspace driver is being used?  Thanks,
> 
> Alex
Hi Alex, about the "GPU force reset test", I used the "i915_hangman" test from intel-gpu-tools, it is for GPU force hang / reset. 
It is an important regression test scenario for this patch series. 
To test the HOT_RESET ioctls itself, need to wait the corresponding Qemu changes from Yi.
Alex Williamson April 1, 2023, 1:08 p.m. UTC | #7
On Sat, 1 Apr 2023 09:15:33 +0000
"Xu, Terrence" <terrence.xu@intel.com> wrote:

> > -----Original Message-----
> > From: intel-gvt-dev <intel-gvt-dev-bounces@lists.freedesktop.org> On
> > Behalf Of Alex Williamson
> > Sent: Saturday, April 1, 2023 1:50 AM
> > 
> > On Fri, 31 Mar 2023 17:27:27 +0000
> > "Xu, Terrence" <terrence.xu@intel.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Monday, March 27, 2023 5:35 PM
> > > >
> > > > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group
> > > > fds to prove that it owns all devices affected by resetting the
> > > > calling device. This series introduces several extensions to allow
> > > > the ownership check better aligned with iommufd and coming vfio device  
> > cdev support.  
> > > >
> > > > First, resetting an unopened device is always safe given nobody is
> > > > using it. So relax the check to allow such devices not covered by
> > > > group fd array. [1]
> > > >
> > > > When iommufd is used we can simply verify that all affected devices
> > > > are bound to a same iommufd then no need for the user to provide
> > > > extra fd information. This is enabled by the user passing a
> > > > zero-length fd array and moving forward this should be the preferred
> > > > way for hot reset. [2]
> > > >
> > > > However the iommufd method has difficulty working with noiommu
> > > > devices since those devices don't have a valid iommufd, unless the
> > > > noiommu device is in a singleton dev_set hence no ownership check is
> > > > required. [3]
> > > >
> > > > For noiommu backward compatibility a 3rd method is introduced by
> > > > allowing the user to pass an array of device fds to prove ownership.
> > > > [4]
> > > >
> > > > As suggested by Jason [5], we have this series to introduce the
> > > > above stuffs to the vfio PCI hot reset. Per the dicussion in [6],
> > > > this series also adds a new _INFO ioctl to get hot reset scope for given  
> > device.  
> > > >
> > > > [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> > > > [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> > > > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> > > > [4]
> > > >  
> > https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19  
> > > > @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> > > > [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> > > > [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> > > >
> > > > Change log:
> > > >
> > > > v2:
> > > >  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
> > > >  - Add r-b from Kevin and Jason
> > > >  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
> > > >    fd passing usage in cdev path (Jason, Alex)
> > > >
> > > > v1:
> > > > https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.co
> > > > m/
> > > >
> > > > Regards,
> > > > 	Yi Liu
> > > >
> > > > Yi Liu (10):
> > > >   vfio/pci: Update comment around group_fd get in
> > > >     vfio_pci_ioctl_pci_hot_reset()
> > > >   vfio/pci: Only check ownership of opened devices in hot reset
> > > >   vfio/pci: Move the existing hot reset logic to be a helper
> > > >   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
> > > >     vfio_device
> > > >   vfio/pci: Allow passing zero-length fd array in
> > > >     VFIO_DEVICE_PCI_HOT_RESET
> > > >   vfio: Refine vfio file kAPIs for vfio PCI hot reset
> > > >   vfio: Accpet device file from vfio PCI hot reset path
> > > >   vfio/pci: Renaming for accepting device fd in hot reset path
> > > >   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
> > > >   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> > > >
> > > >  drivers/iommu/iommufd/device.c   |  12 ++
> > > >  drivers/vfio/group.c             |  32 ++--
> > > >  drivers/vfio/iommufd.c           |  16 ++
> > > >  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++----  
> > ---  
> > > >  drivers/vfio/vfio.h              |   2 +
> > > >  drivers/vfio/vfio_main.c         |  44 ++++++
> > > >  include/linux/iommufd.h          |   3 +
> > > >  include/linux/vfio.h             |  14 ++
> > > >  include/uapi/linux/vfio.h        |  65 +++++++-
> > > >  9 files changed, 364 insertions(+), 68 deletions(-)
> > > >
> > > > --
> > > > 2.34.1  
> > >
> > > Verified this series by "Intel GVT-g GPU device mediated passthrough".
> > > Passed VFIO legacy mode / compat mode / cdev mode basic functionality  
> > and GPU force reset test.  
> > >
> > > Tested-by: Terrence Xu <terrence.xu@intel.com>  
> > 
> > Seems like only this "GPU force reset test" is relevant to the new
> > functionality of this series, GVT-g does not and has no reason to support the
> > HOT_RESET ioctls used here.  Can you provide more details of the force-reset
> > test?  What userspace driver is being used?  Thanks,
> > 
> > Alex  
> Hi Alex, about the "GPU force reset test", I used the "i915_hangman"
> test from intel-gpu-tools, it is for GPU force hang / reset. It is an
> important regression test scenario for this patch series. To test the
> HOT_RESET ioctls itself, need to wait the corresponding Qemu changes
> from Yi.

But i915 exists on the host root bus, we fundamentally cannot perform a
bus reset of the root bus.  So how exactly is testing with GVT-g, which
doesn't use the vfio-pci-core hot-reset ioctl, or GVT-d, which can't do
a bus reset because it exists on the root bus, relevant to this series?
Is this some novel use of a dGPU i915 with out-of-tree drivers?

Obviously any regression testing is fine and appreciated, but if this
is intended to express some validation of the new interface, I'm
failing to see how.  Thanks,

Alex
Jiang, Yanting April 3, 2023, 2:04 a.m. UTC | #8
> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, March 31, 2023 9:25 PM
> To: Jiang, Yanting <yanting.jiang@intel.com>
> Cc: Liu, Yi L <yi.l.liu@intel.com>; jgg@nvidia.com; Tian, Kevin
> <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; nicolinc@nvidia.com;
> kvm@vger.kernel.org; mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> shameerali.kolothum.thodi@huawei.com; lulu@redhat.com;
> suravee.suthikulpanit@amd.com; intel-gvt-dev@lists.freedesktop.org; intel-
> gfx@lists.freedesktop.org; linux-s390@vger.kernel.org; Hao, Xudong
> <xudong.hao@intel.com>; Zhao, Yan Y <yan.y.zhao@intel.com>; Xu, Terrence
> <terrence.xu@intel.com>
> Subject: Re: [PATCH v2 00/10] Introduce new methods for verifying ownership in
> vfio PCI hot reset
> 
> On Fri, 31 Mar 2023 03:14:23 +0000
> "Jiang, Yanting" <yanting.jiang@intel.com> wrote:
> 
> > >
> > > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group
> > > fds to prove that it owns all devices affected by resetting the
> > > calling device. This series introduces several extensions to allow
> > > the ownership check better aligned with iommufd and coming vfio device
> cdev support.
> > >
> > > First, resetting an unopened device is always safe given nobody is
> > > using it. So relax the check to allow such devices not covered by
> > > group fd array. [1]
> > >
> > > When iommufd is used we can simply verify that all affected devices
> > > are bound to a same iommufd then no need for the user to provide extra fd
> information.
> > > This is enabled by the user passing a zero-length fd array and
> > > moving forward this should be the preferred way for hot reset. [2]
> > >
> > > However the iommufd method has difficulty working with noiommu
> > > devices since those devices don't have a valid iommufd, unless the
> > > noiommu device is in a singleton dev_set hence no ownership check is
> > > required. [3]
> > >
> > > For noiommu backward compatibility a 3rd method is introduced by
> > > allowing the user to pass an array of device fds to prove ownership.
> > > [4]
> > >
> > > As suggested by Jason [5], we have this series to introduce the
> > > above stuffs to the vfio PCI hot reset. Per the dicussion in [6],
> > > this series also adds a new _INFO ioctl to get hot reset scope for given
> device.
> > >
> > Tested NIC passthrough on Intel platform.
> > Result looks good hence,
> > Tested by: Jiang, Yanting <yanting.jiang@intel.com>
> 
> I'm not aware of any userspace that exercises this reset ioctl in cdev mode.  Is
> this regression testing only?  Thanks,
> 
> Alex

Hi Alex, 

Yes, only regression testing and some negative testing for NIC passthrough with legacy vfio mode, vfio iommufd compat mode, and cdev mode.

Thanks,
Yanting