mbox series

[RFC,v2,0/4] vfio/pci: Atomic Ops completer support

Message ID 20230526231558.1660396-1-alex.williamson@redhat.com (mailing list archive)
Headers show
Series vfio/pci: Atomic Ops completer support | expand

Message

Alex Williamson May 26, 2023, 11:15 p.m. UTC
This RFC proposes to allow a vfio-pci device to manipulate the PCI
Express capability of an associated root port to enable Atomic Op
completer support as equivalent to host capabilities.  This would
dynamically change capability bits in the config space of the root
port on realize and exit of the vfio-pci device under the following
condiations:

 - The vfio-pci device is directly connected to the root port and
   the root port implements a v2 PCIe capability, thereby supporting
   the DEVCAP2 register.

 - The vfio-pci device is exposed as a single-function device.

 - Atomic completer support is not otherwise reported on the root port.

 - The vfio-pci device reports the VFIO_DEVICE_INFO_CAP_PCI_ATOMIC_COMP
   capability with a non-zero set of supported atomic completer widths.

This proposal aims to avoid complications of reporting Atomic Ops
Routing, which can easily escalate into invalid P2P paths.  We also
require a specific VM configuration to avoid dependencies between
devices which may be sourced from dissimilar host paths.

While it's not exactly standard practice to modify root port device
capabilities runtime, it also does not seem to be precluded by the PCIe
Spec (6.0.1).  The Atomic Op completion bits of the DEVCAP2 register
are defined as Read-only:

7.4 Configuration Register Types
 Read-only - Register bits are read-only and cannot be altered by software.
             Where explicitly defined, these bits are used to reflect changing
             hardware state, and as a result bit values can be observed to
             change at run time. Register bit default values and bits that
             cannot change value at run time, are permitted to be hard-coded,
             initialized by system/device firmware, or initialized by hardware
             mechanisms such as pin strapping or nonvolatile storage.
             Initialization by system firmware is permitted only for system-
             integrated devices. If the optional feature that would Set the
             bits is not implemented, the bits must be hardwired to Zero.

Here "altered by software" is relative to guest writes to the config
space register, whereas in this implementation we're acting as hardware
and the bits are changing to reflect a change in runtime capabilities.
The spec does include a HwInit register type which would restrict the
value from changing at runtime outside of resets.  Therefore while it
would not be advised to update these bits arbitrarily, it does seem safe
and compatible with guest software to update the value on device attach
and detach.

Note that of the Linux in-kernel drivers that make use of Atomic Ops,
it's not common for the driver to test Atomic Ops support of the device
itself.  Support is assumed, therefore it's fruitless to provide masking
of support at the device rather than the root port.

Also, by allowing this dynamic support, enabling Atomic Ops becomes
transparent to VM management tools.  There is no requirement to
designate specific Atomic Ops capabilities to a root port and impose a
burden on other userspace utilities.

Feedback welcome.  Thanks,

Alex

v2:
 - Don't require cold-plug device, modify RP bits around realize/exit

Alex Williamson (4):
  linux-headers: Update for vfio capability reporting AtomicOps
  vfio: Implement a common device info helper
  pcie: Add a PCIe capability version helper
  vfio/pci: Enable AtomicOps completers on root ports

 hw/pci/pcie.c                 |  7 ++++
 hw/s390x/s390-pci-vfio.c      | 37 +++--------------
 hw/vfio/common.c              | 46 ++++++++++++++++-----
 hw/vfio/pci.c                 | 78 +++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h                 |  1 +
 include/hw/pci/pcie.h         |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 linux-headers/linux/vfio.h    | 14 +++++++
 8 files changed, 142 insertions(+), 43 deletions(-)

Comments

Robin Voetter May 31, 2023, 9:55 p.m. UTC | #1
Hi Alex,

Thanks for taking the time to implement support for Atomic Op completer
support properly :). I have tested out these patches and the kernel
patch, and apart from a minor issue with patch 2 everything works fine;
ROCm programs that use device->host atomic operations work properly.

Something that I have been thinking about, are there any implications
involved with enabling this feature automatically with no possibility of
turning it off? I have no use case for that, though, and I cant really
think of a reason other than preventing the guest from finding out
hardware details about the host.

Thanks,

Robin Voetter, Stream HPC
Alex Williamson May 31, 2023, 10:24 p.m. UTC | #2
On Wed, 31 May 2023 23:55:41 +0200
Robin Voetter <robin@streamhpc.com> wrote:

> Hi Alex,
> 
> Thanks for taking the time to implement support for Atomic Op completer
> support properly :). I have tested out these patches and the kernel
> patch, and apart from a minor issue with patch 2 everything works fine;

Yes, Cedric noted the extra semicolon as well, I'm about to post that
patch as non-RFC since it has no dependencies on anything else.

> ROCm programs that use device->host atomic operations work properly.

Great, thanks for testing!

> Something that I have been thinking about, are there any implications
> involved with enabling this feature automatically with no possibility of
> turning it off? I have no use case for that, though, and I cant really
> think of a reason other than preventing the guest from finding out
> hardware details about the host.

Not sure I follow, are you suggesting that Atomic Ops completion
support is proactively enabled in the VM to match the host, regardless
of attached assigned devices?  An obvious problem there is migration.
If I start a VM on a host with Atomic Ops support and want to migrate
to a host without Atomic Ops support, config space of the root ports is
now different and the migration fails.  QEMU would also require
elevated privileges to read config space to determine the host support,
and then what does it do if only some of the PCIe hierarchy supports
Atomic Ops.

Policy decisions like that are generally left to management tools, so
there would always be a means to enable or disable the feature.  In
fact, that's specifically why I test that the Atomic Op completer bits
are unset in the root port before changing them so that this automatic
enablement could live alongside a command line option to statically
enable some bits.

That does however remind me that it is often good with these sorts of
"clever" automatic features to have an opt-out, so I'll likely add an
x-no-rp-atomics device option in the next version to provide that.
Thanks,

Alex
Philippe Mathieu-Daudé May 31, 2023, 10:29 p.m. UTC | #3
On 1/6/23 00:24, Alex Williamson wrote:
> On Wed, 31 May 2023 23:55:41 +0200
> Robin Voetter <robin@streamhpc.com> wrote:
> 
>> Hi Alex,
>>
>> Thanks for taking the time to implement support for Atomic Op completer
>> support properly :). I have tested out these patches and the kernel
>> patch, and apart from a minor issue with patch 2 everything works fine;
> 
> Yes, Cedric noted the extra semicolon as well, I'm about to post that
> patch as non-RFC since it has no dependencies on anything else.


> Policy decisions like that are generally left to management tools, so
> there would always be a means to enable or disable the feature.  In
> fact, that's specifically why I test that the Atomic Op completer bits
> are unset in the root port before changing them so that this automatic
> enablement could live alongside a command line option to statically
> enable some bits.
> 
> That does however remind me that it is often good with these sorts of
> "clever" automatic features to have an opt-out, so I'll likely add an
> x-no-rp-atomics device option in the next version to provide that.

I was just going to suggest that, just in case :)
Philippe Mathieu-Daudé May 31, 2023, 10:31 p.m. UTC | #4
On 27/5/23 01:15, Alex Williamson wrote:
> This RFC proposes to allow a vfio-pci device to manipulate the PCI
> Express capability of an associated root port to enable Atomic Op
> completer support as equivalent to host capabilities.  This would
[...]

> While it's not exactly standard practice to modify root port device
> capabilities runtime, it also does not seem to be precluded by the PCIe
> Spec (6.0.1).  The Atomic Op completion bits of the DEVCAP2 register
> are defined as Read-only:
> 
> 7.4 Configuration Register Types
>   Read-only - Register bits are read-only and cannot be altered by software.
>               Where explicitly defined, these bits are used to reflect changing
>               hardware state, and as a result bit values can be observed to
>               change at run time. Register bit default values and bits that
>               cannot change value at run time, are permitted to be hard-coded,
>               initialized by system/device firmware, or initialized by hardware
>               mechanisms such as pin strapping or nonvolatile storage.
>               Initialization by system firmware is permitted only for system-
>               integrated devices. If the optional feature that would Set the
>               bits is not implemented, the bits must be hardwired to Zero.
> 
> Here "altered by software" is relative to guest writes to the config
> space register, whereas in this implementation we're acting as hardware
> and the bits are changing to reflect a change in runtime capabilities.
> The spec does include a HwInit register type which would restrict the
> value from changing at runtime outside of resets.  Therefore while it
> would not be advised to update these bits arbitrarily, it does seem safe
> and compatible with guest software to update the value on device attach
> and detach.

 From my previous (short) PCIe experience, this is also my understanding.
Robin Voetter June 1, 2023, 8:15 a.m. UTC | #5
On 6/1/23 00:24, Alex Williamson wrote:
> On Wed, 31 May 2023 23:55:41 +0200
> Robin Voetter <robin@streamhpc.com> wrote:
>> Something that I have been thinking about, are there any implications
>> involved with enabling this feature automatically with no possibility of
>> turning it off? I have no use case for that, though, and I cant really
>> think of a reason other than preventing the guest from finding out
>> hardware details about the host.
> 
> Not sure I follow, are you suggesting that Atomic Ops completion
> support is proactively enabled in the VM to match the host, regardless
> of attached assigned devices?  An obvious problem there is migration.
> If I start a VM on a host with Atomic Ops support and want to migrate
> to a host without Atomic Ops support, config space of the root ports is
> now different and the migration fails.  QEMU would also require
> elevated privileges to read config space to determine the host support,
> and then what does it do if only some of the PCIe hierarchy supports
> Atomic Ops.
> 
> Policy decisions like that are generally left to management tools, so
> there would always be a means to enable or disable the feature.  In
> fact, that's specifically why I test that the Atomic Op completer bits
> are unset in the root port before changing them so that this automatic
> enablement could live alongside a command line option to statically
> enable some bits.
> 
> That does however remind me that it is often good with these sorts of
> "clever" automatic features to have an opt-out, so I'll likely add an
> x-no-rp-atomics device option in the next version to provide that.

Yes, something like that that is exactly what I was thinking about.

Thanks,

Robin Voetter
Philippe Mathieu-Daudé June 2, 2023, 2:02 p.m. UTC | #6
Hi Markus, Marc-André,

> On 1/6/23 00:24, Alex Williamson wrote:
>> Policy decisions like that are generally left to management tools, so
>> there would always be a means to enable or disable the feature.  In
>> fact, that's specifically why I test that the Atomic Op completer bits
>> are unset in the root port before changing them so that this automatic
>> enablement could live alongside a command line option to statically
>> enable some bits.
>>
>> That does however remind me that it is often good with these sorts of
>> "clever" automatic features to have an opt-out, so I'll likely add an
>> x-no-rp-atomics device option in the next version to provide that.

Still thinking about this series I remembered since commit a3c45b3e62
("qapi: New special feature flag "unstable"") the "x-" prefix is
obsolete (superseded) in QAPI generated code.

I wonder about device properties:

$ git grep -F '"x-' hw | wc -l
      130

$ git grep -F '"x-' hw/vfio/pci.c
hw/vfio/pci.c:2987:        error_setg(errp, "x-balloon-allowed only 
potentially compatible "
hw/vfio/pci.c:3335: 
DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
hw/vfio/pci.c:3342:    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", 
VFIOPCIDevice,
hw/vfio/pci.c:3344:    DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
hw/vfio/pci.c:3346:    DEFINE_PROP_BIT("x-req", VFIOPCIDevice, features,
hw/vfio/pci.c:3348:    DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, 
features,
hw/vfio/pci.c:3350:    DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
hw/vfio/pci.c:3352:    DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, 
vbasedev.no_mmap, false),
hw/vfio/pci.c:3353:    DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
hw/vfio/pci.c:3355:    DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, 
no_kvm_intx, false),
hw/vfio/pci.c:3356:    DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, 
no_kvm_msi, false),
hw/vfio/pci.c:3357:    DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, 
no_kvm_msix, false),
hw/vfio/pci.c:3358:    DEFINE_PROP_BOOL("x-no-geforce-quirks", 
VFIOPCIDevice,
hw/vfio/pci.c:3360:    DEFINE_PROP_BOOL("x-no-kvm-ioeventfd", 
VFIOPCIDevice, no_kvm_ioeventfd,
hw/vfio/pci.c:3362:    DEFINE_PROP_BOOL("x-no-vfio-ioeventfd", 
VFIOPCIDevice, no_vfio_ioeventfd,
hw/vfio/pci.c:3364:    DEFINE_PROP_UINT32("x-pci-vendor-id", 
VFIOPCIDevice, vendor_id, PCI_ANY_ID),
hw/vfio/pci.c:3365:    DEFINE_PROP_UINT32("x-pci-device-id", 
VFIOPCIDevice, device_id, PCI_ANY_ID),
hw/vfio/pci.c:3366:    DEFINE_PROP_UINT32("x-pci-sub-vendor-id", 
VFIOPCIDevice,
hw/vfio/pci.c:3368:    DEFINE_PROP_UINT32("x-pci-sub-device-id", 
VFIOPCIDevice,
hw/vfio/pci.c:3370:    DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, 
igd_gms, 0),
hw/vfio/pci.c:3371: 
DEFINE_PROP_UNSIGNED_NODEFAULT("x-nv-gpudirect-clique", VFIOPCIDevice,
hw/vfio/pci.c:3374:    DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", 
VFIOPCIDevice, msix_relo,

Is there a plan to use something similar for QOM properties?
Is it OK to keep using the "x-" prefix there?

Thanks,

Phil.