Message ID | 20250220224918.2520417-1-alex.williamson@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | PCI: Implement basic PCI PM capability backing | expand |
On Thu, Feb 20, 2025 at 03:48:53PM -0700, Alex Williamson wrote: > Eric recently identified an issue[1] where during graceful shutdown > of a VM in a vIOMMU configuration, the guest driver places the device > into the D3 power state, the vIOMMU is then disabled, triggering an > AddressSpace update. The device BARs are still mapped into the AS, > but the vfio host driver refuses to DMA map the MMIO space due to the > device power state. > > The proposed solution in [1] was to skip mappings based on the > device power state. Here we take a different approach. The PCI spec > defines that devices in D1/2/3 power state should respond only to > configuration and message requests and all other requests should be > handled as an Unsupported Request. In other words, the memory and > IO BARs are not accessible except when the device is in the D0 power > state. > > To emulate this behavior, we can factor the device power state into > the mapping state of the device BARs. Therefore the BAR is marked > as unmapped if either the respective command register enable bit is > clear or the device is not in the D0 power state. > > In order to implement this, the PowerState field of the PMCSR > register becomes writable, which allows the device to appear in > lower power states. This also therefore implements D3 support > (insofar as the BAR behavior) for all devices implementing the PM > capability. The PCI spec requires D3 support. > > An aspect that needs attention here is whether this change in the > wmask and PMCSR bits becomes a problem for migration, and how we > might solve it. For a guest migrating old->new, the device would > always be in the D0 power state, but the register becomes writable. > In the opposite direction, is it possible that a device could > migrate in a low power state and be stuck there since the bits are > read-only in old QEMU? Do we need an option for this behavior and a > machine state bump, or are there alternatives? > > Thanks, > Alex > > [1]https://lore.kernel.org/all/20250219175941.135390-1-eric.auger@redhat.com/ PCI bits: Reviewed-by: Michael S. Tsirkin <mst@redhat.com> feel free to merge. > Alex Williamson (5): > hw/pci: Basic support for PCI power management > pci: Use PCI PM capability initializer > vfio/pci: Delete local pm_cap > pcie, virtio: Remove redundant pm_cap > hw/vfio/pci: Re-order pre-reset > > hw/net/e1000e.c | 3 +- > hw/net/eepro100.c | 4 +- > hw/net/igb.c | 3 +- > hw/nvme/ctrl.c | 3 +- > hw/pci-bridge/pcie_pci_bridge.c | 3 +- > hw/pci/pci.c | 83 ++++++++++++++++++++++++++++++++- > hw/pci/trace-events | 2 + > hw/vfio/pci.c | 29 ++++++------ > hw/vfio/pci.h | 1 - > hw/virtio/virtio-pci.c | 11 ++--- > include/hw/pci/pci.h | 3 ++ > include/hw/pci/pci_device.h | 3 ++ > include/hw/pci/pcie.h | 2 - > 13 files changed, 112 insertions(+), 38 deletions(-) > > -- > 2.48.1