Message ID: 20190905042215.3974-1-shawn@anastas.io (mailing list archive)
Series: Fix IOMMU setup for hotplugged devices on pseries
On 05/09/2019 14:22, Shawn Anastasio wrote:
> On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently
> broken for all but the first device on a given bus. The culprit is an ordering
> issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU
> group assignment occurring before device registration in sysfs. This triggers
> the following check in arch/powerpc/kernel/iommu.c:
>
>     /*
>      * The sysfs entries should be populated before
>      * binding IOMMU group. If sysfs entries isn't
>      * ready, we simply bail.
>      */
>     if (!device_is_registered(dev))
>         return -ENOENT;
>
> This fails for hotplugged devices since the pcibios_add_device() call in the
> pseries hotplug path (in pci_device_add()) occurs before device_add().
> Since the IOMMU groups are set up in pcibios_add_device(), this means that a
> sysfs entry will not yet be present and the check will fail.

I just tried hotplugging 3 virtio-net devices into a guest system with a
v5.2 kernel and it seems to work (i.e. BARs mapped, a driver is bound):

root@le-dbg:~# lspci -v | egrep -i '(virtio|Memory)'
00:00.0 Ethernet controller: Red Hat, Inc Virtio network device
        Memory at 200080040000 (32-bit, non-prefetchable) [size=4K]
        Memory at 210000000000 (64-bit, prefetchable) [size=16K]
        Kernel driver in use: virtio-pci
00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
        Memory at 200080041000 (32-bit, non-prefetchable) [size=4K]
        Memory at 210000004000 (64-bit, prefetchable) [size=16K]
        Kernel driver in use: virtio-pci
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
        Memory at 200080042000 (32-bit, non-prefetchable) [size=4K]
        Memory at 210000008000 (64-bit, prefetchable) [size=16K]
        Kernel driver in use: virtio-pci

Can you explain in detail what you are doing exactly, what is failing,
and which qemu/guest kernel/guest distro is used? Thanks,

> There is a special case that allows the first hotplugged device on a bus to
> succeed, though. The powerpc pcibios_add_device() implementation will skip
> initializing the device if bus setup is not yet complete.
> Later, the PCI core will call pcibios_fixup_bus(), which will perform setup
> for the first (and only) device on the bus, and since it has already been
> registered in sysfs, the IOMMU setup will succeed.
>
> My current solution is to introduce another pcibios function, pcibios_fixup_dev(),
> which is called after device_add() in pci_device_add(). Then in powerpc code,
> pcibios_setup_device() was moved from pcibios_add_device() to this new function,
> which will occur after sysfs registration, so IOMMU assignment will succeed.
>
> I added a new pcibios function rather than moving the pcibios_add_device() call
> to after the device_add() call in pci_device_add() because there are other
> architectures that use it and it wasn't immediately clear to me whether moving
> it would break them.
>
> If anybody has more insight or a better way to fix this, please let me know.
>
> Shawn Anastasio (2):
>   PCI: Introduce pcibios_fixup_dev()
>   powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries
>
>  arch/powerpc/kernel/pci-common.c | 13 ++++++-------
>  drivers/pci/probe.c              | 14 ++++++++++++++
>  include/linux/pci.h              |  1 +
>  3 files changed, 21 insertions(+), 7 deletions(-)
On Wed, Sep 04, 2019 at 11:22:13PM -0500, Shawn Anastasio wrote:
> If anybody has more insight or a better way to fix this, please let me know.
Have you considered moving the invocation of pcibios_setup_device()
to pcibios_bus_add_device()?
The latter is called from pci_bus_add_device() in drivers/pci/bus.c.
At this point device_add() has been called, so the device exists in
sysfs.
Basically when adding a PCI device, the order is:
* pci_device_add() populates struct pci_dev, calls device_add(),
binding the device to a driver is prevented
* after pci_device_add() has been called for all discovered devices,
resources are allocated
* pci_bus_add_device() is called for each device,
calls pcibios_bus_add_device() and binds the device to a driver
Thanks,
Lukas
On 9/5/19 4:08 AM, Alexey Kardashevskiy wrote:
> I just tried hotplugging 3 virtio-net devices into a guest system with a
> v5.2 kernel and it seems to work (i.e. BARs mapped, a driver is bound):
>
> root@le-dbg:~# lspci -v | egrep -i '(virtio|Memory)'
> 00:00.0 Ethernet controller: Red Hat, Inc Virtio network device
>         Memory at 200080040000 (32-bit, non-prefetchable) [size=4K]
>         Memory at 210000000000 (64-bit, prefetchable) [size=16K]
>         Kernel driver in use: virtio-pci
> 00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
>         Memory at 200080041000 (32-bit, non-prefetchable) [size=4K]
>         Memory at 210000004000 (64-bit, prefetchable) [size=16K]
>         Kernel driver in use: virtio-pci
> 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
>         Memory at 200080042000 (32-bit, non-prefetchable) [size=4K]
>         Memory at 210000008000 (64-bit, prefetchable) [size=16K]
>         Kernel driver in use: virtio-pci
>
> Can you explain in detail what you are doing exactly, what is failing,
> and which qemu/guest kernel/guest distro is used? Thanks,

Sure. I'm on host kernel 5.2.8, guest on 5.3-rc7 (also tested on 5.1.16),
and I'm hotplugging ivshmem devices to a separate spapr-pci-host-bridge
defined as follows:

-device spapr-pci-host-bridge,index=1,id=pci.1

Device hotplug and BAR assignment do work, but IOMMU group assignment
seems to fail. This is evidenced by the kernel log, which shows the
following message for the first device but not the second:

[  136.849448] pci 0001:00:00.0: Adding to iommu group 1

Trying to bind the second device to vfio-pci as a result of this fails:

[  471.691948] vfio-pci: probe of 0001:00:01.0 failed with error -22

I traced that failure to a call to iommu_group_get() which returns NULL
for the second device. I then traced that back to the ordering issue I
described.

For your second and third virtio-net devices, was the "Adding to iommu
group N" kernel message printed?
On 9/5/19 4:38 AM, Lukas Wunner wrote:
> On Wed, Sep 04, 2019 at 11:22:13PM -0500, Shawn Anastasio wrote:
>> If anybody has more insight or a better way to fix this, please let me know.
>
> Have you considered moving the invocation of pcibios_setup_device()
> to pcibios_bus_add_device()?
>
> The latter is called from pci_bus_add_device() in drivers/pci/bus.c.
> At this point device_add() has been called, so the device exists in
> sysfs.
>
> Basically when adding a PCI device, the order is:
>
> * pci_device_add() populates struct pci_dev, calls device_add(),
>   binding the device to a driver is prevented
> * after pci_device_add() has been called for all discovered devices,
>   resources are allocated
> * pci_bus_add_device() is called for each device,
>   calls pcibios_bus_add_device() and binds the device to a driver

Thank you, this is exactly what I was looking for! Just tested, and this
seems to work perfectly. I'll go ahead and submit a v2 that does this
instead.

Thanks again,
Shawn