[0/2] Fix IOMMU setup for hotplugged devices on pseries

Message ID 20190905042215.3974-1-shawn@anastas.io (mailing list archive)

Shawn Anastasio Sept. 5, 2019, 4:22 a.m. UTC
On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently
broken for all but the first device on a given bus. The culprit is an ordering
issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU
group assignment occurring before device registration in sysfs. This triggers
the following check in arch/powerpc/kernel/iommu.c:

/*
 * The sysfs entries should be populated before
 * binding IOMMU group. If sysfs entries isn't
 * ready, we simply bail.
 */
if (!device_is_registered(dev))
	return -ENOENT;

This fails for hotplugged devices since the pcibios_add_device() call in the
pseries hotplug path (in pci_device_add()) occurs before device_add().
Since the IOMMU groups are set up in pcibios_add_device(), this means that a
sysfs entry will not yet be present and it will fail.

There is a special case that allows the first hotplugged device on a bus to
succeed, though. The powerpc pcibios_add_device() implementation will skip
initializing the device if bus setup is not yet complete.
Later, the pci core will call pcibios_fixup_bus() which will perform setup
for the first (and only) device on the bus and since it has already been
registered in sysfs, the IOMMU setup will succeed.

My current solution is to introduce another pcibios function, pcibios_fixup_dev(),
which is called after device_add() in pci_device_add(). In the powerpc code,
pcibios_setup_device() is then moved from pcibios_add_device() to this new
function, which runs after sysfs registration, so IOMMU assignment succeeds.

I added a new pcibios function rather than moving the pcibios_add_device() call
to after the device_add() call in pci_device_add() because there are other
architectures that use it and it wasn't immediately clear to me whether moving
it would break them.

If anybody has more insight or a better way to fix this, please let me know.

Shawn Anastasio (2):
  PCI: Introduce pcibios_fixup_dev()
  powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries

 arch/powerpc/kernel/pci-common.c | 13 ++++++-------
 drivers/pci/probe.c              | 14 ++++++++++++++
 include/linux/pci.h              |  1 +
 3 files changed, 21 insertions(+), 7 deletions(-)

Comments

Alexey Kardashevskiy Sept. 5, 2019, 9:08 a.m. UTC | #1
On 05/09/2019 14:22, Shawn Anastasio wrote:
> On pseries QEMU guests, IOMMU setup for hotplugged PCI devices is currently
> broken for all but the first device on a given bus. The culprit is an ordering
> issue in the pseries hotplug path (via pci_rescan_bus()) which results in IOMMU
> group assignment occurring before device registration in sysfs. This triggers
> the following check in arch/powerpc/kernel/iommu.c:
> 
> /*
>   * The sysfs entries should be populated before
>   * binding IOMMU group. If sysfs entries isn't
>   * ready, we simply bail.
>   */
> if (!device_is_registered(dev))
> 	return -ENOENT;
> 
> This fails for hotplugged devices since the pcibios_add_device() call in the
> pseries hotplug path (in pci_device_add()) occurs before device_add().
> Since the IOMMU groups are set up in pcibios_add_device(), this means that a
> sysfs entry will not yet be present and it will fail.

I just tried hotplugging 3 virtio-net devices into a guest system with a
v5.2 kernel and it seems to work (i.e. BARs mapped, a driver is bound):


root@le-dbg:~# lspci -v | egrep -i '(virtio|Memory)'
00:00.0 Ethernet controller: Red Hat, Inc Virtio network device
         Memory at 200080040000 (32-bit, non-prefetchable) [size=4K]
         Memory at 210000000000 (64-bit, prefetchable) [size=16K]
         Kernel driver in use: virtio-pci
00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
         Memory at 200080041000 (32-bit, non-prefetchable) [size=4K]
         Memory at 210000004000 (64-bit, prefetchable) [size=16K]
         Kernel driver in use: virtio-pci
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
         Memory at 200080042000 (32-bit, non-prefetchable) [size=4K]
         Memory at 210000008000 (64-bit, prefetchable) [size=16K]
         Kernel driver in use: virtio-pci

Can you explain in detail what you are doing exactly and what is failing 
and what qemu/guest kernel/guest distro is used? Thanks,


> 
> There is a special case that allows the first hotplugged device on a bus to
> succeed, though. The powerpc pcibios_add_device() implementation will skip
> initializing the device if bus setup is not yet complete.
> Later, the pci core will call pcibios_fixup_bus() which will perform setup
> for the first (and only) device on the bus and since it has already been
> registered in sysfs, the IOMMU setup will succeed.
> 
> My current solution is to introduce another pcibios function, pcibios_fixup_dev,
> which is called after device_add() in pci_device_add(). Then in powerpc code,
> pcibios_setup_device() was moved from pcibios_add_device() to this new function
> which will occur after sysfs registration so IOMMU assignment will succeed.
> 
> I added a new pcibios function rather than moving the pcibios_add_device() call
> to after the device_add() call in pci_device_add() because there are other
> architectures that use it and it wasn't immediately clear to me whether moving
> it would break them.
> 
> If anybody has more insight or a better way to fix this, please let me know.
> 
> Shawn Anastasio (2):
>    PCI: Introduce pcibios_fixup_dev()
>    powerpc/pci: Fix IOMMU setup for hotplugged devices on pseries
> 
>   arch/powerpc/kernel/pci-common.c | 13 ++++++-------
>   drivers/pci/probe.c              | 14 ++++++++++++++
>   include/linux/pci.h              |  1 +
>   3 files changed, 21 insertions(+), 7 deletions(-)
>
Lukas Wunner Sept. 5, 2019, 9:38 a.m. UTC | #2
On Wed, Sep 04, 2019 at 11:22:13PM -0500, Shawn Anastasio wrote:
> If anybody has more insight or a better way to fix this, please let me know.

Have you considered moving the invocation of pcibios_setup_device()
to pcibios_bus_add_device()?

The latter is called from pci_bus_add_device() in drivers/pci/bus.c.
At this point device_add() has been called, so the device exists in
sysfs.

Basically when adding a PCI device, the order is:

* pci_device_add() populates struct pci_dev, calls device_add(),
  binding the device to a driver is prevented
* after pci_device_add() has been called for all discovered devices,
  resources are allocated
* pci_bus_add_device() is called for each device,
  calls pcibios_bus_add_device() and binds the device to a driver

Thanks,

Lukas
Shawn Anastasio Sept. 5, 2019, 5:59 p.m. UTC | #3
On 9/5/19 4:08 AM, Alexey Kardashevskiy wrote:
> I just tried hotplugging 3 virtio-net devices into a guest system with 
> v5.2 kernel and it seems working (i.e. BARs mapped, a driver is bound):
>
> 
> root@le-dbg:~# lspci -v | egrep -i '(virtio|Memory)'
> 00:00.0 Ethernet controller: Red Hat, Inc Virtio network device
>          Memory at 200080040000 (32-bit, non-prefetchable) [size=4K]
>          Memory at 210000000000 (64-bit, prefetchable) [size=16K]
>          Kernel driver in use: virtio-pci
> 00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
>          Memory at 200080041000 (32-bit, non-prefetchable) [size=4K]
>          Memory at 210000004000 (64-bit, prefetchable) [size=16K]
>          Kernel driver in use: virtio-pci
> 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
>          Memory at 200080042000 (32-bit, non-prefetchable) [size=4K]
>          Memory at 210000008000 (64-bit, prefetchable) [size=16K]
>          Kernel driver in use: virtio-pci
> 
> Can you explain in detail what you are doing exactly and what is failing 
> and what qemu/guest kernel/guest distro is used? Thanks,

Sure. I'm on host kernel 5.2.8, guest on 5.3-rc7 (also tested on 5.1.16)
and I'm hotplugging ivshmem devices to a separate spapr-pci-host-bridge
defined as follows:

-device spapr-pci-host-bridge,index=1,id=pci.1

Device hotplug and BAR assignment does work, but IOMMU group assignment
seems to fail. This is evidenced by the kernel log which shows the
following message for the first device but not the second:

[  136.849448] pci 0001:00:00.0: Adding to iommu group 1

Trying to bind the second device to vfio-pci as a result of this
fails:

[  471.691948] vfio-pci: probe of 0001:00:01.0 failed with error -22

I traced that failure to a call to iommu_group_get() which returns
NULL for the second device. I then traced that back to the ordering
issue I described.
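For anyone reproducing this, group assignment can also be checked from userspace by reading the device's iommu_group link in sysfs (the address below is the second hotplugged device from my setup; substitute your own):

```shell
dev=0001:00:01.0
if [ -e "/sys/bus/pci/devices/$dev/iommu_group" ]; then
	# prints ../../../kernel/iommu_groups/<N> when assignment worked
	readlink "/sys/bus/pci/devices/$dev/iommu_group"
else
	echo "no iommu_group for $dev"
fi
```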

For your second and third virtio-net devices, was the
"Adding to iommu group N" kernel message printed?
Shawn Anastasio Sept. 5, 2019, 6:42 p.m. UTC | #4
On 9/5/19 4:38 AM, Lukas Wunner wrote:
> On Wed, Sep 04, 2019 at 11:22:13PM -0500, Shawn Anastasio wrote:
>> If anybody has more insight or a better way to fix this, please let me know.
> 
> Have you considered moving the invocation of pcibios_setup_device()
> to pcibios_bus_add_device()?
> 
> The latter is called from pci_bus_add_device() in drivers/pci/bus.c.
> At this point device_add() has been called, so the device exists in
> sysfs.
> 
> Basically when adding a PCI device, the order is:
> 
> * pci_device_add() populates struct pci_dev, calls device_add(),
>    binding the device to a driver is prevented
> * after pci_device_add() has been called for all discovered devices,
>    resources are allocated
> * pci_bus_add_device() is called for each device,
>    calls pcibios_bus_add_device() and binds the device to a driver

Thank you, this is exactly what I was looking for! Just tested and
this seems to work perfectly. I'll go ahead and submit a v2 that
does this instead.

Thanks again,
Shawn