Message ID: cover.1642626515.git.jag.raman@oracle.com (mailing list archive)
Series: vfio-user server in QEMU
Hi Jag,
Thanks for this latest revision. The biggest outstanding question I have
is about the isolated address spaces design.

This patch series needs a PCIBus with its own Memory Space, I/O Space,
and interrupts. That way a single QEMU process can host vfio-user
servers that different VMs connect to. They all need isolated address
spaces so that mapping a BAR in Device A does not conflict with mapping
a BAR in Device B.

The current approach adds special code to hw/pci/pci.c so that a custom
AddressSpace can be set up. The isolated PCIBus is an automatically
created PCIe root port that's a child of the machine's main PCI bus. On
one hand it's neat because QEMU's assumption that there is only one
root SysBus isn't violated. On the other hand it seems like a special
case hack for PCI, and I'm not sure in what sense these PCIBusses are
really children of the machine's main PCI bus, since they don't share
or interact in any way.

Another approach that came to mind is to allow multiple root SysBusses.
Each vfio-user server would need its own SysBus, and we would put a
regular PCI host onto that isolated SysBus without modifying
hw/pci/pci.c with a special case. The downside to this is that
violating the single-SysBus assumption probably breaks monitor commands
that rely on qdev_find_recursive() and friends. It seems cleaner than
adding isolated address spaces to PCI specifically, but it also raises
the question of whether multiple machine instances are needed (which
would raise even more questions).

I wanted to raise this to see if Peter, Kevin, Michael, and others are
happy with the current approach or have ideas for a clean solution.

Stefan
> On Jan 25, 2022, at 11:00 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> Hi Jag,
> Thanks for this latest revision. The biggest outstanding question I have
> is about the isolated address spaces design.

Thank you for taking the time to review the patches, Stefan!

>
> This patch series needs a PCIBus with its own Memory Space, I/O Space,
> and interrupts. That way a single QEMU process can host vfio-user
> servers that different VMs connect to. They all need isolated address
> spaces so that mapping a BAR in Device A does not conflict with mapping
> a BAR in Device B.
>
> The current approach adds special code to hw/pci/pci.c so that a custom
> AddressSpace can be set up. The isolated PCIBus is an automatically
> created PCIe root port that's a child of the machine's main PCI bus. On
> one hand it's neat because QEMU's assumption that there is only one
> root SysBus isn't violated. On the other hand it seems like a special
> case hack for PCI, and I'm not sure in what sense these PCIBusses are
> really children of the machine's main PCI bus, since they don't share
> or interact in any way.

We are discussing the automatic creation part you just mentioned in
the following email:
[PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine

I agree that automatic creation of a parent bus is not ideal - we could
specify the parent bus as a separate option on the command line or via
QMP. This change would avoid modifying hw/pci/pci.c - the new PCI bus
could be created in place during device creation/hotplug.

The following diagram gives an idea of the bus/device topology in the
remote machine, as implemented in the current series. Each secondary bus
and its children have isolated memory and I/O spaces.
https://gitlab.com/jraman/qemu/-/commit/2e2ebf004894075ad8044739b0b16ce875114c4c

> Another approach that came to mind is to allow multiple root SysBusses.
> Each vfio-user server would need its own SysBus, and we would put a
> regular PCI host onto that isolated SysBus without modifying
> hw/pci/pci.c with a special case. The downside to this is that
> violating the single-SysBus assumption probably breaks monitor commands
> that rely on qdev_find_recursive() and friends. It seems cleaner than
> adding isolated address spaces to PCI specifically, but it also raises
> the question of whether multiple machine instances are needed (which
> would raise even more questions).

Based on further discussion with Stefan, I got some clarity. We could
consider one more option as well - somewhere in between multiple root
SysBuses and the topology discussed above (with secondary PCI buses).
We could implement a TYPE_SYS_BUS_DEVICE that creates a root PCI bus
with isolated memory ranges. Something along the lines of the following
diagram:
https://gitlab.com/jraman/qemu/-/commit/81f6a998278a2a795be0db7acdeb1caa2d6744fb

An example set of QMP commands to attach PCI devices would be:
  device_add pci-root-bus,id=rb1
  device_add <driver>,id=mydev,bus=rb1
  object-add x-vfio-user-server,device=mydev

where 'pci-root-bus' is a TYPE_SYS_BUS_DEVICE that creates its own root
PCI bus.

I'm very sorry if the above two web links disturb your review workflow.

>
> I wanted to raise this to see if Peter, Kevin, Michael, and others are
> happy with the current approach or have ideas for a clean solution.

Looking forward to your comments. Thank you!
--
Jag

> Stefan
On Wed, Jan 26, 2022 at 05:04:58AM +0000, Jag Raman wrote:
> > On Jan 25, 2022, at 11:00 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > Hi Jag,
> > Thanks for this latest revision. The biggest outstanding question I have
> > is about the isolated address spaces design.
>
> Thank you for taking the time to review the patches, Stefan!
>
> >
> > This patch series needs a PCIBus with its own Memory Space, I/O Space,
> > and interrupts. That way a single QEMU process can host vfio-user
> > servers that different VMs connect to. They all need isolated address
> > spaces so that mapping a BAR in Device A does not conflict with mapping
> > a BAR in Device B.
> >
> > The current approach adds special code to hw/pci/pci.c so that a custom
> > AddressSpace can be set up. The isolated PCIBus is an automatically
> > created PCIe root port that's a child of the machine's main PCI bus. On
> > one hand it's neat because QEMU's assumption that there is only one
> > root SysBus isn't violated. On the other hand it seems like a special
> > case hack for PCI, and I'm not sure in what sense these PCIBusses are
> > really children of the machine's main PCI bus, since they don't share
> > or interact in any way.
>
> We are discussing the automatic creation part you just mentioned in
> the following email:
> [PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine
>
> I agree that automatic creation of a parent bus is not ideal - we could
> specify the parent bus as a separate option on the command line or via
> QMP. This change would avoid modifying hw/pci/pci.c - the new PCI bus
> could be created in place during device creation/hotplug.
>
> The following diagram gives an idea of the bus/device topology in the
> remote machine, as implemented in the current series. Each secondary bus
> and its children have isolated memory and I/O spaces.
> https://gitlab.com/jraman/qemu/-/commit/2e2ebf004894075ad8044739b0b16ce875114c4c

Do isolated PCI busses have any relationship with their parent at all?
I don't think the parent plays a useful role in DMA/IOMMU, interrupts,
or PCI addressing. That leaves me wondering whether a parent/child
relationship is an appropriate way to model this.

That said, this approach solves two problems:
1. There must be some parent for the new PCI bus.
2. qdev_find_recursive() and friends must be able to find the PCIDevice
   on the isolated bus.

> > Another approach that came to mind is to allow multiple root SysBusses.
> > Each vfio-user server would need its own SysBus, and we would put a
> > regular PCI host onto that isolated SysBus without modifying
> > hw/pci/pci.c with a special case. The downside to this is that
> > violating the single-SysBus assumption probably breaks monitor commands
> > that rely on qdev_find_recursive() and friends. It seems cleaner than
> > adding isolated address spaces to PCI specifically, but it also raises
> > the question of whether multiple machine instances are needed (which
> > would raise even more questions).
>
> Based on further discussion with Stefan, I got some clarity. We could
> consider one more option as well - somewhere in between multiple root
> SysBuses and the topology discussed above (with secondary PCI buses).
> We could implement a TYPE_SYS_BUS_DEVICE that creates a root PCI bus
> with isolated memory ranges. Something along the lines of the following
> diagram:
> https://gitlab.com/jraman/qemu/-/commit/81f6a998278a2a795be0db7acdeb1caa2d6744fb
>
> An example set of QMP commands to attach PCI devices would be:
>   device_add pci-root-bus,id=rb1
>   device_add <driver>,id=mydev,bus=rb1
>   object-add x-vfio-user-server,device=mydev
>
> where 'pci-root-bus' is a TYPE_SYS_BUS_DEVICE that creates its own root
> PCI bus.

If it's less code then that's an advantage, but it still places
unrelated DMA/interrupt spaces onto the same SysBus and therefore still
requires isolation.
I think this alternative doesn't fundamentally fix the design. If
multiple roots are possible, then isolation doesn't need to be
implemented explicitly; it comes for free as part of the regular
qdev/qbus hierarchy. The devices would be isolated by the fact that
they live on different roots :).

I have never tried the multi-root approach though, so I'm not sure how
much work it is.

Stefan