Message ID: 20181012145917.6840-1-jean-philippe.brucker@arm.com
Series: Add virtio-iommu driver
On Fri, Oct 12, 2018 at 03:59:10PM +0100, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> Changes since v2 [2]:
>
> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> would like to phase out the MMIO transport. This produces a complex
> topology where the programming interface of the IOMMU could appear
> lower than the endpoints that it translates. It's not unheard of (e.g.
> AMD IOMMU), and the guest easily copes with this.
>
> The "Firmware description" section of the specification has been
> updated with all combinations of PCI, MMIO and DT, ACPI.
>
> * Fix structures layout, they don't need the "packed" attribute anymore.
>
> * While we're at it, add domain parameter to DETACH request, and leave
> some padding. This way the next version, that adds PASID support,
> won't have to introduce a "DETACH2" request to stay backward
> compatible.
>
> * Require virtio device 1.0+. Remove legacy transport notes from the
> specification.
>
> * Request timeout is now only enabled with DEBUG.
>
> * The patch for VFIO Kconfig (previously patch 5/5) is in next.
>
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.8 [3] (currently based on 4.19-rc7 but rebasing onto
> next only produced a trivial conflict). Branch virtio-iommu/devel
> contains a few patches that I'd like to send once the base is upstream:
>
> * virtio-iommu as a module. It got *much* nicer after Rob's probe
> deferral rework, but I still have a bug to fix when re-loading the
> virtio-iommu module.
>
> * ACPI support requires a minor IORT spec update (reservation of node
> ID). I think it should be easier to obtain once the device and drivers
> are upstream.
>
> [1] Virtio-iommu specification v0.8, diff from v0.7, and sources
>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
>     http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
>     http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.7-v0.8.pdf
>
> [2] [PATCH v2 0/5] Add virtio-iommu driver
>     https://www.spinics.net/lists/kvm/msg170655.html
>
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8
>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8
>
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   PCI: OF: allow endpoints to bypass the iommu
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
>
>  .../devicetree/bindings/virtio/iommu.txt |   66 +
>  .../devicetree/bindings/virtio/mmio.txt  |   30 +
>  MAINTAINERS                              |    7 +
>  drivers/iommu/Kconfig                    |   11 +
>  drivers/iommu/Makefile                   |    1 +
>  drivers/iommu/virtio-iommu.c             | 1171 +++++++++++++++++
>  drivers/pci/of.c                         |   14 +-
>  include/uapi/linux/virtio_ids.h          |    1 +
>  include/uapi/linux/virtio_iommu.h        |  159 +++
>  9 files changed, 1457 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h

This all looks good to me. Minor nits:
- I think DEBUG mode is best just removed for now
- Slightly wrong patch splitup causing a misaligned structure
  in uapi until all patches are applied.

You should Cc Bjorn on the pci change - I'd like to see his ack on it
being merged through my tree.

And pls Cc the virtio-dev list on any virtio uapi changes.

At a feature level I have some ideas for more features we
could add, but for now I think I'll put this version in -next
while you iron out the above wrinkles. Hope you can make the
merge window.

> --
> 2.19.1
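[The cover letter's point about adding a domain parameter and padding to the
DETACH request is about forward compatibility of the wire format. As a rough
illustration only - the field names, widths and padding below are assumptions
for this sketch, not the actual v0.8 layout - a request with spare reserved
bytes can gain new fields later without introducing a DETACH2 request type:

        #include <linux/types.h>

        /* Hypothetical sketch only: not the virtio-iommu v0.8 wire format. */
        struct viommu_req_detach_sketch {
                __u8    type;           /* request type (DETACH)                */
                __u8    reserved[3];
                __le32  domain;         /* domain to detach the endpoint from   */
                __le32  endpoint;       /* endpoint ID                          */
                __u8    padding[8];     /* spare room, e.g. for a future PASID  */
                __u8    status;         /* written back by the device           */
        };

A later revision can carve new fields out of the reserved bytes, gated by a
feature bit, while the structure size and field offsets stay stable.]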
On 12/10/2018 18:00, Michael S. Tsirkin wrote:
> This all looks good to me. Minor nits:
> - I think DEBUG mode is best just removed for now
> - Slightly wrong patch splitup causing a misaligned structure
>   in uapi until all patches are applied.

Thanks a lot for the review, I'll fix these up and send a new version.

> You should Cc Bjorn on the pci change - I'd like to see his ack on it
> being merged through my tree.

Argh, I don't know how I missed him. However, patches 1-4 are device tree
changes and need acks from Rob or Mark (on Cc).

> And pls Cc the virtio-dev list on any virtio uapi changes.
>
> At a feature level I have some ideas for more features we
> could add, but for now I think I'll put this version in -next
> while you iron out the above wrinkles. Hope you can make the
> merge window.

Thanks, I also have some work lined up for hardware acceleration and
shared address spaces.

Jean
Hi Jean,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> Changes since v2 [2]:
>
> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> would like to phase out the MMIO transport. This produces a complex
> topology where the programming interface of the IOMMU could appear
> lower than the endpoints that it translates. It's not unheard of (e.g.
> AMD IOMMU), and the guest easily copes with this.
>
> The "Firmware description" section of the specification has been
> updated with all combinations of PCI, MMIO and DT, ACPI.

I have a question wrt the FW specification. The IOMMU consumes 1 slot in
the PCI domain and one needs to leave a RID hole in the iommu-map. It
is not obvious to me that this RID is always predictable given the pcie
enumeration mechanism. Generally we have a coarse grain mapping of RID
onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
to precisely identify the RID granted to the iommu. On QEMU this may
depend on the instantiation order of the virtio-pci device right?
So this does not look trivial to build this info. Isn't it possible to
do this exclusion at kernel level instead?

Thanks

Eric

> * Fix structures layout, they don't need the "packed" attribute anymore.
>
> * While we're at it, add domain parameter to DETACH request, and leave
> some padding. This way the next version, that adds PASID support,
> won't have to introduce a "DETACH2" request to stay backward
> compatible.
>
> * Require virtio device 1.0+. Remove legacy transport notes from the
> specification.
>
> * Request timeout is now only enabled with DEBUG.
>
> * The patch for VFIO Kconfig (previously patch 5/5) is in next.
>
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.8 [3] (currently based on 4.19-rc7 but rebasing onto
> next only produced a trivial conflict). Branch virtio-iommu/devel
> contains a few patches that I'd like to send once the base is upstream:
>
> * virtio-iommu as a module. It got *much* nicer after Rob's probe
> deferral rework, but I still have a bug to fix when re-loading the
> virtio-iommu module.
>
> * ACPI support requires a minor IORT spec update (reservation of node
> ID). I think it should be easier to obtain once the device and drivers
> are upstream.
>
> [1] Virtio-iommu specification v0.8, diff from v0.7, and sources
>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
>     http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
>     http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.7-v0.8.pdf
>
> [2] [PATCH v2 0/5] Add virtio-iommu driver
>     https://www.spinics.net/lists/kvm/msg170655.html
>
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8
>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8
>
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   PCI: OF: allow endpoints to bypass the iommu
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
>
>  .../devicetree/bindings/virtio/iommu.txt |   66 +
>  .../devicetree/bindings/virtio/mmio.txt  |   30 +
>  MAINTAINERS                              |    7 +
>  drivers/iommu/Kconfig                    |   11 +
>  drivers/iommu/Makefile                   |    1 +
>  drivers/iommu/virtio-iommu.c             | 1171 +++++++++++++++++
>  drivers/pci/of.c                         |   14 +-
>  include/uapi/linux/virtio_ids.h          |    1 +
>  include/uapi/linux/virtio_iommu.h        |  159 +++
>  9 files changed, 1457 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
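[To make the iommu-map arithmetic in the question above concrete: each
iommu-map entry <rid-base iommu iommu-base length> maps a contiguous range of
RIDs onto IOMMU endpoint IDs, and the RID taken by the virtio-iommu's own slot
has to fall outside every range. The standalone sketch below is illustrative
only - it is not kernel code, and the actual lookup is done by the OF/PCI
core:

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        /* One iommu-map entry: <rid-base  &iommu  iommu-base  length> */
        struct iommu_map_entry {
                uint32_t rid_base;
                uint32_t iommu_base;
                uint32_t length;
        };

        /*
         * Map a PCI requester ID through iommu-map: a RID inside an entry is
         * translated to an endpoint ID; a RID covered by no entry (the hole
         * reserved for the IOMMU's own slot) stays untranslated.
         */
        static bool map_rid(const struct iommu_map_entry *map, size_t n,
                            uint32_t rid, uint32_t *endpoint_id)
        {
                for (size_t i = 0; i < n; i++) {
                        if (rid >= map[i].rid_base &&
                            rid - map[i].rid_base < map[i].length) {
                                *endpoint_id = rid - map[i].rid_base +
                                               map[i].iommu_base;
                                return true;
                        }
                }
                return false;
        }

For example, with the hypothetical entries <0x0 ... 0x0 0x8> and
<0x9 ... 0x9 0xfff7>, every RID except 0x8 (devfn 00:01.0, a hypothetical
placement of the virtio-iommu) gets an endpoint ID, so the IOMMU's slot must
be known when the firmware tables are generated - which is exactly the
difficulty raised above.]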
On 16/10/2018 10:25, Auger Eric wrote:
> Hi Jean,
>
> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>> Implement the virtio-iommu driver, following specification v0.8 [1].
>> Changes since v2 [2]:
>>
>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>> would like to phase out the MMIO transport. This produces a complex
>> topology where the programming interface of the IOMMU could appear
>> lower than the endpoints that it translates. It's not unheard of (e.g.
>> AMD IOMMU), and the guest easily copes with this.
>>
>> The "Firmware description" section of the specification has been
>> updated with all combinations of PCI, MMIO and DT, ACPI.
>
> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
> the PCI domain and one needs to leave a RID hole in the iommu-map. It
> is not obvious to me that this RID is always predictable given the pcie
> enumeration mechanism. Generally we have a coarse grain mapping of RID
> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
> to precisely identify the RID granted to the iommu. On QEMU this may
> depend on the instantiation order of the virtio-pci device right?

Yes, although it should all happen before you boot the guest, since
there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
and use it for virtio-iommu later? Or generate the iommu-map at the same
time as generating the child node of the PCI RC?

> So this does not look trivial to build this info. Isn't it possible to
> do this exclusion at kernel level instead?

So in theory VIRTIO_F_IOMMU_PLATFORM already does that:

  VIRTIO_F_IOMMU_PLATFORM(33)
    This feature indicates that the device is behind an IOMMU that
    translates bus addresses from the device into physical addresses in
    memory. If this feature bit is set to 0, then the device emits
    physical addresses which are not translated further, even though an
    IOMMU may be present.

For better or for worse, the guest has to implement it. If this feature
bit is unset for virtio-iommu, it does DMA on the physical address
space, regardless of what the static topology description says.

In practice it doesn't quite work. If your iommu-map describes the IOMMU
as translating itself, Linux' OF code will wait for the IOMMU to be
probed before probing the IOMMU. Working around this with hacks is
possible, but I don't want to introduce more questionable code to OF and
device tree bindings if there is any other way.

Thanks,
Jean
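[For reference, this is roughly how the feature bit is consumed on the Linux
side - a minimal sketch rather than the in-tree logic (the real check lives in
drivers/virtio/virtio_ring.c and also handles platform quirks): a virtio
device's buffers only go through the DMA API, and therefore through an IOMMU,
when the device offered VIRTIO_F_IOMMU_PLATFORM.

        #include <linux/virtio.h>
        #include <linux/virtio_config.h>

        /*
         * Simplified sketch, not the in-tree implementation: decide whether
         * a virtio device's buffer addresses should go through the DMA API
         * (and hence be translated by an IOMMU).
         */
        static bool example_use_dma_api(const struct virtio_device *vdev)
        {
                return virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
        }

Which is why, as noted above, the guest ends up deciding: if the bit is not
offered, the device - including virtio-iommu itself - is assumed to emit
untranslated physical addresses.]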
Hi Jean,

On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> On 16/10/2018 10:25, Auger Eric wrote:
>> Hi Jean,
>>
>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>>> Implement the virtio-iommu driver, following specification v0.8 [1].
>>> Changes since v2 [2]:
>>>
>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>>> would like to phase out the MMIO transport. This produces a complex
>>> topology where the programming interface of the IOMMU could appear
>>> lower than the endpoints that it translates. It's not unheard of (e.g.
>>> AMD IOMMU), and the guest easily copes with this.
>>>
>>> The "Firmware description" section of the specification has been
>>> updated with all combinations of PCI, MMIO and DT, ACPI.
>>
>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
>> the PCI domain and one needs to leave a RID hole in the iommu-map. It
>> is not obvious to me that this RID is always predictable given the pcie
>> enumeration mechanism. Generally we have a coarse grain mapping of RID
>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
>> to precisely identify the RID granted to the iommu. On QEMU this may
>> depend on the instantiation order of the virtio-pci device right?
>
> Yes, although it should all happen before you boot the guest, since
> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> and use it for virtio-iommu later? Or generate the iommu-map at the same
> time as generating the child node of the PCI RC?

Even when cold-plugging the PCIe devices through qemu CLI, this depends
on the order of the pcie devices in the list I guess. I need to further
experiment.

>> So this does not look trivial to build this info. Isn't it possible to
>> do this exclusion at kernel level instead?
>
> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
>
>   VIRTIO_F_IOMMU_PLATFORM(33)
>     This feature indicates that the device is behind an IOMMU that
>     translates bus addresses from the device into physical addresses in
>     memory. If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even though an
>     IOMMU may be present.

This tells the driver to use the dma api, right? Effectively this
explicitly says whether the device is supposed to sit behind an IOMMU.

> For better or for worse, the guest has to implement it. If this feature
> bit is unset for virtio-iommu, it does DMA on the physical address
> space, regardless of what the static topology description says.
>
> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> as translating itself, Linux' OF code will wait for the IOMMU to be
> probed before probing the IOMMU. Working around this with hacks is
> possible, but I don't want to introduce more questionable code to OF and
> device tree bindings if there is any other way.

Hum ok. I cannot really comment on this.

I just wanted to raise this concern about RID identification.

Thanks

Eric

> Thanks,
> Jean
On 16/10/2018 21:31, Auger Eric wrote:
> Hi Jean,
>
> On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
>> On 16/10/2018 10:25, Auger Eric wrote:
>>> Hi Jean,
>>>
>>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>>>> Implement the virtio-iommu driver, following specification v0.8 [1].
>>>> Changes since v2 [2]:
>>>>
>>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>>>> would like to phase out the MMIO transport. This produces a complex
>>>> topology where the programming interface of the IOMMU could appear
>>>> lower than the endpoints that it translates. It's not unheard of (e.g.
>>>> AMD IOMMU), and the guest easily copes with this.
>>>>
>>>> The "Firmware description" section of the specification has been
>>>> updated with all combinations of PCI, MMIO and DT, ACPI.
>>>
>>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
>>> the PCI domain and one needs to leave a RID hole in the iommu-map. It
>>> is not obvious to me that this RID is always predictable given the pcie
>>> enumeration mechanism. Generally we have a coarse grain mapping of RID
>>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
>>> to precisely identify the RID granted to the iommu. On QEMU this may
>>> depend on the instantiation order of the virtio-pci device right?
>>
>> Yes, although it should all happen before you boot the guest, since
>> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
>> and use it for virtio-iommu later? Or generate the iommu-map at the same
>> time as generating the child node of the PCI RC?
>
> Even when cold-plugging the PCIe devices through qemu CLI, this depends
> on the order of the pcie devices in the list I guess. I need to further
> experiment.

Please let me know how it goes. I guess the problem will be the same for
building IORT tables? You're also going to need a hole in the ID
mappings of the PCI root complex node.

>>> So this does not look trivial to build this info. Isn't it possible to
>>> do this exclusion at kernel level instead?
>>
>> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
>>
>>   VIRTIO_F_IOMMU_PLATFORM(33)
>>     This feature indicates that the device is behind an IOMMU that
>>     translates bus addresses from the device into physical addresses in
>>     memory. If this feature bit is set to 0, then the device emits
>>     physical addresses which are not translated further, even though an
>>     IOMMU may be present.
>
> This tells the driver to use the dma api, right?

That's how Linux implements the bit: install custom DMA ops when the bit
is absent. But it doesn't work for everyone and has caused a lot of
debate (https://patchwork.ozlabs.org/cover/946708/).

> Effectively this
> explicitly says whether the device is supposed to sit behind an IOMMU.

Yes. It's quite strange if you consider hotpluggable hardware, since
those devices shouldn't get to choose whether they are managed by an
IOMMU. For the IOMMU itself, it should be fine.

>> For better or for worse, the guest has to implement it. If this feature
>> bit is unset for virtio-iommu, it does DMA on the physical address
>> space, regardless of what the static topology description says.
>>
>> In practice it doesn't quite work. If your iommu-map describes the IOMMU
>> as translating itself, Linux' OF code will wait for the IOMMU to be
>> probed before probing the IOMMU. Working around this with hacks is
>> possible, but I don't want to introduce more questionable code to OF and
>> device tree bindings if there is any other way.

> Hum ok. I cannot really comment on this.
>
> I just wanted to raise this concern about RID identification.

We can always try. Relaxing iommu-map further would be one additional
patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
drivers/iommu/of-iommu.c. I'd rather make it a separate RFC.

Since we need acks from an OF maintainer and I'd also like Joerg's
approval for adding a new driver to the IOMMU tree, I think it's too
late for this iteration. I wasn't intending for this to go into 4.20,
just have something to discuss at KVM forum next week.

Thanks,
Jean
On Wed, Oct 17, 2018 at 12:54:28PM +0100, Jean-Philippe Brucker wrote:
> On 16/10/2018 21:31, Auger Eric wrote:
> > Hi Jean,
> >
> > On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> >> On 16/10/2018 10:25, Auger Eric wrote:
> >>> Hi Jean,
> >>>
> >>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> >>>> Implement the virtio-iommu driver, following specification v0.8 [1].
> >>>> Changes since v2 [2]:
> >>>>
> >>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> >>>> would like to phase out the MMIO transport. This produces a complex
> >>>> topology where the programming interface of the IOMMU could appear
> >>>> lower than the endpoints that it translates. It's not unheard of (e.g.
> >>>> AMD IOMMU), and the guest easily copes with this.
> >>>>
> >>>> The "Firmware description" section of the specification has been
> >>>> updated with all combinations of PCI, MMIO and DT, ACPI.
> >>>
> >>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
> >>> the PCI domain and one needs to leave a RID hole in the iommu-map. It
> >>> is not obvious to me that this RID is always predictable given the pcie
> >>> enumeration mechanism. Generally we have a coarse grain mapping of RID
> >>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
> >>> to precisely identify the RID granted to the iommu. On QEMU this may
> >>> depend on the instantiation order of the virtio-pci device right?
> >>
> >> Yes, although it should all happen before you boot the guest, since
> >> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> >> and use it for virtio-iommu later? Or generate the iommu-map at the same
> >> time as generating the child node of the PCI RC?
> >
> > Even when cold-plugging the PCIe devices through qemu CLI, this depends
> > on the order of the pcie devices in the list I guess. I need to further
> > experiment.
>
> Please let me know how it goes. I guess the problem will be the same for
> building IORT tables? You're also going to need a hole in the ID
> mappings of the PCI root complex node.
>
> >>> So this does not look trivial to build this info. Isn't it possible to
> >>> do this exclusion at kernel level instead?
> >>
> >> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
> >>
> >>   VIRTIO_F_IOMMU_PLATFORM(33)
> >>     This feature indicates that the device is behind an IOMMU that
> >>     translates bus addresses from the device into physical addresses in
> >>     memory. If this feature bit is set to 0, then the device emits
> >>     physical addresses which are not translated further, even though an
> >>     IOMMU may be present.
> >
> > This tells the driver to use the dma api, right?
>
> That's how Linux implements the bit: install custom DMA ops when the bit
> is absent. But it doesn't work for everyone and has caused a lot of
> debate (https://patchwork.ozlabs.org/cover/946708/).
>
> > Effectively this
> > explicitly says whether the device is supposed to sit behind an IOMMU.
>
> Yes. It's quite strange if you consider hotpluggable hardware, since
> those devices shouldn't get to choose whether they are managed by an
> IOMMU. For the IOMMU itself, it should be fine.
>
> >> For better or for worse, the guest has to implement it. If this feature
> >> bit is unset for virtio-iommu, it does DMA on the physical address
> >> space, regardless of what the static topology description says.
> >>
> >> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> >> as translating itself, Linux' OF code will wait for the IOMMU to be
> >> probed before probing the IOMMU. Working around this with hacks is
> >> possible, but I don't want to introduce more questionable code to OF and
> >> device tree bindings if there is any other way.
>
> > Hum ok. I cannot really comment on this.
> >
> > I just wanted to raise this concern about RID identification.
>
> We can always try. Relaxing iommu-map further would be one additional
> patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
> drivers/iommu/of-iommu.c. I'd rather make it a separate RFC.
>
> Since we need acks from an OF maintainer and I'd also like Joerg's
> approval for adding a new driver to the IOMMU tree, I think it's too
> late for this iteration. I wasn't intending for this to go into 4.20,
> just have something to discuss at KVM forum next week.
>
> Thanks,
> Jean

OK then. I'd appreciate it if you mark patches that aren't intended to be
merged as RFC in the subject line.

Thanks!