Message ID | 20181211182104.18241-1-jean-philippe.brucker@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | Add virtio-iommu driver | expand |
On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: > Implement the virtio-iommu driver, following specification v0.9 [1]. > > Only minor changes since v5 [2]. I fixed issues reported by Michael and > added tags from Eric and Bharat. Thanks! > > You can find Linux driver and kvmtool device on v0.9 branches [3], > module and x86 support on virtio-iommu/devel. Also tested with Eric's > QEMU device [4]. Just curious, what is the use case for it?
On 11/12/2018 18:31, Christoph Hellwig wrote: > On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: >> Implement the virtio-iommu driver, following specification v0.9 [1]. >> >> Only minor changes since v5 [2]. I fixed issues reported by Michael and >> added tags from Eric and Bharat. Thanks! >> >> You can find Linux driver and kvmtool device on v0.9 branches [3], >> module and x86 support on virtio-iommu/devel. Also tested with Eric's >> QEMU device [4]. > > Just curious, what is the use case for it? The main use case is assigning a device to guest userspace, using VFIO both in the host and in the guest (the most cited example being DPDK). There are others, and I wrote a little more about them last week: https://www.spinics.net/lists/linux-pci/msg78529.html Thanks, Jean
On Tue, Dec 11, 2018 at 10:31:01AM -0800, Christoph Hellwig wrote: > On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: > > Implement the virtio-iommu driver, following specification v0.9 [1]. > > > > Only minor changes since v5 [2]. I fixed issues reported by Michael and > > added tags from Eric and Bharat. Thanks! > > > > You can find Linux driver and kvmtool device on v0.9 branches [3], > > module and x86 support on virtio-iommu/devel. Also tested with Eric's > > QEMU device [4]. > > Just curious, what is the use case for it? If what I saw is any indication, it allows a very simple implementation of page table shadowing on the host, at the cost of not being able to accelerate with a hardware nested page tables support when available.
Hi, to make progress on this, we should first agree on the protocol used between guest and host. I have a few points to discuss on the protocol first. On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: > [1] Virtio-iommu specification v0.9, sources and pdf > git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9 > http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf Looking at this I wonder why it doesn't make the IOTLB visible to the guest. the UNMAP requests seem to require that the TLB is already flushed to make the unmap visible. I think that will cost significant performance for both, vfio and dma-iommu use-cases which both do (vfio at least to some degree), deferred flushing. I also wonder whether the protocol should implement a protocol version handshake and iommu-feature set queries. > [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 > git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing in this patch-set to make this work on x86? Regards, Joerg
On Wed, Dec 12, 2018 at 11:35:45AM +0100, Joerg Roedel wrote: > Hi, > > to make progress on this, we should first agree on the protocol used > between guest and host. I have a few points to discuss on the protocol > first. > > On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: > > [1] Virtio-iommu specification v0.9, sources and pdf > > git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9 > > http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf > > Looking at this I wonder why it doesn't make the IOTLB visible to the > guest. the UNMAP requests seem to require that the TLB is already > flushed to make the unmap visible. > > I think that will cost significant performance for both, vfio and > dma-iommu use-cases which both do (vfio at least to some degree), > deferred flushing. > > I also wonder whether the protocol should implement a > protocol version handshake virtio has a builtin version handshake so devices don't need to. > and iommu-feature set queries. > > > [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 > > git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 > > Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing > in this patch-set to make this work on x86? And I wonder about pcc too. > Regards, > > Joerg
Hi Joerg, On 12/12/2018 10:35, Joerg Roedel wrote: > Hi, > > to make progress on this, we should first agree on the protocol used > between guest and host. I have a few points to discuss on the protocol > first. > > On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: >> [1] Virtio-iommu specification v0.9, sources and pdf >> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9 >> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf > > Looking at this I wonder why it doesn't make the IOTLB visible to the > guest. the UNMAP requests seem to require that the TLB is already > flushed to make the unmap visible. > > I think that will cost significant performance for both, vfio and > dma-iommu use-cases which both do (vfio at least to some degree), > deferred flushing. We already do deferred flush: UNMAP requests are added to the queue by iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the host only on iotlb_sync(), or when the request queue is full. > I also wonder whether the protocol should implement a > protocol version handshake and iommu-feature set queries. With the virtio transport there is a handshake when the device (IOMMU) is initialized, through feature bits and global config fields. Feature bits are made of both transport-specific features, including the version number, and device-specific features defined in section 2.3 of the above document (the transport is described in the virtio 1.0 specification). The device presents features that it supports in a register, and the driver masks out the feature bits that it doesn't support. Then the driver sets the global status to FEATURES_OK and initialization continues. In addition virtio-iommu has per-endpoint features through the PROBE request, since the vIOMMU may manage hardware (VFIO) and software (virtio) endpoints at the same time, which don't have the same DMA capabilities (different IOVA ranges, page granularity, reserved ranges, pgtable sharing, etc). At the moment this is a one-way probe, not a handshake. The device simply fills the properties of each endpoint, but the driver doesn't have to ack them. Initially there was a way to negotiate each PROBE property but it was deemed unnecessary during review. By leaving a few spare bits in the property headers I made sure it can be added back with a feature bit if we ever need it. >> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 >> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 > > Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing > in this patch-set to make this work on x86? You should be able to access it here: http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel That branch contains missing bits for x86 support: * ACPI support. We have the code but it's waiting for an IORT spec update, to reserve the IORT node ID. I expect it to take a while, given that I'm alone requesting a change for something that's not upstream or in hardware. * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm not sure how to implement the glue that sets dma_ops properly. Thanks, Jean
On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: > * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm > not sure how to implement the glue that sets dma_ops properly. http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops
On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: > >> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 > >> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 > > > > Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing > > in this patch-set to make this work on x86? > > You should be able to access it here: > http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel > > That branch contains missing bits for x86 support: > > * ACPI support. We have the code but it's waiting for an IORT spec > update, to reserve the IORT node ID. I expect it to take a while, given > that I'm alone requesting a change for something that's not upstream or > in hardware. Frankly I think you should take a hard look at just getting the data needed from the PCI device itself. You don't need to depend on virtio, it can be a small driver that gets you that data from the device config space and then just goes away. If you want help with writing such a small driver let me know. If there's an advantage to virtio-iommu then that would be its portability, and it all goes out of the window because of dependencies on ACPI and DT and OF and the rest of the zoo. > * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm > not sure how to implement the glue that sets dma_ops properly. > > Thanks, > Jean OK so IIUC you are looking into Christoph's suggestions to fix that up? There's still a bit of time left before the merge window, maybe you can make above changes.
On 19/12/2018 23:09, Michael S. Tsirkin wrote: > On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: >>>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 >>>> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 >>> >>> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing >>> in this patch-set to make this work on x86? >> >> You should be able to access it here: >> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel >> >> That branch contains missing bits for x86 support: >> >> * ACPI support. We have the code but it's waiting for an IORT spec >> update, to reserve the IORT node ID. I expect it to take a while, given >> that I'm alone requesting a change for something that's not upstream or >> in hardware. > > Frankly I think you should take a hard look at just getting the data > needed from the PCI device itself. You don't need to depend on virtio, > it can be a small driver that gets you that data from the device config > space and then just goes away. > > If you want help with writing such a small driver let me know. > > If there's an advantage to virtio-iommu then that would be its > portability, and it all goes out of the window because > of dependencies on ACPI and DT and OF and the rest of the zoo. But the portable solutions are ACPI and DT. Describing the DMA dependency through a device would require the guest to probe the device before all others. How do we communicate this? * pass a kernel parameter saying something like "probe_first=00:01.0" * make sure that the PCI root complex is probed before any other platform device (since the IOMMU can manage DMA of platform devices). * change DT, ACPI and PCI core code to handle this probe_first kernel parameter. Better go with something standard, that any OS and hypervisor knows how to use, and that other IOMMU devices already use. >> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm >> not sure how to implement the glue that sets dma_ops properly. >> >> Thanks, >> Jean > > OK so IIUC you are looking into Christoph's suggestions to fix that up? Eventually yes. I'll give it a try next year, once the dma-iommu changes are on the list. It's not a priority for me, given that x86 already has a pvIOMMU with VT-d, and that Arm still needs one. It shouldn't block this series. > There's still a bit of time left before the merge window, > maybe you can make above changes. I'll wait to see if Joerg has other concerns about the design or the code, and resend in January. I think that IOMMU driver changes should go through his tree. Thanks, Jean
On Thu, Dec 20, 2018 at 05:59:46PM +0000, Jean-Philippe Brucker wrote: > On 19/12/2018 23:09, Michael S. Tsirkin wrote: > > On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: > >>>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 > >>>> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 > >>> > >>> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing > >>> in this patch-set to make this work on x86? > >> > >> You should be able to access it here: > >> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel > >> > >> That branch contains missing bits for x86 support: > >> > >> * ACPI support. We have the code but it's waiting for an IORT spec > >> update, to reserve the IORT node ID. I expect it to take a while, given > >> that I'm alone requesting a change for something that's not upstream or > >> in hardware. > > > > Frankly I think you should take a hard look at just getting the data > > needed from the PCI device itself. You don't need to depend on virtio, > > it can be a small driver that gets you that data from the device config > > space and then just goes away. > > > > If you want help with writing such a small driver let me know. > > > > If there's an advantage to virtio-iommu then that would be its > > portability, and it all goes out of the window because > > of dependencies on ACPI and DT and OF and the rest of the zoo. > > But the portable solutions are ACPI and DT. > > Describing the DMA dependency through a device would require the guest > to probe the device before all others. How do we communicate this? > * pass a kernel parameter saying something like "probe_first=00:01.0" > * make sure that the PCI root complex is probed before any other > platform device (since the IOMMU can manage DMA of platform devices). My idea was to just find and probe the specific device. > * change DT, ACPI and PCI core code to handle this probe_first kernel > parameter. > > Better go with something standard, that any OS and hypervisor knows how > to use, and that other IOMMU devices already use. > > >> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm > >> not sure how to implement the glue that sets dma_ops properly. > >> > >> Thanks, > >> Jean > > > > OK so IIUC you are looking into Christoph's suggestions to fix that up? > > Eventually yes. I'll give it a try next year, once the dma-iommu changes > are on the list. It's not a priority for me, given that x86 already has > a pvIOMMU with VT-d, and that Arm still needs one. Well that's a kind of a weak usecase, isn't it? Can we just build VTD on ARM? diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index d9a25715650e..009fa98e9363 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -174,7 +174,7 @@ config DMAR_TABLE config INTEL_IOMMU bool "Support for Intel IOMMU using DMA Remapping Devices" - depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC) + depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC || ARM) select IOMMU_API select IOMMU_IOVA select NEED_DMA_MAP_STATE didn't try this one ... > It shouldn't block > this series. > > > There's still a bit of time left before the merge window, > > maybe you can make above changes. > > I'll wait to see if Joerg has other concerns about the design or the > code, and resend in January. I think that IOMMU driver changes should go > through his tree. > > Thanks, > Jean Sorry which changes do you mean?
Hi Jean-Philippe, On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: > We already do deferred flush: UNMAP requests are added to the queue by > iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the > host only on iotlb_sync(), or when the request queue is full. So the mappings can stay in place until iotlb_sync() returns? What happens when the guest sends a map-request for a region it sent and unmap request before, but did not call iotlb_sync inbetween? Regards, Joerg
On 11/01/2019 12:28, Joerg Roedel wrote: > Hi Jean-Philippe, > > On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: >> We already do deferred flush: UNMAP requests are added to the queue by >> iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the >> host only on iotlb_sync(), or when the request queue is full. > > So the mappings can stay in place until iotlb_sync() returns? What > happens when the guest sends a map-request for a region it sent and > unmap request before, but did not call iotlb_sync inbetween? At that point the unmap is still in the request queue, and the host will handle it before getting to the map request. For correctness requests are necessarily handled in-order by the host. So if the map and unmap refer to the same domain and IOVA, the host will remove the old mapping before creating the new one. Thanks, Jean