mbox series

[v6,0/7] Add virtio-iommu driver

Message ID 20181211182104.18241-1-jean-philippe.brucker@arm.com (mailing list archive)
Headers show
Series Add virtio-iommu driver | expand

Message

Jean-Philippe Brucker Dec. 11, 2018, 6:20 p.m. UTC
Implement the virtio-iommu driver, following specification v0.9 [1].

Only minor changes since v5 [2]. I fixed issues reported by Michael and
added tags from Eric and Bharat. Thanks!

You can find Linux driver and kvmtool device on v0.9 branches [3],
module and x86 support on virtio-iommu/devel. Also tested with Eric's
QEMU device [4].

[1] Virtio-iommu specification v0.9, sources and pdf
    git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
    http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf

[2] [PATCH v5 0/7] Add virtio-iommu driver
    https://www.spinics.net/lists/linux-pci/msg78158.html

[3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
    git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9

[4] [RFC v9 00/17] VIRTIO-IOMMU device
    https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html

Jean-Philippe Brucker (7):
  dt-bindings: virtio-mmio: Add IOMMU description
  dt-bindings: virtio: Add virtio-pci-iommu node
  of: Allow the iommu-map property to omit untranslated devices
  PCI: OF: Initialize dev->fwnode appropriately
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue

 .../devicetree/bindings/virtio/iommu.txt      |   66 +
 .../devicetree/bindings/virtio/mmio.txt       |   30 +
 MAINTAINERS                                   |    7 +
 drivers/iommu/Kconfig                         |   11 +
 drivers/iommu/Makefile                        |    1 +
 drivers/iommu/virtio-iommu.c                  | 1157 +++++++++++++++++
 drivers/of/base.c                             |   10 +-
 drivers/pci/of.c                              |    7 +
 include/uapi/linux/virtio_ids.h               |    1 +
 include/uapi/linux/virtio_iommu.h             |  161 +++
 10 files changed, 1448 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

Comments

Christoph Hellwig Dec. 11, 2018, 6:31 p.m. UTC | #1
On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.9 [1].
> 
> Only minor changes since v5 [2]. I fixed issues reported by Michael and
> added tags from Eric and Bharat. Thanks!
> 
> You can find Linux driver and kvmtool device on v0.9 branches [3],
> module and x86 support on virtio-iommu/devel. Also tested with Eric's
> QEMU device [4].

Just curious, what is the use case for it?
Jean-Philippe Brucker Dec. 11, 2018, 7:07 p.m. UTC | #2
On 11/12/2018 18:31, Christoph Hellwig wrote:
> On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote:
>> Implement the virtio-iommu driver, following specification v0.9 [1].
>>
>> Only minor changes since v5 [2]. I fixed issues reported by Michael and
>> added tags from Eric and Bharat. Thanks!
>>
>> You can find Linux driver and kvmtool device on v0.9 branches [3],
>> module and x86 support on virtio-iommu/devel. Also tested with Eric's
>> QEMU device [4].
> 
> Just curious, what is the use case for it?

The main use case is assigning a device to guest userspace, using VFIO
both in the host and in the guest (the most cited example being DPDK).
There are others, and I wrote a little more about them last week:
https://www.spinics.net/lists/linux-pci/msg78529.html

Thanks,
Jean
Michael S. Tsirkin Dec. 12, 2018, 12:39 a.m. UTC | #3
On Tue, Dec 11, 2018 at 10:31:01AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote:
> > Implement the virtio-iommu driver, following specification v0.9 [1].
> > 
> > Only minor changes since v5 [2]. I fixed issues reported by Michael and
> > added tags from Eric and Bharat. Thanks!
> > 
> > You can find Linux driver and kvmtool device on v0.9 branches [3],
> > module and x86 support on virtio-iommu/devel. Also tested with Eric's
> > QEMU device [4].
> 
> Just curious, what is the use case for it?

If what I saw is any indication, it allows a very simple implementation
of page table shadowing on the host, at the cost of not being able to
accelerate with a hardware nested page tables support when available.
Joerg Roedel Dec. 12, 2018, 10:35 a.m. UTC | #4
Hi,

to make progress on this, we should first agree on the protocol used
between guest and host. I have a few points to discuss on the protocol
first.

On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote:
> [1] Virtio-iommu specification v0.9, sources and pdf
>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
>     http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf

Looking at this I wonder why it doesn't make the IOTLB visible to the
guest. the UNMAP requests seem to require that the TLB is already
flushed to make the unmap visible.

I think that will cost significant performance for both, vfio and
dma-iommu use-cases which both do (vfio at least to some degree),
deferred flushing.

I also wonder whether the protocol should implement a
protocol version handshake and iommu-feature set queries.

> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9

Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing
in this patch-set to make this work on x86?

Regards,

	Joerg
Michael S. Tsirkin Dec. 12, 2018, 2:35 p.m. UTC | #5
On Wed, Dec 12, 2018 at 11:35:45AM +0100, Joerg Roedel wrote:
> Hi,
> 
> to make progress on this, we should first agree on the protocol used
> between guest and host. I have a few points to discuss on the protocol
> first.
> 
> On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote:
> > [1] Virtio-iommu specification v0.9, sources and pdf
> >     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
> >     http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> 
> Looking at this I wonder why it doesn't make the IOTLB visible to the
> guest. the UNMAP requests seem to require that the TLB is already
> flushed to make the unmap visible.
> 
> I think that will cost significant performance for both, vfio and
> dma-iommu use-cases which both do (vfio at least to some degree),
> deferred flushing.
> 
> I also wonder whether the protocol should implement a
> protocol version handshake

virtio has a builtin version handshake so devices don't need to.

> and iommu-feature set queries.
> 
> > [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
> >     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
> 
> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing
> in this patch-set to make this work on x86?

And I wonder about pcc too.

> Regards,
> 
> 	Joerg
Jean-Philippe Brucker Dec. 13, 2018, 12:50 p.m. UTC | #6
Hi Joerg,

On 12/12/2018 10:35, Joerg Roedel wrote:
> Hi,
> 
> to make progress on this, we should first agree on the protocol used
> between guest and host. I have a few points to discuss on the protocol
> first.
> 
> On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote:
>> [1] Virtio-iommu specification v0.9, sources and pdf
>>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
>>     http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> 
> Looking at this I wonder why it doesn't make the IOTLB visible to the
> guest. the UNMAP requests seem to require that the TLB is already
> flushed to make the unmap visible.
> 
> I think that will cost significant performance for both, vfio and
> dma-iommu use-cases which both do (vfio at least to some degree),
> deferred flushing.

We already do deferred flush: UNMAP requests are added to the queue by
iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the
host only on iotlb_sync(), or when the request queue is full.

> I also wonder whether the protocol should implement a
> protocol version handshake and iommu-feature set queries.

With the virtio transport there is a handshake when the device (IOMMU)
is initialized, through feature bits and global config fields. Feature
bits are made of both transport-specific features, including the version
number, and device-specific features defined in section 2.3 of the above
document (the transport is described in the virtio 1.0 specification).
The device presents features that it supports in a register, and the
driver masks out the feature bits that it doesn't support. Then the
driver sets the global status to FEATURES_OK and initialization continues.

In addition virtio-iommu has per-endpoint features through the PROBE
request, since the vIOMMU may manage hardware (VFIO) and software
(virtio) endpoints at the same time, which don't have the same DMA
capabilities (different IOVA ranges, page granularity, reserved ranges,
pgtable sharing, etc). At the moment this is a one-way probe, not a
handshake. The device simply fills the properties of each endpoint, but
the driver doesn't have to ack them. Initially there was a way to
negotiate each PROBE property but it was deemed unnecessary during
review. By leaving a few spare bits in the property headers I made sure
it can be added back with a feature bit if we ever need it.

>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
>>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
> 
> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing
> in this patch-set to make this work on x86?

You should be able to access it here:
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel

That branch contains missing bits for x86 support:

* ACPI support. We have the code but it's waiting for an IORT spec
update, to reserve the IORT node ID. I expect it to take a while, given
that I'm alone requesting a change for something that's not upstream or
in hardware.

* DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm
not sure how to implement the glue that sets dma_ops properly.

Thanks,
Jean
Christoph Hellwig Dec. 13, 2018, 2:17 p.m. UTC | #7
On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:
> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm
> not sure how to implement the glue that sets dma_ops properly.

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops
Michael S. Tsirkin Dec. 19, 2018, 11:09 p.m. UTC | #8
On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:
> >> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
> >>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
> > 
> > Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing
> > in this patch-set to make this work on x86?
> 
> You should be able to access it here:
> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel
> 
> That branch contains missing bits for x86 support:
> 
> * ACPI support. We have the code but it's waiting for an IORT spec
> update, to reserve the IORT node ID. I expect it to take a while, given
> that I'm alone requesting a change for something that's not upstream or
> in hardware.

Frankly I think you should take a hard look at just getting the data
needed from the PCI device itself.  You don't need to depend on virtio,
it can be a small driver that gets you that data from the device config
space and then just goes away.

If you want help with writing such a small driver let me know.

If there's an advantage to virtio-iommu then that would be its
portability, and it all goes out of the window because
of dependencies on ACPI and DT and OF and the rest of the zoo.


> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm
> not sure how to implement the glue that sets dma_ops properly.
> 
> Thanks,
> Jean

OK so IIUC you are looking into Christoph's suggestions to fix that up?

There's still a bit of time left before the merge window,
maybe you can make above changes.
Jean-Philippe Brucker Dec. 20, 2018, 5:59 p.m. UTC | #9
On 19/12/2018 23:09, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:
>>>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
>>>>      git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
>>>
>>> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing
>>> in this patch-set to make this work on x86?
>>
>> You should be able to access it here:
>> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel
>>
>> That branch contains missing bits for x86 support:
>>
>> * ACPI support. We have the code but it's waiting for an IORT spec
>> update, to reserve the IORT node ID. I expect it to take a while, given
>> that I'm alone requesting a change for something that's not upstream or
>> in hardware.
> 
> Frankly I think you should take a hard look at just getting the data
> needed from the PCI device itself.  You don't need to depend on virtio,
> it can be a small driver that gets you that data from the device config
> space and then just goes away.
> 
> If you want help with writing such a small driver let me know.
> 
> If there's an advantage to virtio-iommu then that would be its
> portability, and it all goes out of the window because
> of dependencies on ACPI and DT and OF and the rest of the zoo.

But the portable solutions are ACPI and DT.

Describing the DMA dependency through a device would require the guest
to probe the device before all others. How do we communicate this?
* pass a kernel parameter saying something like "probe_first=00:01.0"
* make sure that the PCI root complex is probed before any other
platform device (since the IOMMU can manage DMA of platform devices).
* change DT, ACPI and PCI core code to handle this probe_first kernel
parameter.

Better go with something standard, that any OS and hypervisor knows how
to use, and that other IOMMU devices already use.

>> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm
>> not sure how to implement the glue that sets dma_ops properly.
>>
>> Thanks,
>> Jean
> 
> OK so IIUC you are looking into Christoph's suggestions to fix that up?

Eventually yes. I'll give it a try next year, once the dma-iommu changes
are on the list. It's not a priority for me, given that x86 already has
a pvIOMMU with VT-d, and that Arm still needs one. It shouldn't block
this series.

> There's still a bit of time left before the merge window,
> maybe you can make above changes.

I'll wait to see if Joerg has other concerns about the design or the
code, and resend in January. I think that IOMMU driver changes should go
through his tree.

Thanks,
Jean
Michael S. Tsirkin Dec. 20, 2018, 6:17 p.m. UTC | #10
On Thu, Dec 20, 2018 at 05:59:46PM +0000, Jean-Philippe Brucker wrote:
> On 19/12/2018 23:09, Michael S. Tsirkin wrote:
> > On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:
> >>>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1
> >>>>      git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
> >>>
> >>> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing
> >>> in this patch-set to make this work on x86?
> >>
> >> You should be able to access it here:
> >> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel
> >>
> >> That branch contains missing bits for x86 support:
> >>
> >> * ACPI support. We have the code but it's waiting for an IORT spec
> >> update, to reserve the IORT node ID. I expect it to take a while, given
> >> that I'm alone requesting a change for something that's not upstream or
> >> in hardware.
> > 
> > Frankly I think you should take a hard look at just getting the data
> > needed from the PCI device itself.  You don't need to depend on virtio,
> > it can be a small driver that gets you that data from the device config
> > space and then just goes away.
> > 
> > If you want help with writing such a small driver let me know.
> > 
> > If there's an advantage to virtio-iommu then that would be its
> > portability, and it all goes out of the window because
> > of dependencies on ACPI and DT and OF and the rest of the zoo.
> 
> But the portable solutions are ACPI and DT.
> 
> Describing the DMA dependency through a device would require the guest
> to probe the device before all others. How do we communicate this?
> * pass a kernel parameter saying something like "probe_first=00:01.0"
> * make sure that the PCI root complex is probed before any other
> platform device (since the IOMMU can manage DMA of platform devices).

My idea was to just find and probe the specific device.

> * change DT, ACPI and PCI core code to handle this probe_first kernel
> parameter.
> 
> Better go with something standard, that any OS and hypervisor knows how
> to use, and that other IOMMU devices already use.
> 
> >> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm
> >> not sure how to implement the glue that sets dma_ops properly.
> >>
> >> Thanks,
> >> Jean
> > 
> > OK so IIUC you are looking into Christoph's suggestions to fix that up?
> 
> Eventually yes. I'll give it a try next year, once the dma-iommu changes
> are on the list. It's not a priority for me, given that x86 already has
> a pvIOMMU with VT-d, and that Arm still needs one.


Well that's a kind of a weak usecase, isn't it?
Can we just build VTD on ARM?


diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d9a25715650e..009fa98e9363 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -174,7 +174,7 @@ config DMAR_TABLE
 
 config INTEL_IOMMU
 	bool "Support for Intel IOMMU using DMA Remapping Devices"
-	depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC)
+	depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC || ARM)
 	select IOMMU_API
 	select IOMMU_IOVA
 	select NEED_DMA_MAP_STATE


didn't try this one ...



> It shouldn't block
> this series.
> 
> > There's still a bit of time left before the merge window,
> > maybe you can make above changes.
> 
> I'll wait to see if Joerg has other concerns about the design or the
> code, and resend in January. I think that IOMMU driver changes should go
> through his tree.
> 
> Thanks,
> Jean

Sorry which changes do you mean?
Joerg Roedel Jan. 11, 2019, 12:28 p.m. UTC | #11
Hi Jean-Philippe,

On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:
> We already do deferred flush: UNMAP requests are added to the queue by
> iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the
> host only on iotlb_sync(), or when the request queue is full.

So the mappings can stay in place until iotlb_sync() returns? What
happens when the guest sends a map-request for a region it sent and
unmap request before, but did not call iotlb_sync inbetween?

Regards,

	Joerg
Jean-Philippe Brucker Jan. 11, 2019, 1 p.m. UTC | #12
On 11/01/2019 12:28, Joerg Roedel wrote:
> Hi Jean-Philippe,
> 
> On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:
>> We already do deferred flush: UNMAP requests are added to the queue by
>> iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the
>> host only on iotlb_sync(), or when the request queue is full.
> 
> So the mappings can stay in place until iotlb_sync() returns? What
> happens when the guest sends a map-request for a region it sent and
> unmap request before, but did not call iotlb_sync inbetween?

At that point the unmap is still in the request queue, and the host will
handle it before getting to the map request. For correctness requests
are necessarily handled in-order by the host. So if the map and unmap
refer to the same domain and IOVA, the host will remove the old mapping
before creating the new one.

Thanks,
Jean