Message ID: 20181012145917.6840-1-jean-philippe.brucker@arm.com
Series: Add virtio-iommu driver
On Fri, Oct 12, 2018 at 03:59:10PM +0100, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> Changes since v2 [2]:
>
> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> would like to phase out the MMIO transport. This produces a complex
> topology where the programming interface of the IOMMU could appear
> lower than the endpoints that it translates. It's not unheard of (e.g.
> AMD IOMMU), and the guest easily copes with this.
>
> The "Firmware description" section of the specification has been
> updated with all combinations of PCI, MMIO and DT, ACPI.
>
> * Fix structures layout, they don't need the "packed" attribute anymore.
>
> * While we're at it, add domain parameter to DETACH request, and leave
> some padding. This way the next version, that adds PASID support,
> won't have to introduce a "DETACH2" request to stay backward
> compatible.
>
> * Require virtio device 1.0+. Remove legacy transport notes from the
> specification.
>
> * Request timeout is now only enabled with DEBUG.
>
> * The patch for VFIO Kconfig (previously patch 5/5) is in next.
>
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.8 [3] (currently based on 4.19-rc7 but rebasing onto
> next only produced a trivial conflict). Branch virtio-iommu/devel
> contains a few patches that I'd like to send once the base is upstream:
>
> * virtio-iommu as a module. It got *much* nicer after Rob's probe
> deferral rework, but I still have a bug to fix when re-loading the
> virtio-iommu module.
>
> * ACPI support requires a minor IORT spec update (reservation of node
> ID). I think it should be easier to obtain once the device and drivers
> are upstream.
>
> [1] Virtio-iommu specification v0.8, diff from v0.7, and sources
>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
>     http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
>     http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.7-v0.8.pdf
>
> [2] [PATCH v2 0/5] Add virtio-iommu driver
>     https://www.spinics.net/lists/kvm/msg170655.html
>
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8
>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8
>
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   PCI: OF: allow endpoints to bypass the iommu
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
>
>  .../devicetree/bindings/virtio/iommu.txt |   66 +
>  .../devicetree/bindings/virtio/mmio.txt  |   30 +
>  MAINTAINERS                              |    7 +
>  drivers/iommu/Kconfig                    |   11 +
>  drivers/iommu/Makefile                   |    1 +
>  drivers/iommu/virtio-iommu.c             | 1171 +++++++++++++++++
>  drivers/pci/of.c                         |   14 +-
>  include/uapi/linux/virtio_ids.h          |    1 +
>  include/uapi/linux/virtio_iommu.h        |  159 +++
>  9 files changed, 1457 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h

This all looks good to me. Minor nits:
- I think DEBUG mode is best just removed for now
- Slightly wrong patch splitup causing a misaligned structure
  in uapi until all patches are applied.

You should Cc Bjorn on the pci change - I'd like to see his ack on it
being merged through my tree.

And pls Cc the virtio-dev list on any virtio uapi changes.

At a feature level I have some ideas for more features we
could add, but for now I think I'll put this version in -next
while you iron out the above wrinkles. Hope you can make the
merge window.

> --
> 2.19.1
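[The cover letter's point about adding a domain parameter and padding to the
DETACH request is about forward compatibility of the wire format. As a rough
illustration only - the field names, widths and padding below are assumptions
for this sketch, not the actual v0.8 layout - a request with spare reserved
bytes can gain new fields later without introducing a DETACH2 request type:

        #include <linux/types.h>

        /* Hypothetical sketch only: not the virtio-iommu v0.8 wire format. */
        struct viommu_req_detach_sketch {
                __u8    type;           /* request type (DETACH)                */
                __u8    reserved[3];
                __le32  domain;         /* domain to detach the endpoint from   */
                __le32  endpoint;       /* endpoint ID                          */
                __u8    padding[8];     /* spare room, e.g. for a future PASID  */
                __u8    status;         /* written back by the device           */
        };

A later revision can carve new fields out of the reserved bytes, gated by a
feature bit, while the structure size and field offsets stay stable.]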
On 12/10/2018 18:00, Michael S. Tsirkin wrote:
> This all looks good to me. Minor nits:
> - I think DEBUG mode is best just removed for now
> - Slightly wrong patch splitup causing a misaligned structure
>   in uapi until all patches are applied.

Thanks a lot for the review, I'll fix these up and send a new version.

> You should Cc Bjorn on the pci change - I'd like to see his ack on it
> being merged through my tree.

Argh, I don't know how I missed him. However, patches 1-4 are device tree
changes and need acks from Rob or Mark (on Cc).

> And pls Cc the virtio-dev list on any virtio uapi changes.
>
> At a feature level I have some ideas for more features we
> could add, but for now I think I'll put this version in -next
> while you iron out the above wrinkles. Hope you can make the
> merge window.

Thanks, I also have some work lined up for hardware acceleration and
shared address spaces.

Jean
Hi Jean,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> Changes since v2 [2]:
>
> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> would like to phase out the MMIO transport. This produces a complex
> topology where the programming interface of the IOMMU could appear
> lower than the endpoints that it translates. It's not unheard of (e.g.
> AMD IOMMU), and the guest easily copes with this.
>
> The "Firmware description" section of the specification has been
> updated with all combinations of PCI, MMIO and DT, ACPI.

I have a question wrt the FW specification. The IOMMU consumes 1 slot in
the PCI domain and one needs to leave a RID hole in the iommu-map. It
is not obvious to me that this RID is always predictable given the pcie
enumeration mechanism. Generally we have a coarse grain mapping of RID
onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
to precisely identify the RID granted to the iommu. On QEMU this may
depend on the instantiation order of the virtio-pci device right?
So this does not look trivial to build this info. Isn't it possible to
do this exclusion at kernel level instead?

Thanks

Eric

> * Fix structures layout, they don't need the "packed" attribute anymore.
>
> * While we're at it, add domain parameter to DETACH request, and leave
> some padding. This way the next version, that adds PASID support,
> won't have to introduce a "DETACH2" request to stay backward
> compatible.
>
> * Require virtio device 1.0+. Remove legacy transport notes from the
> specification.
>
> * Request timeout is now only enabled with DEBUG.
>
> * The patch for VFIO Kconfig (previously patch 5/5) is in next.
>
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.8 [3] (currently based on 4.19-rc7 but rebasing onto
> next only produced a trivial conflict). Branch virtio-iommu/devel
> contains a few patches that I'd like to send once the base is upstream:
>
> * virtio-iommu as a module. It got *much* nicer after Rob's probe
> deferral rework, but I still have a bug to fix when re-loading the
> virtio-iommu module.
>
> * ACPI support requires a minor IORT spec update (reservation of node
> ID). I think it should be easier to obtain once the device and drivers
> are upstream.
>
> [1] Virtio-iommu specification v0.8, diff from v0.7, and sources
>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
>     http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
>     http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.7-v0.8.pdf
>
> [2] [PATCH v2 0/5] Add virtio-iommu driver
>     https://www.spinics.net/lists/kvm/msg170655.html
>
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8
>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8
>
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   PCI: OF: allow endpoints to bypass the iommu
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
>
>  .../devicetree/bindings/virtio/iommu.txt |   66 +
>  .../devicetree/bindings/virtio/mmio.txt  |   30 +
>  MAINTAINERS                              |    7 +
>  drivers/iommu/Kconfig                    |   11 +
>  drivers/iommu/Makefile                   |    1 +
>  drivers/iommu/virtio-iommu.c             | 1171 +++++++++++++++++
>  drivers/pci/of.c                         |   14 +-
>  include/uapi/linux/virtio_ids.h          |    1 +
>  include/uapi/linux/virtio_iommu.h        |  159 +++
>  9 files changed, 1457 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
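[To make the iommu-map arithmetic in the question above concrete: each
iommu-map entry <rid-base iommu iommu-base length> maps a contiguous range of
RIDs onto IOMMU endpoint IDs, and the RID taken by the virtio-iommu's own slot
has to fall outside every range. The standalone sketch below is illustrative
only - it is not kernel code, and the actual lookup is done by the OF/PCI
core:

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        /* One iommu-map entry: <rid-base  &iommu  iommu-base  length> */
        struct iommu_map_entry {
                uint32_t rid_base;
                uint32_t iommu_base;
                uint32_t length;
        };

        /*
         * Map a PCI requester ID through iommu-map: a RID inside an entry is
         * translated to an endpoint ID; a RID covered by no entry (the hole
         * reserved for the IOMMU's own slot) stays untranslated.
         */
        static bool map_rid(const struct iommu_map_entry *map, size_t n,
                            uint32_t rid, uint32_t *endpoint_id)
        {
                for (size_t i = 0; i < n; i++) {
                        if (rid >= map[i].rid_base &&
                            rid - map[i].rid_base < map[i].length) {
                                *endpoint_id = rid - map[i].rid_base +
                                               map[i].iommu_base;
                                return true;
                        }
                }
                return false;
        }

For example, with the hypothetical entries <0x0 ... 0x0 0x8> and
<0x9 ... 0x9 0xfff7>, every RID except 0x8 (devfn 00:01.0, a hypothetical
placement of the virtio-iommu) gets an endpoint ID, so the IOMMU's slot must
be known when the firmware tables are generated - which is exactly the
difficulty raised above.]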
On 16/10/2018 10:25, Auger Eric wrote:
> Hi Jean,
>
> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>> Implement the virtio-iommu driver, following specification v0.8 [1].
>> Changes since v2 [2]:
>>
>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>> would like to phase out the MMIO transport. This produces a complex
>> topology where the programming interface of the IOMMU could appear
>> lower than the endpoints that it translates. It's not unheard of (e.g.
>> AMD IOMMU), and the guest easily copes with this.
>>
>> The "Firmware description" section of the specification has been
>> updated with all combinations of PCI, MMIO and DT, ACPI.
>
> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
> the PCI domain and one needs to leave a RID hole in the iommu-map. It
> is not obvious to me that this RID is always predictable given the pcie
> enumeration mechanism. Generally we have a coarse grain mapping of RID
> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
> to precisely identify the RID granted to the iommu. On QEMU this may
> depend on the instantiation order of the virtio-pci device right?

Yes, although it should all happen before you boot the guest, since
there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
and use it for virtio-iommu later? Or generate the iommu-map at the same
time as generating the child node of the PCI RC?

> So this does not look trivial to build this info. Isn't it possible to
> do this exclusion at kernel level instead?

So in theory VIRTIO_F_IOMMU_PLATFORM already does that:

  VIRTIO_F_IOMMU_PLATFORM(33)
    This feature indicates that the device is behind an IOMMU that
    translates bus addresses from the device into physical addresses in
    memory. If this feature bit is set to 0, then the device emits
    physical addresses which are not translated further, even though an
    IOMMU may be present.

For better or for worse, the guest has to implement it. If this feature
bit is unset for virtio-iommu, it does DMA on the physical address
space, regardless of what the static topology description says.

In practice it doesn't quite work. If your iommu-map describes the IOMMU
as translating itself, Linux' OF code will wait for the IOMMU to be
probed before probing the IOMMU. Working around this with hacks is
possible, but I don't want to introduce more questionable code to OF and
device tree bindings if there is any other way.

Thanks,
Jean
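[For reference, this is roughly how the feature bit is consumed on the Linux
side - a minimal sketch rather than the in-tree logic (the real check lives in
drivers/virtio/virtio_ring.c and also handles platform quirks): a virtio
device's buffers only go through the DMA API, and therefore through an IOMMU,
when the device offered VIRTIO_F_IOMMU_PLATFORM.

        #include <linux/virtio.h>
        #include <linux/virtio_config.h>

        /*
         * Simplified sketch, not the in-tree implementation: decide whether
         * a virtio device's buffer addresses should go through the DMA API
         * (and hence be translated by an IOMMU).
         */
        static bool example_use_dma_api(const struct virtio_device *vdev)
        {
                return virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
        }

Which is why, as noted above, the guest ends up deciding: if the bit is not
offered, the device - including virtio-iommu itself - is assumed to emit
untranslated physical addresses.]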
Hi Jean,

On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> On 16/10/2018 10:25, Auger Eric wrote:
>> Hi Jean,
>>
>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>>> Implement the virtio-iommu driver, following specification v0.8 [1].
>>> Changes since v2 [2]:
>>>
>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>>> would like to phase out the MMIO transport. This produces a complex
>>> topology where the programming interface of the IOMMU could appear
>>> lower than the endpoints that it translates. It's not unheard of (e.g.
>>> AMD IOMMU), and the guest easily copes with this.
>>>
>>> The "Firmware description" section of the specification has been
>>> updated with all combinations of PCI, MMIO and DT, ACPI.
>>
>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
>> the PCI domain and one needs to leave a RID hole in the iommu-map. It
>> is not obvious to me that this RID is always predictable given the pcie
>> enumeration mechanism. Generally we have a coarse grain mapping of RID
>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
>> to precisely identify the RID granted to the iommu. On QEMU this may
>> depend on the instantiation order of the virtio-pci device right?
>
> Yes, although it should all happen before you boot the guest, since
> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> and use it for virtio-iommu later? Or generate the iommu-map at the same
> time as generating the child node of the PCI RC?

Even when cold-plugging the PCIe devices through qemu CLI, this depends
on the order of the pcie devices in the list I guess. I need to further
experiment.

>> So this does not look trivial to build this info. Isn't it possible to
>> do this exclusion at kernel level instead?
>
> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
>
>   VIRTIO_F_IOMMU_PLATFORM(33)
>     This feature indicates that the device is behind an IOMMU that
>     translates bus addresses from the device into physical addresses in
>     memory. If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even though an
>     IOMMU may be present.

This tells the driver to use the dma api, right? Effectively this
explicitly says whether the device is supposed to sit behind an IOMMU.

> For better or for worse, the guest has to implement it. If this feature
> bit is unset for virtio-iommu, it does DMA on the physical address
> space, regardless of what the static topology description says.
>
> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> as translating itself, Linux' OF code will wait for the IOMMU to be
> probed before probing the IOMMU. Working around this with hacks is
> possible, but I don't want to introduce more questionable code to OF and
> device tree bindings if there is any other way.

Hum ok. I cannot really comment on this.

I just wanted to raise this concern about RID identification.

Thanks

Eric

> Thanks,
> Jean
On 16/10/2018 21:31, Auger Eric wrote:
> Hi Jean,
>
> On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
>> On 16/10/2018 10:25, Auger Eric wrote:
>>> Hi Jean,
>>>
>>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>>>> Implement the virtio-iommu driver, following specification v0.8 [1].
>>>> Changes since v2 [2]:
>>>>
>>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>>>> would like to phase out the MMIO transport. This produces a complex
>>>> topology where the programming interface of the IOMMU could appear
>>>> lower than the endpoints that it translates. It's not unheard of (e.g.
>>>> AMD IOMMU), and the guest easily copes with this.
>>>>
>>>> The "Firmware description" section of the specification has been
>>>> updated with all combinations of PCI, MMIO and DT, ACPI.
>>>
>>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
>>> the PCI domain and one needs to leave a RID hole in the iommu-map. It
>>> is not obvious to me that this RID is always predictable given the pcie
>>> enumeration mechanism. Generally we have a coarse grain mapping of RID
>>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
>>> to precisely identify the RID granted to the iommu. On QEMU this may
>>> depend on the instantiation order of the virtio-pci device right?
>>
>> Yes, although it should all happen before you boot the guest, since
>> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
>> and use it for virtio-iommu later? Or generate the iommu-map at the same
>> time as generating the child node of the PCI RC?
>
> Even when cold-plugging the PCIe devices through qemu CLI, this depends
> on the order of the pcie devices in the list I guess. I need to further
> experiment.

Please let me know how it goes. I guess the problem will be the same for
building IORT tables? You're also going to need a hole in the ID
mappings of the PCI root complex node.

>>> So this does not look trivial to build this info. Isn't it possible to
>>> do this exclusion at kernel level instead?
>>
>> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
>>
>>   VIRTIO_F_IOMMU_PLATFORM(33)
>>     This feature indicates that the device is behind an IOMMU that
>>     translates bus addresses from the device into physical addresses in
>>     memory. If this feature bit is set to 0, then the device emits
>>     physical addresses which are not translated further, even though an
>>     IOMMU may be present.
>
> This tells the driver to use the dma api, right?

That's how Linux implements the bit: install custom DMA ops when the bit
is absent. But it doesn't work for everyone and has caused a lot of
debate (https://patchwork.ozlabs.org/cover/946708/).

> Effectively this
> explicitly says whether the device is supposed to sit behind an IOMMU.

Yes. It's quite strange if you consider hotpluggable hardware, since
those devices shouldn't get to choose whether they are managed by an
IOMMU. For the IOMMU itself, it should be fine.

>> For better or for worse, the guest has to implement it. If this feature
>> bit is unset for virtio-iommu, it does DMA on the physical address
>> space, regardless of what the static topology description says.
>>
>> In practice it doesn't quite work. If your iommu-map describes the IOMMU
>> as translating itself, Linux' OF code will wait for the IOMMU to be
>> probed before probing the IOMMU. Working around this with hacks is
>> possible, but I don't want to introduce more questionable code to OF and
>> device tree bindings if there is any other way.

> Hum ok. I cannot really comment on this.
>
> I just wanted to raise this concern about RID identification.

We can always try. Relaxing iommu-map further would be one additional
patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
drivers/iommu/of-iommu.c. I'd rather make it a separate RFC.

Since we need acks from an OF maintainer and I'd also like Joerg's
approval for adding a new driver to the IOMMU tree, I think it's too
late for this iteration. I wasn't intending for this to go into 4.20,
just have something to discuss at KVM forum next week.

Thanks,
Jean
On Wed, Oct 17, 2018 at 12:54:28PM +0100, Jean-Philippe Brucker wrote:
> On 16/10/2018 21:31, Auger Eric wrote:
> > Hi Jean,
> >
> > On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> >> On 16/10/2018 10:25, Auger Eric wrote:
> >>> Hi Jean,
> >>>
> >>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> >>>> Implement the virtio-iommu driver, following specification v0.8 [1].
> >>>> Changes since v2 [2]:
> >>>>
> >>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> >>>> would like to phase out the MMIO transport. This produces a complex
> >>>> topology where the programming interface of the IOMMU could appear
> >>>> lower than the endpoints that it translates. It's not unheard of (e.g.
> >>>> AMD IOMMU), and the guest easily copes with this.
> >>>>
> >>>> The "Firmware description" section of the specification has been
> >>>> updated with all combinations of PCI, MMIO and DT, ACPI.
> >>>
> >>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
> >>> the PCI domain and one needs to leave a RID hole in the iommu-map. It
> >>> is not obvious to me that this RID is always predictable given the pcie
> >>> enumeration mechanism. Generally we have a coarse grain mapping of RID
> >>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
> >>> to precisely identify the RID granted to the iommu. On QEMU this may
> >>> depend on the instantiation order of the virtio-pci device right?
> >>
> >> Yes, although it should all happen before you boot the guest, since
> >> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> >> and use it for virtio-iommu later? Or generate the iommu-map at the same
> >> time as generating the child node of the PCI RC?
> >
> > Even when cold-plugging the PCIe devices through qemu CLI, this depends
> > on the order of the pcie devices in the list I guess. I need to further
> > experiment.
>
> Please let me know how it goes. I guess the problem will be the same for
> building IORT tables? You're also going to need a hole in the ID
> mappings of the PCI root complex node.
>
> >>> So this does not look trivial to build this info. Isn't it possible to
> >>> do this exclusion at kernel level instead?
> >>
> >> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
> >>
> >>   VIRTIO_F_IOMMU_PLATFORM(33)
> >>     This feature indicates that the device is behind an IOMMU that
> >>     translates bus addresses from the device into physical addresses in
> >>     memory. If this feature bit is set to 0, then the device emits
> >>     physical addresses which are not translated further, even though an
> >>     IOMMU may be present.
> >
> > This tells the driver to use the dma api, right?
>
> That's how Linux implements the bit: install custom DMA ops when the bit
> is absent. But it doesn't work for everyone and has caused a lot of
> debate (https://patchwork.ozlabs.org/cover/946708/).
>
> > Effectively this
> > explicitly says whether the device is supposed to sit behind an IOMMU.
>
> Yes. It's quite strange if you consider hotpluggable hardware, since
> those devices shouldn't get to choose whether they are managed by an
> IOMMU. For the IOMMU itself, it should be fine.
>
> >> For better or for worse, the guest has to implement it. If this feature
> >> bit is unset for virtio-iommu, it does DMA on the physical address
> >> space, regardless of what the static topology description says.
> >>
> >> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> >> as translating itself, Linux' OF code will wait for the IOMMU to be
> >> probed before probing the IOMMU. Working around this with hacks is
> >> possible, but I don't want to introduce more questionable code to OF and
> >> device tree bindings if there is any other way.
>
> > Hum ok. I cannot really comment on this.
> >
> > I just wanted to raise this concern about RID identification.
>
> We can always try. Relaxing iommu-map further would be one additional
> patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
> drivers/iommu/of-iommu.c. I'd rather make it a separate RFC.
>
> Since we need acks from an OF maintainer and I'd also like Joerg's
> approval for adding a new driver to the IOMMU tree, I think it's too
> late for this iteration. I wasn't intending for this to go into 4.20,
> just have something to discuss at KVM forum next week.
>
> Thanks,
> Jean

OK then. I'd appreciate it if you mark patches that aren't intended to be
merged as RFC in the subject line.

Thanks!