[XEN,RFC,v5,0/5] IOMMU subsystem redesign and PV-IOMMU interface

Message ID cover.1737470269.git.teddy.astie@vates.tech
Series IOMMU subsystem redesign and PV-IOMMU interface

Message

Teddy Astie Jan. 21, 2025, 4:13 p.m. UTC
This work was presented at Xen Summit 2024 during the
"IOMMU paravirtualization and Xen IOMMU subsystem rework"
design session.

Operating systems may want to have access to an IOMMU in order to do DMA
protection or implement certain features (e.g. VFIO on Linux).

VFIO support is mandatory for frameworks such as SPDK, which can be useful to
implement an alternative storage backend for virtual machines [1].

In this patch series, we add to Xen the ability to manage several IOMMU
contexts per domain and provide a new hypercall interface allowing guests
to manage those contexts.

The VT-d driver is updated to support these new features.
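To make "several contexts per domain" concrete, here is a minimal C sketch of the bookkeeping involved, under stated assumptions: context 0 is the domain's default context, every device is attached to exactly one context, and freeing a context that still has devices attached is refused. All sketch_* names, the fixed array sizes and the -EBUSY policy are hypothetical illustrations, not the structures or behaviour of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping model, not Xen's real structures. */
#define SKETCH_MAX_CONTEXTS 8
#define SKETCH_MAX_DEVICES  16

struct sketch_domain {
    bool ctx_in_use[SKETCH_MAX_CONTEXTS];     /* ctx 0 = default context */
    unsigned int dev_ctx[SKETCH_MAX_DEVICES]; /* context each device is attached to */
};

static void sketch_domain_init(struct sketch_domain *d)
{
    for (unsigned int i = 0; i < SKETCH_MAX_CONTEXTS; i++)
        d->ctx_in_use[i] = (i == 0);          /* only the default context exists */
    for (unsigned int i = 0; i < SKETCH_MAX_DEVICES; i++)
        d->dev_ctx[i] = 0;                    /* every device starts in context 0 */
}

/* Allocate a non-default context; returns its number, or -ENOSPC. */
static int sketch_ctx_alloc(struct sketch_domain *d)
{
    for (unsigned int i = 1; i < SKETCH_MAX_CONTEXTS; i++) {
        if (!d->ctx_in_use[i]) {
            d->ctx_in_use[i] = true;
            return (int)i;
        }
    }
    return -ENOSPC;
}

/* Move a device into an existing context. */
static int sketch_dev_attach(struct sketch_domain *d, unsigned int dev,
                             unsigned int ctx_no)
{
    assert(d->ctx_in_use[0]);                 /* the default context always exists */
    if (dev >= SKETCH_MAX_DEVICES || ctx_no >= SKETCH_MAX_CONTEXTS ||
        !d->ctx_in_use[ctx_no])
        return -EINVAL;
    d->dev_ctx[dev] = ctx_no;
    return 0;
}

/* Free a non-default context; refused while devices are still attached. */
static int sketch_ctx_free(struct sketch_domain *d, unsigned int ctx_no)
{
    if (ctx_no == 0 || ctx_no >= SKETCH_MAX_CONTEXTS || !d->ctx_in_use[ctx_no])
        return -EINVAL;
    for (unsigned int i = 0; i < SKETCH_MAX_DEVICES; i++)
        if (d->dev_ctx[i] == ctx_no)
            return -EBUSY;                    /* avoid leaving dangling devices */
    d->ctx_in_use[ctx_no] = false;
    return 0;
}
```

A real implementation would also reprogram the IOMMU whenever a device moves between contexts; this sketch only tracks ownership.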

[1] Using SPDK with the Xen hypervisor - FOSDEM 2023
---
Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

PCI passthrough now works on my side, but things still feel quite brittle.

Changed in v2 :
* fixed Xen crash when dumping IOMMU contexts (using the X debug key)
  with DomUs without IOMMU
* s/dettach/detach/
* removed some unused includes
* fix dangling devices in contexts with detach

Changed in v3 :
* lock map/unmap entirely in the hypercall
* prevent IOMMU operations on dying contexts (fix race condition)
* iommu_check_context+iommu_get_context -> iommu_get_context and check for NULL

Changed in v4 :
* Part of the initialization logic is moved to the domain or toolstack (IOMMU_init)
  + domain/toolstack now decides on "context count" and "pagetable pool size"
  + for now, all domains are able to initialize PV-IOMMU
* introduce "dom0-iommu=no-dma" to make the default context block all DMA
  (disables HAP and sync-pt), enforcing the use of PV-IOMMU for DMA.
  Can be used to properly expose "Pre-boot DMA protection"
* redesigned locking logic for contexts
  + contexts are accessed using iommu_get_context and released with iommu_put_context
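The get/put discipline described in the v3 and v4 notes can be illustrated with a small, single-threaded C sketch: a lookup that returns NULL for a missing or dying context (so callers need no separate check), a reference count dropped by the matching put, and teardown that only completes once no references remain. The sketch_* names, the dying flag and the refcount are assumptions for illustration; the series' own iommu_get_context(d, ctx_no) is only known here from the NULL check visible in the diff further down this thread, and real code would also need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model; real code needs a lock or atomics. */
#define SKETCH_NR_CTX 4

struct sketch_ctx {
    bool allocated;
    bool dying;          /* destruction started: no new users allowed */
    unsigned int refs;   /* outstanding sketch_get_context() references */
};

struct sketch_dom {
    struct sketch_ctx ctx[SKETCH_NR_CTX];
};

/* Returns NULL for an out-of-range, unallocated or dying context. */
static struct sketch_ctx *sketch_get_context(struct sketch_dom *d,
                                             unsigned int ctx_no)
{
    struct sketch_ctx *ctx;

    if (ctx_no >= SKETCH_NR_CTX)
        return NULL;
    ctx = &d->ctx[ctx_no];
    if (!ctx->allocated || ctx->dying)
        return NULL;
    ctx->refs++;
    return ctx;
}

/* Drops a reference taken by sketch_get_context(). */
static void sketch_put_context(struct sketch_ctx *ctx)
{
    assert(ctx->refs > 0);
    ctx->refs--;
}

/* Marks the context dying; returns true once it is actually released. */
static bool sketch_destroy_context(struct sketch_dom *d, unsigned int ctx_no)
{
    struct sketch_ctx *ctx;

    if (ctx_no >= SKETCH_NR_CTX || !d->ctx[ctx_no].allocated)
        return false;
    ctx = &d->ctx[ctx_no];
    ctx->dying = true;       /* new lookups now fail */
    if (ctx->refs)
        return false;        /* still in use: caller retries later */
    ctx->allocated = false;
    ctx->dying = false;
    return true;
}
```

A caller would pair every successful sketch_get_context() with sketch_put_context() and return -ENOENT on NULL, mirroring the error path in the diff below.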

Changed in v5 :
* various PCI Passthrough related fixes
  + rewrote parts of PCI Passthrough logic
  + various other related bug fixes
* simplified VT-d DID (for hardware) management by only having one map instead of two
  (the pseudo_domid map was previously used by the old quarantine code, then recycled for
   PV-IOMMU, in addition to another map tracking Domain<->VT-d DID; now a single map
   tracks both, which makes things simpler)
* reworked parts of Xen quarantine logic (needed for PCI Passthrough)
* added cf_check annotations
* some changes to PV-IOMMU headers (Alejandro)
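One way to picture the simplified VT-d DID management is a single table indexed by hardware DID that records which (domain, context) pair owns each DID, so both the forward lookup (domain/context to DID) and the reverse lookup (DID to owner) use the same structure. This is a hypothetical C sketch: the sketch_* names, the 16-entry table and the linear scan are assumptions, not the data structure used in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one table, indexed by hardware DID, records the owner. */
#define SKETCH_NR_DIDS 16

struct sketch_did_entry {
    bool used;
    uint16_t domid;   /* owning domain */
    uint16_t ctx_no;  /* owning context within that domain */
};

static struct sketch_did_entry did_map[SKETCH_NR_DIDS];

/* Returns the DID already bound to (domid, ctx_no), else binds a free one; -1 if full. */
static int sketch_did_get(uint16_t domid, uint16_t ctx_no)
{
    int free_did = -1;

    for (int did = 0; did < SKETCH_NR_DIDS; did++) {
        if (did_map[did].used) {
            if (did_map[did].domid == domid && did_map[did].ctx_no == ctx_no)
                return did;             /* forward lookup hit */
        } else if (free_did < 0) {
            free_did = did;             /* remember the first free slot */
        }
    }
    if (free_did >= 0)
        did_map[free_did] = (struct sketch_did_entry){ true, domid, ctx_no };
    return free_did;
}

/* Reverse lookup: which (domain, context) owns this DID? */
static bool sketch_did_lookup(int did, uint16_t *domid, uint16_t *ctx_no)
{
    if (did < 0 || did >= SKETCH_NR_DIDS || !did_map[did].used)
        return false;
    *domid = did_map[did].domid;
    *ctx_no = did_map[did].ctx_no;
    return true;
}

/* Releases a DID so it can be reused. */
static void sketch_did_put(int did)
{
    assert(did >= 0 && did < SKETCH_NR_DIDS && did_map[did].used);
    did_map[did].used = false;
}
```

In this picture a quarantine context is simply one more (domain, context) entry, so no separate pseudo-domain-ID table is needed.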

TODO:
* add stub implementations for bisection needs and non-ported IOMMU implementations
* fix some issues with no-dma+PV and grants
* complete "no-dma" mode (expose to toolstack, add documentation, ...)
* properly define nested mode and PASID support

* make new quarantine code more unity-region aware (isolate devices with
  different reserved regions using separate 'contexts')
* find a way to make PV-IOMMU work in DomUs (they don't see machine bdf)
* there are corner cases with PV-IOMMU and to-domain Xen PCI passthrough
  (e.g. pci-assignable-remove will reassign to context 0, while the driver
   expects the device to be in context X)

Teddy Astie (5):
  docs/designs: Add a design document for IOMMU subsystem redesign
  docs/designs: Add a design document for PV-IOMMU
  xen/public: Introduce PV-IOMMU hypercall interface
  IOMMU: Introduce redesigned IOMMU subsystem
  VT-d: Port IOMMU driver to new subsystem

 docs/designs/iommu-contexts.md       |  403 ++++++
 docs/designs/pv-iommu.md             |  116 ++
 xen/arch/x86/domain.c                |    2 +-
 xen/arch/x86/include/asm/arena.h     |   54 +
 xen/arch/x86/include/asm/iommu.h     |   58 +-
 xen/arch/x86/include/asm/pci.h       |   17 -
 xen/arch/x86/mm/p2m-ept.c            |    2 +-
 xen/arch/x86/pv/dom0_build.c         |    6 +-
 xen/arch/x86/tboot.c                 |    4 +-
 xen/common/Makefile                  |    1 +
 xen/common/memory.c                  |    4 +-
 xen/common/pv-iommu.c                |  539 ++++++++
 xen/drivers/passthrough/Makefile     |    3 +
 xen/drivers/passthrough/context.c    |  740 +++++++++++
 xen/drivers/passthrough/iommu.c      |  431 +++----
 xen/drivers/passthrough/pci.c        |  379 ++----
 xen/drivers/passthrough/quarantine.c |   49 +
 xen/drivers/passthrough/vtd/Makefile |    2 +-
 xen/drivers/passthrough/vtd/extern.h |   16 +-
 xen/drivers/passthrough/vtd/iommu.c  | 1692 ++++++++------------------
 xen/drivers/passthrough/vtd/iommu.h  |    4 +-
 xen/drivers/passthrough/vtd/qinval.c |    2 +-
 xen/drivers/passthrough/vtd/quirks.c |   20 +-
 xen/drivers/passthrough/x86/Makefile |    1 +
 xen/drivers/passthrough/x86/arena.c  |  157 +++
 xen/drivers/passthrough/x86/iommu.c  |  299 +++--
 xen/include/hypercall-defs.c         |    6 +
 xen/include/public/pv-iommu.h        |  343 ++++++
 xen/include/public/xen.h             |    1 +
 xen/include/xen/iommu.h              |  119 +-
 xen/include/xen/pci.h                |    3 +
 31 files changed, 3606 insertions(+), 1867 deletions(-)
 create mode 100644 docs/designs/iommu-contexts.md
 create mode 100644 docs/designs/pv-iommu.md
 create mode 100644 xen/arch/x86/include/asm/arena.h
 create mode 100644 xen/common/pv-iommu.c
 create mode 100644 xen/drivers/passthrough/context.c
 create mode 100644 xen/drivers/passthrough/quarantine.c
 create mode 100644 xen/drivers/passthrough/x86/arena.c
 create mode 100644 xen/include/public/pv-iommu.h

Comments

Marek Marczykowski-Górecki Jan. 23, 2025, 12:45 p.m. UTC | #1
On Tue, Jan 21, 2025 at 04:13:20PM +0000, Teddy Astie wrote:
> [...]

Thanks for the updated patches. I have run them through gitlab-ci, and
here are some observations:
- I needed to disable CONFIG_AMD_IOMMU (it fails to build, as expected at this point)
- I needed to disable pvshim (it fails to build)
- fails to build with clang: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373789/viewer#L3525
- gcc-ibt build fails: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373785#L1314
- fails to build for ARM (both 32 and 64) and PPC64
- QEMU smoke test panics with PV dom0; it looks like it runs on AMD, so it
  may be related to the disabled CONFIG_AMD_IOMMU, but I wouldn't expect
  it to panic on _PV_ dom0 boot...
- PVH dom0 fails to boot (on real hw) with a lot of VT-d faults: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373875
- PCI passthrough (with PV dom0) results in a lot of VT-d faults: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373881

Note this uses only this series, with an unmodified Linux (appears to be 6.1.19).
IIUC if one doesn't try to configure PV-IOMMU specifically (non-default
contexts) it should still work.

BTW Linux says it detected "Xen version 4.19." - shouldn't it report
4.20 already at this point in release cycle?

All results:
https://gitlab.com/xen-project/people/marmarek/xen/-/pipelines/1637849303
Jan Beulich Jan. 23, 2025, 12:48 p.m. UTC | #2
On 23.01.2025 13:45, Marek Marczykowski-Górecki wrote:
> BTW Linux says it detected "Xen version 4.19." - shouldn't it report
> 4.20 already at this point in release cycle?

Not only at this point, but throughout the release cycle. Yet I fear I
haven't seen such, so I wouldn't be able to look into it.

Jan
Marek Marczykowski-Górecki Jan. 23, 2025, 1:23 p.m. UTC | #3
On Thu, Jan 23, 2025 at 01:48:29PM +0100, Jan Beulich wrote:
> On 23.01.2025 13:45, Marek Marczykowski-Górecki wrote:
> > BTW Linux says it detected "Xen version 4.19." - shouldn't it report
> > 4.20 already at this point in release cycle?
> 
> Not only at this point, but throughout the release cycle. Yet I fear I
> haven't seen such, so I wouldn't be able to look into it.

Ah, my bad, it seems Teddy's branch is based on 4.19, not staging.
Marek Marczykowski-Górecki Jan. 23, 2025, 1:58 p.m. UTC | #4
On Thu, Jan 23, 2025 at 01:45:40PM +0100, Marek Marczykowski-Górecki wrote:
> [...]
> All results:
> https://gitlab.com/xen-project/people/marmarek/xen/-/pipelines/1637849303

FWIW the test run rebased on staging looks similar:
https://gitlab.com/xen-project/people/marmarek/xen/-/pipelines/1638019332
Teddy Astie Jan. 23, 2025, 3:46 p.m. UTC | #5
Hello Marek,
Thanks for your testing.

Le 23/01/2025 à 13:45, Marek Marczykowski-Górecki a écrit :
> Thanks for the updated patches. I have run them through gitlab-ci, and
> here are some observations:
> - I needed to disable CONFIG_AMD_IOMMU (it fails to build, as expected at this point)
> - I needed to disable pvshim (it fails to build)

> - fails to build with clang: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373789/viewer#L3525
> - gcc-ibt build fails: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373785#L1314

Looks like another cf_check related issue that I missed.

> - fails to build for ARM (both 32 and 64) and PPC64

This is expected, as with the AMD_IOMMU part.

> - QEMU smoke test panic with PV dom0, looks like it runs on AMD, so it
>    may be related to the disabled CONFIG_AMD_IOMMU, but I wouldn't expect
>    it to panic on _PV_ dom0 boot...

Looks like I broke something when no IOMMU is detected (I removed a
check that should be there).

This patch should fix it (tested with QEMU without IOMMU).

---
diff --git a/xen/drivers/passthrough/context.c b/xen/drivers/passthrough/context.c
index 6e68f840f3..98c84b439b 100644
--- a/xen/drivers/passthrough/context.c
+++ b/xen/drivers/passthrough/context.c
@@ -347,6 +347,10 @@ int iommu_iotlb_flush_all(struct domain *d, u16 ctx_no, unsigned int flush_flags
      struct iommu_context *ctx;
      int rc;

+    if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
+         !flush_flags )
+        return 0;
+
      if ( !(ctx = iommu_get_context(d, ctx_no)) )
          return -ENOENT;

---

> - PVH dom0 fails to boot (on real hw) with a lot of VT-d faults: https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/8931373875

I guess 00:02.0 is the iGPU. The addresses point to memory marked as reserved
in the E820 map (is it the framebuffer?), which should be reconfigured by the guest.

Is the guest dying (maybe due to the PVH Dom0 issue) before being able
to set up anything, leaving the GPU improperly configured?
I tested a plain Alpine 3.18 PVH Dom0 and the kernel crashed very early
(6.1.123-0-lts though).

Or something else is going wrong, as with PCI passthrough.

> - PCI passthrough (with PV dom0) results in a lot of VT-d faults: 
> Note this uses only this series, but plain Linux (appears to be 6.1.19).
> IIUC if one doesn't try to configure PV-IOMMU specifically (non-default
> contexts) it should still work.

Yes, and PV-IOMMU drivers will likely not fix the issues you are facing.

> 
> BTW Linux says it detected "Xen version 4.19." - shouldn't it report
> 4.20 already at this point in release cycle?
> 

It's probably because I mostly tested on Xen 4.19 (for practical reasons,
to keep the toolstack happy), but I will rebase it onto staging.

> All results:
> https://gitlab.com/xen-project/people/marmarek/xen/-/pipelines/1637849303
> 

Thanks

Teddy



Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech