diff mbox series

[XEN,RFC,v6,02/11] docs/designs: Add a design document for PV-IOMMU

Message ID ddffc703f6919e8d8004cd58a682682e1e86ec80.1739785339.git.teddy.astie@vates.tech (mailing list archive)
State New
Headers show
Series IOMMU subsystem redesign and PV-IOMMU interface | expand

Commit Message

Teddy Astie Feb. 17, 2025, 10:18 a.m. UTC
Some operating systems want to use IOMMU to implement various features (e.g
VFIO) or DMA protection.
This patch introduce a proposal for IOMMU paravirtualization for Dom0.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 docs/designs/pv-iommu.md | 116 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 docs/designs/pv-iommu.md

Comments

Frediano Ziglio Feb. 19, 2025, 12:02 p.m. UTC | #1
On 17/02/2025 10:18, Teddy Astie wrote:
> Some operating systems want to use IOMMU to implement various features (e.g
> VFIO) or DMA protection.
> This patch introduce a proposal for IOMMU paravirtualization for Dom0.
> 
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
>   docs/designs/pv-iommu.md | 116 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 116 insertions(+)
>   create mode 100644 docs/designs/pv-iommu.md
> 
> diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
> new file mode 100644
> index 0000000000..7df9fa0b94
> --- /dev/null
> +++ b/docs/designs/pv-iommu.md
> @@ -0,0 +1,116 @@
> +# IOMMU paravirtualization for Dom0
> +
> +Status: Experimental
> +
> +# Background
> +
> +By default, Xen only uses the IOMMU for itself, either to make device adress
> +space coherent with guest adress space (x86 HVM/PVH) or to prevent devices

typo: adress -> address

> +from doing DMA outside it's expected memory regions including the hypervisor
> +(x86 PV).
> +
> +A limitation is that guests (especially privildged ones) may want to use

typo: privildged -> privileged

> +IOMMU hardware in order to implement features such as DMA protection and
> +VFIO [1] as IOMMU functionality is not available outside of the hypervisor
> +currently.
> +
> +[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
> +
> +# Design
> +
> +The operating system may want to have access to various IOMMU features such as
> +context management and DMA remapping. We can create a new hypercall that allows
> +the guest to have access to a new paravirtualized IOMMU interface.
> +
> +This feature is only meant to be available for the Dom0, as DomU have some
> +emulated devices that can't be managed on Xen side and are not hardware, we
> +can't rely on the hardware IOMMU to enforce DMA remapping.
> +
> +This interface is exposed under the `iommu_op` hypercall.
> +
> +In addition, Xen domains are modified in order to allow existence of several
> +IOMMU context including a default one that implement default behavior (e.g
> +hardware assisted paging) and can't be modified by guest. DomU cannot have
> +contexts, and therefore act as if they only have the default domain.
> +
> +Each IOMMU context within a Xen domain is identified using a domain-specific
> +context number that is used in the Xen IOMMU subsystem and the hypercall
> +interface.
> +
> +The number of IOMMU context a domain is specified by either the toolstack or
> +the domain itself.

I don't understand what you want express with the above sentence.
Maybe it's just me.

> +
> +# IOMMU operations
> +
> +## Initialize PV-IOMMU
> +
> +Initialize PV-IOMMU for the domain.
> +It can only be called once.
> +

Could this operation be done automatically on first context allocation ?

> +## Alloc context
> +
> +Create a new IOMMU context for the guest and return the context number to the
> +guest.
> +Fail if the IOMMU context limit of the guest is reached.
> +
> +A flag can be specified to create a identity mapping.
> +
> +## Free context
> +
> +Destroy a IOMMU context created previously.
> +It is not possible to free the default context.
> +
> +Reattach context devices to default context if specified by the guest.
> +
> +Fail if there is a device in the context and reattach-to-default flag is not
> +specified.
> +
> +## Reattach device
> +
> +Reattach a device to another IOMMU context (including the default one).
> +The target IOMMU context number must be valid and the context allocated.
> +
> +The guest needs to specify a PCI SBDF of a device he has access to.
> +
> +## Map/unmap page
> +
> +Map/unmap a page on a context.
> +The guest needs to specify a gfn and target dfn to map.
> +
> +Refuse to create the mapping if one already exist for the same dfn.
> +
> +## Lookup page
> +
> +Get the gfn mapped by a specific dfn.
> +
> +## Remote command
> +
> +Make a PV-IOMMU operation on behalf of another domain.
> +Especially useful for implementing IOMMU emulation (e.g using QEMU)
> +or initializing PV-IOMMU with enforced limits.
> +
> +# Implementation considerations
> +
> +## Hypercall batching
> +
> +In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
> +be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).
> +

I suppose that batching also implies preemption.

> +## Hardware without IOMMU support
> +
> +Operating system needs to be aware on PV-IOMMU capability, and whether it is
> +able to make contexts. However, some operating system may critically fail in
> +case they are able to make a new IOMMU context. Which is supposed to happen
> +if no IOMMU hardware is available.
> +
> +The hypercall interface needs a interface to advertise the ability to create
> +and manage IOMMU contexts including the amount of context the guest is able
> +to use. Using these informations, the Dom0 may decide whether to use or not
> +the PV-IOMMU interface.
> +
> +## Page pool for contexts
> +
> +In order to prevent unexpected starving on the hypervisor memory with a
> +buggy Dom0. We can preallocate the pages the contexts will use and make
> +map/unmap use these pages instead of allocating them dynamically.
> +

Regards,
   Frediano
Teddy Astie Feb. 19, 2025, 2:01 p.m. UTC | #2
Hello Frediano,

Ok for typos fixes

Le 19/02/2025 à 13:02, Frediano Ziglio a écrit :
> On 17/02/2025 10:18, Teddy Astie wrote:
>> +Each IOMMU context within a Xen domain is identified using a domain- 
>> specific
>> +context number that is used in the Xen IOMMU subsystem and the hypercall
>> +interface.
>> +
>> +The number of IOMMU context a domain is specified by either the 
>> toolstack or
>> +the domain itself.
> 
> I don't understand what you want express with the above sentence.
> Maybe it's just me.
> 
>> +
>> +# IOMMU operations
>> +
>> +## Initialize PV-IOMMU
>> +
>> +Initialize PV-IOMMU for the domain.
>> +It can only be called once.
>> +
> 
> Could this operation be done automatically on first context allocation ?
> 

For initializing PV-IOMMU, you need to pass some additional parameters 
(memory/context limits). To avoid a guest from initializing with 
arbitrary limits, it can also be done by the toolstack (e.g domain 
builder) to enforce some specific limitations as this initialization can 
only be done once.

>> +## Alloc context
>> +
>> +Create a new IOMMU context for the guest and return the context 
>> number to the
>> +guest.
>> +Fail if the IOMMU context limit of the guest is reached.
>> +
>> +A flag can be specified to create a identity mapping.
>> +
>> +## Free context
>> +
>> +Destroy a IOMMU context created previously.
>> +It is not possible to free the default context.
>> +
>> +Reattach context devices to default context if specified by the guest.
>> +
>> +Fail if there is a device in the context and reattach-to-default flag 
>> is not
>> +specified.
>> +
>> +## Reattach device
>> +
>> +Reattach a device to another IOMMU context (including the default one).
>> +The target IOMMU context number must be valid and the context allocated.
>> +
>> +The guest needs to specify a PCI SBDF of a device he has access to.
>> +
>> +## Map/unmap page
>> +
>> +Map/unmap a page on a context.
>> +The guest needs to specify a gfn and target dfn to map.
>> +
>> +Refuse to create the mapping if one already exist for the same dfn.
>> +
>> +## Lookup page
>> +
>> +Get the gfn mapped by a specific dfn.
>> +
>> +## Remote command
>> +
>> +Make a PV-IOMMU operation on behalf of another domain.
>> +Especially useful for implementing IOMMU emulation (e.g using QEMU)
>> +or initializing PV-IOMMU with enforced limits.
>> +
>> +# Implementation considerations
>> +
>> +## Hypercall batching
>> +
>> +In order to prevent unneeded hypercalls and IOMMU flushing, it is 
>> advisable to
>> +be able to batch some critical IOMMU operations (e.g map/unmap 
>> multiple pages).
>> +
> 
> I suppose that batching also implies preemption.
> 

Yes, the current implementation does it, but I haven't updated to doc on 
that aspect.

>> +## Hardware without IOMMU support
>> +
>> +Operating system needs to be aware on PV-IOMMU capability, and 
>> whether it is
>> +able to make contexts. However, some operating system may critically 
>> fail in
>> +case they are able to make a new IOMMU context. Which is supposed to 
>> happen
>> +if no IOMMU hardware is available.
>> +
>> +The hypercall interface needs a interface to advertise the ability to 
>> create
>> +and manage IOMMU contexts including the amount of context the guest 
>> is able
>> +to use. Using these informations, the Dom0 may decide whether to use 
>> or not
>> +the PV-IOMMU interface.
>> +
>> +## Page pool for contexts
>> +
>> +In order to prevent unexpected starving on the hypervisor memory with a
>> +buggy Dom0. We can preallocate the pages the contexts will use and make
>> +map/unmap use these pages instead of allocating them dynamically.
>> +
> 
> Regards,
>    Frediano
> 

Thanks
Teddy



Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
diff mbox series

Patch

diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 0000000000..7df9fa0b94
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,116 @@ 
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make device adress
+space coherent with guest adress space (x86 HVM/PVH) or to prevent devices
+from doing DMA outside it's expected memory regions including the hypervisor
+(x86 PV).
+
+A limitation is that guests (especially privildged ones) may want to use
+IOMMU hardware in order to implement features such as DMA protection and
+VFIO [1] as IOMMU functionality is not available outside of the hypervisor
+currently.
+
+[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to have access to a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available for the Dom0, as DomU have some
+emulated devices that can't be managed on Xen side and are not hardware, we
+can't rely on the hardware IOMMU to enforce DMA remapping.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified in order to allow existence of several
+IOMMU context including a default one that implement default behavior (e.g
+hardware assisted paging) and can't be modified by guest. DomU cannot have
+contexts, and therefore act as if they only have the default domain.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU context a domain is specified by either the toolstack or
+the domain itself.
+
+# IOMMU operations
+
+## Initialize PV-IOMMU
+
+Initialize PV-IOMMU for the domain.
+It can only be called once.
+
+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fail if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create a identity mapping.
+
+## Free context
+
+Destroy a IOMMU context created previously.
+It is not possible to free the default context.
+
+Reattach context devices to default context if specified by the guest.
+
+Fail if there is a device in the context and reattach-to-default flag is not
+specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify a PCI SBDF of a device he has access to.
+
+## Map/unmap page
+
+Map/unmap a page on a context.
+The guest needs to specify a gfn and target dfn to map.
+
+Refuse to create the mapping if one already exist for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped by a specific dfn.
+
+## Remote command
+
+Make a PV-IOMMU operation on behalf of another domain.
+Especially useful for implementing IOMMU emulation (e.g using QEMU)
+or initializing PV-IOMMU with enforced limits.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).
+
+## Hardware without IOMMU support
+
+Operating system needs to be aware on PV-IOMMU capability, and whether it is
+able to make contexts. However, some operating system may critically fail in
+case they are able to make a new IOMMU context. Which is supposed to happen
+if no IOMMU hardware is available.
+
+The hypercall interface needs a interface to advertise the ability to create
+and manage IOMMU contexts including the amount of context the guest is able
+to use. Using these informations, the Dom0 may decide whether to use or not
+the PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent unexpected starving on the hypervisor memory with a
+buggy Dom0. We can preallocate the pages the contexts will use and make
+map/unmap use these pages instead of allocating them dynamically.
+