Message ID | 20231113222118.825758-5-stewart.hildebrand@amd.com
---|---
State | New, archived
Series | Kconfig for PCI passthrough on ARM
On 13.11.2023 23:21, Stewart Hildebrand wrote:
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -503,6 +503,8 @@ struct arch_domain
>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>
> +#define arch_needs_vpci(d) ({ (void)(d); false; })

See my comments on the v5 thread on both this and ...

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1542,6 +1542,18 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
>      pcidevs_unlock();
>  }
>
> +static bool needs_vpci(const struct domain *d)
> +{
> +    if ( is_hardware_domain(d) )
> +        return false;

... this.

(It is generally a good idea to wait a little with sending new
versions, when you can't be sure yet whether the earlier discussion has
settled.)

Jan
On 11/14/23 04:13, Jan Beulich wrote:
> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>> --- a/xen/arch/x86/include/asm/domain.h
>> +++ b/xen/arch/x86/include/asm/domain.h
>> @@ -503,6 +503,8 @@ struct arch_domain
>>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>
>> +#define arch_needs_vpci(d) ({ (void)(d); false; })
>
> See my comments on the v5 thread on both this and ...

So, the goal here is to return true for a PVH dom0, and false otherwise
(for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d)
returns true for PVH, how about the following?

    /* TODO: re-visit when vPCI is enabled for PVH domUs. */
    #define arch_needs_vpci(d) ({                        \
        const struct domain *_d = (d);                   \
        is_hardware_domain(_d) && is_hvm_domain(_d); })

Link to v5 thread for reference [1]

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg00968.html

>
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1542,6 +1542,18 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
>>      pcidevs_unlock();
>>  }
>>
>> +static bool needs_vpci(const struct domain *d)
>> +{
>> +    if ( is_hardware_domain(d) )
>> +        return false;
>
> ... this.

I'll move this check to the Arm arch_needs_vpci() in
xen/arch/arm/include/asm/domain.h

> (It is generally a good idea to wait a little with sending new
> versions, when you can't be sure yet whether the earlier discussion has
> settled.)

(Sorry, I'll be better about this going forward.)

>
> Jan
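For reference, a minimal sketch (not the actual patch) of what the relocated
Arm check could look like, assuming the Arm semantics stay "domUs with
passthrough need vPCI, the hardware domain does not":

    /* xen/arch/arm/include/asm/domain.h -- illustrative sketch only */
    #define arch_needs_vpci(d) (!is_hardware_domain(d))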
On 30.11.2023 03:47, Stewart Hildebrand wrote: > On 11/14/23 04:13, Jan Beulich wrote: >> On 13.11.2023 23:21, Stewart Hildebrand wrote: >>> --- a/xen/arch/x86/include/asm/domain.h >>> +++ b/xen/arch/x86/include/asm/domain.h >>> @@ -503,6 +503,8 @@ struct arch_domain >>> #define has_vpit(d) (!!((d)->arch.emulation_flags & X86_EMU_PIT)) >>> #define has_pirq(d) (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ)) >>> >>> +#define arch_needs_vpci(d) ({ (void)(d); false; }) >> >> See my comments on the v5 thread on both this and ... > > So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following? > > /* TODO: re-visit when vPCI is enabled for PVH domUs. */ > #define arch_needs_vpci(d) ({ \ > const struct domain *_d = (d); \ > is_hardware_domain(_d) && is_hvm_domain(_d); }) Looks okay to me, except for the leading underscore in _d (see respective Misra guidelines, merely re-enforcing what the C standard says). Of course the double evaluate_nospec() isn't quite nice in the result, but I guess this isn't going to be used in many places? Jan
On 11/30/23 03:33, Jan Beulich wrote: > On 30.11.2023 03:47, Stewart Hildebrand wrote: >> On 11/14/23 04:13, Jan Beulich wrote: >>> On 13.11.2023 23:21, Stewart Hildebrand wrote: >>>> --- a/xen/arch/x86/include/asm/domain.h >>>> +++ b/xen/arch/x86/include/asm/domain.h >>>> @@ -503,6 +503,8 @@ struct arch_domain >>>> #define has_vpit(d) (!!((d)->arch.emulation_flags & X86_EMU_PIT)) >>>> #define has_pirq(d) (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ)) >>>> >>>> +#define arch_needs_vpci(d) ({ (void)(d); false; }) >>> >>> See my comments on the v5 thread on both this and ... >> >> So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following? >> >> /* TODO: re-visit when vPCI is enabled for PVH domUs. */ >> #define arch_needs_vpci(d) ({ \ >> const struct domain *_d = (d); \ >> is_hardware_domain(_d) && is_hvm_domain(_d); }) > > Looks okay to me, except for the leading underscore in _d (see respective > Misra guidelines, merely re-enforcing what the C standard says). Right. If I'm interpreting the standards correctly, it looks like a trailing underscore would work, seeing as we haven't adopted MISRA C:2012 Dir 4.5 (yet?). > Of course > the double evaluate_nospec() isn't quite nice in the result, but I guess > this isn't going to be used in many places? Only in XEN_DOMCTL_assign_device, as far as I'm aware. > > Jan
On 30.11.2023 18:06, Stewart Hildebrand wrote: > On 11/30/23 03:33, Jan Beulich wrote: >> On 30.11.2023 03:47, Stewart Hildebrand wrote: >>> On 11/14/23 04:13, Jan Beulich wrote: >>>> On 13.11.2023 23:21, Stewart Hildebrand wrote: >>>>> --- a/xen/arch/x86/include/asm/domain.h >>>>> +++ b/xen/arch/x86/include/asm/domain.h >>>>> @@ -503,6 +503,8 @@ struct arch_domain >>>>> #define has_vpit(d) (!!((d)->arch.emulation_flags & X86_EMU_PIT)) >>>>> #define has_pirq(d) (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ)) >>>>> >>>>> +#define arch_needs_vpci(d) ({ (void)(d); false; }) >>>> >>>> See my comments on the v5 thread on both this and ... >>> >>> So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following? >>> >>> /* TODO: re-visit when vPCI is enabled for PVH domUs. */ >>> #define arch_needs_vpci(d) ({ \ >>> const struct domain *_d = (d); \ >>> is_hardware_domain(_d) && is_hvm_domain(_d); }) >> >> Looks okay to me, except for the leading underscore in _d (see respective >> Misra guidelines, merely re-enforcing what the C standard says). > > Right. If I'm interpreting the standards correctly, it looks like a trailing underscore would work, seeing as we haven't adopted MISRA C:2012 Dir 4.5 (yet?). Adopting the respective Misra rule would only affirm that we should adhere to the C spec. Us being free-standing (and hence no runtime library involved) may have been an argument towards more relaxed treatment, but imo never was a good justification. And yes, trailing underscores is what I have been recommending, but some of the other maintainers don't really like them (without, iirc, indicating what else to use as an underlying naming scheme). Jan
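Taking the naming feedback into account, the macro would presumably end up
with a trailing underscore, along these lines (sketch only; the final naming
scheme was still under discussion at this point):

    /* TODO: re-visit when vPCI is enabled for PVH domUs. */
    #define arch_needs_vpci(d) ({                        \
        const struct domain *d_ = (d);                   \
        is_hardware_domain(d_) && is_hvm_domain(d_); })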
On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN(machine_sbdf);
>
> +        if ( needs_vpci(d) && !has_vpci(d) )
> +        {
> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> +                   &PCI_SBDF(seg, bus, devfn), d);
> +            ret = -EPERM;
> +            break;

I think this is likely too restrictive going forward. The current
approach is indeed to enable vPCI on a per-domain basis because that's
how PVH dom0 uses it, due to being unable to use ioreq servers.

If we start to expose vPCI support to guests the interface should be on
a per-device basis, so that vPCI could be enabled for some devices,
while others could still be handled by ioreq servers.

We might want to add a new flag to xen_domctl_assign_device (used by
XEN_DOMCTL_assign_device) in order to signal whether the device will
use vPCI.

Thanks, Roger.
On Fri, 1 Dec 2023, Roger Pau Monné wrote: > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > > bus = PCI_BUS(machine_sbdf); > > devfn = PCI_DEVFN(machine_sbdf); > > > > + if ( needs_vpci(d) && !has_vpci(d) ) > > + { > > + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > > + &PCI_SBDF(seg, bus, devfn), d); > > + ret = -EPERM; > > + break; > > I think this is likely too restrictive going forward. The current > approach is indeed to enable vPCI on a per-domain basis because that's > how PVH dom0 uses it, due to being unable to use ioreq servers. > > If we start to expose vPCI suport to guests the interface should be on > a per-device basis, so that vPCI could be enabled for some devices, > while others could still be handled by ioreq servers. > > We might want to add a new flag to xen_domctl_assign_device (used by > XEN_DOMCTL_assign_device) in order to signal whether the device will > use vPCI. Actually I don't think this is a good idea. I am all for flexibility but supporting multiple different configurations comes at an extra cost for both maintainers and contributors. I think we should try to reduce the amount of configurations we support rather than increasing them (especially on x86 where we have PV, PVH, HVM). I don't think we should enable IOREQ servers to handle PCI passthrough for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI Passthrough can be handled by vPCI just fine. I think this should be a good anti-feature to have (a goal to explicitly not add this feature) to reduce complexity. Unless you see a specific usecase to add support for it?
On 12/1/23 21:56, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>          bus = PCI_BUS(machine_sbdf);
>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>
>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>> +        {
>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>> +            ret = -EPERM;
>>> +            break;
>>
>> I think this is likely too restrictive going forward. The current
>> approach is indeed to enable vPCI on a per-domain basis because that's
>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>
>> If we start to expose vPCI suport to guests the interface should be on
>> a per-device basis, so that vPCI could be enabled for some devices,
>> while others could still be handled by ioreq servers.
>>
>> We might want to add a new flag to xen_domctl_assign_device (used by
>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>> use vPCI.
>
> Actually I don't think this is a good idea. I am all for flexibility but
> supporting multiple different configurations comes at an extra cost for
> both maintainers and contributors. I think we should try to reduce the
> amount of configurations we support rather than increasing them
> (especially on x86 where we have PV, PVH, HVM).
>
> I don't think we should enable IOREQ servers to handle PCI passthrough
> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> Passthrough can be handled by vPCI just fine. I think this should be a
> good anti-feature to have (a goal to explicitly not add this feature) to
> reduce complexity. Unless you see a specific usecase to add support for
> it?

Just to preemptively clarify: there is a use case for passthrough (vPCI)
and emulated virtio-pci devices (ioreq). However, the
XEN_DOMCTL_assign_device hypercall, where this check is added, is only
used for real physical hardware devices as far as I can tell. So I
agree, I can't see a use case for passing through some physical devices
with vPCI and some physical devices with qemu/ioreq.

With that said, the plumbing for a new flag does not appear to be
particularly complex. It may actually be simpler than introducing a
per-arch needs_vpci(d) heuristic.

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..2c38088a4772 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1113,6 +1113,7 @@ typedef struct pci_add_state {
     libxl_device_pci pci;
     libxl_domid pci_domid;
     int retries;
+    bool vpci;
 } pci_add_state;
 
 static void pci_add_qemu_trad_watch_state_cb(libxl__egc *egc,
@@ -1176,6 +1177,10 @@ static void do_pci_add(libxl__egc *egc,
         }
     }
 
+    if (type == LIBXL_DOMAIN_TYPE_PVH /* includes Arm guests */) {
+        pas->vpci = true;
+    }
+
     rc = 0;
 
 out:
@@ -1418,7 +1423,8 @@ static void pci_add_dm_done(libxl__egc *egc,
     unsigned long long start, end, flags, size;
     int irq, i;
     int r;
-    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED |
+                    (pas->vpci ? XEN_DOMCTL_DEV_USES_VPCI : 0);
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 2203725a2aa6..7786da1cf1e6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1630,7 +1630,7 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN(machine_sbdf);
 
-        if ( needs_vpci(d) && !has_vpci(d) )
+        if ( (flags & XEN_DOMCTL_DEV_USES_VPCI) && !has_vpci(d) )
         {
             printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
                    &PCI_SBDF(seg, bus, devfn), d);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 8b3ea62cae06..5735d47219bc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -527,6 +527,7 @@ struct xen_domctl_assign_device {
     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
     uint32_t flags;
 #define XEN_DOMCTL_DEV_RDM_RELAXED      1 /* assign only */
+#define XEN_DOMCTL_DEV_USES_VPCI        (1 << 1)
     union {
         struct {
             uint32_t machine_sbdf;   /* machine PCI ID of assigned device */
On 02.12.2023 03:56, Stefano Stabellini wrote: > On Fri, 1 Dec 2023, Roger Pau Monné wrote: >> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: >>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( >>> bus = PCI_BUS(machine_sbdf); >>> devfn = PCI_DEVFN(machine_sbdf); >>> >>> + if ( needs_vpci(d) && !has_vpci(d) ) >>> + { >>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", >>> + &PCI_SBDF(seg, bus, devfn), d); >>> + ret = -EPERM; >>> + break; >> >> I think this is likely too restrictive going forward. The current >> approach is indeed to enable vPCI on a per-domain basis because that's >> how PVH dom0 uses it, due to being unable to use ioreq servers. >> >> If we start to expose vPCI suport to guests the interface should be on >> a per-device basis, so that vPCI could be enabled for some devices, >> while others could still be handled by ioreq servers. >> >> We might want to add a new flag to xen_domctl_assign_device (used by >> XEN_DOMCTL_assign_device) in order to signal whether the device will >> use vPCI. > > Actually I don't think this is a good idea. I am all for flexibility but > supporting multiple different configurations comes at an extra cost for > both maintainers and contributors. I think we should try to reduce the > amount of configurations we support rather than increasing them > (especially on x86 where we have PV, PVH, HVM). > > I don't think we should enable IOREQ servers to handle PCI passthrough > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > Passthrough can be handled by vPCI just fine. I think this should be a > good anti-feature to have (a goal to explicitly not add this feature) to > reduce complexity. Unless you see a specific usecase to add support for > it? I could see very special devices to want accompanying by a special ioreq server. I could also see tandem pass-through/emulated device pairs to potentially be of use and require coordination in the same ioreq server. That said, I wouldn't insist on allowing a mix of vPCI and ioreq servers to be used until an actual use case (with code supporting it) actually appears. Jan
On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: > On Fri, 1 Dec 2023, Roger Pau Monné wrote: > > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > > > bus = PCI_BUS(machine_sbdf); > > > devfn = PCI_DEVFN(machine_sbdf); > > > > > > + if ( needs_vpci(d) && !has_vpci(d) ) > > > + { > > > + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > > > + &PCI_SBDF(seg, bus, devfn), d); > > > + ret = -EPERM; > > > + break; > > > > I think this is likely too restrictive going forward. The current > > approach is indeed to enable vPCI on a per-domain basis because that's > > how PVH dom0 uses it, due to being unable to use ioreq servers. > > > > If we start to expose vPCI suport to guests the interface should be on > > a per-device basis, so that vPCI could be enabled for some devices, > > while others could still be handled by ioreq servers. > > > > We might want to add a new flag to xen_domctl_assign_device (used by > > XEN_DOMCTL_assign_device) in order to signal whether the device will > > use vPCI. > > Actually I don't think this is a good idea. I am all for flexibility but > supporting multiple different configurations comes at an extra cost for > both maintainers and contributors. I think we should try to reduce the > amount of configurations we support rather than increasing them > (especially on x86 where we have PV, PVH, HVM). I think it's perfectly fine to initially require a domain to have all its devices either passed through using vPCI or ireqs, but the interface IMO should allow for such differentiation in the future. That's why I think introducing a domain wide vPCI flag might not be the best option going forward. It would be perfectly fine for XEN_DOMCTL_assign_device to set a domain wide vPCI flag, iow: if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) { if ( has_arch_pdevs(d) ) { printk("All passthrough devices must use the same backend\n"); return -EINVAL; } /* Set vPCI domain flag */ } We have already agreed that we want to aim for a setup where ioreqs and vPCI could be used for the same domain, but I guess you assumed that ioreqs would be used for emulated devices exclusively and vPCI for passthrough devices? Overall if we agree that ioreqs and vPCI should co-exist for a domain, I'm not sure there's much reason to limit ioreqs to only handle emulated devices, seems like an arbitrary limitation. > I don't think we should enable IOREQ servers to handle PCI passthrough > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > Passthrough can be handled by vPCI just fine. I think this should be a > good anti-feature to have (a goal to explicitly not add this feature) to > reduce complexity. Unless you see a specific usecase to add support for > it? There are passthrough devices (GPUs) that might require some extra mediation on dom0 (like the Intel GVT-g thing) that would force the usage of ioreqs to passthrough. It's important that the interfaces we introduce are correct IMO, because that ends up reflecting on the configuration options that we expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option gets placed there will ultimately influence how the option gets exposed in xl/libxl, and the interface there is relevant to keep stable for end user sanity. Thanks, Roger.
On 12/4/23 05:58, Roger Pau Monné wrote: > On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: >> On Fri, 1 Dec 2023, Roger Pau Monné wrote: >>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: >>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( >>>> bus = PCI_BUS(machine_sbdf); >>>> devfn = PCI_DEVFN(machine_sbdf); >>>> >>>> + if ( needs_vpci(d) && !has_vpci(d) ) >>>> + { >>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", >>>> + &PCI_SBDF(seg, bus, devfn), d); >>>> + ret = -EPERM; >>>> + break; >>> >>> I think this is likely too restrictive going forward. The current >>> approach is indeed to enable vPCI on a per-domain basis because that's >>> how PVH dom0 uses it, due to being unable to use ioreq servers. >>> >>> If we start to expose vPCI suport to guests the interface should be on >>> a per-device basis, so that vPCI could be enabled for some devices, >>> while others could still be handled by ioreq servers. >>> >>> We might want to add a new flag to xen_domctl_assign_device (used by >>> XEN_DOMCTL_assign_device) in order to signal whether the device will >>> use vPCI. >> >> Actually I don't think this is a good idea. I am all for flexibility but >> supporting multiple different configurations comes at an extra cost for >> both maintainers and contributors. I think we should try to reduce the >> amount of configurations we support rather than increasing them >> (especially on x86 where we have PV, PVH, HVM). > > I think it's perfectly fine to initially require a domain to have all > its devices either passed through using vPCI or ireqs, but the > interface IMO should allow for such differentiation in the future. > That's why I think introducing a domain wide vPCI flag might not be > the best option going forward. > > It would be perfectly fine for XEN_DOMCTL_assign_device to set a > domain wide vPCI flag, iow: > > if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) > { > if ( has_arch_pdevs(d) ) > { > printk("All passthrough devices must use the same backend\n"); > return -EINVAL; > } > > /* Set vPCI domain flag */ There is a vPCI initializer that would need to be called too (on Arm, to set up mmio handlers). It is normally called earlier during arch_domain_create(), but it looks to be okay to call here as long as it is done before the guest boots. d->options |= XEN_DOMCTL_CDF_vpci; domain_vpci_init(d); Perhaps the flag should be set inside domain_vpci_init(d). > } > > We have already agreed that we want to aim for a setup where ioreqs > and vPCI could be used for the same domain, but I guess you assumed > that ioreqs would be used for emulated devices exclusively and vPCI > for passthrough devices? > > Overall if we agree that ioreqs and vPCI should co-exist for a domain, > I'm not sure there's much reason to limit ioreqs to only handle > emulated devices, seems like an arbitrary limitation. > >> I don't think we should enable IOREQ servers to handle PCI passthrough >> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI >> Passthrough can be handled by vPCI just fine. I think this should be a >> good anti-feature to have (a goal to explicitly not add this feature) to >> reduce complexity. Unless you see a specific usecase to add support for >> it? > > There are passthrough devices (GPUs) that might require some extra > mediation on dom0 (like the Intel GVT-g thing) that would force the > usage of ioreqs to passthrough. 
> > It's important that the interfaces we introduce are correct IMO, > because that ends up reflecting on the configuration options that we > expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option > gets placed there will ultimately influence how the option gets > exposed in xl/libxl, and the interface there is relevant to keep > stable for end user sanity. > > Thanks, Roger.
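Combining Roger's pseudo-code with the initializer Stewart mentions, a hedged
sketch of how XEN_DOMCTL_assign_device could turn a per-device request into
the domain-wide vPCI state (XEN_DOMCTL_DEV_USES_VPCI is the hypothetical flag
discussed above, and whether domain_vpci_init() may safely run this long
after domain creation is exactly the open question):

    /* Sketch only: XEN_DOMCTL_assign_device handling in iommu_do_pci_domctl(). */
    if ( (flags & XEN_DOMCTL_DEV_USES_VPCI) && !has_vpci(d) )
    {
        if ( has_arch_pdevs(d) )
        {
            printk(XENLOG_G_ERR
                   "%pd: all passthrough devices must use the same backend\n", d);
            ret = -EINVAL;
            break;
        }

        /* First vPCI device for this domain: set the domain-wide flag... */
        d->options |= XEN_DOMCTL_CDF_vpci;
        /* ...and register the config space/MMIO traps that would normally be
         * set up from arch_domain_create(). */
        ret = domain_vpci_init(d);
        if ( ret )
            break;
    }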
On Mon, 4 Dec 2023, Roger Pau Monné wrote: > On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: > > On Fri, 1 Dec 2023, Roger Pau Monné wrote: > > > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > > > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > > > > bus = PCI_BUS(machine_sbdf); > > > > devfn = PCI_DEVFN(machine_sbdf); > > > > > > > > + if ( needs_vpci(d) && !has_vpci(d) ) > > > > + { > > > > + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > > > > + &PCI_SBDF(seg, bus, devfn), d); > > > > + ret = -EPERM; > > > > + break; > > > > > > I think this is likely too restrictive going forward. The current > > > approach is indeed to enable vPCI on a per-domain basis because that's > > > how PVH dom0 uses it, due to being unable to use ioreq servers. > > > > > > If we start to expose vPCI suport to guests the interface should be on > > > a per-device basis, so that vPCI could be enabled for some devices, > > > while others could still be handled by ioreq servers. > > > > > > We might want to add a new flag to xen_domctl_assign_device (used by > > > XEN_DOMCTL_assign_device) in order to signal whether the device will > > > use vPCI. > > > > Actually I don't think this is a good idea. I am all for flexibility but > > supporting multiple different configurations comes at an extra cost for > > both maintainers and contributors. I think we should try to reduce the > > amount of configurations we support rather than increasing them > > (especially on x86 where we have PV, PVH, HVM). > > I think it's perfectly fine to initially require a domain to have all > its devices either passed through using vPCI or ireqs, but the > interface IMO should allow for such differentiation in the future. > That's why I think introducing a domain wide vPCI flag might not be > the best option going forward. > > It would be perfectly fine for XEN_DOMCTL_assign_device to set a > domain wide vPCI flag, iow: > > if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) > { > if ( has_arch_pdevs(d) ) > { > printk("All passthrough devices must use the same backend\n"); > return -EINVAL; > } > > /* Set vPCI domain flag */ > } That would be fine by me, but maybe we can avoid this change too. I was imagining that vPCI would be enabled at domain creation, not at runtime. And that vPCI would be enabled by default for all PVH guests (once we are past the initial experimental phase.) > We have already agreed that we want to aim for a setup where ioreqs > and vPCI could be used for the same domain, but I guess you assumed > that ioreqs would be used for emulated devices exclusively and vPCI > for passthrough devices? Yes, that's right > Overall if we agree that ioreqs and vPCI should co-exist for a domain, > I'm not sure there's much reason to limit ioreqs to only handle > emulated devices, seems like an arbitrary limitation. Reply below > > I don't think we should enable IOREQ servers to handle PCI passthrough > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > > Passthrough can be handled by vPCI just fine. I think this should be a > > good anti-feature to have (a goal to explicitly not add this feature) to > > reduce complexity. Unless you see a specific usecase to add support for > > it? > > There are passthrough devices (GPUs) that might require some extra > mediation on dom0 (like the Intel GVT-g thing) that would force the > usage of ioreqs to passthrough. 
From an architectural perspective, I think it would be cleaner, simpler to maintain, and simpler to understand if Xen was the sole owner of the PCI Root Complex and PCI config space mediation (implemented with vPCI). IOREQ can be used for emulation and it works very well for that. At least in my mind, that makes things much simpler. I understand there are non-trivial cases, like virtual GPUs with hardware access, but I don't classify those as passthrough. That's because there isn't one device that gets fully assigned to the guest. Instead, there is an emulated device (hence IOREQ) with certain MMIO regions and interrupts that end up being directly mapped from real hardware. So I think it is natural in those cases to use IOREQ and it is also natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI perspective, I hope it will mostly look as if the device is assigned to Dom0. Even if it ends up being more complex than that, Rome wasn't built in one day, and I don't think we should try to solve this problem on day1 (as long as the interfaces are not stable interfaces). > It's important that the interfaces we introduce are correct IMO, > because that ends up reflecting on the configuration options that we > expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option > gets placed there will ultimately influence how the option gets > exposed in xl/libxl, and the interface there is relevant to keep > stable for end user sanity. I agree with you on the stable interfaces. The important part is not to introduce changes to stable interfaces that could limit us in the future. Specifically that includes xl and libxl, we need to be careful there. But I don't see a single per-domain vPCI enable/disable option as a problem. Let's say that in the future we have a mediated vGPU implementation: if it works together with vPCI then the per-domain vPCI option in libxl will be enabled (either explicitely or by default), if it doesn't then vPCI will be disabled (either explicitely or by the newer vGPU options.) For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait before adding more changes on top of them, not because I don't care about the mediated GPU problem (we do have something similar at AMD), but because I worry that if we try to change them now we might not do a good enough job. I would prefer to wait until we know more about the actual use case, ideally with code supporting it. I think the difference in points of views comes from the fact that I see vPCI as the default, QEMU only as a limited peripheral emulator (or mediator for the vGPU case) but not in control. vPCI and QEMU are not equal in my view. vPCI is in charge and always present if not in very uncommon setups (even if we decide to hook it inside Xen by using internal IOREQ interfaces). QEMU might come and go. Now that I am writing this, I realize this is also why I wasn't too happy with the idea of hooking vPCI using IOREQ. It makes them look as if they are the same, while I don't they should be considered at the same level of priority, criticality, safety, integration in the system, etc.
On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: > On Mon, 4 Dec 2023, Roger Pau Monné wrote: > > On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: > > > On Fri, 1 Dec 2023, Roger Pau Monné wrote: > > > > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > > > > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > > > > > bus = PCI_BUS(machine_sbdf); > > > > > devfn = PCI_DEVFN(machine_sbdf); > > > > > > > > > > + if ( needs_vpci(d) && !has_vpci(d) ) > > > > > + { > > > > > + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > > > > > + &PCI_SBDF(seg, bus, devfn), d); > > > > > + ret = -EPERM; > > > > > + break; > > > > > > > > I think this is likely too restrictive going forward. The current > > > > approach is indeed to enable vPCI on a per-domain basis because that's > > > > how PVH dom0 uses it, due to being unable to use ioreq servers. > > > > > > > > If we start to expose vPCI suport to guests the interface should be on > > > > a per-device basis, so that vPCI could be enabled for some devices, > > > > while others could still be handled by ioreq servers. > > > > > > > > We might want to add a new flag to xen_domctl_assign_device (used by > > > > XEN_DOMCTL_assign_device) in order to signal whether the device will > > > > use vPCI. > > > > > > Actually I don't think this is a good idea. I am all for flexibility but > > > supporting multiple different configurations comes at an extra cost for > > > both maintainers and contributors. I think we should try to reduce the > > > amount of configurations we support rather than increasing them > > > (especially on x86 where we have PV, PVH, HVM). > > > > I think it's perfectly fine to initially require a domain to have all > > its devices either passed through using vPCI or ireqs, but the > > interface IMO should allow for such differentiation in the future. > > That's why I think introducing a domain wide vPCI flag might not be > > the best option going forward. > > > > It would be perfectly fine for XEN_DOMCTL_assign_device to set a > > domain wide vPCI flag, iow: > > > > if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) > > { > > if ( has_arch_pdevs(d) ) > > { > > printk("All passthrough devices must use the same backend\n"); > > return -EINVAL; > > } > > > > /* Set vPCI domain flag */ > > } > > That would be fine by me, but maybe we can avoid this change too. I was > imagining that vPCI would be enabled at domain creation, not at runtime. > And that vPCI would be enabled by default for all PVH guests (once we > are past the initial experimental phase.) Then we don't even need a new CDF flag, and just enable vPCI when IOMMU is enabled? IOW: we can key the enabling of vPCI to XEN_DOMCTL_CDF_iommu for specific domain types? Maybe that's not so trivial on x86, as there's no x86 PVH domain type from the hypervisor PoV. > > > We have already agreed that we want to aim for a setup where ioreqs > > and vPCI could be used for the same domain, but I guess you assumed > > that ioreqs would be used for emulated devices exclusively and vPCI > > for passthrough devices? > > Yes, that's right > > > > Overall if we agree that ioreqs and vPCI should co-exist for a domain, > > I'm not sure there's much reason to limit ioreqs to only handle > > emulated devices, seems like an arbitrary limitation. > > Reply below > > > > > I don't think we should enable IOREQ servers to handle PCI passthrough > > > for PVH guests and/or guests with vPCI. 
If the domain has vPCI, PCI > > > Passthrough can be handled by vPCI just fine. I think this should be a > > > good anti-feature to have (a goal to explicitly not add this feature) to > > > reduce complexity. Unless you see a specific usecase to add support for > > > it? > > > > There are passthrough devices (GPUs) that might require some extra > > mediation on dom0 (like the Intel GVT-g thing) that would force the > > usage of ioreqs to passthrough. > > From an architectural perspective, I think it would be cleaner, simpler > to maintain, and simpler to understand if Xen was the sole owner of the > PCI Root Complex and PCI config space mediation (implemented with vPCI). > IOREQ can be used for emulation and it works very well for that. At > least in my mind, that makes things much simpler. But IOREQ already has all the code to mediate accesses to the PCI config space, and the interface to register separate servers for different PCI devices. We would then need to duplicate this internally for vPCI, so that vPCI could forward accesses to IOREQ just for IOREQ to forward to yet a different component? Seems like a lot of duplication for no benefit. > I understand there are non-trivial cases, like virtual GPUs with > hardware access, but I don't classify those as passthrough. That's > because there isn't one device that gets fully assigned to the guest. > Instead, there is an emulated device (hence IOREQ) with certain MMIO > regions and interrupts that end up being directly mapped from real > hardware. > > So I think it is natural in those cases to use IOREQ and it is also > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI > perspective, I hope it will mostly look as if the device is assigned to > Dom0. Even if it ends up being more complex than that, Rome wasn't > built in one day, and I don't think we should try to solve this problem > on day1 (as long as the interfaces are not stable interfaces). I don't see IOREQ as dealing explicitly with emulation. Yes, it does allow for emulators to be implemented in user-space, but at the end it's just an interface that allows forwarding accesses to certain resources (for the case we are speaking about here, PCI config space) to entities that registered as handlers. vPCI OTOH just deals with a very specific resource (PCI config space) and only allows internal handlers to be registered on a byte granularity. So your proposal would be to implement a hierarchy like the one on the diagram below: ┌────────┐ ┌──────────┐ ┌──────────────────┐ │ Memory │ │ IO Ports │ │ PCI config space │ └───────┬┘ └┬─────────┘ └───┬──────────────┘ │ │ │ │ │ ┌───┴──┐ │ │ │ vPCI │ │ │ └─┬──┬─┘ ┌──┴───┴┐ │ │ │ IOREQ ├────────────┘ │ └────┬──┘ │ │ │ ┌────────────┴──┐ ┌┴──────────────┐ │ IOREQ servers │ │ vPCI handlers │ └───────────────┘ └───────────────┘ While what I'm proposing would look like: ┌────────┐ ┌──────────┐ ┌──────────────────┐ │ Memory │ │ IO Ports │ │ PCI config space │ └────┬───┘ └────┬─────┘ └────────┬─────────┘ │ │ │ └─────┬────┴────┬───────────┘ │ IOREQ │ └─┬─────┬─┘ │ │ ┌───────────────┤ └────┬──────┐ │ IOREQ servers │ │ vPCI │ └───────────────┘ └───┬──┘ │ ┌───┴───────────┐ │ vPCI handlers │ └───────────────┘ I'm obviously biased, but I think the latter is cleaner, and allows all resources to be arbitrated by the same component (IOREQ). If the concern is about the IOREQ hypercall interface, it would be fine to introduce an option that limit IOREQs to internal users (vPCI) without supporting external IOREQ servers. 
Think of IOREQ as a resource mediator inside of Xen, that just does the PCI address decoding and forwards the access to the interested party, either an external IOREQ server or vPCI. > > > It's important that the interfaces we introduce are correct IMO, > > because that ends up reflecting on the configuration options that we > > expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option > > gets placed there will ultimately influence how the option gets > > exposed in xl/libxl, and the interface there is relevant to keep > > stable for end user sanity. > > I agree with you on the stable interfaces. The important part is not to > introduce changes to stable interfaces that could limit us in the > future. Specifically that includes xl and libxl, we need to be careful > there. But I don't see a single per-domain vPCI enable/disable option as > a problem. Let's say that in the future we have a mediated vGPU > implementation: if it works together with vPCI then the per-domain vPCI > option in libxl will be enabled (either explicitely or by default), if > it doesn't then vPCI will be disabled (either explicitely or by the > newer vGPU options.) If vPCI is hooked into IOREQ there won't be a need anymore to register the vPCI config space traps, as that would be done by IOREQ, and hence vPCI managed devices could be registered at runtime with IOREQ. IOW: there won't be a need anymore to signal at domain creation whether vPCI is intended to be used or not. We would obviously need to enable IOREQ for all domains with IOMMU enabled, as it would be IOREQ that register the PCI config space handlers. > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait > before adding more changes on top of them, not because I don't care > about the mediated GPU problem (we do have something similar at AMD), > but because I worry that if we try to change them now we might not do a > good enough job. I would prefer to wait until we know more about the > actual use case, ideally with code supporting it. > > I think the difference in points of views comes from the fact that I see > vPCI as the default, QEMU only as a limited peripheral emulator (or > mediator for the vGPU case) but not in control. vPCI and QEMU are not > equal in my view. vPCI is in charge and always present if not in very > uncommon setups (even if we decide to hook it inside Xen by using > internal IOREQ interfaces). QEMU might come and go. Xen needs a single component that mediates accesses to resources, whether that's IOREQ, or something else I don't really care that much. Having vPCI mediate accesses to the PCI config space, and IOREQ to the memory (and on x86 IO port) space just seems awfully complicated for AFAICT no real benefit. Also, you seem to confabulate IOREQ with QEMU, while the latter is indeed an user of IOREQ, I do see IOREQ as a simple resource mediator inside of Xen, that has the ability to forward such accesses to external emulators using an hypercall interface. > Now that I am writing this, I realize this is also why I wasn't too > happy with the idea of hooking vPCI using IOREQ. It makes them look as > if they are the same, while I don't they should be considered at the > same level of priority, criticality, safety, integration in the system, > etc. I feel there are some fears with IOREQ from a safety PoV? 
The code that does the resource multiplexing is small, and as said above if there are safety concerns with the hypercall interface it would be fine to limit it's usage to internal handlers only. Thanks, Roger.
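To make the proposed arbitration concrete, a purely illustrative pseudo-C
sketch of a config-space read following this model; the helpers marked as
hypothetical do not exist in Xen today, only vpci_read() does:

    static bool pci_cfg_read_dispatch(struct domain *d, pci_sbdf_t sbdf,
                                      unsigned int reg, unsigned int size,
                                      uint32_t *data)
    {
        /* Device registered with the internal vPCI handler? */
        if ( vpci_handles_device(d, sbdf) )          /* hypothetical lookup */
        {
            *data = vpci_read(sbdf, reg, size);      /* existing vPCI entry point */
            return true;
        }

        /* Otherwise forward to whichever external IOREQ server registered
         * for this device, e.g. QEMU emulating a virtio-pci device. */
        return ioreq_send_pci_cfg_read(d, sbdf, reg, size, data);  /* hypothetical */
    }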
On 12/5/23 06:08, Roger Pau Monné wrote: > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: >> On Mon, 4 Dec 2023, Roger Pau Monné wrote: >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote: >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( >>>>>> bus = PCI_BUS(machine_sbdf); >>>>>> devfn = PCI_DEVFN(machine_sbdf); >>>>>> >>>>>> + if ( needs_vpci(d) && !has_vpci(d) ) >>>>>> + { >>>>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", >>>>>> + &PCI_SBDF(seg, bus, devfn), d); >>>>>> + ret = -EPERM; >>>>>> + break; >>>>> >>>>> I think this is likely too restrictive going forward. The current >>>>> approach is indeed to enable vPCI on a per-domain basis because that's >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers. >>>>> >>>>> If we start to expose vPCI suport to guests the interface should be on >>>>> a per-device basis, so that vPCI could be enabled for some devices, >>>>> while others could still be handled by ioreq servers. >>>>> >>>>> We might want to add a new flag to xen_domctl_assign_device (used by >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will >>>>> use vPCI. >>>> >>>> Actually I don't think this is a good idea. I am all for flexibility but >>>> supporting multiple different configurations comes at an extra cost for >>>> both maintainers and contributors. I think we should try to reduce the >>>> amount of configurations we support rather than increasing them >>>> (especially on x86 where we have PV, PVH, HVM). >>> >>> I think it's perfectly fine to initially require a domain to have all >>> its devices either passed through using vPCI or ireqs, but the >>> interface IMO should allow for such differentiation in the future. >>> That's why I think introducing a domain wide vPCI flag might not be >>> the best option going forward. >>> >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a >>> domain wide vPCI flag, iow: >>> >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) >>> { >>> if ( has_arch_pdevs(d) ) >>> { >>> printk("All passthrough devices must use the same backend\n"); >>> return -EINVAL; >>> } >>> >>> /* Set vPCI domain flag */ >>> } >> >> That would be fine by me, but maybe we can avoid this change too. I was >> imagining that vPCI would be enabled at domain creation, not at runtime. >> And that vPCI would be enabled by default for all PVH guests (once we >> are past the initial experimental phase.) > > Then we don't even need a new CDF flag, and just enable vPCI when > IOMMU is enabled? IOW: we can key the enabling of vPCI to > XEN_DOMCTL_CDF_iommu for specific domain types? There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag. > > Maybe that's not so trivial on x86, as there's no x86 PVH domain type > from the hypervisor PoV. > >> >>> We have already agreed that we want to aim for a setup where ioreqs >>> and vPCI could be used for the same domain, but I guess you assumed >>> that ioreqs would be used for emulated devices exclusively and vPCI >>> for passthrough devices? >> >> Yes, that's right >> >> >>> Overall if we agree that ioreqs and vPCI should co-exist for a domain, >>> I'm not sure there's much reason to limit ioreqs to only handle >>> emulated devices, seems like an arbitrary limitation. 
>> >> Reply below >> >> >>>> I don't think we should enable IOREQ servers to handle PCI passthrough >>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI >>>> Passthrough can be handled by vPCI just fine. I think this should be a >>>> good anti-feature to have (a goal to explicitly not add this feature) to >>>> reduce complexity. Unless you see a specific usecase to add support for >>>> it? >>> >>> There are passthrough devices (GPUs) that might require some extra >>> mediation on dom0 (like the Intel GVT-g thing) that would force the >>> usage of ioreqs to passthrough. >> >> From an architectural perspective, I think it would be cleaner, simpler >> to maintain, and simpler to understand if Xen was the sole owner of the >> PCI Root Complex and PCI config space mediation (implemented with vPCI). >> IOREQ can be used for emulation and it works very well for that. At >> least in my mind, that makes things much simpler. > > But IOREQ already has all the code to mediate accesses to the PCI > config space, and the interface to register separate servers for > different PCI devices. > > We would then need to duplicate this internally for vPCI, so that vPCI > could forward accesses to IOREQ just for IOREQ to forward to yet a > different component? Seems like a lot of duplication for no benefit. > >> I understand there are non-trivial cases, like virtual GPUs with >> hardware access, but I don't classify those as passthrough. That's >> because there isn't one device that gets fully assigned to the guest. >> Instead, there is an emulated device (hence IOREQ) with certain MMIO >> regions and interrupts that end up being directly mapped from real >> hardware. >> >> So I think it is natural in those cases to use IOREQ and it is also >> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI >> perspective, I hope it will mostly look as if the device is assigned to >> Dom0. Even if it ends up being more complex than that, Rome wasn't >> built in one day, and I don't think we should try to solve this problem >> on day1 (as long as the interfaces are not stable interfaces). > > I don't see IOREQ as dealing explicitly with emulation. Yes, it does > allow for emulators to be implemented in user-space, but at the end > it's just an interface that allows forwarding accesses to certain > resources (for the case we are speaking about here, PCI config space) > to entities that registered as handlers. > > vPCI OTOH just deals with a very specific resource (PCI config space) > and only allows internal handlers to be registered on a byte > granularity. 
> > So your proposal would be to implement a hierarchy like the one on the > diagram below: > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └───────┬┘ └┬─────────┘ └───┬──────────────┘ > │ │ │ > │ │ ┌───┴──┐ > │ │ │ vPCI │ > │ │ └─┬──┬─┘ > ┌──┴───┴┐ │ │ > │ IOREQ ├────────────┘ │ > └────┬──┘ │ > │ │ > ┌────────────┴──┐ ┌┴──────────────┐ > │ IOREQ servers │ │ vPCI handlers │ > └───────────────┘ └───────────────┘ > > While what I'm proposing would look like: > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └────┬───┘ └────┬─────┘ └────────┬─────────┘ > │ │ │ > └─────┬────┴────┬───────────┘ > │ IOREQ │ > └─┬─────┬─┘ > │ │ > ┌───────────────┤ └────┬──────┐ > │ IOREQ servers │ │ vPCI │ > └───────────────┘ └───┬──┘ > │ > ┌───┴───────────┐ > │ vPCI handlers │ > └───────────────┘ > > I'm obviously biased, but I think the latter is cleaner, and allows > all resources to be arbitrated by the same component (IOREQ). > > If the concern is about the IOREQ hypercall interface, it would be > fine to introduce an option that limit IOREQs to internal users > (vPCI) without supporting external IOREQ servers. > > Think of IOREQ as a resource mediator inside of Xen, that just does > the PCI address decoding and forwards the access to the interested > party, either an external IOREQ server or vPCI. > >> >>> It's important that the interfaces we introduce are correct IMO, >>> because that ends up reflecting on the configuration options that we >>> expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and >>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option >>> gets placed there will ultimately influence how the option gets >>> exposed in xl/libxl, and the interface there is relevant to keep >>> stable for end user sanity. >> >> I agree with you on the stable interfaces. The important part is not to >> introduce changes to stable interfaces that could limit us in the >> future. Specifically that includes xl and libxl, we need to be careful >> there. But I don't see a single per-domain vPCI enable/disable option as >> a problem. Let's say that in the future we have a mediated vGPU >> implementation: if it works together with vPCI then the per-domain vPCI >> option in libxl will be enabled (either explicitely or by default), if >> it doesn't then vPCI will be disabled (either explicitely or by the >> newer vGPU options.) > > If vPCI is hooked into IOREQ there won't be a need anymore to register > the vPCI config space traps, as that would be done by IOREQ, and hence > vPCI managed devices could be registered at runtime with IOREQ. IOW: > there won't be a need anymore to signal at domain creation whether > vPCI is intended to be used or not. > > We would obviously need to enable IOREQ for all domains with IOMMU > enabled, as it would be IOREQ that register the PCI config space > handlers. > >> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait >> before adding more changes on top of them, not because I don't care >> about the mediated GPU problem (we do have something similar at AMD), >> but because I worry that if we try to change them now we might not do a >> good enough job. I would prefer to wait until we know more about the >> actual use case, ideally with code supporting it. 
>> >> I think the difference in points of views comes from the fact that I see >> vPCI as the default, QEMU only as a limited peripheral emulator (or >> mediator for the vGPU case) but not in control. vPCI and QEMU are not >> equal in my view. vPCI is in charge and always present if not in very >> uncommon setups (even if we decide to hook it inside Xen by using >> internal IOREQ interfaces). QEMU might come and go. > > Xen needs a single component that mediates accesses to resources, > whether that's IOREQ, or something else I don't really care that much. > Having vPCI mediate accesses to the PCI config space, and IOREQ to the > memory (and on x86 IO port) space just seems awfully complicated for > AFAICT no real benefit. > > Also, you seem to confabulate IOREQ with QEMU, while the latter is > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator > inside of Xen, that has the ability to forward such accesses to > external emulators using an hypercall interface. > >> Now that I am writing this, I realize this is also why I wasn't too >> happy with the idea of hooking vPCI using IOREQ. It makes them look as >> if they are the same, while I don't they should be considered at the >> same level of priority, criticality, safety, integration in the system, >> etc. > > I feel there are some fears with IOREQ from a safety PoV? The code > that does the resource multiplexing is small, and as said above if > there are safety concerns with the hypercall interface it would be > fine to limit it's usage to internal handlers only. > > Thanks, Roger.
On Tue, Dec 05, 2023 at 11:27:03AM -0500, Stewart Hildebrand wrote: > On 12/5/23 06:08, Roger Pau Monné wrote: > > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: > >> On Mon, 4 Dec 2023, Roger Pau Monné wrote: > >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: > >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote: > >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > >>>>>> bus = PCI_BUS(machine_sbdf); > >>>>>> devfn = PCI_DEVFN(machine_sbdf); > >>>>>> > >>>>>> + if ( needs_vpci(d) && !has_vpci(d) ) > >>>>>> + { > >>>>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > >>>>>> + &PCI_SBDF(seg, bus, devfn), d); > >>>>>> + ret = -EPERM; > >>>>>> + break; > >>>>> > >>>>> I think this is likely too restrictive going forward. The current > >>>>> approach is indeed to enable vPCI on a per-domain basis because that's > >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers. > >>>>> > >>>>> If we start to expose vPCI suport to guests the interface should be on > >>>>> a per-device basis, so that vPCI could be enabled for some devices, > >>>>> while others could still be handled by ioreq servers. > >>>>> > >>>>> We might want to add a new flag to xen_domctl_assign_device (used by > >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will > >>>>> use vPCI. > >>>> > >>>> Actually I don't think this is a good idea. I am all for flexibility but > >>>> supporting multiple different configurations comes at an extra cost for > >>>> both maintainers and contributors. I think we should try to reduce the > >>>> amount of configurations we support rather than increasing them > >>>> (especially on x86 where we have PV, PVH, HVM). > >>> > >>> I think it's perfectly fine to initially require a domain to have all > >>> its devices either passed through using vPCI or ireqs, but the > >>> interface IMO should allow for such differentiation in the future. > >>> That's why I think introducing a domain wide vPCI flag might not be > >>> the best option going forward. > >>> > >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a > >>> domain wide vPCI flag, iow: > >>> > >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) > >>> { > >>> if ( has_arch_pdevs(d) ) > >>> { > >>> printk("All passthrough devices must use the same backend\n"); > >>> return -EINVAL; > >>> } > >>> > >>> /* Set vPCI domain flag */ > >>> } > >> > >> That would be fine by me, but maybe we can avoid this change too. I was > >> imagining that vPCI would be enabled at domain creation, not at runtime. > >> And that vPCI would be enabled by default for all PVH guests (once we > >> are past the initial experimental phase.) > > > > Then we don't even need a new CDF flag, and just enable vPCI when > > IOMMU is enabled? IOW: we can key the enabling of vPCI to > > XEN_DOMCTL_CDF_iommu for specific domain types? > > There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag. OK, read below though - if we switch to vPCI being a descendant of IOREQ (so that the PCI config space decoding is done by IOREQ) we could hotplug vPCI managed devices at runtime without requiring any prior initialization at domain create, since the traps to the PCI config space would be setup by IOREQ. 
We might need a PCI flag in order to signal whether the domain is intended to use PCI devices or not, and so whether IOREQ needs to setup PCI config space traps (either fully emulated or passthrough devices). But that would be arch-specific AFAICT, as on x86 we always trap accesses to the PCI IO ports. Thanks, Roger.
On 12/5/23 12:09, Roger Pau Monné wrote: > On Tue, Dec 05, 2023 at 11:27:03AM -0500, Stewart Hildebrand wrote: >> On 12/5/23 06:08, Roger Pau Monné wrote: >>> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: >>>> On Mon, 4 Dec 2023, Roger Pau Monné wrote: >>>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: >>>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote: >>>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: >>>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( >>>>>>>> bus = PCI_BUS(machine_sbdf); >>>>>>>> devfn = PCI_DEVFN(machine_sbdf); >>>>>>>> >>>>>>>> + if ( needs_vpci(d) && !has_vpci(d) ) >>>>>>>> + { >>>>>>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", >>>>>>>> + &PCI_SBDF(seg, bus, devfn), d); >>>>>>>> + ret = -EPERM; >>>>>>>> + break; >>>>>>> >>>>>>> I think this is likely too restrictive going forward. The current >>>>>>> approach is indeed to enable vPCI on a per-domain basis because that's >>>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers. >>>>>>> >>>>>>> If we start to expose vPCI suport to guests the interface should be on >>>>>>> a per-device basis, so that vPCI could be enabled for some devices, >>>>>>> while others could still be handled by ioreq servers. >>>>>>> >>>>>>> We might want to add a new flag to xen_domctl_assign_device (used by >>>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will >>>>>>> use vPCI. >>>>>> >>>>>> Actually I don't think this is a good idea. I am all for flexibility but >>>>>> supporting multiple different configurations comes at an extra cost for >>>>>> both maintainers and contributors. I think we should try to reduce the >>>>>> amount of configurations we support rather than increasing them >>>>>> (especially on x86 where we have PV, PVH, HVM). >>>>> >>>>> I think it's perfectly fine to initially require a domain to have all >>>>> its devices either passed through using vPCI or ireqs, but the >>>>> interface IMO should allow for such differentiation in the future. >>>>> That's why I think introducing a domain wide vPCI flag might not be >>>>> the best option going forward. >>>>> >>>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a >>>>> domain wide vPCI flag, iow: >>>>> >>>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) >>>>> { >>>>> if ( has_arch_pdevs(d) ) >>>>> { >>>>> printk("All passthrough devices must use the same backend\n"); >>>>> return -EINVAL; >>>>> } >>>>> >>>>> /* Set vPCI domain flag */ >>>>> } >>>> >>>> That would be fine by me, but maybe we can avoid this change too. I was >>>> imagining that vPCI would be enabled at domain creation, not at runtime. >>>> And that vPCI would be enabled by default for all PVH guests (once we >>>> are past the initial experimental phase.) >>> >>> Then we don't even need a new CDF flag, and just enable vPCI when >>> IOMMU is enabled? IOW: we can key the enabling of vPCI to >>> XEN_DOMCTL_CDF_iommu for specific domain types? >> >> There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag. > > OK, read below though - if we switch to vPCI being a descendant of > IOREQ (so that the PCI config space decoding is done by IOREQ) we > could hotplug vPCI managed devices at runtime without requiring any > prior initialization at domain create, since the traps to the PCI > config space would be setup by IOREQ. 
> > We might need a PCI flag in order to signal whether the domain is > intended to use PCI devices or not, and so whether IOREQ needs to > setup PCI config space traps (either fully emulated or passthrough > devices). But that would be arch-specific AFAICT, as on x86 we > always trap accesses to the PCI IO ports. On Arm, the toolstack (or dom0less creation code) needs to construct a {v,ioreq}PCI root complex node in the device tree before guest boot. Attempting to hot plug such a device tree node at runtime sounds like a goal for the (far) future. On Arm, we don't trap the {v,ioreq}PCI config space by default, so, yes, we would certainly need such a {v,ioreq}PCI flag for setting up the {v,ioreq}PCI MMIO handlers if we go this route.
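As a rough sketch of the Arm side being discussed here, gated on a per-domain check rather than done unconditionally; the handler and constant names are approximations of the existing vPCI ECAM setup, and domain_pci_enabled() is invented:

    /* Register the vPCI ECAM MMIO trap for a domU only when the domain is
     * meant to use PCI devices; a simplified sketch, not the exact code. */
    int domain_vpci_init(struct domain *d)
    {
        if ( !domain_pci_enabled(d) )   /* hypothetical "uses PCI" check */
            return 0;

        register_mmio_handler(d, &vpci_mmio_handler,
                              GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE,
                              NULL);
        return 0;
    }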
On 12/5/23 06:08, Roger Pau Monné wrote: > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: >> On Mon, 4 Dec 2023, Roger Pau Monné wrote: >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote: >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( >>>>>> bus = PCI_BUS(machine_sbdf); >>>>>> devfn = PCI_DEVFN(machine_sbdf); >>>>>> >>>>>> + if ( needs_vpci(d) && !has_vpci(d) ) >>>>>> + { >>>>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", >>>>>> + &PCI_SBDF(seg, bus, devfn), d); >>>>>> + ret = -EPERM; >>>>>> + break; >>>>> >>>>> I think this is likely too restrictive going forward. The current >>>>> approach is indeed to enable vPCI on a per-domain basis because that's >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers. >>>>> >>>>> If we start to expose vPCI suport to guests the interface should be on >>>>> a per-device basis, so that vPCI could be enabled for some devices, >>>>> while others could still be handled by ioreq servers. >>>>> >>>>> We might want to add a new flag to xen_domctl_assign_device (used by >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will >>>>> use vPCI. >>>> >>>> Actually I don't think this is a good idea. I am all for flexibility but >>>> supporting multiple different configurations comes at an extra cost for >>>> both maintainers and contributors. I think we should try to reduce the >>>> amount of configurations we support rather than increasing them >>>> (especially on x86 where we have PV, PVH, HVM). >>> >>> I think it's perfectly fine to initially require a domain to have all >>> its devices either passed through using vPCI or ireqs, but the >>> interface IMO should allow for such differentiation in the future. >>> That's why I think introducing a domain wide vPCI flag might not be >>> the best option going forward. >>> >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a >>> domain wide vPCI flag, iow: >>> >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) >>> { >>> if ( has_arch_pdevs(d) ) >>> { >>> printk("All passthrough devices must use the same backend\n"); >>> return -EINVAL; >>> } >>> >>> /* Set vPCI domain flag */ >>> } >> >> That would be fine by me, but maybe we can avoid this change too. I was >> imagining that vPCI would be enabled at domain creation, not at runtime. >> And that vPCI would be enabled by default for all PVH guests (once we >> are past the initial experimental phase.) > > Then we don't even need a new CDF flag, and just enable vPCI when > IOMMU is enabled? IOW: we can key the enabling of vPCI to > XEN_DOMCTL_CDF_iommu for specific domain types? > > Maybe that's not so trivial on x86, as there's no x86 PVH domain type > from the hypervisor PoV. > >> >>> We have already agreed that we want to aim for a setup where ioreqs >>> and vPCI could be used for the same domain, but I guess you assumed >>> that ioreqs would be used for emulated devices exclusively and vPCI >>> for passthrough devices? >> >> Yes, that's right >> >> >>> Overall if we agree that ioreqs and vPCI should co-exist for a domain, >>> I'm not sure there's much reason to limit ioreqs to only handle >>> emulated devices, seems like an arbitrary limitation. >> >> Reply below >> >> >>>> I don't think we should enable IOREQ servers to handle PCI passthrough >>>> for PVH guests and/or guests with vPCI. 
If the domain has vPCI, PCI >>>> Passthrough can be handled by vPCI just fine. I think this should be a >>>> good anti-feature to have (a goal to explicitly not add this feature) to >>>> reduce complexity. Unless you see a specific usecase to add support for >>>> it? >>> >>> There are passthrough devices (GPUs) that might require some extra >>> mediation on dom0 (like the Intel GVT-g thing) that would force the >>> usage of ioreqs to passthrough. >> >> From an architectural perspective, I think it would be cleaner, simpler >> to maintain, and simpler to understand if Xen was the sole owner of the >> PCI Root Complex and PCI config space mediation (implemented with vPCI). >> IOREQ can be used for emulation and it works very well for that. At >> least in my mind, that makes things much simpler. > > But IOREQ already has all the code to mediate accesses to the PCI > config space, and the interface to register separate servers for > different PCI devices. > > We would then need to duplicate this internally for vPCI, so that vPCI > could forward accesses to IOREQ just for IOREQ to forward to yet a > different component? Seems like a lot of duplication for no benefit. > >> I understand there are non-trivial cases, like virtual GPUs with >> hardware access, but I don't classify those as passthrough. That's >> because there isn't one device that gets fully assigned to the guest. >> Instead, there is an emulated device (hence IOREQ) with certain MMIO >> regions and interrupts that end up being directly mapped from real >> hardware. >> >> So I think it is natural in those cases to use IOREQ and it is also >> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI >> perspective, I hope it will mostly look as if the device is assigned to >> Dom0. Even if it ends up being more complex than that, Rome wasn't >> built in one day, and I don't think we should try to solve this problem >> on day1 (as long as the interfaces are not stable interfaces). > > I don't see IOREQ as dealing explicitly with emulation. Yes, it does > allow for emulators to be implemented in user-space, but at the end > it's just an interface that allows forwarding accesses to certain > resources (for the case we are speaking about here, PCI config space) > to entities that registered as handlers. > > vPCI OTOH just deals with a very specific resource (PCI config space) > and only allows internal handlers to be registered on a byte > granularity. > > So your proposal would be to implement a hierarchy like the one on the > diagram below: > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └───────┬┘ └┬─────────┘ └───┬──────────────┘ > │ │ │ > │ │ ┌───┴──┐ > │ │ │ vPCI │ > │ │ └─┬──┬─┘ > ┌──┴───┴┐ │ │ > │ IOREQ ├────────────┘ │ > └────┬──┘ │ > │ │ > ┌────────────┴──┐ ┌┴──────────────┐ > │ IOREQ servers │ │ vPCI handlers │ > └───────────────┘ └───────────────┘ > > While what I'm proposing would look like: > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └────┬───┘ └────┬─────┘ └────────┬─────────┘ > │ │ │ > └─────┬────┴────┬───────────┘ > │ IOREQ │ > └─┬─────┬─┘ > │ │ > ┌───────────────┤ └────┬──────┐ > │ IOREQ servers │ │ vPCI │ > └───────────────┘ └───┬──┘ > │ > ┌───┴───────────┐ > │ vPCI handlers │ > └───────────────┘ > > I'm obviously biased, but I think the latter is cleaner, and allows > all resources to be arbitrated by the same component (IOREQ). 
> > If the concern is about the IOREQ hypercall interface, it would be > fine to introduce an option that limit IOREQs to internal users > (vPCI) without supporting external IOREQ servers. > > Think of IOREQ as a resource mediator inside of Xen, that just does > the PCI address decoding and forwards the access to the interested > party, either an external IOREQ server or vPCI. > >> >>> It's important that the interfaces we introduce are correct IMO, >>> because that ends up reflecting on the configuration options that we >>> expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and >>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option >>> gets placed there will ultimately influence how the option gets >>> exposed in xl/libxl, and the interface there is relevant to keep >>> stable for end user sanity. >> >> I agree with you on the stable interfaces. The important part is not to >> introduce changes to stable interfaces that could limit us in the >> future. Specifically that includes xl and libxl, we need to be careful >> there. But I don't see a single per-domain vPCI enable/disable option as >> a problem. Let's say that in the future we have a mediated vGPU >> implementation: if it works together with vPCI then the per-domain vPCI >> option in libxl will be enabled (either explicitely or by default), if >> it doesn't then vPCI will be disabled (either explicitely or by the >> newer vGPU options.) > > If vPCI is hooked into IOREQ there won't be a need anymore to register > the vPCI config space traps, as that would be done by IOREQ, and hence > vPCI managed devices could be registered at runtime with IOREQ. IOW: > there won't be a need anymore to signal at domain creation whether > vPCI is intended to be used or not. > > We would obviously need to enable IOREQ for all domains with IOMMU > enabled, as it would be IOREQ that register the PCI config space > handlers. > >> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait >> before adding more changes on top of them, not because I don't care >> about the mediated GPU problem (we do have something similar at AMD), >> but because I worry that if we try to change them now we might not do a >> good enough job. I would prefer to wait until we know more about the >> actual use case, ideally with code supporting it. >> >> I think the difference in points of views comes from the fact that I see >> vPCI as the default, QEMU only as a limited peripheral emulator (or >> mediator for the vGPU case) but not in control. vPCI and QEMU are not >> equal in my view. vPCI is in charge and always present if not in very >> uncommon setups (even if we decide to hook it inside Xen by using >> internal IOREQ interfaces). QEMU might come and go. > > Xen needs a single component that mediates accesses to resources, > whether that's IOREQ, or something else I don't really care that much. I just wanted to share what the "something else" diagram might look like. 
┌────────┐ ┌──────────┐ ┌──────────────────┐ │ Memory │ │ IO Ports │ │ PCI config space │ └────┬───┘ └────┬─────┘ └──────────┬───────┘ │ │ │ └──┬───────┴───────────────┬──┘ │ PCI Resource Mediator │ └────┬─────┬────────────┘ │ │ ┌───────┤ └────┬──────┐ │ IOREQ │ │ vPCI │ └────┬──┘ └───┬──┘ │ │ ┌────────────┴──┐ ┌───┴───────────┐ │ IOREQ servers │ │ vPCI handlers │ └───────────────┘ └───────────────┘ > Having vPCI mediate accesses to the PCI config space, and IOREQ to the > memory (and on x86 IO port) space just seems awfully complicated for > AFAICT no real benefit. > > Also, you seem to confabulate IOREQ with QEMU, while the latter is > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator > inside of Xen, that has the ability to forward such accesses to > external emulators using an hypercall interface. > >> Now that I am writing this, I realize this is also why I wasn't too >> happy with the idea of hooking vPCI using IOREQ. It makes them look as >> if they are the same, while I don't they should be considered at the >> same level of priority, criticality, safety, integration in the system, >> etc. > > I feel there are some fears with IOREQ from a safety PoV? The code > that does the resource multiplexing is small, and as said above if > there are safety concerns with the hypercall interface it would be > fine to limit it's usage to internal handlers only. Would it make any sense at all to split the resource multiplexing bits from IOREQ into a new separate PCI resource mediator? > > Thanks, Roger.
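To make the "PCI Resource Mediator" idea a bit more concrete, here is a purely illustrative interface such a component might expose; none of these names exist in Xen, this is just a sketch of the registration point both the IOREQ forwarding path and vPCI would plug into:

    /* Invented for illustration only. */
    struct pci_mediator_ops {
        /* Does this backend want to handle accesses for the given device? */
        bool (*accept)(const struct domain *d, pci_sbdf_t sbdf);
        int  (*cfg_read)(const struct domain *d, pci_sbdf_t sbdf,
                         unsigned int reg, unsigned int len, uint32_t *val);
        int  (*cfg_write)(struct domain *d, pci_sbdf_t sbdf,
                          unsigned int reg, unsigned int len, uint32_t val);
    };

    /* vPCI and the IOREQ forwarding path would each register one of these. */
    int pci_mediator_register(struct domain *d,
                              const struct pci_mediator_ops *ops);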
On Tue, 5 Dec 2023, Stewart Hildebrand wrote: > On 12/5/23 12:09, Roger Pau Monné wrote: > > On Tue, Dec 05, 2023 at 11:27:03AM -0500, Stewart Hildebrand wrote: > >> On 12/5/23 06:08, Roger Pau Monné wrote: > >>> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: > >>>> On Mon, 4 Dec 2023, Roger Pau Monné wrote: > >>>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: > >>>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote: > >>>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > >>>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > >>>>>>>> bus = PCI_BUS(machine_sbdf); > >>>>>>>> devfn = PCI_DEVFN(machine_sbdf); > >>>>>>>> > >>>>>>>> + if ( needs_vpci(d) && !has_vpci(d) ) > >>>>>>>> + { > >>>>>>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > >>>>>>>> + &PCI_SBDF(seg, bus, devfn), d); > >>>>>>>> + ret = -EPERM; > >>>>>>>> + break; > >>>>>>> > >>>>>>> I think this is likely too restrictive going forward. The current > >>>>>>> approach is indeed to enable vPCI on a per-domain basis because that's > >>>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers. > >>>>>>> > >>>>>>> If we start to expose vPCI suport to guests the interface should be on > >>>>>>> a per-device basis, so that vPCI could be enabled for some devices, > >>>>>>> while others could still be handled by ioreq servers. > >>>>>>> > >>>>>>> We might want to add a new flag to xen_domctl_assign_device (used by > >>>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will > >>>>>>> use vPCI. > >>>>>> > >>>>>> Actually I don't think this is a good idea. I am all for flexibility but > >>>>>> supporting multiple different configurations comes at an extra cost for > >>>>>> both maintainers and contributors. I think we should try to reduce the > >>>>>> amount of configurations we support rather than increasing them > >>>>>> (especially on x86 where we have PV, PVH, HVM). > >>>>> > >>>>> I think it's perfectly fine to initially require a domain to have all > >>>>> its devices either passed through using vPCI or ireqs, but the > >>>>> interface IMO should allow for such differentiation in the future. > >>>>> That's why I think introducing a domain wide vPCI flag might not be > >>>>> the best option going forward. > >>>>> > >>>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a > >>>>> domain wide vPCI flag, iow: > >>>>> > >>>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) > >>>>> { > >>>>> if ( has_arch_pdevs(d) ) > >>>>> { > >>>>> printk("All passthrough devices must use the same backend\n"); > >>>>> return -EINVAL; > >>>>> } > >>>>> > >>>>> /* Set vPCI domain flag */ > >>>>> } > >>>> > >>>> That would be fine by me, but maybe we can avoid this change too. I was > >>>> imagining that vPCI would be enabled at domain creation, not at runtime. > >>>> And that vPCI would be enabled by default for all PVH guests (once we > >>>> are past the initial experimental phase.) > >>> > >>> Then we don't even need a new CDF flag, and just enable vPCI when > >>> IOMMU is enabled? IOW: we can key the enabling of vPCI to > >>> XEN_DOMCTL_CDF_iommu for specific domain types? > >> > >> There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag. 
> > > > OK, read below though - if we switch to vPCI being a descendant of > > IOREQ (so that the PCI config space decoding is done by IOREQ) we > > could hotplug vPCI managed devices at runtime without requiring any > > prior initialization at domain create, since the traps to the PCI > > config space would be setup by IOREQ. > > > > We might need a PCI flag in order to signal whether the domain is > > intended to use PCI devices or not, and so whether IOREQ needs to > > setup PCI config space traps (either fully emulated or passthrough > > devices). But that would be arch-specific AFAICT, as on x86 we > > always trap accesses to the PCI IO ports. > > On Arm, the toolstack (or dom0less creation code) needs to construct a {v,ioreq}PCI root complex node in the device tree before guest boot. Attempting to hot plug such a device tree node at runtime sounds like a goal for the (far) future. On Arm, we don't trap the {v,ioreq}PCI config space by default, so, yes, we for sure would need such a {v,ioreq}PCI flag for setting up the {v,ioreq}PCI mmio handlers if we go this route. Yes and also dynamic configuration and hotplug are actually detrimental in safety configurations where you need as much as possible, ideally everything, to be specified at build time.
On Tue, 5 Dec 2023, Roger Pau Monné wrote: > > > > I don't think we should enable IOREQ servers to handle PCI passthrough > > > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > > > > Passthrough can be handled by vPCI just fine. I think this should be a > > > > good anti-feature to have (a goal to explicitly not add this feature) to > > > > reduce complexity. Unless you see a specific usecase to add support for > > > > it? > > > > > > There are passthrough devices (GPUs) that might require some extra > > > mediation on dom0 (like the Intel GVT-g thing) that would force the > > > usage of ioreqs to passthrough. > > > > From an architectural perspective, I think it would be cleaner, simpler > > to maintain, and simpler to understand if Xen was the sole owner of the > > PCI Root Complex and PCI config space mediation (implemented with vPCI). > > IOREQ can be used for emulation and it works very well for that. At > > least in my mind, that makes things much simpler. > > But IOREQ already has all the code to mediate accesses to the PCI > config space, and the interface to register separate servers for > different PCI devices. > > We would then need to duplicate this internally for vPCI, so that vPCI > could forward accesses to IOREQ just for IOREQ to forward to yet a > different component? Seems like a lot of duplication for no benefit. [...] > Also, you seem to confabulate IOREQ with QEMU, while the latter is > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator > inside of Xen, that has the ability to forward such accesses to > external emulators using an hypercall interface. We have been using different terminologies until now. IOREQ could mean anything from the ABI interface, the emulator side (QEMU) or the hypervisor side (Xen). I am going to align with your wording and say: IOREQ: only the IOREQ implementation in Xen (xen/common/ioreq.c) IOREQ server: QEMU or alternative I think it is OK if we use IOREQ internally within Xen to hook vPCI with PCI config space accesses and emulation. I don't think it is a good idea to attempt to enable IOREQ servers (e.g. QEMU) to implement PCI Passthrough when vPCI is also enabled for the domain, at least initially. > > I understand there are non-trivial cases, like virtual GPUs with > > hardware access, but I don't classify those as passthrough. That's > > because there isn't one device that gets fully assigned to the guest. > > Instead, there is an emulated device (hence IOREQ) with certain MMIO > > regions and interrupts that end up being directly mapped from real > > hardware. > > > > So I think it is natural in those cases to use IOREQ and it is also > > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI > > perspective, I hope it will mostly look as if the device is assigned to > > Dom0. Even if it ends up being more complex than that, Rome wasn't > > built in one day, and I don't think we should try to solve this problem > > on day1 (as long as the interfaces are not stable interfaces). > > I don't see IOREQ as dealing explicitly with emulation. Yes, it does > allow for emulators to be implemented in user-space, but at the end > it's just an interface that allows forwarding accesses to certain > resources (for the case we are speaking about here, PCI config space) > to entities that registered as handlers. > > vPCI OTOH just deals with a very specific resource (PCI config space) > and only allows internal handlers to be registered on a byte > granularity. 
> > So your proposal would be to implement a hierarchy like the one on the > diagram below: > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └───────┬┘ └┬─────────┘ └───┬──────────────┘ > │ │ │ > │ │ ┌───┴──┐ > │ │ │ vPCI │ > │ │ └─┬──┬─┘ > ┌──┴───┴┐ │ │ > │ IOREQ ├────────────┘ │ > └────┬──┘ │ > │ │ > ┌────────────┴──┐ ┌┴──────────────┐ > │ IOREQ servers │ │ vPCI handlers │ > └───────────────┘ └───────────────┘ Yes > While what I'm proposing would look like: > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └────┬───┘ └────┬─────┘ └────────┬─────────┘ > │ │ │ > └─────┬────┴────┬───────────┘ > │ IOREQ │ > └─┬─────┬─┘ > │ │ > ┌───────────────┤ └────┬──────┐ > │ IOREQ servers │ │ vPCI │ > └───────────────┘ └───┬──┘ > │ > ┌───┴───────────┐ > │ vPCI handlers │ > └───────────────┘ I don't have a major problem with this, but I find it less clear than the first one. Let's say that all domains are PVH (or ARM guests). QEMU is running in Dom0. If QEMU does emulation, that's fine. If QEMU implements PCI Passthrough then QEMU uses libpci to do PCI config space reads and write, which go to the Linux kernel in Dom0, which ends up doing PCI config space reads and writes on the device, and that goes via vPCI in Xen (vPCI for Dom0). So actually vPCI is still present. It is a lot simpler to think that vPCI is in charge of all mediated PCI config space accesses rather than thinking that for the same device vPCI handles PCI config space accesses for Dom0 but not for DomU. It is not my preference but I am OK to compromise and go ahead with the architecture you proposed but please let's keep IOREQ servers out of the PCI Passthrough picture at least initially. > I'm obviously biased, but I think the latter is cleaner, and allows > all resources to be arbitrated by the same component (IOREQ). > > If the concern is about the IOREQ hypercall interface, it would be > fine to introduce an option that limit IOREQs to internal users > (vPCI) without supporting external IOREQ servers. > > Think of IOREQ as a resource mediator inside of Xen, that just does > the PCI address decoding and forwards the access to the interested > party, either an external IOREQ server or vPCI. The part about IOREQ (xen/common/ioreq.c) being a resource mediator is OKish. I had many discussions over the years with various members of the larger open source embedded community (Linaro, etc.) and the problem is that when one says "IOREQ" typically people think of QEMU or other userspace emulators. They don't think of the Xen side of it. This becomes very relevant here because Xen is the only part of the system that is getting safety-certified and it is important to convey the message that nothing else in required to be safety-certified to have a fully working Xen system that supports PCI Passthrough. In short, it is important that the community doesn't get the idea that QEMU needs to be safety-certified to have PCI Passthrough working correctly with Xen in a safety scenario. > > > It's important that the interfaces we introduce are correct IMO, > > > because that ends up reflecting on the configuration options that we > > > expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and > > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option > > > gets placed there will ultimately influence how the option gets > > > exposed in xl/libxl, and the interface there is relevant to keep > > > stable for end user sanity. 
> > > > I agree with you on the stable interfaces. The important part is not to > > introduce changes to stable interfaces that could limit us in the > > future. Specifically that includes xl and libxl, we need to be careful > > there. But I don't see a single per-domain vPCI enable/disable option as > > a problem. Let's say that in the future we have a mediated vGPU > > implementation: if it works together with vPCI then the per-domain vPCI > > option in libxl will be enabled (either explicitely or by default), if > > it doesn't then vPCI will be disabled (either explicitely or by the > > newer vGPU options.) > > If vPCI is hooked into IOREQ there won't be a need anymore to register > the vPCI config space traps, as that would be done by IOREQ, and hence > vPCI managed devices could be registered at runtime with IOREQ. IOW: > there won't be a need anymore to signal at domain creation whether > vPCI is intended to be used or not. For safety, we have requirements to specify everything statically before boot so typically anything dynamic is a problem. > We would obviously need to enable IOREQ for all domains with IOMMU > enabled, as it would be IOREQ that register the PCI config space > handlers. This bit might be OK > > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait > > before adding more changes on top of them, not because I don't care > > about the mediated GPU problem (we do have something similar at AMD), > > but because I worry that if we try to change them now we might not do a > > good enough job. I would prefer to wait until we know more about the > > actual use case, ideally with code supporting it. > > > > I think the difference in points of views comes from the fact that I see > > vPCI as the default, QEMU only as a limited peripheral emulator (or > > mediator for the vGPU case) but not in control. vPCI and QEMU are not > > equal in my view. vPCI is in charge and always present if not in very > > uncommon setups (even if we decide to hook it inside Xen by using > > internal IOREQ interfaces). QEMU might come and go. > > Xen needs a single component that mediates accesses to resources, > whether that's IOREQ, or something else I don't really care that much. > Having vPCI mediate accesses to the PCI config space, and IOREQ to the > memory (and on x86 IO port) space just seems awfully complicated for > AFAICT no real benefit. > > > Now that I am writing this, I realize this is also why I wasn't too > > happy with the idea of hooking vPCI using IOREQ. It makes them look as > > if they are the same, while I don't they should be considered at the > > same level of priority, criticality, safety, integration in the system, > > etc. > > I feel there are some fears with IOREQ from a safety PoV? The code > that does the resource multiplexing is small, and as said above if > there are safety concerns with the hypercall interface it would be > fine to limit it's usage to internal handlers only. Yes it is about safety. Everything within Xen will be safety-certified, hence usable in a safety critical scenario, everything outside of Xen might not. The fear is not on the IOREQ itself because xen/common/ioreq.c is part of the certification scope. The fear is that IOREQ servers (e.g. QEMU) are somehow in the picture when we discuss safety architectures with PCI Passthrough, or that IOREQ servers could interfere with vPCI. By "interfere" I mean that QEMU running in dom0 (a deprivileged dom0) will be able to cause a malfunction in Xen vPCI. 
Yes, limiting the hypercall interface would help in that regard, because it would limit Xen's exposure.
On Tue, Dec 05, 2023 at 02:01:46PM -0500, Stewart Hildebrand wrote: > On 12/5/23 06:08, Roger Pau Monné wrote: > > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote: > >> On Mon, 4 Dec 2023, Roger Pau Monné wrote: > >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote: > >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote: > >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote: > >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl( > >>>>>> bus = PCI_BUS(machine_sbdf); > >>>>>> devfn = PCI_DEVFN(machine_sbdf); > >>>>>> > >>>>>> + if ( needs_vpci(d) && !has_vpci(d) ) > >>>>>> + { > >>>>>> + printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n", > >>>>>> + &PCI_SBDF(seg, bus, devfn), d); > >>>>>> + ret = -EPERM; > >>>>>> + break; > >>>>> > >>>>> I think this is likely too restrictive going forward. The current > >>>>> approach is indeed to enable vPCI on a per-domain basis because that's > >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers. > >>>>> > >>>>> If we start to expose vPCI suport to guests the interface should be on > >>>>> a per-device basis, so that vPCI could be enabled for some devices, > >>>>> while others could still be handled by ioreq servers. > >>>>> > >>>>> We might want to add a new flag to xen_domctl_assign_device (used by > >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will > >>>>> use vPCI. > >>>> > >>>> Actually I don't think this is a good idea. I am all for flexibility but > >>>> supporting multiple different configurations comes at an extra cost for > >>>> both maintainers and contributors. I think we should try to reduce the > >>>> amount of configurations we support rather than increasing them > >>>> (especially on x86 where we have PV, PVH, HVM). > >>> > >>> I think it's perfectly fine to initially require a domain to have all > >>> its devices either passed through using vPCI or ireqs, but the > >>> interface IMO should allow for such differentiation in the future. > >>> That's why I think introducing a domain wide vPCI flag might not be > >>> the best option going forward. > >>> > >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a > >>> domain wide vPCI flag, iow: > >>> > >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) ) > >>> { > >>> if ( has_arch_pdevs(d) ) > >>> { > >>> printk("All passthrough devices must use the same backend\n"); > >>> return -EINVAL; > >>> } > >>> > >>> /* Set vPCI domain flag */ > >>> } > >> > >> That would be fine by me, but maybe we can avoid this change too. I was > >> imagining that vPCI would be enabled at domain creation, not at runtime. > >> And that vPCI would be enabled by default for all PVH guests (once we > >> are past the initial experimental phase.) > > > > Then we don't even need a new CDF flag, and just enable vPCI when > > IOMMU is enabled? IOW: we can key the enabling of vPCI to > > XEN_DOMCTL_CDF_iommu for specific domain types? > > > > Maybe that's not so trivial on x86, as there's no x86 PVH domain type > > from the hypervisor PoV. > > > >> > >>> We have already agreed that we want to aim for a setup where ioreqs > >>> and vPCI could be used for the same domain, but I guess you assumed > >>> that ioreqs would be used for emulated devices exclusively and vPCI > >>> for passthrough devices? 
> >> > >> Yes, that's right > >> > >> > >>> Overall if we agree that ioreqs and vPCI should co-exist for a domain, > >>> I'm not sure there's much reason to limit ioreqs to only handle > >>> emulated devices, seems like an arbitrary limitation. > >> > >> Reply below > >> > >> > >>>> I don't think we should enable IOREQ servers to handle PCI passthrough > >>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > >>>> Passthrough can be handled by vPCI just fine. I think this should be a > >>>> good anti-feature to have (a goal to explicitly not add this feature) to > >>>> reduce complexity. Unless you see a specific usecase to add support for > >>>> it? > >>> > >>> There are passthrough devices (GPUs) that might require some extra > >>> mediation on dom0 (like the Intel GVT-g thing) that would force the > >>> usage of ioreqs to passthrough. > >> > >> From an architectural perspective, I think it would be cleaner, simpler > >> to maintain, and simpler to understand if Xen was the sole owner of the > >> PCI Root Complex and PCI config space mediation (implemented with vPCI). > >> IOREQ can be used for emulation and it works very well for that. At > >> least in my mind, that makes things much simpler. > > > > But IOREQ already has all the code to mediate accesses to the PCI > > config space, and the interface to register separate servers for > > different PCI devices. > > > > We would then need to duplicate this internally for vPCI, so that vPCI > > could forward accesses to IOREQ just for IOREQ to forward to yet a > > different component? Seems like a lot of duplication for no benefit. > > > >> I understand there are non-trivial cases, like virtual GPUs with > >> hardware access, but I don't classify those as passthrough. That's > >> because there isn't one device that gets fully assigned to the guest. > >> Instead, there is an emulated device (hence IOREQ) with certain MMIO > >> regions and interrupts that end up being directly mapped from real > >> hardware. > >> > >> So I think it is natural in those cases to use IOREQ and it is also > >> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI > >> perspective, I hope it will mostly look as if the device is assigned to > >> Dom0. Even if it ends up being more complex than that, Rome wasn't > >> built in one day, and I don't think we should try to solve this problem > >> on day1 (as long as the interfaces are not stable interfaces). > > > > I don't see IOREQ as dealing explicitly with emulation. Yes, it does > > allow for emulators to be implemented in user-space, but at the end > > it's just an interface that allows forwarding accesses to certain > > resources (for the case we are speaking about here, PCI config space) > > to entities that registered as handlers. > > > > vPCI OTOH just deals with a very specific resource (PCI config space) > > and only allows internal handlers to be registered on a byte > > granularity. 
> > > > So your proposal would be to implement a hierarchy like the one on the > > diagram below: > > > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > > │ Memory │ │ IO Ports │ │ PCI config space │ > > └───────┬┘ └┬─────────┘ └───┬──────────────┘ > > │ │ │ > > │ │ ┌───┴──┐ > > │ │ │ vPCI │ > > │ │ └─┬──┬─┘ > > ┌──┴───┴┐ │ │ > > │ IOREQ ├────────────┘ │ > > └────┬──┘ │ > > │ │ > > ┌────────────┴──┐ ┌┴──────────────┐ > > │ IOREQ servers │ │ vPCI handlers │ > > └───────────────┘ └───────────────┘ > > > > While what I'm proposing would look like: > > > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > > │ Memory │ │ IO Ports │ │ PCI config space │ > > └────┬───┘ └────┬─────┘ └────────┬─────────┘ > > │ │ │ > > └─────┬────┴────┬───────────┘ > > │ IOREQ │ > > └─┬─────┬─┘ > > │ │ > > ┌───────────────┤ └────┬──────┐ > > │ IOREQ servers │ │ vPCI │ > > └───────────────┘ └───┬──┘ > > │ > > ┌───┴───────────┐ > > │ vPCI handlers │ > > └───────────────┘ > > > > I'm obviously biased, but I think the latter is cleaner, and allows > > all resources to be arbitrated by the same component (IOREQ). > > > > If the concern is about the IOREQ hypercall interface, it would be > > fine to introduce an option that limit IOREQs to internal users > > (vPCI) without supporting external IOREQ servers. > > > > Think of IOREQ as a resource mediator inside of Xen, that just does > > the PCI address decoding and forwards the access to the interested > > party, either an external IOREQ server or vPCI. > > > >> > >>> It's important that the interfaces we introduce are correct IMO, > >>> because that ends up reflecting on the configuration options that we > >>> expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and > >>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option > >>> gets placed there will ultimately influence how the option gets > >>> exposed in xl/libxl, and the interface there is relevant to keep > >>> stable for end user sanity. > >> > >> I agree with you on the stable interfaces. The important part is not to > >> introduce changes to stable interfaces that could limit us in the > >> future. Specifically that includes xl and libxl, we need to be careful > >> there. But I don't see a single per-domain vPCI enable/disable option as > >> a problem. Let's say that in the future we have a mediated vGPU > >> implementation: if it works together with vPCI then the per-domain vPCI > >> option in libxl will be enabled (either explicitely or by default), if > >> it doesn't then vPCI will be disabled (either explicitely or by the > >> newer vGPU options.) > > > > If vPCI is hooked into IOREQ there won't be a need anymore to register > > the vPCI config space traps, as that would be done by IOREQ, and hence > > vPCI managed devices could be registered at runtime with IOREQ. IOW: > > there won't be a need anymore to signal at domain creation whether > > vPCI is intended to be used or not. > > > > We would obviously need to enable IOREQ for all domains with IOMMU > > enabled, as it would be IOREQ that register the PCI config space > > handlers. > > > >> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait > >> before adding more changes on top of them, not because I don't care > >> about the mediated GPU problem (we do have something similar at AMD), > >> but because I worry that if we try to change them now we might not do a > >> good enough job. I would prefer to wait until we know more about the > >> actual use case, ideally with code supporting it. 
> >> > >> I think the difference in points of views comes from the fact that I see > >> vPCI as the default, QEMU only as a limited peripheral emulator (or > >> mediator for the vGPU case) but not in control. vPCI and QEMU are not > >> equal in my view. vPCI is in charge and always present if not in very > >> uncommon setups (even if we decide to hook it inside Xen by using > >> internal IOREQ interfaces). QEMU might come and go. > > > > Xen needs a single component that mediates accesses to resources, > > whether that's IOREQ, or something else I don't really care that much. > > I just wanted to share what the "something else" diagram might look like. > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > │ Memory │ │ IO Ports │ │ PCI config space │ > └────┬───┘ └────┬─────┘ └──────────┬───────┘ > │ │ │ > └──┬───────┴───────────────┬──┘ > │ PCI Resource Mediator │ > └────┬─────┬────────────┘ > │ │ > ┌───────┤ └────┬──────┐ > │ IOREQ │ │ vPCI │ > └────┬──┘ └───┬──┘ > │ │ > ┌────────────┴──┐ ┌───┴───────────┐ > │ IOREQ servers │ │ vPCI handlers │ > └───────────────┘ └───────────────┘ It's IMO weird that the PCI resource mediator also controls Memory and IO ports, since that's not a PCI specific resource. Isn't your proposed "PCI Resource Mediator" the same as what IOREQ currently does? I'm kind of confused as what benefit there is in introducing another layer that multiplexes guest resource accesses. > > Having vPCI mediate accesses to the PCI config space, and IOREQ to the > > memory (and on x86 IO port) space just seems awfully complicated for > > AFAICT no real benefit. > > > > Also, you seem to confabulate IOREQ with QEMU, while the latter is > > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator > > inside of Xen, that has the ability to forward such accesses to > > external emulators using an hypercall interface. > > > >> Now that I am writing this, I realize this is also why I wasn't too > >> happy with the idea of hooking vPCI using IOREQ. It makes them look as > >> if they are the same, while I don't they should be considered at the > >> same level of priority, criticality, safety, integration in the system, > >> etc. > > > > I feel there are some fears with IOREQ from a safety PoV? The code > > that does the resource multiplexing is small, and as said above if > > there are safety concerns with the hypercall interface it would be > > fine to limit it's usage to internal handlers only. > > Would it make any sense at all to split the resource multiplexing bits from IOREQ into a new separate PCI resource mediator? That might be fine, but seems like a lot of work and more complexity in Xen for AFAICT no real benefit. I think I would need to better understand the worries with using IOREQ, but wouldn't it be easier to just limit the current IOREQ code/interface to suit your needs? Again without knowing exactly what are the issues with using IOREQ it's hard to propose solutions. Thanks, Roger.
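For reference, this is roughly how an external IOREQ server claims a PCI device with the existing per-device interface mentioned above (libxendevicemodel); the exact signature should be checked against xendevicemodel.h, and error handling is omitted:

    #include <xendevicemodel.h>

    /* Have an already-created IOREQ server receive config space accesses
     * for a given PCI device, e.g. 0000:03:00.0. */
    static int claim_pci_device(xendevicemodel_handle *dmod, domid_t domid,
                                ioservid_t id)
    {
        return xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, id,
                                                         0 /* segment */,
                                                         3 /* bus */,
                                                         0 /* device */,
                                                         0 /* function */);
    }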
On Tue, Dec 05, 2023 at 06:34:35PM -0800, Stefano Stabellini wrote: > On Tue, 5 Dec 2023, Roger Pau Monné wrote: > > > > > I don't think we should enable IOREQ servers to handle PCI passthrough > > > > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > > > > > Passthrough can be handled by vPCI just fine. I think this should be a > > > > > good anti-feature to have (a goal to explicitly not add this feature) to > > > > > reduce complexity. Unless you see a specific usecase to add support for > > > > > it? > > > > > > > > There are passthrough devices (GPUs) that might require some extra > > > > mediation on dom0 (like the Intel GVT-g thing) that would force the > > > > usage of ioreqs to passthrough. > > > > > > From an architectural perspective, I think it would be cleaner, simpler > > > to maintain, and simpler to understand if Xen was the sole owner of the > > > PCI Root Complex and PCI config space mediation (implemented with vPCI). > > > IOREQ can be used for emulation and it works very well for that. At > > > least in my mind, that makes things much simpler. > > > > But IOREQ already has all the code to mediate accesses to the PCI > > config space, and the interface to register separate servers for > > different PCI devices. > > > > We would then need to duplicate this internally for vPCI, so that vPCI > > could forward accesses to IOREQ just for IOREQ to forward to yet a > > different component? Seems like a lot of duplication for no benefit. > > [...] > > > Also, you seem to confabulate IOREQ with QEMU, while the latter is > > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator > > inside of Xen, that has the ability to forward such accesses to > > external emulators using an hypercall interface. > > We have been using different terminologies until now. IOREQ could mean > anything from the ABI interface, the emulator side (QEMU) or the > hypervisor side (Xen). I am going to align with your wording and say: > > IOREQ: only the IOREQ implementation in Xen (xen/common/ioreq.c) > IOREQ server: QEMU or alternative > > I think it is OK if we use IOREQ internally within Xen to hook vPCI with > PCI config space accesses and emulation. I don't think it is a good idea > to attempt to enable IOREQ servers (e.g. QEMU) to implement PCI > Passthrough when vPCI is also enabled for the domain, at least > initially. I agree, it's perfectly fine to initially limit to vPCI passthrough devices + QEMU emulated devices only for example. I think it was mostly an issue with terminology then :). > > > > I understand there are non-trivial cases, like virtual GPUs with > > > hardware access, but I don't classify those as passthrough. That's > > > because there isn't one device that gets fully assigned to the guest. > > > Instead, there is an emulated device (hence IOREQ) with certain MMIO > > > regions and interrupts that end up being directly mapped from real > > > hardware. > > > > > > So I think it is natural in those cases to use IOREQ and it is also > > > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI > > > perspective, I hope it will mostly look as if the device is assigned to > > > Dom0. Even if it ends up being more complex than that, Rome wasn't > > > built in one day, and I don't think we should try to solve this problem > > > on day1 (as long as the interfaces are not stable interfaces). > > > > I don't see IOREQ as dealing explicitly with emulation. 
Yes, it does > > allow for emulators to be implemented in user-space, but at the end > > it's just an interface that allows forwarding accesses to certain > > resources (for the case we are speaking about here, PCI config space) > > to entities that registered as handlers. > > > > vPCI OTOH just deals with a very specific resource (PCI config space) > > and only allows internal handlers to be registered on a byte > > granularity. > > > > So your proposal would be to implement a hierarchy like the one on the > > diagram below: > > > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > > │ Memory │ │ IO Ports │ │ PCI config space │ > > └───────┬┘ └┬─────────┘ └───┬──────────────┘ > > │ │ │ > > │ │ ┌───┴──┐ > > │ │ │ vPCI │ > > │ │ └─┬──┬─┘ > > ┌──┴───┴┐ │ │ > > │ IOREQ ├────────────┘ │ > > └────┬──┘ │ > > │ │ > > ┌────────────┴──┐ ┌┴──────────────┐ > > │ IOREQ servers │ │ vPCI handlers │ > > └───────────────┘ └───────────────┘ > > Yes > > > > While what I'm proposing would look like: > > > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > > │ Memory │ │ IO Ports │ │ PCI config space │ > > └────┬───┘ └────┬─────┘ └────────┬─────────┘ > > │ │ │ > > └─────┬────┴────┬───────────┘ > > │ IOREQ │ > > └─┬─────┬─┘ > > │ │ > > ┌───────────────┤ └────┬──────┐ > > │ IOREQ servers │ │ vPCI │ > > └───────────────┘ └───┬──┘ > > │ > > ┌───┴───────────┐ > > │ vPCI handlers │ > > └───────────────┘ > > I don't have a major problem with this, but I find it less clear than > the first one. > > Let's say that all domains are PVH (or ARM guests). QEMU is running in > Dom0. If QEMU does emulation, that's fine. If QEMU implements PCI > Passthrough then QEMU uses libpci to do PCI config space reads and > write, which go to the Linux kernel in Dom0, which ends up doing PCI > config space reads and writes on the device, and that goes via vPCI in > Xen (vPCI for Dom0). So actually vPCI is still present. It is a lot > simpler to think that vPCI is in charge of all mediated PCI config space > accesses rather than thinking that for the same device vPCI handles PCI > config space accesses for Dom0 but not for DomU. So most of the issue is again with terminology I think, you would like to avoid even having to mention the word IOREQ for PVH domains for example, which you could possibly do if vPCI trapped all accesses to the PCI config space. I would be fine with renaming that internal IOREQ component to something else. What I insist on having is a single component that multiplexes access to all platform resources (IO ports, MMIO, PCI config space), so that we can have a (kind of) unified interface to register handlers. > It is not my preference but I am OK to compromise and go ahead with the > architecture you proposed but please let's keep IOREQ servers out of the > PCI Passthrough picture at least initially. > > > > I'm obviously biased, but I think the latter is cleaner, and allows > > all resources to be arbitrated by the same component (IOREQ). > > > > If the concern is about the IOREQ hypercall interface, it would be > > fine to introduce an option that limit IOREQs to internal users > > (vPCI) without supporting external IOREQ servers. > > > > Think of IOREQ as a resource mediator inside of Xen, that just does > > the PCI address decoding and forwards the access to the interested > > party, either an external IOREQ server or vPCI. > > The part about IOREQ (xen/common/ioreq.c) being a resource mediator is > OKish. 
> > I had many discussions over the years with various members of the larger > open source embedded community (Linaro, etc.) and the problem is that > when one says "IOREQ" typically people think of QEMU or other userspace > emulators. They don't think of the Xen side of it. This becomes very > relevant here because Xen is the only part of the system that is > getting safety-certified and it is important to convey the message that > nothing else in required to be safety-certified to have a fully working > Xen system that supports PCI Passthrough. > > In short, it is important that the community doesn't get the idea that > QEMU needs to be safety-certified to have PCI Passthrough working > correctly with Xen in a safety scenario. Maybe we need to rename that internal IOREQ component to something else, and then IOREQ would strictly be limited to the hypercall interface + IOREQ servers. Or maybe we just need more education/documentation around the difference between the internal side of IOREQs vs IOREQ servers vs QEMU. See for example demu, which is an emulator for a PC-like compatible system using IOREQ servers: https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=summary > > > > > It's important that the interfaces we introduce are correct IMO, > > > > because that ends up reflecting on the configuration options that we > > > > expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and > > > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option > > > > gets placed there will ultimately influence how the option gets > > > > exposed in xl/libxl, and the interface there is relevant to keep > > > > stable for end user sanity. > > > > > > I agree with you on the stable interfaces. The important part is not to > > > introduce changes to stable interfaces that could limit us in the > > > future. Specifically that includes xl and libxl, we need to be careful > > > there. But I don't see a single per-domain vPCI enable/disable option as > > > a problem. Let's say that in the future we have a mediated vGPU > > > implementation: if it works together with vPCI then the per-domain vPCI > > > option in libxl will be enabled (either explicitely or by default), if > > > it doesn't then vPCI will be disabled (either explicitely or by the > > > newer vGPU options.) > > > > If vPCI is hooked into IOREQ there won't be a need anymore to register > > the vPCI config space traps, as that would be done by IOREQ, and hence > > vPCI managed devices could be registered at runtime with IOREQ. IOW: > > there won't be a need anymore to signal at domain creation whether > > vPCI is intended to be used or not. > > For safety, we have requirements to specify everything statically before > boot so typically anything dynamic is a problem. > > > > We would obviously need to enable IOREQ for all domains with IOMMU > > enabled, as it would be IOREQ that register the PCI config space > > handlers. > > This bit might be OK > > > > > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait > > > before adding more changes on top of them, not because I don't care > > > about the mediated GPU problem (we do have something similar at AMD), > > > but because I worry that if we try to change them now we might not do a > > > good enough job. I would prefer to wait until we know more about the > > > actual use case, ideally with code supporting it. 
> > > > > > I think the difference in points of views comes from the fact that I see > > > vPCI as the default, QEMU only as a limited peripheral emulator (or > > > mediator for the vGPU case) but not in control. vPCI and QEMU are not > > > equal in my view. vPCI is in charge and always present if not in very > > > uncommon setups (even if we decide to hook it inside Xen by using > > > internal IOREQ interfaces). QEMU might come and go. > > > > Xen needs a single component that mediates accesses to resources, > > whether that's IOREQ, or something else I don't really care that much. > > Having vPCI mediate accesses to the PCI config space, and IOREQ to the > > memory (and on x86 IO port) space just seems awfully complicated for > > AFAICT no real benefit. > > > > > Now that I am writing this, I realize this is also why I wasn't too > > > happy with the idea of hooking vPCI using IOREQ. It makes them look as > > > if they are the same, while I don't they should be considered at the > > > same level of priority, criticality, safety, integration in the system, > > > etc. > > > > I feel there are some fears with IOREQ from a safety PoV? The code > > that does the resource multiplexing is small, and as said above if > > there are safety concerns with the hypercall interface it would be > > fine to limit it's usage to internal handlers only. > > Yes it is about safety. Everything within Xen will be safety-certified, > hence usable in a safety critical scenario, everything outside of Xen > might not. > > The fear is not on the IOREQ itself because xen/common/ioreq.c is part > of the certification scope. The fear is that IOREQ servers (e.g. QEMU) > are somehow in the picture when we discuss safety architectures with PCI > Passthrough, or that IOREQ servers could interfere with vPCI. By > "interfere" I mean that QEMU running in dom0 (a deprivileged dom0) will > be able to cause a malfunction in Xen vPCI. For that purpose it doesn't matter much how IOREQs or vPCI interact, as any (buggy) interaction could possibly allow IOREQ to cause malfunctions to vPCI. > Yes, limiting the hypercall interface would help in that regard because > it would limit Xen exposure. That would be fine IMO, it could even be a Kconfig option if that better suits your needs. Thanks, Roger.
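A sketch of what the Kconfig-gated restriction could look like at the point where an external IOREQ server would be created; CONFIG_EXTERNAL_IOREQ_SERVER is a made-up symbol and the function shown is only a stand-in for the real creation path:

    /* Hypothetical compile-time gate: refuse external (user-space) IOREQ
     * servers in builds limited to internal handlers such as vPCI. */
    static int ioreq_server_create_external(struct domain *d,
                                            int bufioreq_handling,
                                            ioservid_t *id)
    {
        if ( !IS_ENABLED(CONFIG_EXTERNAL_IOREQ_SERVER) )
            return -EOPNOTSUPP;

        /* ... existing creation path would continue here ... */
        return 0;
    }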
On Mon, 11 Dec 2023, Roger Pau Monné wrote: > On Tue, Dec 05, 2023 at 06:34:35PM -0800, Stefano Stabellini wrote: > > On Tue, 5 Dec 2023, Roger Pau Monné wrote: > > > > > > I don't think we should enable IOREQ servers to handle PCI passthrough > > > > > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI > > > > > > Passthrough can be handled by vPCI just fine. I think this should be a > > > > > > good anti-feature to have (a goal to explicitly not add this feature) to > > > > > > reduce complexity. Unless you see a specific usecase to add support for > > > > > > it? > > > > > > > > > > There are passthrough devices (GPUs) that might require some extra > > > > > mediation on dom0 (like the Intel GVT-g thing) that would force the > > > > > usage of ioreqs to passthrough. > > > > > > > > From an architectural perspective, I think it would be cleaner, simpler > > > > to maintain, and simpler to understand if Xen was the sole owner of the > > > > PCI Root Complex and PCI config space mediation (implemented with vPCI). > > > > IOREQ can be used for emulation and it works very well for that. At > > > > least in my mind, that makes things much simpler. > > > > > > But IOREQ already has all the code to mediate accesses to the PCI > > > config space, and the interface to register separate servers for > > > different PCI devices. > > > > > > We would then need to duplicate this internally for vPCI, so that vPCI > > > could forward accesses to IOREQ just for IOREQ to forward to yet a > > > different component? Seems like a lot of duplication for no benefit. > > > > [...] > > > > > Also, you seem to confabulate IOREQ with QEMU, while the latter is > > > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator > > > inside of Xen, that has the ability to forward such accesses to > > > external emulators using an hypercall interface. > > > > We have been using different terminologies until now. IOREQ could mean > > anything from the ABI interface, the emulator side (QEMU) or the > > hypervisor side (Xen). I am going to align with your wording and say: > > > > IOREQ: only the IOREQ implementation in Xen (xen/common/ioreq.c) > > IOREQ server: QEMU or alternative > > > > I think it is OK if we use IOREQ internally within Xen to hook vPCI with > > PCI config space accesses and emulation. I don't think it is a good idea > > to attempt to enable IOREQ servers (e.g. QEMU) to implement PCI > > Passthrough when vPCI is also enabled for the domain, at least > > initially. > > I agree, it's perfectly fine to initially limit to vPCI passthrough > devices + QEMU emulated devices only for example. OK good > I think it was mostly an issue with terminology then :). Yes :) > > > > I understand there are non-trivial cases, like virtual GPUs with > > > > hardware access, but I don't classify those as passthrough. That's > > > > because there isn't one device that gets fully assigned to the guest. > > > > Instead, there is an emulated device (hence IOREQ) with certain MMIO > > > > regions and interrupts that end up being directly mapped from real > > > > hardware. > > > > > > > > So I think it is natural in those cases to use IOREQ and it is also > > > > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI > > > > perspective, I hope it will mostly look as if the device is assigned to > > > > Dom0. 
Even if it ends up being more complex than that, Rome wasn't > > > > built in one day, and I don't think we should try to solve this problem > > > > on day1 (as long as the interfaces are not stable interfaces). > > > > > > I don't see IOREQ as dealing explicitly with emulation. Yes, it does > > > allow for emulators to be implemented in user-space, but at the end > > > it's just an interface that allows forwarding accesses to certain > > > resources (for the case we are speaking about here, PCI config space) > > > to entities that registered as handlers. > > > > > > vPCI OTOH just deals with a very specific resource (PCI config space) > > > and only allows internal handlers to be registered on a byte > > > granularity. > > > > > > So your proposal would be to implement a hierarchy like the one on the > > > diagram below: > > > > > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > > > │ Memory │ │ IO Ports │ │ PCI config space │ > > > └───────┬┘ └┬─────────┘ └───┬──────────────┘ > > > │ │ │ > > > │ │ ┌───┴──┐ > > > │ │ │ vPCI │ > > > │ │ └─┬──┬─┘ > > > ┌──┴───┴┐ │ │ > > > │ IOREQ ├────────────┘ │ > > > └────┬──┘ │ > > > │ │ > > > ┌────────────┴──┐ ┌┴──────────────┐ > > > │ IOREQ servers │ │ vPCI handlers │ > > > └───────────────┘ └───────────────┘ > > > > Yes > > > > > > > While what I'm proposing would look like: > > > > > > ┌────────┐ ┌──────────┐ ┌──────────────────┐ > > > │ Memory │ │ IO Ports │ │ PCI config space │ > > > └────┬───┘ └────┬─────┘ └────────┬─────────┘ > > > │ │ │ > > > └─────┬────┴────┬───────────┘ > > > │ IOREQ │ > > > └─┬─────┬─┘ > > > │ │ > > > ┌───────────────┤ └────┬──────┐ > > > │ IOREQ servers │ │ vPCI │ > > > └───────────────┘ └───┬──┘ > > > │ > > > ┌───┴───────────┐ > > > │ vPCI handlers │ > > > └───────────────┘ > > > > I don't have a major problem with this, but I find it less clear than > > the first one. > > > > Let's say that all domains are PVH (or ARM guests). QEMU is running in > > Dom0. If QEMU does emulation, that's fine. If QEMU implements PCI > > Passthrough then QEMU uses libpci to do PCI config space reads and > > write, which go to the Linux kernel in Dom0, which ends up doing PCI > > config space reads and writes on the device, and that goes via vPCI in > > Xen (vPCI for Dom0). So actually vPCI is still present. It is a lot > > simpler to think that vPCI is in charge of all mediated PCI config space > > accesses rather than thinking that for the same device vPCI handles PCI > > config space accesses for Dom0 but not for DomU. > > So most of the issue is again with terminology I think, you would > like to avoid even having to mention the word IOREQ for PVH domains > for example, which you could possibly do if vPCI trapped all accesses > to the PCI config space. > > I would be fine with renaming that internal IOREQ component to > something else. What I insist on having is a single component that > multiplexes access to all platform resources (IO ports, MMIO, PCI > config space), so that we can have a (kind of) unified interface to > register handlers. Yes I am OK with that. A single multiplexer is fine, however we need to be careful as IOREQ in Xen has a lot of stuff about handling messages to and from QEMU and state changes related to it, see ioreq_send and ioreq_send_buffered. > > It is not my preference but I am OK to compromise and go ahead with the > > architecture you proposed but please let's keep IOREQ servers out of the > > PCI Passthrough picture at least initially. 
> >
> > > I'm obviously biased, but I think the latter is cleaner, and allows
> > > all resources to be arbitrated by the same component (IOREQ).
> > >
> > > If the concern is about the IOREQ hypercall interface, it would be
> > > fine to introduce an option that limit IOREQs to internal users
> > > (vPCI) without supporting external IOREQ servers.
> > >
> > > Think of IOREQ as a resource mediator inside of Xen, that just does
> > > the PCI address decoding and forwards the access to the interested
> > > party, either an external IOREQ server or vPCI.
> >
> > The part about IOREQ (xen/common/ioreq.c) being a resource mediator is
> > OKish.
> >
> > I had many discussions over the years with various members of the larger
> > open source embedded community (Linaro, etc.) and the problem is that
> > when one says "IOREQ" typically people think of QEMU or other userspace
> > emulators. They don't think of the Xen side of it. This becomes very
> > relevant here because Xen is the only part of the system that is
> > getting safety-certified and it is important to convey the message that
> > nothing else in required to be safety-certified to have a fully working
> > Xen system that supports PCI Passthrough.
> >
> > In short, it is important that the community doesn't get the idea that
> > QEMU needs to be safety-certified to have PCI Passthrough working
> > correctly with Xen in a safety scenario.
>
> Maybe we need to rename that internal IOREQ component to something
> else, and then IOREQ would strictly be limited to the hypercall
> interface + IOREQ servers.

Right. We could also keep calling IOREQ things in Xen strictly related
to handling the message passing interface to QEMU, e.g. ioreq_send.

> Or maybe we just need more education/documentation around the
> difference between the internal side of IOREQs vs IOREQ servers vs
> QEMU. See for example demu, which is an emulator for a PC-like
> compatible system using IOREQ servers:
>
> https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=summary
>
> > > > > It's important that the interfaces we introduce are correct IMO,
> > > > > because that ends up reflecting on the configuration options that we
> > > > > expose to users on xl/libxl. While both XEN_DOMCTL_createdomain and
> > > > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> > > > > gets placed there will ultimately influence how the option gets
> > > > > exposed in xl/libxl, and the interface there is relevant to keep
> > > > > stable for end user sanity.
> > > >
> > > > I agree with you on the stable interfaces. The important part is not to
> > > > introduce changes to stable interfaces that could limit us in the
> > > > future. Specifically that includes xl and libxl, we need to be careful
> > > > there. But I don't see a single per-domain vPCI enable/disable option as
> > > > a problem. Let's say that in the future we have a mediated vGPU
> > > > implementation: if it works together with vPCI then the per-domain vPCI
> > > > option in libxl will be enabled (either explicitely or by default), if
> > > > it doesn't then vPCI will be disabled (either explicitely or by the
> > > > newer vGPU options.)
> > >
> > > If vPCI is hooked into IOREQ there won't be a need anymore to register
> > > the vPCI config space traps, as that would be done by IOREQ, and hence
> > > vPCI managed devices could be registered at runtime with IOREQ. IOW:
> > > there won't be a need anymore to signal at domain creation whether
> > > vPCI is intended to be used or not.
> >
> > For safety, we have requirements to specify everything statically before
> > boot so typically anything dynamic is a problem.
> > >
> > > We would obviously need to enable IOREQ for all domains with IOMMU
> > > enabled, as it would be IOREQ that register the PCI config space
> > > handlers.
> >
> > This bit might be OK
> >
> > > > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> > > > before adding more changes on top of them, not because I don't care
> > > > about the mediated GPU problem (we do have something similar at AMD),
> > > > but because I worry that if we try to change them now we might not do a
> > > > good enough job. I would prefer to wait until we know more about the
> > > > actual use case, ideally with code supporting it.
> > > >
> > > > I think the difference in points of views comes from the fact that I see
> > > > vPCI as the default, QEMU only as a limited peripheral emulator (or
> > > > mediator for the vGPU case) but not in control. vPCI and QEMU are not
> > > > equal in my view. vPCI is in charge and always present if not in very
> > > > uncommon setups (even if we decide to hook it inside Xen by using
> > > > internal IOREQ interfaces). QEMU might come and go.
> > >
> > > Xen needs a single component that mediates accesses to resources,
> > > whether that's IOREQ, or something else I don't really care that much.
> > > Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> > > memory (and on x86 IO port) space just seems awfully complicated for
> > > AFAICT no real benefit.
> > >
> > > > Now that I am writing this, I realize this is also why I wasn't too
> > > > happy with the idea of hooking vPCI using IOREQ. It makes them look as
> > > > if they are the same, while I don't they should be considered at the
> > > > same level of priority, criticality, safety, integration in the system,
> > > > etc.
> > >
> > > I feel there are some fears with IOREQ from a safety PoV? The code
> > > that does the resource multiplexing is small, and as said above if
> > > there are safety concerns with the hypercall interface it would be
> > > fine to limit it's usage to internal handlers only.
> >
> > Yes it is about safety. Everything within Xen will be safety-certified,
> > hence usable in a safety critical scenario, everything outside of Xen
> > might not.
> >
> > The fear is not on the IOREQ itself because xen/common/ioreq.c is part
> > of the certification scope. The fear is that IOREQ servers (e.g. QEMU)
> > are somehow in the picture when we discuss safety architectures with PCI
> > Passthrough, or that IOREQ servers could interfere with vPCI. By
> > "interfere" I mean that QEMU running in dom0 (a deprivileged dom0) will
> > be able to cause a malfunction in Xen vPCI.
>
> For that purpose it doesn't matter much how IOREQs or vPCI interact,
> as any (buggy) interaction could possibly allow IOREQ to cause
> malfunctions to vPCI.

yep

> > Yes, limiting the hypercall interface would help in that regard because
> > it would limit Xen exposure.
>
> That would be fine IMO, it could even be a Kconfig option if that
> better suits your needs.

OK. I think we are aligned.
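To make the "internal handlers registered on a byte granularity" point from the
discussion above concrete, here is a rough sketch of how an in-hypervisor vPCI
trap is hooked today. It is only an illustration: the handler and helper names
(dummy_vpci_read, dummy_vpci_write, hide_config_byte) are made up, and the
vpci_add_register() / vpci_read_t / vpci_write_t signatures are recalled from
xen/include/xen/vpci.h rather than taken from this series, so verify them
against the tree.

    /* Illustrative sketch only -- hypothetical handler names; signatures
     * as recalled from xen/include/xen/vpci.h, not part of this patch. */
    #include <xen/pci.h>
    #include <xen/vpci.h>

    /* Reads of the trapped register return a fixed value to the guest. */
    static uint32_t cf_check dummy_vpci_read(const struct pci_dev *pdev,
                                             unsigned int reg, void *data)
    {
        return 0;
    }

    /* Guest writes to the trapped register are silently dropped. */
    static void cf_check dummy_vpci_write(const struct pci_dev *pdev,
                                          unsigned int reg, uint32_t val,
                                          void *data)
    {
    }

    /* Trap a single byte of config space: vPCI dispatches each access to
     * whichever handler covers [offset, offset + size). */
    static int hide_config_byte(struct pci_dev *pdev, unsigned int offset)
    {
        return vpci_add_register(pdev->vpci, dummy_vpci_read, dummy_vpci_write,
                                 offset, 1, NULL);
    }

This per-register, in-hypervisor registration is what the thread contrasts with
IOREQ servers, which register for whole devices or ranges through the hypercall
interface.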
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 5ff68e5d5979..3845b238a33f 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -195,6 +195,7 @@ config PCI_PASSTHROUGH
         depends on ARM_64
         select HAS_PCI
         select HAS_VPCI
+        select HAS_VPCI_GUEST_SUPPORT
         default n
         help
           This option enables PCI device passthrough
diff --git a/xen/arch/arm/include/asm/domain.h b/xen/arch/arm/include/asm/domain.h
index 3614562eaefe..8e6d5fe9578c 100644
--- a/xen/arch/arm/include/asm/domain.h
+++ b/xen/arch/arm/include/asm/domain.h
@@ -31,6 +31,8 @@ enum domain_type {
 
 #define is_domain_direct_mapped(d) ((d)->cdf & CDF_directmap)
 
+#define arch_needs_vpci(d) ({ (void)(d); true; })
+
 /*
  * Is the domain using the host memory layout?
  *
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb55082a..61e0edcedea9 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -2,6 +2,7 @@
 /*
  * xen/arch/arm/vpci.c
  */
+#include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/vpci.h>
 
@@ -90,8 +91,15 @@ int domain_vpci_init(struct domain *d)
             return ret;
     }
     else
+    {
+        if ( !IS_ENABLED(CONFIG_HAS_VPCI_GUEST_SUPPORT) )
+        {
+            gdprintk(XENLOG_ERR, "vPCI requested but guest support not enabled\n");
+            return -EINVAL;
+        }
         register_mmio_handler(d, &vpci_mmio_handler,
                               GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
+    }
 
     return 0;
 }
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index cb02a4d1ebb2..d34015391eed 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -503,6 +503,8 @@ struct arch_domain
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
 
+#define arch_needs_vpci(d) ({ (void)(d); false; })
+
 #define gdt_ldt_pt_idx(v) \
     ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
 #define pv_gdt_ptes(v) \
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 12dc27428972..47d49c57bf83 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -692,6 +692,11 @@ struct domain *domain_create(domid_t domid,
 
     if ( !is_idle_domain(d) )
     {
+        err = -EINVAL;
+        if ( !is_hardware_domain(d) && (config->flags & XEN_DOMCTL_CDF_vpci) &&
+             !IS_ENABLED(CONFIG_HAS_VPCI_GUEST_SUPPORT) )
+            goto fail;
+
         if ( !is_hardware_domain(d) )
             d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
         else
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 04d00c7c37df..2203725a2aa6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1542,6 +1542,18 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
     pcidevs_unlock();
 }
 
+static bool needs_vpci(const struct domain *d)
+{
+    if ( is_hardware_domain(d) )
+        return false;
+
+    if ( d == dom_io )
+        /* xl pci-assignable-add assigns PCI devices to domIO */
+        return false;
+
+    return arch_needs_vpci(d);
+}
+
 int iommu_do_pci_domctl(
     struct xen_domctl *domctl, struct domain *d,
     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
@@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN(machine_sbdf);
 
+        if ( needs_vpci(d) && !has_vpci(d) )
+        {
+            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
+                   &PCI_SBDF(seg, bus, devfn), d);
+            ret = -EPERM;
+            break;
+        }
+
         pcidevs_lock();
         ret = device_assigned(seg, bus, devfn);
         if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
Select HAS_VPCI_GUEST_SUPPORT in Kconfig for enabling vPCI support for
domUs.

Add checks to fail guest creation if the configuration is invalid.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
As the tag implies, this patch is not intended to be merged (yet).

Note that CONFIG_HAS_VPCI_GUEST_SUPPORT is not currently used in the
upstream code base. It will be used by the vPCI series [1]. This patch
is intended to be merged as part of the vPCI series. I'll coordinate
with Volodymyr to include this in the vPCI series or resend afterwards.
Meanwhile, I'll include it here until the Kconfig and
xen_domctl_createdomain prerequisites have been committed.

v5->v6:
* drop is_pvh_domain(), simply make arch_needs_vpci() return false on
  x86 for now
* leave has_vpci() alone, instead, add HAS_VPCI_GUEST_SUPPORT check in
  domain_create

v4->v5:
* replace is_system_domain() check with dom_io check
* return an error in XEN_DOMCTL_assign_device (thanks Jan!)
* remove CONFIG_ARM check
* add needs_vpci() and arch_needs_vpci()
* add HAS_VPCI_GUEST_SUPPORT check to has_vpci()

v3->v4:
* refuse to create domain if configuration is invalid
* split toolstack change into separate patch

v2->v3:
* set pci flags in toolstack

v1->v2:
* new patch

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00660.html
---
 xen/arch/arm/Kconfig              |  1 +
 xen/arch/arm/include/asm/domain.h |  2 ++
 xen/arch/arm/vpci.c               |  8 ++++++++
 xen/arch/x86/include/asm/domain.h |  2 ++
 xen/common/domain.c               |  5 +++++
 xen/drivers/passthrough/pci.c     | 20 ++++++++++++++++++++
 6 files changed, 38 insertions(+)
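As a usage illustration of the domain_create() check added by this patch: a
toolstack that sets XEN_DOMCTL_CDF_vpci for a domU now has domain creation fail
with -EINVAL unless HAS_VPCI_GUEST_SUPPORT is built in. The sketch below is
hypothetical and not part of the series; only XEN_DOMCTL_CDF_vpci is taken from
the hunks above, while xc_domain_create(), the other CDF flags and the struct
fields are existing interfaces recalled from libxenctrl and the public headers
and may differ in detail from your tree.

    /* Hypothetical toolstack-side sketch, for illustration only. */
    #include <stdint.h>
    #include <xenctrl.h>

    static int create_vpci_domu(xc_interface *xch, uint32_t *domid)
    {
        struct xen_domctl_createdomain config = {
            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
                     XEN_DOMCTL_CDF_vpci,   /* the flag tested in domain_create() */
            .max_vcpus = 1,
            /* other fields (grant/evtchn limits, arch config) omitted here */
        };

        /* Fails with -EINVAL for a domU when CONFIG_HAS_VPCI_GUEST_SUPPORT
         * is not enabled in the hypervisor build. */
        return xc_domain_create(xch, domid, &config);
    }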