Message ID | 20210930075223.860329-12-andr2000@gmail.com (mailing list archive)
--- | ---
State | Superseded
Series | PCI devices passthrough on Arm, part 3
On 30.09.2021 09:52, Oleksandr Andrushchenko wrote: > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev) > xfree(vdev); > return 0; > } > + > +/* > + * Find the physical device which is mapped to the virtual device > + * and translate virtual SBDF to the physical one. > + */ > +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) > +{ > + struct vpci_dev *vdev; const (afaict) > + bool found = false; > + > + pcidevs_lock(); > + list_for_each_entry ( vdev, &d->vdev_list, list ) > + { > + if ( vdev->sbdf.sbdf == sbdf->sbdf ) > + { > + /* Replace virtual SBDF with the physical one. */ > + *sbdf = vdev->pdev->sbdf; > + found = true; > + break; > + } > + } > + pcidevs_unlock(); As per the comments on the earlier patch, locking as well as placement may need reconsidering. Jan
On 30.09.21 11:53, Jan Beulich wrote: > On 30.09.2021 09:52, Oleksandr Andrushchenko wrote: >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev) >> xfree(vdev); >> return 0; >> } >> + >> +/* >> + * Find the physical device which is mapped to the virtual device >> + * and translate virtual SBDF to the physical one. >> + */ >> +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) >> +{ >> + struct vpci_dev *vdev; > const (afaict) Ok > >> + bool found = false; >> + >> + pcidevs_lock(); >> + list_for_each_entry ( vdev, &d->vdev_list, list ) >> + { >> + if ( vdev->sbdf.sbdf == sbdf->sbdf ) >> + { >> + /* Replace virtual SBDF with the physical one. */ >> + *sbdf = vdev->pdev->sbdf; >> + found = true; >> + break; >> + } >> + } >> + pcidevs_unlock(); > As per the comments on the earlier patch, locking as well as placement > may need reconsidering. Other then that do you have other comments on this? > > Jan > Thank you, Oleksandr
On 30.09.2021 11:35, Oleksandr Andrushchenko wrote: > On 30.09.21 11:53, Jan Beulich wrote: >> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote: >>> --- a/xen/drivers/passthrough/pci.c >>> +++ b/xen/drivers/passthrough/pci.c >>> @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev) >>> xfree(vdev); >>> return 0; >>> } >>> + >>> +/* >>> + * Find the physical device which is mapped to the virtual device >>> + * and translate virtual SBDF to the physical one. >>> + */ >>> +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) >>> +{ >>> + struct vpci_dev *vdev; >> const (afaict) > Ok >> >>> + bool found = false; >>> + >>> + pcidevs_lock(); >>> + list_for_each_entry ( vdev, &d->vdev_list, list ) >>> + { >>> + if ( vdev->sbdf.sbdf == sbdf->sbdf ) >>> + { >>> + /* Replace virtual SBDF with the physical one. */ >>> + *sbdf = vdev->pdev->sbdf; >>> + found = true; >>> + break; >>> + } >>> + } >>> + pcidevs_unlock(); >> As per the comments on the earlier patch, locking as well as placement >> may need reconsidering. > Other then that do you have other comments on this? Iirc this was the only thing here. But I haven't got around to look at patches 4-9 yet ... Jan
[snip] >> + bool found = false; >> + >> + pcidevs_lock(); >> + list_for_each_entry ( vdev, &d->vdev_list, list ) >> + { >> + if ( vdev->sbdf.sbdf == sbdf->sbdf ) >> + { >> + /* Replace virtual SBDF with the physical one. */ >> + *sbdf = vdev->pdev->sbdf; >> + found = true; >> + break; >> + } >> + } >> + pcidevs_unlock(); > As per the comments on the earlier patch, locking as well as placement > may need reconsidering. I was thinking about the locking happening here. So, there are 4 sources where we need to manipulate d->vdev_list: 1. XEN_DOMCTL_assign_device 2. XEN_DOMCTL_test_assign_device 3. XEN_DOMCTL_deassign_device 4. MMIO handlers 5. Do I miss others? The first three already use pcidevs_{lock|unlock} and there it seems to be ok as those get called when PCI devices are discovered by Dom0 and during guest domain creation. So, this is assumed not to happen frequently and can be accepted wrt global locking. What is more important is the fourth case, where in order to redirect configuration space access from virtual SBDF to physical SBDF we need to traverse the d->vdev_list each time the guest accesses PCI configuration space. This means that with each such access we take a BIG PCI lock... That being said, I think that we may want having a dedicated per-domain lock for d->vdev_list handling, e.g. d->vdev_lock. At the same time we may also consider that even for guests it is acceptable to use pcidevs_{lock|unlock} as this will not affect PCI memory space access and only has influence during device setup. I would love to hear your opinion on this > Jan > Thank you, Oleksandr
On 30.09.2021 18:57, Oleksandr Andrushchenko wrote: > [snip] > >>> + bool found = false; >>> + >>> + pcidevs_lock(); >>> + list_for_each_entry ( vdev, &d->vdev_list, list ) >>> + { >>> + if ( vdev->sbdf.sbdf == sbdf->sbdf ) >>> + { >>> + /* Replace virtual SBDF with the physical one. */ >>> + *sbdf = vdev->pdev->sbdf; >>> + found = true; >>> + break; >>> + } >>> + } >>> + pcidevs_unlock(); >> As per the comments on the earlier patch, locking as well as placement >> may need reconsidering. > I was thinking about the locking happening here. > So, there are 4 sources where we need to manipulate d->vdev_list: > 1. XEN_DOMCTL_assign_device > 2. XEN_DOMCTL_test_assign_device > 3. XEN_DOMCTL_deassign_device > 4. MMIO handlers > 5. Do I miss others? > > The first three already use pcidevs_{lock|unlock} and there it seems > to be ok as those get called when PCI devices are discovered by Dom0 > and during guest domain creation. So, this is assumed not to happen > frequently and can be accepted wrt global locking. > > What is more important is the fourth case, where in order to redirect > configuration space access from virtual SBDF to physical SBDF we need > to traverse the d->vdev_list each time the guest accesses PCI configuration > space. This means that with each such access we take a BIG PCI lock... > > That being said, I think that we may want having a dedicated per-domain > lock for d->vdev_list handling, e.g. d->vdev_lock. > At the same time we may also consider that even for guests it is acceptable > to use pcidevs_{lock|unlock} as this will not affect PCI memory space access > and only has influence during device setup. > > I would love to hear your opinion on this I've voiced my opinion already: Using the global lock really is an abuse, which would require good justification. Hence unless there's anything speaking against a per-domain lock, that's imo the only suitable route to go. Nesting rules with the global lock may want explicitly spelling out. Jan
On 01.10.21 10:42, Jan Beulich wrote: > On 30.09.2021 18:57, Oleksandr Andrushchenko wrote: >> [snip] >> >>>> + bool found = false; >>>> + >>>> + pcidevs_lock(); >>>> + list_for_each_entry ( vdev, &d->vdev_list, list ) >>>> + { >>>> + if ( vdev->sbdf.sbdf == sbdf->sbdf ) >>>> + { >>>> + /* Replace virtual SBDF with the physical one. */ >>>> + *sbdf = vdev->pdev->sbdf; >>>> + found = true; >>>> + break; >>>> + } >>>> + } >>>> + pcidevs_unlock(); >>> As per the comments on the earlier patch, locking as well as placement >>> may need reconsidering. >> I was thinking about the locking happening here. >> So, there are 4 sources where we need to manipulate d->vdev_list: >> 1. XEN_DOMCTL_assign_device >> 2. XEN_DOMCTL_test_assign_device >> 3. XEN_DOMCTL_deassign_device >> 4. MMIO handlers >> 5. Do I miss others? >> >> The first three already use pcidevs_{lock|unlock} and there it seems >> to be ok as those get called when PCI devices are discovered by Dom0 >> and during guest domain creation. So, this is assumed not to happen >> frequently and can be accepted wrt global locking. >> >> What is more important is the fourth case, where in order to redirect >> configuration space access from virtual SBDF to physical SBDF we need >> to traverse the d->vdev_list each time the guest accesses PCI configuration >> space. This means that with each such access we take a BIG PCI lock... >> >> That being said, I think that we may want having a dedicated per-domain >> lock for d->vdev_list handling, e.g. d->vdev_lock. >> At the same time we may also consider that even for guests it is acceptable >> to use pcidevs_{lock|unlock} as this will not affect PCI memory space access >> and only has influence during device setup. >> >> I would love to hear your opinion on this > I've voiced my opinion already: Using the global lock really is an > abuse, which would require good justification. Hence unless there's > anything speaking against a per-domain lock, that's imo the only > suitable route to go. Nesting rules with the global lock may want > explicitly spelling out. I do understand your concern here and also support the idea that the less we wait for locks the better. Nevertheless, even if I introduce d->vdev_lock, which will obviously help MMIO traps, the rest will remain under pcidevs_{lock|unlock}, e.g. XEN_DOMCTL_assign_device, XEN_DOMCTL_test_assign_device and XEN_DOMCTL_deassign_device and the underlying code like vpci_{assign|deassign}_device in my case Anyways, I'll implement a per-domain d->vdev_lock > > Jan > Thank you, Oleksandr
On 01.10.2021 09:57, Oleksandr Andrushchenko wrote: > > > On 01.10.21 10:42, Jan Beulich wrote: >> On 30.09.2021 18:57, Oleksandr Andrushchenko wrote: >>> [snip] >>> >>>>> + bool found = false; >>>>> + >>>>> + pcidevs_lock(); >>>>> + list_for_each_entry ( vdev, &d->vdev_list, list ) >>>>> + { >>>>> + if ( vdev->sbdf.sbdf == sbdf->sbdf ) >>>>> + { >>>>> + /* Replace virtual SBDF with the physical one. */ >>>>> + *sbdf = vdev->pdev->sbdf; >>>>> + found = true; >>>>> + break; >>>>> + } >>>>> + } >>>>> + pcidevs_unlock(); >>>> As per the comments on the earlier patch, locking as well as placement >>>> may need reconsidering. >>> I was thinking about the locking happening here. >>> So, there are 4 sources where we need to manipulate d->vdev_list: >>> 1. XEN_DOMCTL_assign_device >>> 2. XEN_DOMCTL_test_assign_device >>> 3. XEN_DOMCTL_deassign_device >>> 4. MMIO handlers >>> 5. Do I miss others? >>> >>> The first three already use pcidevs_{lock|unlock} and there it seems >>> to be ok as those get called when PCI devices are discovered by Dom0 >>> and during guest domain creation. So, this is assumed not to happen >>> frequently and can be accepted wrt global locking. >>> >>> What is more important is the fourth case, where in order to redirect >>> configuration space access from virtual SBDF to physical SBDF we need >>> to traverse the d->vdev_list each time the guest accesses PCI configuration >>> space. This means that with each such access we take a BIG PCI lock... >>> >>> That being said, I think that we may want having a dedicated per-domain >>> lock for d->vdev_list handling, e.g. d->vdev_lock. >>> At the same time we may also consider that even for guests it is acceptable >>> to use pcidevs_{lock|unlock} as this will not affect PCI memory space access >>> and only has influence during device setup. >>> >>> I would love to hear your opinion on this >> I've voiced my opinion already: Using the global lock really is an >> abuse, which would require good justification. Hence unless there's >> anything speaking against a per-domain lock, that's imo the only >> suitable route to go. Nesting rules with the global lock may want >> explicitly spelling out. > I do understand your concern here and also support the idea that > the less we wait for locks the better. Nevertheless, even if I introduce > d->vdev_lock, which will obviously help MMIO traps, the rest will remain > under pcidevs_{lock|unlock}, e.g. XEN_DOMCTL_assign_device, > XEN_DOMCTL_test_assign_device and XEN_DOMCTL_deassign_device > and the underlying code like vpci_{assign|deassign}_device in my case Well, it's entirely usual that certain operations require more than one lock. > Anyways, I'll implement a per-domain d->vdev_lock Thanks. Jan
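The per-domain lock agreed on above is not part of this version of the patch. Below is a minimal sketch of what it might look like, assuming an rwlock_t field named vdev_lock added to struct domain next to vdev_list; the field name follows Oleksandr's proposal, while the choice of a read/write lock, dropping the const from the domain parameter, and the nesting rule in the comment are assumptions rather than code from the series.

```c
/*
 * Illustrative only: protect d->vdev_list with a per-domain rwlock so the
 * config-space trap path no longer takes the global pcidevs lock.
 * Assumed nesting rule: when both locks are needed (the DOMCTL paths),
 * take pcidevs_lock() first, then d->vdev_lock.
 */
bool pci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf)
{
    const struct vpci_dev *vdev;
    bool found = false;

    read_lock(&d->vdev_lock);
    list_for_each_entry ( vdev, &d->vdev_list, list )
    {
        if ( vdev->sbdf.sbdf == sbdf->sbdf )
        {
            /* Replace virtual SBDF with the physical one. */
            *sbdf = vdev->pdev->sbdf;
            found = true;
            break;
        }
    }
    read_unlock(&d->vdev_lock);

    return found;
}
```

The writers (XEN_DOMCTL_{assign,test_assign,deassign}_device via vpci_{assign,deassign}_device) would take write_lock(&d->vdev_lock) while already holding pcidevs_lock(), which also spells out the nesting order Jan asks about.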
Hi Oleksandr, On 30/09/2021 08:52, Oleksandr Andrushchenko wrote: > From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > > There are three originators for the PCI configuration space access: > 1. The domain that owns physical host bridge: MMIO handlers are > there so we can update vPCI register handlers with the values > written by the hardware domain, e.g. physical view of the registers > vs guest's view on the configuration space. > 2. Guest access to the passed through PCI devices: we need to properly > map virtual bus topology to the physical one, e.g. pass the configuration > space access to the corresponding physical devices. > 3. Emulated host PCI bridge access. It doesn't exist in the physical > topology, e.g. it can't be mapped to some physical host bridge. > So, all access to the host bridge itself needs to be trapped and > emulated. > > Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > > --- > Since v2: > - pass struct domain instead of struct vcpu > - constify arguments where possible > - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT > New in v2 > --- > xen/arch/arm/domain.c | 1 + > xen/arch/arm/vpci.c | 86 +++++++++++++++++++++++++++++++---- > xen/arch/arm/vpci.h | 3 ++ > xen/drivers/passthrough/pci.c | 25 ++++++++++ > xen/include/asm-arm/pci.h | 1 + > xen/include/xen/pci.h | 1 + > xen/include/xen/sched.h | 2 + > 7 files changed, 111 insertions(+), 8 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index fa6fcc5e467c..095671742ad8 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d) > get_order_from_bytes(d->arch.efi_acpi_len)); > #endif > domain_io_free(d); > + domain_vpci_free(d); > } > > void arch_domain_shutdown(struct domain *d) > diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c > index 5d6c29c8dcd9..26ec2fa7cf2d 100644 > --- a/xen/arch/arm/vpci.c > +++ b/xen/arch/arm/vpci.c > @@ -17,6 +17,14 @@ > > #define REGISTER_OFFSET(addr) ( (addr) & 0x00000fff) > > +struct vpci_mmio_priv { > + /* > + * Set to true if the MMIO handlers were set up for the emulated > + * ECAM host PCI bridge. > + */ > + bool is_virt_ecam; > +}; > + > /* Do some sanity checks. */ > static bool vpci_mmio_access_allowed(unsigned int reg, unsigned int len) > { > @@ -38,6 +46,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, > pci_sbdf_t sbdf; > unsigned long data = ~0UL; > unsigned int size = 1U << info->dabt.size; > + struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p; This cast is unnecessary. Same... > > sbdf.sbdf = MMCFG_BDF(info->gpa); > reg = REGISTER_OFFSET(info->gpa); > @@ -45,6 +54,13 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, > if ( !vpci_mmio_access_allowed(reg, size) ) > return 0; > > + /* > + * For the passed through devices we need to map their virtual SBDF > + * to the physical PCI device being passed through. > + */ > + if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) ) > + return 1; > + > data = vpci_read(sbdf, reg, min(4u, size)); > if ( size == 8 ) > data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32; > @@ -61,6 +77,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, > pci_sbdf_t sbdf; > unsigned long data = r; > unsigned int size = 1U << info->dabt.size; > + struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p; ... here. But is it meant to be modified? If not, then I think you want to turn it to add a const in both cases. 
> > sbdf.sbdf = MMCFG_BDF(info->gpa); > reg = REGISTER_OFFSET(info->gpa); > @@ -68,6 +85,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, > if ( !vpci_mmio_access_allowed(reg, size) ) > return 0; > > + /* > + * For the passed through devices we need to map their virtual SBDF > + * to the physical PCI device being passed through. > + */ > + if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) ) > + return 1; > + > vpci_write(sbdf, reg, min(4u, size), data); > if ( size == 8 ) > vpci_write(sbdf, reg + 4, 4, data >> 32); > @@ -80,13 +104,48 @@ static const struct mmio_handler_ops vpci_mmio_handler = { > .write = vpci_mmio_write, > }; > > +/* > + * There are three originators for the PCI configuration space access: > + * 1. The domain that owns physical host bridge: MMIO handlers are > + * there so we can update vPCI register handlers with the values > + * written by the hardware domain, e.g. physical view of the registers/ > + * configuration space. > + * 2. Guest access to the passed through PCI devices: we need to properly > + * map virtual bus topology to the physical one, e.g. pass the configuration > + * space access to the corresponding physical devices. > + * 3. Emulated host PCI bridge access. It doesn't exist in the physical > + * topology, e.g. it can't be mapped to some physical host bridge. > + * So, all access to the host bridge itself needs to be trapped and > + * emulated. > + */ > static int vpci_setup_mmio_handler(struct domain *d, > struct pci_host_bridge *bridge) > { > - struct pci_config_window *cfg = bridge->cfg; > + struct vpci_mmio_priv *priv; > + > + priv = xzalloc(struct vpci_mmio_priv); > + if ( !priv ) > + return -ENOMEM; > + > + priv->is_virt_ecam = !is_hardware_domain(d); > > - register_mmio_handler(d, &vpci_mmio_handler, > - cfg->phys_addr, cfg->size, NULL); > + if ( is_hardware_domain(d) ) > + { > + struct pci_config_window *cfg = bridge->cfg; > + > + bridge->mmio_priv = priv; > + register_mmio_handler(d, &vpci_mmio_handler, > + cfg->phys_addr, cfg->size, > + priv); > + } > + else > + { > + d->vpci_mmio_priv = priv; Something feels odd to me in this code. The if ( !is_hardware_domain(d) ) part seems to suggests that this can be called on multiple bridge. But here you are directly assigning priv to d->vpci_mmio_priv. The call... > + /* Guest domains use what is programmed in their device tree. */ > + register_mmio_handler(d, &vpci_mmio_handler, > + GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, > + priv); > + } > return 0; > } > > @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d) > if ( !has_vpci(d) ) > return 0; > > + return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); ... here seems to confirm that you may (in theory) have multiple bridges. So the 'else' would want some rework to avoid assuming a single bridge. > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index 5b963d75d1ba..b7dffb769cfd 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev) > xfree(vdev); > return 0; > } > + > +/* > + * Find the physical device which is mapped to the virtual device > + * and translate virtual SBDF to the physical one. 
> + */ > +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) > +{ > + struct vpci_dev *vdev; > + bool found = false; > + > + pcidevs_lock(); > + list_for_each_entry ( vdev, &d->vdev_list, list ) I haven't looked at the rest of the series yet. But I am a bit concerned to see code to iterate through a list accessible by the guest. 1) What safety mechanism do we have in place to ensure that the list is going to be small 2) If there is a limit, do we have any documentation on top of this limit to make clear this can't be bumped without removing the list? Cheers,
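Julien's two questions are left open in this part of the thread. Purely as an illustration of the kind of safety mechanism being asked about, and not something the posted series implements, a cap on the number of virtual devices would bound the list that is walked on every config-space trap; the constant and the wrapper name below are hypothetical.

```c
/*
 * Hypothetical bound: limit a guest to one emulated bus worth of slots so
 * the d->vdev_list walked on every config-space access stays small by
 * construction.  VPCI_MAX_VIRT_DEV and this wrapper are illustrative only.
 */
#define VPCI_MAX_VIRT_DEV 32

static int pci_add_virtual_device_bounded(struct domain *d,
                                          const struct pci_dev *pdev)
{
    if ( d->vpci_dev_next >= VPCI_MAX_VIRT_DEV )
        return -ENOSPC;

    return pci_add_virtual_device(d, pdev);
}
```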
On Thu, Sep 30, 2021 at 10:52:23AM +0300, Oleksandr Andrushchenko wrote: > From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > > There are three originators for the PCI configuration space access: > 1. The domain that owns physical host bridge: MMIO handlers are > there so we can update vPCI register handlers with the values > written by the hardware domain, e.g. physical view of the registers > vs guest's view on the configuration space. > 2. Guest access to the passed through PCI devices: we need to properly > map virtual bus topology to the physical one, e.g. pass the configuration > space access to the corresponding physical devices. > 3. Emulated host PCI bridge access. It doesn't exist in the physical > topology, e.g. it can't be mapped to some physical host bridge. > So, all access to the host bridge itself needs to be trapped and > emulated. I'm slightly confused by the fact that you seem to allow unprivileged guests to use vPCI in this commit, yet there's still a concerning bit that AFAICT has not been changed by the series. vpci_{read,write} will passthough any access not explicitly handled by vPCI (see the usage of vpci_{read,write}_hw). This is fine for the hardware domain, but needs inverting for unprivileged guests: any access not explicitly handled by vPCI needs to be dropped. > Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > > --- > Since v2: > - pass struct domain instead of struct vcpu > - constify arguments where possible > - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT > New in v2 > --- > xen/arch/arm/domain.c | 1 + > xen/arch/arm/vpci.c | 86 +++++++++++++++++++++++++++++++---- > xen/arch/arm/vpci.h | 3 ++ > xen/drivers/passthrough/pci.c | 25 ++++++++++ > xen/include/asm-arm/pci.h | 1 + > xen/include/xen/pci.h | 1 + > xen/include/xen/sched.h | 2 + > 7 files changed, 111 insertions(+), 8 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index fa6fcc5e467c..095671742ad8 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d) > get_order_from_bytes(d->arch.efi_acpi_len)); > #endif > domain_io_free(d); > + domain_vpci_free(d); It's a nit, but I think from a logical PoV this should be inverted? You first free the handlers and then the IO infrastructure. > } > > void arch_domain_shutdown(struct domain *d) > diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c > index 5d6c29c8dcd9..26ec2fa7cf2d 100644 > --- a/xen/arch/arm/vpci.c > +++ b/xen/arch/arm/vpci.c > @@ -17,6 +17,14 @@ > > #define REGISTER_OFFSET(addr) ( (addr) & 0x00000fff) > > +struct vpci_mmio_priv { > + /* > + * Set to true if the MMIO handlers were set up for the emulated > + * ECAM host PCI bridge. > + */ > + bool is_virt_ecam; > +}; Is this strictly required? It feels a bit odd to have a structure to store and single boolean. I think you could replace it's usage with is_hardware_domain. > + > /* Do some sanity checks. 
*/ > static bool vpci_mmio_access_allowed(unsigned int reg, unsigned int len) > { > @@ -38,6 +46,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, > pci_sbdf_t sbdf; > unsigned long data = ~0UL; > unsigned int size = 1U << info->dabt.size; > + struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p; > > sbdf.sbdf = MMCFG_BDF(info->gpa); > reg = REGISTER_OFFSET(info->gpa); > @@ -45,6 +54,13 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, > if ( !vpci_mmio_access_allowed(reg, size) ) > return 0; > > + /* > + * For the passed through devices we need to map their virtual SBDF > + * to the physical PCI device being passed through. > + */ > + if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) ) > + return 1; > + > data = vpci_read(sbdf, reg, min(4u, size)); Given my earlier recommendation to place the virtual sbdf inside struct vpci, it might make sense to let vpci_read do the translation itself. > if ( size == 8 ) > data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32; > @@ -61,6 +77,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, > pci_sbdf_t sbdf; > unsigned long data = r; > unsigned int size = 1U << info->dabt.size; > + struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p; > > sbdf.sbdf = MMCFG_BDF(info->gpa); > reg = REGISTER_OFFSET(info->gpa); > @@ -68,6 +85,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, > if ( !vpci_mmio_access_allowed(reg, size) ) > return 0; > > + /* > + * For the passed through devices we need to map their virtual SBDF > + * to the physical PCI device being passed through. > + */ > + if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) ) > + return 1; > + > vpci_write(sbdf, reg, min(4u, size), data); > if ( size == 8 ) > vpci_write(sbdf, reg + 4, 4, data >> 32); > @@ -80,13 +104,48 @@ static const struct mmio_handler_ops vpci_mmio_handler = { > .write = vpci_mmio_write, > }; > > +/* > + * There are three originators for the PCI configuration space access: > + * 1. The domain that owns physical host bridge: MMIO handlers are > + * there so we can update vPCI register handlers with the values > + * written by the hardware domain, e.g. physical view of the registers/ > + * configuration space. > + * 2. Guest access to the passed through PCI devices: we need to properly > + * map virtual bus topology to the physical one, e.g. pass the configuration > + * space access to the corresponding physical devices. > + * 3. Emulated host PCI bridge access. It doesn't exist in the physical > + * topology, e.g. it can't be mapped to some physical host bridge. > + * So, all access to the host bridge itself needs to be trapped and > + * emulated. I'm not sure 3. is equivalent to the other points. 1. and 2. seem to be referring to where accesses to the config space are coming from, while point 3. is referring to a fully emulated device in Xen (one that doesn't have a backing pci_dev). I'm also failing to see any fully virtual PCI device being added to the bus for guest domains so far. 
> + */ > static int vpci_setup_mmio_handler(struct domain *d, > struct pci_host_bridge *bridge) > { > - struct pci_config_window *cfg = bridge->cfg; > + struct vpci_mmio_priv *priv; > + > + priv = xzalloc(struct vpci_mmio_priv); > + if ( !priv ) > + return -ENOMEM; > + > + priv->is_virt_ecam = !is_hardware_domain(d); > > - register_mmio_handler(d, &vpci_mmio_handler, > - cfg->phys_addr, cfg->size, NULL); > + if ( is_hardware_domain(d) ) > + { > + struct pci_config_window *cfg = bridge->cfg; > + > + bridge->mmio_priv = priv; > + register_mmio_handler(d, &vpci_mmio_handler, > + cfg->phys_addr, cfg->size, > + priv); > + } > + else > + { > + d->vpci_mmio_priv = priv; > + /* Guest domains use what is programmed in their device tree. */ > + register_mmio_handler(d, &vpci_mmio_handler, > + GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, > + priv); > + } > return 0; > } > > @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d) > if ( !has_vpci(d) ) > return 0; > > + return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); I think this is wrong for unprivileged domains: you iterate against host bridges but just setup a single ECAM region from GUEST_VPCI_ECAM_BASE to GUEST_VPCI_ECAM_SIZE, so you are leaking multiple allocations of vpci_mmio_priv, and also adding a bunch of duplicated IO handlers for the same ECAM region. IMO you should iterate against host bridges only for the hardware domain case. For the unpriviledged domain case there's no need to iterate against the list of physical host bridges as you end up exposing a fully emulated bus which bears no resemblance to the physical setup. > +} > + > +static int domain_vpci_free_cb(struct domain *d, > + struct pci_host_bridge *bridge) > +{ > if ( is_hardware_domain(d) ) > - return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); > + XFREE(bridge->mmio_priv); > + else > + XFREE(d->vpci_mmio_priv); > + return 0; > +} > > - /* Guest domains use what is programmed in their device tree. */ > - register_mmio_handler(d, &vpci_mmio_handler, > - GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL); > +void domain_vpci_free(struct domain *d) > +{ > + if ( !has_vpci(d) ) > + return; > > - return 0; > + pci_host_iterate_bridges(d, domain_vpci_free_cb); Why do we need to iterate the host bridges for unprivileged domains? AFAICT it just causes duplicated calls to XFREE(d->vpci_mmio_priv). I would expect something like: static int bridge_free_cb(struct domain *d, struct pci_host_bridge *bridge) { ASSERT(is_hardware_domain(d)); XFREE(bridge->mmio_priv); return 0; } void domain_vpci_free(struct domain *d) { if ( !has_vpci(d) ) return; if ( is_hardware_domain(d) ) pci_host_iterate_bridges(d, bridge_free_cb); else XFREE(d->vpci_mmio_priv); } Albeit I think there's no need for vpci_mmio_priv in the first place. Thanks, Roger.
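Roger's opening point can be made concrete: at this stage vpci_read() falls back to vpci_read_hw() for any register range that has no registered handler, which is only correct for the hardware domain. Below is a sketch of the inversion he asks for; the helper name and its placement are assumptions, not code from xen/drivers/vpci/vpci.c.

```c
/*
 * Sketch only: decide what an access to a config-space "hole" (a range
 * with no vPCI handler) should do.  The hardware domain keeps the
 * existing pass-through behaviour; an unprivileged guest must not reach
 * the physical device, so reads return all ones (size is 1, 2 or 4 here)
 * and the corresponding writes would simply be dropped by the caller.
 */
static uint32_t vpci_read_hole(const struct domain *d, pci_sbdf_t sbdf,
                               unsigned int reg, unsigned int size)
{
    if ( is_hardware_domain(d) )
        return vpci_read_hw(sbdf, reg, size);

    return ~(uint32_t)0 >> (32 - 8 * size);
}
```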
Hi, Roger! On 26.10.21 16:30, Roger Pau Monné wrote: > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index fa6fcc5e467c..095671742ad8 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d) >> get_order_from_bytes(d->arch.efi_acpi_len)); >> #endif >> domain_io_free(d); >> + domain_vpci_free(d); > It's a nit, but I think from a logical PoV this should be inverted? > You first free the handlers and then the IO infrastructure. Indeed, thanks > >> } >> >> void arch_domain_shutdown(struct domain *d) >> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c >> index 5d6c29c8dcd9..26ec2fa7cf2d 100644 >> --- a/xen/arch/arm/vpci.c >> +++ b/xen/arch/arm/vpci.c >> @@ -17,6 +17,14 @@ >> >> #define REGISTER_OFFSET(addr) ( (addr) & 0x00000fff) >> >> +struct vpci_mmio_priv { >> + /* >> + * Set to true if the MMIO handlers were set up for the emulated >> + * ECAM host PCI bridge. >> + */ >> + bool is_virt_ecam; >> +}; > Is this strictly required? It feels a bit odd to have a structure to > store and single boolean. > > I think you could replace it's usage with is_hardware_domain. I am working on some "earlier" patch fixes [1] which already needs some private to be passed to the handlers: we need to set sbdf.seg to the proper host bridge segment instead of always setting it to 0. And then I can pass "struct pci_host_bridge *bridge" as the private member and use is_hardware_domain(v->domain) to see if this is guest or hwdom. So, I'll remove the structure completely [snip] >> + */ >> static int vpci_setup_mmio_handler(struct domain *d, >> struct pci_host_bridge *bridge) >> { >> - struct pci_config_window *cfg = bridge->cfg; >> + struct vpci_mmio_priv *priv; >> + >> + priv = xzalloc(struct vpci_mmio_priv); >> + if ( !priv ) >> + return -ENOMEM; >> + >> + priv->is_virt_ecam = !is_hardware_domain(d); >> >> - register_mmio_handler(d, &vpci_mmio_handler, >> - cfg->phys_addr, cfg->size, NULL); >> + if ( is_hardware_domain(d) ) >> + { >> + struct pci_config_window *cfg = bridge->cfg; >> + >> + bridge->mmio_priv = priv; >> + register_mmio_handler(d, &vpci_mmio_handler, >> + cfg->phys_addr, cfg->size, >> + priv); >> + } >> + else >> + { >> + d->vpci_mmio_priv = priv; >> + /* Guest domains use what is programmed in their device tree. */ >> + register_mmio_handler(d, &vpci_mmio_handler, >> + GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, >> + priv); >> + } >> return 0; >> } >> >> @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d) >> if ( !has_vpci(d) ) >> return 0; >> >> + return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); > I think this is wrong for unprivileged domains: you iterate against > host bridges but just setup a single ECAM region from > GUEST_VPCI_ECAM_BASE to GUEST_VPCI_ECAM_SIZE, so you are leaking > multiple allocations of vpci_mmio_priv, and also adding a bunch of > duplicated IO handlers for the same ECAM region. > > IMO you should iterate against host bridges only for the hardware > domain case. For the unpriviledged domain case there's no need to > iterate against the list of physical host bridges as you end up > exposing a fully emulated bus which bears no resemblance to the > physical setup. Yes, I am moving this code into that "earlier" patch [1] and already spotted the leak: thus I am also re-working this code. 
> >> +} >> + >> +static int domain_vpci_free_cb(struct domain *d, >> + struct pci_host_bridge *bridge) >> +{ >> if ( is_hardware_domain(d) ) >> - return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); >> + XFREE(bridge->mmio_priv); >> + else >> + XFREE(d->vpci_mmio_priv); >> + return 0; >> +} >> >> - /* Guest domains use what is programmed in their device tree. */ >> - register_mmio_handler(d, &vpci_mmio_handler, >> - GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL); >> +void domain_vpci_free(struct domain *d) >> +{ >> + if ( !has_vpci(d) ) >> + return; >> >> - return 0; >> + pci_host_iterate_bridges(d, domain_vpci_free_cb); > Why do we need to iterate the host bridges for unprivileged domains? No need, I am taking care of this > AFAICT it just causes duplicated calls to XFREE(d->vpci_mmio_priv). I > would expect something like: > > static int bridge_free_cb(struct domain *d, > struct pci_host_bridge *bridge) > { > ASSERT(is_hardware_domain(d)); > XFREE(bridge->mmio_priv); > return 0; > } > > void domain_vpci_free(struct domain *d) > { > if ( !has_vpci(d) ) > return; > > if ( is_hardware_domain(d) ) > pci_host_iterate_bridges(d, bridge_free_cb); > else > XFREE(d->vpci_mmio_priv); > } > > Albeit I think there's no need for vpci_mmio_priv in the first place. > > Thanks, Roger. Thank you, Oleksandr [1] https://patchwork.kernel.org/project/xen-devel/patch/20211008055535.337436-9-andr2000@gmail.com/
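A sketch of the rework Oleksandr describes above, with the host bridge passed as the MMIO handler's private data and is_hardware_domain() replacing the dropped vpci_mmio_priv; the segment handling and the NULL-priv convention for guests are assumptions based on the linked follow-up, not this version of the patch.

```c
static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                          register_t *r, void *p)
{
    struct pci_host_bridge *bridge = p;    /* assumed NULL for guest domains */
    unsigned int reg = REGISTER_OFFSET(info->gpa);
    unsigned int size = 1U << info->dabt.size;
    unsigned long data;
    pci_sbdf_t sbdf;

    sbdf.sbdf = MMCFG_BDF(info->gpa);
    /* The hardware domain gets the physical segment from the bridge. */
    sbdf.seg = bridge ? bridge->segment : 0;

    if ( !vpci_mmio_access_allowed(reg, size) )
        return 0;

    /* Guests see an emulated bus: map the virtual SBDF to the physical one. */
    if ( !is_hardware_domain(v->domain) &&
         !pci_translate_virtual_device(v->domain, &sbdf) )
    {
        *r = ~0UL;
        return 1;
    }

    data = vpci_read(sbdf, reg, min(4u, size));
    if ( size == 8 )
        data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32;

    *r = data;

    return 1;
}
```

On the registration side the hardware domain would keep one handler per physical bridge (priv = bridge), while a guest gets a single handler covering GUEST_VPCI_ECAM_BASE/GUEST_VPCI_ECAM_SIZE with priv = NULL, which also avoids the duplicate registrations and the leak Roger points out.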
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index fa6fcc5e467c..095671742ad8 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d) get_order_from_bytes(d->arch.efi_acpi_len)); #endif domain_io_free(d); + domain_vpci_free(d); } void arch_domain_shutdown(struct domain *d) diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c index 5d6c29c8dcd9..26ec2fa7cf2d 100644 --- a/xen/arch/arm/vpci.c +++ b/xen/arch/arm/vpci.c @@ -17,6 +17,14 @@ #define REGISTER_OFFSET(addr) ( (addr) & 0x00000fff) +struct vpci_mmio_priv { + /* + * Set to true if the MMIO handlers were set up for the emulated + * ECAM host PCI bridge. + */ + bool is_virt_ecam; +}; + /* Do some sanity checks. */ static bool vpci_mmio_access_allowed(unsigned int reg, unsigned int len) { @@ -38,6 +46,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, pci_sbdf_t sbdf; unsigned long data = ~0UL; unsigned int size = 1U << info->dabt.size; + struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p; sbdf.sbdf = MMCFG_BDF(info->gpa); reg = REGISTER_OFFSET(info->gpa); @@ -45,6 +54,13 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, if ( !vpci_mmio_access_allowed(reg, size) ) return 0; + /* + * For the passed through devices we need to map their virtual SBDF + * to the physical PCI device being passed through. + */ + if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) ) + return 1; + data = vpci_read(sbdf, reg, min(4u, size)); if ( size == 8 ) data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32; @@ -61,6 +77,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, pci_sbdf_t sbdf; unsigned long data = r; unsigned int size = 1U << info->dabt.size; + struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p; sbdf.sbdf = MMCFG_BDF(info->gpa); reg = REGISTER_OFFSET(info->gpa); @@ -68,6 +85,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, if ( !vpci_mmio_access_allowed(reg, size) ) return 0; + /* + * For the passed through devices we need to map their virtual SBDF + * to the physical PCI device being passed through. + */ + if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) ) + return 1; + vpci_write(sbdf, reg, min(4u, size), data); if ( size == 8 ) vpci_write(sbdf, reg + 4, 4, data >> 32); @@ -80,13 +104,48 @@ static const struct mmio_handler_ops vpci_mmio_handler = { .write = vpci_mmio_write, }; +/* + * There are three originators for the PCI configuration space access: + * 1. The domain that owns physical host bridge: MMIO handlers are + * there so we can update vPCI register handlers with the values + * written by the hardware domain, e.g. physical view of the registers/ + * configuration space. + * 2. Guest access to the passed through PCI devices: we need to properly + * map virtual bus topology to the physical one, e.g. pass the configuration + * space access to the corresponding physical devices. + * 3. Emulated host PCI bridge access. It doesn't exist in the physical + * topology, e.g. it can't be mapped to some physical host bridge. + * So, all access to the host bridge itself needs to be trapped and + * emulated. 
+ */ static int vpci_setup_mmio_handler(struct domain *d, struct pci_host_bridge *bridge) { - struct pci_config_window *cfg = bridge->cfg; + struct vpci_mmio_priv *priv; + + priv = xzalloc(struct vpci_mmio_priv); + if ( !priv ) + return -ENOMEM; + + priv->is_virt_ecam = !is_hardware_domain(d); - register_mmio_handler(d, &vpci_mmio_handler, - cfg->phys_addr, cfg->size, NULL); + if ( is_hardware_domain(d) ) + { + struct pci_config_window *cfg = bridge->cfg; + + bridge->mmio_priv = priv; + register_mmio_handler(d, &vpci_mmio_handler, + cfg->phys_addr, cfg->size, + priv); + } + else + { + d->vpci_mmio_priv = priv; + /* Guest domains use what is programmed in their device tree. */ + register_mmio_handler(d, &vpci_mmio_handler, + GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, + priv); + } return 0; } @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d) if ( !has_vpci(d) ) return 0; + return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); +} + +static int domain_vpci_free_cb(struct domain *d, + struct pci_host_bridge *bridge) +{ if ( is_hardware_domain(d) ) - return pci_host_iterate_bridges(d, vpci_setup_mmio_handler); + XFREE(bridge->mmio_priv); + else + XFREE(d->vpci_mmio_priv); + return 0; +} - /* Guest domains use what is programmed in their device tree. */ - register_mmio_handler(d, &vpci_mmio_handler, - GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL); +void domain_vpci_free(struct domain *d) +{ + if ( !has_vpci(d) ) + return; - return 0; + pci_host_iterate_bridges(d, domain_vpci_free_cb); } int domain_vpci_get_num_mmio_handlers(struct domain *d) diff --git a/xen/arch/arm/vpci.h b/xen/arch/arm/vpci.h index 27a2b069abd2..38e5a28c0d95 100644 --- a/xen/arch/arm/vpci.h +++ b/xen/arch/arm/vpci.h @@ -18,6 +18,7 @@ #ifdef CONFIG_HAS_VPCI int domain_vpci_init(struct domain *d); int domain_vpci_get_num_mmio_handlers(struct domain *d); +void domain_vpci_free(struct domain *d); #else static inline int domain_vpci_init(struct domain *d) { @@ -28,6 +29,8 @@ static inline int domain_vpci_get_num_mmio_handlers(struct domain *d) { return 0; } + +static inline void domain_vpci_free(struct domain *d) { } #endif #endif /* __ARCH_ARM_VPCI_H__ */ diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 5b963d75d1ba..b7dffb769cfd 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev) xfree(vdev); return 0; } + +/* + * Find the physical device which is mapped to the virtual device + * and translate virtual SBDF to the physical one. + */ +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) +{ + struct vpci_dev *vdev; + bool found = false; + + pcidevs_lock(); + list_for_each_entry ( vdev, &d->vdev_list, list ) + { + if ( vdev->sbdf.sbdf == sbdf->sbdf ) + { + /* Replace virtual SBDF with the physical one. */ + *sbdf = vdev->pdev->sbdf; + found = true; + break; + } + } + pcidevs_unlock(); + + return found; +} #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ /* Caller should hold the pcidevs_lock */ diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h index 1bfba3da8f51..12b4bf467ad2 100644 --- a/xen/include/asm-arm/pci.h +++ b/xen/include/asm-arm/pci.h @@ -66,6 +66,7 @@ struct pci_host_bridge { uint16_t segment; /* Segment number */ struct pci_config_window* cfg; /* Pointer to the bridge config window */ struct pci_ops *ops; + void *mmio_priv; /* MMIO handler's private data. 
*/ }; struct pci_ops { diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 33033a3a8f8d..89cfc4853331 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -188,6 +188,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn); #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev); int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev); +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf); #endif int pci_ro_device(int seg, int bus, int devfn); int pci_hide_device(unsigned int seg, unsigned int bus, unsigned int devfn); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ecdb04b4f7fc..858b4133482f 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -451,6 +451,8 @@ struct domain * to assign a unique SBDF to a passed through virtual PCI device. */ int vpci_dev_next; + /* Virtual PCI MMIO handler's private data. */ + void *vpci_mmio_priv; #endif #endif