diff mbox series

[v8,01/13] pci: introduce per-domain PCI rwlock

Message ID 20230720003205.1828537-2-volodymyr_babchuk@epam.com (mailing list archive)
State Superseded
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Volodymyr Babchuk July 20, 2023, 12:32 a.m. UTC
Add per-domain d->pci_lock that protects access to
d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
that underlying pdev will not disappear under feet. This is a rw-lock,
but this patch adds only write_lock()s. There will be read_lock()
users in the next patches.

This lock should be taken in write mode every time d->pdev_list is
altered. This covers both accesses to d->pdev_list and accesses to
pdev->domain_list fields. All write accesses also should be protected
by pcidevs_lock() as well. Idea is that any user that wants read
access to the list or to the devices stored in the list should use
either this new d->pci_lock or old pcidevs_lock(). Usage of any of
this two locks will ensure only that pdev of interest will not
disappear from under feet and that the pdev still will be assigned to
the same domain. Of course, any new users should use pcidevs_lock()
when it is appropriate (e.g. when accessing any other state that is
protected by the said lock).

Any write access to pdev->domain_list should be protected by both
pcidevs_lock() and d->pci_lock in the write mode.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

---

Changes in v8:
 - New patch

Changes in v8 vs RFC:
 - Removed all read_locks after discussion with Roger in #xendevel
 - pci_release_devices() now returns the first error code
 - extended commit message
 - added missing lock in pci_remove_device()
 - extended locked region in pci_add_device() to protect list_del() calls
---
 xen/common/domain.c                         |  1 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
 xen/drivers/passthrough/pci.c               | 68 +++++++++++++++++----
 xen/drivers/passthrough/vtd/iommu.c         |  9 ++-
 xen/include/xen/sched.h                     |  1 +
 5 files changed, 74 insertions(+), 14 deletions(-)

Comments

Roger Pau Monné July 20, 2023, 9:45 a.m. UTC | #1
On Thu, Jul 20, 2023 at 12:32:31AM +0000, Volodymyr Babchuk wrote:
> Add per-domain d->pci_lock that protects access to
> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
> that underlying pdev will not disappear under feet. This is a rw-lock,
> but this patch adds only write_lock()s. There will be read_lock()
> users in the next patches.
> 
> This lock should be taken in write mode every time d->pdev_list is
> altered. This covers both accesses to d->pdev_list and accesses to
> pdev->domain_list fields. All write accesses also should be protected
> by pcidevs_lock() as well. Idea is that any user that wants read
> access to the list or to the devices stored in the list should use
> either this new d->pci_lock or old pcidevs_lock(). Usage of any of
> this two locks will ensure only that pdev of interest will not
> disappear from under feet and that the pdev still will be assigned to
> the same domain. Of course, any new users should use pcidevs_lock()
> when it is appropriate (e.g. when accessing any other state that is
> protected by the said lock).

I think this needs a note about the ordering:

"In case both the newly introduced per-domain rwlock and the pcidevs
lock is taken, the later must be acquired first."

> 
> Any write access to pdev->domain_list should be protected by both
> pcidevs_lock() and d->pci_lock in the write mode.

You also protect calls to vpci_remove_device() with the per-domain
pci_lock it seems, and that will need some explanation as it's not
obvious.

> 
> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> 
> ---
> 
> Changes in v8:
>  - New patch
> 
> Changes in v8 vs RFC:
>  - Removed all read_locks after discussion with Roger in #xendevel
>  - pci_release_devices() now returns the first error code
>  - extended commit message
>  - added missing lock in pci_remove_device()
>  - extended locked region in pci_add_device() to protect list_del() calls
> ---
>  xen/common/domain.c                         |  1 +
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>  xen/drivers/passthrough/pci.c               | 68 +++++++++++++++++----
>  xen/drivers/passthrough/vtd/iommu.c         |  9 ++-
>  xen/include/xen/sched.h                     |  1 +
>  5 files changed, 74 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index caaa402637..5d8a8836da 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>      INIT_LIST_HEAD(&d->pdev_list);
> +    rwlock_init(&d->pci_lock);
>  #endif
>  
>      /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 94e3775506..e2f2e2e950 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>  
>      if ( devfn == pdev->devfn && pdev->domain != target )
>      {
> -        list_move(&pdev->domain_list, &target->pdev_list);
> -        pdev->domain = target;

You seem to have inadvertently dropped the above line? (and so devices
would keep the previous pdev->domain value)

> +        write_lock(&pdev->domain->pci_lock);
> +        list_del(&pdev->domain_list);
> +        write_unlock(&pdev->domain->pci_lock);
> +
> +        write_lock(&target->pci_lock);
> +        list_add(&pdev->domain_list, &target->pdev_list);
> +        write_unlock(&target->pci_lock);
>      }
>  
>      /*
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 95846e84f2..5b4632ead2 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>      if ( pdev->domain )
>          return;
>      pdev->domain = dom_xen;
> +    write_lock(&dom_xen->pci_lock);
>      list_add(&pdev->domain_list, &dom_xen->pdev_list);
> +    write_unlock(&dom_xen->pci_lock);
>  }
>  
>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      ret = 0;
>      if ( !pdev->domain )
>      {
> +        write_lock(&hardware_domain->pci_lock);
>          pdev->domain = hardware_domain;
>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>  
> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>              list_del(&pdev->domain_list);
>              pdev->domain = NULL;
> +            write_unlock(&hardware_domain->pci_lock);

Strictly speaking, this could move one line earlier, as accesses to
pdev->domain are not protected by the d->pci_lock?  Same in other
instances (above and below), as you seem to introduce a pattern to
perform accesses to pdev->domain with the rwlock taken.

>              goto out;
>          }
>          ret = iommu_add_device(pdev);
> @@ -768,8 +772,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              vpci_remove_device(pdev);
>              list_del(&pdev->domain_list);
>              pdev->domain = NULL;
> +            write_unlock(&hardware_domain->pci_lock);
>              goto out;
>          }
> +        write_unlock(&hardware_domain->pci_lock);
>      }
>      else
>          iommu_enable_device(pdev);
> @@ -812,11 +818,13 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>          if ( pdev->bus == bus && pdev->devfn == devfn )
>          {
> +            write_lock(&pdev->domain->pci_lock);
>              vpci_remove_device(pdev);
>              pci_cleanup_msi(pdev);
>              ret = iommu_remove_device(pdev);
>              if ( pdev->domain )
>                  list_del(&pdev->domain_list);
> +            write_unlock(&pdev->domain->pci_lock);

Here you seem to protect more than strictly required, I would think
only the list_del() would need to be done holding the rwlock?

>              printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
>              free_pdev(pseg, pdev);
>              break;
> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>  
>  int pci_release_devices(struct domain *d)
>  {
> -    struct pci_dev *pdev, *tmp;
> -    u8 bus, devfn;
> -    int ret;
> +    int combined_ret;
> +    LIST_HEAD(failed_pdevs);
>  
>      pcidevs_lock();
> -    ret = arch_pci_clean_pirqs(d);
> -    if ( ret )
> +    write_lock(&d->pci_lock);
> +    combined_ret = arch_pci_clean_pirqs(d);

Why do you need the per-domain rwlock for arch_pci_clean_pirqs()?
That function doesn't modify the per-domain pdev list.

> +    if ( combined_ret )
>      {
>          pcidevs_unlock();
> -        return ret;
> +        write_unlock(&d->pci_lock);
> +        return combined_ret;

Ideally we would like to keep the same order on unlock, so the rwlock
should be released before the pcidevs lock (unless there's a reason
not to).

>      }
> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> +    while ( !list_empty(&d->pdev_list) )
>      {
> -        bus = pdev->bus;
> -        devfn = pdev->devfn;
> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> +                                                struct pci_dev,
> +                                                domain_list);
> +        uint16_t seg = pdev->seg;
> +        uint8_t bus = pdev->bus;
> +        uint8_t devfn = pdev->devfn;
> +        int ret;
> +
> +        write_unlock(&d->pci_lock);
> +        ret = deassign_device(d, seg, bus, devfn);
> +        write_lock(&d->pci_lock);
> +        if ( ret )
> +        {
> +            bool still_present = false;
> +            const struct pci_dev *tmp;
> +
> +            /*
> +             * We need to check if deassign_device() left our pdev in
> +             * domain's list. As we dropped the lock, we can't be sure
> +             * that list wasn't permutated in some random way, so we
> +             * need to traverse the whole list.
> +             */
> +            for_each_pdev ( d, tmp )
> +            {
> +                if ( tmp == pdev )
> +                {
> +                    still_present = true;
> +                    break;
> +                }
> +            }
> +            if ( still_present )
> +                list_move(&pdev->domain_list, &failed_pdevs);

You can get rid of the still_present variable, and just do:

for_each_pdev ( d, tmp )
    if ( tmp == pdev )
    {
        list_move(&pdev->domain_list, &failed_pdevs);
	break;
    }


> +            combined_ret = combined_ret?:ret;

Nit: missing spaces around the ternary operator.

Thanks, Roger.
Jan Beulich July 20, 2023, 3:40 p.m. UTC | #2
On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>  
>      if ( devfn == pdev->devfn && pdev->domain != target )
>      {
> -        list_move(&pdev->domain_list, &target->pdev_list);
> -        pdev->domain = target;
> +        write_lock(&pdev->domain->pci_lock);
> +        list_del(&pdev->domain_list);
> +        write_unlock(&pdev->domain->pci_lock);

As mentioned on an earlier version, perhaps better (cheaper) to use
"source" here? (Same in VT-d code then.)

> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>      ret = 0;
>      if ( !pdev->domain )
>      {
> +        write_lock(&hardware_domain->pci_lock);
>          pdev->domain = hardware_domain;
>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>  
> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>              list_del(&pdev->domain_list);
>              pdev->domain = NULL;
> +            write_unlock(&hardware_domain->pci_lock);
>              goto out;

In addition to Roger's comments about locking scope: In a case like this
one it would probably also be good to move the printk() out of the locked
area. It can be slow, after all.

Question is why you have this wide a locked area here in the first place:
Don't you need to hold the lock just across the two list operations (but
not in between)?

> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>  
>  int pci_release_devices(struct domain *d)
>  {
> -    struct pci_dev *pdev, *tmp;
> -    u8 bus, devfn;
> -    int ret;
> +    int combined_ret;
> +    LIST_HEAD(failed_pdevs);
>  
>      pcidevs_lock();
> -    ret = arch_pci_clean_pirqs(d);
> -    if ( ret )
> +    write_lock(&d->pci_lock);
> +    combined_ret = arch_pci_clean_pirqs(d);
> +    if ( combined_ret )
>      {
>          pcidevs_unlock();
> -        return ret;
> +        write_unlock(&d->pci_lock);
> +        return combined_ret;
>      }
> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> +    while ( !list_empty(&d->pdev_list) )
>      {
> -        bus = pdev->bus;
> -        devfn = pdev->devfn;
> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> +                                                struct pci_dev,
> +                                                domain_list);
> +        uint16_t seg = pdev->seg;
> +        uint8_t bus = pdev->bus;
> +        uint8_t devfn = pdev->devfn;
> +        int ret;
> +
> +        write_unlock(&d->pci_lock);
> +        ret = deassign_device(d, seg, bus, devfn);
> +        write_lock(&d->pci_lock);
> +        if ( ret )
> +        {
> +            bool still_present = false;
> +            const struct pci_dev *tmp;
> +
> +            /*
> +             * We need to check if deassign_device() left our pdev in
> +             * domain's list. As we dropped the lock, we can't be sure
> +             * that list wasn't permutated in some random way, so we
> +             * need to traverse the whole list.
> +             */
> +            for_each_pdev ( d, tmp )
> +            {
> +                if ( tmp == pdev )
> +                {
> +                    still_present = true;
> +                    break;
> +                }
> +            }
> +            if ( still_present )
> +                list_move(&pdev->domain_list, &failed_pdevs);

In order to retain original ordering on the resulting list, perhaps better
list_move_tail()?

Jan
Volodymyr Babchuk July 20, 2023, 10:57 p.m. UTC | #3
Hi Roger,

Roger Pau Monné <roger.pau@citrix.com> writes:

> On Thu, Jul 20, 2023 at 12:32:31AM +0000, Volodymyr Babchuk wrote:
>> Add per-domain d->pci_lock that protects access to
>> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
>> that underlying pdev will not disappear under feet. This is a rw-lock,
>> but this patch adds only write_lock()s. There will be read_lock()
>> users in the next patches.
>> 
>> This lock should be taken in write mode every time d->pdev_list is
>> altered. This covers both accesses to d->pdev_list and accesses to
>> pdev->domain_list fields. All write accesses also should be protected
>> by pcidevs_lock() as well. Idea is that any user that wants read
>> access to the list or to the devices stored in the list should use
>> either this new d->pci_lock or old pcidevs_lock(). Usage of any of
>> this two locks will ensure only that pdev of interest will not
>> disappear from under feet and that the pdev still will be assigned to
>> the same domain. Of course, any new users should use pcidevs_lock()
>> when it is appropriate (e.g. when accessing any other state that is
>> protected by the said lock).
>
> I think this needs a note about the ordering:
>
> "In case both the newly introduced per-domain rwlock and the pcidevs
> lock is taken, the later must be acquired first."

Thanks. Added.

>> 
>> Any write access to pdev->domain_list should be protected by both
>> pcidevs_lock() and d->pci_lock in the write mode.
>
> You also protect calls to vpci_remove_device() with the per-domain
> pci_lock it seems, and that will need some explanation as it's not
> obvious.

Well, strictly speaking, it is not required in this patch. But it is
needed in the next one. I can lock only "list_del(&pdev->domain_list);"
end extend then locked area in the next patch. On other hand, this patch
already protects vpci_add_handlers() call in the pci_add_device() due to
the code layout, so it may be natural to protect vpci_remove_device() as
well. What is your opinion?

>> 
>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> 
>> ---
>> 
>> Changes in v8:
>>  - New patch
>> 
>> Changes in v8 vs RFC:
>>  - Removed all read_locks after discussion with Roger in #xendevel
>>  - pci_release_devices() now returns the first error code
>>  - extended commit message
>>  - added missing lock in pci_remove_device()
>>  - extended locked region in pci_add_device() to protect list_del() calls
>> ---
>>  xen/common/domain.c                         |  1 +
>>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>>  xen/drivers/passthrough/pci.c               | 68 +++++++++++++++++----
>>  xen/drivers/passthrough/vtd/iommu.c         |  9 ++-
>>  xen/include/xen/sched.h                     |  1 +
>>  5 files changed, 74 insertions(+), 14 deletions(-)
>> 
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index caaa402637..5d8a8836da 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>>  
>>  #ifdef CONFIG_HAS_PCI
>>      INIT_LIST_HEAD(&d->pdev_list);
>> +    rwlock_init(&d->pci_lock);
>>  #endif
>>  
>>      /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> index 94e3775506..e2f2e2e950 100644
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>>  
>>      if ( devfn == pdev->devfn && pdev->domain != target )
>>      {
>> -        list_move(&pdev->domain_list, &target->pdev_list);
>> -        pdev->domain = target;
>
> You seem to have inadvertently dropped the above line? (and so devices
> would keep the previous pdev->domain value)
>

Oops, yes. Thank you. I was testing those patches on Intel machine, so
AMD part left not verified.

>> +        write_lock(&pdev->domain->pci_lock);
>> +        list_del(&pdev->domain_list);
>> +        write_unlock(&pdev->domain->pci_lock);
>> +
>> +        write_lock(&target->pci_lock);
>> +        list_add(&pdev->domain_list, &target->pdev_list);
>> +        write_unlock(&target->pci_lock);
>>      }
>>  
>>      /*
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 95846e84f2..5b4632ead2 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>>      if ( pdev->domain )
>>          return;
>>      pdev->domain = dom_xen;
>> +    write_lock(&dom_xen->pci_lock);
>>      list_add(&pdev->domain_list, &dom_xen->pdev_list);
>> +    write_unlock(&dom_xen->pci_lock);
>>  }
>>  
>>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
>> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>      ret = 0;
>>      if ( !pdev->domain )
>>      {
>> +        write_lock(&hardware_domain->pci_lock);
>>          pdev->domain = hardware_domain;
>>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>>  
>> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>>              list_del(&pdev->domain_list);
>>              pdev->domain = NULL;
>> +            write_unlock(&hardware_domain->pci_lock);
>
> Strictly speaking, this could move one line earlier, as accesses to
> pdev->domain are not protected by the d->pci_lock?  Same in other
> instances (above and below), as you seem to introduce a pattern to
> perform accesses to pdev->domain with the rwlock taken.
>

Yes, you are right. I'll move the unlock() call.

>>              goto out;
>>          }
>>          ret = iommu_add_device(pdev);
>> @@ -768,8 +772,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              vpci_remove_device(pdev);
>>              list_del(&pdev->domain_list);
>>              pdev->domain = NULL;
>> +            write_unlock(&hardware_domain->pci_lock);
>>              goto out;
>>          }
>> +        write_unlock(&hardware_domain->pci_lock);
>>      }
>>      else
>>          iommu_enable_device(pdev);
>> @@ -812,11 +818,13 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>>          if ( pdev->bus == bus && pdev->devfn == devfn )
>>          {
>> +            write_lock(&pdev->domain->pci_lock);
>>              vpci_remove_device(pdev);
>>              pci_cleanup_msi(pdev);
>>              ret = iommu_remove_device(pdev);
>>              if ( pdev->domain )
>>                  list_del(&pdev->domain_list);
>> +            write_unlock(&pdev->domain->pci_lock);
>
> Here you seem to protect more than strictly required, I would think
> only the list_del() would need to be done holding the rwlock?
>

Yes, I believe this is a spill from a next patch. At first all those
changes were introduced in "vpci: use per-domain PCI lock to protect
vpci structure", but then I decided to split changes into two patches.

>>              printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
>>              free_pdev(pseg, pdev);
>>              break;
>> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>  
>>  int pci_release_devices(struct domain *d)
>>  {
>> -    struct pci_dev *pdev, *tmp;
>> -    u8 bus, devfn;
>> -    int ret;
>> +    int combined_ret;
>> +    LIST_HEAD(failed_pdevs);
>>  
>>      pcidevs_lock();
>> -    ret = arch_pci_clean_pirqs(d);
>> -    if ( ret )
>> +    write_lock(&d->pci_lock);
>> +    combined_ret = arch_pci_clean_pirqs(d);
>
> Why do you need the per-domain rwlock for arch_pci_clean_pirqs()?
> That function doesn't modify the per-domain pdev list.

You are right, I will correct this in the next version.

>
>> +    if ( combined_ret )
>>      {
>>          pcidevs_unlock();
>> -        return ret;
>> +        write_unlock(&d->pci_lock);
>> +        return combined_ret;
>
> Ideally we would like to keep the same order on unlock, so the rwlock
> should be released before the pcidevs lock (unless there's a reason
> not to).

I'll move write_lock() further below, so this will be fixed automatically.

>
>>      }
>> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>> +
>> +    while ( !list_empty(&d->pdev_list) )
>>      {
>> -        bus = pdev->bus;
>> -        devfn = pdev->devfn;
>> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>> +                                                struct pci_dev,
>> +                                                domain_list);
>> +        uint16_t seg = pdev->seg;
>> +        uint8_t bus = pdev->bus;
>> +        uint8_t devfn = pdev->devfn;
>> +        int ret;
>> +
>> +        write_unlock(&d->pci_lock);
>> +        ret = deassign_device(d, seg, bus, devfn);
>> +        write_lock(&d->pci_lock);
>> +        if ( ret )
>> +        {
>> +            bool still_present = false;
>> +            const struct pci_dev *tmp;
>> +
>> +            /*
>> +             * We need to check if deassign_device() left our pdev in
>> +             * domain's list. As we dropped the lock, we can't be sure
>> +             * that list wasn't permutated in some random way, so we
>> +             * need to traverse the whole list.
>> +             */
>> +            for_each_pdev ( d, tmp )
>> +            {
>> +                if ( tmp == pdev )
>> +                {
>> +                    still_present = true;
>> +                    break;
>> +                }
>> +            }
>> +            if ( still_present )
>> +                list_move(&pdev->domain_list, &failed_pdevs);
>
> You can get rid of the still_present variable, and just do:
>
> for_each_pdev ( d, tmp )
>     if ( tmp == pdev )
>     {
>         list_move(&pdev->domain_list, &failed_pdevs);
> 	break;
>     }
>
>

Yep, thanks.
Volodymyr Babchuk July 20, 2023, 11:37 p.m. UTC | #4
Hi Jan,

Jan Beulich <jbeulich@suse.com> writes:

> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>>  
>>      if ( devfn == pdev->devfn && pdev->domain != target )
>>      {
>> -        list_move(&pdev->domain_list, &target->pdev_list);
>> -        pdev->domain = target;
>> +        write_lock(&pdev->domain->pci_lock);
>> +        list_del(&pdev->domain_list);
>> +        write_unlock(&pdev->domain->pci_lock);
>
> As mentioned on an earlier version, perhaps better (cheaper) to use
> "source" here? (Same in VT-d code then.)

Sorry, I saw you comment for the previous version, but missed to include
this change. It will be done in the next version.

>> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>      ret = 0;
>>      if ( !pdev->domain )
>>      {
>> +        write_lock(&hardware_domain->pci_lock);
>>          pdev->domain = hardware_domain;
>>          list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>>  
>> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>>              list_del(&pdev->domain_list);
>>              pdev->domain = NULL;
>> +            write_unlock(&hardware_domain->pci_lock);
>>              goto out;
>
> In addition to Roger's comments about locking scope: In a case like this
> one it would probably also be good to move the printk() out of the locked
> area. It can be slow, after all.
>
> Question is why you have this wide a locked area here in the first place:
> Don't you need to hold the lock just across the two list operations (but
> not in between)?

Strictly speaking yes, we need to hold lock only when operating on the
list. For now. Next patch will use the same lock to protect the VPCI
(de)alloction, so locked region will be extended anyways.

I think, I'll decrease locked area in this patch and increase in the
next one, it will be most logical.


>> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>  
>>  int pci_release_devices(struct domain *d)
>>  {
>> -    struct pci_dev *pdev, *tmp;
>> -    u8 bus, devfn;
>> -    int ret;
>> +    int combined_ret;
>> +    LIST_HEAD(failed_pdevs);
>>  
>>      pcidevs_lock();
>> -    ret = arch_pci_clean_pirqs(d);
>> -    if ( ret )
>> +    write_lock(&d->pci_lock);
>> +    combined_ret = arch_pci_clean_pirqs(d);
>> +    if ( combined_ret )
>>      {
>>          pcidevs_unlock();
>> -        return ret;
>> +        write_unlock(&d->pci_lock);
>> +        return combined_ret;
>>      }
>> -    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>> +
>> +    while ( !list_empty(&d->pdev_list) )
>>      {
>> -        bus = pdev->bus;
>> -        devfn = pdev->devfn;
>> -        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>> +        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>> +                                                struct pci_dev,
>> +                                                domain_list);
>> +        uint16_t seg = pdev->seg;
>> +        uint8_t bus = pdev->bus;
>> +        uint8_t devfn = pdev->devfn;
>> +        int ret;
>> +
>> +        write_unlock(&d->pci_lock);
>> +        ret = deassign_device(d, seg, bus, devfn);
>> +        write_lock(&d->pci_lock);
>> +        if ( ret )
>> +        {
>> +            bool still_present = false;
>> +            const struct pci_dev *tmp;
>> +
>> +            /*
>> +             * We need to check if deassign_device() left our pdev in
>> +             * domain's list. As we dropped the lock, we can't be sure
>> +             * that list wasn't permutated in some random way, so we
>> +             * need to traverse the whole list.
>> +             */
>> +            for_each_pdev ( d, tmp )
>> +            {
>> +                if ( tmp == pdev )
>> +                {
>> +                    still_present = true;
>> +                    break;
>> +                }
>> +            }
>> +            if ( still_present )
>> +                list_move(&pdev->domain_list, &failed_pdevs);
>
> In order to retain original ordering on the resulting list, perhaps better
> list_move_tail()?

Yes, thanks.
diff mbox series

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index caaa402637..5d8a8836da 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -645,6 +645,7 @@  struct domain *domain_create(domid_t domid,
 
 #ifdef CONFIG_HAS_PCI
     INIT_LIST_HEAD(&d->pdev_list);
+    rwlock_init(&d->pci_lock);
 #endif
 
     /* All error paths can depend on the above setup. */
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 94e3775506..e2f2e2e950 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -476,8 +476,13 @@  static int cf_check reassign_device(
 
     if ( devfn == pdev->devfn && pdev->domain != target )
     {
-        list_move(&pdev->domain_list, &target->pdev_list);
-        pdev->domain = target;
+        write_lock(&pdev->domain->pci_lock);
+        list_del(&pdev->domain_list);
+        write_unlock(&pdev->domain->pci_lock);
+
+        write_lock(&target->pci_lock);
+        list_add(&pdev->domain_list, &target->pdev_list);
+        write_unlock(&target->pci_lock);
     }
 
     /*
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 95846e84f2..5b4632ead2 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -454,7 +454,9 @@  static void __init _pci_hide_device(struct pci_dev *pdev)
     if ( pdev->domain )
         return;
     pdev->domain = dom_xen;
+    write_lock(&dom_xen->pci_lock);
     list_add(&pdev->domain_list, &dom_xen->pdev_list);
+    write_unlock(&dom_xen->pci_lock);
 }
 
 int __init pci_hide_device(unsigned int seg, unsigned int bus,
@@ -747,6 +749,7 @@  int pci_add_device(u16 seg, u8 bus, u8 devfn,
     ret = 0;
     if ( !pdev->domain )
     {
+        write_lock(&hardware_domain->pci_lock);
         pdev->domain = hardware_domain;
         list_add(&pdev->domain_list, &hardware_domain->pdev_list);
 
@@ -760,6 +763,7 @@  int pci_add_device(u16 seg, u8 bus, u8 devfn,
             printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
             list_del(&pdev->domain_list);
             pdev->domain = NULL;
+            write_unlock(&hardware_domain->pci_lock);
             goto out;
         }
         ret = iommu_add_device(pdev);
@@ -768,8 +772,10 @@  int pci_add_device(u16 seg, u8 bus, u8 devfn,
             vpci_remove_device(pdev);
             list_del(&pdev->domain_list);
             pdev->domain = NULL;
+            write_unlock(&hardware_domain->pci_lock);
             goto out;
         }
+        write_unlock(&hardware_domain->pci_lock);
     }
     else
         iommu_enable_device(pdev);
@@ -812,11 +818,13 @@  int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
+            write_lock(&pdev->domain->pci_lock);
             vpci_remove_device(pdev);
             pci_cleanup_msi(pdev);
             ret = iommu_remove_device(pdev);
             if ( pdev->domain )
                 list_del(&pdev->domain_list);
+            write_unlock(&pdev->domain->pci_lock);
             printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
             free_pdev(pseg, pdev);
             break;
@@ -887,26 +895,62 @@  static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
 
 int pci_release_devices(struct domain *d)
 {
-    struct pci_dev *pdev, *tmp;
-    u8 bus, devfn;
-    int ret;
+    int combined_ret;
+    LIST_HEAD(failed_pdevs);
 
     pcidevs_lock();
-    ret = arch_pci_clean_pirqs(d);
-    if ( ret )
+    write_lock(&d->pci_lock);
+    combined_ret = arch_pci_clean_pirqs(d);
+    if ( combined_ret )
     {
         pcidevs_unlock();
-        return ret;
+        write_unlock(&d->pci_lock);
+        return combined_ret;
     }
-    list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
+
+    while ( !list_empty(&d->pdev_list) )
     {
-        bus = pdev->bus;
-        devfn = pdev->devfn;
-        ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
+        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
+                                                struct pci_dev,
+                                                domain_list);
+        uint16_t seg = pdev->seg;
+        uint8_t bus = pdev->bus;
+        uint8_t devfn = pdev->devfn;
+        int ret;
+
+        write_unlock(&d->pci_lock);
+        ret = deassign_device(d, seg, bus, devfn);
+        write_lock(&d->pci_lock);
+        if ( ret )
+        {
+            bool still_present = false;
+            const struct pci_dev *tmp;
+
+            /*
+             * We need to check if deassign_device() left our pdev in
+             * domain's list. As we dropped the lock, we can't be sure
+             * that list wasn't permutated in some random way, so we
+             * need to traverse the whole list.
+             */
+            for_each_pdev ( d, tmp )
+            {
+                if ( tmp == pdev )
+                {
+                    still_present = true;
+                    break;
+                }
+            }
+            if ( still_present )
+                list_move(&pdev->domain_list, &failed_pdevs);
+            combined_ret = combined_ret?:ret;
+        }
     }
+
+    list_splice(&failed_pdevs, &d->pdev_list);
+    write_unlock(&d->pci_lock);
     pcidevs_unlock();
 
-    return ret;
+    return combined_ret;
 }
 
 #define PCI_CLASS_BRIDGE_HOST    0x0600
@@ -1125,7 +1169,9 @@  static int __hwdom_init cf_check _setup_hwdom_pci_devices(
             if ( !pdev->domain )
             {
                 pdev->domain = ctxt->d;
+                write_lock(&ctxt->d->pci_lock);
                 list_add(&pdev->domain_list, &ctxt->d->pdev_list);
+                write_unlock(&ctxt->d->pci_lock);
                 setup_one_hwdom_device(ctxt, pdev);
             }
             else if ( pdev->domain == dom_xen )
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 0e3062c820..55ee3f110d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2806,7 +2806,14 @@  static int cf_check reassign_device_ownership(
 
     if ( devfn == pdev->devfn && pdev->domain != target )
     {
-        list_move(&pdev->domain_list, &target->pdev_list);
+        write_lock(&pdev->domain->pci_lock);
+        list_del(&pdev->domain_list);
+        write_unlock(&pdev->domain->pci_lock);
+
+        write_lock(&target->pci_lock);
+        list_add(&pdev->domain_list, &target->pdev_list);
+        write_unlock(&target->pci_lock);
+
         pdev->domain = target;
     }
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 85242a73d3..80dd150bbf 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -460,6 +460,7 @@  struct domain
 
 #ifdef CONFIG_HAS_PCI
     struct list_head pdev_list;
+    rwlock_t pci_lock;
 #endif
 
 #ifdef CONFIG_HAS_PASSTHROUGH