[11/11] xen/arm: Process pending vPCI map/unmap operations

Message ID	20210903083347.131786-12-andr2000@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=Dhlx=NZ=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B21E661056 Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> From: Oleksandr Andrushchenko <andr2000@gmail.com> To: xen-devel@lists.xenproject.org Cc: julien@xen.org, sstabellini@kernel.org, oleksandr_tyshchenko@epam.com, volodymyr_babchuk@epam.com, Artem_Mygaiev@epam.com, roger.pau@citrix.com, bertrand.marquis@arm.com, rahul.singh@arm.com, Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Subject: [PATCH 11/11] xen/arm: Process pending vPCI map/unmap operations Date: Fri, 3 Sep 2021 11:33:47 +0300 Message-Id: <20210903083347.131786-12-andr2000@gmail.com> In-Reply-To: <20210903083347.131786-1-andr2000@gmail.com> References: <20210903083347.131786-1-andr2000@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	PCI devices passthrough on Arm, part 2 \| expand [00/11] PCI devices passthrough on Arm, part 2 [01/11] xen/arm: Add new device type for PCI [02/11] xen/arm: Add dev_to_pci helper [03/11] xen/arm: Introduce pci_find_host_bridge_node helper [04/11] xen/device-tree: Make dt_find_node_by_phandle global [05/11] xen/arm: Mark device as PCI while creating one [06/11] xen/domain: Call pci_release_devices() when releasing domain resources [07/11] libxl: Allow removing PCI devices for all types of domains [08/11] libxl: Only map legacy PCI IRQs if they are supported [09/11] xen/arm: Setup MMIO range trap handlers for hardware domain [10/11] xen/arm: Do not map PCI ECAM space to Domain-0's p2m [11/11] xen/arm: Process pending vPCI map/unmap operations

Oleksandr Andrushchenko Sept. 3, 2021, 8:33 a.m. UTC

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

vPCI may map and unmap PCI device memory (BARs) being passed through which
may take a lot of time. For this those operations may be deferred to be
performed later, so that they can be safely preempted.
Run the corresponding vPCI code while switching a vCPU.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
 xen/arch/arm/traps.c | 6 ++++++
 1 file changed, 6 insertions(+)

Julien Grall Sept. 3, 2021, 9:04 a.m. UTC | #1

Hi Oleksandr,

On 03/09/2021 09:33, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> vPCI may map and unmap PCI device memory (BARs) being passed through which
> may take a lot of time. For this those operations may be deferred to be
> performed later, so that they can be safely preempted.
> Run the corresponding vPCI code while switching a vCPU.

IIUC, you are talking about the function map_range() in 
xen/drivers/vpci/header. The function has the following todo for Arm:

         /*
          * ARM TODOs:
          * - On ARM whether the memory is prefetchable or not should be 
passed
          *   to map_mmio_regions in order to decide which memory attributes
          *   should be used.
          *
          * - {un}map_mmio_regions doesn't support preemption.
          */

This doesn't seem to be addressed in the two series for PCI passthrough 
sent so far. Do you have any plan to handle it?

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>   xen/arch/arm/traps.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 219ab3c3fbde..1571fb8afd03 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -34,6 +34,7 @@
>   #include <xen/symbols.h>
>   #include <xen/version.h>
>   #include <xen/virtual_region.h>
> +#include <xen/vpci.h>
>   
>   #include <public/sched.h>
>   #include <public/xen.h>
> @@ -2304,6 +2305,11 @@ static bool check_for_vcpu_work(void)
>       }
>   #endif
>   
> +    local_irq_enable();
> +    if ( has_vpci(v->domain) && vpci_process_pending(v) )

Looking at the code of vpci_process_pending(), it looks like there are 
some rework to do for guest. Do you plan to handle it as part of the 
vPCI series?

> +        raise_softirq(SCHEDULE_SOFTIRQ);
> +    local_irq_disable();
> +

 From my understanding of vcpi_process_pending(). The function will 
return true if there are more work to schedule. However, if 
check_for_vcpu_for_work() return false, then we will return to the guest 
before any work for vCPI has finished. This is because 
check_for_vcpu_work() will not be called again.

In this case, I think you want to return as soon as you know we need to 
reschedule.

However, looking at the rest of the code, we already have a check for 
vpci in the common IOREQ code. So we would end up to call twice 
vpci_process_pending(). Maybe we should move the call from the IOREQ to 
arch-code.

Cheers,

Oleksandr Andrushchenko Sept. 6, 2021, 7:02 a.m. UTC | #2

Hi, Julien!

On 03.09.21 12:04, Julien Grall wrote:
> Hi Oleksandr,
>
> On 03/09/2021 09:33, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> vPCI may map and unmap PCI device memory (BARs) being passed through which
>> may take a lot of time. For this those operations may be deferred to be
>> performed later, so that they can be safely preempted.
>> Run the corresponding vPCI code while switching a vCPU.
>
> IIUC, you are talking about the function map_range() in xen/drivers/vpci/header. The function has the following todo for Arm:
>
>         /*
>          * ARM TODOs:
>          * - On ARM whether the memory is prefetchable or not should be passed
>          *   to map_mmio_regions in order to decide which memory attributes
>          *   should be used.
>          *
>          * - {un}map_mmio_regions doesn't support preemption.
>          */
>
> This doesn't seem to be addressed in the two series for PCI passthrough sent so far. Do you have any plan to handle it?

No plan yet.

All the mappings are happening with p2m_mmio_direct_dev as of now.

>
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   xen/arch/arm/traps.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 219ab3c3fbde..1571fb8afd03 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -34,6 +34,7 @@
>>   #include <xen/symbols.h>
>>   #include <xen/version.h>
>>   #include <xen/virtual_region.h>
>> +#include <xen/vpci.h>
>>     #include <public/sched.h>
>>   #include <public/xen.h>
>> @@ -2304,6 +2305,11 @@ static bool check_for_vcpu_work(void)
>>       }
>>   #endif
>>   +    local_irq_enable();
>> +    if ( has_vpci(v->domain) && vpci_process_pending(v) )
>
> Looking at the code of vpci_process_pending(), it looks like there are some rework to do for guest. Do you plan to handle it as part of the vPCI series?
Yes, vPCI code is heavily touched to support guest non-identity mappings
>
>> +        raise_softirq(SCHEDULE_SOFTIRQ);
>> +    local_irq_disable();
>> +
>
> From my understanding of vcpi_process_pending(). The function will return true if there are more work to schedule.
Yes
> However, if check_for_vcpu_for_work() return false, then we will return to the guest before any work for vCPI has finished. This is because check_for_vcpu_work() will not be called again.
Correct
>
> In this case, I think you want to return as soon as you know we need to reschedule.
Not sure I understand this
>
> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.

Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.

My understanding is that for x86 it is always enabled, but this might not be the case for Arm

> So we would end up to call twice vpci_process_pending().
So, if CONFIG_IOREQ_SERVER is not enabled then we end up with only calling it from traps.c on Arm
> Maybe we should move the call from the IOREQ to arch-code.

Hm. I would better think of moving it from IOREQ to some other common code: for x86 (if

my understanding correct about CONFIG_IOREQ_SERVER) it is by coincidence that we call vPCI

code from there and IOREQ is always enabled.

>
> Cheers,
>
Thank you,

Oleksandr

Julien Grall Sept. 6, 2021, 8:48 a.m. UTC | #3

On 06/09/2021 08:02, Oleksandr Andrushchenko wrote:
> Hi, Julien!

Hi Oleksandr,

> On 03.09.21 12:04, Julien Grall wrote:
>> Hi Oleksandr,
>>
>> On 03/09/2021 09:33, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> vPCI may map and unmap PCI device memory (BARs) being passed through which
>>> may take a lot of time. For this those operations may be deferred to be
>>> performed later, so that they can be safely preempted.
>>> Run the corresponding vPCI code while switching a vCPU.
>>
>> IIUC, you are talking about the function map_range() in xen/drivers/vpci/header. The function has the following todo for Arm:
>>
>>          /*
>>           * ARM TODOs:
>>           * - On ARM whether the memory is prefetchable or not should be passed
>>           *   to map_mmio_regions in order to decide which memory attributes
>>           *   should be used.
>>           *
>>           * - {un}map_mmio_regions doesn't support preemption.
>>           */
>>
>> This doesn't seem to be addressed in the two series for PCI passthrough sent so far. Do you have any plan to handle it?
> 
> No plan yet.
> 
> All the mappings are happening with p2m_mmio_direct_dev as of now.

So this address the first TODO but how about the second TODO? It refers 
to the lack of preemption on Arm but in this patch you suggest there are 
some and hence we need to call vpci_process_pending().

For a tech preview, the lack of preemption would be OK. However, the 
commit message should be updated to make clear there are no such 
preemption yet or avoid to mention it.

> 
>>
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> ---
>>>    xen/arch/arm/traps.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index 219ab3c3fbde..1571fb8afd03 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -34,6 +34,7 @@
>>>    #include <xen/symbols.h>
>>>    #include <xen/version.h>
>>>    #include <xen/virtual_region.h>
>>> +#include <xen/vpci.h>
>>>      #include <public/sched.h>
>>>    #include <public/xen.h>
>>> @@ -2304,6 +2305,11 @@ static bool check_for_vcpu_work(void)
>>>        }
>>>    #endif
>>>    +    local_irq_enable();
>>> +    if ( has_vpci(v->domain) && vpci_process_pending(v) )
>>
>> Looking at the code of vpci_process_pending(), it looks like there are some rework to do for guest. Do you plan to handle it as part of the vPCI series?
> Yes, vPCI code is heavily touched to support guest non-identity mappings

I wasn't referring to the non-identity mappings here. I was referring to 
TODOs such as:

             /*
              * FIXME: in case of failure remove the device from the domain.
              * Note that there might still be leftover mappings. While 
this is
              * safe for Dom0, for DomUs the domain will likely need to be
              * killed in order to avoid leaking stale p2m mappings on
              * failure.
              */

You still have them after the series reworking the vPCI. As for the 
preemption this is OK to ignore it for a tech preview. Although, we want 
to at least track them.

>>
>>> +        raise_softirq(SCHEDULE_SOFTIRQ);
>>> +    local_irq_disable();
>>> +
>>
>>  From my understanding of vcpi_process_pending(). The function will return true if there are more work to schedule.
> Yes
>> However, if check_for_vcpu_for_work() return false, then we will return to the guest before any work for vCPI has finished. This is because check_for_vcpu_work() will not be called again.
> Correct
>>
>> In this case, I think you want to return as soon as you know we need to reschedule.
> Not sure I understand this

The return value of check_for_vcpu_for_work() indicates whether we have 
more work to do before returning to return to the guest.

When vpci_process_pending() returns true, it tells us we need to call 
the function at least one more time before returning to the guest.

In your current implementation, you leave that decision to whoeever is 
next in the function.

It is not safe to return to the guest as long as vpci_process_pending() 
returns true. So you want to write something like:

if ( vpci_process_pending() )
   return true;

>>
>> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.
> 
> Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.

Right. My point is when CONFIG_IOREQ_SERVER is set then you would end up 
to call twice vpci_process_pending(). This will have an impact how on 
long your vCPU is going to running because you are doubling the work.

> 
> My understanding is that for x86 it is always enabled, but this might not be the case for Arm
> 
>> So we would end up to call twice vpci_process_pending().
> So, if CONFIG_IOREQ_SERVER is not enabled then we end up with only calling it from traps.c on Arm
>> Maybe we should move the call from the IOREQ to arch-code.
> 
> Hm. I would better think of moving it from IOREQ to some other common code: for x86 (if
> 
> my understanding correct about CONFIG_IOREQ_SERVER) it is by coincidence that we call vPCI
> 
> code from there and IOREQ is always enabled.

I am not aware of another suitable common helper that would be called on 
the return to the guest path. Hence why I suggest to possibly duplicated 
the code in each arch path.

Cheers,

Oleksandr Andrushchenko Sept. 6, 2021, 9:14 a.m. UTC | #4

On 06.09.21 11:48, Julien Grall wrote:
> On 06/09/2021 08:02, Oleksandr Andrushchenko wrote:
>> Hi, Julien!
>
> Hi Oleksandr,
>
>> On 03.09.21 12:04, Julien Grall wrote:
>>> Hi Oleksandr,
>>>
>>> On 03/09/2021 09:33, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> vPCI may map and unmap PCI device memory (BARs) being passed through which
>>>> may take a lot of time. For this those operations may be deferred to be
>>>> performed later, so that they can be safely preempted.
>>>> Run the corresponding vPCI code while switching a vCPU.
>>>
>>> IIUC, you are talking about the function map_range() in xen/drivers/vpci/header. The function has the following todo for Arm:
>>>
>>>          /*
>>>           * ARM TODOs:
>>>           * - On ARM whether the memory is prefetchable or not should be passed
>>>           *   to map_mmio_regions in order to decide which memory attributes
>>>           *   should be used.
>>>           *
>>>           * - {un}map_mmio_regions doesn't support preemption.
>>>           */
>>>
>>> This doesn't seem to be addressed in the two series for PCI passthrough sent so far. Do you have any plan to handle it?
>>
>> No plan yet.
>>
>> All the mappings are happening with p2m_mmio_direct_dev as of now.
>
> So this address the first TODO but how about the second TODO? It refers to the lack of preemption on Arm but in this patch you suggest there are some and hence we need to call vpci_process_pending().
>
> For a tech preview, the lack of preemption would be OK. However, the commit message should be updated to make clear there are no such preemption yet or avoid to mention it.

Well, the comment was not added by me (by Roger I guess), I just keep it.

As to the preemption both map and unmap are happening via vpci_process_pending, so

what is true for map is also true for unmap with this respect

>
>>
>>>
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>> ---
>>>>    xen/arch/arm/traps.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>> index 219ab3c3fbde..1571fb8afd03 100644
>>>> --- a/xen/arch/arm/traps.c
>>>> +++ b/xen/arch/arm/traps.c
>>>> @@ -34,6 +34,7 @@
>>>>    #include <xen/symbols.h>
>>>>    #include <xen/version.h>
>>>>    #include <xen/virtual_region.h>
>>>> +#include <xen/vpci.h>
>>>>      #include <public/sched.h>
>>>>    #include <public/xen.h>
>>>> @@ -2304,6 +2305,11 @@ static bool check_for_vcpu_work(void)
>>>>        }
>>>>    #endif
>>>>    +    local_irq_enable();
>>>> +    if ( has_vpci(v->domain) && vpci_process_pending(v) )
>>>
>>> Looking at the code of vpci_process_pending(), it looks like there are some rework to do for guest. Do you plan to handle it as part of the vPCI series?
>> Yes, vPCI code is heavily touched to support guest non-identity mappings
>
> I wasn't referring to the non-identity mappings here. I was referring to TODOs such as:
>
>             /*
>              * FIXME: in case of failure remove the device from the domain.
>              * Note that there might still be leftover mappings. While this is
>              * safe for Dom0, for DomUs the domain will likely need to be
>              * killed in order to avoid leaking stale p2m mappings on
>              * failure.
>              */
>
> You still have them after the series reworking the vPCI. As for the preemption this is OK to ignore it for a tech preview. Although, we want to at least track them.
Please see above: both map and unmap are happening via vpci_process_pending
>
>>>
>>>> + raise_softirq(SCHEDULE_SOFTIRQ);
>>>> +    local_irq_disable();
>>>> +
>>>
>>>  From my understanding of vcpi_process_pending(). The function will return true if there are more work to schedule.
>> Yes
>>> However, if check_for_vcpu_for_work() return false, then we will return to the guest before any work for vCPI has finished. This is because check_for_vcpu_work() will not be called again.
>> Correct
>>>
>>> In this case, I think you want to return as soon as you know we need to reschedule.
>> Not sure I understand this
>
I was more referring to "I think you want to return as soon as you know we need to reschedule."
> The return value of check_for_vcpu_for_work() indicates whether we have more work to do before returning to return to the guest.
>
> When vpci_process_pending() returns true, it tells us we need to call the function at least one more time before returning to the guest.
>
> In your current implementation, you leave that decision to whoeever is next in the function.
>
> It is not safe to return to the guest as long as vpci_process_pending() returns true. So you want to write something like:
>
> if ( vpci_process_pending() )
>   return true;
--- a/xen/arch/arm/traps.c

+++ b/xen/arch/arm/traps.c
@@ -2291,6 +2291,9 @@ static bool check_for_vcpu_work(void)
  {
      struct vcpu *v = current;

+    if ( vpci_process_pending() )
+      return true;
+
  #ifdef CONFIG_IOREQ_SERVER
      if ( domain_has_ioreq_server(v->domain) )
      {
Do you mean something like this?


>
>>>
>>> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.
>>
>> Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.
>
> Right. My point is when CONFIG_IOREQ_SERVER is set then you would end up to call twice vpci_process_pending(). This will have an impact how on long your vCPU is going to running because you are doubling the work.

So, you suggest that we have in the common IOREQ code something call like

arch_vpci_process_pending? In case of x86 it will have the code currently found in the

common IOREQ sources and for Arm it will be nop?

Any better suggestion for the name?

>
>>
>> My understanding is that for x86 it is always enabled, but this might not be the case for Arm
>>
>>> So we would end up to call twice vpci_process_pending().
>> So, if CONFIG_IOREQ_SERVER is not enabled then we end up with only calling it from traps.c on Arm
>>> Maybe we should move the call from the IOREQ to arch-code.
>>
>> Hm. I would better think of moving it from IOREQ to some other common code: for x86 (if
>>
>> my understanding correct about CONFIG_IOREQ_SERVER) it is by coincidence that we call vPCI
>>
>> code from there and IOREQ is always enabled.
>
> I am not aware of another suitable common helper that would be called on the return to the guest path. Hence why I suggest to possibly duplicated the code in each arch path.
I see
>
> Cheers,
>

Julien Grall Sept. 6, 2021, 9:53 a.m. UTC | #5

Hi Oleksandr,

On 06/09/2021 10:14, Oleksandr Andrushchenko wrote:
> 
> On 06.09.21 11:48, Julien Grall wrote:
>> On 06/09/2021 08:02, Oleksandr Andrushchenko wrote:
>>> Hi, Julien!
>>
>> Hi Oleksandr,
>>
>>> On 03.09.21 12:04, Julien Grall wrote:
>>>> Hi Oleksandr,
>>>>
>>>> On 03/09/2021 09:33, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> vPCI may map and unmap PCI device memory (BARs) being passed through which
>>>>> may take a lot of time. For this those operations may be deferred to be
>>>>> performed later, so that they can be safely preempted.
>>>>> Run the corresponding vPCI code while switching a vCPU.
>>>>
>>>> IIUC, you are talking about the function map_range() in xen/drivers/vpci/header. The function has the following todo for Arm:
>>>>
>>>>           /*
>>>>            * ARM TODOs:
>>>>            * - On ARM whether the memory is prefetchable or not should be passed
>>>>            *   to map_mmio_regions in order to decide which memory attributes
>>>>            *   should be used.
>>>>            *
>>>>            * - {un}map_mmio_regions doesn't support preemption.
>>>>            */
>>>>
>>>> This doesn't seem to be addressed in the two series for PCI passthrough sent so far. Do you have any plan to handle it?
>>>
>>> No plan yet.
>>>
>>> All the mappings are happening with p2m_mmio_direct_dev as of now.
>>
>> So this address the first TODO but how about the second TODO? It refers to the lack of preemption on Arm but in this patch you suggest there are some and hence we need to call vpci_process_pending().
>>
>> For a tech preview, the lack of preemption would be OK. However, the commit message should be updated to make clear there are no such preemption yet or avoid to mention it.
> 
> Well, the comment was not added by me (by Roger I guess), I just keep it.

I don't think it matters to know who added it. What matters is when 
those comments are going to be handled. If they are already handled, 
then they should be dropped.

If they are not, the two TODOs listed above are probably OK to defer as 
you only plan a tech preview. But they would need to be handled before 
vCPI is selected by default and used in production.

Note that I specifically wrote "the two TODOs listed above" because I 
haven't looked at the other TODOs/FIXMEs and figued out they are fine to 
defer.

> 
> As to the preemption both map and unmap are happening via vpci_process_pending, so

Right... this doesn't mean preemption is actually supported on Arm. 
vpci_process_pending() doesn't do the preemption itself. It relies on 
map_range() to do it.

But even map_range() relies on the arch specific helper 
{,un}map_mmio_regions() to do it. If you look at the x86 implementation 
they are adding at max MAX_MMIO_MAX_ITER entries per call. On Arm, there 
are not such limit. Therefore the function will always do the full 
{,un}mapping before returning. IOW there are no preemption supported.

> 
> what is true for map is also true for unmap with this respect
> 
>>
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>> ---
>>>>>     xen/arch/arm/traps.c | 6 ++++++
>>>>>     1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>>> index 219ab3c3fbde..1571fb8afd03 100644
>>>>> --- a/xen/arch/arm/traps.c
>>>>> +++ b/xen/arch/arm/traps.c
>>>>> @@ -34,6 +34,7 @@
>>>>>     #include <xen/symbols.h>
>>>>>     #include <xen/version.h>
>>>>>     #include <xen/virtual_region.h>
>>>>> +#include <xen/vpci.h>
>>>>>       #include <public/sched.h>
>>>>>     #include <public/xen.h>
>>>>> @@ -2304,6 +2305,11 @@ static bool check_for_vcpu_work(void)
>>>>>         }
>>>>>     #endif
>>>>>     +    local_irq_enable();
>>>>> +    if ( has_vpci(v->domain) && vpci_process_pending(v) )
>>>>
>>>> Looking at the code of vpci_process_pending(), it looks like there are some rework to do for guest. Do you plan to handle it as part of the vPCI series?
>>> Yes, vPCI code is heavily touched to support guest non-identity mappings
>>
>> I wasn't referring to the non-identity mappings here. I was referring to TODOs such as:
>>
>>              /*
>>               * FIXME: in case of failure remove the device from the domain.
>>               * Note that there might still be leftover mappings. While this is
>>               * safe for Dom0, for DomUs the domain will likely need to be
>>               * killed in order to avoid leaking stale p2m mappings on
>>               * failure.
>>               */
>>
>> You still have them after the series reworking the vPCI. As for the preemption this is OK to ignore it for a tech preview. Although, we want to at least track them.
> Please see above: both map and unmap are happening via vpci_process_pending

I am not sure how this is relevant to what I just mentionned.

>>
>>>>
>>>>> + raise_softirq(SCHEDULE_SOFTIRQ);
>>>>> +    local_irq_disable();
>>>>> +
>>>>
>>>>   From my understanding of vcpi_process_pending(). The function will return true if there are more work to schedule.
>>> Yes
>>>> However, if check_for_vcpu_for_work() return false, then we will return to the guest before any work for vCPI has finished. This is because check_for_vcpu_work() will not be called again.
>>> Correct
>>>>
>>>> In this case, I think you want to return as soon as you know we need to reschedule.
>>> Not sure I understand this
>>
> I was more referring to "I think you want to return as soon as you know we need to reschedule."
>> The return value of check_for_vcpu_for_work() indicates whether we have more work to do before returning to return to the guest.
>>
>> When vpci_process_pending() returns true, it tells us we need to call the function at least one more time before returning to the guest.
>>
>> In your current implementation, you leave that decision to whoeever is next in the function.
>>
>> It is not safe to return to the guest as long as vpci_process_pending() returns true. So you want to write something like:
>>
>> if ( vpci_process_pending() )
>>    return true;
> --- a/xen/arch/arm/traps.c
> 
> +++ b/xen/arch/arm/traps.c
> @@ -2291,6 +2291,9 @@ static bool check_for_vcpu_work(void)
>    {
>        struct vcpu *v = current;
> 
> +    if ( vpci_process_pending() )
> +      return true;
> +
>    #ifdef CONFIG_IOREQ_SERVER
>        if ( domain_has_ioreq_server(v->domain) )
>        {
> Do you mean something like this?

Yes.

>>>> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.
>>>
>>> Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.
>>
>> Right. My point is when CONFIG_IOREQ_SERVER is set then you would end up to call twice vpci_process_pending(). This will have an impact how on long your vCPU is going to running because you are doubling the work.
> 
> So, you suggest that we have in the common IOREQ code something call like
> 
> arch_vpci_process_pending? In case of x86 it will have the code currently found in the
> 
> common IOREQ sources and for Arm it will be nop?

No I am suggesting to move the call of the IOREQ code to hvm_do_resume() 
(on x86) and check_for_vcpu_work() (on Arm).

Cheers,

Oleksandr Andrushchenko Sept. 6, 2021, 10:06 a.m. UTC | #6

On 06.09.21 12:53, Julien Grall wrote:
> Hi Oleksandr,
>
> On 06/09/2021 10:14, Oleksandr Andrushchenko wrote:
>>
>> On 06.09.21 11:48, Julien Grall wrote:
>>> On 06/09/2021 08:02, Oleksandr Andrushchenko wrote:
>>>> Hi, Julien!
>>>
>>> Hi Oleksandr,
>>>
>>>> On 03.09.21 12:04, Julien Grall wrote:
>>>>> Hi Oleksandr,
>>>>>
>>>>> On 03/09/2021 09:33, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> vPCI may map and unmap PCI device memory (BARs) being passed through which
>>>>>> may take a lot of time. For this those operations may be deferred to be
>>>>>> performed later, so that they can be safely preempted.
>>>>>> Run the corresponding vPCI code while switching a vCPU.
>>>>>
>>>>> IIUC, you are talking about the function map_range() in xen/drivers/vpci/header. The function has the following todo for Arm:
>>>>>
>>>>>           /*
>>>>>            * ARM TODOs:
>>>>>            * - On ARM whether the memory is prefetchable or not should be passed
>>>>>            *   to map_mmio_regions in order to decide which memory attributes
>>>>>            *   should be used.
>>>>>            *
>>>>>            * - {un}map_mmio_regions doesn't support preemption.
>>>>>            */
>>>>>
>>>>> This doesn't seem to be addressed in the two series for PCI passthrough sent so far. Do you have any plan to handle it?
>>>>
>>>> No plan yet.
>>>>
>>>> All the mappings are happening with p2m_mmio_direct_dev as of now.
>>>
>>> So this address the first TODO but how about the second TODO? It refers to the lack of preemption on Arm but in this patch you suggest there are some and hence we need to call vpci_process_pending().
>>>
>>> For a tech preview, the lack of preemption would be OK. However, the commit message should be updated to make clear there are no such preemption yet or avoid to mention it.
>>
>> Well, the comment was not added by me (by Roger I guess), I just keep it.
>
> I don't think it matters to know who added it. What matters is when those comments are going to be handled. If they are already handled, then they should be dropped.
>
> If they are not, the two TODOs listed above are probably OK to defer as you only plan a tech preview. But they would need to be handled before vCPI is selected by default and used in production.
>
> Note that I specifically wrote "the two TODOs listed above" because I haven't looked at the other TODOs/FIXMEs and figued out they are fine to defer.
Ok, then I leave the TODOs as they are
>
>>
>> As to the preemption both map and unmap are happening via vpci_process_pending, so
>
> Right... this doesn't mean preemption is actually supported on Arm. vpci_process_pending() doesn't do the preemption itself. It relies on map_range() to do it.
>
> But even map_range() relies on the arch specific helper {,un}map_mmio_regions() to do it. If you look at the x86 implementation they are adding at max MAX_MMIO_MAX_ITER entries per call. On Arm, there are not such limit. Therefore the function will always do the full {,un}mapping before returning. IOW there are no preemption supported.
Ok
>
>>
>> what is true for map is also true for unmap with this respect
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>> ---
>>>>>>     xen/arch/arm/traps.c | 6 ++++++
>>>>>>     1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>>>> index 219ab3c3fbde..1571fb8afd03 100644
>>>>>> --- a/xen/arch/arm/traps.c
>>>>>> +++ b/xen/arch/arm/traps.c
>>>>>> @@ -34,6 +34,7 @@
>>>>>>     #include <xen/symbols.h>
>>>>>>     #include <xen/version.h>
>>>>>>     #include <xen/virtual_region.h>
>>>>>> +#include <xen/vpci.h>
>>>>>>       #include <public/sched.h>
>>>>>>     #include <public/xen.h>
>>>>>> @@ -2304,6 +2305,11 @@ static bool check_for_vcpu_work(void)
>>>>>>         }
>>>>>>     #endif
>>>>>>     +    local_irq_enable();
>>>>>> +    if ( has_vpci(v->domain) && vpci_process_pending(v) )
>>>>>
>>>>> Looking at the code of vpci_process_pending(), it looks like there are some rework to do for guest. Do you plan to handle it as part of the vPCI series?
>>>> Yes, vPCI code is heavily touched to support guest non-identity mappings
>>>
>>> I wasn't referring to the non-identity mappings here. I was referring to TODOs such as:
>>>
>>>              /*
>>>               * FIXME: in case of failure remove the device from the domain.
>>>               * Note that there might still be leftover mappings. While this is
>>>               * safe for Dom0, for DomUs the domain will likely need to be
>>>               * killed in order to avoid leaking stale p2m mappings on
>>>               * failure.
>>>               */
>>>
>>> You still have them after the series reworking the vPCI. As for the preemption this is OK to ignore it for a tech preview. Although, we want to at least track them.
>> Please see above: both map and unmap are happening via vpci_process_pending
>
> I am not sure how this is relevant to what I just mentionned.
>
>>>
>>>>>
>>>>>> + raise_softirq(SCHEDULE_SOFTIRQ);
>>>>>> +    local_irq_disable();
>>>>>> +
>>>>>
>>>>>   From my understanding of vcpi_process_pending(). The function will return true if there are more work to schedule.
>>>> Yes
>>>>> However, if check_for_vcpu_for_work() return false, then we will return to the guest before any work for vCPI has finished. This is because check_for_vcpu_work() will not be called again.
>>>> Correct
>>>>>
>>>>> In this case, I think you want to return as soon as you know we need to reschedule.
>>>> Not sure I understand this
>>>
>> I was more referring to "I think you want to return as soon as you know we need to reschedule."
>>> The return value of check_for_vcpu_for_work() indicates whether we have more work to do before returning to return to the guest.
>>>
>>> When vpci_process_pending() returns true, it tells us we need to call the function at least one more time before returning to the guest.
>>>
>>> In your current implementation, you leave that decision to whoeever is next in the function.
>>>
>>> It is not safe to return to the guest as long as vpci_process_pending() returns true. So you want to write something like:
>>>
>>> if ( vpci_process_pending() )
>>>    return true;
>> --- a/xen/arch/arm/traps.c
>>
>> +++ b/xen/arch/arm/traps.c
>> @@ -2291,6 +2291,9 @@ static bool check_for_vcpu_work(void)
>>    {
>>        struct vcpu *v = current;
>>
>> +    if ( vpci_process_pending() )
>> +      return true;
>> +
>>    #ifdef CONFIG_IOREQ_SERVER
>>        if ( domain_has_ioreq_server(v->domain) )
>>        {
>> Do you mean something like this?
>
> Yes.
Ok, I'll add this check
>
>>>>> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.
>>>>
>>>> Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.
>>>
>>> Right. My point is when CONFIG_IOREQ_SERVER is set then you would end up to call twice vpci_process_pending(). This will have an impact how on long your vCPU is going to running because you are doubling the work.
>>
>> So, you suggest that we have in the common IOREQ code something call like
>>
>> arch_vpci_process_pending? In case of x86 it will have the code currently found in the
>>
>> common IOREQ sources and for Arm it will be nop?
>
> No I am suggesting to move the call of the IOREQ code to hvm_do_resume() (on x86) and check_for_vcpu_work() (on Arm).

Ok, I can move vPCI code to hvm_do_resume, but vPCI is only used for x86 PVH Dom0.

Do you still think hvm_do_resume is the right place?

>
> Cheers,
>
Thanks,

Oleksandr

Julien Grall Sept. 6, 2021, 10:38 a.m. UTC | #7

Hi Oleksandr,

On 06/09/2021 11:06, Oleksandr Andrushchenko wrote:
> On 06.09.21 12:53, Julien Grall wrote:
>>>>>> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.
>>>>>
>>>>> Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.
>>>>
>>>> Right. My point is when CONFIG_IOREQ_SERVER is set then you would end up to call twice vpci_process_pending(). This will have an impact how on long your vCPU is going to running because you are doubling the work.
>>>
>>> So, you suggest that we have in the common IOREQ code something call like
>>>
>>> arch_vpci_process_pending? In case of x86 it will have the code currently found in the
>>>
>>> common IOREQ sources and for Arm it will be nop?
>>
>> No I am suggesting to move the call of the IOREQ code to hvm_do_resume() (on x86) and check_for_vcpu_work() (on Arm).
> 
> Ok, I can move vPCI code to hvm_do_resume, but vPCI is only used for x86 PVH Dom0.

AFAIK, Roger is planning to use it for x86 PVH guest.

> 
> Do you still think hvm_do_resume is the right place?
I think so. AFAICT, on x86, the only caller of 
vcpu_ioreq_handle_completion() is hvm_do_resume(). So it makes sense to 
push one layer up.

Cheers,

Oleksandr Andrushchenko Sept. 7, 2021, 6:34 a.m. UTC | #8

On 06.09.21 13:38, Julien Grall wrote:
> Hi Oleksandr,
>
> On 06/09/2021 11:06, Oleksandr Andrushchenko wrote:
>> On 06.09.21 12:53, Julien Grall wrote:
>>>>>>> However, looking at the rest of the code, we already have a check for vpci in the common IOREQ code.
>>>>>>
>>>>>> Which may not be enabled as it depends on CONFIG_IOREQ_SERVER.
>>>>>
>>>>> Right. My point is when CONFIG_IOREQ_SERVER is set then you would end up to call twice vpci_process_pending(). This will have an impact how on long your vCPU is going to running because you are doubling the work.
>>>>
>>>> So, you suggest that we have in the common IOREQ code something call like
>>>>
>>>> arch_vpci_process_pending? In case of x86 it will have the code currently found in the
>>>>
>>>> common IOREQ sources and for Arm it will be nop?
>>>
>>> No I am suggesting to move the call of the IOREQ code to hvm_do_resume() (on x86) and check_for_vcpu_work() (on Arm).
>>
>> Ok, I can move vPCI code to hvm_do_resume, but vPCI is only used for x86 PVH Dom0.
>
> AFAIK, Roger is planning to use it for x86 PVH guest.
>
>>
>> Do you still think hvm_do_resume is the right place?
> I think so. AFAICT, on x86, the only caller of vcpu_ioreq_handle_completion() is hvm_do_resume(). So it makes sense to push one layer up.

Ok, so I ended up with:

--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2305,10 +2305,17 @@ static bool check_for_vcpu_work(void)
      }
  #endif

-    local_irq_enable();
-    if ( has_vpci(v->domain) && vpci_process_pending(v) )
-        raise_softirq(SCHEDULE_SOFTIRQ);
-    local_irq_disable();
+    if ( has_vpci(v->domain) )
+    {
+        bool pending;
+
+        local_irq_enable();
+        pending = vpci_process_pending(v);
+        local_irq_disable();
+
+        if ( pending )
+            return true;
+    }
This is how it is done for IOREQ. It seems there is no need to raise softirq.

I also moved vPCI from common code to hvm_do_resume for x86

>
> Cheers,
>
Thanks,

Oleksandr

[11/11] xen/arm: Process pending vPCI map/unmap operations

Commit Message

Comments

Patch