[RFC] x86/apicv: fix RTC periodic timer and apicv issue

Message ID	E0A769A898ADB6449596C41F51EF62C6AA038E@SZXEMI506-MBS.china.huawei.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: "Xuquan (Euler)" <xuquan8@huawei.com> To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org> Thread-Topic: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue Thread-Index: AdH6GWlAeGO11MDvRE2AaS+FhckIPg== Date: Fri, 19 Aug 2016 12:58:53 +0000 Message-ID: <E0A769A898ADB6449596C41F51EF62C6AA038E@SZXEMI506-MBS.china.huawei.com> Accept-Language: en-US Content-Language: zh-CN Content-Type: multipart/mixed; boundary="_002_E0A769A898ADB6449596C41F51EF62C6AA038ESZXEMI506MBSchina_" MIME-Version: 1.0 Cc: "yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>, Kevin Tian <kevin.tian@intel.com>, "jbeulich@suse.com" <jbeulich@suse.com>, "George.Dunlap@eu.citrix.com" <George.Dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, "jun.nakajima@intel.com" <jun.nakajima@intel.com> Subject: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue Precedence: list Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Xuquan (Euler) Aug. 19, 2016, 12:58 p.m. UTC

From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
From: Quan Xu <xuquan8@huawei.com>
Date: Fri, 19 Aug 2016 20:40:31 +0800
Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
with high payload (with 2vCPU, captured from xentrace, in high payload, the
count of IPI interrupt increases rapidly between these vCPUs).

If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
set in VIRR to guest interrupt status (RVI) as a high priority and apicv
(Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
operation without a VM exit. Within VMX non-root operation, if periodic timer
interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
timer interrupt within VMX non-root operation as well.

But in current code, if Xen doesn't update periodic timer interrupt bit set
in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
then Xen will deliver a periodic timer interrupt again. The guest receives more
periodic timer interrupt.

If the periodic timer interrut is delivered and not the highest priority, make
Xen be aware of this case to decrease the count of pending periodic timer
interrupt.

Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Rongguang He <herongguang.he@huawei.com>
Signed-off-by: Quan Xu <xuquan8@huawei.com>
---
Why RFC:
1. I am not quite sure for other cases, such as nested case.
2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external
   interrupts, non-maskable interrupt system-management interrrupts, exceptions
   and VM exit) may occur before delivery of a periodic timer interrupt, the periodic
   timer interrupt may be lost when a coming periodic timer interrupt is delivered.
   Actually, and so current code is.
---
 xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--
1.7.12.4

Yang Zhang Aug. 22, 2016, 9:38 a.m. UTC | #1

On 2016/8/19 20:58, Xuquan (Euler) wrote:
> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
> From: Quan Xu <xuquan8@huawei.com>
> Date: Fri, 19 Aug 2016 20:40:31 +0800
> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>
> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
> with high payload (with 2vCPU, captured from xentrace, in high payload, the
> count of IPI interrupt increases rapidly between these vCPUs).
>
> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
> operation without a VM exit. Within VMX non-root operation, if periodic timer
> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
> timer interrupt within VMX non-root operation as well.
>
> But in current code, if Xen doesn't update periodic timer interrupt bit set
> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
> then Xen will deliver a periodic timer interrupt again. The guest receives more
> periodic timer interrupt.
>
> If the periodic timer interrut is delivered and not the highest priority, make
> Xen be aware of this case to decrease the count of pending periodic timer
> interrupt.

Nice catch!

>
> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
> Signed-off-by: Quan Xu <xuquan8@huawei.com>
> ---
> Why RFC:
> 1. I am not quite sure for other cases, such as nested case.

We don't support the nested APICv now. So it is ok for this patch.

> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external
>    interrupts, non-maskable interrupt system-management interrrupts, exceptions
>    and VM exit) may occur before delivery of a periodic timer interrupt, the periodic
>    timer interrupt may be lost when a coming periodic timer interrupt is delivered.
>    Actually, and so current code is.

I think you mean the combination not interrupt lost. It should be ok 
since it happens rarely and even native OS will encounter the same 
problem if it disable the interrupt for too long.

> ---
>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 8fca08c..d3a034e 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
>
> -        pt_intr_post(v, intack);
> +       /*
> +        * If the periodic timer interrut is delivered and not the highest priority,
> +        * make Xen be aware of this case to decrease the count of pending periodic
> +        * timer interrupt.
> +        */
> +        if ( pt_vector != -1 && intack.vector > pt_vector )
> +        {
> +            struct hvm_intack pt_intack;
> +
> +            pt_intack.vector = pt_vector;
> +            pt_intack.source = hvm_intsrc_lapic;
> +            pt_intr_post(v, pt_intack);
> +        }
> +        else
> +            pt_intr_post(v, intack);
>      }
>      else
>      {
> --
> 1.7.12.4
>

Looks good to me.

Reviewed-by: Yang Zhang <yang.zhang.wz@gmail.com>

Jan Beulich Aug. 22, 2016, 10:35 a.m. UTC | #2

>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote:
> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
> From: Quan Xu <xuquan8@huawei.com>
> Date: Fri, 19 Aug 2016 20:40:31 +0800
> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
> 
> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
> with high payload (with 2vCPU, captured from xentrace, in high payload, the
> count of IPI interrupt increases rapidly between these vCPUs).
> 
> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
> operation without a VM exit. Within VMX non-root operation, if periodic timer
> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
> timer interrupt within VMX non-root operation as well.
> 
> But in current code, if Xen doesn't update periodic timer interrupt bit set
> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
> then Xen will deliver a periodic timer interrupt again. The guest receives more
> periodic timer interrupt.
> 
> If the periodic timer interrut is delivered and not the highest priority, make
> Xen be aware of this case to decrease the count of pending periodic timer
> interrupt.

I can see the issue you're trying to address, but for one - doesn't
this lead to other double accounting, namely once the pt irq becomes
the highest priority one?

> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> 
> -        pt_intr_post(v, intack);
> +       /*
> +        * If the periodic timer interrut is delivered and not the highest priority,
> +        * make Xen be aware of this case to decrease the count of pending periodic
> +        * timer interrupt.
> +        */
> +        if ( pt_vector != -1 && intack.vector > pt_vector )
> +        {
> +            struct hvm_intack pt_intack;
> +
> +            pt_intack.vector = pt_vector;
> +            pt_intack.source = hvm_intsrc_lapic;
> +            pt_intr_post(v, pt_intack);
> +        }
> +        else
> +            pt_intr_post(v, intack);

And then I'm having two problems with this change: Processing other
than the highest priority irq here looks to be non-architectural. From
looking over pt_intr_post() there's only pt related accounting and no
actual interrupt handling there, but I'd like this to be (a) confirmed
and (b) called out in the comment.

Plus the change above is in no way apicv specific, so I'd also like to
see clarification why this change is valid (i.e. at least not making
things worse) even without apicv in use.

And of course in any event VMX maintainer feedback is going to be
needed here.

Jan

Xuquan (Euler) Aug. 22, 2016, 10:52 a.m. UTC | #3

On August 22, 2016 5:38 PM, Yang Zhang wrote:
>On 2016/8/19 20:58, Xuquan (Euler) wrote:
>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>> From: Quan Xu <xuquan8@huawei.com>
>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>> guest with high payload (with 2vCPU, captured from xentrace, in high
>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
>> interrupt within VMX non-root operation without a VM exit. Within VMX
>> non-root operation, if periodic timer interrupt index of bit is set in
>> VIRR and highest, the apicv delivers periodic timer interrupt within VMX non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt
>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>> aware of this case to decrease the count (pending_intr_nr) of pending
>> periodic timer interrupt, then Xen will deliver a periodic timer
>> interrupt again. The guest receives more periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest
>> priority, make Xen be aware of this case to decrease the count of
>> pending periodic timer interrupt.
>
>Nice catch!
>
>>
>> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
>> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
>> Signed-off-by: Quan Xu <xuquan8@huawei.com>
>> ---
>> Why RFC:
>> 1. I am not quite sure for other cases, such as nested case.
>
>We don't support the nested APICv now. So it is ok for this patch.
>
>> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external
>>    interrupts, non-maskable interrupt system-management interrrupts, exceptions
>>    and VM exit) may occur before delivery of a periodic timer interrupt, the periodic
>>    timer interrupt may be lost when a coming periodic timer interrupt is delivered.
>>    Actually, and so current code is.
>
>I think you mean the combination not interrupt lost. It should be ok since it happens rarely and even native OS will encounter the same problem if it disable the interrupt for too long.
>

Yes, it is 'combination'.

>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 8fca08c..d3a034e 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>          }
>>
>> -        pt_intr_post(v, intack);
>> +       /*
>> +        * If the periodic timer interrut is delivered and not the highest priority,
>> +        * make Xen be aware of this case to decrease the count of pending periodic
>> +        * timer interrupt.
>> +        */
>> +        if ( pt_vector != -1 && intack.vector > pt_vector )
>> +        {
>> +            struct hvm_intack pt_intack;
>> +
>> +            pt_intack.vector = pt_vector;
>> +            pt_intack.source = hvm_intsrc_lapic;
>> +            pt_intr_post(v, pt_intack);
>> +        }
>> +        else
>> +            pt_intr_post(v, intack);
>>      }
>>      else
>>      {
>> --
>> 1.7.12.4
>>
>
>Looks good to me.
>
>Reviewed-by: Yang Zhang <yang.zhang.wz@gmail.com>
>


Thanks for your comments!!
Quan


>--
>Yang
>Alibaba Cloud Computing

Yang Zhang Aug. 22, 2016, 11:40 a.m. UTC | #4

On 2016/8/22 18:35, Jan Beulich wrote:
>>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote:
>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>> From: Quan Xu <xuquan8@huawei.com>
>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>> count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
>> timer interrupt within VMX non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
>> then Xen will deliver a periodic timer interrupt again. The guest receives more
>> periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest priority, make
>> Xen be aware of this case to decrease the count of pending periodic timer
>> interrupt.
>
> I can see the issue you're trying to address, but for one - doesn't
> this lead to other double accounting, namely once the pt irq becomes
> the highest priority one?

"intack.vector > pt_vector"
I think above check in code can avoid the double accounting problem. It 
recalculate pending timer interrupt only when the highest priority 
interrupt is not timer.

>
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>          }
>>
>> -        pt_intr_post(v, intack);
>> +       /*
>> +        * If the periodic timer interrut is delivered and not the highest priority,
>> +        * make Xen be aware of this case to decrease the count of pending periodic
>> +        * timer interrupt.
>> +        */
>> +        if ( pt_vector != -1 && intack.vector > pt_vector )
>> +        {
>> +            struct hvm_intack pt_intack;
>> +
>> +            pt_intack.vector = pt_vector;
>> +            pt_intack.source = hvm_intsrc_lapic;
>> +            pt_intr_post(v, pt_intack);
>> +        }
>> +        else
>> +            pt_intr_post(v, intack);
>
> And then I'm having two problems with this change: Processing other
> than the highest priority irq here looks to be non-architectural. From
> looking over pt_intr_post() there's only pt related accounting and no
> actual interrupt handling there, but I'd like this to be (a) confirmed
> and (b) called out in the comment.
>
> Plus the change above is in no way apicv specific, so I'd also like to
> see clarification why this change is valid (i.e. at least not making
> things worse) even without apicv in use.

It is apicv specific case. Above code will execute only when cpu has 
virtual interrupt delivery which is part of apicv feature.

>
> And of course in any event VMX maintainer feedback is going to be
> needed here.
>
> Jan
>

Xuquan (Euler) Aug. 22, 2016, 11:41 a.m. UTC | #5

On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>> count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
>> timer interrupt within VMX non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>> case to decrease the count (pending_intr_nr) of pending periodic timer
>> interrupt,
>> then Xen will deliver a periodic timer interrupt again. The guest receives
>> more
>> periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest priority, make
>> Xen be aware of this case to decrease the count of pending periodic timer
>> interrupt.
>
>I can see the issue you're trying to address, but for one - doesn't
>this lead to other double accounting, namely once the pt irq becomes
>the highest priority one?
>

It is does NOT lead to other double accounting..
As if the pt irq becomes the highest priority one, the intack is pt one..
Then:

 +        else
 +            pt_intr_post(v, intack);


>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>              __vmwrite(EOI_EXIT_BITMAP(i),
>> v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>          }
>>
>> -        pt_intr_post(v, intack);
>> +       /*
>> +        * If the periodic timer interrut is delivered and not the highest
>> priority,
>> +        * make Xen be aware of this case to decrease the count of pending
>> periodic
>> +        * timer interrupt.
>> +        */
>> +        if ( pt_vector != -1 && intack.vector > pt_vector )
>> +        {
>> +            struct hvm_intack pt_intack;
>> +
>> +            pt_intack.vector = pt_vector;
>> +            pt_intack.source = hvm_intsrc_lapic;
>> +            pt_intr_post(v, pt_intack);
>> +        }
>> +        else
>> +            pt_intr_post(v, intack);
>
>And then I'm having two problems with this change: Processing other
>than the highest priority irq here looks to be non-architectural. From
>looking over pt_intr_post() there's only pt related accounting and no
>actual interrupt handling there, but I'd like this to be
> (a) confirmed

To me, YES.. but I hope Kevin could confirm it again.

>and (b) called out in the comment.

Agreed. Good idea.

>
>Plus the change above is in no way apicv specific, so I'd also like to
>see clarification why this change is valid (i.e. at least not making
>things worse) even without apicv in use.
>

__iiuc__, it is apicv specific, as these change is under condition 'cpu_has_vmx_virtual_intr_delivery' (Virtual Interrupt Delivery is one of apicv features) as below:

    if ( cpu_has_vmx_virtual_intr_delivery ...) {  ___change___ }
 



>And of course in any event VMX maintainer feedback is going to be
>needed here.


Thanks for your comments!!
Quan

Xuquan (Euler) Aug. 22, 2016, 11:52 a.m. UTC | #6

On August 22, 2016 7:40 PM, Yang Zhang wrote:
>On 2016/8/22 18:35, Jan Beulich wrote:
>>>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote:
>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
>>> 2001
>>> From: Quan Xu <xuquan8@huawei.com>
>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>>> issue
>>>
>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>>> guest with high payload (with 2vCPU, captured from xentrace, in high
>>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>>
>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers
>>> IPI interrupt within VMX non-root operation without a VM exit. Within
>>> VMX non-root operation, if periodic timer interrupt index of bit is
>>> set in VIRR and highest, the apicv delivers periodic timer interrupt within
>VMX non-root operation as well.
>>>
>>> But in current code, if Xen doesn't update periodic timer interrupt
>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>>> aware of this case to decrease the count (pending_intr_nr) of pending
>>> periodic timer interrupt, then Xen will deliver a periodic timer
>>> interrupt again. The guest receives more periodic timer interrupt.
>>>
>>> If the periodic timer interrut is delivered and not the highest
>>> priority, make Xen be aware of this case to decrease the count of
>>> pending periodic timer interrupt.
>>
>> I can see the issue you're trying to address, but for one - doesn't
>> this lead to other double accounting, namely once the pt irq becomes
>> the highest priority one?
>
>"intack.vector > pt_vector"
>I think above check in code can avoid the double accounting problem. It
>recalculate pending timer interrupt only when the highest priority interrupt is
>not timer.
>

Yes, 

>>
>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>>              __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>>          }
>>>
>>> -        pt_intr_post(v, intack);
>>> +       /*
>>> +        * If the periodic timer interrut is delivered and not the highest
>priority,
>>> +        * make Xen be aware of this case to decrease the count of
>pending periodic
>>> +        * timer interrupt.
>>> +        */
>>> +        if ( pt_vector != -1 && intack.vector > pt_vector )
>>> +        {
>>> +            struct hvm_intack pt_intack;
>>> +
>>> +            pt_intack.vector = pt_vector;
>>> +            pt_intack.source = hvm_intsrc_lapic;
>>> +            pt_intr_post(v, pt_intack);
>>> +        }
>>> +        else
>>> +            pt_intr_post(v, intack);
>>
>> And then I'm having two problems with this change: Processing other
>> than the highest priority irq here looks to be non-architectural. From
>> looking over pt_intr_post() there's only pt related accounting and no
>> actual interrupt handling there, but I'd like this to be (a) confirmed
>> and (b) called out in the comment.
>>
>> Plus the change above is in no way apicv specific, so I'd also like to
>> see clarification why this change is valid (i.e. at least not making
>> things worse) even without apicv in use.
>
>It is apicv specific case. Above code will execute only when cpu has
>virtual interrupt delivery which is part of apicv feature.
>

Yes, I think so too. Thanks for your clarification!!

Quan


>>
>> And of course in any event VMX maintainer feedback is going to be
>> needed here.
>>
>> Jan
>>
>
>
>--
>Yang
>Alibaba Cloud Computing

Jan Beulich Aug. 22, 2016, 12:01 p.m. UTC | #7

>>> On 22.08.16 at 13:40, <yang.zhang.wz@gmail.com> wrote:
> On 2016/8/22 18:35, Jan Beulich wrote:
>>>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote:
>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>>> From: Quan Xu <xuquan8@huawei.com>
>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>>
>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>>> count of IPI interrupt increases rapidly between these vCPUs).
>>>
>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
>>> timer interrupt within VMX non-root operation as well.
>>>
>>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>>> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
>>> then Xen will deliver a periodic timer interrupt again. The guest receives more
>>> periodic timer interrupt.
>>>
>>> If the periodic timer interrut is delivered and not the highest priority, make
>>> Xen be aware of this case to decrease the count of pending periodic timer
>>> interrupt.
>>
>> I can see the issue you're trying to address, but for one - doesn't
>> this lead to other double accounting, namely once the pt irq becomes
>> the highest priority one?
> 
> "intack.vector > pt_vector"
> I think above check in code can avoid the double accounting problem. It 
> recalculate pending timer interrupt only when the highest priority 
> interrupt is not timer.

I don't understand: When the pt one is highest priority, surely
recalculation is also needed. I'm not talking about a problem in
the case where the conditional is true, but about a possible
subsequent run through vmx_intr_assist() when the previous
higher priority interrupt has got handled, but the pt one (now
being highest) is still pending. (Or similarly consider the case
where on the next run through vmx_intr_assist() another higher
priority interrupts appeared.)

I'm not saying the change is wrong, I'm merely asking for further
proof that it doesn't open up new issues.

>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>>          }
>>>
>>> -        pt_intr_post(v, intack);
>>> +       /*
>>> +        * If the periodic timer interrut is delivered and not the highest priority,
>>> +        * make Xen be aware of this case to decrease the count of pending periodic
>>> +        * timer interrupt.
>>> +        */
>>> +        if ( pt_vector != -1 && intack.vector > pt_vector )
>>> +        {
>>> +            struct hvm_intack pt_intack;
>>> +
>>> +            pt_intack.vector = pt_vector;
>>> +            pt_intack.source = hvm_intsrc_lapic;
>>> +            pt_intr_post(v, pt_intack);
>>> +        }
>>> +        else
>>> +            pt_intr_post(v, intack);
>>
>> And then I'm having two problems with this change: Processing other
>> than the highest priority irq here looks to be non-architectural. From
>> looking over pt_intr_post() there's only pt related accounting and no
>> actual interrupt handling there, but I'd like this to be (a) confirmed
>> and (b) called out in the comment.
>>
>> Plus the change above is in no way apicv specific, so I'd also like to
>> see clarification why this change is valid (i.e. at least not making
>> things worse) even without apicv in use.
> 
> It is apicv specific case. Above code will execute only when cpu has 
> virtual interrupt delivery which is part of apicv feature.

Ah, right, I didn't look closely enough what conditional this is
inside of.

Jan

Jan Beulich Aug. 22, 2016, 12:04 p.m. UTC | #8

>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>>
>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>>> count of IPI interrupt increases rapidly between these vCPUs).
>>>
>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
>>> timer interrupt within VMX non-root operation as well.
>>>
>>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>>> case to decrease the count (pending_intr_nr) of pending periodic timer
>>> interrupt,
>>> then Xen will deliver a periodic timer interrupt again. The guest receives
>>> more
>>> periodic timer interrupt.
>>>
>>> If the periodic timer interrut is delivered and not the highest priority, make
>>> Xen be aware of this case to decrease the count of pending periodic timer
>>> interrupt.
>>
>>I can see the issue you're trying to address, but for one - doesn't
>>this lead to other double accounting, namely once the pt irq becomes
>>the highest priority one?
>>
> 
> It is does NOT lead to other double accounting..
> As if the pt irq becomes the highest priority one, the intack is pt one..
> Then:
> 
>  +        else
>  +            pt_intr_post(v, intack);

As just said in reply to Yang: If this is still the same interrupt instance
as in a prior run through here which took the if() branch, this change
looks like having the potential of double accounting.

>>Plus the change above is in no way apicv specific, so I'd also like to
>>see clarification why this change is valid (i.e. at least not making
>>things worse) even without apicv in use.
>>
> 
> __iiuc__, it is apicv specific, as these change is under condition 
> 'cpu_has_vmx_virtual_intr_delivery' (Virtual Interrupt Delivery is one of 
> apicv features) as below:
> 
>     if ( cpu_has_vmx_virtual_intr_delivery ...) {  ___change___ }

Indeed, as already said in reply to Yang, I apparently didn't look
closely enough. I'm sorry for the noise.

Jan

Yang Zhang Aug. 22, 2016, 1:09 p.m. UTC | #9

On 2016/8/22 20:04, Jan Beulich wrote:
>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>>>
>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>>>> count of IPI interrupt increases rapidly between these vCPUs).
>>>>
>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
>>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>>>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>>>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
>>>> timer interrupt within VMX non-root operation as well.
>>>>
>>>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>>>> case to decrease the count (pending_intr_nr) of pending periodic timer
>>>> interrupt,
>>>> then Xen will deliver a periodic timer interrupt again. The guest receives
>>>> more
>>>> periodic timer interrupt.
>>>>
>>>> If the periodic timer interrut is delivered and not the highest priority, make
>>>> Xen be aware of this case to decrease the count of pending periodic timer
>>>> interrupt.
>>>
>>> I can see the issue you're trying to address, but for one - doesn't
>>> this lead to other double accounting, namely once the pt irq becomes
>>> the highest priority one?
>>>
>>
>> It is does NOT lead to other double accounting..
>> As if the pt irq becomes the highest priority one, the intack is pt one..
>> Then:
>>
>>  +        else
>>  +            pt_intr_post(v, intack);
>
> As just said in reply to Yang: If this is still the same interrupt instance
> as in a prior run through here which took the if() branch, this change
> looks like having the potential of double accounting.

This cannot happen: if pt intr is the second priority one, then 
corresponding vIRR bit will be cleared by hardware while the highest 
priority intr is delivered to guest. And the next vmx_intr_assit cannot 
aware of it since the vIRR bit is zero.

Xuquan (Euler) Aug. 22, 2016, 1:18 p.m. UTC | #10

On August 22, 2016 9:10 PM, Yang Zhang wrote:
>On 2016/8/22 20:04, Jan Beulich wrote:
>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
>>>>> 2001
>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>>>>> issue
>>>>>
>>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>>>>> guest with high payload (with 2vCPU, captured from xentrace, in
>>>>> high payload, the count of IPI interrupt increases rapidly between these
>vCPUs).
>>>>>
>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
>>>>> the IPI intrrupt is high priority than periodic timer interrupt.
>>>>> Xen updates IPI interrupt bit set in VIRR to guest interrupt status
>>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
>>>>> delivers IPI interrupt within VMX non-root operation without a VM
>>>>> exit. Within VMX non-root operation, if periodic timer interrupt
>>>>> index of bit is set in VIRR and highest, the apicv delivers periodic timer
>interrupt within VMX non-root operation as well.
>>>>>
>>>>> But in current code, if Xen doesn't update periodic timer interrupt
>>>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is
>>>>> not aware of this case to decrease the count (pending_intr_nr) of
>>>>> pending periodic timer interrupt, then Xen will deliver a periodic
>>>>> timer interrupt again. The guest receives more periodic timer
>>>>> interrupt.
>>>>>
>>>>> If the periodic timer interrut is delivered and not the highest
>>>>> priority, make Xen be aware of this case to decrease the count of
>>>>> pending periodic timer interrupt.
>>>>
>>>> I can see the issue you're trying to address, but for one - doesn't
>>>> this lead to other double accounting, namely once the pt irq becomes
>>>> the highest priority one?
>>>>
>>>
>>> It is does NOT lead to other double accounting..
>>> As if the pt irq becomes the highest priority one, the intack is pt one..
>>> Then:
>>>
>>>  +        else
>>>  +            pt_intr_post(v, intack);
>>
>> As just said in reply to Yang: If this is still the same interrupt
>> instance as in a prior run through here which took the if() branch,
>> this change looks like having the potential of double accounting.
>
>This cannot happen: if pt intr is the second priority one, then corresponding vIRR
>bit will be cleared by hardware while the highest priority intr is delivered to
>guest. And the next vmx_intr_assit cannot aware of it since the vIRR bit is zero.
>

I am afraid that this may happen.. as I mentioned as 2nd RFC reason, 

Within VMX non-root operation, an Asynchronous Enclave Exit (including external
interrupts, non-maskable interrupt system-management interrrupts, exceptions and VM exit)
may occur before delivery of a periodic timer interrupt, the pt bit of VIRR is still there...

but I will explain more why it doesn't have the potential of double accounting in current code in next email.

Quan

Xuquan (Euler) Aug. 22, 2016, 2:02 p.m. UTC | #11

On August 22, 2016 8:04 PM, <JBeulich@suse.com> wrote: 
>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
>>>> 2001
>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>>>> issue
>>>>
>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>>>> guest with high payload (with 2vCPU, captured from xentrace, in high
>>>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>>>
>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
>>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen
>>>> updates IPI interrupt bit set in VIRR to guest interrupt status
>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
>>>> delivers IPI interrupt within VMX non-root operation without a VM
>>>> exit. Within VMX non-root operation, if periodic timer interrupt
>>>> index of bit is set in VIRR and highest, the apicv delivers periodic timer
>interrupt within VMX non-root operation as well.
>>>>
>>>> But in current code, if Xen doesn't update periodic timer interrupt
>>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>>>> aware of this case to decrease the count (pending_intr_nr) of
>>>> pending periodic timer interrupt, then Xen will deliver a periodic
>>>> timer interrupt again. The guest receives more periodic timer
>>>> interrupt.
>>>>
>>>> If the periodic timer interrut is delivered and not the highest
>>>> priority, make Xen be aware of this case to decrease the count of
>>>> pending periodic timer interrupt.
>>>
>>>I can see the issue you're trying to address, but for one - doesn't
>>>this lead to other double accounting, namely once the pt irq becomes
>>>the highest priority one?
>>>
>>
>> It is does NOT lead to other double accounting..
>> As if the pt irq becomes the highest priority one, the intack is pt one..
>> Then:
>>
>>  +        else
>>  +            pt_intr_post(v, intack);
>
>As just said in reply to Yang: If this is still the same interrupt instance as in a
>prior run through here which took the if() branch, this change looks like having
>the potential of double accounting.
>

I very appreciate your detail review. It looks like, but actually it doesn't happen.

 As the key parameter 'pt->irq_issued'..

In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
In pt_intr_post(), clear the pt->irq_issued before touching the count 'pt->pending_intr_nr'..

According to your assumption, at the second call to pt_intr_post(), As if 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check,
then return, there is no chance to touch the count 'pt->pending_intr_nr'..
------
void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
{
...
    pt = is_pt_irq(v, intack);
    if ( pt == NULL )
    {
        spin_unlock(&v->arch.hvm_vcpu.tm_lock);
        return;
    }
...

     
...
}


static struct periodic_time *is_pt_irq()
{
 ....
    list_for_each_entry ( pt, head, list )
    {
        if ( pt->pending_intr_nr && ________pt->irq_issued_______ &&
             (intack.vector == pt_irq_vector(pt, intack.source)) )
            return pt;
    }

    return NULL;
}





__IIUC__, this question is based on the following pseudocode detail the behavior of virtual-interrupt delivery is __not__ atomic:

Vector <- RVI;
VISR[Vector] <- 1;
SVI <- Vector;
VPPR<- Vector & F0H;
VIRR[Vector] <- 0;
IF any bits set in VIRR
   Then RVI<-highest index of bit set in VIRR
   ELSE RVI <-0
FI
Deliver interrupt with Vector through IDT
...


We'd better check this first, as Yang said, this is atomic..

Quan

Yang Zhang Aug. 22, 2016, 2:07 p.m. UTC | #12

2016年8月22日星期一，Xuquan (Euler) <xuquan8@huawei.com> 写道：

> On August 22, 2016 8:04 PM, <JBeulich@suse.com <javascript:;>> wrote:
> >>>> On 22.08.16 at 13:41, <xuquan8@huawei.com <javascript:;>> wrote:
> >> On August 22, 2016 6:36 PM, <JBeulich@suse.com <javascript:;>> wrote:
> >>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
> >>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
> >>>> 2001
> >>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
> >>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
> >>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
> >>>> issue
> >>>>
> >>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
> >>>> guest with high payload (with 2vCPU, captured from xentrace, in high
> >>>> payload, the count of IPI interrupt increases rapidly between these
> vCPUs).
> >>>>
> >>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
> >>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
> >>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen
> >>>> updates IPI interrupt bit set in VIRR to guest interrupt status
> >>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
> >>>> delivers IPI interrupt within VMX non-root operation without a VM
> >>>> exit. Within VMX non-root operation, if periodic timer interrupt
> >>>> index of bit is set in VIRR and highest, the apicv delivers periodic
> timer
> >interrupt within VMX non-root operation as well.
> >>>>
> >>>> But in current code, if Xen doesn't update periodic timer interrupt
> >>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
> >>>> aware of this case to decrease the count (pending_intr_nr) of
> >>>> pending periodic timer interrupt, then Xen will deliver a periodic
> >>>> timer interrupt again. The guest receives more periodic timer
> >>>> interrupt.
> >>>>
> >>>> If the periodic timer interrut is delivered and not the highest
> >>>> priority, make Xen be aware of this case to decrease the count of
> >>>> pending periodic timer interrupt.
> >>>
> >>>I can see the issue you're trying to address, but for one - doesn't
> >>>this lead to other double accounting, namely once the pt irq becomes
> >>>the highest priority one?
> >>>
> >>
> >> It is does NOT lead to other double accounting..
> >> As if the pt irq becomes the highest priority one, the intack is pt
> one..
> >> Then:
> >>
> >>  +        else
> >>  +            pt_intr_post(v, intack);
> >
> >As just said in reply to Yang: If this is still the same interrupt
> instance as in a
> >prior run through here which took the if() branch, this change looks like
> having
> >the potential of double accounting.
> >
>
> I very appreciate your detail review. It looks like, but actually it
> doesn't happen.
>
>  As the key parameter 'pt->irq_issued'..
>
> In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
> In pt_intr_post(), clear the pt->irq_issued before touching the count
> 'pt->pending_intr_nr'..
>
> According to your assumption, at the second call to pt_intr_post(), As if
> 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check,
> then return, there is no chance to touch the count 'pt->pending_intr_nr'..
> ------
> void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
> {
> ...
>     pt = is_pt_irq(v, intack);
>     if ( pt == NULL )
>     {
>         spin_unlock(&v->arch.hvm_vcpu.tm_lock);
>         return;
>     }
> ...
>
>
> ...
> }
>
>
> static struct periodic_time *is_pt_irq()
> {
>  ....
>     list_for_each_entry ( pt, head, list )
>     {
>         if ( pt->pending_intr_nr && ________pt->irq_issued_______ &&
>              (intack.vector == pt_irq_vector(pt, intack.source)) )
>             return pt;
>     }
>
>     return NULL;
> }
>
>
>
>
>
> __IIUC__, this question is based on the following pseudocode detail the
> behavior of virtual-interrupt delivery is __not__ atomic:
>
> Vector <- RVI;
> VISR[Vector] <- 1;
> SVI <- Vector;
> VPPR<- Vector & F0H;
> VIRR[Vector] <- 0;
> IF any bits set in VIRR
>    Then RVI<-highest index of bit set in VIRR
>    ELSE RVI <-0
> FI
> Deliver interrupt with Vector through IDT
> ...
>
>
> We'd better check this first, as Yang said, this is atomic..


i have said that this is ensured by hardware.


>
> Quan
>

Yang Zhang Aug. 22, 2016, 2:15 p.m. UTC | #13

2016年8月22日星期一，Xuquan (Euler) <xuquan8@huawei.com> 写道：

> On August 22, 2016 8:04 PM, <JBeulich@suse.com <javascript:;>> wrote:
> >>>> On 22.08.16 at 13:41, <xuquan8@huawei.com <javascript:;>> wrote:
> >> On August 22, 2016 6:36 PM, <JBeulich@suse.com <javascript:;>> wrote:
> >>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
> >>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
> >>>> 2001
> >>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
> >>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
> >>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
> >>>> issue
> >>>>
> >>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
> >>>> guest with high payload (with 2vCPU, captured from xentrace, in high
> >>>> payload, the count of IPI interrupt increases rapidly between these
> vCPUs).
> >>>>
> >>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
> >>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
> >>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen
> >>>> updates IPI interrupt bit set in VIRR to guest interrupt status
> >>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
> >>>> delivers IPI interrupt within VMX non-root operation without a VM
> >>>> exit. Within VMX non-root operation, if periodic timer interrupt
> >>>> index of bit is set in VIRR and highest, the apicv delivers periodic
> timer
> >interrupt within VMX non-root operation as well.
> >>>>
> >>>> But in current code, if Xen doesn't update periodic timer interrupt
> >>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
> >>>> aware of this case to decrease the count (pending_intr_nr) of
> >>>> pending periodic timer interrupt, then Xen will deliver a periodic
> >>>> timer interrupt again. The guest receives more periodic timer
> >>>> interrupt.
> >>>>
> >>>> If the periodic timer interrut is delivered and not the highest
> >>>> priority, make Xen be aware of this case to decrease the count of
> >>>> pending periodic timer interrupt.
> >>>
> >>>I can see the issue you're trying to address, but for one - doesn't
> >>>this lead to other double accounting, namely once the pt irq becomes
> >>>the highest priority one?
> >>>
> >>
> >> It is does NOT lead to other double accounting..
> >> As if the pt irq becomes the highest priority one, the intack is pt
> one..
> >> Then:
> >>
> >>  +        else
> >>  +            pt_intr_post(v, intack);
> >
> >As just said in reply to Yang: If this is still the same interrupt
> instance as in a
> >prior run through here which took the if() branch, this change looks like
> having
> >the potential of double accounting.
> >
>
> I very appreciate your detail review. It looks like, but actually it
> doesn't happen.
>
>  As the key parameter 'pt->irq_issued'..
>
> In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
> In pt_intr_post(), clear the pt->irq_issued before touching the count
> 'pt->pending_intr_nr'..
>
> According to your assumption, at the second call to pt_intr_post(), As if
> 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check,
> then return, there is no chance to touch the count 'pt->pending_intr_nr'..
> ------
> void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
> {
> ...
>     pt = is_pt_irq(v, intack);
>     if ( pt == NULL )
>     {
>         spin_unlock(&v->arch.hvm_vcpu.tm_lock);
>         return;
>     }
> ...
>
>
> ...
> }
>
>
> static struct periodic_time *is_pt_irq()
> {
>  ....
>     list_for_each_entry ( pt, head, list )
>     {
>         if ( pt->pending_intr_nr && ________pt->irq_issued_______ &&
>              (intack.vector == pt_irq_vector(pt, intack.source)) )
>             return pt;
>     }
>
>     return NULL;
> }
>
>
>
>
>
> __IIUC__, this question is based on the following pseudocode detail the
> behavior of virtual-interrupt delivery is __not__ atomic:
>
> Vector <- RVI;
> VISR[Vector] <- 1;
> SVI <- Vector;
> VPPR<- Vector & F0H;
> VIRR[Vector] <- 0;
> IF any bits set in VIRR
>    Then RVI<-highest index of bit set in VIRR
>    ELSE RVI <-0
> FI
> Deliver interrupt with Vector through IDT
> ...
>
>
> We'd better check this first, as Yang said, this is atomic..


i have said that this is ensured by hardware.


>
> Quan
>

Jan Beulich Aug. 22, 2016, 2:57 p.m. UTC | #14

>>> On 22.08.16 at 15:09, <yang.zhang.wz@gmail.com> wrote:
> On 2016/8/22 20:04, Jan Beulich wrote:
>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>>>>
>>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>>>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>>>>> count of IPI interrupt increases rapidly between these vCPUs).
>>>>>
>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>>>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>>>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt 
> bit
>>>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>>>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>>>>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>>>>> interrupt index of bit is set in VIRR and highest, the apicv delivers 
> periodic
>>>>> timer interrupt within VMX non-root operation as well.
>>>>>
>>>>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>>>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>>>>> case to decrease the count (pending_intr_nr) of pending periodic timer
>>>>> interrupt,
>>>>> then Xen will deliver a periodic timer interrupt again. The guest receives
>>>>> more
>>>>> periodic timer interrupt.
>>>>>
>>>>> If the periodic timer interrut is delivered and not the highest priority, 
> make
>>>>> Xen be aware of this case to decrease the count of pending periodic timer
>>>>> interrupt.
>>>>
>>>> I can see the issue you're trying to address, but for one - doesn't
>>>> this lead to other double accounting, namely once the pt irq becomes
>>>> the highest priority one?
>>>>
>>>
>>> It is does NOT lead to other double accounting..
>>> As if the pt irq becomes the highest priority one, the intack is pt one..
>>> Then:
>>>
>>>  +        else
>>>  +            pt_intr_post(v, intack);
>>
>> As just said in reply to Yang: If this is still the same interrupt instance
>> as in a prior run through here which took the if() branch, this change
>> looks like having the potential of double accounting.
> 
> This cannot happen: if pt intr is the second priority one, then 
> corresponding vIRR bit will be cleared by hardware while the highest 
> priority intr is delivered to guest. And the next vmx_intr_assit cannot 
> aware of it since the vIRR bit is zero.

In addition to what Quan said in response - aren't you mixing up
vISR and vIRR here? I can see vISR being clear when a higher prio
interrupt gets delivered (unless that happened to interrupt the pt
interrupt handler), but I would hope that virtualized behavior
matches non-virtualized one in that vIRR accumulates requests.

Jan

Jan Beulich Aug. 22, 2016, 3:11 p.m. UTC | #15

>>> On 22.08.16 at 16:02, <xuquan8@huawei.com> wrote:
> On August 22, 2016 8:04 PM, <JBeulich@suse.com> wrote: 
>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
>>>>> 2001
>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>>>>> issue
>>>>>
>>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>>>>> guest with high payload (with 2vCPU, captured from xentrace, in high
>>>>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>>>>
>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
>>>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen
>>>>> updates IPI interrupt bit set in VIRR to guest interrupt status
>>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
>>>>> delivers IPI interrupt within VMX non-root operation without a VM
>>>>> exit. Within VMX non-root operation, if periodic timer interrupt
>>>>> index of bit is set in VIRR and highest, the apicv delivers periodic timer
>>interrupt within VMX non-root operation as well.
>>>>>
>>>>> But in current code, if Xen doesn't update periodic timer interrupt
>>>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>>>>> aware of this case to decrease the count (pending_intr_nr) of
>>>>> pending periodic timer interrupt, then Xen will deliver a periodic
>>>>> timer interrupt again. The guest receives more periodic timer
>>>>> interrupt.
>>>>>
>>>>> If the periodic timer interrut is delivered and not the highest
>>>>> priority, make Xen be aware of this case to decrease the count of
>>>>> pending periodic timer interrupt.
>>>>
>>>>I can see the issue you're trying to address, but for one - doesn't
>>>>this lead to other double accounting, namely once the pt irq becomes
>>>>the highest priority one?
>>>>
>>>
>>> It is does NOT lead to other double accounting..
>>> As if the pt irq becomes the highest priority one, the intack is pt one..
>>> Then:
>>>
>>>  +        else
>>>  +            pt_intr_post(v, intack);
>>
>>As just said in reply to Yang: If this is still the same interrupt instance 
> as in a
>>prior run through here which took the if() branch, this change looks like 
> having
>>the potential of double accounting.
>>
> 
> I very appreciate your detail review. It looks like, but actually it doesn't 
> happen.
> 
>  As the key parameter 'pt->irq_issued'..
> 
> In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
> In pt_intr_post(), clear the pt->irq_issued before touching the count 
> 'pt->pending_intr_nr'..
> 
> According to your assumption, at the second call to pt_intr_post(), As if 
> 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check,
> then return, there is no chance to touch the count 'pt->pending_intr_nr'..

Hmm, wait: That second pass would also get us through
pt_update_irq() a second time, which might cause irq_issued to get
set again.

Granted this code is fragile, therefore please excuse that I'm trying
to be extra careful with accepting changes to it.

Jan

Xuquan (Euler) Aug. 23, 2016, 12:40 a.m. UTC | #16

On August 22, 2016 11:12 PM, <JBeulich@suse.com> wrote:
>>>> On 22.08.16 at 16:02, <xuquan8@huawei.com> wrote:
>> On August 22, 2016 8:04 PM, <JBeulich@suse.com> wrote:
>>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17
>00:00:00
>>>>>> 2001
>>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>>>>>> issue
>>>>>>
>>>>>> When Xen apicv is enabled, wall clock time is faster on
>>>>>> Windows7-32 guest with high payload (with 2vCPU, captured from
>>>>>> xentrace, in high payload, the count of IPI interrupt increases rapidly
>between these vCPUs).
>>>>>>
>>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
>>>>>> the IPI intrrupt is high priority than periodic timer interrupt.
>>>>>> Xen updates IPI interrupt bit set in VIRR to guest interrupt
>>>>>> status
>>>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
>>>>>> delivers IPI interrupt within VMX non-root operation without a VM
>>>>>> exit. Within VMX non-root operation, if periodic timer interrupt
>>>>>> index of bit is set in VIRR and highest, the apicv delivers
>>>>>> periodic timer
>>>interrupt within VMX non-root operation as well.
>>>>>>
>>>>>> But in current code, if Xen doesn't update periodic timer
>>>>>> interrupt bit set in VIRR to guest interrupt status (RVI)
>>>>>> directly, Xen is not aware of this case to decrease the count
>>>>>> (pending_intr_nr) of pending periodic timer interrupt, then Xen
>>>>>> will deliver a periodic timer interrupt again. The guest receives
>>>>>> more periodic timer interrupt.
>>>>>>
>>>>>> If the periodic timer interrut is delivered and not the highest
>>>>>> priority, make Xen be aware of this case to decrease the count of
>>>>>> pending periodic timer interrupt.
>>>>>
>>>>>I can see the issue you're trying to address, but for one - doesn't
>>>>>this lead to other double accounting, namely once the pt irq becomes
>>>>>the highest priority one?
>>>>>
>>>>
>>>> It is does NOT lead to other double accounting..
>>>> As if the pt irq becomes the highest priority one, the intack is pt one..
>>>> Then:
>>>>
>>>>  +        else
>>>>  +            pt_intr_post(v, intack);
>>>
>>>As just said in reply to Yang: If this is still the same interrupt
>>>instance
>> as in a
>>>prior run through here which took the if() branch, this change looks
>>>like
>> having
>>>the potential of double accounting.
>>>
>>
>> I very appreciate your detail review. It looks like, but actually it
>> doesn't happen.
>>
>>  As the key parameter 'pt->irq_issued'..
>>
>> In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
>> In pt_intr_post(), clear the pt->irq_issued before touching the count
>> 'pt->pending_intr_nr'..
>>
>> According to your assumption, at the second call to pt_intr_post(), As
>> if 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, then
>> return, there is no chance to touch the count 'pt->pending_intr_nr'..
>
>Hmm, wait: That second pass would also get us through
>pt_update_irq() a second time, which might cause irq_issued to get set again.
>

 iiuc this case is IRQ combination, issue twice but delivery once..
Another enhancement may be required here,
     IF  hvm_intsrc_lapic && the PT IRQ index bit in VIRR is not clear
        THEN don't issue PT IRQ in pt_update_irq()
     FI

>Granted this code is fragile, therefore please excuse that I'm trying to be extra
>careful with accepting changes to it.

Understood :):)...

Quan

Yang Zhang Aug. 23, 2016, 2:11 a.m. UTC | #17

On 2016/8/22 22:57, Jan Beulich wrote:
>>>> On 22.08.16 at 15:09, <yang.zhang.wz@gmail.com> wrote:
>> On 2016/8/22 20:04, Jan Beulich wrote:
>>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote:
>>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote:
>>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote:
>>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx>
>>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>>>>>
>>>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>>>>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>>>>>> count of IPI interrupt increases rapidly between these vCPUs).
>>>>>>
>>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>>>>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>>>>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt
>> bit
>>>>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>>>>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>>>>>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>>>>>> interrupt index of bit is set in VIRR and highest, the apicv delivers
>> periodic
>>>>>> timer interrupt within VMX non-root operation as well.
>>>>>>
>>>>>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>>>>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>>>>>> case to decrease the count (pending_intr_nr) of pending periodic timer
>>>>>> interrupt,
>>>>>> then Xen will deliver a periodic timer interrupt again. The guest receives
>>>>>> more
>>>>>> periodic timer interrupt.
>>>>>>
>>>>>> If the periodic timer interrut is delivered and not the highest priority,
>> make
>>>>>> Xen be aware of this case to decrease the count of pending periodic timer
>>>>>> interrupt.
>>>>>
>>>>> I can see the issue you're trying to address, but for one - doesn't
>>>>> this lead to other double accounting, namely once the pt irq becomes
>>>>> the highest priority one?
>>>>>
>>>>
>>>> It is does NOT lead to other double accounting..
>>>> As if the pt irq becomes the highest priority one, the intack is pt one..
>>>> Then:
>>>>
>>>>  +        else
>>>>  +            pt_intr_post(v, intack);
>>>
>>> As just said in reply to Yang: If this is still the same interrupt instance
>>> as in a prior run through here which took the if() branch, this change
>>> looks like having the potential of double accounting.
>>
>> This cannot happen: if pt intr is the second priority one, then
>> corresponding vIRR bit will be cleared by hardware while the highest
>> priority intr is delivered to guest. And the next vmx_intr_assit cannot
>> aware of it since the vIRR bit is zero.
>
> In addition to what Quan said in response - aren't you mixing up
> vISR and vIRR here? I can see vISR being clear when a higher prio

Right. It should be ISR not IRR.

> interrupt gets delivered (unless that happened to interrupt the pt
> interrupt handler), but I would hope that virtualized behavior
> matches non-virtualized one in that vIRR accumulates requests.

Tian, Kevin Aug. 30, 2016, 5:58 a.m. UTC | #18

> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> Sent: Friday, August 19, 2016 8:59 PM
> 
> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
> with high payload (with 2vCPU, captured from xentrace, in high payload, the
> count of IPI interrupt increases rapidly between these vCPUs).
> 
> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
> operation without a VM exit. Within VMX non-root operation, if periodic timer
> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
> timer interrupt within VMX non-root operation as well.
> 
> But in current code, if Xen doesn't update periodic timer interrupt bit set
> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt,
> then Xen will deliver a periodic timer interrupt again. The guest receives more
> periodic timer interrupt.
> 
> If the periodic timer interrut is delivered and not the highest priority, make
> Xen be aware of this case to decrease the count of pending periodic timer
> interrupt.
> 
> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
> Signed-off-by: Quan Xu <xuquan8@huawei.com>
> ---
> Why RFC:
> 1. I am not quite sure for other cases, such as nested case.
> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external
>    interrupts, non-maskable interrupt system-management interrrupts, exceptions
>    and VM exit) may occur before delivery of a periodic timer interrupt, the periodic
>    timer interrupt may be lost when a coming periodic timer interrupt is delivered.
>    Actually, and so current code is.
> ---
>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 8fca08c..d3a034e 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> 
> -        pt_intr_post(v, intack);
> +       /*
> +        * If the periodic timer interrut is delivered and not the highest priority,
> +        * make Xen be aware of this case to decrease the count of pending periodic
> +        * timer interrupt.
> +        */
> +        if ( pt_vector != -1 && intack.vector > pt_vector )
> +        {
> +            struct hvm_intack pt_intack;
> +
> +            pt_intack.vector = pt_vector;
> +            pt_intack.source = hvm_intsrc_lapic;
> +            pt_intr_post(v, pt_intack);
> +        }
> +        else
> +            pt_intr_post(v, intack);
>      }

Here is my thought. w/o above patch one pending pt interrupt may 
result in multiple injections of guest timer interrupt, as you identified, 
when pt_vector happens to be not the highest pending vector. Then
w/ your patch, multiple pending pt interrupt instances may be combined
as one injection of guest timer interrupt. Especially intack.vector
>pt_vector doesn't mean pt_intr is eligible for injection, which might
be blocked due to other reasons. As Jan pointed out, this path is very 
fragile, so better we can have a fix to sustain the original behavior
with one pending pt instance causing one actual injection.

What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit
bit for pt_intr is already set to 1 which means every injection would incur
an EOI-induced VM-exit:

       /*
        * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
        * exit, then pending periodic time interrups have the chance to be injected
        * for compensation
        */
        if (pt_vector != -1)
            vmx_set_eoi_exit_bitmap(v, pt_vector);

I don't think delaying post makes much difference here. Even we do
post earlier like your patch, further pending instances have to wait
until current instance is completed. So as long as post happens before
EOI is completed, it should be fine.

Thanks
Kevin

Xuquan (Euler) Sept. 6, 2016, 8:53 a.m. UTC | #19

On August 30, 2016 1:58 PM, Tian, Kevin <kevin.tian@intel.com> wrote:
>> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
>> Sent: Friday, August 19, 2016 8:59 PM
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>> guest with high payload (with 2vCPU, captured from xentrace, in high
>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
>> interrupt within VMX non-root operation without a VM exit. Within VMX
>> non-root operation, if periodic timer interrupt index of bit is set in
>> VIRR and highest, the apicv delivers periodic timer interrupt within VMX
>non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt
>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>> aware of this case to decrease the count (pending_intr_nr) of pending
>> periodic timer interrupt, then Xen will deliver a periodic timer
>> interrupt again. The guest receives more periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest
>> priority, make Xen be aware of this case to decrease the count of
>> pending periodic timer interrupt.
>>
>> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
>> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
>> Signed-off-by: Quan Xu <xuquan8@huawei.com>
>> ---
>> Why RFC:
>> 1. I am not quite sure for other cases, such as nested case.
>> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including
>external
>>    interrupts, non-maskable interrupt system-management interrrupts,
>exceptions
>>    and VM exit) may occur before delivery of a periodic timer interrupt, the
>periodic
>>    timer interrupt may be lost when a coming periodic timer interrupt is
>delivered.
>>    Actually, and so current code is.
>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 8fca08c..d3a034e 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>              __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>          }
>>
>> -        pt_intr_post(v, intack);
>> +       /*
>> +        * If the periodic timer interrut is delivered and not the highest
>priority,
>> +        * make Xen be aware of this case to decrease the count of pending
>periodic
>> +        * timer interrupt.
>> +        */
>> +        if ( pt_vector != -1 && intack.vector > pt_vector )
>> +        {
>> +            struct hvm_intack pt_intack;
>> +
>> +            pt_intack.vector = pt_vector;
>> +            pt_intack.source = hvm_intsrc_lapic;
>> +            pt_intr_post(v, pt_intack);
>> +        }
>> +        else
>> +            pt_intr_post(v, intack);
>>      }
>
>Here is my thought. w/o above patch one pending pt interrupt may result in
>multiple injections of guest timer interrupt, as you identified, when pt_vector
>happens to be not the highest pending vector. Then w/ your patch, multiple
>pending pt interrupt instances may be combined as one injection of guest timer
>interrupt. Especially intack.vector
>>pt_vector doesn't mean pt_intr is eligible for injection, which might
>be blocked due to other reasons. As Jan pointed out, this path is very fragile, so
>better we can have a fix to sustain the original behavior with one pending pt
>instance causing one actual injection.
>

Agreed. Good summary..

>What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for
>pt_intr is already set to 1 which means every injection would incur an
>EOI-induced VM-exit:
>
>       /*
>        * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced
>VM
>        * exit, then pending periodic time interrups have the chance to be
>injected
>        * for compensation
>        */
>        if (pt_vector != -1)
>            vmx_set_eoi_exit_bitmap(v, pt_vector);
>
>I don't think delaying post makes much difference here. Even we do post earlier
>like your patch, further pending instances have to wait until current instance is
>completed. So as long as post happens before EOI is completed, it should be
>fine.

Although I really wanted to fix here, I gave up.. 
We need to try our best to narrow down this 'window' -- from pt_update_irq() to pt_intr_post()..
I think It is too later to call pt_intr_post() in EOI-induced VM-exit, _iiuc_, we cannot make sure that each PT interrupt injection incurs an EOI-induced VM-exit before the next VM entry..
then one pending pt interrupt may result in multiple injections of guest timer interrupt.. 

to be honest, my fix is also a tradeoff.. w/ my patch, multiple pending pt interrupt instances may be combined as one injection of guest timer interrupt..
 but as Yang said, It should be ok since it happens rarely and even native OS will encounter the same problem if it disable the interrupt for too long.



Thanks for your comments!!

Quan
Huawei 2012 lab

Jan Beulich Sept. 6, 2016, 9:04 a.m. UTC | #20

>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote:
> to be honest, my fix is also a tradeoff.. w/ my patch, multiple pending pt 
> interrupt instances may be combined as one injection of guest timer 
> interrupt..
>  but as Yang said, It should be ok since it happens rarely and even native 
> OS will encounter the same problem if it disable the interrupt for too long.

And one OS may properly deal with that situation while another
might not. For example, let me remind you of issues the hypervisor
had in earlier days when no event/softirq processing happened for
long enough, resulting in time management issues due to missed
platform timer overflow handling. IOW an OS may avoid that
situation by simply not disabling IRQs for too long a time.

Jan

Xuquan (Euler) Sept. 6, 2016, 11:22 a.m. UTC | #21

On September 06, 2016 5:05 PM, Jan Beulich <JBeulich@suse.com>
>>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote:
>> to be honest, my fix is also a tradeoff.. w/ my patch, multiple
>> pending pt interrupt instances may be combined as one injection of
>> guest timer interrupt..
>>  but as Yang said, It should be ok since it happens rarely and even
>> native OS will encounter the same problem if it disable the interrupt for too
>long.
>
>And one OS may properly deal with that situation while another might not. For
>example, let me remind you of issues the hypervisor had in earlier days when no
>event/softirq processing happened for long enough, resulting in time
>management issues due to missed platform timer overflow handling. IOW an OS
>may avoid that situation by simply not disabling IRQs for too long a time.
>
Thanks for your reminding,
Jan, do you have any suggestion to fix this PT and apicv issue?
What do you think about Kevin's?

Quan

Jan Beulich Sept. 6, 2016, 1:20 p.m. UTC | #22

>>> On 06.09.16 at 13:22, <xuquan8@huawei.com> wrote:
> On September 06, 2016 5:05 PM, Jan Beulich <JBeulich@suse.com>
>>>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote:
>>> to be honest, my fix is also a tradeoff.. w/ my patch, multiple
>>> pending pt interrupt instances may be combined as one injection of
>>> guest timer interrupt..
>>>  but as Yang said, It should be ok since it happens rarely and even
>>> native OS will encounter the same problem if it disable the interrupt for too
>>long.
>>
>>And one OS may properly deal with that situation while another might not. For
>>example, let me remind you of issues the hypervisor had in earlier days when no
>>event/softirq processing happened for long enough, resulting in time
>>management issues due to missed platform timer overflow handling. IOW an OS
>>may avoid that situation by simply not disabling IRQs for too long a time.
>>
> Thanks for your reminding,
> Jan, do you have any suggestion to fix this PT and apicv issue?
> What do you think about Kevin's?

Kevin's suggestion sounded reasonable, iirc, and I'd rely on his
and your better VMX knowledge for the low level details.

Jan

Xuquan (Euler) Sept. 7, 2016, 12:47 a.m. UTC | #23

On September 06, 2016 9:21 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 06.09.16 at 13:22, <xuquan8@huawei.com> wrote:
>> On September 06, 2016 5:05 PM, Jan Beulich <JBeulich@suse.com>
>>>>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote:
>>>> to be honest, my fix is also a tradeoff.. w/ my patch, multiple
>>>> pending pt interrupt instances may be combined as one injection of
>>>> guest timer interrupt..
>>>>  but as Yang said, It should be ok since it happens rarely and even
>>>> native OS will encounter the same problem if it disable the
>>>> interrupt for too
>>>long.
>>>
>>>And one OS may properly deal with that situation while another might
>>>not. For example, let me remind you of issues the hypervisor had in
>>>earlier days when no event/softirq processing happened for long
>>>enough, resulting in time management issues due to missed platform
>>>timer overflow handling. IOW an OS may avoid that situation by simply not
>disabling IRQs for too long a time.
>>>
>> Thanks for your reminding,
>> Jan, do you have any suggestion to fix this PT and apicv issue?
>> What do you think about Kevin's?
>
>Kevin's suggestion sounded reasonable, iirc, and I'd rely on his and your better
>VMX knowledge for the low level details.
>

I'll try to verify Kevin's later.
Kevin is really a master but I am just a rookie. I always appreciate both of your education..

Quan

[RFC] x86/apicv: fix RTC periodic timer and apicv issue

Commit Message

Comments

Patch