Message ID | E0A769A898ADB6449596C41F51EF62C6AE5466@SZXEMI506-MBX.china.huawei.com (mailing list archive)
---|---
State | New, archived
On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>> Assumed vCPU is in guest_mode.. >>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>> then >>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>> (also no >>> vcpu_kick() ).. >>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver >>> posted interrupt. if posted interrupt is not delivered, the posted >>> interrupt is pending until next VM entry -- by PIR to vIRR.. >>> >>> one condition is : >>> In __vmx_deliver_posted_interrupt(), ' if ( >>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>> >>> Specifically, we did verify it by RES interrupt, which is used for >>> smp_reschedule_interrupt.. >>> We even cost more time to deliver RES interrupt than no-apicv in >>average.. >>> >>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still >>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) >>> by posted way, The next-coming RES interrupt (no. 2) is not delivered, >>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. >>> 1).. >>> >>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>> entry -- by PIR to vIRR.. >>> >>> >>> We can fix it as below(I don't think this is a best one, it is better >>> to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>> @@ -1846,7 +1846,7 @@ static void >>__vmx_deliver_posted_interrupt(struct vcpu *v) >>> { >>> unsigned int cpu = v->processor; >>> >>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>&softirq_pending(cpu)) >>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>> && (cpu != smp_processor_id()) ) >>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>> } >> >>While I don't think I fully understand your description, > >Sorry!! > >>the line you change >>here has always been puzzling me: If we were to raise a softirq here, we >>ought to call cpu_raise_softirq() instead of partly open coding what it does. >>So I think not marking that softirq pending (but doing this incompletely) is >>a valid change in any case. > >As comments in pi_notification_interrupt() -- xen/arch/x86/hvm/vmx/vmx.c >(((( > * > * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like > * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will > * be synced to vIRR before VM-Exit in time. > * >)))) > >I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. >That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it.. > I think there is a typo. It should be "before VM-Entry in time". Setting the VCPU_KICK_SOFTIRQ bit only serves to jump back to vmx_do_vmentry instead of entering the guest directly; jumping to vmx_do_vmentry again re-syncs the PIR to vIRR in vmx_intr_assist(). In root mode, the CPU treats the PI notification interrupt as a normal interrupt, so it runs the interrupt handler pi_notification_interrupt() instead of syncing PIR to vIRR automatically. Receiving a PI notification interrupt means some interrupts have been posted in PIR; setting that bit ensures these newly arrived interrupts are delivered before this VM entry.
Thanks chao > >>But I'll have to defer to Kevin in the hopes that he fully understands what >>you explain above as well as him knowing why this was a test-and-set here >>in the first place. >> > >To me, this test-and-set is a bug. > >Quan >
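For reference, a minimal sketch of the handler referred to above -- assuming the quoted comment comes from pi_notification_interrupt() in xen/arch/x86/hvm/vmx/vmx.c. The body is a paraphrase of that handler for illustration, not a verbatim copy of the tree:

    /*
     * Sketch/paraphrase only: when the notification vector arrives while the
     * CPU is already in root mode, it is taken as an ordinary host interrupt,
     * so the hardware does not sync PIR into vIRR.  Raising VCPU_KICK_SOFTIRQ
     * makes the VM-entry path loop back through vmx_intr_assist(), which
     * performs that sync before the guest is resumed.
     */
    static void pi_notification_interrupt(struct cpu_user_regs *regs)
    {
        ack_APIC_irq();
        this_cpu(irq_count)++;
        raise_softirq(VCPU_KICK_SOFTIRQ);
    }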
>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > Assumed vCPU is in guest_mode.. > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > vcpu_kick() ).. > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > interrupt. if posted interrupt is not delivered, the posted interrupt is > pending until next VM entry -- by PIR to vIRR.. > > one condition is : > In __vmx_deliver_posted_interrupt(), ' if ( > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > Specifically, we did verify it by RES interrupt, which is used for > smp_reschedule_interrupt.. > We even cost more time to deliver RES interrupt than no-apicv in average.. > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > posted way, > The next-coming RES interrupt (no. 2) is not delivered, as we set the > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > PIR to vIRR.. > > > We can fix it as below(I don't think this is a best one, it is better to set > the VCPU_KICK_SOFTIRQ bit, but not test it): > > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > { > unsigned int cpu = v->processor; > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > && (cpu != smp_processor_id()) ) > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > } While I don't think I fully understand your description, the line you change here has always been puzzling me: If we were to raise a softirq here, we ought to call cpu_raise_softirq() instead of partly open coding what it does. So I think not marking that softirq pending (but doing this incompletely) is a valid change in any case. But I'll have to defer to Kevin in the hopes that he fully understands what you explain above as well as him knowing why this was a test-and-set here in the first place. Jan
On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> Assumed vCPU is in guest_mode.. >> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> then >> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >> (also no >> vcpu_kick() ).. >> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver >> posted interrupt. if posted interrupt is not delivered, the posted >> interrupt is pending until next VM entry -- by PIR to vIRR.. >> >> one condition is : >> In __vmx_deliver_posted_interrupt(), ' if ( >> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> >> Specifically, we did verify it by RES interrupt, which is used for >> smp_reschedule_interrupt.. >> We even cost more time to deliver RES interrupt than no-apicv in >average.. >> >> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still >> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) >> by posted way, The next-coming RES interrupt (no. 2) is not delivered, >> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. >> 1).. >> >> Then the next-coming RES interrupt (no. 2) is pending until next VM >> entry -- by PIR to vIRR.. >> >> >> We can fix it as below(I don't think this is a best one, it is better >> to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> { >> unsigned int cpu = v->processor; >> >> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> && (cpu != smp_processor_id()) ) >> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> } > >While I don't think I fully understand your description, Sorry!! >the line you change >here has always been puzzling me: If we were to raise a softirq here, we >ought to call cpu_raise_softirq() instead of partly open coding what it does. >So I think not marking that softirq pending (but doing this incompletely) is >a valid change in any case. As comments in pi_notification_interrupt() -- xen/arch/x86/hvm/vmx/vmx.c (((( * * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will * be synced to vIRR before VM-Exit in time. * )))) I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it.. >But I'll have to defer to Kevin in the hopes that he fully understands what >you explain above as well as him knowing why this was a test-and-set here >in the first place. > To me, this test-and-set is a bug. Quan
On Thu, Feb 09, 2017 at 08:51:46AM +0000, Xuquan (Quan Xu) wrote: >On February 08, 2017 4:22 PM, Chao Gao wrote: >>On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >>>On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>>>> Assumed vCPU is in guest_mode.. >>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>>>> then >>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>>>> (also no >>>>> vcpu_kick() ).. >>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to >>>>> deliver posted interrupt. if posted interrupt is not delivered, the >>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR.. >>>>> >>>>> one condition is : >>>>> In __vmx_deliver_posted_interrupt(), ' if ( >>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>>>> >>>>> Specifically, we did verify it by RES interrupt, which is used for >>>>> smp_reschedule_interrupt.. >>>>> We even cost more time to deliver RES interrupt than no-apicv in >>>>average.. >>>>> >>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is >>>>> still guest_mode).. when tries to deliver next-coming RES interrupt >>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >>interrupt (no. >>>>> 1).. >>>>> >>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>>>> entry -- by PIR to vIRR.. >>>>> >>>>> >>>>> We can fix it as below(I don't think this is a best one, it is >>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>>>> >>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>>>> @@ -1846,7 +1846,7 @@ static void >>>>__vmx_deliver_posted_interrupt(struct vcpu *v) >>>>> { >>>>> unsigned int cpu = v->processor; >>>>> >>>>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>>>&softirq_pending(cpu)) >>>>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>>>> && (cpu != smp_processor_id()) ) >>>>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>>>> } >>>> >>>>While I don't think I fully understand your description, >>> >>>Sorry!! >>> >>>>the line you change >>>>here has always been puzzling me: If we were to raise a softirq here, >>>>we ought to call cpu_raise_softirq() instead of partly open coding what it >>does. >>>>So I think not marking that softirq pending (but doing this >>>>incompletely) is a valid change in any case. >>> >>>As comments in pi_notification_interrupt() -- >>>xen/arch/x86/hvm/vmx/vmx.c (((( >>> * >>> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like >>> * __vmx_deliver_posted_interrupt(). So the pending interrupt in >>PIRR will >>> * be synced to vIRR before VM-Exit in time. >>> * >>>)))) >>> >>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will >>be synced to vIRR before VM-Exit in time. >>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but >>not test it.. >>> >> >>I think there is a typo. It should be "before VM-Entry in time". It set >>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of >>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the >>PIR to vIRR in vmx_intr_assist(). > >impressive analysis.. >chao, could you show the related code? > In xen/arch/x86/vmx/entry.S, .Lvmx_do_vmentry: call vmx_intr_assist ... 
        cli
        cmp  %ecx,(%rdx,%rax,1)
        jnz  .Lvmx_process_softirqs

and

.Lvmx_process_softirqs:
        sti
        call do_softirq
        jmp  .Lvmx_do_vmentry

In vmx_intr_assist(), PIR is synced to vIRR. After vmx_intr_assist(), if some interrupts are posted in PIR, VCPU_KICK_SOFTIRQ is set in pi_notification_interrupt(), so execution jumps to .Lvmx_process_softirqs and then back to .Lvmx_do_vmentry again.

Thanks
Chao

>Quan > > >>In root-mode, cpu treat the pi notification >>interrupt as normal interrupt, so cpu will run the interrupt handler >>pi_notification_interrupt() instead of syncing PIR to vIRR automatically. >>Receiving a pi notificatio interrupt means some interrupts have been >>posted in PIR. Setting that bit is to deliver these new arrival interrupts >>before this VM-entry. >> >
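The same flow restated as C-like pseudocode (a paraphrase of the assembly above for readability; the loop structure and helper names are illustrative assumptions, not the actual entry.S code):

    for ( ;; )
    {
        vmx_intr_assist();                 /* syncs PIR into vIRR and injects */

        local_irq_disable();               /* "cli" */
        if ( !softirq_pending(smp_processor_id()) )
            break;                         /* nothing pending: VMLAUNCH/VMRESUME */

        /*
         * .Lvmx_process_softirqs: pi_notification_interrupt() raised
         * VCPU_KICK_SOFTIRQ after the last sync, so process softirqs and go
         * around again to pick up the newly posted PIR bits.
         */
        local_irq_enable();                /* "sti" */
        do_softirq();
    }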
On February 08, 2017 4:22 PM, Chao Gao wrote: >On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >>On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>>> Assumed vCPU is in guest_mode.. >>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>>> then >>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>>> (also no >>>> vcpu_kick() ).. >>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to >>>> deliver posted interrupt. if posted interrupt is not delivered, the >>>> posted interrupt is pending until next VM entry -- by PIR to vIRR.. >>>> >>>> one condition is : >>>> In __vmx_deliver_posted_interrupt(), ' if ( >>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>>> >>>> Specifically, we did verify it by RES interrupt, which is used for >>>> smp_reschedule_interrupt.. >>>> We even cost more time to deliver RES interrupt than no-apicv in >>>average.. >>>> >>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is >>>> still guest_mode).. when tries to deliver next-coming RES interrupt >>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >interrupt (no. >>>> 1).. >>>> >>>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>>> entry -- by PIR to vIRR.. >>>> >>>> >>>> We can fix it as below(I don't think this is a best one, it is >>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>>> >>>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>>> @@ -1846,7 +1846,7 @@ static void >>>__vmx_deliver_posted_interrupt(struct vcpu *v) >>>> { >>>> unsigned int cpu = v->processor; >>>> >>>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>>&softirq_pending(cpu)) >>>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>>> && (cpu != smp_processor_id()) ) >>>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>>> } >>> >>>While I don't think I fully understand your description, >> >>Sorry!! >> >>>the line you change >>>here has always been puzzling me: If we were to raise a softirq here, >>>we ought to call cpu_raise_softirq() instead of partly open coding what it >does. >>>So I think not marking that softirq pending (but doing this >>>incompletely) is a valid change in any case. >> >>As comments in pi_notification_interrupt() -- >>xen/arch/x86/hvm/vmx/vmx.c (((( >> * >> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like >> * __vmx_deliver_posted_interrupt(). So the pending interrupt in >PIRR will >> * be synced to vIRR before VM-Exit in time. >> * >>)))) >> >>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will >be synced to vIRR before VM-Exit in time. >>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but >not test it.. >> > >I think there is a typo. It should be "before VM-Entry in time". It set >VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of >entering guest directly. Jumping to vmx_do_vmentry again can re-sync the >PIR to vIRR in vmx_intr_assist(). impressive analysis.. chao, could you show the related code? Quan >In root-mode, cpu treat the pi notification >interrupt as normal interrupt, so cpu will run the interrupt handler >pi_notification_interrupt() instead of syncing PIR to vIRR automatically. >Receiving a pi notificatio interrupt means some interrupts have been >posted in PIR. 
Setting that bit is to deliver these new arrival interrupts >before this VM-entry. >
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Wednesday, February 08, 2017 4:52 PM > > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > > Assumed vCPU is in guest_mode.. > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > > vcpu_kick() ).. > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > > interrupt. if posted interrupt is not delivered, the posted interrupt is > > pending until next VM entry -- by PIR to vIRR.. > > > > one condition is : > > In __vmx_deliver_posted_interrupt(), ' if ( > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > > > Specifically, we did verify it by RES interrupt, which is used for > > smp_reschedule_interrupt.. > > We even cost more time to deliver RES interrupt than no-apicv in average.. > > > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > > posted way, > > The next-coming RES interrupt (no. 2) is not delivered, as we set the > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > > PIR to vIRR.. > > > > > > We can fix it as below(I don't think this is a best one, it is better to set > > the VCPU_KICK_SOFTIRQ bit, but not test it): > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > { > > unsigned int cpu = v->processor; > > > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > && (cpu != smp_processor_id()) ) > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > > } > > While I don't think I fully understand your description, the line you > change here has always been puzzling me: If we were to raise a > softirq here, we ought to call cpu_raise_softirq() instead of partly > open coding what it does. We require posted_intr_vector for target CPU to ack/deliver virtual interrupt in non-root mode. cpu_raise_softirq uses a different vector, which cannot trigger such effect. > So I think not marking that softirq > pending (but doing this incompletely) is a valid change in any case. > But I'll have to defer to Kevin in the hopes that he fully > understands what you explain above as well as him knowing why > this was a test-and-set here in the first place. > I agree we have a misuse of softirq mechanism here. If guest is already in non-root mode, the 1st posted interrupt will be directly delivered to guest (leaving softirq being set w/o actually incurring a VM-exit - breaking desired softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR and not noted until another VM-exit happens. Looks Quan observes such delay of delivery in his experiments. I'm OK to remove the set here. Actually since it's an optimization for less IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to one bit only. Thanks Kevin
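A minimal sketch of the variant suggested here -- checking softirq_pending(cpu) as a whole instead of the single VCPU_KICK_SOFTIRQ bit. This only illustrates the suggestion; it is not the patch that was eventually posted:

    static void __vmx_deliver_posted_interrupt(struct vcpu *v)
    {
        unsigned int cpu = v->processor;

        /*
         * Skip the notification IPI only if some softirq is already pending
         * on the target CPU: in that case the target will run through the
         * VM-entry path (and vmx_intr_assist()) before resuming the guest,
         * which syncs PIR into vIRR anyway.
         */
        if ( !softirq_pending(cpu) && (cpu != smp_processor_id()) )
            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
    }

Note that it still sends posted_intr_vector rather than going through cpu_raise_softirq(): as explained above, the posted-interrupt vector is what lets a CPU already in non-root mode deliver the interrupt directly without a VM-exit.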
> From: Tian, Kevin > Sent: Monday, February 13, 2017 4:21 PM > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: Wednesday, February 08, 2017 4:52 PM > > > > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > > > Assumed vCPU is in guest_mode.. > > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > > > vcpu_kick() ).. > > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > > > interrupt. if posted interrupt is not delivered, the posted interrupt is > > > pending until next VM entry -- by PIR to vIRR.. > > > > > > one condition is : > > > In __vmx_deliver_posted_interrupt(), ' if ( > > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > > > > > Specifically, we did verify it by RES interrupt, which is used for > > > smp_reschedule_interrupt.. > > > We even cost more time to deliver RES interrupt than no-apicv in average.. > > > > > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > > > posted way, > > > The next-coming RES interrupt (no. 2) is not delivered, as we set the > > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > > > > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > > > PIR to vIRR.. > > > > > > > > > We can fix it as below(I don't think this is a best one, it is better to set > > > the VCPU_KICK_SOFTIRQ bit, but not test it): > > > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > > { > > > unsigned int cpu = v->processor; > > > > > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > > && (cpu != smp_processor_id()) ) > > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > > > } > > > > While I don't think I fully understand your description, the line you > > change here has always been puzzling me: If we were to raise a > > softirq here, we ought to call cpu_raise_softirq() instead of partly > > open coding what it does. > > We require posted_intr_vector for target CPU to ack/deliver virtual > interrupt in non-root mode. cpu_raise_softirq uses a different vector, > which cannot trigger such effect. > > > So I think not marking that softirq > > pending (but doing this incompletely) is a valid change in any case. > > But I'll have to defer to Kevin in the hopes that he fully > > understands what you explain above as well as him knowing why > > this was a test-and-set here in the first place. > > > > I agree we have a misuse of softirq mechanism here. If guest > is already in non-root mode, the 1st posted interrupt will be directly > delivered to guest (leaving softirq being set w/o actually incurring a > VM-exit - breaking desired softirq behavior). Then further posted > interrupts will skip the IPI, stay in PIR and not noted until another > VM-exit happens. Looks Quan observes such delay of delivery in > his experiments. > > I'm OK to remove the set here. Actually since it's an optimization > for less IPIs, we'd better check softirq_pending(cpu) directly > instead of sticking to one bit only. > sent too fast... Quan, can you work out a patch following this suggestion and see whether your slow-delivery issue is solved? 
(hope I understand your issue correctly here). Thanks Kevin
On February 13, 2017 4:24 PM, Tian, Kevin wrote: >> From: Tian, Kevin >> Sent: Monday, February 13, 2017 4:21 PM >> >> > From: Jan Beulich [mailto:JBeulich@suse.com] >> > Sent: Wednesday, February 08, 2017 4:52 PM >> > >> > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> > > Assumed vCPU is in guest_mode.. >> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> > > then >> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no >> > > vmexit (also no >> > > vcpu_kick() ).. >> > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to >> > > deliver posted interrupt. if posted interrupt is not delivered, >> > > the posted interrupt is pending until next VM entry -- by PIR to vIRR.. >> > > >> > > one condition is : >> > > In __vmx_deliver_posted_interrupt(), ' if ( >> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> > > >> > > Specifically, we did verify it by RES interrupt, which is used for >> > > smp_reschedule_interrupt.. >> > > We even cost more time to deliver RES interrupt than no-apicv in >average.. >> > > >> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is >> > > still guest_mode).. when tries to deliver next-coming RES >> > > interrupt (no. 2) by posted way, The next-coming RES interrupt >> > > (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when >> > > we deliver RES interrupt (no. 1).. >> > > >> > > Then the next-coming RES interrupt (no. 2) is pending until next >> > > VM entry -- by PIR to vIRR.. >> > > >> > > >> > > We can fix it as below(I don't think this is a best one, it is >> > > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> > > >> > > --- a/xen/arch/x86/hvm/vmx/vmx.c >> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c >> > > @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> > > { >> > > unsigned int cpu = v->processor; >> > > >> > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> > > && (cpu != smp_processor_id()) ) >> > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> > > } >> > >> > While I don't think I fully understand your description, the line >> > you change here has always been puzzling me: If we were to raise a >> > softirq here, we ought to call cpu_raise_softirq() instead of partly >> > open coding what it does. >> >> We require posted_intr_vector for target CPU to ack/deliver virtual >> interrupt in non-root mode. cpu_raise_softirq uses a different vector, >> which cannot trigger such effect. >> >> > So I think not marking that softirq >> > pending (but doing this incompletely) is a valid change in any case. >> > But I'll have to defer to Kevin in the hopes that he fully >> > understands what you explain above as well as him knowing why this >> > was a test-and-set here in the first place. >> > >> >> I agree we have a misuse of softirq mechanism here. If guest is >> already in non-root mode, the 1st posted interrupt will be directly >> delivered to guest (leaving softirq being set w/o actually incurring a >> VM-exit - breaking desired softirq behavior). Then further posted >> interrupts will skip the IPI, stay in PIR and not noted until another >> VM-exit happens. Looks Quan observes such delay of delivery in his >> experiments. >> >> I'm OK to remove the set here. Actually since it's an optimization for >> less IPIs, we'd better check softirq_pending(cpu) directly instead of >> sticking to one bit only. 
>> > >sent too fast... Quan, can you work out a patch following this suggestion >and see whether your slow-delivery issue is solved? >(hope I understand your issue correctly here). > Cool, Very Correct!! Sure, I will send out a patch in coming days.. Quan
On February 13, 2017 4:21 PM, Tian, Kevin wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Wednesday, February 08, 2017 4:52 PM >> >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> > Assumed vCPU is in guest_mode.. >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> > then >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >> > (also no >> > vcpu_kick() ).. >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to >> > deliver posted interrupt. if posted interrupt is not delivered, the >> > posted interrupt is pending until next VM entry -- by PIR to vIRR.. >> > >> > one condition is : >> > In __vmx_deliver_posted_interrupt(), ' if ( >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> > >> > Specifically, we did verify it by RES interrupt, which is used for >> > smp_reschedule_interrupt.. >> > We even cost more time to deliver RES interrupt than no-apicv in >average.. >> > >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is >> > still guest_mode).. when tries to deliver next-coming RES interrupt >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >> > interrupt (no. 1).. >> > >> > Then the next-coming RES interrupt (no. 2) is pending until next VM >> > entry -- by PIR to vIRR.. >> > >> > >> > We can fix it as below(I don't think this is a best one, it is >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c >> > @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> > { >> > unsigned int cpu = v->processor; >> > >> > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> > && (cpu != smp_processor_id()) ) >> > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> > } >> >> While I don't think I fully understand your description, the line you >> change here has always been puzzling me: If we were to raise a softirq >> here, we ought to call cpu_raise_softirq() instead of partly open >> coding what it does. > >We require posted_intr_vector for target CPU to ack/deliver virtual >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which >cannot trigger such effect. > Kevin, I can't follow this 'to ack'.. As I understand, the posted_intr_vector is to call event_check_interrupt() [ or pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC -- this dismisses the interrupt with the posted interrupt notification vector from the local APIC. What does this ack refer to? >> So I think not marking that softirq >> pending (but doing this incompletely) is a valid change in any case. >> But I'll have to defer to Kevin in the hopes that he fully understands >> what you explain above as well as him knowing why this was a >> test-and-set here in the first place. >> > >I agree we have a misuse of softirq mechanism here. If guest is already in >non-root mode, the 1st posted interrupt will be directly delivered to guest >(leaving softirq being set w/o actually incurring a VM-exit - breaking desired >softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR >and not noted until another VM-exit happens. Looks Quan observes such >delay of delivery in his experiments. > >I'm OK to remove the set here. 
Actually since it's an optimization for less >IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to >one bit only. > >Thanks >Kevin > > >
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, February 20, 2017 8:04 PM > > On February 13, 2017 4:21 PM, Tian, Kevin wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: Wednesday, February 08, 2017 4:52 PM > >> > >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > >> > Assumed vCPU is in guest_mode.. > >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), > >> > then > >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit > >> > (also no > >> > vcpu_kick() ).. > >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to > >> > deliver posted interrupt. if posted interrupt is not delivered, the > >> > posted interrupt is pending until next VM entry -- by PIR to vIRR.. > >> > > >> > one condition is : > >> > In __vmx_deliver_posted_interrupt(), ' if ( > >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > >> > > >> > Specifically, we did verify it by RES interrupt, which is used for > >> > smp_reschedule_interrupt.. > >> > We even cost more time to deliver RES interrupt than no-apicv in > >average.. > >> > > >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is > >> > still guest_mode).. when tries to deliver next-coming RES interrupt > >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not > >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES > >> > interrupt (no. 1).. > >> > > >> > Then the next-coming RES interrupt (no. 2) is pending until next VM > >> > entry -- by PIR to vIRR.. > >> > > >> > > >> > We can fix it as below(I don't think this is a best one, it is > >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): > >> > > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c > >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c > >> > @@ -1846,7 +1846,7 @@ static void > >__vmx_deliver_posted_interrupt(struct vcpu *v) > >> > { > >> > unsigned int cpu = v->processor; > >> > > >> > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, > >&softirq_pending(cpu)) > >> > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > >> > && (cpu != smp_processor_id()) ) > >> > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > >> > } > >> > >> While I don't think I fully understand your description, the line you > >> change here has always been puzzling me: If we were to raise a softirq > >> here, we ought to call cpu_raise_softirq() instead of partly open > >> coding what it does. > > > >We require posted_intr_vector for target CPU to ack/deliver virtual > >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which > >cannot trigger such effect. > > > > > Kevin, > > I can't follow this 'to ack'.. > As I understand, the posted_intr_vector is to call event_check_interrupt() [ or > pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC -- > this dismisses the interrupt with the posted interrupt notification vector from the local > APIC. > > What does this ack refer to? > Please look at SDM. 'ack' means evaluation of pending vIRRs when CPU is in non-root mode which results in direct virtual interrupt delivery w/o incurring VM-exit. Thanks Kevin
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
     unsigned int cpu = v->processor;
 
-    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
+    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
          && (cpu != smp_processor_id()) )
         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
 }