Message ID | E0A769A898ADB6449596C41F51EF62C6AE5466@SZXEMI506-MBX.china.huawei.com (mailing list archive)
---|---
State | New, archived
On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>> Assumed vCPU is in guest_mode.. >>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>> then >>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>> (also no >>> vcpu_kick() ).. >>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver >>> posted interrupt. if posted interrupt is not delivered, the posted >>> interrupt is pending until next VM entry -- by PIR to vIRR.. >>> >>> one condition is : >>> In __vmx_deliver_posted_interrupt(), ' if ( >>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>> >>> Specifically, we did verify it by RES interrupt, which is used for >>> smp_reschedule_interrupt.. >>> We even cost more time to deliver RES interrupt than no-apicv in >>average.. >>> >>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still >>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) >>> by posted way, The next-coming RES interrupt (no. 2) is not delivered, >>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. >>> 1).. >>> >>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>> entry -- by PIR to vIRR.. >>> >>> >>> We can fix it as below(I don't think this is a best one, it is better >>> to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>> @@ -1846,7 +1846,7 @@ static void >>__vmx_deliver_posted_interrupt(struct vcpu *v) >>> { >>> unsigned int cpu = v->processor; >>> >>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>&softirq_pending(cpu)) >>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>> && (cpu != smp_processor_id()) ) >>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>> } >> >>While I don't think I fully understand your description, > >Sorry!! > >>the line you change >>here has always been puzzling me: If we were to raise a softirq here, we >>ought to call cpu_raise_softirq() instead of partly open coding what it does. >>So I think not marking that softirq pending (but doing this incompletely) is >>a valid change in any case. > >As comments in pi_notification_interrupt() -- xen/arch/x86/hvm/vmx/vmx.c >(((( > * > * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like > * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will > * be synced to vIRR before VM-Exit in time. > * >)))) > >I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. >That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it.. > I think there is a typo. It should be "before VM-Entry in time". Setting the VCPU_KICK_SOFTIRQ bit only serves to jump back to vmx_do_vmentry instead of entering the guest directly; jumping to vmx_do_vmentry again re-syncs the PIR to vIRR in vmx_intr_assist(). In root mode, the CPU treats the PI notification interrupt as a normal interrupt, so it runs the interrupt handler pi_notification_interrupt() instead of syncing PIR to vIRR automatically. Receiving a PI notification interrupt means some interrupts have been posted in PIR; setting that bit ensures these newly arrived interrupts are delivered before this VM entry.
Thanks chao > >>But I'll have to defer to Kevin in the hopes that he fully understands what >>you explain above as well as him knowing why this was a test-and-set here >>in the first place. >> > >To me, this test-and-set is a bug. > >Quan >
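For reference, a minimal sketch of the handler referred to above -- assuming the quoted comment comes from pi_notification_interrupt() in xen/arch/x86/hvm/vmx/vmx.c. The body is a paraphrase of that handler for illustration, not a verbatim copy of the tree:

    /*
     * Sketch/paraphrase only: when the notification vector arrives while the
     * CPU is already in root mode, it is taken as an ordinary host interrupt,
     * so the hardware does not sync PIR into vIRR.  Raising VCPU_KICK_SOFTIRQ
     * makes the VM-entry path loop back through vmx_intr_assist(), which
     * performs that sync before the guest is resumed.
     */
    static void pi_notification_interrupt(struct cpu_user_regs *regs)
    {
        ack_APIC_irq();
        this_cpu(irq_count)++;
        raise_softirq(VCPU_KICK_SOFTIRQ);
    }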
>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > Assumed vCPU is in guest_mode.. > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > vcpu_kick() ).. > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > interrupt. if posted interrupt is not delivered, the posted interrupt is > pending until next VM entry -- by PIR to vIRR.. > > one condition is : > In __vmx_deliver_posted_interrupt(), ' if ( > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > Specifically, we did verify it by RES interrupt, which is used for > smp_reschedule_interrupt.. > We even cost more time to deliver RES interrupt than no-apicv in average.. > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > posted way, > The next-coming RES interrupt (no. 2) is not delivered, as we set the > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > PIR to vIRR.. > > > We can fix it as below(I don't think this is a best one, it is better to set > the VCPU_KICK_SOFTIRQ bit, but not test it): > > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > { > unsigned int cpu = v->processor; > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > && (cpu != smp_processor_id()) ) > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > } While I don't think I fully understand your description, the line you change here has always been puzzling me: If we were to raise a softirq here, we ought to call cpu_raise_softirq() instead of partly open coding what it does. So I think not marking that softirq pending (but doing this incompletely) is a valid change in any case. But I'll have to defer to Kevin in the hopes that he fully understands what you explain above as well as him knowing why this was a test-and-set here in the first place. Jan
On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> Assumed vCPU is in guest_mode.. >> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> then >> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >> (also no >> vcpu_kick() ).. >> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver >> posted interrupt. if posted interrupt is not delivered, the posted >> interrupt is pending until next VM entry -- by PIR to vIRR.. >> >> one condition is : >> In __vmx_deliver_posted_interrupt(), ' if ( >> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> >> Specifically, we did verify it by RES interrupt, which is used for >> smp_reschedule_interrupt.. >> We even cost more time to deliver RES interrupt than no-apicv in >average.. >> >> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still >> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) >> by posted way, The next-coming RES interrupt (no. 2) is not delivered, >> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. >> 1).. >> >> Then the next-coming RES interrupt (no. 2) is pending until next VM >> entry -- by PIR to vIRR.. >> >> >> We can fix it as below(I don't think this is a best one, it is better >> to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> { >> unsigned int cpu = v->processor; >> >> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> && (cpu != smp_processor_id()) ) >> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> } > >While I don't think I fully understand your description, Sorry!! >the line you change >here has always been puzzling me: If we were to raise a softirq here, we >ought to call cpu_raise_softirq() instead of partly open coding what it does. >So I think not marking that softirq pending (but doing this incompletely) is >a valid change in any case. As comments in pi_notification_interrupt() -- xen/arch/x86/hvm/vmx/vmx.c (((( * * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will * be synced to vIRR before VM-Exit in time. * )))) I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it.. >But I'll have to defer to Kevin in the hopes that he fully understands what >you explain above as well as him knowing why this was a test-and-set here >in the first place. > To me, this test-and-set is a bug. Quan
On Thu, Feb 09, 2017 at 08:51:46AM +0000, Xuquan (Quan Xu) wrote: >On February 08, 2017 4:22 PM, Chao Gao wrote: >>On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >>>On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>>>> Assumed vCPU is in guest_mode.. >>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>>>> then >>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>>>> (also no >>>>> vcpu_kick() ).. >>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to >>>>> deliver posted interrupt. if posted interrupt is not delivered, the >>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR.. >>>>> >>>>> one condition is : >>>>> In __vmx_deliver_posted_interrupt(), ' if ( >>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>>>> >>>>> Specifically, we did verify it by RES interrupt, which is used for >>>>> smp_reschedule_interrupt.. >>>>> We even cost more time to deliver RES interrupt than no-apicv in >>>>average.. >>>>> >>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is >>>>> still guest_mode).. when tries to deliver next-coming RES interrupt >>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >>interrupt (no. >>>>> 1).. >>>>> >>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>>>> entry -- by PIR to vIRR.. >>>>> >>>>> >>>>> We can fix it as below(I don't think this is a best one, it is >>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>>>> >>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>>>> @@ -1846,7 +1846,7 @@ static void >>>>__vmx_deliver_posted_interrupt(struct vcpu *v) >>>>> { >>>>> unsigned int cpu = v->processor; >>>>> >>>>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>>>&softirq_pending(cpu)) >>>>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>>>> && (cpu != smp_processor_id()) ) >>>>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>>>> } >>>> >>>>While I don't think I fully understand your description, >>> >>>Sorry!! >>> >>>>the line you change >>>>here has always been puzzling me: If we were to raise a softirq here, >>>>we ought to call cpu_raise_softirq() instead of partly open coding what it >>does. >>>>So I think not marking that softirq pending (but doing this >>>>incompletely) is a valid change in any case. >>> >>>As comments in pi_notification_interrupt() -- >>>xen/arch/x86/hvm/vmx/vmx.c (((( >>> * >>> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like >>> * __vmx_deliver_posted_interrupt(). So the pending interrupt in >>PIRR will >>> * be synced to vIRR before VM-Exit in time. >>> * >>>)))) >>> >>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will >>be synced to vIRR before VM-Exit in time. >>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but >>not test it.. >>> >> >>I think there is a typo. It should be "before VM-Entry in time". It set >>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of >>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the >>PIR to vIRR in vmx_intr_assist(). > >impressive analysis.. >chao, could you show the related code? > In xen/arch/x86/vmx/entry.S, .Lvmx_do_vmentry: call vmx_intr_assist ... 
        cli
        cmp  %ecx,(%rdx,%rax,1)
        jnz  .Lvmx_process_softirqs

and

.Lvmx_process_softirqs:
        sti
        call do_softirq
        jmp  .Lvmx_do_vmentry

In vmx_intr_assist(), PIR is synced to vIRR. After vmx_intr_assist(), if some interrupts are posted in PIR, VCPU_KICK_SOFTIRQ is set in pi_notification_interrupt(), so execution jumps to .Lvmx_process_softirqs and then back to .Lvmx_do_vmentry again.

Thanks
Chao

>Quan > > >>In root-mode, cpu treat the pi notification >>interrupt as normal interrupt, so cpu will run the interrupt handler >>pi_notification_interrupt() instead of syncing PIR to vIRR automatically. >>Receiving a pi notificatio interrupt means some interrupts have been >>posted in PIR. Setting that bit is to deliver these new arrival interrupts >>before this VM-entry. >> >
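The same flow restated as C-like pseudocode (a paraphrase of the assembly above for readability; the loop structure and helper names are illustrative assumptions, not the actual entry.S code):

    for ( ;; )
    {
        vmx_intr_assist();                 /* syncs PIR into vIRR and injects */

        local_irq_disable();               /* "cli" */
        if ( !softirq_pending(smp_processor_id()) )
            break;                         /* nothing pending: VMLAUNCH/VMRESUME */

        /*
         * .Lvmx_process_softirqs: pi_notification_interrupt() raised
         * VCPU_KICK_SOFTIRQ after the last sync, so process softirqs and go
         * around again to pick up the newly posted PIR bits.
         */
        local_irq_enable();                /* "sti" */
        do_softirq();
    }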
On February 08, 2017 4:22 PM, Chao Gao wrote: >On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >>On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>>> Assumed vCPU is in guest_mode.. >>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>>> then >>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>>> (also no >>>> vcpu_kick() ).. >>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to >>>> deliver posted interrupt. if posted interrupt is not delivered, the >>>> posted interrupt is pending until next VM entry -- by PIR to vIRR.. >>>> >>>> one condition is : >>>> In __vmx_deliver_posted_interrupt(), ' if ( >>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>>> >>>> Specifically, we did verify it by RES interrupt, which is used for >>>> smp_reschedule_interrupt.. >>>> We even cost more time to deliver RES interrupt than no-apicv in >>>average.. >>>> >>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is >>>> still guest_mode).. when tries to deliver next-coming RES interrupt >>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >interrupt (no. >>>> 1).. >>>> >>>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>>> entry -- by PIR to vIRR.. >>>> >>>> >>>> We can fix it as below(I don't think this is a best one, it is >>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>>> >>>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>>> @@ -1846,7 +1846,7 @@ static void >>>__vmx_deliver_posted_interrupt(struct vcpu *v) >>>> { >>>> unsigned int cpu = v->processor; >>>> >>>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>>&softirq_pending(cpu)) >>>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>>> && (cpu != smp_processor_id()) ) >>>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>>> } >>> >>>While I don't think I fully understand your description, >> >>Sorry!! >> >>>the line you change >>>here has always been puzzling me: If we were to raise a softirq here, >>>we ought to call cpu_raise_softirq() instead of partly open coding what it >does. >>>So I think not marking that softirq pending (but doing this >>>incompletely) is a valid change in any case. >> >>As comments in pi_notification_interrupt() -- >>xen/arch/x86/hvm/vmx/vmx.c (((( >> * >> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like >> * __vmx_deliver_posted_interrupt(). So the pending interrupt in >PIRR will >> * be synced to vIRR before VM-Exit in time. >> * >>)))) >> >>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will >be synced to vIRR before VM-Exit in time. >>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but >not test it.. >> > >I think there is a typo. It should be "before VM-Entry in time". It set >VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of >entering guest directly. Jumping to vmx_do_vmentry again can re-sync the >PIR to vIRR in vmx_intr_assist(). impressive analysis.. chao, could you show the related code? Quan >In root-mode, cpu treat the pi notification >interrupt as normal interrupt, so cpu will run the interrupt handler >pi_notification_interrupt() instead of syncing PIR to vIRR automatically. >Receiving a pi notificatio interrupt means some interrupts have been >posted in PIR. 
Setting that bit is to deliver these new arrival interrupts >before this VM-entry. >
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Wednesday, February 08, 2017 4:52 PM > > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > > Assumed vCPU is in guest_mode.. > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > > vcpu_kick() ).. > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > > interrupt. if posted interrupt is not delivered, the posted interrupt is > > pending until next VM entry -- by PIR to vIRR.. > > > > one condition is : > > In __vmx_deliver_posted_interrupt(), ' if ( > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > > > Specifically, we did verify it by RES interrupt, which is used for > > smp_reschedule_interrupt.. > > We even cost more time to deliver RES interrupt than no-apicv in average.. > > > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > > posted way, > > The next-coming RES interrupt (no. 2) is not delivered, as we set the > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > > PIR to vIRR.. > > > > > > We can fix it as below(I don't think this is a best one, it is better to set > > the VCPU_KICK_SOFTIRQ bit, but not test it): > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > { > > unsigned int cpu = v->processor; > > > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > && (cpu != smp_processor_id()) ) > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > > } > > While I don't think I fully understand your description, the line you > change here has always been puzzling me: If we were to raise a > softirq here, we ought to call cpu_raise_softirq() instead of partly > open coding what it does. We require posted_intr_vector for target CPU to ack/deliver virtual interrupt in non-root mode. cpu_raise_softirq uses a different vector, which cannot trigger such effect. > So I think not marking that softirq > pending (but doing this incompletely) is a valid change in any case. > But I'll have to defer to Kevin in the hopes that he fully > understands what you explain above as well as him knowing why > this was a test-and-set here in the first place. > I agree we have a misuse of softirq mechanism here. If guest is already in non-root mode, the 1st posted interrupt will be directly delivered to guest (leaving softirq being set w/o actually incurring a VM-exit - breaking desired softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR and not noted until another VM-exit happens. Looks Quan observes such delay of delivery in his experiments. I'm OK to remove the set here. Actually since it's an optimization for less IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to one bit only. Thanks Kevin
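A minimal sketch of the variant suggested here -- checking softirq_pending(cpu) as a whole instead of the single VCPU_KICK_SOFTIRQ bit. This only illustrates the suggestion; it is not the patch that was eventually posted:

    static void __vmx_deliver_posted_interrupt(struct vcpu *v)
    {
        unsigned int cpu = v->processor;

        /*
         * Skip the notification IPI only if some softirq is already pending
         * on the target CPU: in that case the target will run through the
         * VM-entry path (and vmx_intr_assist()) before resuming the guest,
         * which syncs PIR into vIRR anyway.
         */
        if ( !softirq_pending(cpu) && (cpu != smp_processor_id()) )
            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
    }

Note that it still sends posted_intr_vector rather than going through cpu_raise_softirq(): as explained above, the posted-interrupt vector is what lets a CPU already in non-root mode deliver the interrupt directly without a VM-exit.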
> From: Tian, Kevin > Sent: Monday, February 13, 2017 4:21 PM > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: Wednesday, February 08, 2017 4:52 PM > > > > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > > > Assumed vCPU is in guest_mode.. > > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > > > vcpu_kick() ).. > > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > > > interrupt. if posted interrupt is not delivered, the posted interrupt is > > > pending until next VM entry -- by PIR to vIRR.. > > > > > > one condition is : > > > In __vmx_deliver_posted_interrupt(), ' if ( > > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > > > > > Specifically, we did verify it by RES interrupt, which is used for > > > smp_reschedule_interrupt.. > > > We even cost more time to deliver RES interrupt than no-apicv in average.. > > > > > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > > > posted way, > > > The next-coming RES interrupt (no. 2) is not delivered, as we set the > > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > > > > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > > > PIR to vIRR.. > > > > > > > > > We can fix it as below(I don't think this is a best one, it is better to set > > > the VCPU_KICK_SOFTIRQ bit, but not test it): > > > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > > { > > > unsigned int cpu = v->processor; > > > > > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > > && (cpu != smp_processor_id()) ) > > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > > > } > > > > While I don't think I fully understand your description, the line you > > change here has always been puzzling me: If we were to raise a > > softirq here, we ought to call cpu_raise_softirq() instead of partly > > open coding what it does. > > We require posted_intr_vector for target CPU to ack/deliver virtual > interrupt in non-root mode. cpu_raise_softirq uses a different vector, > which cannot trigger such effect. > > > So I think not marking that softirq > > pending (but doing this incompletely) is a valid change in any case. > > But I'll have to defer to Kevin in the hopes that he fully > > understands what you explain above as well as him knowing why > > this was a test-and-set here in the first place. > > > > I agree we have a misuse of softirq mechanism here. If guest > is already in non-root mode, the 1st posted interrupt will be directly > delivered to guest (leaving softirq being set w/o actually incurring a > VM-exit - breaking desired softirq behavior). Then further posted > interrupts will skip the IPI, stay in PIR and not noted until another > VM-exit happens. Looks Quan observes such delay of delivery in > his experiments. > > I'm OK to remove the set here. Actually since it's an optimization > for less IPIs, we'd better check softirq_pending(cpu) directly > instead of sticking to one bit only. > sent too fast... Quan, can you work out a patch following this suggestion and see whether your slow-delivery issue is solved? 
(hope I understand your issue correctly here). Thanks Kevin
On February 13, 2017 4:24 PM, Tian, Kevin wrote: >> From: Tian, Kevin >> Sent: Monday, February 13, 2017 4:21 PM >> >> > From: Jan Beulich [mailto:JBeulich@suse.com] >> > Sent: Wednesday, February 08, 2017 4:52 PM >> > >> > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> > > Assumed vCPU is in guest_mode.. >> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> > > then >> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no >> > > vmexit (also no >> > > vcpu_kick() ).. >> > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to >> > > deliver posted interrupt. if posted interrupt is not delivered, >> > > the posted interrupt is pending until next VM entry -- by PIR to vIRR.. >> > > >> > > one condition is : >> > > In __vmx_deliver_posted_interrupt(), ' if ( >> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> > > >> > > Specifically, we did verify it by RES interrupt, which is used for >> > > smp_reschedule_interrupt.. >> > > We even cost more time to deliver RES interrupt than no-apicv in >average.. >> > > >> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is >> > > still guest_mode).. when tries to deliver next-coming RES >> > > interrupt (no. 2) by posted way, The next-coming RES interrupt >> > > (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when >> > > we deliver RES interrupt (no. 1).. >> > > >> > > Then the next-coming RES interrupt (no. 2) is pending until next >> > > VM entry -- by PIR to vIRR.. >> > > >> > > >> > > We can fix it as below(I don't think this is a best one, it is >> > > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> > > >> > > --- a/xen/arch/x86/hvm/vmx/vmx.c >> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c >> > > @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> > > { >> > > unsigned int cpu = v->processor; >> > > >> > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> > > && (cpu != smp_processor_id()) ) >> > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> > > } >> > >> > While I don't think I fully understand your description, the line >> > you change here has always been puzzling me: If we were to raise a >> > softirq here, we ought to call cpu_raise_softirq() instead of partly >> > open coding what it does. >> >> We require posted_intr_vector for target CPU to ack/deliver virtual >> interrupt in non-root mode. cpu_raise_softirq uses a different vector, >> which cannot trigger such effect. >> >> > So I think not marking that softirq >> > pending (but doing this incompletely) is a valid change in any case. >> > But I'll have to defer to Kevin in the hopes that he fully >> > understands what you explain above as well as him knowing why this >> > was a test-and-set here in the first place. >> > >> >> I agree we have a misuse of softirq mechanism here. If guest is >> already in non-root mode, the 1st posted interrupt will be directly >> delivered to guest (leaving softirq being set w/o actually incurring a >> VM-exit - breaking desired softirq behavior). Then further posted >> interrupts will skip the IPI, stay in PIR and not noted until another >> VM-exit happens. Looks Quan observes such delay of delivery in his >> experiments. >> >> I'm OK to remove the set here. Actually since it's an optimization for >> less IPIs, we'd better check softirq_pending(cpu) directly instead of >> sticking to one bit only. 
>> > >sent too fast... Quan, can you work out a patch following this suggestion >and see whether your slow-delivery issue is solved? >(hope I understand your issue correctly here). > Cool, Very Correct!! Sure, I will send out a patch in coming days.. Quan
On February 13, 2017 4:21 PM, Tian, Kevin wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Wednesday, February 08, 2017 4:52 PM >> >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> > Assumed vCPU is in guest_mode.. >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> > then >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >> > (also no >> > vcpu_kick() ).. >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to >> > deliver posted interrupt. if posted interrupt is not delivered, the >> > posted interrupt is pending until next VM entry -- by PIR to vIRR.. >> > >> > one condition is : >> > In __vmx_deliver_posted_interrupt(), ' if ( >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> > >> > Specifically, we did verify it by RES interrupt, which is used for >> > smp_reschedule_interrupt.. >> > We even cost more time to deliver RES interrupt than no-apicv in >average.. >> > >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is >> > still guest_mode).. when tries to deliver next-coming RES interrupt >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >> > interrupt (no. 1).. >> > >> > Then the next-coming RES interrupt (no. 2) is pending until next VM >> > entry -- by PIR to vIRR.. >> > >> > >> > We can fix it as below(I don't think this is a best one, it is >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c >> > @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> > { >> > unsigned int cpu = v->processor; >> > >> > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> > && (cpu != smp_processor_id()) ) >> > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> > } >> >> While I don't think I fully understand your description, the line you >> change here has always been puzzling me: If we were to raise a softirq >> here, we ought to call cpu_raise_softirq() instead of partly open >> coding what it does. > >We require posted_intr_vector for target CPU to ack/deliver virtual >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which >cannot trigger such effect. > Kevin, I can't follow this 'to ack'.. As I understand, the posted_intr_vector is to call event_check_interrupt() [ or pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC -- this dismisses the interrupt with the posted interrupt notification vector from the local APIC. What does this ack refer to? >> So I think not marking that softirq >> pending (but doing this incompletely) is a valid change in any case. >> But I'll have to defer to Kevin in the hopes that he fully understands >> what you explain above as well as him knowing why this was a >> test-and-set here in the first place. >> > >I agree we have a misuse of softirq mechanism here. If guest is already in >non-root mode, the 1st posted interrupt will be directly delivered to guest >(leaving softirq being set w/o actually incurring a VM-exit - breaking desired >softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR >and not noted until another VM-exit happens. Looks Quan observes such >delay of delivery in his experiments. > >I'm OK to remove the set here. 
Actually since it's an optimization for less >IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to >one bit only. > >Thanks >Kevin > > >
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, February 20, 2017 8:04 PM > > On February 13, 2017 4:21 PM, Tian, Kevin wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: Wednesday, February 08, 2017 4:52 PM > >> > >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > >> > Assumed vCPU is in guest_mode.. > >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), > >> > then > >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit > >> > (also no > >> > vcpu_kick() ).. > >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to > >> > deliver posted interrupt. if posted interrupt is not delivered, the > >> > posted interrupt is pending until next VM entry -- by PIR to vIRR.. > >> > > >> > one condition is : > >> > In __vmx_deliver_posted_interrupt(), ' if ( > >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > >> > > >> > Specifically, we did verify it by RES interrupt, which is used for > >> > smp_reschedule_interrupt.. > >> > We even cost more time to deliver RES interrupt than no-apicv in > >average.. > >> > > >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is > >> > still guest_mode).. when tries to deliver next-coming RES interrupt > >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not > >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES > >> > interrupt (no. 1).. > >> > > >> > Then the next-coming RES interrupt (no. 2) is pending until next VM > >> > entry -- by PIR to vIRR.. > >> > > >> > > >> > We can fix it as below(I don't think this is a best one, it is > >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): > >> > > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c > >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c > >> > @@ -1846,7 +1846,7 @@ static void > >__vmx_deliver_posted_interrupt(struct vcpu *v) > >> > { > >> > unsigned int cpu = v->processor; > >> > > >> > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, > >&softirq_pending(cpu)) > >> > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > >> > && (cpu != smp_processor_id()) ) > >> > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > >> > } > >> > >> While I don't think I fully understand your description, the line you > >> change here has always been puzzling me: If we were to raise a softirq > >> here, we ought to call cpu_raise_softirq() instead of partly open > >> coding what it does. > > > >We require posted_intr_vector for target CPU to ack/deliver virtual > >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which > >cannot trigger such effect. > > > > > Kevin, > > I can't follow this 'to ack'.. > As I understand, the posted_intr_vector is to call event_check_interrupt() [ or > pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC -- > this dismisses the interrupt with the posted interrupt notification vector from the local > APIC. > > What does this ack refer to? > Please look at SDM. 'ack' means evaluation of pending vIRRs when CPU is in non-root mode which results in direct virtual interrupt delivery w/o incurring VM-exit. Thanks Kevin
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
     unsigned int cpu = v->processor;
 
-    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
+    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
          && (cpu != smp_processor_id()) )
         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
 }