Message ID | E0A769A898ADB6449596C41F51EF62C6AA3E8E@SZXEMI506-MBX.china.huawei.com (mailing list archive) |
---|---|
State | New, archived |
> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> Sent: Friday, September 09, 2016 11:02 AM
>
> On August 30, 2016 1:58 PM, Tian Kevin <kevin.tian@intel.com> wrote:
> >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, August 19, 2016 8:59 PM
> >>
> >> When Xen apicv is enabled, wall clock time is faster on a Windows7-32
> >> guest under high payload (with 2 vCPUs; captured from xentrace, under
> >> high payload the count of IPI interrupts between these vCPUs
> >> increases rapidly).
> >>
> >> If an IPI interrupt (vector 0xe1) and a periodic timer interrupt
> >> (vector 0xd1) are both pending (their bits set in VIRR),
> >> unfortunately the IPI interrupt has higher priority than the periodic
> >> timer interrupt. Xen writes the IPI interrupt's VIRR bit to the guest
> >> interrupt status (RVI) as the highest priority, and apicv
> >> (Virtual-Interrupt Delivery) delivers the IPI interrupt within VMX
> >> non-root operation without a VM exit. Within VMX non-root operation,
> >> if the periodic timer interrupt's bit is set in VIRR and is now the
> >> highest, apicv delivers the periodic timer interrupt within VMX
> >> non-root operation as well.
> >>
> >> But in the current code, if Xen does not write the periodic timer
> >> interrupt's bit from VIRR to the guest interrupt status (RVI)
> >> directly, Xen is not aware of this case and does not decrease the
> >> count (pending_intr_nr) of pending periodic timer interrupts, so Xen
> >> will deliver a periodic timer interrupt again. The guest receives
> >> extra periodic timer interrupts.
> >>
> >> If the periodic timer interrupt is delivered and is not the highest
> >> priority, make Xen aware of this case so that it decreases the count
> >> of pending periodic timer interrupts.
> >>
> >> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
> >> Signed-off-by: Rongguang He <herongguang.he@huawei.com>
> >> Signed-off-by: Quan Xu <xuquan8@huawei.com>
> >> ---
> >> Why RFC:
> >> 1. I am not quite sure about other cases, such as the nested case.
> >> 2.
> >> Within VMX non-root operation, an asynchronous event (including
> >> external interrupts, non-maskable interrupts, system-management
> >> interrupts, exceptions and VM exits) may occur before delivery of a
> >> periodic timer interrupt; that periodic timer interrupt may then be
> >> lost when the next periodic timer interrupt is delivered. Actually,
> >> this is also how the current code behaves.
> >> ---
> >>  xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++-
> >>  1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> >> index 8fca08c..d3a034e 100644
> >> --- a/xen/arch/x86/hvm/vmx/intr.c
> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
> >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
> >>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >>          }
> >>
> >> -        pt_intr_post(v, intack);
> >> +        /*
> >> +         * If the periodic timer interrupt is delivered and not the
> >> +         * highest priority, make Xen aware of this case so that it
> >> +         * decreases the count of pending periodic timer interrupts.
> >> +         */
> >> +        if ( pt_vector != -1 && intack.vector > pt_vector )
> >> +        {
> >> +            struct hvm_intack pt_intack;
> >> +
> >> +            pt_intack.vector = pt_vector;
> >> +            pt_intack.source = hvm_intsrc_lapic;
> >> +            pt_intr_post(v, pt_intack);
> >> +        }
> >> +        else
> >> +            pt_intr_post(v, intack);
> >>      }
> >
> > Here is my thought. w/o the above patch, one pending pt interrupt may
> > result in multiple injections of the guest timer interrupt, as you
> > identified, when pt_vector happens not to be the highest pending
> > vector. Then w/ your patch, multiple pending pt interrupt instances
> > may be combined into one injection of the guest timer interrupt.
> > Especially, intack.vector > pt_vector doesn't mean pt_intr is
> > eligible for injection; it might be blocked for other reasons.
> > As Jan pointed out, this path is very fragile, so better we can have
> > a fix which sustains the original behavior: one pending pt instance
> > causing one actual injection.
> >
> > What about doing pt_intr_post in the EOI-induced VM-exit instead? The
> > EOI exit bit for pt_intr is already set to 1, which means every
> > injection would incur an EOI-induced VM-exit:
> >
> >     /*
> >      * Set eoi_exit_bitmap for the periodic timer interrupt to cause an
> >      * EOI-induced VM exit, then pending periodic timer interrupts have
> >      * the chance to be injected for compensation.
> >      */
> >     if ( pt_vector != -1 )
> >         vmx_set_eoi_exit_bitmap(v, pt_vector);
> >
> > I don't think delaying the post makes much difference here. Even if
> > we post earlier, as in your patch, further pending instances have to
> > wait until the current instance is completed. So as long as the post
> > happens before the EOI is completed, it should be fine.
>
> Kevin, I verified your suggestion with the modification below. Wall
> clock time is __still__ faster. I hope this modification matches your
> suggestion.
>
> I __guess__ that VM entry is more frequent than the PT interrupt.
> Specifically, if there is a PT interrupt pending, the count
> (pending_intr_nr) is 1..
>
> Before the next PT interrupt arrives, each PT interrupt injection may
> not incur an EOI-induced VM-exit directly, so multiple PT interrupts
> are injected for one pending PT interrupt; then multiple EOI-induced
> VM-exits occur, with multiple pt_intr_post() calls to decrease the
> count (pending_intr_nr). But the count (pending_intr_nr) is still 1
> before the next PT interrupt arrives, so only one pt_intr_post() is
> effective..

I don't quite understand your description here, but just for your patch
see below...

> Thanks for your time!!
> Quan
>
> ======= modification ========
>
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 1d5d287..cc247c3 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>  {
>      struct domain *d = vlapic_domain(vlapic);
> +    struct hvm_intack pt_intack;
> +
> +    pt_intack.vector = vector;
> +    pt_intack.source = hvm_intsrc_lapic;
> +    pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
>
>      if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
>          vioapic_update_EOI(d, vector);
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 8fca08c..29d9bbf 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> -
> -        pt_intr_post(v, intack);
>      }
>      else
>      {

Because we update the pt irq on every vmentry, there is a chance that an
already-injected instance (before the EOI-induced exit happens) will
incur another pending IRR setting if a VM-exit happens between HW
virtual interrupt injection (vIRR->0, vISR->1) and the EOI-induced exit
(vISR->0), since pt_intr_post hasn't been invoked yet. I guess this is
the reason why you still see a faster wallclock.

I think you need to mark this pending_intr_post situation explicitly.
Then pt_update_irq should skip such a pt timer when pending_intr_post of
that timer is true (otherwise the update is meaningless, since the
previous one hasn't been posted yet). Then, with your change to post in
the EOI-induced exit handler, it should work correctly to meet the goal:
one virtual interrupt delivery for one pending pt intr...

Thanks
Kevin
On September 12, 2016 3:58 PM, Tian, Kevin <kevin.tian@intel.com> wrote:
>> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
>> Sent: Friday, September 09, 2016 11:02 AM
>>
>> On August 30, 2016 1:58 PM, Tian Kevin <kevin.tian@intel.com> wrote:
>> >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
>> >> Sent: Friday, August 19, 2016 8:59 PM
>>
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 1d5d287..cc247c3 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>>  {
>>      struct domain *d = vlapic_domain(vlapic);
>> +    struct hvm_intack pt_intack;
>> +
>> +    pt_intack.vector = vector;
>> +    pt_intack.source = hvm_intsrc_lapic;
>> +    pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
>>
>>      if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
>>          vioapic_update_EOI(d, vector);
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 8fca08c..29d9bbf 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>>              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
>>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>          }
>> -
>> -        pt_intr_post(v, intack);
>>      }
>>      else
>>      {
>
> Because we update the pt irq on every vmentry, there is a chance that
> an already-injected instance (before the EOI-induced exit happens) will
> incur another pending IRR setting if a VM-exit happens between HW
> virtual interrupt injection (vIRR->0, vISR->1) and the EOI-induced exit
> (vISR->0), since pt_intr_post hasn't been invoked yet. I guess this is
> the reason why you still see a faster wallclock.

Agreed. A good description. My bad description was from another aspect.

> I think you need to mark this pending_intr_post situation explicitly.
> Then pt_update_irq should skip such a pt timer when pending_intr_post
> of that timer is true (otherwise the update is meaningless, since the
> previous one hasn't been posted yet). Then, with your change to post in
> the EOI-induced exit handler, it should work correctly to meet the
> goal: one virtual interrupt delivery for one pending pt intr...

I think we are at least on the right track.
But I can't follow 'pending_intr_post'; is it a new parameter? Thanks.

Quan
> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> Sent: Monday, September 12, 2016 5:08 PM
>
> On September 12, 2016 3:58 PM, Tian, Kevin <kevin.tian@intel.com> wrote:
> >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, September 09, 2016 11:02 AM
> >>
> >> On August 30, 2016 1:58 PM, Tian Kevin <kevin.tian@intel.com> wrote:
> >> >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com]
> >> >> Sent: Friday, August 19, 2016 8:59 PM
> >>
> >> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> >> index 1d5d287..cc247c3 100644
> >> --- a/xen/arch/x86/hvm/vlapic.c
> >> +++ b/xen/arch/x86/hvm/vlapic.c
> >> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
> >>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
> >>  {
> >>      struct domain *d = vlapic_domain(vlapic);
> >> +    struct hvm_intack pt_intack;
> >> +
> >> +    pt_intack.vector = vector;
> >> +    pt_intack.source = hvm_intsrc_lapic;
> >> +    pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
> >>
> >>      if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
> >>          vioapic_update_EOI(d, vector);
> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> >> index 8fca08c..29d9bbf 100644
> >> --- a/xen/arch/x86/hvm/vmx/intr.c
> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
> >> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
> >>              clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed);
> >>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >>          }
> >> -
> >> -        pt_intr_post(v, intack);
> >>      }
> >>      else
> >>      {
> >
> > Because we update the pt irq on every vmentry, there is a chance that
> > an already-injected instance (before the EOI-induced exit happens)
> > will incur another pending IRR setting if a VM-exit happens between
> > HW virtual interrupt injection (vIRR->0, vISR->1) and the EOI-induced
> > exit (vISR->0), since pt_intr_post hasn't been invoked yet. I guess
> > this is the reason why you still see a faster wallclock.
>
> Agreed.
> A good description. My bad description was from another aspect.
>
> > I think you need to mark this pending_intr_post situation explicitly.
> > Then pt_update_irq should skip such a pt timer when pending_intr_post
> > of that timer is true (otherwise the update is meaningless, since the
> > previous one hasn't been posted yet). Then, with your change to post
> > in the EOI-induced exit handler, it should work correctly to meet the
> > goal: one virtual interrupt delivery for one pending pt intr...
>
> I think we are at least on the right track.
> But I can't follow 'pending_intr_post'; is it a new parameter? Thanks.

Yes, a new parameter to record whether an intr_post operation is pending.

Thanks
Kevin