Message ID | E0A769A898ADB6449596C41F51EF62C6AB4577@SZXEMI506-MBS.china.huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: > --- a/xen/arch/x86/hvm/vlapic.c > +++ b/xen/arch/x86/hvm/vlapic.c > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) > { > struct domain *d = vlapic_domain(vlapic); > + struct vcpu *v = vlapic_vcpu(vlapic); > + struct hvm_intack pt_intack; > + > + pt_intack.vector = vector; > + pt_intack.source = hvm_intsrc_lapic; > + pt_intr_post(v, pt_intack); This also sits on the EOI LAPIC register write path, i.e. the change then also affects non-apicv environments. Furthermore - don't we have the same problem as with v1 again then? What prevents multiple EOIs to come here before the timer interrupt actually gets handled? You'd then clear ->irq_issued each time, rendering your change to pt_update_irq() ineffective. In any event please use hvm_intack_lapic() instead of open coding it (and then I don't think you need a local variable at all). > --- a/xen/arch/x86/hvm/vmx/intr.c > +++ b/xen/arch/x86/hvm/vmx/intr.c > @@ -333,8 +333,6 @@ void vmx_intr_assist(void) > clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed); > __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); > } > - > - pt_intr_post(v, intack); > } I'll defer to the VMX maintainers to determine whether removing this but not the other one is correct. > --- a/xen/arch/x86/hvm/vpt.c > +++ b/xen/arch/x86/hvm/vpt.c > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v) > } > else > { > - if ( (pt->last_plt_gtime + pt->period) < max_lag ) > + if ( (pt->last_plt_gtime + pt->period) < max_lag && > + !pt->irq_issued ) > { > max_lag = pt->last_plt_gtime + pt->period; > earliest_pt = pt; Looking at the rest of the code I really wonder why this check hadn't been there from the beginning. Tim, do recall whether this was intentional (as opposed to simply having been an oversight)? Jan
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Friday, September 23, 2016 11:34 PM > > >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: > > --- a/xen/arch/x86/hvm/vlapic.c > > +++ b/xen/arch/x86/hvm/vlapic.c > > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) > > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) > > { > > struct domain *d = vlapic_domain(vlapic); > > + struct vcpu *v = vlapic_vcpu(vlapic); > > + struct hvm_intack pt_intack; > > + > > + pt_intack.vector = vector; > > + pt_intack.source = hvm_intsrc_lapic; > > + pt_intr_post(v, pt_intack); > > This also sits on the EOI LAPIC register write path, i.e. the change > then also affects non-apicv environments. The new logic should be entered only when EOI-induced exit happens. > > Furthermore - don't we have the same problem as with v1 again > then? What prevents multiple EOIs to come here before the timer > interrupt actually gets handled? You'd then clear ->irq_issued > each time, rendering your change to pt_update_irq() ineffective. based on this patch, one irq_issued should cause only one EOI on related pt vector and this EOI exit clears irq_issued then next EOI would be seen only upon another injection when irq_issued is set again... However there might be an issue if this pt vector is shared with another device interrupt, which although is not a typical case... > > In any event please use hvm_intack_lapic() instead of open > coding it (and then I don't think you need a local variable at all). > > > --- a/xen/arch/x86/hvm/vmx/intr.c > > +++ b/xen/arch/x86/hvm/vmx/intr.c > > @@ -333,8 +333,6 @@ void vmx_intr_assist(void) > > clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed); > > __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); > > } > > - > > - pt_intr_post(v, intack); > > } > > I'll defer to the VMX maintainers to determine whether removing this > but not the other one is correct. This is correct. The other one is for non-apicv scenario. > > > --- a/xen/arch/x86/hvm/vpt.c > > +++ b/xen/arch/x86/hvm/vpt.c > > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v) > > } > > else > > { > > - if ( (pt->last_plt_gtime + pt->period) < max_lag ) > > + if ( (pt->last_plt_gtime + pt->period) < max_lag && > > + !pt->irq_issued ) > > { > > max_lag = pt->last_plt_gtime + pt->period; > > earliest_pt = pt; > > Looking at the rest of the code I really wonder why this check > hadn't been there from the beginning. Tim, do recall whether > this was intentional (as opposed to simply having been an > oversight)? > Possibly simplify the emulation by relying on IRR/ISR to handling pending situation? Thanks Kevin
On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Friday, September 23, 2016 11:34 PM >> >> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >> > --- a/xen/arch/x86/hvm/vlapic.c >> > +++ b/xen/arch/x86/hvm/vlapic.c >> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) >> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >> > struct domain *d = vlapic_domain(vlapic); >> > + struct vcpu *v = vlapic_vcpu(vlapic); >> > + struct hvm_intack pt_intack; >> > + >> > + pt_intack.vector = vector; >> > + pt_intack.source = hvm_intsrc_lapic; >> > + pt_intr_post(v, pt_intack); >> >> This also sits on the EOI LAPIC register write path, i.e. the change >> then also affects non-apicv environments. > >The new logic should be entered only when EOI-induced exit happens. > Yes, more that the EOI-induced exit is conditional, specifically, the bitmap is set by vmx_set_eoi_exit_bitmap(). Jan, what do you think? While I recall from v1 discussion, you have the same comment. We can dig it deep.. >> >> Furthermore - don't we have the same problem as with v1 again then? >> What prevents multiple EOIs to come here before the timer interrupt >> actually gets handled? You'd then clear ->irq_issued each time, >> rendering your change to pt_update_irq() ineffective. > >based on this patch, one irq_issued should cause only one EOI on related pt >vector and this EOI exit clears irq_issued then next EOI would be seen only upon >another injection when irq_issued is set again... However there might be an >issue if this pt vector is shared with another device interrupt, which although is >not a typical case... > Agreed. >> >> In any event please use hvm_intack_lapic() instead of open coding it >> (and then I don't think you need a local variable at all). >> Indeed. >> > --- a/xen/arch/x86/hvm/vmx/intr.c >> > +++ b/xen/arch/x86/hvm/vmx/intr.c >> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void) > >> > clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed); >> > __vmwrite(EOI_EXIT_BITMAP(i), >v->arch.hvm_vmx.eoi_exit_bitmap[i]); >> > } >> > - >> > - pt_intr_post(v, intack); >> > } >> >> I'll defer to the VMX maintainers to determine whether removing this >> but not the other one is correct. > >This is correct. The other one is for non-apicv scenario. > >> >> > --- a/xen/arch/x86/hvm/vpt.c >> > +++ b/xen/arch/x86/hvm/vpt.c >> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v) >> > } >> > else >> > { >> > - if ( (pt->last_plt_gtime + pt->period) < max_lag ) >> > + if ( (pt->last_plt_gtime + pt->period) < max_lag && >> > + !pt->irq_issued ) >> > { >> > max_lag = pt->last_plt_gtime + pt->period; >> > earliest_pt = pt; >> >> Looking at the rest of the code I really wonder why this check hadn't >> been there from the beginning. Tim, do recall whether this was >> intentional (as opposed to simply having been an oversight)? >> > >Possibly simplify the emulation by relying on IRR/ISR to handling pending >situation? I think IRR is enough. But there is a challenge that the pt interrupt is __not_only__ handled by LAPIC ( see the bottom of pt_update_irq() ).. Quan
>>> On 24.09.16 at 01:34, <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Friday, September 23, 2016 11:34 PM >> >> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >> > --- a/xen/arch/x86/hvm/vlapic.c >> > +++ b/xen/arch/x86/hvm/vlapic.c >> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) >> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) >> > { >> > struct domain *d = vlapic_domain(vlapic); >> > + struct vcpu *v = vlapic_vcpu(vlapic); >> > + struct hvm_intack pt_intack; >> > + >> > + pt_intack.vector = vector; >> > + pt_intack.source = hvm_intsrc_lapic; >> > + pt_intr_post(v, pt_intack); >> >> This also sits on the EOI LAPIC register write path, i.e. the change >> then also affects non-apicv environments. > > The new logic should be entered only when EOI-induced exit happens. To me this reply reads ambiguously: Does "should" here mean the patch needs further adjustment in your opinion, or that you think the patch already does what is needed to ensure the new logic to bet involved only in the EOI-induced exit case. To me it continues to look like the former. >> Furthermore - don't we have the same problem as with v1 again >> then? What prevents multiple EOIs to come here before the timer >> interrupt actually gets handled? You'd then clear ->irq_issued >> each time, rendering your change to pt_update_irq() ineffective. > > based on this patch, one irq_issued should cause only one EOI on > related pt vector and this EOI exit clears irq_issued then next EOI > would be seen only upon another injection when irq_issued is set > again... However there might be an issue if this pt vector is shared > with another device interrupt, which although is not a typical case... That's a common problem: I don't think we can consider only the typical case. Or does hardware only deal with those, too? And then the "should" here reads as ambiguously to me as the earlier one, the more that you seem to consider EOIs for only the one vector of interest, whereas my reply was specifically meant to cover also EOIs for other vectors. >> > --- a/xen/arch/x86/hvm/vmx/intr.c >> > +++ b/xen/arch/x86/hvm/vmx/intr.c >> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void) >> > clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed); >> > __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); >> > } >> > - >> > - pt_intr_post(v, intack); >> > } >> >> I'll defer to the VMX maintainers to determine whether removing this >> but not the other one is correct. > > This is correct. The other one is for non-apicv scenario. Sure, but the other path will also get used under certain conditions even when apicv is in use. >> > --- a/xen/arch/x86/hvm/vpt.c >> > +++ b/xen/arch/x86/hvm/vpt.c >> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v) >> > } >> > else >> > { >> > - if ( (pt->last_plt_gtime + pt->period) < max_lag ) >> > + if ( (pt->last_plt_gtime + pt->period) < max_lag && >> > + !pt->irq_issued ) >> > { >> > max_lag = pt->last_plt_gtime + pt->period; >> > earliest_pt = pt; >> >> Looking at the rest of the code I really wonder why this check >> hadn't been there from the beginning. Tim, do recall whether >> this was intentional (as opposed to simply having been an >> oversight)? > > Possibly simplify the emulation by relying on IRR/ISR to handling pending > situation? Again I'm suffering ambiguity here: Do you suggest a possible explanation for why things are the way they are, or a possible code adjustment to be done? Jan
>>> On 24.09.16 at 03:06, <xuquan8@huawei.com> wrote: > On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote: >>> From: Jan Beulich [mailto:JBeulich@suse.com] >>> Sent: Friday, September 23, 2016 11:34 PM >>> >>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >>> > --- a/xen/arch/x86/hvm/vlapic.c >>> > +++ b/xen/arch/x86/hvm/vlapic.c >>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) >>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >>> > struct domain *d = vlapic_domain(vlapic); >>> > + struct vcpu *v = vlapic_vcpu(vlapic); >>> > + struct hvm_intack pt_intack; >>> > + >>> > + pt_intack.vector = vector; >>> > + pt_intack.source = hvm_intsrc_lapic; >>> > + pt_intr_post(v, pt_intack); >>> >>> This also sits on the EOI LAPIC register write path, i.e. the change >>> then also affects non-apicv environments. >> >>The new logic should be entered only when EOI-induced exit happens. >> > > Yes, more that the EOI-induced exit is conditional, specifically, the bitmap > is set by vmx_set_eoi_exit_bitmap(). > Jan, what do you think? While I recall from v1 discussion, you have the same > comment. We can dig it deep.. See my reply to Kevin sent a minute ago. As I'm not sure what Kevin means to state with several of his responses, I can't properly respond for now. And then what you say doesn't really address my concern - things being conditional elsewhere doesn't mean we won't get here too in the non-apicv case, at least not in a way that I can follow right away. Jan
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Monday, September 26, 2016 2:35 PM > > >>> On 24.09.16 at 01:34, <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: Friday, September 23, 2016 11:34 PM > >> > >> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: > >> > --- a/xen/arch/x86/hvm/vlapic.c > >> > +++ b/xen/arch/x86/hvm/vlapic.c > >> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) > >> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) > >> > { > >> > struct domain *d = vlapic_domain(vlapic); > >> > + struct vcpu *v = vlapic_vcpu(vlapic); > >> > + struct hvm_intack pt_intack; > >> > + > >> > + pt_intack.vector = vector; > >> > + pt_intack.source = hvm_intsrc_lapic; > >> > + pt_intr_post(v, pt_intack); > >> > >> This also sits on the EOI LAPIC register write path, i.e. the change > >> then also affects non-apicv environments. > > > > The new logic should be entered only when EOI-induced exit happens. > > To me this reply reads ambiguously: Does "should" here mean the > patch needs further adjustment in your opinion, or that you think > the patch already does what is needed to ensure the new logic to > bet involved only in the EOI-induced exit case. To me it continues > to look like the former. yes the former. Possibly clearer if I directly reply to original Quan's patch. :-) > > >> Furthermore - don't we have the same problem as with v1 again > >> then? What prevents multiple EOIs to come here before the timer > >> interrupt actually gets handled? You'd then clear ->irq_issued > >> each time, rendering your change to pt_update_irq() ineffective. > > > > based on this patch, one irq_issued should cause only one EOI on > > related pt vector and this EOI exit clears irq_issued then next EOI > > would be seen only upon another injection when irq_issued is set > > again... However there might be an issue if this pt vector is shared > > with another device interrupt, which although is not a typical case... > > That's a common problem: I don't think we can consider only the > typical case. Or does hardware only deal with those, too? need more thinking here. > > And then the "should" here reads as ambiguously to me as the > earlier one, the more that you seem to consider EOIs for only the > one vector of interest, whereas my reply was specifically meant > to cover also EOIs for other vectors. This "should" means the behavior after further adjustment > > >> > --- a/xen/arch/x86/hvm/vmx/intr.c > >> > +++ b/xen/arch/x86/hvm/vmx/intr.c > >> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void) > >> > clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed); > >> > __vmwrite(EOI_EXIT_BITMAP(i), > v->arch.hvm_vmx.eoi_exit_bitmap[i]); > >> > } > >> > - > >> > - pt_intr_post(v, intack); > >> > } > >> > >> I'll defer to the VMX maintainers to determine whether removing this > >> but not the other one is correct. > > > > This is correct. The other one is for non-apicv scenario. > > Sure, but the other path will also get used under certain conditions > even when apicv is in use. > > >> > --- a/xen/arch/x86/hvm/vpt.c > >> > +++ b/xen/arch/x86/hvm/vpt.c > >> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v) > >> > } > >> > else > >> > { > >> > - if ( (pt->last_plt_gtime + pt->period) < max_lag ) > >> > + if ( (pt->last_plt_gtime + pt->period) < max_lag && > >> > + !pt->irq_issued ) > >> > { > >> > max_lag = pt->last_plt_gtime + pt->period; > >> > earliest_pt = pt; > >> > >> Looking at the rest of the code I really wonder why this check > >> hadn't been there from the beginning. Tim, do recall whether > >> this was intentional (as opposed to simply having been an > >> oversight)? > > > > Possibly simplify the emulation by relying on IRR/ISR to handling pending > > situation? > > Again I'm suffering ambiguity here: Do you suggest a possible > explanation for why things are the way they are, or a possible > code adjustment to be done? > the former. Thanks Kevin
On September 26, 2016 2:39 PM, Jan Beulich < JBeulich@suse.com > wrote: >>>> On 24.09.16 at 03:06, <xuquan8@huawei.com> wrote: >> On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote: >>>> From: Jan Beulich [mailto:JBeulich@suse.com] >>>> Sent: Friday, September 23, 2016 11:34 PM >>>> >>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >>>> > --- a/xen/arch/x86/hvm/vlapic.c >>>> > +++ b/xen/arch/x86/hvm/vlapic.c >>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) >>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >>>> > struct domain *d = vlapic_domain(vlapic); >>>> > + struct vcpu *v = vlapic_vcpu(vlapic); >>>> > + struct hvm_intack pt_intack; >>>> > + >>>> > + pt_intack.vector = vector; >>>> > + pt_intack.source = hvm_intsrc_lapic; >>>> > + pt_intr_post(v, pt_intack); >>>> >>>> This also sits on the EOI LAPIC register write path, i.e. the change >>>> then also affects non-apicv environments. >>> >>>The new logic should be entered only when EOI-induced exit happens. >>> >> >> Yes, more that the EOI-induced exit is conditional, specifically, the >> bitmap is set by vmx_set_eoi_exit_bitmap(). >> Jan, what do you think? While I recall from v1 discussion, you have >> the same comment. We can dig it deep.. > >See my reply to Kevin sent a minute ago. As I'm not sure what Kevin means to >state with several of his responses, I can't properly respond for now. And then >what you say doesn't really address my concern - things being conditional >elsewhere doesn't mean we won't get here too in the non-apicv case, at least >not in a way that I can follow right away. > Jan, any idea now? Quan
>>> On 10.10.16 at 11:21, <xuquan8@huawei.com> wrote: > On September 26, 2016 2:39 PM, Jan Beulich < JBeulich@suse.com > wrote: >>>>> On 24.09.16 at 03:06, <xuquan8@huawei.com> wrote: >>> On September 24, 2016 7:34 AM, Tian Kevin < kevin.tian@intel.com > wrote: >>>>> From: Jan Beulich [mailto:JBeulich@suse.com] >>>>> Sent: Friday, September 23, 2016 11:34 PM >>>>> >>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >>>>> > --- a/xen/arch/x86/hvm/vlapic.c >>>>> > +++ b/xen/arch/x86/hvm/vlapic.c >>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) >>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >>>>> > struct domain *d = vlapic_domain(vlapic); >>>>> > + struct vcpu *v = vlapic_vcpu(vlapic); >>>>> > + struct hvm_intack pt_intack; >>>>> > + >>>>> > + pt_intack.vector = vector; >>>>> > + pt_intack.source = hvm_intsrc_lapic; >>>>> > + pt_intr_post(v, pt_intack); >>>>> >>>>> This also sits on the EOI LAPIC register write path, i.e. the change >>>>> then also affects non-apicv environments. >>>> >>>>The new logic should be entered only when EOI-induced exit happens. >>>> >>> >>> Yes, more that the EOI-induced exit is conditional, specifically, the >>> bitmap is set by vmx_set_eoi_exit_bitmap(). >>> Jan, what do you think? While I recall from v1 discussion, you have >>> the same comment. We can dig it deep.. >> >>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin means to >>state with several of his responses, I can't properly respond for now. And then >>what you say doesn't really address my concern - things being conditional >>elsewhere doesn't mean we won't get here too in the non-apicv case, at least >>not in a way that I can follow right away. > > Jan, any idea now? I don't think there was anything left open on the other sub-thread; if there is, please point out specific aspects which are still unclear. Jan
On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote: >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) >>>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >>>>>> > struct domain *d = vlapic_domain(vlapic); >>>>>> > + struct vcpu *v = vlapic_vcpu(vlapic); >>>>>> > + struct hvm_intack pt_intack; >>>>>> > + >>>>>> > + pt_intack.vector = vector; >>>>>> > + pt_intack.source = hvm_intsrc_lapic; >>>>>> > + pt_intr_post(v, pt_intack); >>>>>> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the >>>>>> change then also affects non-apicv environments. >>>>> >>>>>The new logic should be entered only when EOI-induced exit happens. >>>>> >>>> >>>> Yes, more that the EOI-induced exit is conditional, specifically, >>>> the bitmap is set by vmx_set_eoi_exit_bitmap(). >>>> Jan, what do you think? While I recall from v1 discussion, you have >>>> the same comment. We can dig it deep.. >>> >>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin >>>means to state with several of his responses, I can't properly respond >>>for now. And then what you say doesn't really address my concern - >>>things being conditional elsewhere doesn't mean we won't get here too >>>in the non-apicv case, at least not in a way that I can follow right away. >> >> Jan, any idea now? > >I don't think there was anything left open on the other sub-thread; if there is, >please point out specific aspects which are still unclear. > Sorry, I overlooked the other sub-thread after holiday(10.1-10.7).. I will read it again.. Quan
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, October 10, 2016 6:49 PM > > On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote: > >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: > >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c > >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c > >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) > >>>>>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { > >>>>>> > struct domain *d = vlapic_domain(vlapic); > >>>>>> > + struct vcpu *v = vlapic_vcpu(vlapic); > >>>>>> > + struct hvm_intack pt_intack; > >>>>>> > + > >>>>>> > + pt_intack.vector = vector; > >>>>>> > + pt_intack.source = hvm_intsrc_lapic; > >>>>>> > + pt_intr_post(v, pt_intack); > >>>>>> > >>>>>> This also sits on the EOI LAPIC register write path, i.e. the > >>>>>> change then also affects non-apicv environments. > >>>>> > >>>>>The new logic should be entered only when EOI-induced exit happens. > >>>>> > >>>> > >>>> Yes, more that the EOI-induced exit is conditional, specifically, > >>>> the bitmap is set by vmx_set_eoi_exit_bitmap(). > >>>> Jan, what do you think? While I recall from v1 discussion, you have > >>>> the same comment. We can dig it deep.. > >>> > >>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin > >>>means to state with several of his responses, I can't properly respond > >>>for now. And then what you say doesn't really address my concern - > >>>things being conditional elsewhere doesn't mean we won't get here too > >>>in the non-apicv case, at least not in a way that I can follow right away. > >> > >> Jan, any idea now? > > > >I don't think there was anything left open on the other sub-thread; if there is, > >please point out specific aspects which are still unclear. > > > > Sorry, I overlooked the other sub-thread after holiday(10.1-10.7).. > I will read it again.. > > Quan Is there any discussion after 10.1? I didn't see it. Back to the main open before holiday - multiple EOIs may come to clear irq_issued before guest actually handles the very vpt injection (possible if vpt vector is shared with other sources). I don't see a good solution on that open... :/ We've discussed various options which all fail in one or another place - either miss an injection, or incur undesired injections. Possibly we should consider another direction - fall back to non-apicv path when we see vpt vector pending but it's not the highest one. Original condition to enter virtual intr delivery: else if ( cpu_has_vmx_virtual_intr_delivery && intack.source != hvm_intsrc_pic && intack.source != hvm_intsrc_vector ) now new condition: else if ( cpu_has_vmx_virtual_intr_delivery && intack.source != hvm_intsrc_pic && intack.source != hvm_intsrc_vector && (pt_vector == -1 || intack.vector == pt_vector) ) Thoughts? Thanks Kevin
On October 11, 2016 3:49 PM, Tian, Kevin <Kevin.tian@intel.com> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> Sent: Monday, October 10, 2016 6:49 PM >> >> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote: >> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c >> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c >> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic >> >>>>>> > *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >> >>>>>> > struct domain *d = vlapic_domain(vlapic); >> >>>>>> > + struct vcpu *v = vlapic_vcpu(vlapic); >> >>>>>> > + struct hvm_intack pt_intack; >> >>>>>> > + >> >>>>>> > + pt_intack.vector = vector; >> >>>>>> > + pt_intack.source = hvm_intsrc_lapic; >> >>>>>> > + pt_intr_post(v, pt_intack); >> >>>>>> >> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the >> >>>>>> change then also affects non-apicv environments. >> >>>>> >> >>>>>The new logic should be entered only when EOI-induced exit happens. >> >>>>> >> >>>> >> >>>> Yes, more that the EOI-induced exit is conditional, specifically, >> >>>> the bitmap is set by vmx_set_eoi_exit_bitmap(). >> >>>> Jan, what do you think? While I recall from v1 discussion, you >> >>>> have the same comment. We can dig it deep.. >> >>> >> >>>See my reply to Kevin sent a minute ago. As I'm not sure what Kevin >> >>>means to state with several of his responses, I can't properly >> >>>respond for now. And then what you say doesn't really address my >> >>>concern - things being conditional elsewhere doesn't mean we won't >> >>>get here too in the non-apicv case, at least not in a way that I can follow >right away. >> >> >> >> Jan, any idea now? >> > >> >I don't think there was anything left open on the other sub-thread; >> >if there is, please point out specific aspects which are still unclear. >> > >> >> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7).. >> I will read it again.. >> >> Quan > >Is there any discussion after 10.1? I didn't see it. > >Back to the main open before holiday - multiple EOIs may come to clear >irq_issued before guest actually handles the very vpt injection (possible if vpt >vector is shared with other sources). I don't see a good solution on that open... :/ > >We've discussed various options which all fail in one or another place - either >miss an injection, or incur undesired injections. >Possibly we should consider another direction - fall back to non-apicv path when >we see vpt vector pending but it's not the highest one. > >Original condition to enter virtual intr delivery: > else if ( cpu_has_vmx_virtual_intr_delivery && > intack.source != hvm_intsrc_pic && > intack.source != hvm_intsrc_vector ) > >now new condition: > else if ( cpu_has_vmx_virtual_intr_delivery && > intack.source != hvm_intsrc_pic && > intack.source != hvm_intsrc_vector && > (pt_vector == -1 || intack.vector == pt_vector) ) > >Thoughts? > Kevin, When I try to fix it as your suggestion, I cannot boot the guest, with below message(from xl dmesg): (d1) HVM Loader (d1) Detected Xen v4.8-unstable (d1) Xenbus rings @0xfeffc000, event channel 1 (d1) System requested SeaBIOS (d1) CPU speed is 2394 MHz (d1) Relocating guest memory for lowmem MMIO space disabled (XEN) irq.c:275: Dom1 PCI link 0 changed 0 -> 5 (d1) PCI-ISA link 0 routed to IRQ5 (XEN) irq.c:275: Dom1 PCI link 1 changed 0 -> 10 (d1) PCI-ISA link 1 routed to IRQ10 (XEN) irq.c:275: Dom1 PCI link 2 changed 0 -> 11 (d1) PCI-ISA link 2 routed to IRQ11 (XEN) irq.c:275: Dom1 PCI link 3 changed 0 -> 5 (d1) PCI-ISA link 3 routed to IRQ5 (d1) pci dev 01:3 INTA->IRQ10 (d1) pci dev 02:0 INTA->IRQ11 (d1) RAM in high memory; setting high_mem resource base to 20f800000 (d1) pci dev 03:0 bar 10 size 002000000: 0f0000008 (d1) pci dev 02:0 bar 14 size 001000000: 0f2000008 (d1) pci dev 03:0 bar 30 size 000010000: 0f3000000 (d1) pci dev 03:0 bar 14 size 000001000: 0f3010000 (d1) pci dev 02:0 bar 10 size 000000100: 00000c001 (d1) pci dev 01:1 bar 20 size 000000010: 00000c101 (d1) Multiprocessor initialisation: (d1) - CPU0 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. (d1) - CPU1 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. (d1) Testing HVM environment: (d1) - REP INSB across page boundaries ... passed (d1) - GS base MSRs and SWAPGS ... passed (d1) Passed 2 of 2 tests (d1) Writing SMBIOS tables ... (d1) Loading SeaBIOS ... (d1) Creating MP tables ... (d1) Loading ACPI ... (d1) vm86 TSS at fc00a300 (d1) BIOS map: (d1) 10000-100e3: Scratch space (d1) c0000-fffff: Main BIOS (d1) E820 table: (d1) [00]: 00000000:00000000 - 00000000:000a0000: RAM (d1) HOLE: 00000000:000a0000 - 00000000:000c0000 (d1) [01]: 00000000:000c0000 - 00000000:00100000: RESERVED (d1) [02]: 00000000:00100000 - 00000000:f0000000: RAM (d1) HOLE: 00000000:f0000000 - 00000000:fc000000 (d1) [03]: 00000000:fc000000 - 00000001:00000000: RESERVED (d1) [04]: 00000001:00000000 - 00000002:0f800000: RAM (d1) Invoking SeaBIOS ... (d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e) (d1) BUILD: gcc: (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] binutils: (GNU (d1) Binutils; SUSE Linux Enterprise 11) 2.23.1 (d1) (d1) Found Xen hypervisor signature at 40000000 (d1) Running on QEMU (i440fx) (d1) xen: copy e820... (d1) Relocating init from 0x000d8fa0 to 0xeffabc40 (size 82736) (d1) Found 6 PCI devices (max PCI bus is 00) (d1) Allocated Xen hypercall page at effff000 (d1) Detected Xen v4.8-unstable (d1) xen: copy BIOS tables... (d1) Copying SMBIOS entry point from 0x00010020 to 0x000f5b60 (d1) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f5a60 (d1) Copying PIR from 0x00010040 to 0x000f59e0 (d1) Copying ACPI RSDP from 0x000100c0 to 0x000f59b0 (d1) Using pmtimer, ioport 0xb008 (d1) Scan for VGA option rom (d1) Running option rom at c000:0003 (XEN) stdvga.c:174:d1v0 entering stdvga mode (d1) pmm call arg1=0 (d1) Turning on vga text mode console (d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e) (d1) Machine UUID 59e20ef4-565a-49cb-9559-cde6d391cdf4 (d1) All threads complete. (d1) Found 0 lpt ports (d1) Found 0 serial ports (d1) ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9) (d1) ATA controller 2 at 170/374/0 (irq 15 dev 9) (d1) PS2 keyboard initialized (d1) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (30720 MiBytes) (d1) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0 (d1) All threads complete. (d1) Scan for option roms (d1) (d1) Press ESC for boot menu. (d1) (d1) Searching bootorder for: HALT (d1) drive 0x000f5940: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=62914561 (d1) Space available for UMB: c9800-ec800, f5380-f5940 (d1) Returned 258048 bytes of ZoneHigh (d1) e820 map has 7 items: (d1) 0: 0000000000000000 - 000000000009fc00 = 1 RAM (d1) 1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED (d1) 2: 00000000000f0000 - 0000000000100000 = 2 RESERVED (d1) 3: 0000000000100000 - 00000000effff000 = 1 RAM (d1) 4: 00000000effff000 - 00000000f0000000 = 2 RESERVED (d1) 5: 00000000fc000000 - 0000000100000000 = 2 RESERVED (d1) 6: 0000000100000000 - 000000020f800000 = 1 RAM (d1) enter handle_19: (d1) NULL (d1) Booting from Hard Disk... (d1) Booting from 0000:7c00 (XEN) stdvga.c:179:d1v0 leaving stdvga mode (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0). (XEN) ************* VMCS Area ************** (XEN) *** Guest State *** (XEN) CR0: actual=0x0000000080010031, shadow=0x0000000080010031, gh_mask=ffffffffffffffff (XEN) CR4: actual=0x00000000000426f8, shadow=0x00000000000406b8, gh_mask=ffffffffffffffff (XEN) CR3 = 0x0000000000185000 (XEN) PDPTE0 = 0x0000000000186001 PDPTE1 = 0x0000000000187001 (XEN) PDPTE2 = 0x0000000000188001 PDPTE3 = 0x0000000000189001 (XEN) RSP = 0x000000008ce53a6c (0x000000008ce53a6c) RIP = 0x000000008161dd21 (0x000000008161dd21) (XEN) RFLAGS=0x00200046 (0x00200046) DR7 = 0x0000000000000400 (XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:000000008168c0c0 (XEN) sel attr limit base (XEN) CS: 0008 0c09b ffffffff 0000000000000000 (XEN) DS: 0023 0c0f3 ffffffff 0000000000000000 (XEN) SS: 0010 0c093 ffffffff 0000000000000000 (XEN) ES: 0023 0c0f3 ffffffff 0000000000000000 (XEN) FS: 0030 04093 00003748 0000000081779c00 (XEN) GS: 0000 1c000 ffffffff 0000000000000000 (XEN) GDTR: 000003ff 0000000081553000 (XEN) LDTR: 0000 1c000 ffffffff 0000000000000000 (XEN) IDTR: 000007ff 0000000081553400 (XEN) TR: 0028 0008b 000020ab 00000000801ad000 (XEN) EFER = 0x0000000000000000 PAT = 0x0007010600070106 (XEN) PreemptionTimer = 0x00000000 SM Base = 0x00000000 (XEN) DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000 (XEN) Interruptibility = 00000000 ActivityState = 00000000 (XEN) InterruptStatus = d100 (XEN) *** Host State *** (XEN) RIP = 0xffff82d0801ff560 (vmx_asm_vmexit_handler) RSP = 0xffff83187e20ff90 (XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040 (XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff830839dfec00 (XEN) GDTBase=ffff83187e36f000 IDTBase=ffff83187e37b000 (XEN) CR0=0000000080050033 CR3=0000000821f6e000 CR4=00000000001526e0 (XEN) Sysenter RSP=ffff83187e20ffc0 CS:RIP=e008:ffff82d080244a00 (XEN) EFER = 0x0000000000000000 PAT = 0x0000050100070406 (XEN) *** Control State *** (XEN) PinBased=000000bf CPUBased=b6a065fa SecondaryExec=0000576b (XEN) EntryControls=000051ff ExitControls=000fefff (XEN) ExceptionBitmap=00060002 PFECmask=00000000 PFECmatch=00000000 (XEN) VMEntry: intr_info=800000e1 errcode=00000000 ilen=00000000 (XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000006 (XEN) reason=80000021 qualification=0000000000000000 (XEN) IDTVectoring: info=00000000 errcode=00000000 (XEN) TSC Offset = 0xfffd7fc7c2ca5a1c TSC Multiplier = 0x0000000000000000 (XEN) TPR Threshold = 0x00 PostedIntrVec = 0xf4 (XEN) EPT pointer = 0x0000000821f8501e EPTP index = 0x0000 (XEN) PLE Gap=00000080 Window=00001000 (XEN) Virtual processor ID = 0x5f7a VMfunc controls = 0000000000000000 (XEN) ************************************** (XEN) domain_crash called from vmx.c:3111 (XEN) Domain 1 (vcpu#0) crashed on cpu#12: (XEN) ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]---- (XEN) CPU: 12 (XEN) RIP: 0008:[<000000008161dd21>] (XEN) RFLAGS: 0000000000200046 CONTEXT: hvm guest (d1v0) (XEN) rax: 00000000000c0000 rbx: 000000000000000a rcx: 00000000000c00d1 (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 000000008ce53acc (XEN) rbp: 000000008ce53a6c rsp: 000000008ce53a6c r8: 0000000000000000 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 (XEN) r15: 0000000000000000 cr0: 0000000080010031 cr4: 00000000000406b8 (XEN) cr3: 0000000000185000 cr2: 000000008cc09000 (XEN) ds: 0023 es: 0023 fs: 0030 gs: 0000 ss: 0010 cs: 0008 (XEN) HVM2 save: CPU (XEN) HVM2 save: PIC (XEN) HVM2 save: IOAPIC (XEN) HVM2 save: LAPIC (XEN) HVM2 save: LAPIC_REGS (XEN) HVM2 save: PCI_IRQ (XEN) HVM2 save: ISA_IRQ (XEN) HVM2 save: PCI_LINK (XEN) HVM2 save: PIT (XEN) HVM2 save: RTC (XEN) HVM2 save: HPET (XEN) HVM2 save: PMTIMER (XEN) HVM2 save: MTRR (XEN) HVM2 save: VIRIDIAN_DOMAIN (XEN) HVM2 save: CPU_XSAVE (XEN) HVM2 save: VIRIDIAN_VCPU (XEN) HVM2 save: VMCE_VCPU (XEN) HVM2 save: TSC_ADJUST (XEN) HVM2 restore: CPU 0 Quan
On October 11, 2016 7:11 PM, Xuquan < xuquan8@huawei.com > wrote: >On October 11, 2016 3:49 PM, Tian, Kevin <Kevin.tian@intel.com> >>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >>> Sent: Monday, October 10, 2016 6:49 PM >>> >>> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote: >>> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: >>> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c >>> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c >>> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic >>> >>>>>> > *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { >>> >>>>>> > struct domain *d = vlapic_domain(vlapic); >>> >>>>>> > + struct vcpu *v = vlapic_vcpu(vlapic); >>> >>>>>> > + struct hvm_intack pt_intack; >>> >>>>>> > + >>> >>>>>> > + pt_intack.vector = vector; >>> >>>>>> > + pt_intack.source = hvm_intsrc_lapic; >>> >>>>>> > + pt_intr_post(v, pt_intack); >>> >>>>>> >>> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the >>> >>>>>> change then also affects non-apicv environments. >>> >>>>> >>> >>>>>The new logic should be entered only when EOI-induced exit happens. >>> >>>>> >>> >>>> >>> >>>> Yes, more that the EOI-induced exit is conditional, >>> >>>> specifically, the bitmap is set by vmx_set_eoi_exit_bitmap(). >>> >>>> Jan, what do you think? While I recall from v1 discussion, you >>> >>>> have the same comment. We can dig it deep.. >>> >>> >>> >>>See my reply to Kevin sent a minute ago. As I'm not sure what >>> >>>Kevin means to state with several of his responses, I can't >>> >>>properly respond for now. And then what you say doesn't really >>> >>>address my concern - things being conditional elsewhere doesn't >>> >>>mean we won't get here too in the non-apicv case, at least not in >>> >>>a way that I can follow >>right away. >>> >> >>> >> Jan, any idea now? >>> > >>> >I don't think there was anything left open on the other sub-thread; >>> >if there is, please point out specific aspects which are still unclear. >>> > >>> >>> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7).. >>> I will read it again.. >>> >>> Quan >> >>Is there any discussion after 10.1? I didn't see it. >> >>Back to the main open before holiday - multiple EOIs may come to clear >>irq_issued before guest actually handles the very vpt injection >>(possible if vpt vector is shared with other sources). I don't see a >>good solution on that open... :/ >> >>We've discussed various options which all fail in one or another place >>- either miss an injection, or incur undesired injections. >>Possibly we should consider another direction - fall back to non-apicv >>path when we see vpt vector pending but it's not the highest one. >> >>Original condition to enter virtual intr delivery: >> else if ( cpu_has_vmx_virtual_intr_delivery && >> intack.source != hvm_intsrc_pic && >> intack.source != hvm_intsrc_vector ) >> >>now new condition: >> else if ( cpu_has_vmx_virtual_intr_delivery && >> intack.source != hvm_intsrc_pic && >> intack.source != hvm_intsrc_vector && >> (pt_vector == -1 || intack.vector == pt_vector) ) >> >>Thoughts? >> >Kevin, >When I try to fix it as your suggestion, I cannot boot the guest, with below >message(from xl dmesg): with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. the interrupt (PT or IPI, or others) can't deliver to guest.. and so far, we suppress MSR-based APIC suggestion when having APIC-V by http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec62a2818e64390ac2ccfbd74e6b7 so I think we couldn't fallback to non-apicv dynamically here.. Quan >(d1) HVM Loader >(d1) Detected Xen v4.8-unstable >(d1) Xenbus rings @0xfeffc000, event channel 1 >(d1) System requested SeaBIOS >(d1) CPU speed is 2394 MHz >(d1) Relocating guest memory for lowmem MMIO space disabled >(XEN) irq.c:275: Dom1 PCI link 0 changed 0 -> 5 >(d1) PCI-ISA link 0 routed to IRQ5 >(XEN) irq.c:275: Dom1 PCI link 1 changed 0 -> 10 >(d1) PCI-ISA link 1 routed to IRQ10 >(XEN) irq.c:275: Dom1 PCI link 2 changed 0 -> 11 >(d1) PCI-ISA link 2 routed to IRQ11 >(XEN) irq.c:275: Dom1 PCI link 3 changed 0 -> 5 >(d1) PCI-ISA link 3 routed to IRQ5 >(d1) pci dev 01:3 INTA->IRQ10 >(d1) pci dev 02:0 INTA->IRQ11 >(d1) RAM in high memory; setting high_mem resource base to 20f800000 >(d1) pci dev 03:0 bar 10 size 002000000: 0f0000008 >(d1) pci dev 02:0 bar 14 size 001000000: 0f2000008 >(d1) pci dev 03:0 bar 30 size 000010000: 0f3000000 >(d1) pci dev 03:0 bar 14 size 000001000: 0f3010000 >(d1) pci dev 02:0 bar 10 size 000000100: 00000c001 >(d1) pci dev 01:1 bar 20 size 000000010: 00000c101 >(d1) Multiprocessor initialisation: >(d1) - CPU0 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. >(d1) - CPU1 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. >(d1) Testing HVM environment: >(d1) - REP INSB across page boundaries ... passed >(d1) - GS base MSRs and SWAPGS ... passed >(d1) Passed 2 of 2 tests >(d1) Writing SMBIOS tables ... >(d1) Loading SeaBIOS ... >(d1) Creating MP tables ... >(d1) Loading ACPI ... >(d1) vm86 TSS at fc00a300 >(d1) BIOS map: >(d1) 10000-100e3: Scratch space >(d1) c0000-fffff: Main BIOS >(d1) E820 table: >(d1) [00]: 00000000:00000000 - 00000000:000a0000: RAM >(d1) HOLE: 00000000:000a0000 - 00000000:000c0000 >(d1) [01]: 00000000:000c0000 - 00000000:00100000: RESERVED >(d1) [02]: 00000000:00100000 - 00000000:f0000000: RAM >(d1) HOLE: 00000000:f0000000 - 00000000:fc000000 >(d1) [03]: 00000000:fc000000 - 00000001:00000000: RESERVED >(d1) [04]: 00000001:00000000 - 00000002:0f800000: RAM >(d1) Invoking SeaBIOS ... >(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e) >(d1) BUILD: gcc: (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] binutils: >(GNU >(d1) Binutils; SUSE Linux Enterprise 11) 2.23.1 >(d1) >(d1) Found Xen hypervisor signature at 40000000 >(d1) Running on QEMU (i440fx) >(d1) xen: copy e820... >(d1) Relocating init from 0x000d8fa0 to 0xeffabc40 (size 82736) >(d1) Found 6 PCI devices (max PCI bus is 00) >(d1) Allocated Xen hypercall page at effff000 >(d1) Detected Xen v4.8-unstable >(d1) xen: copy BIOS tables... >(d1) Copying SMBIOS entry point from 0x00010020 to 0x000f5b60 >(d1) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f5a60 >(d1) Copying PIR from 0x00010040 to 0x000f59e0 >(d1) Copying ACPI RSDP from 0x000100c0 to 0x000f59b0 >(d1) Using pmtimer, ioport 0xb008 >(d1) Scan for VGA option rom >(d1) Running option rom at c000:0003 >(XEN) stdvga.c:174:d1v0 entering stdvga mode >(d1) pmm call arg1=0 >(d1) Turning on vga text mode console >(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e) >(d1) Machine UUID 59e20ef4-565a-49cb-9559-cde6d391cdf4 >(d1) All threads complete. >(d1) Found 0 lpt ports >(d1) Found 0 serial ports >(d1) ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9) >(d1) ATA controller 2 at 170/374/0 (irq 15 dev 9) >(d1) PS2 keyboard initialized >(d1) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (30720 MiBytes) >(d1) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0 >(d1) All threads complete. >(d1) Scan for option roms >(d1) >(d1) Press ESC for boot menu. >(d1) >(d1) Searching bootorder for: HALT >(d1) drive 0x000f5940: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 >s=62914561 >(d1) Space available for UMB: c9800-ec800, f5380-f5940 >(d1) Returned 258048 bytes of ZoneHigh >(d1) e820 map has 7 items: >(d1) 0: 0000000000000000 - 000000000009fc00 = 1 RAM >(d1) 1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED >(d1) 2: 00000000000f0000 - 0000000000100000 = 2 RESERVED >(d1) 3: 0000000000100000 - 00000000effff000 = 1 RAM >(d1) 4: 00000000effff000 - 00000000f0000000 = 2 RESERVED >(d1) 5: 00000000fc000000 - 0000000100000000 = 2 RESERVED >(d1) 6: 0000000100000000 - 000000020f800000 = 1 RAM >(d1) enter handle_19: >(d1) NULL >(d1) Booting from Hard Disk... >(d1) Booting from 0000:7c00 >(XEN) stdvga.c:179:d1v0 leaving stdvga mode >(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0). >(XEN) ************* VMCS Area ************** >(XEN) *** Guest State *** >(XEN) CR0: actual=0x0000000080010031, shadow=0x0000000080010031, >gh_mask=ffffffffffffffff >(XEN) CR4: actual=0x00000000000426f8, shadow=0x00000000000406b8, >gh_mask=ffffffffffffffff >(XEN) CR3 = 0x0000000000185000 >(XEN) PDPTE0 = 0x0000000000186001 PDPTE1 = 0x0000000000187001 >(XEN) PDPTE2 = 0x0000000000188001 PDPTE3 = 0x0000000000189001 >(XEN) RSP = 0x000000008ce53a6c (0x000000008ce53a6c) RIP = >0x000000008161dd21 (0x000000008161dd21) >(XEN) RFLAGS=0x00200046 (0x00200046) DR7 = 0x0000000000000400 >(XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:000000008168c0c0 >(XEN) sel attr limit base >(XEN) CS: 0008 0c09b ffffffff 0000000000000000 >(XEN) DS: 0023 0c0f3 ffffffff 0000000000000000 >(XEN) SS: 0010 0c093 ffffffff 0000000000000000 >(XEN) ES: 0023 0c0f3 ffffffff 0000000000000000 >(XEN) FS: 0030 04093 00003748 0000000081779c00 >(XEN) GS: 0000 1c000 ffffffff 0000000000000000 >(XEN) GDTR: 000003ff 0000000081553000 >(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000 >(XEN) IDTR: 000007ff 0000000081553400 >(XEN) TR: 0028 0008b 000020ab 00000000801ad000 >(XEN) EFER = 0x0000000000000000 PAT = 0x0007010600070106 >(XEN) PreemptionTimer = 0x00000000 SM Base = 0x00000000 >(XEN) DebugCtl = 0x0000000000000000 DebugExceptions = >0x0000000000000000 >(XEN) Interruptibility = 00000000 ActivityState = 00000000 >(XEN) InterruptStatus = d100 >(XEN) *** Host State *** >(XEN) RIP = 0xffff82d0801ff560 (vmx_asm_vmexit_handler) RSP = >0xffff83187e20ff90 >(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040 >(XEN) FSBase=0000000000000000 GSBase=0000000000000000 >TRBase=ffff830839dfec00 >(XEN) GDTBase=ffff83187e36f000 IDTBase=ffff83187e37b000 >(XEN) CR0=0000000080050033 CR3=0000000821f6e000 >CR4=00000000001526e0 >(XEN) Sysenter RSP=ffff83187e20ffc0 CS:RIP=e008:ffff82d080244a00 >(XEN) EFER = 0x0000000000000000 PAT = 0x0000050100070406 >(XEN) *** Control State *** >(XEN) PinBased=000000bf CPUBased=b6a065fa SecondaryExec=0000576b >(XEN) EntryControls=000051ff ExitControls=000fefff >(XEN) ExceptionBitmap=00060002 PFECmask=00000000 PFECmatch=00000000 >(XEN) VMEntry: intr_info=800000e1 errcode=00000000 ilen=00000000 >(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000006 >(XEN) reason=80000021 qualification=0000000000000000 >(XEN) IDTVectoring: info=00000000 errcode=00000000 >(XEN) TSC Offset = 0xfffd7fc7c2ca5a1c TSC Multiplier = 0x0000000000000000 >(XEN) TPR Threshold = 0x00 PostedIntrVec = 0xf4 >(XEN) EPT pointer = 0x0000000821f8501e EPTP index = 0x0000 >(XEN) PLE Gap=00000080 Window=00001000 >(XEN) Virtual processor ID = 0x5f7a VMfunc controls = 0000000000000000 >(XEN) ************************************** >(XEN) domain_crash called from vmx.c:3111 >(XEN) Domain 1 (vcpu#0) crashed on cpu#12: >(XEN) ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]---- >(XEN) CPU: 12 >(XEN) RIP: 0008:[<000000008161dd21>] >(XEN) RFLAGS: 0000000000200046 CONTEXT: hvm guest (d1v0) >(XEN) rax: 00000000000c0000 rbx: 000000000000000a rcx: >00000000000c00d1 >(XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: >000000008ce53acc >(XEN) rbp: 000000008ce53a6c rsp: 000000008ce53a6c r8: >0000000000000000 >(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: >0000000000000000 >(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: >0000000000000000 >(XEN) r15: 0000000000000000 cr0: 0000000080010031 cr4: >00000000000406b8 >(XEN) cr3: 0000000000185000 cr2: 000000008cc09000 >(XEN) ds: 0023 es: 0023 fs: 0030 gs: 0000 ss: 0010 cs: 0008 >(XEN) HVM2 save: CPU >(XEN) HVM2 save: PIC >(XEN) HVM2 save: IOAPIC >(XEN) HVM2 save: LAPIC >(XEN) HVM2 save: LAPIC_REGS >(XEN) HVM2 save: PCI_IRQ >(XEN) HVM2 save: ISA_IRQ >(XEN) HVM2 save: PCI_LINK >(XEN) HVM2 save: PIT >(XEN) HVM2 save: RTC >(XEN) HVM2 save: HPET >(XEN) HVM2 save: PMTIMER >(XEN) HVM2 save: MTRR >(XEN) HVM2 save: VIRIDIAN_DOMAIN >(XEN) HVM2 save: CPU_XSAVE >(XEN) HVM2 save: VIRIDIAN_VCPU >(XEN) HVM2 save: VMCE_VCPU >(XEN) HVM2 save: TSC_ADJUST >(XEN) HVM2 restore: CPU 0 > > >Quan
Sorry I haven't got on this thread yet, but want you know it's on my list (hopefully will respond next week). > -----Original Message----- > From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, October 17, 2016 5:28 PM > To: Xuquan (Quan Xu); Tian, Kevin; Jan Beulich > Cc: Andrew Cooper; George.Dunlap@eu.citrix.com; yang.zhang.wz@gmail.com; > Hanweidong (Randy); Jiangyifei; Nakajima, Jun; xen-devel@lists.xen.org; > konrad.wilk@oracle.com; Tim Deegan > Subject: RE: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue > > On October 11, 2016 7:11 PM, Xuquan < xuquan8@huawei.com > wrote: > >On October 11, 2016 3:49 PM, Tian, Kevin <Kevin.tian@intel.com> > >>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >>> Sent: Monday, October 10, 2016 6:49 PM > >>> > >>> On October 10, 2016 5:40 PM, Jan Beulich < JBeulich@suse.com > wrote: > >>> >>>>>> >>> On 20.09.16 at 15:30, <xuquan8@huawei.com> wrote: > >>> >>>>>> > --- a/xen/arch/x86/hvm/vlapic.c > >>> >>>>>> > +++ b/xen/arch/x86/hvm/vlapic.c > >>> >>>>>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic > >>> >>>>>> > *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { > >>> >>>>>> > struct domain *d = vlapic_domain(vlapic); > >>> >>>>>> > + struct vcpu *v = vlapic_vcpu(vlapic); > >>> >>>>>> > + struct hvm_intack pt_intack; > >>> >>>>>> > + > >>> >>>>>> > + pt_intack.vector = vector; > >>> >>>>>> > + pt_intack.source = hvm_intsrc_lapic; > >>> >>>>>> > + pt_intr_post(v, pt_intack); > >>> >>>>>> > >>> >>>>>> This also sits on the EOI LAPIC register write path, i.e. the > >>> >>>>>> change then also affects non-apicv environments. > >>> >>>>> > >>> >>>>>The new logic should be entered only when EOI-induced exit happens. > >>> >>>>> > >>> >>>> > >>> >>>> Yes, more that the EOI-induced exit is conditional, > >>> >>>> specifically, the bitmap is set by vmx_set_eoi_exit_bitmap(). > >>> >>>> Jan, what do you think? While I recall from v1 discussion, you > >>> >>>> have the same comment. We can dig it deep.. > >>> >>> > >>> >>>See my reply to Kevin sent a minute ago. As I'm not sure what > >>> >>>Kevin means to state with several of his responses, I can't > >>> >>>properly respond for now. And then what you say doesn't really > >>> >>>address my concern - things being conditional elsewhere doesn't > >>> >>>mean we won't get here too in the non-apicv case, at least not in > >>> >>>a way that I can follow > >>right away. > >>> >> > >>> >> Jan, any idea now? > >>> > > >>> >I don't think there was anything left open on the other sub-thread; > >>> >if there is, please point out specific aspects which are still unclear. > >>> > > >>> > >>> Sorry, I overlooked the other sub-thread after holiday(10.1-10.7).. > >>> I will read it again.. > >>> > >>> Quan > >> > >>Is there any discussion after 10.1? I didn't see it. > >> > >>Back to the main open before holiday - multiple EOIs may come to clear > >>irq_issued before guest actually handles the very vpt injection > >>(possible if vpt vector is shared with other sources). I don't see a > >>good solution on that open... :/ > >> > >>We've discussed various options which all fail in one or another place > >>- either miss an injection, or incur undesired injections. > >>Possibly we should consider another direction - fall back to non-apicv > >>path when we see vpt vector pending but it's not the highest one. > >> > >>Original condition to enter virtual intr delivery: > >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> intack.source != hvm_intsrc_pic && > >> intack.source != hvm_intsrc_vector ) > >> > >>now new condition: > >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> intack.source != hvm_intsrc_pic && > >> intack.source != hvm_intsrc_vector && > >> (pt_vector == -1 || intack.vector == pt_vector) ) > >> > >>Thoughts? > >> > >Kevin, > >When I try to fix it as your suggestion, I cannot boot the guest, with below > >message(from xl dmesg): > > with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() -> > __vmx_inject_exception()' to inject exception, then vm-entry on loop.. > the interrupt (PT or IPI, or others) can't deliver to guest.. > > and so far, we suppress MSR-based APIC suggestion when having APIC-V by > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec62a2818e643 > 90ac2ccfbd74e6b7 > so I think we couldn't fallback to non-apicv dynamically here.. > > Quan > > > > >(d1) HVM Loader > >(d1) Detected Xen v4.8-unstable > >(d1) Xenbus rings @0xfeffc000, event channel 1 > >(d1) System requested SeaBIOS > >(d1) CPU speed is 2394 MHz > >(d1) Relocating guest memory for lowmem MMIO space disabled > >(XEN) irq.c:275: Dom1 PCI link 0 changed 0 -> 5 > >(d1) PCI-ISA link 0 routed to IRQ5 > >(XEN) irq.c:275: Dom1 PCI link 1 changed 0 -> 10 > >(d1) PCI-ISA link 1 routed to IRQ10 > >(XEN) irq.c:275: Dom1 PCI link 2 changed 0 -> 11 > >(d1) PCI-ISA link 2 routed to IRQ11 > >(XEN) irq.c:275: Dom1 PCI link 3 changed 0 -> 5 > >(d1) PCI-ISA link 3 routed to IRQ5 > >(d1) pci dev 01:3 INTA->IRQ10 > >(d1) pci dev 02:0 INTA->IRQ11 > >(d1) RAM in high memory; setting high_mem resource base to 20f800000 > >(d1) pci dev 03:0 bar 10 size 002000000: 0f0000008 > >(d1) pci dev 02:0 bar 14 size 001000000: 0f2000008 > >(d1) pci dev 03:0 bar 30 size 000010000: 0f3000000 > >(d1) pci dev 03:0 bar 14 size 000001000: 0f3010000 > >(d1) pci dev 02:0 bar 10 size 000000100: 00000c001 > >(d1) pci dev 01:1 bar 20 size 000000010: 00000c101 > >(d1) Multiprocessor initialisation: > >(d1) - CPU0 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. > >(d1) - CPU1 ... 46-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. > >(d1) Testing HVM environment: > >(d1) - REP INSB across page boundaries ... passed > >(d1) - GS base MSRs and SWAPGS ... passed > >(d1) Passed 2 of 2 tests > >(d1) Writing SMBIOS tables ... > >(d1) Loading SeaBIOS ... > >(d1) Creating MP tables ... > >(d1) Loading ACPI ... > >(d1) vm86 TSS at fc00a300 > >(d1) BIOS map: > >(d1) 10000-100e3: Scratch space > >(d1) c0000-fffff: Main BIOS > >(d1) E820 table: > >(d1) [00]: 00000000:00000000 - 00000000:000a0000: RAM > >(d1) HOLE: 00000000:000a0000 - 00000000:000c0000 > >(d1) [01]: 00000000:000c0000 - 00000000:00100000: RESERVED > >(d1) [02]: 00000000:00100000 - 00000000:f0000000: RAM > >(d1) HOLE: 00000000:f0000000 - 00000000:fc000000 > >(d1) [03]: 00000000:fc000000 - 00000001:00000000: RESERVED > >(d1) [04]: 00000001:00000000 - 00000002:0f800000: RAM > >(d1) Invoking SeaBIOS ... > >(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e) > >(d1) BUILD: gcc: (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973] binutils: > >(GNU > >(d1) Binutils; SUSE Linux Enterprise 11) 2.23.1 > >(d1) > >(d1) Found Xen hypervisor signature at 40000000 > >(d1) Running on QEMU (i440fx) > >(d1) xen: copy e820... > >(d1) Relocating init from 0x000d8fa0 to 0xeffabc40 (size 82736) > >(d1) Found 6 PCI devices (max PCI bus is 00) > >(d1) Allocated Xen hypercall page at effff000 > >(d1) Detected Xen v4.8-unstable > >(d1) xen: copy BIOS tables... > >(d1) Copying SMBIOS entry point from 0x00010020 to 0x000f5b60 > >(d1) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f5a60 > >(d1) Copying PIR from 0x00010040 to 0x000f59e0 > >(d1) Copying ACPI RSDP from 0x000100c0 to 0x000f59b0 > >(d1) Using pmtimer, ioport 0xb008 > >(d1) Scan for VGA option rom > >(d1) Running option rom at c000:0003 > >(XEN) stdvga.c:174:d1v0 entering stdvga mode > >(d1) pmm call arg1=0 > >(d1) Turning on vga text mode console > >(d1) SeaBIOS (version rel-1.9.3-0-ge2fc41e) > >(d1) Machine UUID 59e20ef4-565a-49cb-9559-cde6d391cdf4 > >(d1) All threads complete. > >(d1) Found 0 lpt ports > >(d1) Found 0 serial ports > >(d1) ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9) > >(d1) ATA controller 2 at 170/374/0 (irq 15 dev 9) > >(d1) PS2 keyboard initialized > >(d1) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (30720 MiBytes) > >(d1) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0 > >(d1) All threads complete. > >(d1) Scan for option roms > >(d1) > >(d1) Press ESC for boot menu. > >(d1) > >(d1) Searching bootorder for: HALT > >(d1) drive 0x000f5940: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 > >s=62914561 > >(d1) Space available for UMB: c9800-ec800, f5380-f5940 > >(d1) Returned 258048 bytes of ZoneHigh > >(d1) e820 map has 7 items: > >(d1) 0: 0000000000000000 - 000000000009fc00 = 1 RAM > >(d1) 1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED > >(d1) 2: 00000000000f0000 - 0000000000100000 = 2 RESERVED > >(d1) 3: 0000000000100000 - 00000000effff000 = 1 RAM > >(d1) 4: 00000000effff000 - 00000000f0000000 = 2 RESERVED > >(d1) 5: 00000000fc000000 - 0000000100000000 = 2 RESERVED > >(d1) 6: 0000000100000000 - 000000020f800000 = 1 RAM > >(d1) enter handle_19: > >(d1) NULL > >(d1) Booting from Hard Disk... > >(d1) Booting from 0000:7c00 > >(XEN) stdvga.c:179:d1v0 leaving stdvga mode > >(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0). > >(XEN) ************* VMCS Area ************** > >(XEN) *** Guest State *** > >(XEN) CR0: actual=0x0000000080010031, shadow=0x0000000080010031, > >gh_mask=ffffffffffffffff > >(XEN) CR4: actual=0x00000000000426f8, shadow=0x00000000000406b8, > >gh_mask=ffffffffffffffff > >(XEN) CR3 = 0x0000000000185000 > >(XEN) PDPTE0 = 0x0000000000186001 PDPTE1 = 0x0000000000187001 > >(XEN) PDPTE2 = 0x0000000000188001 PDPTE3 = 0x0000000000189001 > >(XEN) RSP = 0x000000008ce53a6c (0x000000008ce53a6c) RIP = > >0x000000008161dd21 (0x000000008161dd21) > >(XEN) RFLAGS=0x00200046 (0x00200046) DR7 = 0x0000000000000400 > >(XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:000000008168c0c0 > >(XEN) sel attr limit base > >(XEN) CS: 0008 0c09b ffffffff 0000000000000000 > >(XEN) DS: 0023 0c0f3 ffffffff 0000000000000000 > >(XEN) SS: 0010 0c093 ffffffff 0000000000000000 > >(XEN) ES: 0023 0c0f3 ffffffff 0000000000000000 > >(XEN) FS: 0030 04093 00003748 0000000081779c00 > >(XEN) GS: 0000 1c000 ffffffff 0000000000000000 > >(XEN) GDTR: 000003ff 0000000081553000 > >(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000 > >(XEN) IDTR: 000007ff 0000000081553400 > >(XEN) TR: 0028 0008b 000020ab 00000000801ad000 > >(XEN) EFER = 0x0000000000000000 PAT = 0x0007010600070106 > >(XEN) PreemptionTimer = 0x00000000 SM Base = 0x00000000 > >(XEN) DebugCtl = 0x0000000000000000 DebugExceptions = > >0x0000000000000000 > >(XEN) Interruptibility = 00000000 ActivityState = 00000000 > >(XEN) InterruptStatus = d100 > >(XEN) *** Host State *** > >(XEN) RIP = 0xffff82d0801ff560 (vmx_asm_vmexit_handler) RSP = > >0xffff83187e20ff90 > >(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040 > >(XEN) FSBase=0000000000000000 GSBase=0000000000000000 > >TRBase=ffff830839dfec00 > >(XEN) GDTBase=ffff83187e36f000 IDTBase=ffff83187e37b000 > >(XEN) CR0=0000000080050033 CR3=0000000821f6e000 > >CR4=00000000001526e0 > >(XEN) Sysenter RSP=ffff83187e20ffc0 CS:RIP=e008:ffff82d080244a00 > >(XEN) EFER = 0x0000000000000000 PAT = 0x0000050100070406 > >(XEN) *** Control State *** > >(XEN) PinBased=000000bf CPUBased=b6a065fa SecondaryExec=0000576b > >(XEN) EntryControls=000051ff ExitControls=000fefff > >(XEN) ExceptionBitmap=00060002 PFECmask=00000000 PFECmatch=00000000 > >(XEN) VMEntry: intr_info=800000e1 errcode=00000000 ilen=00000000 > >(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000006 > >(XEN) reason=80000021 qualification=0000000000000000 > >(XEN) IDTVectoring: info=00000000 errcode=00000000 > >(XEN) TSC Offset = 0xfffd7fc7c2ca5a1c TSC Multiplier = 0x0000000000000000 > >(XEN) TPR Threshold = 0x00 PostedIntrVec = 0xf4 > >(XEN) EPT pointer = 0x0000000821f8501e EPTP index = 0x0000 > >(XEN) PLE Gap=00000080 Window=00001000 > >(XEN) Virtual processor ID = 0x5f7a VMfunc controls = 0000000000000000 > >(XEN) ************************************** > >(XEN) domain_crash called from vmx.c:3111 > >(XEN) Domain 1 (vcpu#0) crashed on cpu#12: > >(XEN) ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]---- > >(XEN) CPU: 12 > >(XEN) RIP: 0008:[<000000008161dd21>] > >(XEN) RFLAGS: 0000000000200046 CONTEXT: hvm guest (d1v0) > >(XEN) rax: 00000000000c0000 rbx: 000000000000000a rcx: > >00000000000c00d1 > >(XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: > >000000008ce53acc > >(XEN) rbp: 000000008ce53a6c rsp: 000000008ce53a6c r8: > >0000000000000000 > >(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: > >0000000000000000 > >(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: > >0000000000000000 > >(XEN) r15: 0000000000000000 cr0: 0000000080010031 cr4: > >00000000000406b8 > >(XEN) cr3: 0000000000185000 cr2: 000000008cc09000 > >(XEN) ds: 0023 es: 0023 fs: 0030 gs: 0000 ss: 0010 cs: 0008 > >(XEN) HVM2 save: CPU > >(XEN) HVM2 save: PIC > >(XEN) HVM2 save: IOAPIC > >(XEN) HVM2 save: LAPIC > >(XEN) HVM2 save: LAPIC_REGS > >(XEN) HVM2 save: PCI_IRQ > >(XEN) HVM2 save: ISA_IRQ > >(XEN) HVM2 save: PCI_LINK > >(XEN) HVM2 save: PIT > >(XEN) HVM2 save: RTC > >(XEN) HVM2 save: HPET > >(XEN) HVM2 save: PMTIMER > >(XEN) HVM2 save: MTRR > >(XEN) HVM2 save: VIRIDIAN_DOMAIN > >(XEN) HVM2 save: CPU_XSAVE > >(XEN) HVM2 save: VIRIDIAN_VCPU > >(XEN) HVM2 save: VMCE_VCPU > >(XEN) HVM2 save: TSC_ADJUST > >(XEN) HVM2 restore: CPU 0 > > > > > >Quan
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, October 17, 2016 5:28 PM > > >> > >>Back to the main open before holiday - multiple EOIs may come to clear > >>irq_issued before guest actually handles the very vpt injection > >>(possible if vpt vector is shared with other sources). I don't see a > >>good solution on that open... :/ > >> > >>We've discussed various options which all fail in one or another place > >>- either miss an injection, or incur undesired injections. > >>Possibly we should consider another direction - fall back to non-apicv > >>path when we see vpt vector pending but it's not the highest one. > >> > >>Original condition to enter virtual intr delivery: > >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> intack.source != hvm_intsrc_pic && > >> intack.source != hvm_intsrc_vector ) > >> > >>now new condition: > >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> intack.source != hvm_intsrc_pic && > >> intack.source != hvm_intsrc_vector && > >> (pt_vector == -1 || intack.vector == pt_vector) ) > >> > >>Thoughts? > >> > >Kevin, > >When I try to fix it as your suggestion, I cannot boot the guest, with below > >message(from xl dmesg): > > with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() -> > __vmx_inject_exception()' to inject exception, then vm-entry on loop.. > the interrupt (PT or IPI, or others) can't deliver to guest.. > > and so far, we suppress MSR-based APIC suggestion when having APIC-V by > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec62a2818e643 > 90ac2ccfbd74e6b7 > so I think we couldn't fallback to non-apicv dynamically here.. > What about setting EOI exit bitmap for intack.vector when it's higher than pending pt_vector? This way we can guarantee there's always a chance to post pt_vector when pt_vector becomes the highest one... (of course you need make later pt_intr_post conditionally then, only when intack.vector==pt_vector) Thanks Kevin
On October 24, 2016 3:02 PM, Tian, Kevin wrote: >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> Sent: Monday, October 17, 2016 5:28 PM >> >> >> >> >>Back to the main open before holiday - multiple EOIs may come to >> >>clear irq_issued before guest actually handles the very vpt >> >>injection (possible if vpt vector is shared with other sources). I >> >>don't see a good solution on that open... :/ >> >> >> >>We've discussed various options which all fail in one or another >> >>place >> >>- either miss an injection, or incur undesired injections. >> >>Possibly we should consider another direction - fall back to >> >>non-apicv path when we see vpt vector pending but it's not the highest one. >> >> >> >>Original condition to enter virtual intr delivery: >> >> else if ( cpu_has_vmx_virtual_intr_delivery && >> >> intack.source != hvm_intsrc_pic && >> >> intack.source != hvm_intsrc_vector ) >> >> >> >>now new condition: >> >> else if ( cpu_has_vmx_virtual_intr_delivery && >> >> intack.source != hvm_intsrc_pic && >> >> intack.source != hvm_intsrc_vector && >> >> (pt_vector == -1 || intack.vector == pt_vector) ) >> >> >> >>Thoughts? >> >> >> >Kevin, >> >When I try to fix it as your suggestion, I cannot boot the guest, >> >with below message(from xl dmesg): >> >> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. >> the interrupt (PT or IPI, or others) can't deliver to guest.. >> >> and so far, we suppress MSR-based APIC suggestion when having APIC-V >> by >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec6 >> 2a2818e643 >> 90ac2ccfbd74e6b7 >> so I think we couldn't fallback to non-apicv dynamically here.. >> > >What about setting EOI exit bitmap for intack.vector when it's higher than >pending pt_vector? This way we can guarantee there's always a chance to post >pt_vector when pt_vector becomes the highest one... > >(of course you need make later pt_intr_post conditionally then, only when >intack.vector==pt_vector) Kevin, thanks for your positive reply. I have returned back that server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on time. By my understanding, ''Virtual-Interrupt Delivery'' and "EOI Virtualization" were independent of each other. Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, we still can't guarantee the PT interrupt is delivered to guest from the highest one delivered to the highest one EOI-induced VM exit.. I am afraid we could find a clean solution based on current implementation (kvm is ok).. and apicv results in decreased application performance for some windows guest. So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for windows guest.. this is a workaround.. Quan
On October 25, 2016 4:36 PM, Quan Xu wrote: >I am afraid we could find a clean solution based on current implementation (kvm Sorry, a typo.. s/could/could not/ >is ok).. and apicv results in decreased application performance for some windows >guest. > >So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for >windows guest.. this is a workaround.. > >Quan
On Tue, Oct 25, 2016 at 08:36:23AM +0000, Xuquan (Quan Xu) wrote: > On October 24, 2016 3:02 PM, Tian, Kevin wrote: > >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> Sent: Monday, October 17, 2016 5:28 PM > >> > >> >> > >> >>Back to the main open before holiday - multiple EOIs may come to > >> >>clear irq_issued before guest actually handles the very vpt > >> >>injection (possible if vpt vector is shared with other sources). I > >> >>don't see a good solution on that open... :/ > >> >> > >> >>We've discussed various options which all fail in one or another > >> >>place > >> >>- either miss an injection, or incur undesired injections. > >> >>Possibly we should consider another direction - fall back to > >> >>non-apicv path when we see vpt vector pending but it's not the highest one. > >> >> > >> >>Original condition to enter virtual intr delivery: > >> >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> >> intack.source != hvm_intsrc_pic && > >> >> intack.source != hvm_intsrc_vector ) > >> >> > >> >>now new condition: > >> >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> >> intack.source != hvm_intsrc_pic && > >> >> intack.source != hvm_intsrc_vector && > >> >> (pt_vector == -1 || intack.vector == pt_vector) ) > >> >> > >> >>Thoughts? > >> >> > >> >Kevin, > >> >When I try to fix it as your suggestion, I cannot boot the guest, > >> >with below message(from xl dmesg): > >> > >> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() > >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. > >> the interrupt (PT or IPI, or others) can't deliver to guest.. > >> > >> and so far, we suppress MSR-based APIC suggestion when having APIC-V > >> by > >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec6 > >> 2a2818e643 > >> 90ac2ccfbd74e6b7 > >> so I think we couldn't fallback to non-apicv dynamically here.. > >> > > > >What about setting EOI exit bitmap for intack.vector when it's higher than > >pending pt_vector? This way we can guarantee there's always a chance to post > >pt_vector when pt_vector becomes the highest one... > > > >(of course you need make later pt_intr_post conditionally then, only when > >intack.vector==pt_vector) > > Kevin, thanks for your positive reply. I have returned back that server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't > Verify it on time. > > By my understanding, ''Virtual-Interrupt Delivery'' and "EOI Virtualization" were independent of each other. > Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, we still can't guarantee the PT interrupt is delivered to guest > from the highest one delivered to the highest one EOI-induced VM exit.. > > I am afraid we could find a clean solution based on current implementation (kvm is ok).. and apicv results in decreased application performance for some windows guest. > Could you describe what the KVM implementation solution did that made it clean? I am curious whether the concept could be put in Xen? > So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for windows guest.. this is a workaround.. > > Quan
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Tuesday, October 25, 2016 4:36 PM > > On October 24, 2016 3:02 PM, Tian, Kevin wrote: > >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> Sent: Monday, October 17, 2016 5:28 PM > >> > >> >> > >> >>Back to the main open before holiday - multiple EOIs may come to > >> >>clear irq_issued before guest actually handles the very vpt > >> >>injection (possible if vpt vector is shared with other sources). I > >> >>don't see a good solution on that open... :/ > >> >> > >> >>We've discussed various options which all fail in one or another > >> >>place > >> >>- either miss an injection, or incur undesired injections. > >> >>Possibly we should consider another direction - fall back to > >> >>non-apicv path when we see vpt vector pending but it's not the highest one. > >> >> > >> >>Original condition to enter virtual intr delivery: > >> >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> >> intack.source != hvm_intsrc_pic && > >> >> intack.source != hvm_intsrc_vector ) > >> >> > >> >>now new condition: > >> >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> >> intack.source != hvm_intsrc_pic && > >> >> intack.source != hvm_intsrc_vector && > >> >> (pt_vector == -1 || intack.vector == pt_vector) ) > >> >> > >> >>Thoughts? > >> >> > >> >Kevin, > >> >When I try to fix it as your suggestion, I cannot boot the guest, > >> >with below message(from xl dmesg): > >> > >> with Kevin's patch, the hypervisor always calls ' vmx_inject_extint() > >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. > >> the interrupt (PT or IPI, or others) can't deliver to guest.. > >> > >> and so far, we suppress MSR-based APIC suggestion when having APIC-V > >> by > >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824ec6 > >> 2a2818e643 > >> 90ac2ccfbd74e6b7 > >> so I think we couldn't fallback to non-apicv dynamically here.. > >> > > > >What about setting EOI exit bitmap for intack.vector when it's higher than > >pending pt_vector? This way we can guarantee there's always a chance to post > >pt_vector when pt_vector becomes the highest one... > > > >(of course you need make later pt_intr_post conditionally then, only when > >intack.vector==pt_vector) > > Kevin, thanks for your positive reply. I have returned back that server (Intel(R) Xeon(R) > CPU E5-2620 v3), so I can't > Verify it on time. > > By my understanding, ''Virtual-Interrupt Delivery'' and "EOI Virtualization" were > independent of each other. > Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, we still can't > guarantee the PT interrupt is delivered to guest > from the highest one delivered to the highest one EOI-induced VM exit.. Can you elaborate why you think it doesn't work? I didn't get your point here. The idea here is that given above situation occurs - multiple pending vectors but pt_vector is not the highest - set EOI exit bitmap for highest vector. Then once guest EOI to the highest vector, a VM-exit happens and then if pt_vector happens to be the next highest vector, you have chance to pt_intr_post before resuming to the guest. EOI exit bitmap anyway is required for vpt to work correctly... > > I am afraid we could find a clean solution based on current implementation (kvm is ok).. > and apicv results in decreased application performance for some windows guest. > > So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable viridian for windows > guest.. this is a workaround.. > > Quan
On October 26, 2016 1:20 PM, Tian, Kevin wrote: >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> Sent: Tuesday, October 25, 2016 4:36 PM >> >> On October 24, 2016 3:02 PM, Tian, Kevin wrote: >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> >> Sent: Monday, October 17, 2016 5:28 PM >> >> >> >> >> >> >> >>Back to the main open before holiday - multiple EOIs may come to >> >> >>clear irq_issued before guest actually handles the very vpt >> >> >>injection (possible if vpt vector is shared with other sources). >> >> >>I don't see a good solution on that open... :/ >> >> >> >> >> >>We've discussed various options which all fail in one or another >> >> >>place >> >> >>- either miss an injection, or incur undesired injections. >> >> >>Possibly we should consider another direction - fall back to >> >> >>non-apicv path when we see vpt vector pending but it's not the highest >one. >> >> >> >> >> >>Original condition to enter virtual intr delivery: >> >> >> else if ( cpu_has_vmx_virtual_intr_delivery && >> >> >> intack.source != hvm_intsrc_pic && >> >> >> intack.source != hvm_intsrc_vector ) >> >> >> >> >> >>now new condition: >> >> >> else if ( cpu_has_vmx_virtual_intr_delivery && >> >> >> intack.source != hvm_intsrc_pic && >> >> >> intack.source != hvm_intsrc_vector && >> >> >> (pt_vector == -1 || intack.vector == pt_vector) ) >> >> >> >> >> >>Thoughts? >> >> >> >> >> >Kevin, >> >> >When I try to fix it as your suggestion, I cannot boot the guest, >> >> >with below message(from xl dmesg): >> >> >> >> with Kevin's patch, the hypervisor always calls ' >> >> vmx_inject_extint() >> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. >> >> the interrupt (PT or IPI, or others) can't deliver to guest.. >> >> >> >> and so far, we suppress MSR-based APIC suggestion when having >> >> APIC-V by >> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824 >> >> ec6 >> >> 2a2818e643 >> >> 90ac2ccfbd74e6b7 >> >> so I think we couldn't fallback to non-apicv dynamically here.. >> >> >> > >> >What about setting EOI exit bitmap for intack.vector when it's higher >> >than pending pt_vector? This way we can guarantee there's always a >> >chance to post pt_vector when pt_vector becomes the highest one... >> > >> >(of course you need make later pt_intr_post conditionally then, only >> >when >> >intack.vector==pt_vector) >> >> Kevin, thanks for your positive reply. I have returned back that >> server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on >> time. >> >> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI >> Virtualization" were independent of each other. >> Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, >> we still can't guarantee the PT interrupt is delivered to guest >> from the highest one delivered to the highest one >EOI-induced VM exit.. > >Can you elaborate why you think it doesn't work? I didn't get your point here. >The idea here is that given above situation occurs - multiple pending vectors but >pt_vector is not the highest - set EOI exit bitmap for highest vector. I understood your suggestion. >Then once >guest EOI to the highest vector, a VM-exit happens and then if pt_vector happens >to be the next highest vector, you have chance to pt_intr_post before resuming >to the guest. > The gap may be the count (pending_intr_nr) of pending periodic timer interrupt.. The highest vector is consumed as below: a). vIRR to RVI (software, vmx_intr_assist() ) b). RVI to SVI .etc, then deliver interrupt with Vector through IDT .. (hardware, Virtual-Interrupt Delivery ) c). EOI (hardware, EOI virtualization).. if we set EOI exit bitmap for highest vector, THEN cause EOI-induced VM exit.. if this is serial execution for each pending interrupt, your suggestion is working. But at step b), after delivered the highest vector, __hardware__ may continue to deliver vpt (if highest) to RVI and deliver it to guest as below " Virtual-Interrupt Delivery ", and without decreasing the count (pending_intr_nr) of pending periodic timer interrupt.. (( if Xen doesn't update periodic timer interrupt bit set in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, then Xen will deliver a periodic timer interrupt again)) .. Virtual-Interrupt Delivery >>>: Vector ← RVI; VISR[Vector] ← 1; SVI ← Vector; VPPR ← Vector & F0H; VIRR[Vector] ← 0; IF any bits set in VIRR THEN RVI ← highest index of bit set in VIRR (___if vpt is the highest index___) ELSE RVI ← 0; FI; deliver interrupt with Vector through IDT; cease recognition of any pending virtual interrupt; <<<< I think this is still the case, one pending vpt with multiple delivery. Sorry for my bad description!! Quan >EOI exit bitmap anyway is required for vpt to work correctly... > >> >> I am afraid we could find a clean solution based on current implementation >(kvm is ok).. >> and apicv results in decreased application performance for some windows >guest. >> >> So I suggest to configure ' msr_based_apic=1' / 'apicv = 0' to enable >> viridian for windows guest.. this is a workaround.. >> >> Quan
On October 25, 2016 9:01 PM, Konrad Rzeszutek Wilk < konrad.wilk@oracle.com > wrote: >> > >Could you describe what the KVM implementation solution did that made it clean? >I am curious whether the concept could be put in Xen? Konrad, I am still learning this part. I will update it later. Quan
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Wednesday, October 26, 2016 4:39 PM > > On October 26, 2016 1:20 PM, Tian, Kevin wrote: > >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> Sent: Tuesday, October 25, 2016 4:36 PM > >> > >> On October 24, 2016 3:02 PM, Tian, Kevin wrote: > >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> >> Sent: Monday, October 17, 2016 5:28 PM > >> >> > >> >> >> > >> >> >>Back to the main open before holiday - multiple EOIs may come to > >> >> >>clear irq_issued before guest actually handles the very vpt > >> >> >>injection (possible if vpt vector is shared with other sources). > >> >> >>I don't see a good solution on that open... :/ > >> >> >> > >> >> >>We've discussed various options which all fail in one or another > >> >> >>place > >> >> >>- either miss an injection, or incur undesired injections. > >> >> >>Possibly we should consider another direction - fall back to > >> >> >>non-apicv path when we see vpt vector pending but it's not the highest > >one. > >> >> >> > >> >> >>Original condition to enter virtual intr delivery: > >> >> >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> >> >> intack.source != hvm_intsrc_pic && > >> >> >> intack.source != hvm_intsrc_vector ) > >> >> >> > >> >> >>now new condition: > >> >> >> else if ( cpu_has_vmx_virtual_intr_delivery && > >> >> >> intack.source != hvm_intsrc_pic && > >> >> >> intack.source != hvm_intsrc_vector && > >> >> >> (pt_vector == -1 || intack.vector == pt_vector) ) > >> >> >> > >> >> >>Thoughts? > >> >> >> > >> >> >Kevin, > >> >> >When I try to fix it as your suggestion, I cannot boot the guest, > >> >> >with below message(from xl dmesg): > >> >> > >> >> with Kevin's patch, the hypervisor always calls ' > >> >> vmx_inject_extint() > >> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. > >> >> the interrupt (PT or IPI, or others) can't deliver to guest.. > >> >> > >> >> and so far, we suppress MSR-based APIC suggestion when having > >> >> APIC-V by > >> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b824 > >> >> ec6 > >> >> 2a2818e643 > >> >> 90ac2ccfbd74e6b7 > >> >> so I think we couldn't fallback to non-apicv dynamically here.. > >> >> > >> > > >> >What about setting EOI exit bitmap for intack.vector when it's higher > >> >than pending pt_vector? This way we can guarantee there's always a > >> >chance to post pt_vector when pt_vector becomes the highest one... > >> > > >> >(of course you need make later pt_intr_post conditionally then, only > >> >when > >> >intack.vector==pt_vector) > >> > >> Kevin, thanks for your positive reply. I have returned back that > >> server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on > >> time. > >> > >> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI > >> Virtualization" were independent of each other. > >> Even we set setting EOI exit bitmap here to cause EOI-induced VM exit, > >> we still can't guarantee the PT interrupt is delivered to guest > >> from the highest one delivered to the highest one > >EOI-induced VM exit.. > > > >Can you elaborate why you think it doesn't work? I didn't get your point here. > >The idea here is that given above situation occurs - multiple pending vectors but > >pt_vector is not the highest - set EOI exit bitmap for highest vector. > > I understood your suggestion. > > >Then once > >guest EOI to the highest vector, a VM-exit happens and then if pt_vector happens > >to be the next highest vector, you have chance to pt_intr_post before resuming > >to the guest. > > > > The gap may be the count (pending_intr_nr) of pending periodic timer interrupt.. > > > The highest vector is consumed as below: > > a). vIRR to RVI (software, vmx_intr_assist() ) > b). RVI to SVI .etc, then deliver interrupt with Vector through IDT .. (hardware, > Virtual-Interrupt Delivery ) > c). EOI (hardware, EOI virtualization).. if we set EOI exit bitmap for highest vector, THEN > cause EOI-induced VM exit.. > > if this is serial execution for each pending interrupt, your suggestion is working. > > But at step b), after delivered the highest vector, __hardware__ may continue to deliver > vpt (if highest) to RVI and deliver it to guest as below " Virtual-Interrupt Delivery ", > and without decreasing the count (pending_intr_nr) of pending periodic timer interrupt.. > > (( if Xen doesn't update periodic timer interrupt bit set > in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this > case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, > then Xen will deliver a periodic timer interrupt again)) > > .. > Virtual-Interrupt Delivery >>>: > Vector ← RVI; > VISR[Vector] ← 1; > SVI ← Vector; > VPPR ← Vector & F0H; > VIRR[Vector] ← 0; > IF any bits set in VIRR > THEN RVI ← highest index of bit set in VIRR (___if vpt is the highest index___) updating RVI doesn't mean delivering the virtual interrupt. There is no difference from how software updates that field before resuming to guest. > ELSE RVI ← 0; > FI; > deliver interrupt with Vector through IDT; > cease recognition of any pending virtual interrupt; please note recognition is ceased here. If you check 29.2.1 Evaluation of pending virtual interrupts, virtual interrupt is evaluated at VM entry, TPR virtualization, EOI virtualization, self-IPI virtualization, and posted-interrupt processing. and evaluation is as below: IF “interrupt-window exiting” is 0 AND RVI[7:4] > VPPR[7:4] (see Section 29.1.1 for definition of VPPR) THEN recognize a pending virtual interrupt; ELSE do not recognize a pending virtual interrupt; FI; Even an evaluation is triggered when highest vector is being handled (before EOI), RVI (with pt_vector) is smaller than VPPR (which is the current highest vector). So no injection of pt_vector will happen. Then later when highest vector is EOI-ed, a VM-exit happens immediately... Thanks Kevin
On October 26, 2016 5:35 PM, Tian, Kevin wrote: >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> Sent: Wednesday, October 26, 2016 4:39 PM >> >> On October 26, 2016 1:20 PM, Tian, Kevin wrote: >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> >> Sent: Tuesday, October 25, 2016 4:36 PM >> >> >> >> On October 24, 2016 3:02 PM, Tian, Kevin wrote: >> >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> >> >> Sent: Monday, October 17, 2016 5:28 PM >> >> >> >> >> >> >> >> >> >> >>Back to the main open before holiday - multiple EOIs may come >> >> >> >>to clear irq_issued before guest actually handles the very vpt >> >> >> >>injection (possible if vpt vector is shared with other sources). >> >> >> >>I don't see a good solution on that open... :/ >> >> >> >> >> >> >> >>We've discussed various options which all fail in one or >> >> >> >>another place >> >> >> >>- either miss an injection, or incur undesired injections. >> >> >> >>Possibly we should consider another direction - fall back to >> >> >> >>non-apicv path when we see vpt vector pending but it's not the >> >> >> >>highest >> >one. >> >> >> >> >> >> >> >>Original condition to enter virtual intr delivery: >> >> >> >> else if ( cpu_has_vmx_virtual_intr_delivery && >> >> >> >> intack.source != hvm_intsrc_pic && >> >> >> >> intack.source != hvm_intsrc_vector ) >> >> >> >> >> >> >> >>now new condition: >> >> >> >> else if ( cpu_has_vmx_virtual_intr_delivery && >> >> >> >> intack.source != hvm_intsrc_pic && >> >> >> >> intack.source != hvm_intsrc_vector && >> >> >> >> (pt_vector == -1 || intack.vector == pt_vector) ) >> >> >> >> >> >> >> >>Thoughts? >> >> >> >> >> >> >> >Kevin, >> >> >> >When I try to fix it as your suggestion, I cannot boot the >> >> >> >guest, with below message(from xl dmesg): >> >> >> >> >> >> with Kevin's patch, the hypervisor always calls ' >> >> >> vmx_inject_extint() >> >> >> -> __vmx_inject_exception()' to inject exception, then vm-entry on loop.. >> >> >> the interrupt (PT or IPI, or others) can't deliver to guest.. >> >> >> >> >> >> and so far, we suppress MSR-based APIC suggestion when having >> >> >> APIC-V by >> >> >> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=7f2e992b >> >> >> 824 >> >> >> ec6 >> >> >> 2a2818e643 >> >> >> 90ac2ccfbd74e6b7 >> >> >> so I think we couldn't fallback to non-apicv dynamically here.. >> >> >> >> >> > >> >> >What about setting EOI exit bitmap for intack.vector when it's >> >> >higher than pending pt_vector? This way we can guarantee there's >> >> >always a chance to post pt_vector when pt_vector becomes the highest >one... >> >> > >> >> >(of course you need make later pt_intr_post conditionally then, >> >> >only when >> >> >intack.vector==pt_vector) >> >> >> >> Kevin, thanks for your positive reply. I have returned back that >> >> server (Intel(R) Xeon(R) CPU E5-2620 v3), so I can't Verify it on >> >> time. >> >> >> >> By my understanding, ''Virtual-Interrupt Delivery'' and "EOI >> >> Virtualization" were independent of each other. >> >> Even we set setting EOI exit bitmap here to cause EOI-induced VM >> >> exit, we still can't guarantee the PT interrupt is delivered to guest >> >> from the highest one delivered to the highest one >> >EOI-induced VM exit.. >> > >> >Can you elaborate why you think it doesn't work? I didn't get your point here. >> >The idea here is that given above situation occurs - multiple pending >> >vectors but pt_vector is not the highest - set EOI exit bitmap for highest >vector. >> >> I understood your suggestion. >> >> >Then once >> >guest EOI to the highest vector, a VM-exit happens and then if >> >pt_vector happens to be the next highest vector, you have chance to >> >pt_intr_post before resuming to the guest. >> > >> >> The gap may be the count (pending_intr_nr) of pending periodic timer >interrupt.. >> >> >> The highest vector is consumed as below: >> >> a). vIRR to RVI (software, vmx_intr_assist() ) b). RVI to SVI .etc, >> then deliver interrupt with Vector through IDT .. (hardware, >> Virtual-Interrupt Delivery ) c). EOI (hardware, EOI virtualization).. >> if we set EOI exit bitmap for highest vector, THEN cause EOI-induced >> VM exit.. >> >> if this is serial execution for each pending interrupt, your suggestion is working. >> >> But at step b), after delivered the highest vector, __hardware__ may >> continue to deliver vpt (if highest) to RVI and deliver it to guest as >> below " Virtual-Interrupt Delivery ", and without decreasing the count >(pending_intr_nr) of pending periodic timer interrupt.. >> >> (( if Xen doesn't update periodic timer interrupt bit set in VIRR to >> guest interrupt status (RVI) directly, Xen is not aware of this case >> to decrease the count (pending_intr_nr) of pending periodic timer >> interrupt, then Xen will deliver a periodic timer interrupt again)) >> >> .. >> Virtual-Interrupt Delivery >>>: >> Vector ← RVI; >> VISR[Vector] ← 1; >> SVI ← Vector; >> VPPR ← Vector & F0H; >> VIRR[Vector] ← 0; >> IF any bits set in VIRR >> THEN RVI ← highest index of bit set in VIRR (___if vpt is the highest >> index___) > >updating RVI doesn't mean delivering the virtual interrupt. There is no difference >from how software updates that field before resuming to guest. > >> ELSE RVI ← 0; >> FI; >> deliver interrupt with Vector through IDT; cease recognition of any >> pending virtual interrupt; > >please note recognition is ceased here. > >If you check 29.2.1 Evaluation of pending virtual interrupts, virtual interrupt is >evaluated at VM entry, TPR virtualization, EOI virtualization, self-IPI virtualization, >and posted-interrupt processing. and evaluation is as below: > >IF “interrupt-window exiting” is 0 AND >RVI[7:4] > VPPR[7:4] (see Section 29.1.1 for definition of VPPR) THEN recognize a >pending virtual interrupt; ELSE do not recognize a pending virtual interrupt; FI; > >Even an evaluation is triggered when highest vector is being handled (before EOI), >RVI (with pt_vector) is smaller than VPPR (which is the current highest vector). So >no injection of pt_vector will happen. Then later when highest vector is EOI-ed, a >VM-exit happens immediately... > > Cool, you are right. I will update/verify patch as your suggestion when I get back that server later. Thank you very much!! Quan
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index 1d5d287..f83d6ab 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic) void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector) { struct domain *d = vlapic_domain(vlapic); + struct vcpu *v = vlapic_vcpu(vlapic); + struct hvm_intack pt_intack; + + pt_intack.vector = vector; + pt_intack.source = hvm_intsrc_lapic; + pt_intr_post(v, pt_intack); if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) ) vioapic_update_EOI(d, vector); diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index 8fca08c..29d9bbf 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c @@ -333,8 +333,6 @@ void vmx_intr_assist(void) clear_bit(i, &v->arch.hvm_vmx.eoi_exitmap_changed); __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); } - - pt_intr_post(v, intack); } else { diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index 5c48fdb..a9da436 100644 --- a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v) } else { - if ( (pt->last_plt_gtime + pt->period) < max_lag ) + if ( (pt->last_plt_gtime + pt->period) < max_lag && + !pt->irq_issued ) { max_lag = pt->last_plt_gtime + pt->period; earliest_pt = pt;