Message ID | E0A769A898ADB6449596C41F51EF62C6AA038E@SZXEMI506-MBS.china.huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2016/8/19 20:58, Xuquan (Euler) wrote: > From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 > From: Quan Xu <xuquan8@huawei.com> > Date: Fri, 19 Aug 2016 20:40:31 +0800 > Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue > > When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest > with high payload (with 2vCPU, captured from xentrace, in high payload, the > count of IPI interrupt increases rapidly between these vCPUs). > > If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) > are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt > is high priority than periodic timer interrupt. Xen updates IPI interrupt bit > set in VIRR to guest interrupt status (RVI) as a high priority and apicv > (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root > operation without a VM exit. Within VMX non-root operation, if periodic timer > interrupt index of bit is set in VIRR and highest, the apicv delivers periodic > timer interrupt within VMX non-root operation as well. > > But in current code, if Xen doesn't update periodic timer interrupt bit set > in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this > case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, > then Xen will deliver a periodic timer interrupt again. The guest receives more > periodic timer interrupt. > > If the periodic timer interrut is delivered and not the highest priority, make > Xen be aware of this case to decrease the count of pending periodic timer > interrupt. Nice catch! > > Signed-off-by: Yifei Jiang <jiangyifei@huawei.com> > Signed-off-by: Rongguang He <herongguang.he@huawei.com> > Signed-off-by: Quan Xu <xuquan8@huawei.com> > --- > Why RFC: > 1. I am not quite sure for other cases, such as nested case. We don't support the nested APICv now. So it is ok for this patch. > 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external > interrupts, non-maskable interrupt system-management interrrupts, exceptions > and VM exit) may occur before delivery of a periodic timer interrupt, the periodic > timer interrupt may be lost when a coming periodic timer interrupt is delivered. > Actually, and so current code is. I think you mean the combination not interrupt lost. It should be ok since it happens rarely and even native OS will encounter the same problem if it disable the interrupt for too long. > --- > xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c > index 8fca08c..d3a034e 100644 > --- a/xen/arch/x86/hvm/vmx/intr.c > +++ b/xen/arch/x86/hvm/vmx/intr.c > @@ -334,7 +334,21 @@ void vmx_intr_assist(void) > __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); > } > > - pt_intr_post(v, intack); > + /* > + * If the periodic timer interrut is delivered and not the highest priority, > + * make Xen be aware of this case to decrease the count of pending periodic > + * timer interrupt. > + */ > + if ( pt_vector != -1 && intack.vector > pt_vector ) > + { > + struct hvm_intack pt_intack; > + > + pt_intack.vector = pt_vector; > + pt_intack.source = hvm_intsrc_lapic; > + pt_intr_post(v, pt_intack); > + } > + else > + pt_intr_post(v, intack); > } > else > { > -- > 1.7.12.4 > Looks good to me. Reviewed-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote: > From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 > From: Quan Xu <xuquan8@huawei.com> > Date: Fri, 19 Aug 2016 20:40:31 +0800 > Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue > > When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest > with high payload (with 2vCPU, captured from xentrace, in high payload, the > count of IPI interrupt increases rapidly between these vCPUs). > > If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) > are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt > is high priority than periodic timer interrupt. Xen updates IPI interrupt bit > set in VIRR to guest interrupt status (RVI) as a high priority and apicv > (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root > operation without a VM exit. Within VMX non-root operation, if periodic timer > interrupt index of bit is set in VIRR and highest, the apicv delivers periodic > timer interrupt within VMX non-root operation as well. > > But in current code, if Xen doesn't update periodic timer interrupt bit set > in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this > case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, > then Xen will deliver a periodic timer interrupt again. The guest receives more > periodic timer interrupt. > > If the periodic timer interrut is delivered and not the highest priority, make > Xen be aware of this case to decrease the count of pending periodic timer > interrupt. I can see the issue you're trying to address, but for one - doesn't this lead to other double accounting, namely once the pt irq becomes the highest priority one? > --- a/xen/arch/x86/hvm/vmx/intr.c > +++ b/xen/arch/x86/hvm/vmx/intr.c > @@ -334,7 +334,21 @@ void vmx_intr_assist(void) > __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); > } > > - pt_intr_post(v, intack); > + /* > + * If the periodic timer interrut is delivered and not the highest priority, > + * make Xen be aware of this case to decrease the count of pending periodic > + * timer interrupt. > + */ > + if ( pt_vector != -1 && intack.vector > pt_vector ) > + { > + struct hvm_intack pt_intack; > + > + pt_intack.vector = pt_vector; > + pt_intack.source = hvm_intsrc_lapic; > + pt_intr_post(v, pt_intack); > + } > + else > + pt_intr_post(v, intack); And then I'm having two problems with this change: Processing other than the highest priority irq here looks to be non-architectural. From looking over pt_intr_post() there's only pt related accounting and no actual interrupt handling there, but I'd like this to be (a) confirmed and (b) called out in the comment. Plus the change above is in no way apicv specific, so I'd also like to see clarification why this change is valid (i.e. at least not making things worse) even without apicv in use. And of course in any event VMX maintainer feedback is going to be needed here. Jan
On August 22, 2016 5:38 PM, Yang Zhang wrote: >On 2016/8/19 20:58, Xuquan (Euler) wrote: >> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >> From: Quan Xu <xuquan8@huawei.com> >> Date: Fri, 19 Aug 2016 20:40:31 +0800 >> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >> >> When Xen apicv is enabled, wall clock time is faster on Windows7-32 >> guest with high payload (with 2vCPU, captured from xentrace, in high >> payload, the count of IPI interrupt increases rapidly between these vCPUs). >> >> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the >> IPI intrrupt is high priority than periodic timer interrupt. Xen >> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI) >> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI >> interrupt within VMX non-root operation without a VM exit. Within VMX >> non-root operation, if periodic timer interrupt index of bit is set in >> VIRR and highest, the apicv delivers periodic timer interrupt within VMX non-root operation as well. >> >> But in current code, if Xen doesn't update periodic timer interrupt >> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not >> aware of this case to decrease the count (pending_intr_nr) of pending >> periodic timer interrupt, then Xen will deliver a periodic timer >> interrupt again. The guest receives more periodic timer interrupt. >> >> If the periodic timer interrut is delivered and not the highest >> priority, make Xen be aware of this case to decrease the count of >> pending periodic timer interrupt. > >Nice catch! > >> >> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com> >> Signed-off-by: Rongguang He <herongguang.he@huawei.com> >> Signed-off-by: Quan Xu <xuquan8@huawei.com> >> --- >> Why RFC: >> 1. I am not quite sure for other cases, such as nested case. > >We don't support the nested APICv now. So it is ok for this patch. > >> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external >> interrupts, non-maskable interrupt system-management interrrupts, exceptions >> and VM exit) may occur before delivery of a periodic timer interrupt, the periodic >> timer interrupt may be lost when a coming periodic timer interrupt is delivered. >> Actually, and so current code is. > >I think you mean the combination not interrupt lost. It should be ok since it happens rarely and even native OS will encounter the same problem if it disable the interrupt for too long. > Yes, it is 'combination'. >> --- >> xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++- >> 1 file changed, 15 insertions(+), 1 deletion(-) >> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c >> index 8fca08c..d3a034e 100644 >> --- a/xen/arch/x86/hvm/vmx/intr.c >> +++ b/xen/arch/x86/hvm/vmx/intr.c >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void) >> __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); >> } >> >> - pt_intr_post(v, intack); >> + /* >> + * If the periodic timer interrut is delivered and not the highest priority, >> + * make Xen be aware of this case to decrease the count of pending periodic >> + * timer interrupt. >> + */ >> + if ( pt_vector != -1 && intack.vector > pt_vector ) >> + { >> + struct hvm_intack pt_intack; >> + >> + pt_intack.vector = pt_vector; >> + pt_intack.source = hvm_intsrc_lapic; >> + pt_intr_post(v, pt_intack); >> + } >> + else >> + pt_intr_post(v, intack); >> } >> else >> { >> -- >> 1.7.12.4 >> > >Looks good to me. > >Reviewed-by: Yang Zhang <yang.zhang.wz@gmail.com> > Thanks for your comments!! Quan >-- >Yang >Alibaba Cloud Computing
On 2016/8/22 18:35, Jan Beulich wrote: >>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote: >> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >> From: Quan Xu <xuquan8@huawei.com> >> Date: Fri, 19 Aug 2016 20:40:31 +0800 >> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >> >> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >> with high payload (with 2vCPU, captured from xentrace, in high payload, the >> count of IPI interrupt increases rapidly between these vCPUs). >> >> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit >> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >> operation without a VM exit. Within VMX non-root operation, if periodic timer >> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic >> timer interrupt within VMX non-root operation as well. >> >> But in current code, if Xen doesn't update periodic timer interrupt bit set >> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, >> then Xen will deliver a periodic timer interrupt again. The guest receives more >> periodic timer interrupt. >> >> If the periodic timer interrut is delivered and not the highest priority, make >> Xen be aware of this case to decrease the count of pending periodic timer >> interrupt. > > I can see the issue you're trying to address, but for one - doesn't > this lead to other double accounting, namely once the pt irq becomes > the highest priority one? "intack.vector > pt_vector" I think above check in code can avoid the double accounting problem. It recalculate pending timer interrupt only when the highest priority interrupt is not timer. > >> --- a/xen/arch/x86/hvm/vmx/intr.c >> +++ b/xen/arch/x86/hvm/vmx/intr.c >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void) >> __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); >> } >> >> - pt_intr_post(v, intack); >> + /* >> + * If the periodic timer interrut is delivered and not the highest priority, >> + * make Xen be aware of this case to decrease the count of pending periodic >> + * timer interrupt. >> + */ >> + if ( pt_vector != -1 && intack.vector > pt_vector ) >> + { >> + struct hvm_intack pt_intack; >> + >> + pt_intack.vector = pt_vector; >> + pt_intack.source = hvm_intsrc_lapic; >> + pt_intr_post(v, pt_intack); >> + } >> + else >> + pt_intr_post(v, intack); > > And then I'm having two problems with this change: Processing other > than the highest priority irq here looks to be non-architectural. From > looking over pt_intr_post() there's only pt related accounting and no > actual interrupt handling there, but I'd like this to be (a) confirmed > and (b) called out in the comment. > > Plus the change above is in no way apicv specific, so I'd also like to > see clarification why this change is valid (i.e. at least not making > things worse) even without apicv in use. It is apicv specific case. Above code will execute only when cpu has virtual interrupt delivery which is part of apicv feature. > > And of course in any event VMX maintainer feedback is going to be > needed here. > > Jan >
On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >> From: Quan Xu <xuquan8@xxxxxxxxxx> >> Date: Fri, 19 Aug 2016 20:40:31 +0800 >> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >> >> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >> with high payload (with 2vCPU, captured from xentrace, in high payload, the >> count of IPI interrupt increases rapidly between these vCPUs). >> >> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit >> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >> operation without a VM exit. Within VMX non-root operation, if periodic timer >> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic >> timer interrupt within VMX non-root operation as well. >> >> But in current code, if Xen doesn't update periodic timer interrupt bit set >> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >> case to decrease the count (pending_intr_nr) of pending periodic timer >> interrupt, >> then Xen will deliver a periodic timer interrupt again. The guest receives >> more >> periodic timer interrupt. >> >> If the periodic timer interrut is delivered and not the highest priority, make >> Xen be aware of this case to decrease the count of pending periodic timer >> interrupt. > >I can see the issue you're trying to address, but for one - doesn't >this lead to other double accounting, namely once the pt irq becomes >the highest priority one? > It is does NOT lead to other double accounting.. As if the pt irq becomes the highest priority one, the intack is pt one.. Then: + else + pt_intr_post(v, intack); >> --- a/xen/arch/x86/hvm/vmx/intr.c >> +++ b/xen/arch/x86/hvm/vmx/intr.c >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void) >> __vmwrite(EOI_EXIT_BITMAP(i), >> v->arch.hvm_vmx.eoi_exit_bitmap[i]); >> } >> >> - pt_intr_post(v, intack); >> + /* >> + * If the periodic timer interrut is delivered and not the highest >> priority, >> + * make Xen be aware of this case to decrease the count of pending >> periodic >> + * timer interrupt. >> + */ >> + if ( pt_vector != -1 && intack.vector > pt_vector ) >> + { >> + struct hvm_intack pt_intack; >> + >> + pt_intack.vector = pt_vector; >> + pt_intack.source = hvm_intsrc_lapic; >> + pt_intr_post(v, pt_intack); >> + } >> + else >> + pt_intr_post(v, intack); > >And then I'm having two problems with this change: Processing other >than the highest priority irq here looks to be non-architectural. From >looking over pt_intr_post() there's only pt related accounting and no >actual interrupt handling there, but I'd like this to be > (a) confirmed To me, YES.. but I hope Kevin could confirm it again. >and (b) called out in the comment. Agreed. Good idea. > >Plus the change above is in no way apicv specific, so I'd also like to >see clarification why this change is valid (i.e. at least not making >things worse) even without apicv in use. > __iiuc__, it is apicv specific, as these change is under condition 'cpu_has_vmx_virtual_intr_delivery' (Virtual Interrupt Delivery is one of apicv features) as below: if ( cpu_has_vmx_virtual_intr_delivery ...) { ___change___ } >And of course in any event VMX maintainer feedback is going to be >needed here. Thanks for your comments!! Quan
On August 22, 2016 7:40 PM, Yang Zhang wrote: >On 2016/8/22 18:35, Jan Beulich wrote: >>>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote: >>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 >>> 2001 >>> From: Quan Xu <xuquan8@huawei.com> >>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv >>> issue >>> >>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 >>> guest with high payload (with 2vCPU, captured from xentrace, in high >>> payload, the count of IPI interrupt increases rapidly between these vCPUs). >>> >>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the >>> IPI intrrupt is high priority than periodic timer interrupt. Xen >>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI) >>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers >>> IPI interrupt within VMX non-root operation without a VM exit. Within >>> VMX non-root operation, if periodic timer interrupt index of bit is >>> set in VIRR and highest, the apicv delivers periodic timer interrupt within >VMX non-root operation as well. >>> >>> But in current code, if Xen doesn't update periodic timer interrupt >>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not >>> aware of this case to decrease the count (pending_intr_nr) of pending >>> periodic timer interrupt, then Xen will deliver a periodic timer >>> interrupt again. The guest receives more periodic timer interrupt. >>> >>> If the periodic timer interrut is delivered and not the highest >>> priority, make Xen be aware of this case to decrease the count of >>> pending periodic timer interrupt. >> >> I can see the issue you're trying to address, but for one - doesn't >> this lead to other double accounting, namely once the pt irq becomes >> the highest priority one? > >"intack.vector > pt_vector" >I think above check in code can avoid the double accounting problem. It >recalculate pending timer interrupt only when the highest priority interrupt is >not timer. > Yes, >> >>> --- a/xen/arch/x86/hvm/vmx/intr.c >>> +++ b/xen/arch/x86/hvm/vmx/intr.c >>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void) >>> __vmwrite(EOI_EXIT_BITMAP(i), >v->arch.hvm_vmx.eoi_exit_bitmap[i]); >>> } >>> >>> - pt_intr_post(v, intack); >>> + /* >>> + * If the periodic timer interrut is delivered and not the highest >priority, >>> + * make Xen be aware of this case to decrease the count of >pending periodic >>> + * timer interrupt. >>> + */ >>> + if ( pt_vector != -1 && intack.vector > pt_vector ) >>> + { >>> + struct hvm_intack pt_intack; >>> + >>> + pt_intack.vector = pt_vector; >>> + pt_intack.source = hvm_intsrc_lapic; >>> + pt_intr_post(v, pt_intack); >>> + } >>> + else >>> + pt_intr_post(v, intack); >> >> And then I'm having two problems with this change: Processing other >> than the highest priority irq here looks to be non-architectural. From >> looking over pt_intr_post() there's only pt related accounting and no >> actual interrupt handling there, but I'd like this to be (a) confirmed >> and (b) called out in the comment. >> >> Plus the change above is in no way apicv specific, so I'd also like to >> see clarification why this change is valid (i.e. at least not making >> things worse) even without apicv in use. > >It is apicv specific case. Above code will execute only when cpu has >virtual interrupt delivery which is part of apicv feature. > Yes, I think so too. Thanks for your clarification!! Quan >> >> And of course in any event VMX maintainer feedback is going to be >> needed here. >> >> Jan >> > > >-- >Yang >Alibaba Cloud Computing
>>> On 22.08.16 at 13:40, <yang.zhang.wz@gmail.com> wrote: > On 2016/8/22 18:35, Jan Beulich wrote: >>>>> On 19.08.16 at 14:58, <xuquan8@huawei.com> wrote: >>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >>> From: Quan Xu <xuquan8@huawei.com> >>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >>> >>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >>> with high payload (with 2vCPU, captured from xentrace, in high payload, the >>> count of IPI interrupt increases rapidly between these vCPUs). >>> >>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit >>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >>> operation without a VM exit. Within VMX non-root operation, if periodic timer >>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic >>> timer interrupt within VMX non-root operation as well. >>> >>> But in current code, if Xen doesn't update periodic timer interrupt bit set >>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >>> case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, >>> then Xen will deliver a periodic timer interrupt again. The guest receives more >>> periodic timer interrupt. >>> >>> If the periodic timer interrut is delivered and not the highest priority, make >>> Xen be aware of this case to decrease the count of pending periodic timer >>> interrupt. >> >> I can see the issue you're trying to address, but for one - doesn't >> this lead to other double accounting, namely once the pt irq becomes >> the highest priority one? > > "intack.vector > pt_vector" > I think above check in code can avoid the double accounting problem. It > recalculate pending timer interrupt only when the highest priority > interrupt is not timer. I don't understand: When the pt one is highest priority, surely recalculation is also needed. I'm not talking about a problem in the case where the conditional is true, but about a possible subsequent run through vmx_intr_assist() when the previous higher priority interrupt has got handled, but the pt one (now being highest) is still pending. (Or similarly consider the case where on the next run through vmx_intr_assist() another higher priority interrupts appeared.) I'm not saying the change is wrong, I'm merely asking for further proof that it doesn't open up new issues. >>> --- a/xen/arch/x86/hvm/vmx/intr.c >>> +++ b/xen/arch/x86/hvm/vmx/intr.c >>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void) >>> __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); >>> } >>> >>> - pt_intr_post(v, intack); >>> + /* >>> + * If the periodic timer interrut is delivered and not the highest priority, >>> + * make Xen be aware of this case to decrease the count of pending periodic >>> + * timer interrupt. >>> + */ >>> + if ( pt_vector != -1 && intack.vector > pt_vector ) >>> + { >>> + struct hvm_intack pt_intack; >>> + >>> + pt_intack.vector = pt_vector; >>> + pt_intack.source = hvm_intsrc_lapic; >>> + pt_intr_post(v, pt_intack); >>> + } >>> + else >>> + pt_intr_post(v, intack); >> >> And then I'm having two problems with this change: Processing other >> than the highest priority irq here looks to be non-architectural. From >> looking over pt_intr_post() there's only pt related accounting and no >> actual interrupt handling there, but I'd like this to be (a) confirmed >> and (b) called out in the comment. >> >> Plus the change above is in no way apicv specific, so I'd also like to >> see clarification why this change is valid (i.e. at least not making >> things worse) even without apicv in use. > > It is apicv specific case. Above code will execute only when cpu has > virtual interrupt delivery which is part of apicv feature. Ah, right, I didn't look closely enough what conditional this is inside of. Jan
>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: > On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >>> >>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >>> with high payload (with 2vCPU, captured from xentrace, in high payload, the >>> count of IPI interrupt increases rapidly between these vCPUs). >>> >>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit >>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >>> operation without a VM exit. Within VMX non-root operation, if periodic timer >>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic >>> timer interrupt within VMX non-root operation as well. >>> >>> But in current code, if Xen doesn't update periodic timer interrupt bit set >>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >>> case to decrease the count (pending_intr_nr) of pending periodic timer >>> interrupt, >>> then Xen will deliver a periodic timer interrupt again. The guest receives >>> more >>> periodic timer interrupt. >>> >>> If the periodic timer interrut is delivered and not the highest priority, make >>> Xen be aware of this case to decrease the count of pending periodic timer >>> interrupt. >> >>I can see the issue you're trying to address, but for one - doesn't >>this lead to other double accounting, namely once the pt irq becomes >>the highest priority one? >> > > It is does NOT lead to other double accounting.. > As if the pt irq becomes the highest priority one, the intack is pt one.. > Then: > > + else > + pt_intr_post(v, intack); As just said in reply to Yang: If this is still the same interrupt instance as in a prior run through here which took the if() branch, this change looks like having the potential of double accounting. >>Plus the change above is in no way apicv specific, so I'd also like to >>see clarification why this change is valid (i.e. at least not making >>things worse) even without apicv in use. >> > > __iiuc__, it is apicv specific, as these change is under condition > 'cpu_has_vmx_virtual_intr_delivery' (Virtual Interrupt Delivery is one of > apicv features) as below: > > if ( cpu_has_vmx_virtual_intr_delivery ...) { ___change___ } Indeed, as already said in reply to Yang, I apparently didn't look closely enough. I'm sorry for the noise. Jan
On 2016/8/22 20:04, Jan Beulich wrote: >>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >>>> >>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the >>>> count of IPI interrupt increases rapidly between these vCPUs). >>>> >>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit >>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >>>> operation without a VM exit. Within VMX non-root operation, if periodic timer >>>> interrupt index of bit is set in VIRR and highest, the apicv delivers periodic >>>> timer interrupt within VMX non-root operation as well. >>>> >>>> But in current code, if Xen doesn't update periodic timer interrupt bit set >>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >>>> case to decrease the count (pending_intr_nr) of pending periodic timer >>>> interrupt, >>>> then Xen will deliver a periodic timer interrupt again. The guest receives >>>> more >>>> periodic timer interrupt. >>>> >>>> If the periodic timer interrut is delivered and not the highest priority, make >>>> Xen be aware of this case to decrease the count of pending periodic timer >>>> interrupt. >>> >>> I can see the issue you're trying to address, but for one - doesn't >>> this lead to other double accounting, namely once the pt irq becomes >>> the highest priority one? >>> >> >> It is does NOT lead to other double accounting.. >> As if the pt irq becomes the highest priority one, the intack is pt one.. >> Then: >> >> + else >> + pt_intr_post(v, intack); > > As just said in reply to Yang: If this is still the same interrupt instance > as in a prior run through here which took the if() branch, this change > looks like having the potential of double accounting. This cannot happen: if pt intr is the second priority one, then corresponding vIRR bit will be cleared by hardware while the highest priority intr is delivered to guest. And the next vmx_intr_assit cannot aware of it since the vIRR bit is zero.
On August 22, 2016 9:10 PM, Yang Zhang wrote: >On 2016/8/22 20:04, Jan Beulich wrote: >>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 >>>>> 2001 >>>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv >>>>> issue >>>>> >>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 >>>>> guest with high payload (with 2vCPU, captured from xentrace, in >>>>> high payload, the count of IPI interrupt increases rapidly between these >vCPUs). >>>>> >>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, >>>>> the IPI intrrupt is high priority than periodic timer interrupt. >>>>> Xen updates IPI interrupt bit set in VIRR to guest interrupt status >>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery) >>>>> delivers IPI interrupt within VMX non-root operation without a VM >>>>> exit. Within VMX non-root operation, if periodic timer interrupt >>>>> index of bit is set in VIRR and highest, the apicv delivers periodic timer >interrupt within VMX non-root operation as well. >>>>> >>>>> But in current code, if Xen doesn't update periodic timer interrupt >>>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is >>>>> not aware of this case to decrease the count (pending_intr_nr) of >>>>> pending periodic timer interrupt, then Xen will deliver a periodic >>>>> timer interrupt again. The guest receives more periodic timer >>>>> interrupt. >>>>> >>>>> If the periodic timer interrut is delivered and not the highest >>>>> priority, make Xen be aware of this case to decrease the count of >>>>> pending periodic timer interrupt. >>>> >>>> I can see the issue you're trying to address, but for one - doesn't >>>> this lead to other double accounting, namely once the pt irq becomes >>>> the highest priority one? >>>> >>> >>> It is does NOT lead to other double accounting.. >>> As if the pt irq becomes the highest priority one, the intack is pt one.. >>> Then: >>> >>> + else >>> + pt_intr_post(v, intack); >> >> As just said in reply to Yang: If this is still the same interrupt >> instance as in a prior run through here which took the if() branch, >> this change looks like having the potential of double accounting. > >This cannot happen: if pt intr is the second priority one, then corresponding vIRR >bit will be cleared by hardware while the highest priority intr is delivered to >guest. And the next vmx_intr_assit cannot aware of it since the vIRR bit is zero. > I am afraid that this may happen.. as I mentioned as 2nd RFC reason, Within VMX non-root operation, an Asynchronous Enclave Exit (including external interrupts, non-maskable interrupt system-management interrrupts, exceptions and VM exit) may occur before delivery of a periodic timer interrupt, the pt bit of VIRR is still there... but I will explain more why it doesn't have the potential of double accounting in current code in next email. Quan
On August 22, 2016 8:04 PM, <JBeulich@suse.com> wrote: >>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 >>>> 2001 >>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv >>>> issue >>>> >>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 >>>> guest with high payload (with 2vCPU, captured from xentrace, in high >>>> payload, the count of IPI interrupt increases rapidly between these vCPUs). >>>> >>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, >>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen >>>> updates IPI interrupt bit set in VIRR to guest interrupt status >>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery) >>>> delivers IPI interrupt within VMX non-root operation without a VM >>>> exit. Within VMX non-root operation, if periodic timer interrupt >>>> index of bit is set in VIRR and highest, the apicv delivers periodic timer >interrupt within VMX non-root operation as well. >>>> >>>> But in current code, if Xen doesn't update periodic timer interrupt >>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not >>>> aware of this case to decrease the count (pending_intr_nr) of >>>> pending periodic timer interrupt, then Xen will deliver a periodic >>>> timer interrupt again. The guest receives more periodic timer >>>> interrupt. >>>> >>>> If the periodic timer interrut is delivered and not the highest >>>> priority, make Xen be aware of this case to decrease the count of >>>> pending periodic timer interrupt. >>> >>>I can see the issue you're trying to address, but for one - doesn't >>>this lead to other double accounting, namely once the pt irq becomes >>>the highest priority one? >>> >> >> It is does NOT lead to other double accounting.. >> As if the pt irq becomes the highest priority one, the intack is pt one.. >> Then: >> >> + else >> + pt_intr_post(v, intack); > >As just said in reply to Yang: If this is still the same interrupt instance as in a >prior run through here which took the if() branch, this change looks like having >the potential of double accounting. > I very appreciate your detail review. It looks like, but actually it doesn't happen. As the key parameter 'pt->irq_issued'.. In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued.. In pt_intr_post(), clear the pt->irq_issued before touching the count 'pt->pending_intr_nr'.. According to your assumption, at the second call to pt_intr_post(), As if 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, then return, there is no chance to touch the count 'pt->pending_intr_nr'.. ------ void pt_intr_post(struct vcpu *v, struct hvm_intack intack) { ... pt = is_pt_irq(v, intack); if ( pt == NULL ) { spin_unlock(&v->arch.hvm_vcpu.tm_lock); return; } ... ... } static struct periodic_time *is_pt_irq() { .... list_for_each_entry ( pt, head, list ) { if ( pt->pending_intr_nr && ________pt->irq_issued_______ && (intack.vector == pt_irq_vector(pt, intack.source)) ) return pt; } return NULL; } __IIUC__, this question is based on the following pseudocode detail the behavior of virtual-interrupt delivery is __not__ atomic: Vector <- RVI; VISR[Vector] <- 1; SVI <- Vector; VPPR<- Vector & F0H; VIRR[Vector] <- 0; IF any bits set in VIRR Then RVI<-highest index of bit set in VIRR ELSE RVI <-0 FI Deliver interrupt with Vector through IDT ... We'd better check this first, as Yang said, this is atomic.. Quan
2016年8月22日星期一,Xuquan (Euler) <xuquan8@huawei.com> 写道: > On August 22, 2016 8:04 PM, <JBeulich@suse.com <javascript:;>> wrote: > >>>> On 22.08.16 at 13:41, <xuquan8@huawei.com <javascript:;>> wrote: > >> On August 22, 2016 6:36 PM, <JBeulich@suse.com <javascript:;>> wrote: > >>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: > >>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 > >>>> 2001 > >>>> From: Quan Xu <xuquan8@xxxxxxxxxx> > >>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 > >>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv > >>>> issue > >>>> > >>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 > >>>> guest with high payload (with 2vCPU, captured from xentrace, in high > >>>> payload, the count of IPI interrupt increases rapidly between these > vCPUs). > >>>> > >>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector > >>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, > >>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen > >>>> updates IPI interrupt bit set in VIRR to guest interrupt status > >>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery) > >>>> delivers IPI interrupt within VMX non-root operation without a VM > >>>> exit. Within VMX non-root operation, if periodic timer interrupt > >>>> index of bit is set in VIRR and highest, the apicv delivers periodic > timer > >interrupt within VMX non-root operation as well. > >>>> > >>>> But in current code, if Xen doesn't update periodic timer interrupt > >>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not > >>>> aware of this case to decrease the count (pending_intr_nr) of > >>>> pending periodic timer interrupt, then Xen will deliver a periodic > >>>> timer interrupt again. The guest receives more periodic timer > >>>> interrupt. > >>>> > >>>> If the periodic timer interrut is delivered and not the highest > >>>> priority, make Xen be aware of this case to decrease the count of > >>>> pending periodic timer interrupt. > >>> > >>>I can see the issue you're trying to address, but for one - doesn't > >>>this lead to other double accounting, namely once the pt irq becomes > >>>the highest priority one? > >>> > >> > >> It is does NOT lead to other double accounting.. > >> As if the pt irq becomes the highest priority one, the intack is pt > one.. > >> Then: > >> > >> + else > >> + pt_intr_post(v, intack); > > > >As just said in reply to Yang: If this is still the same interrupt > instance as in a > >prior run through here which took the if() branch, this change looks like > having > >the potential of double accounting. > > > > I very appreciate your detail review. It looks like, but actually it > doesn't happen. > > As the key parameter 'pt->irq_issued'.. > > In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued.. > In pt_intr_post(), clear the pt->irq_issued before touching the count > 'pt->pending_intr_nr'.. > > According to your assumption, at the second call to pt_intr_post(), As if > 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, > then return, there is no chance to touch the count 'pt->pending_intr_nr'.. > ------ > void pt_intr_post(struct vcpu *v, struct hvm_intack intack) > { > ... > pt = is_pt_irq(v, intack); > if ( pt == NULL ) > { > spin_unlock(&v->arch.hvm_vcpu.tm_lock); > return; > } > ... > > > ... > } > > > static struct periodic_time *is_pt_irq() > { > .... > list_for_each_entry ( pt, head, list ) > { > if ( pt->pending_intr_nr && ________pt->irq_issued_______ && > (intack.vector == pt_irq_vector(pt, intack.source)) ) > return pt; > } > > return NULL; > } > > > > > > __IIUC__, this question is based on the following pseudocode detail the > behavior of virtual-interrupt delivery is __not__ atomic: > > Vector <- RVI; > VISR[Vector] <- 1; > SVI <- Vector; > VPPR<- Vector & F0H; > VIRR[Vector] <- 0; > IF any bits set in VIRR > Then RVI<-highest index of bit set in VIRR > ELSE RVI <-0 > FI > Deliver interrupt with Vector through IDT > ... > > > We'd better check this first, as Yang said, this is atomic.. i have said that this is ensured by hardware. > > Quan >
2016年8月22日星期一,Xuquan (Euler) <xuquan8@huawei.com> 写道: > On August 22, 2016 8:04 PM, <JBeulich@suse.com <javascript:;>> wrote: > >>>> On 22.08.16 at 13:41, <xuquan8@huawei.com <javascript:;>> wrote: > >> On August 22, 2016 6:36 PM, <JBeulich@suse.com <javascript:;>> wrote: > >>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: > >>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 > >>>> 2001 > >>>> From: Quan Xu <xuquan8@xxxxxxxxxx> > >>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 > >>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv > >>>> issue > >>>> > >>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 > >>>> guest with high payload (with 2vCPU, captured from xentrace, in high > >>>> payload, the count of IPI interrupt increases rapidly between these > vCPUs). > >>>> > >>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector > >>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, > >>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen > >>>> updates IPI interrupt bit set in VIRR to guest interrupt status > >>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery) > >>>> delivers IPI interrupt within VMX non-root operation without a VM > >>>> exit. Within VMX non-root operation, if periodic timer interrupt > >>>> index of bit is set in VIRR and highest, the apicv delivers periodic > timer > >interrupt within VMX non-root operation as well. > >>>> > >>>> But in current code, if Xen doesn't update periodic timer interrupt > >>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not > >>>> aware of this case to decrease the count (pending_intr_nr) of > >>>> pending periodic timer interrupt, then Xen will deliver a periodic > >>>> timer interrupt again. The guest receives more periodic timer > >>>> interrupt. > >>>> > >>>> If the periodic timer interrut is delivered and not the highest > >>>> priority, make Xen be aware of this case to decrease the count of > >>>> pending periodic timer interrupt. > >>> > >>>I can see the issue you're trying to address, but for one - doesn't > >>>this lead to other double accounting, namely once the pt irq becomes > >>>the highest priority one? > >>> > >> > >> It is does NOT lead to other double accounting.. > >> As if the pt irq becomes the highest priority one, the intack is pt > one.. > >> Then: > >> > >> + else > >> + pt_intr_post(v, intack); > > > >As just said in reply to Yang: If this is still the same interrupt > instance as in a > >prior run through here which took the if() branch, this change looks like > having > >the potential of double accounting. > > > > I very appreciate your detail review. It looks like, but actually it > doesn't happen. > > As the key parameter 'pt->irq_issued'.. > > In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued.. > In pt_intr_post(), clear the pt->irq_issued before touching the count > 'pt->pending_intr_nr'.. > > According to your assumption, at the second call to pt_intr_post(), As if > 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, > then return, there is no chance to touch the count 'pt->pending_intr_nr'.. > ------ > void pt_intr_post(struct vcpu *v, struct hvm_intack intack) > { > ... > pt = is_pt_irq(v, intack); > if ( pt == NULL ) > { > spin_unlock(&v->arch.hvm_vcpu.tm_lock); > return; > } > ... > > > ... > } > > > static struct periodic_time *is_pt_irq() > { > .... > list_for_each_entry ( pt, head, list ) > { > if ( pt->pending_intr_nr && ________pt->irq_issued_______ && > (intack.vector == pt_irq_vector(pt, intack.source)) ) > return pt; > } > > return NULL; > } > > > > > > __IIUC__, this question is based on the following pseudocode detail the > behavior of virtual-interrupt delivery is __not__ atomic: > > Vector <- RVI; > VISR[Vector] <- 1; > SVI <- Vector; > VPPR<- Vector & F0H; > VIRR[Vector] <- 0; > IF any bits set in VIRR > Then RVI<-highest index of bit set in VIRR > ELSE RVI <-0 > FI > Deliver interrupt with Vector through IDT > ... > > > We'd better check this first, as Yang said, this is atomic.. i have said that this is ensured by hardware. > > Quan >
>>> On 22.08.16 at 15:09, <yang.zhang.wz@gmail.com> wrote: > On 2016/8/22 20:04, Jan Beulich wrote: >>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >>>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >>>>> >>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >>>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the >>>>> count of IPI interrupt increases rapidly between these vCPUs). >>>>> >>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >>>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >>>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt > bit >>>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >>>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >>>>> operation without a VM exit. Within VMX non-root operation, if periodic timer >>>>> interrupt index of bit is set in VIRR and highest, the apicv delivers > periodic >>>>> timer interrupt within VMX non-root operation as well. >>>>> >>>>> But in current code, if Xen doesn't update periodic timer interrupt bit set >>>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >>>>> case to decrease the count (pending_intr_nr) of pending periodic timer >>>>> interrupt, >>>>> then Xen will deliver a periodic timer interrupt again. The guest receives >>>>> more >>>>> periodic timer interrupt. >>>>> >>>>> If the periodic timer interrut is delivered and not the highest priority, > make >>>>> Xen be aware of this case to decrease the count of pending periodic timer >>>>> interrupt. >>>> >>>> I can see the issue you're trying to address, but for one - doesn't >>>> this lead to other double accounting, namely once the pt irq becomes >>>> the highest priority one? >>>> >>> >>> It is does NOT lead to other double accounting.. >>> As if the pt irq becomes the highest priority one, the intack is pt one.. >>> Then: >>> >>> + else >>> + pt_intr_post(v, intack); >> >> As just said in reply to Yang: If this is still the same interrupt instance >> as in a prior run through here which took the if() branch, this change >> looks like having the potential of double accounting. > > This cannot happen: if pt intr is the second priority one, then > corresponding vIRR bit will be cleared by hardware while the highest > priority intr is delivered to guest. And the next vmx_intr_assit cannot > aware of it since the vIRR bit is zero. In addition to what Quan said in response - aren't you mixing up vISR and vIRR here? I can see vISR being clear when a higher prio interrupt gets delivered (unless that happened to interrupt the pt interrupt handler), but I would hope that virtualized behavior matches non-virtualized one in that vIRR accumulates requests. Jan
>>> On 22.08.16 at 16:02, <xuquan8@huawei.com> wrote: > On August 22, 2016 8:04 PM, <JBeulich@suse.com> wrote: >>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 >>>>> 2001 >>>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv >>>>> issue >>>>> >>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 >>>>> guest with high payload (with 2vCPU, captured from xentrace, in high >>>>> payload, the count of IPI interrupt increases rapidly between these vCPUs). >>>>> >>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, >>>>> the IPI intrrupt is high priority than periodic timer interrupt. Xen >>>>> updates IPI interrupt bit set in VIRR to guest interrupt status >>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery) >>>>> delivers IPI interrupt within VMX non-root operation without a VM >>>>> exit. Within VMX non-root operation, if periodic timer interrupt >>>>> index of bit is set in VIRR and highest, the apicv delivers periodic timer >>interrupt within VMX non-root operation as well. >>>>> >>>>> But in current code, if Xen doesn't update periodic timer interrupt >>>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not >>>>> aware of this case to decrease the count (pending_intr_nr) of >>>>> pending periodic timer interrupt, then Xen will deliver a periodic >>>>> timer interrupt again. The guest receives more periodic timer >>>>> interrupt. >>>>> >>>>> If the periodic timer interrut is delivered and not the highest >>>>> priority, make Xen be aware of this case to decrease the count of >>>>> pending periodic timer interrupt. >>>> >>>>I can see the issue you're trying to address, but for one - doesn't >>>>this lead to other double accounting, namely once the pt irq becomes >>>>the highest priority one? >>>> >>> >>> It is does NOT lead to other double accounting.. >>> As if the pt irq becomes the highest priority one, the intack is pt one.. >>> Then: >>> >>> + else >>> + pt_intr_post(v, intack); >> >>As just said in reply to Yang: If this is still the same interrupt instance > as in a >>prior run through here which took the if() branch, this change looks like > having >>the potential of double accounting. >> > > I very appreciate your detail review. It looks like, but actually it doesn't > happen. > > As the key parameter 'pt->irq_issued'.. > > In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued.. > In pt_intr_post(), clear the pt->irq_issued before touching the count > 'pt->pending_intr_nr'.. > > According to your assumption, at the second call to pt_intr_post(), As if > 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, > then return, there is no chance to touch the count 'pt->pending_intr_nr'.. Hmm, wait: That second pass would also get us through pt_update_irq() a second time, which might cause irq_issued to get set again. Granted this code is fragile, therefore please excuse that I'm trying to be extra careful with accepting changes to it. Jan
On August 22, 2016 11:12 PM, <JBeulich@suse.com> wrote: >>>> On 22.08.16 at 16:02, <xuquan8@huawei.com> wrote: >> On August 22, 2016 8:04 PM, <JBeulich@suse.com> wrote: >>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 >00:00:00 >>>>>> 2001 >>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv >>>>>> issue >>>>>> >>>>>> When Xen apicv is enabled, wall clock time is faster on >>>>>> Windows7-32 guest with high payload (with 2vCPU, captured from >>>>>> xentrace, in high payload, the count of IPI interrupt increases rapidly >between these vCPUs). >>>>>> >>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >>>>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, >>>>>> the IPI intrrupt is high priority than periodic timer interrupt. >>>>>> Xen updates IPI interrupt bit set in VIRR to guest interrupt >>>>>> status >>>>>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery) >>>>>> delivers IPI interrupt within VMX non-root operation without a VM >>>>>> exit. Within VMX non-root operation, if periodic timer interrupt >>>>>> index of bit is set in VIRR and highest, the apicv delivers >>>>>> periodic timer >>>interrupt within VMX non-root operation as well. >>>>>> >>>>>> But in current code, if Xen doesn't update periodic timer >>>>>> interrupt bit set in VIRR to guest interrupt status (RVI) >>>>>> directly, Xen is not aware of this case to decrease the count >>>>>> (pending_intr_nr) of pending periodic timer interrupt, then Xen >>>>>> will deliver a periodic timer interrupt again. The guest receives >>>>>> more periodic timer interrupt. >>>>>> >>>>>> If the periodic timer interrut is delivered and not the highest >>>>>> priority, make Xen be aware of this case to decrease the count of >>>>>> pending periodic timer interrupt. >>>>> >>>>>I can see the issue you're trying to address, but for one - doesn't >>>>>this lead to other double accounting, namely once the pt irq becomes >>>>>the highest priority one? >>>>> >>>> >>>> It is does NOT lead to other double accounting.. >>>> As if the pt irq becomes the highest priority one, the intack is pt one.. >>>> Then: >>>> >>>> + else >>>> + pt_intr_post(v, intack); >>> >>>As just said in reply to Yang: If this is still the same interrupt >>>instance >> as in a >>>prior run through here which took the if() branch, this change looks >>>like >> having >>>the potential of double accounting. >>> >> >> I very appreciate your detail review. It looks like, but actually it >> doesn't happen. >> >> As the key parameter 'pt->irq_issued'.. >> >> In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued.. >> In pt_intr_post(), clear the pt->irq_issued before touching the count >> 'pt->pending_intr_nr'.. >> >> According to your assumption, at the second call to pt_intr_post(), As >> if 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, then >> return, there is no chance to touch the count 'pt->pending_intr_nr'.. > >Hmm, wait: That second pass would also get us through >pt_update_irq() a second time, which might cause irq_issued to get set again. > iiuc this case is IRQ combination, issue twice but delivery once.. Another enhancement may be required here, IF hvm_intsrc_lapic && the PT IRQ index bit in VIRR is not clear THEN don't issue PT IRQ in pt_update_irq() FI >Granted this code is fragile, therefore please excuse that I'm trying to be extra >careful with accepting changes to it. Understood :):)... Quan
On 2016/8/22 22:57, Jan Beulich wrote: >>>> On 22.08.16 at 15:09, <yang.zhang.wz@gmail.com> wrote: >> On 2016/8/22 20:04, Jan Beulich wrote: >>>>>> On 22.08.16 at 13:41, <xuquan8@huawei.com> wrote: >>>> On August 22, 2016 6:36 PM, <JBeulich@suse.com> wrote: >>>>>>>> On 19.08.16 at 14:58, <xuquan8@xxxxxxxxxx> wrote: >>>>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001 >>>>>> From: Quan Xu <xuquan8@xxxxxxxxxx> >>>>>> Date: Fri, 19 Aug 2016 20:40:31 +0800 >>>>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue >>>>>> >>>>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest >>>>>> with high payload (with 2vCPU, captured from xentrace, in high payload, the >>>>>> count of IPI interrupt increases rapidly between these vCPUs). >>>>>> >>>>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) >>>>>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt >>>>>> is high priority than periodic timer interrupt. Xen updates IPI interrupt >> bit >>>>>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv >>>>>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root >>>>>> operation without a VM exit. Within VMX non-root operation, if periodic timer >>>>>> interrupt index of bit is set in VIRR and highest, the apicv delivers >> periodic >>>>>> timer interrupt within VMX non-root operation as well. >>>>>> >>>>>> But in current code, if Xen doesn't update periodic timer interrupt bit set >>>>>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this >>>>>> case to decrease the count (pending_intr_nr) of pending periodic timer >>>>>> interrupt, >>>>>> then Xen will deliver a periodic timer interrupt again. The guest receives >>>>>> more >>>>>> periodic timer interrupt. >>>>>> >>>>>> If the periodic timer interrut is delivered and not the highest priority, >> make >>>>>> Xen be aware of this case to decrease the count of pending periodic timer >>>>>> interrupt. >>>>> >>>>> I can see the issue you're trying to address, but for one - doesn't >>>>> this lead to other double accounting, namely once the pt irq becomes >>>>> the highest priority one? >>>>> >>>> >>>> It is does NOT lead to other double accounting.. >>>> As if the pt irq becomes the highest priority one, the intack is pt one.. >>>> Then: >>>> >>>> + else >>>> + pt_intr_post(v, intack); >>> >>> As just said in reply to Yang: If this is still the same interrupt instance >>> as in a prior run through here which took the if() branch, this change >>> looks like having the potential of double accounting. >> >> This cannot happen: if pt intr is the second priority one, then >> corresponding vIRR bit will be cleared by hardware while the highest >> priority intr is delivered to guest. And the next vmx_intr_assit cannot >> aware of it since the vIRR bit is zero. > > In addition to what Quan said in response - aren't you mixing up > vISR and vIRR here? I can see vISR being clear when a higher prio Right. It should be ISR not IRR. > interrupt gets delivered (unless that happened to interrupt the pt > interrupt handler), but I would hope that virtualized behavior > matches non-virtualized one in that vIRR accumulates requests.
> From: Xuquan (Euler) [mailto:xuquan8@huawei.com] > Sent: Friday, August 19, 2016 8:59 PM > > When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest > with high payload (with 2vCPU, captured from xentrace, in high payload, the > count of IPI interrupt increases rapidly between these vCPUs). > > If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1) > are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt > is high priority than periodic timer interrupt. Xen updates IPI interrupt bit > set in VIRR to guest interrupt status (RVI) as a high priority and apicv > (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root > operation without a VM exit. Within VMX non-root operation, if periodic timer > interrupt index of bit is set in VIRR and highest, the apicv delivers periodic > timer interrupt within VMX non-root operation as well. > > But in current code, if Xen doesn't update periodic timer interrupt bit set > in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this > case to decrease the count (pending_intr_nr) of pending periodic timer interrupt, > then Xen will deliver a periodic timer interrupt again. The guest receives more > periodic timer interrupt. > > If the periodic timer interrut is delivered and not the highest priority, make > Xen be aware of this case to decrease the count of pending periodic timer > interrupt. > > Signed-off-by: Yifei Jiang <jiangyifei@huawei.com> > Signed-off-by: Rongguang He <herongguang.he@huawei.com> > Signed-off-by: Quan Xu <xuquan8@huawei.com> > --- > Why RFC: > 1. I am not quite sure for other cases, such as nested case. > 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including external > interrupts, non-maskable interrupt system-management interrrupts, exceptions > and VM exit) may occur before delivery of a periodic timer interrupt, the periodic > timer interrupt may be lost when a coming periodic timer interrupt is delivered. > Actually, and so current code is. > --- > xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c > index 8fca08c..d3a034e 100644 > --- a/xen/arch/x86/hvm/vmx/intr.c > +++ b/xen/arch/x86/hvm/vmx/intr.c > @@ -334,7 +334,21 @@ void vmx_intr_assist(void) > __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); > } > > - pt_intr_post(v, intack); > + /* > + * If the periodic timer interrut is delivered and not the highest priority, > + * make Xen be aware of this case to decrease the count of pending periodic > + * timer interrupt. > + */ > + if ( pt_vector != -1 && intack.vector > pt_vector ) > + { > + struct hvm_intack pt_intack; > + > + pt_intack.vector = pt_vector; > + pt_intack.source = hvm_intsrc_lapic; > + pt_intr_post(v, pt_intack); > + } > + else > + pt_intr_post(v, intack); > } Here is my thought. w/o above patch one pending pt interrupt may result in multiple injections of guest timer interrupt, as you identified, when pt_vector happens to be not the highest pending vector. Then w/ your patch, multiple pending pt interrupt instances may be combined as one injection of guest timer interrupt. Especially intack.vector >pt_vector doesn't mean pt_intr is eligible for injection, which might be blocked due to other reasons. As Jan pointed out, this path is very fragile, so better we can have a fix to sustain the original behavior with one pending pt instance causing one actual injection. What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for pt_intr is already set to 1 which means every injection would incur an EOI-induced VM-exit: /* * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM * exit, then pending periodic time interrups have the chance to be injected * for compensation */ if (pt_vector != -1) vmx_set_eoi_exit_bitmap(v, pt_vector); I don't think delaying post makes much difference here. Even we do post earlier like your patch, further pending instances have to wait until current instance is completed. So as long as post happens before EOI is completed, it should be fine. Thanks Kevin
On August 30, 2016 1:58 PM, Tian, Kevin <kevin.tian@intel.com> wrote: >> From: Xuquan (Euler) [mailto:xuquan8@huawei.com] >> Sent: Friday, August 19, 2016 8:59 PM >> >> When Xen apicv is enabled, wall clock time is faster on Windows7-32 >> guest with high payload (with 2vCPU, captured from xentrace, in high >> payload, the count of IPI interrupt increases rapidly between these vCPUs). >> >> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector >> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the >> IPI intrrupt is high priority than periodic timer interrupt. Xen >> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI) >> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI >> interrupt within VMX non-root operation without a VM exit. Within VMX >> non-root operation, if periodic timer interrupt index of bit is set in >> VIRR and highest, the apicv delivers periodic timer interrupt within VMX >non-root operation as well. >> >> But in current code, if Xen doesn't update periodic timer interrupt >> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not >> aware of this case to decrease the count (pending_intr_nr) of pending >> periodic timer interrupt, then Xen will deliver a periodic timer >> interrupt again. The guest receives more periodic timer interrupt. >> >> If the periodic timer interrut is delivered and not the highest >> priority, make Xen be aware of this case to decrease the count of >> pending periodic timer interrupt. >> >> Signed-off-by: Yifei Jiang <jiangyifei@huawei.com> >> Signed-off-by: Rongguang He <herongguang.he@huawei.com> >> Signed-off-by: Quan Xu <xuquan8@huawei.com> >> --- >> Why RFC: >> 1. I am not quite sure for other cases, such as nested case. >> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including >external >> interrupts, non-maskable interrupt system-management interrrupts, >exceptions >> and VM exit) may occur before delivery of a periodic timer interrupt, the >periodic >> timer interrupt may be lost when a coming periodic timer interrupt is >delivered. >> Actually, and so current code is. >> --- >> xen/arch/x86/hvm/vmx/intr.c | 16 +++++++++++++++- >> 1 file changed, 15 insertions(+), 1 deletion(-) >> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c >> index 8fca08c..d3a034e 100644 >> --- a/xen/arch/x86/hvm/vmx/intr.c >> +++ b/xen/arch/x86/hvm/vmx/intr.c >> @@ -334,7 +334,21 @@ void vmx_intr_assist(void) >> __vmwrite(EOI_EXIT_BITMAP(i), >v->arch.hvm_vmx.eoi_exit_bitmap[i]); >> } >> >> - pt_intr_post(v, intack); >> + /* >> + * If the periodic timer interrut is delivered and not the highest >priority, >> + * make Xen be aware of this case to decrease the count of pending >periodic >> + * timer interrupt. >> + */ >> + if ( pt_vector != -1 && intack.vector > pt_vector ) >> + { >> + struct hvm_intack pt_intack; >> + >> + pt_intack.vector = pt_vector; >> + pt_intack.source = hvm_intsrc_lapic; >> + pt_intr_post(v, pt_intack); >> + } >> + else >> + pt_intr_post(v, intack); >> } > >Here is my thought. w/o above patch one pending pt interrupt may result in >multiple injections of guest timer interrupt, as you identified, when pt_vector >happens to be not the highest pending vector. Then w/ your patch, multiple >pending pt interrupt instances may be combined as one injection of guest timer >interrupt. Especially intack.vector >>pt_vector doesn't mean pt_intr is eligible for injection, which might >be blocked due to other reasons. As Jan pointed out, this path is very fragile, so >better we can have a fix to sustain the original behavior with one pending pt >instance causing one actual injection. > Agreed. Good summary.. >What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for >pt_intr is already set to 1 which means every injection would incur an >EOI-induced VM-exit: > > /* > * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced >VM > * exit, then pending periodic time interrups have the chance to be >injected > * for compensation > */ > if (pt_vector != -1) > vmx_set_eoi_exit_bitmap(v, pt_vector); > >I don't think delaying post makes much difference here. Even we do post earlier >like your patch, further pending instances have to wait until current instance is >completed. So as long as post happens before EOI is completed, it should be >fine. Although I really wanted to fix here, I gave up.. We need to try our best to narrow down this 'window' -- from pt_update_irq() to pt_intr_post().. I think It is too later to call pt_intr_post() in EOI-induced VM-exit, _iiuc_, we cannot make sure that each PT interrupt injection incurs an EOI-induced VM-exit before the next VM entry.. then one pending pt interrupt may result in multiple injections of guest timer interrupt.. to be honest, my fix is also a tradeoff.. w/ my patch, multiple pending pt interrupt instances may be combined as one injection of guest timer interrupt.. but as Yang said, It should be ok since it happens rarely and even native OS will encounter the same problem if it disable the interrupt for too long. Thanks for your comments!! Quan Huawei 2012 lab
>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote: > to be honest, my fix is also a tradeoff.. w/ my patch, multiple pending pt > interrupt instances may be combined as one injection of guest timer > interrupt.. > but as Yang said, It should be ok since it happens rarely and even native > OS will encounter the same problem if it disable the interrupt for too long. And one OS may properly deal with that situation while another might not. For example, let me remind you of issues the hypervisor had in earlier days when no event/softirq processing happened for long enough, resulting in time management issues due to missed platform timer overflow handling. IOW an OS may avoid that situation by simply not disabling IRQs for too long a time. Jan
On September 06, 2016 5:05 PM, Jan Beulich <JBeulich@suse.com> >>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote: >> to be honest, my fix is also a tradeoff.. w/ my patch, multiple >> pending pt interrupt instances may be combined as one injection of >> guest timer interrupt.. >> but as Yang said, It should be ok since it happens rarely and even >> native OS will encounter the same problem if it disable the interrupt for too >long. > >And one OS may properly deal with that situation while another might not. For >example, let me remind you of issues the hypervisor had in earlier days when no >event/softirq processing happened for long enough, resulting in time >management issues due to missed platform timer overflow handling. IOW an OS >may avoid that situation by simply not disabling IRQs for too long a time. > Thanks for your reminding, Jan, do you have any suggestion to fix this PT and apicv issue? What do you think about Kevin's? Quan
>>> On 06.09.16 at 13:22, <xuquan8@huawei.com> wrote: > On September 06, 2016 5:05 PM, Jan Beulich <JBeulich@suse.com> >>>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote: >>> to be honest, my fix is also a tradeoff.. w/ my patch, multiple >>> pending pt interrupt instances may be combined as one injection of >>> guest timer interrupt.. >>> but as Yang said, It should be ok since it happens rarely and even >>> native OS will encounter the same problem if it disable the interrupt for too >>long. >> >>And one OS may properly deal with that situation while another might not. For >>example, let me remind you of issues the hypervisor had in earlier days when no >>event/softirq processing happened for long enough, resulting in time >>management issues due to missed platform timer overflow handling. IOW an OS >>may avoid that situation by simply not disabling IRQs for too long a time. >> > Thanks for your reminding, > Jan, do you have any suggestion to fix this PT and apicv issue? > What do you think about Kevin's? Kevin's suggestion sounded reasonable, iirc, and I'd rely on his and your better VMX knowledge for the low level details. Jan
On September 06, 2016 9:21 PM, Jan Beulich <JBeulich@suse.com> wrote: >>>> On 06.09.16 at 13:22, <xuquan8@huawei.com> wrote: >> On September 06, 2016 5:05 PM, Jan Beulich <JBeulich@suse.com> >>>>>> On 06.09.16 at 10:53, <xuquan8@huawei.com> wrote: >>>> to be honest, my fix is also a tradeoff.. w/ my patch, multiple >>>> pending pt interrupt instances may be combined as one injection of >>>> guest timer interrupt.. >>>> but as Yang said, It should be ok since it happens rarely and even >>>> native OS will encounter the same problem if it disable the >>>> interrupt for too >>>long. >>> >>>And one OS may properly deal with that situation while another might >>>not. For example, let me remind you of issues the hypervisor had in >>>earlier days when no event/softirq processing happened for long >>>enough, resulting in time management issues due to missed platform >>>timer overflow handling. IOW an OS may avoid that situation by simply not >disabling IRQs for too long a time. >>> >> Thanks for your reminding, >> Jan, do you have any suggestion to fix this PT and apicv issue? >> What do you think about Kevin's? > >Kevin's suggestion sounded reasonable, iirc, and I'd rely on his and your better >VMX knowledge for the low level details. > I'll try to verify Kevin's later. Kevin is really a master but I am just a rookie. I always appreciate both of your education.. Quan
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index 8fca08c..d3a034e 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c @@ -334,7 +334,21 @@ void vmx_intr_assist(void) __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]); } - pt_intr_post(v, intack); + /* + * If the periodic timer interrut is delivered and not the highest priority, + * make Xen be aware of this case to decrease the count of pending periodic + * timer interrupt. + */ + if ( pt_vector != -1 && intack.vector > pt_vector ) + { + struct hvm_intack pt_intack; + + pt_intack.vector = pt_vector; + pt_intack.source = hvm_intsrc_lapic; + pt_intr_post(v, pt_intack); + } + else + pt_intr_post(v, intack); } else {