Message ID | 1409148331-5964-1-git-send-email-wei.w.wang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Il 27/08/2014 16:05, Wei Wang ha scritto: > Guest may mask the IOAPIC entry before issue EOI. In such case, > EOI will not be intercepted by the hypervisor, since the corresponding > bit in eoi_exit_bitmap is not set after the masking of IOAPIC entry. > > The solution here is to OR eoi_exit_bitmap with tmr to make sure that > all level-triggered interrupts have their bits in eoi_exit_bitmap set. This commit message does not explain why this change is necessary, and the relationship between this patch and the previous one. For example: ------ Commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table, 2014-07-30) fixed an APICv bug where an incorrect EOI exit bitmap triggered an interrupt storm inside the guest. There is a corner case for which that patch would have disabled accelerated EOI unnecessarily. Suppose you have: - a device that was the sole user of an INTx interrupt and is hot-unplugged - an OS that masks the INTx interrupt entry in the IOAPIC after the unplug - another device that uses MSI and is subsequently hot-plugged If the OS chooses to reuse the same LAPIC interrupt vector for the two interrupts, the patch would have left the vector in the EOI exit bitmap, because KVM takes into account the stale entry in the IOAPIC redirection table. We do know exactly which masked interrupts are still in-service and thus require broadcasting an EOI to the IOAPIC: this information is in the TMR. So, this patch ORs the EOI exit bitmap provided by the ioapic with the TMR register. Thanks to the previous patch, an active level-triggered interrupt will always be included in the EOI exit bitmap. ------ However, see below. > Tested-by: Rongrong Liu <rongrongx.liu@intel.com> > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > Signed-off-by: Wei Wang <wei.w.wang@intel.com> > --- > arch/x86/kvm/lapic.c | 12 ++++++++++++ > arch/x86/kvm/lapic.h | 1 + > arch/x86/kvm/x86.c | 1 + > virt/kvm/ioapic.c | 7 ++++--- > 4 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 8c1162d..0fcac3c 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) > } > } > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + u32 i; > + u32 tmr; > + > + for (i = 0; i < 8; i++) { > + tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); > + *((u32 *)eoi_exit_bitmap + i) |= tmr; > + } > +} > + > static void apic_update_ppr(struct kvm_lapic *apic) > { > u32 tpr, isrv, ppr, old_ppr; > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 6a11845..d2b96f2 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > > void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); > void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); > int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index d401684..d23b558 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) > > kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); > kvm_apic_update_tmr(vcpu, tmr); > + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap); > kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); > } > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index e8ce34c..ed15936 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, > spin_lock(&ioapic->lock); > for (index = 0; index < IOAPIC_NUM_PINS; index++) { > e = &ioapic->redirtbl[index]; > - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) || > - index == RTC_GSI) { > + if ((!e->fields.mask > + && e->fields.trig_mode == IOAPIC_LEVEL_TRIG) > + || kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > + index) || index == RTC_GSI) { > if (kvm_apic_match_dest(vcpu, NULL, 0, > e->fields.dest_id, e->fields.dest_mode)) { > __set_bit(e->fields.vector, > There's still something missing here. Suppose you have the following: Program edge-triggered MSI for vector 123 Interrupt comes in, ISR[123]=1 Mask MSI Program level-triggered IOAPIC interrupt for vector 123 >> Here, TMR[123] remains 0. Send EOI for vector 123 Now the TMR will not be updated in the LAPICs, and the EOI exit bitmap will not be set correctly. To fix this, we could drop all communication of the TMR from IOAPIC to LAPIC. Instead, do what the processor does and just modify the TMR in __apic_accept_irq, based on the trig_mode. If the TMR changes, you trigger an IOAPIC scan that will also refresh the EOI exit bitmap. For the hot-unplug/hot-plug case that Yang mentioned, the EOI exit bitmap will be updated as soon as the first MSI arrives (the MSI is edge-triggered). But this is more tricky than it looks like. Because the interrupt must not be delivered until the EOI exit bitmap is right, you have to skip posted interrupt delivery. There could be races between the handling of KVM_REQ_SCAN_IOAPIC and KVM_REQ_EVENT, which are hard to get right. I'm not sure it's worthwhile for what is after all a corner case. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Paolo Bonzini wrote on 2014-08-27: > Il 27/08/2014 16:05, Wei Wang ha scritto: > > Guest may mask the IOAPIC entry before issue EOI. In such case, EOI > > will not be intercepted by the hypervisor, since the corresponding bit > > in eoi_exit_bitmap is not set after the masking of IOAPIC entry. > > > > The solution here is to OR eoi_exit_bitmap with tmr to make sure that > > all level-triggered interrupts have their bits in eoi_exit_bitmap set. > > This commit message does not explain why this change is necessary, and the > relationship between this patch and the previous one. > > For example: > > ------ > Commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts listed in > the IOAPIC redir table, 2014-07-30) fixed an APICv bug where an incorrect EOI > exit bitmap triggered an interrupt storm inside the guest. > > There is a corner case for which that patch would have disabled accelerated > EOI unnecessarily. Suppose you have: > > - a device that was the sole user of an INTx interrupt and is hot-unplugged > > - an OS that masks the INTx interrupt entry in the IOAPIC after the unplug > > - another device that uses MSI and is subsequently hot-plugged > > If the OS chooses to reuse the same LAPIC interrupt vector for the two > interrupts, the patch would have left the vector in the EOI exit bitmap, because > KVM takes into account the stale entry in the IOAPIC redirection table. > > We do know exactly which masked interrupts are still in-service and thus > require broadcasting an EOI to the IOAPIC: this information is in the TMR. So, > this patch ORs the EOI exit bitmap provided by the ioapic with the TMR register. > Thanks to the previous patch, an active level-triggered interrupt will always be > included in the EOI exit bitmap. > ------ > > However, see below. > > > Tested-by: Rongrong Liu <rongrongx.liu@intel.com> > > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com> > > --- > > arch/x86/kvm/lapic.c | 12 ++++++++++++ > > arch/x86/kvm/lapic.h | 1 + > > arch/x86/kvm/x86.c | 1 + > > virt/kvm/ioapic.c | 7 ++++--- > > 4 files changed, 18 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index > > 8c1162d..0fcac3c 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, > u32 *tmr) > > } > > } > > > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 > > +*eoi_exit_bitmap) { > > + struct kvm_lapic *apic = vcpu->arch.apic; > > + u32 i; > > + u32 tmr; > > + > > + for (i = 0; i < 8; i++) { > > + tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); > > + *((u32 *)eoi_exit_bitmap + i) |= tmr; > > + } > > +} > > + > > static void apic_update_ppr(struct kvm_lapic *apic) { > > u32 tpr, isrv, ppr, old_ppr; > > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index > > 6a11845..d2b96f2 100644 > > --- a/arch/x86/kvm/lapic.h > > +++ b/arch/x86/kvm/lapic.h > > @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > > > > void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); void > > kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 > > +*eoi_exit_bitmap); > > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int > > kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index > > d401684..d23b558 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu > > *vcpu) > > > > kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); > > kvm_apic_update_tmr(vcpu, tmr); > > + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap); > > kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); } > > > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index > > e8ce34c..ed15936 100644 > > --- a/virt/kvm/ioapic.c > > +++ b/virt/kvm/ioapic.c > > @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, > u64 *eoi_exit_bitmap, > > spin_lock(&ioapic->lock); > > for (index = 0; index < IOAPIC_NUM_PINS; index++) { > > e = &ioapic->redirtbl[index]; > > - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > > - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) > || > > - index == RTC_GSI) { > > + if ((!e->fields.mask > > + && e->fields.trig_mode == IOAPIC_LEVEL_TRIG) > > + || kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > > + index) || index == RTC_GSI) { > > if (kvm_apic_match_dest(vcpu, NULL, 0, > > e->fields.dest_id, e->fields.dest_mode)) { > > __set_bit(e->fields.vector, > > > > There's still something missing here. > > Suppose you have the following: > > Program edge-triggered MSI for vector 123 > Interrupt comes in, ISR[123]=1 > Mask MSI > Program level-triggered IOAPIC interrupt for vector 123 You cannot assign the vector 123 to another trigger mode interrupt before previous IRQ handler finished (means issue EOI). If there is an interrupt comes from the IOAPIC entry immediately after you reprogram the entry, it will update the TMR to 1. Since we are still in previous IRQ handler, it will get confused to see the TMR becomes 1. > >> Here, TMR[123] remains 0. > Send EOI for vector 123 > > Now the TMR will not be updated in the LAPICs, and the EOI exit bitmap will > not be set correctly. > > To fix this, we could drop all communication of the TMR from IOAPIC to LAPIC. > Instead, do what the processor does and just modify the TMR in > __apic_accept_irq, based on the trig_mode. If the TMR changes, you trigger > an IOAPIC scan that will also refresh the EOI exit bitmap. > > For the hot-unplug/hot-plug case that Yang mentioned, the EOI exit bitmap will > be updated as soon as the first MSI arrives (the MSI is edge-triggered). > > But this is more tricky than it looks like. Because the interrupt must not be > delivered until the EOI exit bitmap is right, you have to skip posted interrupt > delivery. There could be races between the handling of > KVM_REQ_SCAN_IOAPIC and KVM_REQ_EVENT, which are hard to get right. > I'm not sure it's worthwhile for what is after all a corner case. > > Paolo Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 28/08/2014 08:17, Zhang, Yang Z ha scritto: > > Program edge-triggered MSI for vector 123 > > Interrupt comes in, ISR[123]=1 > > Mask MSI > > Program level-triggered IOAPIC interrupt for vector 123 > > You cannot assign the vector 123 to another trigger mode interrupt > before previous IRQ handler finished (means issue EOI). If there is an > interrupt comes from the IOAPIC entry immediately after you reprogram > the entry, it will update the TMR to 1. Since we are still in previous > IRQ handler, it will get confused to see the TMR becomes 1. Yeah, that could be confusing to real hardware as well. Still, I'm a bit nervous at the possibility of races introduced by these patches... I wouldn't mind a second review. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
We will do some more tests on it to make sure there are no problems. Wei -----Original Message----- From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini Sent: Thursday, August 28, 2014 4:44 PM To: Zhang, Yang Z; Wang, Wei W; kvm@vger.kernel.org Cc: alex.williamson@redhat.com Subject: Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it. Il 28/08/2014 08:17, Zhang, Yang Z ha scritto: > > Program edge-triggered MSI for vector 123 > > Interrupt comes in, ISR[123]=1 > > Mask MSI > > Program level-triggered IOAPIC interrupt for vector 123 > > You cannot assign the vector 123 to another trigger mode interrupt > before previous IRQ handler finished (means issue EOI). If there is an > interrupt comes from the IOAPIC entry immediately after you reprogram > the entry, it will update the TMR to 1. Since we are still in previous > IRQ handler, it will get confused to see the TMR becomes 1. Yeah, that could be confusing to real hardware as well. Still, I'm a bit nervous at the possibility of races introduced by these patches... I wouldn't mind a second review. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 28/08/2014 12:14, Wang, Wei W ha scritto:
> We will do some more tests on it to make sure there are no problems.
No, I don't think there are any easily-detected practical problems with
the patch. But I'm not sure I understand all the theoretical problems
and whether possible races are benign. These would be really hard to
debug, unless you get a bug that is 100% reproducible.
Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
I think we can think about it for another couple of days and see if any corner case is not covered. Wei > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Thursday, August 28, 2014 7:01 PM > To: Wang, Wei W; Zhang, Yang Z; kvm@vger.kernel.org > Cc: alex.williamson@redhat.com > Subject: Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before > loading it. > > Il 28/08/2014 12:14, Wang, Wei W ha scritto: > > We will do some more tests on it to make sure there are no problems. > > No, I don't think there are any easily-detected practical problems with the > patch. But I'm not sure I understand all the theoretical problems and > whether possible races are benign. These would be really hard to debug, > unless you get a bug that is 100% reproducible. > > Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On > Behalf Of Wang, Wei W > Sent: Friday, August 29, 2014 8:59 AM > To: Paolo Bonzini; Zhang, Yang Z; kvm@vger.kernel.org > Cc: alex.williamson@redhat.com > Subject: RE: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before > loading it. > > I think we can think about it for another couple of days and see if any corner > case is not covered. > > Wei Hi Paolo, we have not found any corner case. Do you still have any concerns? If not, I can email out the two patches with your suggested commit messages. If anyone find a bug in the future, we can also get back to solve it. Wei > > > -----Original Message----- > > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > Sent: Thursday, August 28, 2014 7:01 PM > > To: Wang, Wei W; Zhang, Yang Z; kvm@vger.kernel.org > > Cc: alex.williamson@redhat.com > > Subject: Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before > > loading it. > > > > Il 28/08/2014 12:14, Wang, Wei W ha scritto: > > > We will do some more tests on it to make sure there are no problems. > > > > No, I don't think there are any easily-detected practical problems > > with the patch. But I'm not sure I understand all the theoretical > > problems and whether possible races are benign. These would be really > > hard to debug, unless you get a bug that is 100% reproducible. > > > > Paolo > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a > message to majordomo@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 8c1162d..0fcac3c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) } } +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + u32 i; + u32 tmr; + + for (i = 0; i < 8; i++) { + tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); + *((u32 *)eoi_exit_bitmap + i) |= tmr; + } +} + static void apic_update_ppr(struct kvm_lapic *apic) { u32 tpr, isrv, ppr, old_ppr; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 6a11845..d2b96f2 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d401684..d23b558 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); kvm_apic_update_tmr(vcpu, tmr); + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap); kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); } diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..ed15936 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, spin_lock(&ioapic->lock); for (index = 0; index < IOAPIC_NUM_PINS; index++) { e = &ioapic->redirtbl[index]; - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) || - index == RTC_GSI) { + if ((!e->fields.mask + && e->fields.trig_mode == IOAPIC_LEVEL_TRIG) + || kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, + index) || index == RTC_GSI) { if (kvm_apic_match_dest(vcpu, NULL, 0, e->fields.dest_id, e->fields.dest_mode)) { __set_bit(e->fields.vector,