Message ID | 1614678202-10808-1-git-send-email-wanpengli@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: LAPIC: Advancing the timer expiration on guest initiated write |

On Tue, Mar 02, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Advancing the timer expiration should only be necessary on guest initiated
> writes. Now, we cancel the timer, clear .pending and clear expired_tscdeadline
> at the same time during state restore.

That last sentence is confusing. kvm_apic_set_state() already clears .pending,
by way of __start_apic_timer(). I think what you mean is:

  When we cancel the timer and clear .pending during state restore, clear
  expired_tscdeadline as well.

With that,

Reviewed-by: Sean Christopherson <seanjc@google.com>

Side topic, I think there's a theoretical bug where KVM could inject a spurious
timer interrupt. If KVM is using hrtimer, the hrtimer expires early due to an
overzealous timer_advance_ns, and the guest writes MSR_TSCDEADLINE after the
hrtimer expires but before the vCPU is kicked, then KVM will inject a spurious
timer IRQ since the premature expiration should have been canceled by the guest's
WRMSR.

It could also cause KVM to soft hang the guest if the new lapic_timer.tscdeadline
is written before apic_timer_expired() captures it in expired_tscdeadline. In
that case, KVM will wait for the new deadline, which could be far in the future.

Side topic #2, I'm pretty sure the direct usage of kvm_wait_lapic_expire() in
apic_timer_expired() before kvm_apic_inject_pending_timer_irqs() is broken.
kvm_wait_lapic_expire() requires the interrupt to be pending, but that never
happens if PI is used, and even if PI "fails", the IRQ isn't injected until the
next line, kvm_apic_inject_pending_timer_irqs(). I'll send a patch.

> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/lapic.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 45d40bf..f2b6e79 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2595,6 +2595,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>
>  	apic_update_ppr(apic);
>  	hrtimer_cancel(&apic->lapic_timer.timer);
> +	apic->lapic_timer.expired_tscdeadline = 0;
>  	apic_update_lvtt(apic);
>  	apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
>  	update_divide_count(apic);
> --
> 2.7.4
>

On Wed, 3 Mar 2021 at 01:16, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Mar 02, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Advancing the timer expiration should only be necessary on guest initiated
> > writes. Now, we cancel the timer, clear .pending and clear expired_tscdeadline
> > at the same time during state restore.
>
> That last sentence is confusing. kvm_apic_set_state() already clears .pending,
> by way of __start_apic_timer(). I think what you mean is:
>
>   When we cancel the timer and clear .pending during state restore, clear
>   expired_tscdeadline as well.

Good statement. :)

>
> With that,
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>
>
>
> Side topic, I think there's a theoretical bug where KVM could inject a spurious
> timer interrupt. If KVM is using hrtimer, the hrtimer expires early due to an
> overzealous timer_advance_ns, and the guest writes MSR_TSCDEADLINE after the
> hrtimer expires but before the vCPU is kicked, then KVM will inject a spurious
> timer IRQ since the premature expiration should have been canceled by the guest's
> WRMSR.
>
> It could also cause KVM to soft hang the guest if the new lapic_timer.tscdeadline
> is written before apic_timer_expired() captures it in expired_tscdeadline. In
> that case, KVM will wait for the new deadline, which could be far in the future.

The hrtimer_cancel() before setting new lapic_timer.tscdeadline in
kvm_set_lapic_tscdeadline_msr() will wait for the hrtimer callback
function to finish. Could it solve this issue?

    Wanpeng

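The ordering argument above can be sketched as a small single-threaded toy model. This is not kernel code: `struct toy_lapic_timer`, `toy_timer_callback()`, `toy_write_deadline()` and `toy_wait_cycles()` are hypothetical names invented for illustration, the "wait for the callback" step only stands in for what hrtimer_cancel() is described as doing in this thread, and clearing `pending` on a deadline write is an assumption based on the remark that __start_apic_timer() clears .pending.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the timer fields discussed in the thread (hypothetical). */
struct toy_lapic_timer {
	uint64_t tscdeadline;         /* deadline most recently programmed by the guest */
	uint64_t expired_tscdeadline; /* deadline captured when the timer fired */
	int pending;                  /* nonzero while a timer IRQ awaits injection */
};

/* Models the expiry callback: capture the current deadline, mark the IRQ pending. */
static void toy_timer_callback(struct toy_lapic_timer *t)
{
	assert(t);
	t->expired_tscdeadline = t->tscdeadline;
	t->pending = 1;
}

/*
 * Models a guest deadline write that first lets an in-flight callback run to
 * completion (the role hrtimer_cancel() plays in the thread), and only then
 * reprograms the timer.
 */
static void toy_write_deadline(struct toy_lapic_timer *t, uint64_t new_deadline,
			       int callback_in_flight)
{
	assert(t);
	if (callback_in_flight)
		toy_timer_callback(t); /* callback finishes against the OLD deadline */
	t->tscdeadline = new_deadline;
	t->pending = 0;                /* assumption: restarting the timer clears pending */
}

/* Cycles a wait-for-deadline step would spin; 0 when there is nothing to wait for. */
static uint64_t toy_wait_cycles(const struct toy_lapic_timer *t, uint64_t guest_tsc)
{
	assert(t);
	if (t->expired_tscdeadline == 0 || guest_tsc >= t->expired_tscdeadline)
		return 0;
	return t->expired_tscdeadline - guest_tsc;
}
```

With the cancel-first ordering, the callback can only ever capture the old deadline, so any wait is bounded by the old deadline and the premature expiration is dropped. If the new deadline were stored before the callback ran (writing `tscdeadline` directly and then calling `toy_timer_callback()`), the captured value would be the new, possibly far-future deadline, which is the soft-hang case described above.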
On Wed, Mar 03, 2021, Wanpeng Li wrote:
> > Side topic, I think there's a theoretical bug where KVM could inject a spurious
> > timer interrupt. If KVM is using hrtimer, the hrtimer expires early due to an
> > overzealous timer_advance_ns, and the guest writes MSR_TSCDEADLINE after the
> > hrtimer expires but before the vCPU is kicked, then KVM will inject a spurious
> > timer IRQ since the premature expiration should have been canceled by the guest's
> > WRMSR.
> >
> > It could also cause KVM to soft hang the guest if the new lapic_timer.tscdeadline
> > is written before apic_timer_expired() captures it in expired_tscdeadline. In
> > that case, KVM will wait for the new deadline, which could be far in the future.
>
> The hrtimer_cancel() before setting new lapic_timer.tscdeadline in
> kvm_set_lapic_tscdeadline_msr() will wait for the hrtimer callback
> function to finish. Could it solve this issue?

Aha! Yep, that prevents my theoretical bug. Thanks!

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 45d40bf..f2b6e79 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2595,6 +2595,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 
 	apic_update_ppr(apic);
 	hrtimer_cancel(&apic->lapic_timer.timer);
+	apic->lapic_timer.expired_tscdeadline = 0;
 	apic_update_lvtt(apic);
 	apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
 	update_divide_count(apic);
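
As a rough illustration of what the one-line change buys, here is a minimal userspace sketch of the restore path. It is not the kernel implementation: `struct toy_lapic_timer`, `toy_restore_state()` and `toy_would_advance()` are hypothetical names, the timer cancellation is reduced to a comment, and the idea that expiration advancing is skipped when `expired_tscdeadline` is zero is an assumption made for the model.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the timer fields touched during state restore (hypothetical). */
struct toy_lapic_timer {
	uint64_t tscdeadline;         /* deadline programmed by the guest */
	uint64_t expired_tscdeadline; /* deadline captured at the last expiration */
	int pending;                  /* nonzero while a timer IRQ awaits injection */
};

/*
 * Models the restore path after the patch: once the timer is canceled and
 * pending is cleared, the captured expiration is cleared too, so nothing
 * left over from before the restore can trigger advancing.
 */
static void toy_restore_state(struct toy_lapic_timer *t)
{
	assert(t);
	/* hrtimer_cancel() would go here; the toy has no real timer to cancel. */
	t->expired_tscdeadline = 0; /* the line the patch adds */
	t->pending = 0;             /* cleared when the timer is restarted */
}

/* Assumption: advancing is only considered when a captured deadline exists. */
static int toy_would_advance(const struct toy_lapic_timer *t)
{
	assert(t);
	return t->expired_tscdeadline != 0;
}
```

Before the restore, a stale `expired_tscdeadline` makes `toy_would_advance()` return 1. After `toy_restore_state()`, it returns 0 until a real guest-initiated expiration captures a new deadline, which matches the intent stated in the changelog.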