Message ID | 20241023124527.1092810-1-alexyonghe@tencent.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: x86: Try to enable irr_pending state with disabled APICv | expand |
On Wed, Oct 23, 2024, Yong He wrote: > From: Yong He <alexyonghe@tencent.com> > > Try to enable irr_pending when set APIC state, if there is > pending interrupt in IRR with disabled APICv. > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor > always send signals to stop running vcpu threads, then save > entire VM state, including APIC state. There may be a pending > timer interrupt in the saved APIC IRR that is injected before > vcpu_run return. But when restoring the VM, since APICv is > disabled, irr_pending is disabled by default, so this may cause > the timer interrupt in the IRR to be suspended for a long time, > until the next interrupt comes. > > Signed-off-by: Yong He <alexyonghe@tencent.com> > --- > arch/x86/kvm/lapic.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 2098dc689088..7373f649958b 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > apic_find_highest_irr(apic)); > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > } > + > + /* Search the IRR and enable irr_pending state with disabled APICv*/ > + if (!enable_apicv && apic_search_irr(apic) != -1) This can/should be an "else" from the above "if (apic->apicv_active)". I also think KVM can safely clear irr_pending in this case, which is also why irr_pending isn't handling in kvm_apic_update_apicv(). When APICv is disabled (inhibited) at runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative. But when stuffing APIC state, I don't see how that can happen. So this? diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 65412640cfc7..deb73aea2c06 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) kvm_x86_call(hwapic_irr_update)(vcpu, apic_find_highest_irr(apic)); kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); + } else { + /* + * Note, kvm_apic_update_apicv() is responsible for updating + * isr_count and highest_isr_cache. irr_pending is somewhat + * special because it mustn't be cleared when APICv is disabled + * at runtime, and only state restore can cause an IRR bit to + * be set without also refreshing irr_pending. + */ + apic->irr_pending = apic_search_irr(apic) != -1; } kvm_make_request(KVM_REQ_EVENT, vcpu); if (ioapic_in_kernel(vcpu->kvm)) > + apic->irr_pending = true; > kvm_make_request(KVM_REQ_EVENT, vcpu); > if (ioapic_in_kernel(vcpu->kvm)) > kvm_rtc_eoi_tracking_restore_one(vcpu); > -- > 2.43.5 >
On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@google.com> wrote: > > On Wed, Oct 23, 2024, Yong He wrote: > > From: Yong He <alexyonghe@tencent.com> > > > > Try to enable irr_pending when set APIC state, if there is > > pending interrupt in IRR with disabled APICv. > > > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor > > always send signals to stop running vcpu threads, then save > > entire VM state, including APIC state. There may be a pending > > timer interrupt in the saved APIC IRR that is injected before > > vcpu_run return. But when restoring the VM, since APICv is > > disabled, irr_pending is disabled by default, so this may cause > > the timer interrupt in the IRR to be suspended for a long time, > > until the next interrupt comes. > > > > Signed-off-by: Yong He <alexyonghe@tencent.com> > > --- > > arch/x86/kvm/lapic.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index 2098dc689088..7373f649958b 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > > apic_find_highest_irr(apic)); > > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > > } > > + > > + /* Search the IRR and enable irr_pending state with disabled APICv*/ > > + if (!enable_apicv && apic_search_irr(apic) != -1) > > This can/should be an "else" from the above "if (apic->apicv_active)". I also > think KVM can safely clear irr_pending in this case, which is also why irr_pending > isn't handling in kvm_apic_update_apicv(). When APICv is disabled (inhibited) at > runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative. Thank you for your review and suggestions. > > But when stuffing APIC state, I don't see how that can happen. So this? Here is our case. APICv is disabled by set enable_apicv to 0. Create VM snapshot, then start/restore new VM base the snapshot. We occasionally encountered issues with VMs hanging for long periods of time after restore. Investigation show that there is a timer IRQ pending in IRR, but the newly restored VM could not detect it, because irr_pending is not set when restoring the APIC state by kvm_apic_set_state(). Further investigation show when creating VM snapshot, VMM pause VCPUs by signal, an in-flight timer pending in IRR, and the tscdeadline is 0 in saved APIC state. All these contexts in saved APIC state prove that kvm_inject_pending_timer_irqs had just injected a timer (will also set the tscdeadline to 0) before the VCPU handle the signal. Maybe this patch is a fix for 755c2bf87860 (KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it), the irr_pending enable check is missed in kvm_apic_set_state() after that. > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 65412640cfc7..deb73aea2c06 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > kvm_x86_call(hwapic_irr_update)(vcpu, > apic_find_highest_irr(apic)); > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > + } else { > + /* > + * Note, kvm_apic_update_apicv() is responsible for updating > + * isr_count and highest_isr_cache. irr_pending is somewhat > + * special because it mustn't be cleared when APICv is disabled > + * at runtime, and only state restore can cause an IRR bit to > + * be set without also refreshing irr_pending. > + */ > + apic->irr_pending = apic_search_irr(apic) != -1; > } > kvm_make_request(KVM_REQ_EVENT, vcpu); > if (ioapic_in_kernel(vcpu->kvm)) > > > + apic->irr_pending = true; > > kvm_make_request(KVM_REQ_EVENT, vcpu); > > if (ioapic_in_kernel(vcpu->kvm)) > > kvm_rtc_eoi_tracking_restore_one(vcpu); > > -- > > 2.43.5 > >
On Thu, Oct 31, 2024, zhuangel570 wrote: > On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@google.com> wrote: > > > > On Wed, Oct 23, 2024, Yong He wrote: > > > From: Yong He <alexyonghe@tencent.com> > > > > > > Try to enable irr_pending when set APIC state, if there is > > > pending interrupt in IRR with disabled APICv. > > > > > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor > > > always send signals to stop running vcpu threads, then save > > > entire VM state, including APIC state. There may be a pending > > > timer interrupt in the saved APIC IRR that is injected before > > > vcpu_run return. But when restoring the VM, since APICv is > > > disabled, irr_pending is disabled by default, so this may cause > > > the timer interrupt in the IRR to be suspended for a long time, > > > until the next interrupt comes. > > > > > > Signed-off-by: Yong He <alexyonghe@tencent.com> > > > --- > > > arch/x86/kvm/lapic.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > > index 2098dc689088..7373f649958b 100644 > > > --- a/arch/x86/kvm/lapic.c > > > +++ b/arch/x86/kvm/lapic.c > > > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > > > apic_find_highest_irr(apic)); > > > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > > > } > > > + > > > + /* Search the IRR and enable irr_pending state with disabled APICv*/ > > > + if (!enable_apicv && apic_search_irr(apic) != -1) > > > > This can/should be an "else" from the above "if (apic->apicv_active)". I also > > think KVM can safely clear irr_pending in this case, which is also why irr_pending > > isn't handling in kvm_apic_update_apicv(). When APICv is disabled (inhibited) at > > runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative. > > Thank you for your review and suggestions. > > > > > But when stuffing APIC state, I don't see how that can happen. So this? > > Here is our case. Sorry for not being clear. I wasn't saying that the bug you encountered can't happen. I 100% agree it's a real bug. What I was saying can't happen is an in-flight virtual IRQ from another vCPU or device racing with setting APIC state, and the VM (or VMM) having any expectation that everything would work correctly. And so, I think it's save for KVM to explicitly set irr_pending, i.e. to potentially _clear_ irr_pending. If you are able to verify the psuedo-patch I posted fixes the bug you encountered, that is how I would like to fix the bug.
On Thu, Oct 31, 2024 at 11:16 PM Sean Christopherson <seanjc@google.com> wrote: > > On Thu, Oct 31, 2024, zhuangel570 wrote: > > On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@google.com> wrote: > > > > > > On Wed, Oct 23, 2024, Yong He wrote: > > > > From: Yong He <alexyonghe@tencent.com> > > > > > > > > Try to enable irr_pending when set APIC state, if there is > > > > pending interrupt in IRR with disabled APICv. > > > > > > > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor > > > > always send signals to stop running vcpu threads, then save > > > > entire VM state, including APIC state. There may be a pending > > > > timer interrupt in the saved APIC IRR that is injected before > > > > vcpu_run return. But when restoring the VM, since APICv is > > > > disabled, irr_pending is disabled by default, so this may cause > > > > the timer interrupt in the IRR to be suspended for a long time, > > > > until the next interrupt comes. > > > > > > > > Signed-off-by: Yong He <alexyonghe@tencent.com> > > > > --- > > > > arch/x86/kvm/lapic.c | 4 ++++ > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > > > index 2098dc689088..7373f649958b 100644 > > > > --- a/arch/x86/kvm/lapic.c > > > > +++ b/arch/x86/kvm/lapic.c > > > > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > > > > apic_find_highest_irr(apic)); > > > > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > > > > } > > > > + > > > > + /* Search the IRR and enable irr_pending state with disabled APICv*/ > > > > + if (!enable_apicv && apic_search_irr(apic) != -1) > > > > > > This can/should be an "else" from the above "if (apic->apicv_active)". I also > > > think KVM can safely clear irr_pending in this case, which is also why irr_pending > > > isn't handling in kvm_apic_update_apicv(). When APICv is disabled (inhibited) at > > > runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative. > > > > Thank you for your review and suggestions. > > > > > > > > But when stuffing APIC state, I don't see how that can happen. So this? > > > > Here is our case. > > Sorry for not being clear. I wasn't saying that the bug you encountered can't > happen. I 100% agree it's a real bug. What I was saying can't happen is an > in-flight virtual IRQ from another vCPU or device racing with setting APIC state, > and the VM (or VMM) having any expectation that everything would work correctly. > > And so, I think it's save for KVM to explicitly set irr_pending, i.e. to potentially > _clear_ irr_pending. Got it, thanks. > > If you are able to verify the psuedo-patch I posted fixes the bug you encountered, > that is how I would like to fix the bug. Yes, your patch may fix the issue better, and I have checked on my test machine and it is fixed.
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 2098dc689088..7373f649958b 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) apic_find_highest_irr(apic)); kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); } + + /* Search the IRR and enable irr_pending state with disabled APICv*/ + if (!enable_apicv && apic_search_irr(apic) != -1) + apic->irr_pending = true; kvm_make_request(KVM_REQ_EVENT, vcpu); if (ioapic_in_kernel(vcpu->kvm)) kvm_rtc_eoi_tracking_restore_one(vcpu);