diff mbox series

KVM: x86: Try to enable irr_pending state with disabled APICv

Message ID 20241023124527.1092810-1-alexyonghe@tencent.com (mailing list archive)
State New
Headers show
Series KVM: x86: Try to enable irr_pending state with disabled APICv | expand

Commit Message

zhuangel570 Oct. 23, 2024, 12:45 p.m. UTC
From: Yong He <alexyonghe@tencent.com>

Try to enable irr_pending when set APIC state, if there is
pending interrupt in IRR with disabled APICv.

In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor
always send signals to stop running vcpu threads, then save
entire VM state, including APIC state. There may be a pending
timer interrupt in the saved APIC IRR that is injected before
vcpu_run return. But when restoring the VM, since APICv is
disabled, irr_pending is disabled by default, so this may cause
the timer interrupt in the IRR to be suspended for a long time,
until the next interrupt comes.

Signed-off-by: Yong He <alexyonghe@tencent.com>
---
 arch/x86/kvm/lapic.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Sean Christopherson Oct. 30, 2024, 6:31 p.m. UTC | #1
On Wed, Oct 23, 2024, Yong He wrote:
> From: Yong He <alexyonghe@tencent.com>
> 
> Try to enable irr_pending when set APIC state, if there is
> pending interrupt in IRR with disabled APICv.
> 
> In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor
> always send signals to stop running vcpu threads, then save
> entire VM state, including APIC state. There may be a pending
> timer interrupt in the saved APIC IRR that is injected before
> vcpu_run return. But when restoring the VM, since APICv is
> disabled, irr_pending is disabled by default, so this may cause
> the timer interrupt in the IRR to be suspended for a long time,
> until the next interrupt comes.
> 
> Signed-off-by: Yong He <alexyonghe@tencent.com>
> ---
>  arch/x86/kvm/lapic.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2098dc689088..7373f649958b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>  						apic_find_highest_irr(apic));
>  		kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
>  	}
> +
> +	/* Search the IRR and enable irr_pending state with disabled APICv*/
> +	if (!enable_apicv && apic_search_irr(apic) != -1)

This can/should be an "else" from the above "if (apic->apicv_active)".  I also
think KVM can safely clear irr_pending in this case, which is also why irr_pending
isn't handling in kvm_apic_update_apicv().  When APICv is disabled (inhibited) at
runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative.

But when stuffing APIC state, I don't see how that can happen.  So this?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 65412640cfc7..deb73aea2c06 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
                kvm_x86_call(hwapic_irr_update)(vcpu,
                                                apic_find_highest_irr(apic));
                kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
+       } else {
+               /*
+                * Note, kvm_apic_update_apicv() is responsible for updating
+                * isr_count and highest_isr_cache.  irr_pending is somewhat
+                * special because it mustn't be cleared when APICv is disabled
+                * at runtime, and only state restore can cause an IRR bit to
+                * be set without also refreshing irr_pending.
+                */
+               apic->irr_pending = apic_search_irr(apic) != -1;
        }
        kvm_make_request(KVM_REQ_EVENT, vcpu);
        if (ioapic_in_kernel(vcpu->kvm))

> +		apic->irr_pending = true;
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	if (ioapic_in_kernel(vcpu->kvm))
>  		kvm_rtc_eoi_tracking_restore_one(vcpu);
> -- 
> 2.43.5
>
zhuangel570 Oct. 31, 2024, 3:39 a.m. UTC | #2
On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Oct 23, 2024, Yong He wrote:
> > From: Yong He <alexyonghe@tencent.com>
> >
> > Try to enable irr_pending when set APIC state, if there is
> > pending interrupt in IRR with disabled APICv.
> >
> > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor
> > always send signals to stop running vcpu threads, then save
> > entire VM state, including APIC state. There may be a pending
> > timer interrupt in the saved APIC IRR that is injected before
> > vcpu_run return. But when restoring the VM, since APICv is
> > disabled, irr_pending is disabled by default, so this may cause
> > the timer interrupt in the IRR to be suspended for a long time,
> > until the next interrupt comes.
> >
> > Signed-off-by: Yong He <alexyonghe@tencent.com>
> > ---
> >  arch/x86/kvm/lapic.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 2098dc689088..7373f649958b 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> >                                               apic_find_highest_irr(apic));
> >               kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
> >       }
> > +
> > +     /* Search the IRR and enable irr_pending state with disabled APICv*/
> > +     if (!enable_apicv && apic_search_irr(apic) != -1)
>
> This can/should be an "else" from the above "if (apic->apicv_active)".  I also
> think KVM can safely clear irr_pending in this case, which is also why irr_pending
> isn't handling in kvm_apic_update_apicv().  When APICv is disabled (inhibited) at
> runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative.

Thank you for your review and suggestions.

>
> But when stuffing APIC state, I don't see how that can happen.  So this?

Here is our case.

APICv is disabled by set enable_apicv to 0. Create VM snapshot, then
start/restore
new VM base the snapshot. We occasionally encountered issues with VMs hanging
for long periods of time after restore. Investigation show that there
is a timer IRQ
pending in IRR, but the newly restored VM could not detect it, because
irr_pending
is not set when restoring the APIC state by kvm_apic_set_state().

Further investigation show when creating VM snapshot, VMM pause VCPUs by signal,
an in-flight timer pending in IRR, and the tscdeadline is 0 in saved
APIC state. All these
contexts in saved APIC state prove that kvm_inject_pending_timer_irqs
had just injected
a timer (will also set the tscdeadline to 0) before the VCPU handle the signal.

Maybe this patch is a fix for 755c2bf87860 (KVM: x86: lapic: don't
touch irr_pending in
kvm_apic_update_apicv when inhibiting it), the irr_pending enable
check is missed in
kvm_apic_set_state() after that.

>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 65412640cfc7..deb73aea2c06 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>                 kvm_x86_call(hwapic_irr_update)(vcpu,
>                                                 apic_find_highest_irr(apic));
>                 kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
> +       } else {
> +               /*
> +                * Note, kvm_apic_update_apicv() is responsible for updating
> +                * isr_count and highest_isr_cache.  irr_pending is somewhat
> +                * special because it mustn't be cleared when APICv is disabled
> +                * at runtime, and only state restore can cause an IRR bit to
> +                * be set without also refreshing irr_pending.
> +                */
> +               apic->irr_pending = apic_search_irr(apic) != -1;
>         }
>         kvm_make_request(KVM_REQ_EVENT, vcpu);
>         if (ioapic_in_kernel(vcpu->kvm))
>
> > +             apic->irr_pending = true;
> >       kvm_make_request(KVM_REQ_EVENT, vcpu);
> >       if (ioapic_in_kernel(vcpu->kvm))
> >               kvm_rtc_eoi_tracking_restore_one(vcpu);
> > --
> > 2.43.5
> >
Sean Christopherson Oct. 31, 2024, 3:16 p.m. UTC | #3
On Thu, Oct 31, 2024, zhuangel570 wrote:
> On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 23, 2024, Yong He wrote:
> > > From: Yong He <alexyonghe@tencent.com>
> > >
> > > Try to enable irr_pending when set APIC state, if there is
> > > pending interrupt in IRR with disabled APICv.
> > >
> > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor
> > > always send signals to stop running vcpu threads, then save
> > > entire VM state, including APIC state. There may be a pending
> > > timer interrupt in the saved APIC IRR that is injected before
> > > vcpu_run return. But when restoring the VM, since APICv is
> > > disabled, irr_pending is disabled by default, so this may cause
> > > the timer interrupt in the IRR to be suspended for a long time,
> > > until the next interrupt comes.
> > >
> > > Signed-off-by: Yong He <alexyonghe@tencent.com>
> > > ---
> > >  arch/x86/kvm/lapic.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 2098dc689088..7373f649958b 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> > >                                               apic_find_highest_irr(apic));
> > >               kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
> > >       }
> > > +
> > > +     /* Search the IRR and enable irr_pending state with disabled APICv*/
> > > +     if (!enable_apicv && apic_search_irr(apic) != -1)
> >
> > This can/should be an "else" from the above "if (apic->apicv_active)".  I also
> > think KVM can safely clear irr_pending in this case, which is also why irr_pending
> > isn't handling in kvm_apic_update_apicv().  When APICv is disabled (inhibited) at
> > runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative.
> 
> Thank you for your review and suggestions.
> 
> >
> > But when stuffing APIC state, I don't see how that can happen.  So this?
> 
> Here is our case.

Sorry for not being clear.  I wasn't saying that the bug you encountered can't
happen.  I 100% agree it's a real bug.  What I was saying can't happen is an
in-flight virtual IRQ from another vCPU or device racing with setting APIC state,
and the VM (or VMM) having any expectation that everything would work correctly.

And so, I think it's save for KVM to explicitly set irr_pending, i.e. to potentially
_clear_ irr_pending.

If you are able to verify the psuedo-patch I posted fixes the bug you encountered,
that is how I would like to fix the bug.
zhuangel570 Nov. 1, 2024, 7:42 a.m. UTC | #4
On Thu, Oct 31, 2024 at 11:16 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Oct 31, 2024, zhuangel570 wrote:
> > On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Wed, Oct 23, 2024, Yong He wrote:
> > > > From: Yong He <alexyonghe@tencent.com>
> > > >
> > > > Try to enable irr_pending when set APIC state, if there is
> > > > pending interrupt in IRR with disabled APICv.
> > > >
> > > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor
> > > > always send signals to stop running vcpu threads, then save
> > > > entire VM state, including APIC state. There may be a pending
> > > > timer interrupt in the saved APIC IRR that is injected before
> > > > vcpu_run return. But when restoring the VM, since APICv is
> > > > disabled, irr_pending is disabled by default, so this may cause
> > > > the timer interrupt in the IRR to be suspended for a long time,
> > > > until the next interrupt comes.
> > > >
> > > > Signed-off-by: Yong He <alexyonghe@tencent.com>
> > > > ---
> > > >  arch/x86/kvm/lapic.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 2098dc689088..7373f649958b 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> > > >                                               apic_find_highest_irr(apic));
> > > >               kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
> > > >       }
> > > > +
> > > > +     /* Search the IRR and enable irr_pending state with disabled APICv*/
> > > > +     if (!enable_apicv && apic_search_irr(apic) != -1)
> > >
> > > This can/should be an "else" from the above "if (apic->apicv_active)".  I also
> > > think KVM can safely clear irr_pending in this case, which is also why irr_pending
> > > isn't handling in kvm_apic_update_apicv().  When APICv is disabled (inhibited) at
> > > runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative.
> >
> > Thank you for your review and suggestions.
> >
> > >
> > > But when stuffing APIC state, I don't see how that can happen.  So this?
> >
> > Here is our case.
>
> Sorry for not being clear.  I wasn't saying that the bug you encountered can't
> happen.  I 100% agree it's a real bug.  What I was saying can't happen is an
> in-flight virtual IRQ from another vCPU or device racing with setting APIC state,
> and the VM (or VMM) having any expectation that everything would work correctly.
>
> And so, I think it's save for KVM to explicitly set irr_pending, i.e. to potentially
> _clear_ irr_pending.

Got it, thanks.

>
> If you are able to verify the psuedo-patch I posted fixes the bug you encountered,
> that is how I would like to fix the bug.

Yes, your patch may fix the issue better, and I have checked on my test machine
and it is fixed.
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2098dc689088..7373f649958b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3099,6 +3099,10 @@  int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 						apic_find_highest_irr(apic));
 		kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
 	}
+
+	/* Search the IRR and enable irr_pending state with disabled APICv*/
+	if (!enable_apicv && apic_search_irr(apic) != -1)
+		apic->irr_pending = true;
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	if (ioapic_in_kernel(vcpu->kvm))
 		kvm_rtc_eoi_tracking_restore_one(vcpu);