
[v2] KVM: x86: Clear bit12 of ICR after APIC-write VM-exit

Message ID 20230908041115.987682-1-tao1.su@linux.intel.com
State New, archived
Series [v2] KVM: x86: Clear bit12 of ICR after APIC-write VM-exit

Commit Message

Tao Su Sept. 8, 2023, 4:11 a.m. UTC
When IPI virtualization is enabled, a WARN is triggered when the guest sets
bit12 of the ICR MSR and KVM handles the resulting APIC-write VM-exit. The
reason is that kvm_apic_send_ipi() expects the APIC_ICR_BUSY bit to already
be clear, because KVM delivers IPIs without delay, but
kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
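The failure can be modeled in a few lines of user-space C. This is a sketch under stated assumptions, not KVM code: `struct model_apic`, `model_send_ipi()` and the two `model_write_nodecode_*()` functions are hypothetical stand-ins for the kernel functions named above, and the WARN is modeled as a counter instead of a kernel splat. The "old" path forwards the stored ICR value as-is, while the "fixed" path clears bit12 before sending and storing it, which is what this patch achieves by calling kvm_x2apic_icr_write().

```c
#include <assert.h>
#include <stdint.h>

/* Bit 12 of the ICR: the delivery-status ("busy") flag in xAPIC mode. */
#define ICR_BUSY (1ULL << 12)
static_assert(ICR_BUSY == 0x1000, "ICR busy flag is bit 12");

/* Hypothetical, simplified stand-in for a vCPU's local APIC state. */
struct model_apic {
	uint64_t icr;          /* 64-bit ICR as stored in x2APIC mode */
	unsigned int warnings; /* counts WARN-style violations */
	unsigned int ipis_sent;
};

/* Models kvm_apic_send_ipi(): IPIs are delivered immediately, so the
 * caller is expected to have cleared the busy bit already. */
static void model_send_ipi(struct model_apic *apic, uint32_t icr_low,
			   uint32_t icr_high)
{
	(void)icr_high; /* destination; not used by this model */
	if (icr_low & ICR_BUSY)
		apic->warnings++;
	apic->ipis_sent++;
}

/* Pre-fix x2APIC ICR path: forwards the stored value unchanged. */
static void model_write_nodecode_old(struct model_apic *apic)
{
	uint64_t val = apic->icr;

	model_send_ipi(apic, (uint32_t)val, (uint32_t)(val >> 32));
}

/* Post-fix path: clear bit 12, send the IPI, then store the cleaned
 * value so that bit 12 reads back as 0. */
static void model_write_nodecode_fixed(struct model_apic *apic)
{
	uint64_t val = apic->icr & ~ICR_BUSY;

	model_send_ipi(apic, (uint32_t)val, (uint32_t)(val >> 32));
	apic->icr = val;
}
```

With a guest value that has bit12 set, the old model records one warning and leaves bit12 in the stored ICR; the fixed model records none and the ICR reads back with bit12 clear.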

Bit12 of ICR is different from the other reserved bits (31:20, 17:16 and 13).
When bit12 is set, the write causes an APIC-write VM-exit rather than a #GP.
Clear bit12 in x2APIC mode so that it reads back as '0', which is the safer
approach.
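The distinction between bit12 and the other reserved bits can be sketched as a classification of the 64-bit value written to the x2APIC ICR. This is a hypothetical model: `model_icr_write()`, `GENMASK64()` and the result enum are illustrative names, the masks are built only from the bit ranges listed above, and the sketch does not model whether the #GP is raised by the CPU or by KVM.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit mask with bits h..l set (h >= l), in the spirit of GENMASK_ULL. */
#define GENMASK64(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

#define ICR_BIT12   (1ULL << 12)
/* Reserved bits listed in the commit message: 31:20, 17:16 and 13. */
#define ICR_RSVD_GP (GENMASK64(31, 20) | GENMASK64(17, 16) | (1ULL << 13))
static_assert(ICR_RSVD_GP == 0xFFF32000ULL, "reserved-bit mask");

enum icr_write_result {
	ICR_WRITE_GP,       /* a reserved bit is set: the write faults */
	ICR_WRITE_ACCEPTED, /* write accepted; bit 12 does not fault */
};

/* Hypothetical helper: outcome of a guest write to the x2APIC ICR and,
 * if accepted, the value that subsequently reads back. */
static enum icr_write_result model_icr_write(uint64_t data, uint64_t *readback)
{
	if (data & ICR_RSVD_GP)
		return ICR_WRITE_GP;
	*readback = data & ~ICR_BIT12; /* bit 12 is cleared, reads back as 0 */
	return ICR_WRITE_ACCEPTED;
}
```

A write with bit13 set faults and leaves the readback untouched, whereas a write with only bit12 set among the low control bits is accepted and reads back with bit12 cleared.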

Although bit12 of ICR is no longer APIC_ICR_BUSY in x2APIC mode, keeping
that name makes it far easier to understand what's going on, especially
given that the bit may be repurposed for something new.

Link: https://lore.kernel.org/all/ZPj6iF0Q7iynn62p@google.com/
Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
Changelog:

v2:
  - Drop the unnecessary alias for bit12 of ICR.
  - Add back kvm_lapic_get_reg64() that was removed by mistake.
  - Modify the commit message to make it clearer.

v1: https://lore.kernel.org/all/20230904013555.725413-1-tao1.su@linux.intel.com/
---
 arch/x86/kvm/lapic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: a48fa7efaf1161c1c898931fe4c7f0070964233a

Comments

Chao Gao Sept. 8, 2023, 10:12 a.m. UTC | #1
On Fri, Sep 08, 2023 at 12:11:15PM +0800, Tao Su wrote:
>When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
>MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
>thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
>but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
>
>Bit12 of ICR is different from other reserved bits(31:20, 17:16 and 13).
>When bit12 is set, it will cause APIC-wirte VM-exit but not #GP. For

s/wirte/write

>reading bit12 back as '0' which is a safer approach, clearing bit12 in
>x2APIC mode is needed.

How about quoting what Sean said
(with a slight change to the last sentence):

Under the x2APIC section, regarding ICR, the SDM says:

  It remains readable only to aid in debugging; however, software should not
  assume the value returned by reading the ICR is the last written value.

I.e. KVM basically has free rein to do whatever it wants, so long as it doesn't
confuse userspace or break KVM's ABI.

Clear bit12 so that it reads back as '0'. This approach is safer than "do
nothing" and is consistent with the case where IPI virtualization is
disabled or not supported, i.e.,

	handle_fastpath_set_x2apic_icr_irqoff() -> kvm_x2apic_icr_write()
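The consistency argument can be illustrated with a small model in which both exit paths funnel into one write helper. All names here are hypothetical; `model_x2apic_icr_write()` only mimics the bit12 handling of kvm_x2apic_icr_write() and omits IPI delivery. In the APIC-write path the model assumes, as the trap-like nature of APIC-write VM-exits implies, that the raw guest value has already been stored before the handler runs.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_BIT12 (1ULL << 12)

/* Hypothetical model of the per-vCPU ICR storage. */
struct model_apic {
	uint64_t icr;
};

/* Shared helper, in the spirit of kvm_x2apic_icr_write(): every ICR
 * write is normalized the same way before it is stored. */
static void model_x2apic_icr_write(struct model_apic *apic, uint64_t data)
{
	data &= ~ICR_BIT12;
	/* IPI delivery would happen here; omitted in this model. */
	apic->icr = data;
}

/* IPI virtualization disabled: the WRMSR itself exits, and the fastpath
 * hands the guest's value straight to the shared helper. */
static void model_wrmsr_exit_path(struct model_apic *apic, uint64_t data)
{
	model_x2apic_icr_write(apic, data);
}

/* IPI virtualization enabled: the raw value is already stored when the
 * APIC-write exit is handled; the fixed handler re-reads it and routes
 * it through the same helper. */
static void model_apic_write_exit_path(struct model_apic *apic, uint64_t data)
{
	apic->icr = data; /* done by the CPU in the real flow */
	model_x2apic_icr_write(apic, apic->icr);
}
```

Either way the ICR reads back with bit12 clear, which differs from the value the guest wrote; per the SDM text quoted above, software should not rely on reading back the last written value anyway.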

>
>Although bit12 of ICR is no longer APIC_ICR_BUSY in x2APIC, keeping it
>is far easier to understand what's going on, especially given that it
>may be repurposed for something new.

You can probably remove this paragraph. It makes little sense without the
context that there was an attempt to rename APIC_ICR_BUSY for x2APIC while
fixing the issue.

>
>Link: https://lore.kernel.org/all/ZPj6iF0Q7iynn62p@google.com/
>Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
>Signed-off-by: Tao Su <tao1.su@linux.intel.com>
>Tested-by: Yi Lai <yi1.lai@intel.com>

Apart from above nits on the changelog, this patch looks good to me.

Reviewed-by: Chao Gao <chao.gao@intel.com>

Tao Su Sept. 9, 2023, 12:35 a.m. UTC | #2
On Fri, Sep 08, 2023 at 06:12:12PM +0800, Chao Gao wrote:
> On Fri, Sep 08, 2023 at 12:11:15PM +0800, Tao Su wrote:
> >When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> >MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> >thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> >but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> >
> >Bit12 of ICR is different from other reserved bits(31:20, 17:16 and 13).
> >When bit12 is set, it will cause APIC-wirte VM-exit but not #GP. For
> 
> s/wirte/write
> 
> >reading bit12 back as '0' which is a safer approach, clearing bit12 in
> >x2APIC mode is needed.
> 
> how about quoting what Sean said:
> (w/ a slight change to the last sentence)
> 
> Under the x2APIC section, regarding ICR, the SDM says:
> 
>   It remains readable only to aid in debugging; however, software should not
>   assume the value returned by reading the ICR is the last written value.
> 
> I.e. KVM basically has free reign to do whatever it wants, so long as it doesn't
> confuse userspace or break KVM's ABI.
> 
> Clear bit12 so that it reads back as '0'. This approach is safer than "do
> nothing" and is consistent with the case where IPI virtualization is
> disabled or not supported, i.e.,
> 
> 	handle_fastpath_set_x2apic_icr_irqoff() -> kvm_x2apic_icr_write()
> 
> >
> >Although bit12 of ICR is no longer APIC_ICR_BUSY in x2APIC, keeping it
> >is far easier to understand what's going on, especially given that it
> >may be repurposed for something new.
> 
> Probably you can remove this paragraph. it is not clear w/o the context
> that there was an attempt to rename APIC_ICR_BUSY for x2apic while fixing
> the issue.

Yes, I agree with all of the above; this is a more correct and clearer description.

Thanks,
Tao


Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..664d5a78b46a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2450,13 +2450,13 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
 	 * xAPIC, ICR writes need to go down the common (slightly slower) path
 	 * to get the upper half from ICR2.
+	 *
+	 * TODO: optimize to just emulate side effect w/o one more write
 	 */
 	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
 		val = kvm_lapic_get_reg64(apic, APIC_ICR);
-		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
-		trace_kvm_apic_write(APIC_ICR, val);
+		kvm_x2apic_icr_write(apic, val);
 	} else {
-		/* TODO: optimize to just emulate side effect w/o one more write */
 		val = kvm_lapic_get_reg(apic, offset);
 		kvm_lapic_reg_write(apic, offset, (u32)val);
 	}
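The control flow of the patched kvm_apic_write_nodecode() can be sketched in user space as follows. This is a simplified model, not the kernel implementation: the register page, `get_reg()`, `get_reg64()`, `set_reg()`, `reg_write()` and `x2apic_icr_write()` are hypothetical stand-ins, IPI delivery is reduced to recording the value in `last_ipi`, and the model assumes the 64-bit x2APIC ICR is stored at offset `APIC_ICR` (0x300), as the diff's use of kvm_lapic_get_reg64(apic, APIC_ICR) suggests. The xAPIC branch is also modeled as clearing bit12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define APIC_ICR  0x300
#define APIC_ICR2 0x310
#define ICR_BIT12 (1ULL << 12)

/* Hypothetical model of a local APIC: mode flag plus a 4 KiB register page. */
struct model_lapic {
	bool x2apic;
	uint8_t regs[4096];
	uint64_t last_ipi; /* last value handed to "IPI delivery" */
};

static uint32_t get_reg(const struct model_lapic *a, unsigned int off)
{
	uint32_t v;

	memcpy(&v, a->regs + off, sizeof(v));
	return v;
}

static void set_reg(struct model_lapic *a, unsigned int off, uint32_t v)
{
	memcpy(a->regs + off, &v, sizeof(v));
}

/* x2APIC mode: the ICR is a single 64-bit register at APIC_ICR. */
static uint64_t get_reg64(const struct model_lapic *a, unsigned int off)
{
	uint64_t v;

	memcpy(&v, a->regs + off, sizeof(v));
	return v;
}

/* Stand-in for kvm_x2apic_icr_write(): clear bit 12, "send", store. */
static void x2apic_icr_write(struct model_lapic *a, uint64_t val)
{
	val &= ~ICR_BIT12;
	a->last_ipi = val;
	memcpy(a->regs + APIC_ICR, &val, sizeof(val));
}

/* Stand-in for the common 32-bit path; an xAPIC ICR write takes the
 * destination from ICR2 (modeled here as the upper 32 bits). */
static void reg_write(struct model_lapic *a, unsigned int off, uint32_t val)
{
	if (off == APIC_ICR) {
		val &= ~(uint32_t)ICR_BIT12;
		a->last_ipi = ((uint64_t)get_reg(a, APIC_ICR2) << 32) | val;
	}
	set_reg(a, off, val);
}

/* Same branch structure as the patched kvm_apic_write_nodecode(). */
static void write_nodecode(struct model_lapic *a, unsigned int off)
{
	if (a->x2apic && off == APIC_ICR)
		x2apic_icr_write(a, get_reg64(a, APIC_ICR));
	else
		reg_write(a, off, get_reg(a, off));
}
```

Routing the x2APIC ICR case through one helper keeps the bit12 handling in a single place instead of duplicating it in each exit handler.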