
[2/2] KVM: x86: Don't allow tsc_offset, tsc_scaling_ratio to change

Message ID 3a7444aec08042fe205666864b6858910e86aa98.1728719037.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Series [1/2] KVM: x86: Push down setting vcpu.arch.user_set_tsc

Commit Message

Isaku Yamahata Oct. 12, 2024, 7:55 a.m. UTC
Add a guest_tsc_protected member to struct kvm_vcpu_arch and prohibit
changing the TSC offset/multiplier when guest_tsc_protected is true.

Background
X86 confidential computing technology defines a protected guest TSC so that
the VMM can't change the TSC offset/multiplier once the vCPU is initialized.
SEV-SNP defines Secure TSC as optional; TDX mandates it.  The TDX module
determines the TSC offset/multiplier, and the VMM has to retrieve them.

On the other hand, the x86 KVM common logic tries to guess or adjust the TSC
offset/multiplier for better guest TSC accuracy and TSC interrupt latency at
KVM vCPU creation (kvm_arch_vcpu_postcreate()), vCPU migration across pCPUs
(kvm_arch_vcpu_load()), vCPU TSC device attribute writes
(kvm_arch_tsc_set_attr()), and guest/host writes to the TSC or TSC_ADJUST
MSR (kvm_set_msr_common()).

Problem
The current x86 KVM implementation conflicts with protected TSC because the
VMM can't change the TSC offset/multiplier.  The KVM logic that changes or
adjusts the TSC offset/multiplier must therefore be disabled or skipped for
such guests.

Because KVM emulates the TSC timer and the TSC deadline timer using the TSC
offset/multiplier, TSC timer interrupts are injected into the guest at the
wrong time if the KVM TSC offset differs from what the TDX module
determined.
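
To make the mismatch concrete, here is a small, self-contained model (not
the actual KVM code; the 48-bit fractional ratio format, the helper names
and the numbers are illustrative assumptions, and it requires a compiler
with __int128 such as gcc or clang) of how a guest TSC value is derived from
the host TSC and how a stale offset shifts an emulated TSC-deadline expiry:

  /*
   * Simplified model of guest_tsc = ((host_tsc * ratio) >> FRAC_BITS) + offset
   * and of an emulated TSC deadline firing late when KVM uses a stale offset.
   */
  #include <stdint.h>
  #include <stdio.h>

  #define FRAC_BITS 48    /* fixed-point fraction bits of the scaling ratio */

  static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio, int64_t offset)
  {
          return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> FRAC_BITS)
                 + offset;
  }

  int main(void)
  {
          uint64_t ratio = 1ULL << FRAC_BITS; /* 1.0: no frequency scaling */
          uint64_t host_now = 1000000;
          int64_t tdx_offset = 5000;  /* offset the TDX module programmed */
          int64_t kvm_offset = 0;     /* stale offset KVM still believes in */

          /* The guest arms a TSC deadline 10000 ticks in its own future. */
          uint64_t deadline = guest_tsc(host_now, ratio, tdx_offset) + 10000;

          /*
           * KVM converts the deadline back to a host TSC value using its own
           * (wrong) offset; with ratio 1.0 that is just subtracting the
           * offset.  The interrupt therefore fires when the guest TSC has
           * already passed the deadline by tdx_offset - kvm_offset ticks.
           */
          uint64_t host_expiry = deadline - kvm_offset;
          uint64_t guest_at_expiry = guest_tsc(host_expiry, ratio, tdx_offset);

          printf("deadline %llu, fired at guest TSC %llu (late by %llu)\n",
                 (unsigned long long)deadline,
                 (unsigned long long)guest_at_expiry,
                 (unsigned long long)(guest_at_expiry - deadline));
          return 0;
  }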

This issue was originally found by the cyclictest of rt-tests [1], as the
latency in the TDX case was worse than the VMX value plus the TDX SEAMCALL
overhead.  It turned out that the KVM TSC offset differed from what the TDX
module determined.

Solution
The solution is to keep the KVM TSC offset/multiplier identical to the
values the TDX module determined.  Possible solutions are as follows.
- Skip the logic
  Ignore the request to change the TSC offset/multiplier (or don't call
  the related functions).
  Pros
  - Logically clean.  This is similar to the guest_state_protected case.
  Cons
  - Needs to identify the call sites.

- Revert the change in the vendor hooks after TSC adjustment
  x86 KVM defines vendor hooks that are called when the TSC
  offset/multiplier is changed.  The callback can revert the change.
  Pros
  - We don't need to touch the common logic that changes the TSC
    offset/multiplier.
  Cons
  - Hacky to revert what the KVM x86 common code has just done.

Choose the first option.  With this patch series, SEV-SNP Secure TSC can be
supported as well.

[1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
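
For illustration only, a hypothetical sketch of how vendor code might
consume the new flag once the TDX module has fixed the vCPU's TSC
offset/multiplier; the tdx_* function and helper names are assumptions for
this sketch and are not part of this patch:

  static void tdx_vcpu_init_tsc(struct kvm_vcpu *vcpu)
  {
          /* Values determined by the TDX module (retrieval details elided). */
          u64 offset = tdx_read_tsc_offset(vcpu);         /* illustrative helper */
          u64 multiplier = tdx_read_tsc_multiplier(vcpu); /* illustrative helper */

          /* Mirror them into the common fields so KVM's view matches reality. */
          vcpu->arch.l1_tsc_offset = offset;
          vcpu->arch.tsc_offset = offset;
          vcpu->arch.l1_tsc_scaling_ratio = multiplier;
          vcpu->arch.tsc_scaling_ratio = multiplier;

          /* From here on, kvm_vcpu_write_tsc_offset() and
           * __kvm_synchronize_tsc() bail out without changing anything. */
          vcpu->arch.guest_tsc_protected = true;
  }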

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Edgecombe, Rick P Oct. 14, 2024, 3:48 p.m. UTC | #1
On Sat, 2024-10-12 at 00:55 -0700, Isaku Yamahata wrote:
> Problem
> The current x86 KVM implementation conflicts with protected TSC because the
> VMM can't change the TSC offset/multiplier.  The KVM logic that changes or
> adjusts the TSC offset/multiplier must therefore be disabled or skipped for
> such guests.
> 
> Because KVM emulates the TSC timer and the TSC deadline timer using the TSC
> offset/multiplier, TSC timer interrupts are injected into the guest at the
> wrong time if the KVM TSC offset differs from what the TDX module
> determined.
> 
> This issue was originally found by the cyclictest of rt-tests [1], as the
> latency in the TDX case was worse than the VMX value plus the TDX SEAMCALL
> overhead.  It turned out that the KVM TSC offset differed from what the TDX
> module determined.
> 
> Solution
> The solution is to keep the KVM TSC offset/multiplier identical to the
> values the TDX module determined.  Possible solutions are as follows.
> - Skip the logic
>   Ignore the request to change the TSC offset/multiplier (or don't call
>   the related functions).
>   Pros
>   - Logically clean.  This is similar to the guest_state_protected case.
>   Cons
>   - Needs to identify the call sites.
> 
> - Revert the change in the vendor hooks after TSC adjustment
>   x86 KVM defines vendor hooks that are called when the TSC
>   offset/multiplier is changed.  The callback can revert the change.
>   Pros
>   - We don't need to touch the common logic that changes the TSC
>     offset/multiplier.
>   Cons
>   - Hacky to revert what the KVM x86 common code has just done.
> 
> Choose the first option.  With this patch series, SEV-SNP Secure TSC can be
> supported as well.
> 
> [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
> 
> Reported-by: Marcelo Tosatti <mtosatti@redhat.com>

IIUC this problem was reported by Marcelo and he tested these patches and found
that they did *not* resolve his issue? But offline you mentioned that you
reproduced a similar seeming bug on your end that *was* resolved by these
patches. If I got that right, I would think we should figure out Marcelo's
problem before fixing this upstream. If it only affects out-of-tree TDX code we
can take more time and not thrash the code as it gets untangled further.

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 61b7e9fe5e57..112b8a4f1860 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1036,6 +1036,7 @@  struct kvm_vcpu_arch {
 
 	/* Protected Guests */
 	bool guest_state_protected;
+	bool guest_tsc_protected;
 
 	/*
 	 * Set when PDPTS were loaded directly by the userspace without
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65d871bb5b35..a6cf4422df28 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2587,6 +2587,9 @@  EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
 
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
 {
+	if (vcpu->arch.guest_tsc_protected)
+		return;
+
 	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
 				   vcpu->arch.l1_tsc_offset,
 				   l1_offset);
@@ -2650,6 +2653,9 @@  static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 
 	lockdep_assert_held(&kvm->arch.tsc_write_lock);
 
+	if (vcpu->arch.guest_tsc_protected)
+		return;
+
 	if (user_set_tsc)
 		vcpu->kvm->arch.user_set_tsc = true;
 
@@ -5028,7 +5034,8 @@  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			u64 offset = kvm_compute_l1_tsc_offset(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
-			vcpu->arch.tsc_catchup = 1;
+			if (!vcpu->arch.guest_tsc_protected)
+				vcpu->arch.tsc_catchup = 1;
 		}
 
 		if (kvm_lapic_hv_timer_in_use(vcpu))