mbox series

[V9,0/2] KVM: x86: Make bus clock frequency for vAPIC timer configurable

Message ID cover.1718214999.git.reinette.chatre@intel.com (mailing list archive)
Headers show
Series KVM: x86: Make bus clock frequency for vAPIC timer configurable | expand

Message

Reinette Chatre June 12, 2024, 6:16 p.m. UTC
Changes from v8:
- v8: https://lore.kernel.org/lkml/cover.1718043121.git.reinette.chatre@intel.com/
- Many changes to new udelay() utility patch as well as the APIC bus
  frequency test aimed to make it more robust (additional ASSERTs,
  consistent types, eliminate duplicate code, etc.) and useful with
  support for more user configuration. Please refer to individual patches for
  detailed changes.
- Series applies cleanly to next branch of kvm-x86 with HEAD
  e4e9e1067138e5620cf0500c3e5f6ebfb9d322c8.

Changes from v7:
- v7: https://lore.kernel.org/lkml/cover.1717695426.git.reinette.chatre@intel.com/
- Use appropriate ioctl() return type.
- Let new tsc_khz used by KVM selftest be of the same type as same variable
  used in kernel.
- Please refer to individual patches for detailed changes.

Changes from v6:
- V6: https://lore.kernel.org/lkml/cover.1715017765.git.reinette.chatre@intel.com/
- KVM changes were merged into kvm-x86 misc and subsequently dropped from
  this submission. This submission only contains the selftest changes.
  https://lore.kernel.org/lkml/ZmBw9R_jkNLYXkJh@google.com/
- New patch to add udelay() utility to x86_64 KVM selftests. (Sean)
- Many changes to APIC bus frequency selftest. Please refer to the patch
  for detailed changes.
- Series applies cleanly to next branch of kvm-x86 with HEAD
  af0903ab52ee6d6f0f63af67fa73d5eb00f79b9a.

Changes from v5:
- v5: https://lore.kernel.org/lkml/cover.1714081725.git.reinette.chatre@intel.com/
- Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git
- Added Xiaoyao Li and Yuan Yao's Reviewed-by tags.
- Use the KVM selftest vm_create() wrapper instead of open coding it. (Zide)
- Improve grammar of selftest description. (Zide)

Changes from v4:
- v4: https://lore.kernel.org/lkml/cover.1711035400.git.reinette.chatre@intel.com/
- Rename capability from KVM_CAP_X86_APIC_BUS_FREQUENCY to
  KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li).
- Include "Testing" section in cover letter.
- Add Rick's Reviewed-by tags.
- Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git

Changes from v3:
- v3: https://lore.kernel.org/all/cover.1702974319.git.isaku.yamahata@intel.com/
- Reworked all changelogs.
- Regarding code changes: patches #1 and #2 are unchanged, patch #3 was
  reworked according to Sean's guidance, and patch #4 (the test)
  needed changes after rebase to kvm-x86/next and the implementation
  (patch #3) changes.
- Added Reviewed-by tags to patches #1, #2, and #4, but removed
  Maxim's Reviewed-by tag from patch #3 because it changed so much.
- Added a "Suggested-by" to patch #4 to reflect that it represents
  Sean's guidance.
- Reworked cover to match customs (in subject line) and reflect feedback
  to patches: capability renamed to KVM_CAP_X86_APIC_BUS_FREQUENCY, clarify
  distinction between "core crystal clock" and "bus clock", and update
  pro/con list.
- Please refer to individual patches for detailed changes.

Changes from v2:
- v2: https://lore.kernel.org/lkml/cover.1699936040.git.isaku.yamahata@intel.com/
- Removed APIC_BUS_FREQUENCY and apic_bus_frequency of struct kvm-arch.
- Update the commit messages.
- Added reviewed-by (Maxim Levitsky)
- Use 1.5GHz instead of 1GHz as frequency for the test case.

Changes from v1:
- v1: https://lore.kernel.org/all/cover.1699383993.git.isaku.yamahata@intel.com/
- Added a test case
- Fix a build error for i386 platform
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Updated api.rst

Summary
-------
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.

Description
-----------
Vishal reported [1] that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.

In addition, per Intel SDM:
    "The APIC timer frequency will be the processor’s bus clock or core
     crystal clock frequency (when TSC/core crystal clock ratio is
     enumerated in CPUID leaf 0x15) divided by the value specified in
     the divide configuration register."

The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.

The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported in [1] the TDX guest kernel
expects a 25MHz timer frequency but gets timer interrupt more frequently
due to the 1GHz frequency used by KVM.

To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.

There are several options to address this:
1. Make the KVM able to configure APIC bus frequency (this series).
   Pro: It resembles the existing hardware.  The recent Intel CPUs
        adopts 25MHz.
   Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
   frequency or not enumerate it.
   Pro: Any APIC bus frequency is allowed.
   Con: Deviates from TDX architecture.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
   Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
   as 1 GHz.
   Pro: This has been the virtual APIC frequency for KVM guests for 13
        years.
   Pro: This requires changing only one hard-coded constant in TDX.
   Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
   Con: Core crystal clock frequency is also used to calculate TSC frequency.
   Con: If it is configured to value different from hardware, it will
        break the correctness of INTEL-PT Mini Time Count (MTC) packets
        in TDs.

Testing
-------
Tested on non-TDX host using included kselftest. Host running kernel
with this series applied to "next" branch of
https://github.com/kvm-x86/linux.git

Tested on TDX host and TD using a modified version
of x86/apic.c:test_apic_timer_one_shot() available from
https://github.com/intel/kvm-unit-tests-tdx/blob/tdx/x86/apic.c.
Host running TDX KVM development patches and QEMU with corresponding TDX
changes.
The test needed to be modified to (a) stop any lingering timers before the
test starts, and (b) use CPUID 0x15 in TDX to accurately determine the TSC
and APIC frequencies instead of making 1GHz assumption and use similar
check as the kselftest test introduced in this series (while increasing
the amount with which the frequency is allowed to deviate by 1%).

The core changes made to x86/apic.c:test_apic_timer_one_shot() for this
testing are shown below for reference. Work is in progress to upstream
these modifications.

@@ -477,11 +478,29 @@ static void lvtt_handler(isr_regs_t *regs)
 
 static void test_apic_timer_one_shot(void)
 {
-	uint64_t tsc1, tsc2;
 	static const uint32_t interval = 0x10000;
+	uint64_t measured_apic_freq, tsc2, tsc1;
+	uint32_t tsc_freq = 0, apic_freq = 0;
+	struct cpuid cpuid_tsc = {};
 
 #define APIC_LVT_TIMER_VECTOR    (0xee)
 
+	/*
+	 * CPUID 0x15 is not available in VMX, can use it to obtain
+	 * TSC and APIC frequency for accurate testing
+	 */
+	if (is_tdx_guest()) {
+		cpuid_tsc = raw_cpuid(0x15, 0);
+		tsc_freq = cpuid_tsc.c * cpuid_tsc.b / cpuid_tsc.a;
+		apic_freq = cpuid_tsc.c;
+	}
+	/*
+	   stop already fired local timer
+	   the test case can be negative failure if the timer fired
+	   after installed lvtt_handler but *before*
+	   write to TIMICT again.
+	 */
+	apic_write(APIC_TMICT, 0);
 	handle_irq(APIC_LVT_TIMER_VECTOR, lvtt_handler);
 
 	/* One shot mode */
@@ -503,8 +522,16 @@ static void test_apic_timer_one_shot(void)
 	 * cases, the following should satisfy on all modern
 	 * processors.
 	 */
-	report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
-	       "APIC LVT timer one shot");
+	if (is_tdx_guest()) {
+		measured_apic_freq = interval * (tsc_freq / (tsc2 - tsc1));
+		report((lvtt_counter == 1) &&
+		       (measured_apic_freq < apic_freq * 102 / 100) &&
+		       (measured_apic_freq > apic_freq * 98 / 100),
+		       "APIC LVT timer one shot");
+	} else {
+		report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
+		"APIC LVT timer one shot");
+	}
 }

[1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/

Isaku Yamahata (1):
  KVM: selftests: Add test for configure of x86 APIC bus frequency

Reinette Chatre (1):
  KVM: selftests: Add x86_64 guest udelay() utility

 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   8 +
 .../selftests/kvm/include/x86_64/processor.h  |  17 ++
 .../selftests/kvm/lib/x86_64/processor.c      |  11 +
 .../kvm/x86_64/apic_bus_clock_test.c          | 202 ++++++++++++++++++
 5 files changed, 239 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c

base-commit: e4e9e1067138e5620cf0500c3e5f6ebfb9d322c8

Comments

Sean Christopherson June 28, 2024, 10:55 p.m. UTC | #1
On Wed, 12 Jun 2024 11:16:10 -0700, Reinette Chatre wrote:
> Changes from v8:
> - v8: https://lore.kernel.org/lkml/cover.1718043121.git.reinette.chatre@intel.com/
> - Many changes to new udelay() utility patch as well as the APIC bus
>   frequency test aimed to make it more robust (additional ASSERTs,
>   consistent types, eliminate duplicate code, etc.) and useful with
>   support for more user configuration. Please refer to individual patches for
>   detailed changes.
> - Series applies cleanly to next branch of kvm-x86 with HEAD
>   e4e9e1067138e5620cf0500c3e5f6ebfb9d322c8.
> 
> [...]

Applied to kvm-x86 misc, with all the changes mentioned in my earlier replies.
I'm out next week, and don't want to merge the KVM changes without these tests,
hence the rushed application.

Please holler if you disagree with anything (or if I broke something).  I won't
respond until July 8th at the earliest, but worst case scenario we can do fixup
patches after 6.11-rc1.

[1/2] KVM: selftests: Add x86_64 guest udelay() utility
      https://github.com/kvm-x86/linux/commit/6b878cbb87bf
[2/2] KVM: selftests: Add test for configure of x86 APIC bus frequency
      https://github.com/kvm-x86/linux/commit/82222ee7e84c

--
https://github.com/kvm-x86/linux/tree/next
Reinette Chatre June 29, 2024, 12:10 a.m. UTC | #2
Hi Sean,

On 6/28/24 3:55 PM, Sean Christopherson wrote:
> On Wed, 12 Jun 2024 11:16:10 -0700, Reinette Chatre wrote:
>> Changes from v8:
>> - v8: https://lore.kernel.org/lkml/cover.1718043121.git.reinette.chatre@intel.com/
>> - Many changes to new udelay() utility patch as well as the APIC bus
>>    frequency test aimed to make it more robust (additional ASSERTs,
>>    consistent types, eliminate duplicate code, etc.) and useful with
>>    support for more user configuration. Please refer to individual patches for
>>    detailed changes.
>> - Series applies cleanly to next branch of kvm-x86 with HEAD
>>    e4e9e1067138e5620cf0500c3e5f6ebfb9d322c8.
>>
>> [...]
> 
> Applied to kvm-x86 misc, with all the changes mentioned in my earlier replies.
> I'm out next week, and don't want to merge the KVM changes without these tests,
> hence the rushed application.
> 
> Please holler if you disagree with anything (or if I broke something).  I won't
> respond until July 8th at the earliest, but worst case scenario we can do fixup
> patches after 6.11-rc1.

Thank you very much for taking the time to make the changes and apply the patches.
All the changes look good to me and passes my testing.

Now that the x86 udelay() utility no longer use cpu_relax(), should ARM
and RISC-V's udelay() be modified to match in this regard? I can prepare
(unable to test) changes for you to consider on your return.

Reinette
Sean Christopherson July 10, 2024, 3:42 p.m. UTC | #3
On Fri, Jun 28, 2024, Reinette Chatre wrote:
> Hi Sean,
> 
> On 6/28/24 3:55 PM, Sean Christopherson wrote:
> > On Wed, 12 Jun 2024 11:16:10 -0700, Reinette Chatre wrote:
> > > Changes from v8:
> > > - v8: https://lore.kernel.org/lkml/cover.1718043121.git.reinette.chatre@intel.com/
> > > - Many changes to new udelay() utility patch as well as the APIC bus
> > >    frequency test aimed to make it more robust (additional ASSERTs,
> > >    consistent types, eliminate duplicate code, etc.) and useful with
> > >    support for more user configuration. Please refer to individual patches for
> > >    detailed changes.
> > > - Series applies cleanly to next branch of kvm-x86 with HEAD
> > >    e4e9e1067138e5620cf0500c3e5f6ebfb9d322c8.
> > > 
> > > [...]
> > 
> > Applied to kvm-x86 misc, with all the changes mentioned in my earlier replies.
> > I'm out next week, and don't want to merge the KVM changes without these tests,
> > hence the rushed application.
> > 
> > Please holler if you disagree with anything (or if I broke something).  I won't
> > respond until July 8th at the earliest, but worst case scenario we can do fixup
> > patches after 6.11-rc1.
> 
> Thank you very much for taking the time to make the changes and apply the patches.
> All the changes look good to me and passes my testing.
> 
> Now that the x86 udelay() utility no longer use cpu_relax(), should ARM
> and RISC-V's udelay() be modified to match in this regard? I can prepare
> (unable to test) changes for you to consider on your return.

I don't think so?  IIUC, arm64's "yield", used by cpu_relax() doesn't trigger the
"on spin" exists.  Such exist are only triggered by "wfet" and friends.
Reinette Chatre July 10, 2024, 5:14 p.m. UTC | #4
On 7/10/24 8:42 AM, Sean Christopherson wrote:
> On Fri, Jun 28, 2024, Reinette Chatre wrote:

>> Now that the x86 udelay() utility no longer use cpu_relax(), should ARM
>> and RISC-V's udelay() be modified to match in this regard? I can prepare
>> (unable to test) changes for you to consider on your return.
> 
> I don't think so?  IIUC, arm64's "yield", used by cpu_relax() doesn't trigger the
> "on spin" exists.  Such exist are only triggered by "wfet" and friends.

ah, I see, thank you very much Sean.

Reinette