diff mbox series

KVM: x86: Advertise AVX10.1 CPUID to userspace

Message ID 20240520022002.1494056-1-tao1.su@linux.intel.com (mailing list archive)
State New, archived
Series KVM: x86: Advertise AVX10.1 CPUID to userspace

Commit Message

Tao Su May 20, 2024, 2:20 a.m. UTC
Advertise the AVX10.1-related CPUIDs, i.e. report the AVX10 support bit via
CPUID.(EAX=07H, ECX=01H):EDX[bit 19] and the new CPUID leaf 0x24, so that the
guest OS and applications can query the AVX10.1 CPUIDs directly. Intel AVX10
is the first major new vector ISA since the introduction of Intel AVX512 and
will establish a common, converged vector instruction set across all Intel
architectures[1].

AVX10.1 is an early version of AVX10 that enumerates the Intel AVX512
instruction set at 128, 256, and 512 bits, and it is enabled on
Granite Rapids. I.e., AVX10.1 is only a new CPUID enumeration without
any VMX controls.

Advertising AVX10.1 is safe because the kernel doesn't enable AVX10.1, which
is on a KVM-only leaf for now; only the CPUID check changes when using
AVX512-related instructions. E.g. if an AVX512 instruction requires checking
(AVX512 AND AVX512DQ), software can instead check
((AVX512 AND AVX512DQ) OR AVX10.1) after checking XCR0[7:5].
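
The relaxed check can be sketched in plain C as below. This is only an
illustration, not kernel code: struct cpu_caps, XCR0_AVX512_STATE and
can_use_avx512dq_insn() are hypothetical names, and the struct is assumed to
have been filled from CPUID and XGETBV beforehand. "AVX512" in the text above
is modeled as the AVX512F flag, and only XCR0[7:5] is checked, as in the
description.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0[7:5]: opmask, ZMM_Hi256 and Hi16_ZMM state must all be enabled. */
#define XCR0_AVX512_STATE (UINT64_C(0x7) << 5)

/* Hypothetical container for values already read via CPUID and XGETBV. */
struct cpu_caps {
	uint64_t xcr0;
	bool avx512f;
	bool avx512dq;
	bool avx10_1;	/* CPUID.(EAX=07H,ECX=01H):EDX[19] set, version >= 1 */
};

/* True if an instruction that used to require AVX512F+AVX512DQ is usable. */
static bool can_use_avx512dq_insn(const struct cpu_caps *c)
{
	/* The OS must have enabled the AVX512 register state first. */
	if ((c->xcr0 & XCR0_AVX512_STATE) != XCR0_AVX512_STATE)
		return false;

	/* Old enumeration OR the new converged AVX10.1 enumeration. */
	return (c->avx512f && c->avx512dq) || c->avx10_1;
}
```

With this shape, a CPU that enumerates only AVX10.1 passes the check once
XCR0[7:5] is set, and existing AVX512 enumeration keeps working unchanged.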

The versions of AVX10 are expected to be inclusive, i.e. version N+1 is
a superset of version N, so just advertise AVX10.1 if it's supported in
hardware.

[1] https://cdrdv2.intel.com/v1/dl/getContent/784267

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
---
 arch/x86/include/asm/cpuid.h |  1 +
 arch/x86/kvm/cpuid.c         | 20 ++++++++++++++++++--
 arch/x86/kvm/reverse_cpuid.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)


base-commit: eb6a9339efeb6f3d2b5c86fdf2382cdc293eca2c

Comments

Sean Christopherson May 20, 2024, 2:43 p.m. UTC | #1
On Mon, May 20, 2024, Tao Su wrote:
> @@ -1162,6 +1162,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  			break;
>  		}
>  		break;
> +	case 0x24: {
> +		u8 avx10_version;
> +		u32 vector_support;
> +
> +		if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
> +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> +			break;
> +		}
> +		avx10_version = min(entry->ebx & 0xff, 1);

Taking the min() of '1' and anything else is pointless.  Per the spec, the version
can never be 0.

  CPUID.(EAX=24H, ECX=00H):EBX[bits 7:0]  Reports the Intel AVX10 Converged Vector ISA version. Integer (≥ 1)

And it's probably too late, but why on earth is there an AVX10 version number?
Version numbers are _awful_ for virtualization; see the constant vPMU problems
that arise from bundling things under a single version number..  Y'all carved out
room for sub-leafs, i.e. there's a ton of room for "discrete feature bits", so
why oh why is there a version number?

> +		vector_support = entry->ebx & GENMASK(18, 16);

Please add proper defines somewhere, then this can be something like:

		/* EBX[7:0] hold the AVX10 version; KVM supports version '1'. */
		entry->eax = 0;
		entry->ebx = (entry->ebx & AVX10_VECTOR_SIZES_MASK) | 1;
		entry->ecx = 0;
		entry->edx = 0;

Or perhaps we should have feature bits for the vector sizes, because that's really
what they are.  Mixing feature bits in with a version number makes for painful
code, but there's nothing KVM can do about that.  With proper features, this then
becomes something like:

		entry->eax = 0;
		cpuid_entry_override(entry, CPUID_24_0_EBX);
		/* EBX[7:0] hold the AVX10 version; KVM supports version '1'. */
		entry->ebx |= 1;
		entry->ecx = 0;
		entry->edx = 0;
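
For illustration only, here is a standalone sketch of the masking suggested
above, outside of KVM. AVX10_VERSION_MASK, AVX10_VECTOR_SIZES_MASK,
KVM_AVX10_VERSION and both helpers are hypothetical names; the bit positions
(version in EBX[7:0], vector-size bits in EBX[18:16]) follow the patch.

```c
#include <stdint.h>

#define AVX10_VERSION_MASK	UINT32_C(0x000000ff)	/* EBX[7:0] */
#define AVX10_VECTOR_SIZES_MASK	UINT32_C(0x00070000)	/* EBX[18:16] */
#define KVM_AVX10_VERSION	UINT32_C(1)		/* version KVM supports */

/* Keep the host's vector-size bits, force the version to the supported one. */
static uint32_t avx10_ebx_for_guest(uint32_t host_ebx)
{
	return (host_ebx & AVX10_VECTOR_SIZES_MASK) | KVM_AVX10_VERSION;
}

/* Extract the AVX10 version from an EBX value of leaf 0x24, sub-leaf 0. */
static uint32_t avx10_version(uint32_t ebx)
{
	return ebx & AVX10_VERSION_MASK;
}
```

Whatever the host reports in the other EBX bits, the guest then sees only the
three vector-size bits plus version 1.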
Tao Su May 21, 2024, 3:08 a.m. UTC | #2
On Mon, May 20, 2024 at 07:43:50AM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Tao Su wrote:
> > @@ -1162,6 +1162,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> >  			break;
> >  		}
> >  		break;
> > +	case 0x24: {
> > +		u8 avx10_version;
> > +		u32 vector_support;
> > +
> > +		if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
> > +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> > +			break;
> > +		}
> > +		avx10_version = min(entry->ebx & 0xff, 1);
> 
> Taking the min() of '1' and anything else is pointless.  Per the spec, the version
> can never be 0.
> 
>   CPUID.(EAX=24H, ECX=00H):EBX[bits 7:0]  Reports the Intel AVX10 Converged Vector ISA version. Integer (≥ 1)
> 
> And it's probably too late, but why on earth is there an AVX10 version number?
> Version numbers are _awful_ for virtualization; see the constant vPMU problems
> that arise from bundling things under a single version number..  Y'all carved out
> room for sub-leafs, i.e. there's a ton of room for "discrete feature bits", so
> why oh why is there a version number?
> 

Per the spec, AVX10 wants to reduce the number of CPUID feature flags that must
be checked, which may simplify application development. An application only
needs to check the version number to know whether hardware supports an
instruction. There is indeed a sub-leaf for enumerating discrete CPUID feature
bits, but it is only meant for rare cases.

AVX10.2 (version number == 2) is the initial and fully-featured version of
AVX10, so we may need to advertise AVX10.2 in the future. Would keeping min() be
more flexible for controlling the advertised version number? E.g.

    avx10_version = min(entry->ebx & 0xff, 2);

can advertise AVX10.2 to userspace.
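
As a rough standalone sketch of that clamp (avx10_advertised_version() and its
kvm_max_version parameter are hypothetical, not existing KVM code):

```c
#include <stdint.h>

/*
 * Hypothetical helper: advertise the lower of the host's AVX10 version
 * (EBX[7:0] of leaf 0x24) and the highest version the hypervisor handles.
 */
static uint8_t avx10_advertised_version(uint32_t host_ebx, uint8_t kvm_max_version)
{
	uint8_t host_version = host_ebx & 0xff;

	return host_version < kvm_max_version ? host_version : kvm_max_version;
}
```

With kvm_max_version == 1 and a host version that is always >= 1, the result
is always 1, which is why the min() is redundant today; the clamp only starts
to matter once the limit is raised to 2 or higher.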

> > +		vector_support = entry->ebx & GENMASK(18, 16);
> 
> Please add proper defines somewhere, then this can be something like:
> 
> 		/* EBX[7:0] hold the AVX10 version; KVM supports version '1'. */
> 		entry->eax = 0;
> 		entry->ebx = (entry->ebx & AVX10_VECTOR_SIZES_MASK) | 1;
> 		entry->ecx = 0;
> 		entry->edx = 0;
> 

Yes, that will be more readable.

> Or perhaps we should have feature bits for the vector sizes, because that's really
> what they are.  Mixing feature bits in with a version number makes for painful
> code, but there's nothing KVM can do about that.  With proper features, this then
> becomes something like:
> 
> 		entry->eax = 0;
> 		cpuid_entry_override(entry, CPUID_24_0_EBX);
> 		/* EBX[7:0] hold the AVX10 version; KVM supports version '1'. */
> 		entry->ebx |= 1;
> 		entry->ecx = 0;
> 		entry->edx = 0;

Agreed, I will introduce feature bits for the vector sizes, thanks!
Sean Christopherson May 21, 2024, 7:41 p.m. UTC | #3
On Tue, May 21, 2024, Tao Su wrote:
> On Mon, May 20, 2024 at 07:43:50AM -0700, Sean Christopherson wrote:
> > On Mon, May 20, 2024, Tao Su wrote:
> > > @@ -1162,6 +1162,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> > >  			break;
> > >  		}
> > >  		break;
> > > +	case 0x24: {
> > > +		u8 avx10_version;
> > > +		u32 vector_support;
> > > +
> > > +		if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
> > > +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> > > +			break;
> > > +		}
> > > +		avx10_version = min(entry->ebx & 0xff, 1);
> > 
> > Taking the min() of '1' and anything else is pointless.  Per the spec, the version
> > can never be 0.
> > 
> >   CPUID.(EAX=24H, ECX=00H):EBX[bits 7:0]  Reports the Intel AVX10 Converged Vector ISA version. Integer (≥ 1)
> > 
> > And it's probably too late, but why on earth is there an AVX10 version number?
> > Version numbers are _awful_ for virtualization; see the constant vPMU problems
> > that arise from bundling things under a single version number..  Y'all carved out
> > room for sub-leafs, i.e. there's a ton of room for "discrete feature bits", so
> > why oh why is there a version number?
> > 
> 
> Per the spec, AVX10 wants to reduce the number of CPUID feature flags that must
> be checked, which may simplify application development. An application only
> needs to check the version number to know whether hardware supports an
> instruction.

I get that, but it royally hoses virtualization.  Bundling multiple features
under a single flag is annoying, e.g. it makes it impossible to selectively
advertise features, but I can appreciate that there are situations where having
one feature but not another is nonsensical.

Incrementing version numbers are a whole other level of bad though.  E.g. if
AVX10.2 has a feature that shouldn't be enumerated to guests for whatever reason,
then KVM can't enumerate any "later" features either, because the only way to hide
the problematic AVX10.2 feature is to set the version to AVX10.1 or lower.
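
A toy model of the problem, not KVM code (all three helpers are hypothetical
and assume a feature is introduced in version 1 or later):

```c
#include <stdbool.h>
#include <stdint.h>

/* Version scheme: a feature introduced in version N is visible iff advertised >= N. */
static bool visible_by_version(uint8_t advertised_version, uint8_t introduced_in)
{
	return advertised_version >= introduced_in;
}

/* Highest version that can be advertised while hiding a feature from `hidden_in`. */
static uint8_t max_version_hiding(uint8_t host_version, uint8_t hidden_in)
{
	uint8_t cap = (uint8_t)(hidden_in - 1);

	return host_version < cap ? host_version : cap;
}

/* Discrete-bit scheme: hiding one feature only clears that feature's bit. */
static uint32_t hide_feature_bit(uint32_t host_bits, unsigned int bit)
{
	return host_bits & ~(UINT32_C(1) << bit);
}
```

Hiding a version-2 feature on a version-3 host forces the advertised version
down to 1, so everything introduced in version 3 disappears as well, whereas
clearing a single feature bit leaves the other bits untouched.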

FWIW, unlike the PMU, which is a bit of a disaster due to version numbers, I don't
expect AVX to be problematic in practice.  E.g. most AVX features are just passed
through and don't have virtualization controls.  I just think it's a terrible
tradeoff.  E.g. if features really need to be bundled together, I don't see how
application development is meaningfully more difficult if enumeration is done
via a multi-purpose CPUID flag, versus a version number.

> There is indeed a sub-leaf for enumerating discrete CPUID feature bits, but it
> is only meant for rare cases.
> 
> AVX10.2 (version number == 2) is the initial and fully-featured version of

So what's AVX10.1?

> AVX10, so we may need to advertise AVX10.2 in the future. Would keeping min() be
> more flexible for controlling the advertised version number? E.g.
> 
>     avx10_version = min(entry->ebx & 0xff, 2);
> 
> can advertise AVX10.2 to userspace.

I'm not worried about flexibility at this point, as much as I'm worried about
having sensible code.  E.g. if we know AVX10.2 is coming (or already here?), why
not set KVM's supported min version to 2 from the get-go?
Tao Su May 22, 2024, 3:33 a.m. UTC | #4
On Tue, May 21, 2024 at 12:41:54PM -0700, Sean Christopherson wrote:
> On Tue, May 21, 2024, Tao Su wrote:
> > On Mon, May 20, 2024 at 07:43:50AM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Tao Su wrote:
> > > > @@ -1162,6 +1162,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> > > >  			break;
> > > >  		}
> > > >  		break;
> > > > +	case 0x24: {
> > > > +		u8 avx10_version;
> > > > +		u32 vector_support;
> > > > +
> > > > +		if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
> > > > +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> > > > +			break;
> > > > +		}
> > > > +		avx10_version = min(entry->ebx & 0xff, 1);
> > > 
> > > Taking the min() of '1' and anything else is pointless.  Per the spec, the version
> > > can never be 0.
> > > 
> > >   CPUID.(EAX=24H, ECX=00H):EBX[bits 7:0]  Reports the Intel AVX10 Converged Vector ISA version. Integer (≥ 1)
> > > 
> > > And it's probably too late, but why on earth is there an AVX10 version number?
> > > Version numbers are _awful_ for virtualization; see the constant vPMU problems
> > > that arise from bundling things under a single version number..  Y'all carved out
> > > room for sub-leafs, i.e. there's a ton of room for "discrete feature bits", so
> > > why oh why is there a version number?
> > > 
> > 
> > Per the spec, AVX10 wants to reduce the number of CPUID feature flags that must
> > be checked, which may simplify application development. An application only
> > needs to check the version number to know whether hardware supports an
> > instruction.
> 
> I get that, but it royally hoses virtualization.  Bundling multiple features
> under a single flag is annoying, e.g. it makes it impossible to selectively
> advertise features, but I can appreciate that there are situations where having
> one feature but not another is nonsensical.
> 
> Incrementing version numbers are a whole other level of bad though.  E.g. if
> AVX10.2 has a feature that shouldn't be enumerated to guests for whatever reason,
> then KVM can't enumerate any "later" features either, because the only way to hide
> the problematic AVX10.2 feature is to set the version to AVX10.1 or lower.
> 

I see: if a 'small part' of a version cannot be advertised, it blocks the
virtualization of all subsequent versions. If such a 'small part' really is
introduced later, I believe it will be one of the rare cases and will be
enumerated in the sub-leaf of CPUID leaf 24H.

> FWIW, unlike the PMU, which is a bit of a disaster due to version numbers, I don't
> expect AVX to be problematic in practice.  E.g. most AVX features are just passed
> through and don't have virtualization controls.

Yes, I can’t agree more.

> I just think it's a terrible
> tradeoff.  E.g. if features really need to be bundled together, I don't see how
> application development is meaningfully more difficult if enumeration is done
> via a multi-purpose CPUID flag, versus a version number.
> 

For applications, the version number seems to offer no significant advantage.
Maybe applications can batch operations based on the version number and the
supported vector length.

> > There is indeed a sub-leaf for enumerating discrete CPUID feature bits, but it
> > is only meant for rare cases.
> > 
> > AVX10.2 (version number == 2) is the initial and fully-featured version of
> 
> So what's AVX10.1?
>

AVX10.1 only adds the related CPUID enumeration for software pre-enabling, i.e.
AVX10.1 has no VMX capability and no embedded rounding or Suppress All
Exceptions (SAE) control; those will be introduced in AVX10.2.

> > AVX10, so we may need to advertise AVX10.2 in the future. Would keeping min() be
> > more flexible for controlling the advertised version number? E.g.
> > 
> >     avx10_version = min(entry->ebx & 0xff, 2);
> > 
> > can advertise AVX10.2 to userspace.
> 
> I'm not worried about flexibility at this point, as much as I'm worried about
> having sensible code.  E.g. if we know AVX10.2 is coming (or already here?), why
> not set KVM's supported min version to 2 from the get-go?

Per the spec, AVX10.2 will have a VMX capability, i.e. in the future KVM may
have to do something before advertising AVX10.2. But CPUs that support AVX10.1
already exist, so AVX10.1 should be advertised to guests first.
Patch

diff --git a/arch/x86/include/asm/cpuid.h b/arch/x86/include/asm/cpuid.h
index 6b122a31da06..aa21c105eef1 100644
--- a/arch/x86/include/asm/cpuid.h
+++ b/arch/x86/include/asm/cpuid.h
@@ -179,6 +179,7 @@  static __always_inline bool cpuid_function_is_indexed(u32 function)
 	case 0x1d:
 	case 0x1e:
 	case 0x1f:
+	case 0x24:
 	case 0x8000001d:
 		return true;
 	}
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f2f2be5d1141..ef9e3a4ed461 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -693,7 +693,7 @@  void kvm_set_cpu_caps(void)
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
 		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
-		F(AMX_COMPLEX)
+		F(AMX_COMPLEX) | F(AVX10)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_2_EDX,
@@ -937,7 +937,7 @@  static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	switch (function) {
 	case 0:
 		/* Limited to the highest leaf implemented in KVM. */
-		entry->eax = min(entry->eax, 0x1fU);
+		entry->eax = min(entry->eax, 0x24U);
 		break;
 	case 1:
 		cpuid_entry_override(entry, CPUID_1_EDX);
@@ -1162,6 +1162,22 @@  static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			break;
 		}
 		break;
+	case 0x24: {
+		u8 avx10_version;
+		u32 vector_support;
+
+		if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+			break;
+		}
+		avx10_version = min(entry->ebx & 0xff, 1);
+		vector_support = entry->ebx & GENMASK(18, 16);
+		entry->eax = 0;
+		entry->ebx = vector_support | avx10_version;
+		entry->ecx = 0;
+		entry->edx = 0;
+		break;
+	}
 	case KVM_CPUID_SIGNATURE: {
 		const u32 *sigptr = (const u32 *)KVM_SIGNATURE;
 		entry->eax = KVM_CPUID_FEATURES;
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 2f4e155080ba..695e1fb8d5bc 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -46,6 +46,7 @@  enum kvm_only_cpuid_leafs {
 #define X86_FEATURE_AVX_NE_CONVERT      KVM_X86_FEATURE(CPUID_7_1_EDX, 5)
 #define X86_FEATURE_AMX_COMPLEX         KVM_X86_FEATURE(CPUID_7_1_EDX, 8)
 #define X86_FEATURE_PREFETCHITI         KVM_X86_FEATURE(CPUID_7_1_EDX, 14)
+#define X86_FEATURE_AVX10               KVM_X86_FEATURE(CPUID_7_1_EDX, 19)
 
 /* Intel-defined sub-features, CPUID level 0x00000007:2 (EDX) */
 #define X86_FEATURE_INTEL_PSFD		KVM_X86_FEATURE(CPUID_7_2_EDX, 0)