Message ID | 20230802022954.193843-1-tao1.su@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86: Advertise AMX-COMPLEX CPUID to userspace |
On 8/2/2023 10:29 AM, Tao Su wrote:
> Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
> two instructions to perform matrix multiplication of two tiles containing
> complex elements and accumulate the results into a packed single precision
> tile.
>
> AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]
>
> Since there are no new VMX controls or additional host enabling required
> for guests to use this feature, advertise the CPUID to userspace.
>
> Signed-off-by: Tao Su <tao1.su@linux.intel.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>  arch/x86/kvm/cpuid.c         | 3 ++-
>  arch/x86/kvm/reverse_cpuid.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7f4d13383cf2..883ec8d5a77f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -647,7 +647,8 @@ void kvm_set_cpu_caps(void)
>       );
>
>       kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
> -             F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
> +             F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
> +             F(AMX_COMPLEX)
>       );
>
>       kvm_cpu_cap_mask(CPUID_D_1_EAX,
> diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
> index 56cbdb24400a..b81650678375 100644
> --- a/arch/x86/kvm/reverse_cpuid.h
> +++ b/arch/x86/kvm/reverse_cpuid.h
> @@ -43,6 +43,7 @@ enum kvm_only_cpuid_leafs {
>   /* Intel-defined sub-features, CPUID level 0x00000007:1 (EDX) */
>   #define X86_FEATURE_AVX_VNNI_INT8       KVM_X86_FEATURE(CPUID_7_1_EDX, 4)
>   #define X86_FEATURE_AVX_NE_CONVERT      KVM_X86_FEATURE(CPUID_7_1_EDX, 5)
> +#define X86_FEATURE_AMX_COMPLEX         KVM_X86_FEATURE(CPUID_7_1_EDX, 8)
>   #define X86_FEATURE_PREFETCHITI         KVM_X86_FEATURE(CPUID_7_1_EDX, 14)
>
>   /* CPUID level 0x80000007 (EDX). */
>
> base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
On Wed, Aug 02, 2023, Tao Su wrote:
> Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
> two instructions to perform matrix multiplication of two tiles containing
> complex elements and accumulate the results into a packed single precision
> tile.
>
> AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]
>
> Since there are no new VMX controls or additional host enabling required
> for guests to use this feature, advertise the CPUID to userspace.

Nit, I would rather justify this (last paragraph) with something like:

  Advertise AMX_COMPLEX if it's supported in hardware.  There are no VMX
  controls for the feature, i.e. the instructions can't be intercepted, and
  KVM advertises base AMX in CPUID if AMX is supported in hardware, even if
  KVM doesn't advertise AMX as being supported in XCR0, e.g. because the
  process didn't opt-in to allocating tile data.

If the above is accurate and there are no objections, I'll fixup the changelog
when applying.

Side topic, this does make me wonder if advertising AMX when XTILE_DATA isn't
permitted is a bad idea.  But no one has complained, and chasing down all the
dependent AMX features would get annoying, so I'm inclined to keep the status quo.
On 8/3/2023 7:36 AM, Sean Christopherson wrote:
> On Wed, Aug 02, 2023, Tao Su wrote:
>> Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
>> two instructions to perform matrix multiplication of two tiles containing
>> complex elements and accumulate the results into a packed single precision
>> tile.
>>
>> AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]
>>
>> Since there are no new VMX controls or additional host enabling required
>> for guests to use this feature, advertise the CPUID to userspace.
>
> Nit, I would rather justify this (last paragraph) with something like:
>
>   Advertise AMX_COMPLEX if it's supported in hardware.  There are no VMX
>   controls for the feature, i.e. the instructions can't be intercepted, and
>   KVM advertises base AMX in CPUID if AMX is supported in hardware, even if
>   KVM doesn't advertise AMX as being supported in XCR0, e.g. because the
>   process didn't opt-in to allocating tile data.

It looks good to me.

> If the above is accurate and there are no objections, I'll fixup the changelog
> when applying.
>
> Side topic, this does make me wonder if advertising AMX when XTILE_DATA isn't
> permitted is a bad idea.  But no one has complained, and chasing down all the
> dependent AMX features would get annoying, so I'm inclined to keep the status quo.
On Wed, Aug 02, 2023 at 04:36:10PM -0700, Sean Christopherson wrote:
> On Wed, Aug 02, 2023, Tao Su wrote:
> > Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
> > two instructions to perform matrix multiplication of two tiles containing
> > complex elements and accumulate the results into a packed single precision
> > tile.
> >
> > AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]
> >
> > Since there are no new VMX controls or additional host enabling required
> > for guests to use this feature, advertise the CPUID to userspace.
>
> Nit, I would rather justify this (last paragraph) with something like:
>
>   Advertise AMX_COMPLEX if it's supported in hardware.  There are no VMX
>   controls for the feature, i.e. the instructions can't be intercepted, and
>   KVM advertises base AMX in CPUID if AMX is supported in hardware, even if
>   KVM doesn't advertise AMX as being supported in XCR0, e.g. because the
>   process didn't opt-in to allocating tile data.
>
> If the above is accurate and there are no objections, I'll fixup the changelog
> when applying.

Totally agree.

>
> Side topic, this does make me wonder if advertising AMX when XTILE_DATA isn't
> permitted is a bad idea.  But no one has complained, and chasing down all the
> dependent AMX features would get annoying, so I'm inclined to keep the status quo.

According to the description of the AMX exceptions, the instructions perform no
CPUID check, and #UD is raised if XCR0[18:17] != 0b11. Since user applications
should check both XCR0 and the CPUID bits before using the related AMX
instructions, I don't think keeping the status quo has any bad effects.

Thanks,
Tao
On 8/3/23 01:36, Sean Christopherson wrote:
> Side topic, this does make me wonder if advertising AMX when XTILE_DATA isn't
> permitted is a bad idea.  But no one has complained, and chasing down all the
> dependent AMX features would get annoying, so I'm inclined to keep the status quo.

I think it should not be an issue because you have to check XCR0 anyway
before using AMX.  OS kernels know what's in XCR0 but still they need to
check the save state leaves in CPUID[0xD].  In neither case will the
instruction leaves in CPUID[7] be the only input to the test (if they
are used at all).

Paolo
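The two-part check described in the replies above (CPUID enumeration plus XCR0 tile state) can be sketched in C. This is only an illustration, not code from the patch: the helper names are hypothetical, and the register values are passed in as plain integers rather than read with CPUID/XGETBV so that the logic stays portable. The bit positions follow the thread: CPUID.(EAX=7,ECX=1):EDX[bit 8] for AMX-COMPLEX and XCR0[18:17] for the tile state components.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=1):EDX[bit 8] enumerates AMX-COMPLEX (per the patch). */
#define AMX_COMPLEX_EDX_BIT 8u

/* XCR0[18:17]: both tile state bits must be set, otherwise AMX
 * instructions raise #UD (per the discussion above). */
#define XCR0_XTILE_MASK ((UINT64_C(1) << 17) | (UINT64_C(1) << 18))

/* Hypothetical helper: is AMX-COMPLEX enumerated in the given EDX value? */
static bool amx_complex_enumerated(uint32_t cpuid_7_1_edx)
{
    return ((cpuid_7_1_edx >> AMX_COMPLEX_EDX_BIT) & 1u) != 0;
}

/* Hypothetical helper: is XCR0[18:17] == 0b11 in the given XCR0 value? */
static bool amx_tiles_enabled(uint64_t xcr0)
{
    return (xcr0 & XCR0_XTILE_MASK) == XCR0_XTILE_MASK;
}

/* Neither input alone is sufficient; both checks must pass. */
static bool amx_complex_usable(uint32_t cpuid_7_1_edx, uint64_t xcr0)
{
    return amx_complex_enumerated(cpuid_7_1_edx) && amx_tiles_enabled(xcr0);
}
```

A caller would obtain `cpuid_7_1_edx` from the CPUID instruction and `xcr0` from XGETBV; with CPUID advertising the feature but XCR0[18:17] clear, `amx_complex_usable()` returns false, which is the case Sean's side topic is about.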
On Wed, 02 Aug 2023 10:29:54 +0800, Tao Su wrote:
> Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
> two instructions to perform matrix multiplication of two tiles containing
> complex elements and accumulate the results into a packed single precision
> tile.
>
> AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]
>
> [...]

Applied to kvm-x86 misc, thanks!

[1/1] KVM: x86: Advertise AMX-COMPLEX CPUID to userspace
      https://github.com/kvm-x86/linux/commit/99b668545356

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes
On Thu, Aug 03, 2023 at 05:40:28PM -0700, Sean Christopherson wrote:
> On Wed, 02 Aug 2023 10:29:54 +0800, Tao Su wrote:
> > Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
> > two instructions to perform matrix multiplication of two tiles containing
> > complex elements and accumulate the results into a packed single precision
> > tile.
> >
> > AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]
> >
> > [...]
>
> Applied to kvm-x86 misc, thanks!

Sean, thanks!

>
> [1/1] KVM: x86: Advertise AMX-COMPLEX CPUID to userspace
>       https://github.com/kvm-x86/linux/commit/99b668545356
>
> --
> https://github.com/kvm-x86/linux/tree/next
> https://github.com/kvm-x86/linux/tree/fixes
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f4d13383cf2..883ec8d5a77f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -647,7 +647,8 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
-		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
+		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
+		F(AMX_COMPLEX)
 	);
 
 	kvm_cpu_cap_mask(CPUID_D_1_EAX,
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 56cbdb24400a..b81650678375 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -43,6 +43,7 @@ enum kvm_only_cpuid_leafs {
 /* Intel-defined sub-features, CPUID level 0x00000007:1 (EDX) */
 #define X86_FEATURE_AVX_VNNI_INT8       KVM_X86_FEATURE(CPUID_7_1_EDX, 4)
 #define X86_FEATURE_AVX_NE_CONVERT      KVM_X86_FEATURE(CPUID_7_1_EDX, 5)
+#define X86_FEATURE_AMX_COMPLEX         KVM_X86_FEATURE(CPUID_7_1_EDX, 8)
 #define X86_FEATURE_PREFETCHITI         KVM_X86_FEATURE(CPUID_7_1_EDX, 14)
 
 /* CPUID level 0x80000007 (EDX). */
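For readers unfamiliar with how a definition such as KVM_X86_FEATURE(CPUID_7_1_EDX, 8) relates to a register bit, here is a rough standalone sketch. It assumes the feature value packs a word index and a bit position as word * 32 + bit; the SKETCH_* names and the word index 22 are hypothetical placeholders for illustration, not values taken from the kernel headers.

```c
#include <stdint.h>

/* Assumed encoding for illustration: word index * 32 + bit position. */
#define SKETCH_FEATURE(word, bit) ((word) * 32u + (bit))

/* Hypothetical word index standing in for CPUID_7_1_EDX. */
#define SKETCH_WORD_7_1_EDX 22u

/* Bit 8 matches CPUID.(EAX=7,ECX=1):EDX[bit 8] from the patch. */
#define SKETCH_AMX_COMPLEX SKETCH_FEATURE(SKETCH_WORD_7_1_EDX, 8u)

/* Recover the word index from an encoded feature value. */
static inline uint32_t feature_word(uint32_t feature)
{
    return feature / 32u;
}

/* Recover the bit position within the word. */
static inline uint32_t feature_bit(uint32_t feature)
{
    return feature % 32u;
}

/* Build the mask to test against the 32-bit register value. */
static inline uint32_t feature_mask(uint32_t feature)
{
    return UINT32_C(1) << feature_bit(feature);
}
```

Under that assumption, a single integer identifies both which CPUID output register to look at and which bit to test, which is why the patch only needs one new #define plus one F(AMX_COMPLEX) entry in the CPUID_7_1_EDX mask.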
Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
two instructions to perform matrix multiplication of two tiles containing
complex elements and accumulate the results into a packed single precision
tile.

AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]

Since there are no new VMX controls or additional host enabling required
for guests to use this feature, advertise the CPUID to userspace.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
---
 arch/x86/kvm/cpuid.c         | 3 ++-
 arch/x86/kvm/reverse_cpuid.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4