Message ID | 20210414110433.83084-3-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/amd: LFENCE always serializing CPUID bit | expand |
On 14/04/2021 12:04, Roger Pau Monne wrote: > Milan AMD CPUs I'd suggest "AMD Milan (Zen3) CPUs" for people who don't know the AMD codenames inside/out. > have an LFENCE Always Serializing CPUID bit in leaf > 80000021.eax. Its probably worth noting that this is because of SEV-SNP, which is a headline feature in Milan. There must not be anything the VMM can do to impact the confidentiality of an SEV-SNP VM, and breaking Spectre protections is definitely a problem. In a post-Spectre world, no system should be running without LFENCE set to dispatch serialising. > Previous AMD versions used to have a user settable bit > in DE_CFG MSR to select whether LFENCE was dispatch serializing, which > Xen always attempts to set. > > In order to support this new CPUID bit move the LFENCE_DISPATCH > synthetic CPUID bit to map the hardware bit (leaving a hole in the > synthetic range) and either rely on the bit already being set by the > native CPUID output, or attempt to fake it in Xen by modifying the > DE_CFG MSR. This requires adding one more entry to the featureset to > support leaf 80000021.eax. > > The bit is exposed to guests by default, as a way to signal that > LFENCE is always serializing, either because it's mandated by > hardware, or because Xen has performed the necessary arrangements. > Note that Xen doesn't allow guests to change the DE_CFG value, so once > set by Xen LFENCE will always be serializing. > > Note that the access to DE_CFG by guests is left as-is: reads will > unconditionally return LFENCE_SERIALISE bit set, while writes are > silently dropped. The MSR is not present on AMD Milan hardware Not all MSRs are listed in the PPR. This MSR does exist, and is my understanding that the bit still exists, and is read-as-1/write-discard, so the past 3 years of software will still conclude "lfence safe" when inspecting the MSR. One of the speculation whitepapers says that this bit is going to remain fixed in the future to simplify software needing to interact with it. > , but > I'm not sure there's any value in adding logic in Xen to also hide it > from guests when running on such hardware. > > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > tools/misc/xen-cpuid.c | 6 ++++ Need to patch the translation table in libxl_cpuid.c too. See c/s 23ccf530431561 for a reference. > xen/arch/x86/cpu/amd.c | 7 ++++ > xen/arch/x86/cpu/common.c | 3 ++ > xen/include/asm-x86/cpufeatures.h | 2 +- > xen/include/public/arch-x86/cpufeatureset.h | 3 ++ > xen/include/xen/lib/x86/cpuid.h | 37 ++++++++++++++++++++- > 6 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c > index 2d04162d8d8..38406baadad 100644 > --- a/tools/misc/xen-cpuid.c > +++ b/tools/misc/xen-cpuid.c > @@ -178,6 +178,11 @@ static const char *const str_7a1[32] = > [ 4] = "avx-vnni", [ 5] = "avx512-bf16", > }; > > +static const char *const str_e21a[32] = > +{ > + [ 2] = "lfence-always-serializing", This is a bit of a mouthful. One problem is the fact that "serialising" is an ambiguous term, because neither Intel nor AMD formally specify what it means in the architecture. There is a description of what "architecturally serialising" does, which does occasionally move over time, and the name of this CPUID bit in the PPR confusing at best, because it very much isn't "architecturally serialising", and the term "dispatch serialising" isn't actually defined anywhere. Intel have now started referring to LFENCE as a "speculative execution barrier", but this still doesn't have a precise definition. I'm afraid I don't have a useful suggestion for something short and concise, which also conveys the precise meaning. ~Andrew
On 14.04.2021 14:33, Andrew Cooper wrote: > On 14/04/2021 12:04, Roger Pau Monne wrote: >> --- a/tools/misc/xen-cpuid.c >> +++ b/tools/misc/xen-cpuid.c >> @@ -178,6 +178,11 @@ static const char *const str_7a1[32] = >> [ 4] = "avx-vnni", [ 5] = "avx512-bf16", >> }; >> >> +static const char *const str_e21a[32] = >> +{ >> + [ 2] = "lfence-always-serializing", > > This is a bit of a mouthful. One problem is the fact that "serialising" > is an ambiguous term, because neither Intel nor AMD formally specify > what it means in the architecture. > > There is a description of what "architecturally serialising" does, which > does occasionally move over time, and the name of this CPUID bit in the > PPR confusing at best, because it very much isn't "architecturally > serialising", and the term "dispatch serialising" isn't actually defined > anywhere. > > Intel have now started referring to LFENCE as a "speculative execution > barrier", but this still doesn't have a precise definition. > > I'm afraid I don't have a useful suggestion for something short and > concise, which also conveys the precise meaning. How about "lfence+" or some such? Jan
On 14/04/2021 13:45, Jan Beulich wrote: > On 14.04.2021 14:33, Andrew Cooper wrote: >> On 14/04/2021 12:04, Roger Pau Monne wrote: >>> --- a/tools/misc/xen-cpuid.c >>> +++ b/tools/misc/xen-cpuid.c >>> @@ -178,6 +178,11 @@ static const char *const str_7a1[32] = >>> [ 4] = "avx-vnni", [ 5] = "avx512-bf16", >>> }; >>> >>> +static const char *const str_e21a[32] = >>> +{ >>> + [ 2] = "lfence-always-serializing", >> This is a bit of a mouthful. One problem is the fact that "serialising" >> is an ambiguous term, because neither Intel nor AMD formally specify >> what it means in the architecture. >> >> There is a description of what "architecturally serialising" does, which >> does occasionally move over time, and the name of this CPUID bit in the >> PPR confusing at best, because it very much isn't "architecturally >> serialising", and the term "dispatch serialising" isn't actually defined >> anywhere. >> >> Intel have now started referring to LFENCE as a "speculative execution >> barrier", but this still doesn't have a precise definition. >> >> I'm afraid I don't have a useful suggestion for something short and >> concise, which also conveys the precise meaning. > How about "lfence+" or some such? Hmm yes - for xen-cpuid, that's probably fine, and for churn reasons, we can keep X86_FEATURE_LFENCE_DISPATCH which was the best name I could come up with at the time. ~Andrew
On 14.04.2021 13:04, Roger Pau Monne wrote: > @@ -264,6 +265,38 @@ struct cpuid_policy > }; > uint32_t nc:8, :4, apic_id_size:4, :16; > uint32_t /* d */:32; > + > + uint64_t :64, :64; /* Leaf 0x80000009. */ > + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ > + uint64_t :64, :64; /* Leaf 0x8000000b. */ > + uint64_t :64, :64; /* Leaf 0x8000000c. */ > + uint64_t :64, :64; /* Leaf 0x8000000d. */ > + uint64_t :64, :64; /* Leaf 0x8000000e. */ > + uint64_t :64, :64; /* Leaf 0x8000000f. */ > + uint64_t :64, :64; /* Leaf 0x80000010. */ > + uint64_t :64, :64; /* Leaf 0x80000011. */ > + uint64_t :64, :64; /* Leaf 0x80000012. */ > + uint64_t :64, :64; /* Leaf 0x80000013. */ > + uint64_t :64, :64; /* Leaf 0x80000014. */ > + uint64_t :64, :64; /* Leaf 0x80000015. */ > + uint64_t :64, :64; /* Leaf 0x80000016. */ > + uint64_t :64, :64; /* Leaf 0x80000017. */ > + uint64_t :64, :64; /* Leaf 0x80000018. */ > + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ > + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ > + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ > + uint64_t :64, :64; /* Leaf 0x8000001c. */ > + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ > + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ > + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ > + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ > + > + /* Leaf 0x80000021 - Extended Feature 2 */ > + union { > + uint32_t e21a; > + struct { DECL_BITFIELD(e21a); }; > + }; > + uint32_t /* b */:32, /* c */:32, /* d */:32; > }; > } extd; Due to the effect of this on what guests get to see, I think this wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents" as a prereq, which in turn may better remain on top of "x86/CPUID: adjust extended leaves out of range clearing" (both are neighbors in that over 4 months old series, fair parts of which could imo go in irrespective of the unsettled dispute on xvmalloc() - unfortunately I had made that patch 2 of the series, not expecting it to be blocked for so long, and then presumably signaling to others that the rest of the series is also blocked). Jan
On 14/04/2021 13:57, Jan Beulich wrote: > On 14.04.2021 13:04, Roger Pau Monne wrote: >> @@ -264,6 +265,38 @@ struct cpuid_policy >> }; >> uint32_t nc:8, :4, apic_id_size:4, :16; >> uint32_t /* d */:32; >> + >> + uint64_t :64, :64; /* Leaf 0x80000009. */ >> + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ >> + uint64_t :64, :64; /* Leaf 0x8000000b. */ >> + uint64_t :64, :64; /* Leaf 0x8000000c. */ >> + uint64_t :64, :64; /* Leaf 0x8000000d. */ >> + uint64_t :64, :64; /* Leaf 0x8000000e. */ >> + uint64_t :64, :64; /* Leaf 0x8000000f. */ >> + uint64_t :64, :64; /* Leaf 0x80000010. */ >> + uint64_t :64, :64; /* Leaf 0x80000011. */ >> + uint64_t :64, :64; /* Leaf 0x80000012. */ >> + uint64_t :64, :64; /* Leaf 0x80000013. */ >> + uint64_t :64, :64; /* Leaf 0x80000014. */ >> + uint64_t :64, :64; /* Leaf 0x80000015. */ >> + uint64_t :64, :64; /* Leaf 0x80000016. */ >> + uint64_t :64, :64; /* Leaf 0x80000017. */ >> + uint64_t :64, :64; /* Leaf 0x80000018. */ >> + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ >> + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ >> + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ >> + uint64_t :64, :64; /* Leaf 0x8000001c. */ >> + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ >> + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ >> + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ >> + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ >> + >> + /* Leaf 0x80000021 - Extended Feature 2 */ >> + union { >> + uint32_t e21a; >> + struct { DECL_BITFIELD(e21a); }; >> + }; >> + uint32_t /* b */:32, /* c */:32, /* d */:32; >> }; >> } extd; > Due to the effect of this on what guests get to see, I think this > wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according > to actual leaf contents" as a prereq, which in turn may better > remain on top of "x86/CPUID: adjust extended leaves out of range > clearing" (both are neighbors in that over 4 months old series, > fair parts of which could imo go in irrespective of the unsettled > dispute on xvmalloc() - unfortunately I had made that patch 2 of > the series, not expecting it to be blocked for so long, and then > presumably signaling to others that the rest of the series is also > blocked). There is no shrinking to be done in this case. The bit is set across the board on AMD/Hygon hardware, even on older parts. What does need changing however is the logic to trim max_extd_leaf down to what hardware supports, so the bit is visible on Rome/older hardware. I.e. after this change, all VMs should get 0x80000021 by default on AMD hardware. (A curious observation of Milan hardware is that it actually advertises 0x80000023 as max_extd_leaf, and has two leaves of zeros at the end. I've got an open query about this.) ~Andrew
On 14.04.2021 15:05, Andrew Cooper wrote: > On 14/04/2021 13:57, Jan Beulich wrote: >> On 14.04.2021 13:04, Roger Pau Monne wrote: >>> @@ -264,6 +265,38 @@ struct cpuid_policy >>> }; >>> uint32_t nc:8, :4, apic_id_size:4, :16; >>> uint32_t /* d */:32; >>> + >>> + uint64_t :64, :64; /* Leaf 0x80000009. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000b. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000c. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000d. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000e. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000f. */ >>> + uint64_t :64, :64; /* Leaf 0x80000010. */ >>> + uint64_t :64, :64; /* Leaf 0x80000011. */ >>> + uint64_t :64, :64; /* Leaf 0x80000012. */ >>> + uint64_t :64, :64; /* Leaf 0x80000013. */ >>> + uint64_t :64, :64; /* Leaf 0x80000014. */ >>> + uint64_t :64, :64; /* Leaf 0x80000015. */ >>> + uint64_t :64, :64; /* Leaf 0x80000016. */ >>> + uint64_t :64, :64; /* Leaf 0x80000017. */ >>> + uint64_t :64, :64; /* Leaf 0x80000018. */ >>> + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001c. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ >>> + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ >>> + >>> + /* Leaf 0x80000021 - Extended Feature 2 */ >>> + union { >>> + uint32_t e21a; >>> + struct { DECL_BITFIELD(e21a); }; >>> + }; >>> + uint32_t /* b */:32, /* c */:32, /* d */:32; >>> }; >>> } extd; >> Due to the effect of this on what guests get to see, I think this >> wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according >> to actual leaf contents" as a prereq, which in turn may better >> remain on top of "x86/CPUID: adjust extended leaves out of range >> clearing" (both are neighbors in that over 4 months old series, >> fair parts of which could imo go in irrespective of the unsettled >> dispute on xvmalloc() - unfortunately I had made that patch 2 of >> the series, not expecting it to be blocked for so long, and then >> presumably signaling to others that the rest of the series is also >> blocked). > > There is no shrinking to be done in this case. The bit is set across > the board on AMD/Hygon hardware, even on older parts. > > What does need changing however is the logic to trim max_extd_leaf down > to what hardware supports, so the bit is visible on Rome/older > hardware. I.e. after this change, all VMs should get 0x80000021 by > default on AMD hardware. As to this last sentence - not really. Only if we managed to set the flag (if it needs setting). It's a two-way thing really: If the flag ends up set, I agree there's no need to shrink max_extd_leaf. But if the flag ends up clear, the issue we have today (hence my patch) becomes worse: Trailing all zero leaves would result because we don't properly reduce the maximum from what hardware reports to what is actually populated. In any event - I think I'd much rather see issues with my patch pointed out, if any. Jan
On 14/04/2021 14:05, Andrew Cooper wrote: > On 14/04/2021 13:57, Jan Beulich wrote: >> On 14.04.2021 13:04, Roger Pau Monne wrote: >>> @@ -264,6 +265,38 @@ struct cpuid_policy >>> }; >>> uint32_t nc:8, :4, apic_id_size:4, :16; >>> uint32_t /* d */:32; >>> + >>> + uint64_t :64, :64; /* Leaf 0x80000009. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000b. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000c. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000d. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000e. */ >>> + uint64_t :64, :64; /* Leaf 0x8000000f. */ >>> + uint64_t :64, :64; /* Leaf 0x80000010. */ >>> + uint64_t :64, :64; /* Leaf 0x80000011. */ >>> + uint64_t :64, :64; /* Leaf 0x80000012. */ >>> + uint64_t :64, :64; /* Leaf 0x80000013. */ >>> + uint64_t :64, :64; /* Leaf 0x80000014. */ >>> + uint64_t :64, :64; /* Leaf 0x80000015. */ >>> + uint64_t :64, :64; /* Leaf 0x80000016. */ >>> + uint64_t :64, :64; /* Leaf 0x80000017. */ >>> + uint64_t :64, :64; /* Leaf 0x80000018. */ >>> + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001c. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ >>> + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ >>> + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ >>> + >>> + /* Leaf 0x80000021 - Extended Feature 2 */ >>> + union { >>> + uint32_t e21a; >>> + struct { DECL_BITFIELD(e21a); }; >>> + }; >>> + uint32_t /* b */:32, /* c */:32, /* d */:32; >>> }; >>> } extd; >> Due to the effect of this on what guests get to see, I think this >> wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according >> to actual leaf contents" as a prereq, which in turn may better >> remain on top of "x86/CPUID: adjust extended leaves out of range >> clearing" (both are neighbors in that over 4 months old series, >> fair parts of which could imo go in irrespective of the unsettled >> dispute on xvmalloc() - unfortunately I had made that patch 2 of >> the series, not expecting it to be blocked for so long, and then >> presumably signaling to others that the rest of the series is also >> blocked). > There is no shrinking to be done in this case. The bit is set across > the board on AMD/Hygon hardware, even on older parts. > > What does need changing however is the logic to trim max_extd_leaf down > to what hardware supports, so the bit is visible on Rome/older > hardware. I.e. after this change, all VMs should get 0x80000021 by > default on AMD hardware. > > (A curious observation of Milan hardware is that it actually advertises > 0x80000023 as max_extd_leaf, and has two leaves of zeros at the end. > I've got an open query about this.) Something like this: diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index 050cd5713e..d9eb2878c5 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -311,6 +311,7 @@ static void __init calculate_raw_policy(void) static void __init calculate_host_policy(void) { struct cpuid_policy *p = &host_cpuid_policy; + unsigned int max_extd_leaf; *p = raw_cpuid_policy; @@ -318,7 +319,18 @@ static void __init calculate_host_policy(void) min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); p->feat.max_subleaf = min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); - p->extd.max_leaf = 0x80000000 | min_t(uint32_t, p->extd.max_leaf & 0xffff, + + max_extd_leaf = p->extd.max_leaf; + + /* + * For AMD/Hygon hardware before Zen3, we modify LFENCE to be dispatch + * serialsing. Extend max_extd_leaf beyond what hardware supports, to + * include the feature leaf containing this information. + */ + if ( cpu_has_lfence_dispatch ) + max_extd_leaf = max(max_extd_leaf, 0x80000021); + + p->extd.max_leaf = 0x80000000 | min_t(uint32_t, max_extd_leaf & 0xffff, ARRAY_SIZE(p->extd.raw) - 1); cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p); Only compile tested. Needs checking on Zen2 or older. The Raw policy should have the real max_extd_leaf, while host and derived should have 0x80000021. ~Andrew
On 14/04/2021 14:24, Jan Beulich wrote: > On 14.04.2021 15:05, Andrew Cooper wrote: >> On 14/04/2021 13:57, Jan Beulich wrote: >>> On 14.04.2021 13:04, Roger Pau Monne wrote: >>>> @@ -264,6 +265,38 @@ struct cpuid_policy >>>> }; >>>> uint32_t nc:8, :4, apic_id_size:4, :16; >>>> uint32_t /* d */:32; >>>> + >>>> + uint64_t :64, :64; /* Leaf 0x80000009. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000b. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000c. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000d. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000e. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000f. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000010. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000011. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000012. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000013. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000014. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000015. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000016. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000017. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000018. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001c. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ >>>> + >>>> + /* Leaf 0x80000021 - Extended Feature 2 */ >>>> + union { >>>> + uint32_t e21a; >>>> + struct { DECL_BITFIELD(e21a); }; >>>> + }; >>>> + uint32_t /* b */:32, /* c */:32, /* d */:32; >>>> }; >>>> } extd; >>> Due to the effect of this on what guests get to see, I think this >>> wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according >>> to actual leaf contents" as a prereq, which in turn may better >>> remain on top of "x86/CPUID: adjust extended leaves out of range >>> clearing" (both are neighbors in that over 4 months old series, >>> fair parts of which could imo go in irrespective of the unsettled >>> dispute on xvmalloc() - unfortunately I had made that patch 2 of >>> the series, not expecting it to be blocked for so long, and then >>> presumably signaling to others that the rest of the series is also >>> blocked). >> There is no shrinking to be done in this case. The bit is set across >> the board on AMD/Hygon hardware, even on older parts. >> >> What does need changing however is the logic to trim max_extd_leaf down >> to what hardware supports, so the bit is visible on Rome/older >> hardware. I.e. after this change, all VMs should get 0x80000021 by >> default on AMD hardware. > As to this last sentence - not really. Only if we managed to set the > flag (if it needs setting). ... or we find it already set. Remember that we don't even offer an option to let the user avoid this behaviour. The only case where we'll boot with lfence not being dispatch serialising is when we're virutalised under a pre-2018 hypervisor with has no Spectre knowledge. There are more important things in life, than to worry about this case. > It's a two-way thing really: If the flag ends up set, I agree there's > no need to shrink max_extd_leaf. But if the flag ends up clear, the > issue we have today (hence my patch) becomes worse: Trailing all zero > leaves would result because we don't properly reduce the maximum from > what hardware reports to what is actually populated. In any event - I > think I'd much rather see issues with my patch pointed out, if any. I don't have a problem with the shrinking change in principle, but it is definitely not a prereq to this change. Trailing zeroes aren't going to cause guests to malfunction, even if we'd ideally be neater about the result. ~Andrew
On 14.04.2021 15:25, Andrew Cooper wrote: > On 14/04/2021 14:05, Andrew Cooper wrote: >> On 14/04/2021 13:57, Jan Beulich wrote: >>> On 14.04.2021 13:04, Roger Pau Monne wrote: >>>> @@ -264,6 +265,38 @@ struct cpuid_policy >>>> }; >>>> uint32_t nc:8, :4, apic_id_size:4, :16; >>>> uint32_t /* d */:32; >>>> + >>>> + uint64_t :64, :64; /* Leaf 0x80000009. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000b. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000c. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000d. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000e. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000000f. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000010. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000011. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000012. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000013. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000014. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000015. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000016. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000017. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000018. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001c. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ >>>> + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ >>>> + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ >>>> + >>>> + /* Leaf 0x80000021 - Extended Feature 2 */ >>>> + union { >>>> + uint32_t e21a; >>>> + struct { DECL_BITFIELD(e21a); }; >>>> + }; >>>> + uint32_t /* b */:32, /* c */:32, /* d */:32; >>>> }; >>>> } extd; >>> Due to the effect of this on what guests get to see, I think this >>> wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according >>> to actual leaf contents" as a prereq, which in turn may better >>> remain on top of "x86/CPUID: adjust extended leaves out of range >>> clearing" (both are neighbors in that over 4 months old series, >>> fair parts of which could imo go in irrespective of the unsettled >>> dispute on xvmalloc() - unfortunately I had made that patch 2 of >>> the series, not expecting it to be blocked for so long, and then >>> presumably signaling to others that the rest of the series is also >>> blocked). >> There is no shrinking to be done in this case. The bit is set across >> the board on AMD/Hygon hardware, even on older parts. >> >> What does need changing however is the logic to trim max_extd_leaf down >> to what hardware supports, so the bit is visible on Rome/older >> hardware. I.e. after this change, all VMs should get 0x80000021 by >> default on AMD hardware. >> >> (A curious observation of Milan hardware is that it actually advertises >> 0x80000023 as max_extd_leaf, and has two leaves of zeros at the end. >> I've got an open query about this.) > > Something like this: > > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c > index 050cd5713e..d9eb2878c5 100644 > --- a/xen/arch/x86/cpuid.c > +++ b/xen/arch/x86/cpuid.c > @@ -311,6 +311,7 @@ static void __init calculate_raw_policy(void) > static void __init calculate_host_policy(void) > { > struct cpuid_policy *p = &host_cpuid_policy; > + unsigned int max_extd_leaf; > > *p = raw_cpuid_policy; > > @@ -318,7 +319,18 @@ static void __init calculate_host_policy(void) > min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); > p->feat.max_subleaf = > min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); > - p->extd.max_leaf = 0x80000000 | min_t(uint32_t, p->extd.max_leaf & > 0xffff, > + > + max_extd_leaf = p->extd.max_leaf; > + > + /* > + * For AMD/Hygon hardware before Zen3, we modify LFENCE to be dispatch > + * serialsing. Extend max_extd_leaf beyond what hardware supports, to > + * include the feature leaf containing this information. > + */ > + if ( cpu_has_lfence_dispatch ) > + max_extd_leaf = max(max_extd_leaf, 0x80000021); > + > + p->extd.max_leaf = 0x80000000 | min_t(uint32_t, max_extd_leaf & 0xffff, > ARRAY_SIZE(p->extd.raw) - 1); > > cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p); Well, why not set it to ARRAY_SIZE() and then have x86_cpuid_policy_shrink_max_leaves() (from "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents") have a go? It'll recognize the non-zero leaf ... Otherwise, if we gain a few more such special cases, things are going to get ugly here. Jan
On Wed, Apr 14, 2021 at 03:57:22PM +0200, Jan Beulich wrote: > On 14.04.2021 15:25, Andrew Cooper wrote: > > On 14/04/2021 14:05, Andrew Cooper wrote: > >> On 14/04/2021 13:57, Jan Beulich wrote: > >>> On 14.04.2021 13:04, Roger Pau Monne wrote: > >>>> @@ -264,6 +265,38 @@ struct cpuid_policy > >>>> }; > >>>> uint32_t nc:8, :4, apic_id_size:4, :16; > >>>> uint32_t /* d */:32; > >>>> + > >>>> + uint64_t :64, :64; /* Leaf 0x80000009. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000000b. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000000c. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000000d. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000000e. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000000f. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000010. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000011. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000012. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000013. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000014. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000015. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000016. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000017. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000018. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000001c. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ > >>>> + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ > >>>> + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ > >>>> + > >>>> + /* Leaf 0x80000021 - Extended Feature 2 */ > >>>> + union { > >>>> + uint32_t e21a; > >>>> + struct { DECL_BITFIELD(e21a); }; > >>>> + }; > >>>> + uint32_t /* b */:32, /* c */:32, /* d */:32; > >>>> }; > >>>> } extd; > >>> Due to the effect of this on what guests get to see, I think this > >>> wants to take my "x86/CPUID: shrink max_{,sub}leaf fields according > >>> to actual leaf contents" as a prereq, which in turn may better > >>> remain on top of "x86/CPUID: adjust extended leaves out of range > >>> clearing" (both are neighbors in that over 4 months old series, > >>> fair parts of which could imo go in irrespective of the unsettled > >>> dispute on xvmalloc() - unfortunately I had made that patch 2 of > >>> the series, not expecting it to be blocked for so long, and then > >>> presumably signaling to others that the rest of the series is also > >>> blocked). > >> There is no shrinking to be done in this case. The bit is set across > >> the board on AMD/Hygon hardware, even on older parts. > >> > >> What does need changing however is the logic to trim max_extd_leaf down > >> to what hardware supports, so the bit is visible on Rome/older > >> hardware. I.e. after this change, all VMs should get 0x80000021 by > >> default on AMD hardware. > >> > >> (A curious observation of Milan hardware is that it actually advertises > >> 0x80000023 as max_extd_leaf, and has two leaves of zeros at the end. > >> I've got an open query about this.) > > > > Something like this: > > > > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c > > index 050cd5713e..d9eb2878c5 100644 > > --- a/xen/arch/x86/cpuid.c > > +++ b/xen/arch/x86/cpuid.c > > @@ -311,6 +311,7 @@ static void __init calculate_raw_policy(void) > > static void __init calculate_host_policy(void) > > { > > struct cpuid_policy *p = &host_cpuid_policy; > > + unsigned int max_extd_leaf; > > > > *p = raw_cpuid_policy; > > > > @@ -318,7 +319,18 @@ static void __init calculate_host_policy(void) > > min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); > > p->feat.max_subleaf = > > min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); > > - p->extd.max_leaf = 0x80000000 | min_t(uint32_t, p->extd.max_leaf & > > 0xffff, > > + > > + max_extd_leaf = p->extd.max_leaf; > > + > > + /* > > + * For AMD/Hygon hardware before Zen3, we modify LFENCE to be dispatch > > + * serialsing. Extend max_extd_leaf beyond what hardware supports, to > > + * include the feature leaf containing this information. > > + */ > > + if ( cpu_has_lfence_dispatch ) > > + max_extd_leaf = max(max_extd_leaf, 0x80000021); > > + > > + p->extd.max_leaf = 0x80000000 | min_t(uint32_t, max_extd_leaf & 0xffff, > > ARRAY_SIZE(p->extd.raw) - 1); > > > > cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p); > > Well, why not set it to ARRAY_SIZE() and then have > x86_cpuid_policy_shrink_max_leaves() (from "x86/CPUID: shrink > max_{,sub}leaf fields according to actual leaf contents") have > a go? It'll recognize the non-zero leaf ... Otherwise, if we > gain a few more such special cases, things are going to get > ugly here. I will wait for Jan to post the updated version of his shrink patch and then rebase mine on top in order to set extd.max_leaf to ARRAY_SIZE and let the shrink logic deal with it. Thanks, Roger.
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c index 2d04162d8d8..38406baadad 100644 --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -178,6 +178,11 @@ static const char *const str_7a1[32] = [ 4] = "avx-vnni", [ 5] = "avx512-bf16", }; +static const char *const str_e21a[32] = +{ + [ 2] = "lfence-always-serializing", +}; + static const struct { const char *name; const char *abbr; @@ -195,6 +200,7 @@ static const struct { { "0x80000008.ebx", "e8b", str_e8b }, { "0x00000007:0.edx", "7d0", str_7d0 }, { "0x00000007:1.eax", "7a1", str_7a1 }, + { "0x80000021.eax", "e21a", str_e21a }, }; #define COL_ALIGN "18" diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c index 9c8dcd91eef..35f22c24762 100644 --- a/xen/arch/x86/cpu/amd.c +++ b/xen/arch/x86/cpu/amd.c @@ -646,6 +646,13 @@ void amd_init_lfence(struct cpuinfo_x86 *c) { uint64_t value; + /* + * Some hardware has LFENCE dispatch serializing always enabled, + * nothing to do on that case. + */ + if (test_bit(X86_FEATURE_LFENCE_DISPATCH, c->x86_capability)) + return; + /* * Attempt to set lfence to be Dispatch Serialising. This MSR almost * certainly isn't virtualised (and Xen at least will leak the real diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c index e5c3caf41d5..0eb364f8a65 100644 --- a/xen/arch/x86/cpu/common.c +++ b/xen/arch/x86/cpu/common.c @@ -412,6 +412,9 @@ static void generic_identify(struct cpuinfo_x86 *c) if (c->extended_cpuid_level >= 0x80000008) c->x86_capability[cpufeat_word(X86_FEATURE_CLZERO)] = cpuid_ebx(0x80000008); + if (c->extended_cpuid_level >= 0x80000021) + c->x86_capability[cpufeat_word(X86_FEATURE_LFENCE_DISPATCH)] + = cpuid_eax(0x80000021); /* Intel-defined flags: level 0x00000007 */ if ( c->cpuid_level >= 0x00000007 ) { diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h index d7e42d9bb6a..6c8f432aee4 100644 --- a/xen/include/asm-x86/cpufeatures.h +++ b/xen/include/asm-x86/cpufeatures.h @@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF, X86_SYNTH( 8)) /* APERFMPERF */ XEN_CPUFEATURE(MFENCE_RDTSC, X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */ XEN_CPUFEATURE(XEN_SMEP, X86_SYNTH(10)) /* SMEP gets used by Xen itself */ XEN_CPUFEATURE(XEN_SMAP, X86_SYNTH(11)) /* SMAP gets used by Xen itself */ -XEN_CPUFEATURE(LFENCE_DISPATCH, X86_SYNTH(12)) /* lfence set as Dispatch Serialising */ +/* Bit 12 - unused. */ XEN_CPUFEATURE(IND_THUNK_LFENCE, X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */ XEN_CPUFEATURE(IND_THUNK_JMP, X86_SYNTH(14)) /* Use IND_THUNK_JMP */ XEN_CPUFEATURE(SC_BRANCH_HARDEN, X86_SYNTH(15)) /* Conditional Branch Hardening */ diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index a5014798209..c67cd07b26b 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -277,6 +277,9 @@ XEN_CPUFEATURE(SSBD, 9*32+31) /*A MSR_SPEC_CTRL.SSBD available */ XEN_CPUFEATURE(AVX_VNNI, 10*32+ 4) /*A AVX-VNNI Instructions */ XEN_CPUFEATURE(AVX512_BF16, 10*32+ 5) /*A AVX512 BFloat16 Instructions */ +/* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */ +XEN_CPUFEATURE(LFENCE_DISPATCH, 11*32+ 2) /*A LFENCE always serializing */ + #endif /* XEN_CPUFEATURE */ /* Clean up from a default include. Close the enum (for C). */ diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h index f4ef8a9f2f0..a4d254ea96e 100644 --- a/xen/include/xen/lib/x86/cpuid.h +++ b/xen/include/xen/lib/x86/cpuid.h @@ -15,6 +15,7 @@ #define FEATURESET_e8b 8 /* 0x80000008.ebx */ #define FEATURESET_7d0 9 /* 0x00000007:0.edx */ #define FEATURESET_7a1 10 /* 0x00000007:1.eax */ +#define FEATURESET_e21a 11 /* 0x80000021.eax */ struct cpuid_leaf { @@ -84,7 +85,7 @@ const char *x86_cpuid_vendor_to_str(unsigned int vendor); #define CPUID_GUEST_NR_TOPO (1u + 1) #define CPUID_GUEST_NR_XSTATE (62u + 1) #define CPUID_GUEST_NR_EXTD_INTEL (0x8u + 1) -#define CPUID_GUEST_NR_EXTD_AMD (0x1cu + 1) +#define CPUID_GUEST_NR_EXTD_AMD (0x21u + 1) #define CPUID_GUEST_NR_EXTD MAX(CPUID_GUEST_NR_EXTD_INTEL, \ CPUID_GUEST_NR_EXTD_AMD) @@ -264,6 +265,38 @@ struct cpuid_policy }; uint32_t nc:8, :4, apic_id_size:4, :16; uint32_t /* d */:32; + + uint64_t :64, :64; /* Leaf 0x80000009. */ + uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */ + uint64_t :64, :64; /* Leaf 0x8000000b. */ + uint64_t :64, :64; /* Leaf 0x8000000c. */ + uint64_t :64, :64; /* Leaf 0x8000000d. */ + uint64_t :64, :64; /* Leaf 0x8000000e. */ + uint64_t :64, :64; /* Leaf 0x8000000f. */ + uint64_t :64, :64; /* Leaf 0x80000010. */ + uint64_t :64, :64; /* Leaf 0x80000011. */ + uint64_t :64, :64; /* Leaf 0x80000012. */ + uint64_t :64, :64; /* Leaf 0x80000013. */ + uint64_t :64, :64; /* Leaf 0x80000014. */ + uint64_t :64, :64; /* Leaf 0x80000015. */ + uint64_t :64, :64; /* Leaf 0x80000016. */ + uint64_t :64, :64; /* Leaf 0x80000017. */ + uint64_t :64, :64; /* Leaf 0x80000018. */ + uint64_t :64, :64; /* Leaf 0x80000019 - TLB 1GB Identifiers. */ + uint64_t :64, :64; /* Leaf 0x8000001a - Performance related info. */ + uint64_t :64, :64; /* Leaf 0x8000001b - IBS feature information. */ + uint64_t :64, :64; /* Leaf 0x8000001c. */ + uint64_t :64, :64; /* Leaf 0x8000001d - Cache properties. */ + uint64_t :64, :64; /* Leaf 0x8000001e - Extd APIC/Core/Node IDs. */ + uint64_t :64, :64; /* Leaf 0x8000001f - AMD Secure Encryption. */ + uint64_t :64, :64; /* Leaf 0x80000020 - Platform QoS. */ + + /* Leaf 0x80000021 - Extended Feature 2 */ + union { + uint32_t e21a; + struct { DECL_BITFIELD(e21a); }; + }; + uint32_t /* b */:32, /* c */:32, /* d */:32; }; } extd; @@ -293,6 +326,7 @@ static inline void cpuid_policy_to_featureset( fs[FEATURESET_e8b] = p->extd.e8b; fs[FEATURESET_7d0] = p->feat._7d0; fs[FEATURESET_7a1] = p->feat._7a1; + fs[FEATURESET_e21a] = p->extd.e21a; } /* Fill in a CPUID policy from a featureset bitmap. */ @@ -310,6 +344,7 @@ static inline void cpuid_featureset_to_policy( p->extd.e8b = fs[FEATURESET_e8b]; p->feat._7d0 = fs[FEATURESET_7d0]; p->feat._7a1 = fs[FEATURESET_7a1]; + p->extd.e21a = fs[FEATURESET_e21a]; } static inline uint64_t cpuid_policy_xcr0_max(const struct cpuid_policy *p)
Milan AMD CPUs have an LFENCE Always Serializing CPUID bit in leaf 80000021.eax. Previous AMD versions used to have a user settable bit in DE_CFG MSR to select whether LFENCE was dispatch serializing, which Xen always attempts to set. In order to support this new CPUID bit move the LFENCE_DISPATCH synthetic CPUID bit to map the hardware bit (leaving a hole in the synthetic range) and either rely on the bit already being set by the native CPUID output, or attempt to fake it in Xen by modifying the DE_CFG MSR. This requires adding one more entry to the featureset to support leaf 80000021.eax. The bit is exposed to guests by default, as a way to signal that LFENCE is always serializing, either because it's mandated by hardware, or because Xen has performed the necessary arrangements. Note that Xen doesn't allow guests to change the DE_CFG value, so once set by Xen LFENCE will always be serializing. Note that the access to DE_CFG by guests is left as-is: reads will unconditionally return LFENCE_SERIALISE bit set, while writes are silently dropped. The MSR is not present on AMD Milan hardware, but I'm not sure there's any value in adding logic in Xen to also hide it from guests when running on such hardware. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- tools/misc/xen-cpuid.c | 6 ++++ xen/arch/x86/cpu/amd.c | 7 ++++ xen/arch/x86/cpu/common.c | 3 ++ xen/include/asm-x86/cpufeatures.h | 2 +- xen/include/public/arch-x86/cpufeatureset.h | 3 ++ xen/include/xen/lib/x86/cpuid.h | 37 ++++++++++++++++++++- 6 files changed, 56 insertions(+), 2 deletions(-)