| Message ID | 6b751a400df49217defc89a19b3ac2ca33ab7690.1743683787.git.teddy.astie@vates.tech (mailing list archive) |
|---|---|
| State | New |
| Series | [RFC] x86/amd: Add support for AMD TCE |
On 03.04.2025 14:44, Teddy Astie wrote:
> AMD Translation Cache Extension is a flag that can be enabled in the EFER
> MSR to optimize some TLB flushes. Enable this flag if supported by
> hardware.
>
> The AMD Architecture Developer Manual describes this feature as follows:
>> Setting this bit to 1 changes how the INVLPG, INVLPGB, and INVPCID
>> instructions operate on TLB entries. When this bit is 0, these
>> instructions remove the target PTE from the TLB as well as all
>> upper-level table entries that are cached in the TLB, whether or not
>> they are associated with the target PTE. When this bit is set, these
>> instructions will remove the target PTE and only those upper-level
>> entries that lead to the target PTE in the page table hierarchy, leaving
>> unrelated upper-level entries intact. This may provide a performance
>> benefit.
>
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> RFC:
> - is this change actually safe?

Well, before getting here with reading I was already about to ask this very
question. It's really you who needs to prove it.

> - should we add a tce/no-tce option to opt in to / out of this feature?

Unless we're entirely certain we got this right and didn't overlook any
corner case, perhaps better to do so.

> - is this flag enabled at the right moment during boot?

If (as you appear to take as a base assumption) our code is safe towards
this behavioral change, then it would be largely irrelevant when you set
the bit. So to answer this question, the first point above needs sorting
out.

> --- a/xen/arch/x86/include/asm/cpufeature.h
> +++ b/xen/arch/x86/include/asm/cpufeature.h
> @@ -114,6 +114,7 @@ static inline bool boot_cpu_has(unsigned int feat)
>  #define cpu_has_xop             boot_cpu_has(X86_FEATURE_XOP)
>  #define cpu_has_skinit          boot_cpu_has(X86_FEATURE_SKINIT)
>  #define cpu_has_fma4            boot_cpu_has(X86_FEATURE_FMA4)
> +#define cpu_has_tce             boot_cpu_has(X86_FEATURE_TCE)

If you add this, ...

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -2008,6 +2008,13 @@ void asmlinkage __init noreturn __start_xen(void)
>      if ( cpu_has_pku )
>          set_in_cr4(X86_CR4_PKE);
>
> +    if ( boot_cpu_has(X86_FEATURE_TCE) )

... then please also use it.

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -372,6 +372,9 @@ void asmlinkage start_secondary(void *unused)
>
>      microcode_update_one();
>
> +    if ( boot_cpu_has(X86_FEATURE_TCE) )
> +        write_efer(read_efer() | EFER_TCE);

Same here. But I wonder if you couldn't set the bit in trampoline_efer.

Jan
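For reference, Jan's closing suggestion would amount to roughly the sketch
below. It is not taken from the patch: the placement next to Xen's existing
trampoline_efer setup is an assumption, and only cpu_has_tce and EFER_TCE
come from the thread.

    /*
     * Sketch only: accumulate TCE into the EFER value the AP/S3-resume
     * trampoline loads, so every CPU starts with it set, instead of
     * read-modify-writing EFER per CPU in start_secondary().
     */
    if ( cpu_has_tce )
        trampoline_efer |= EFER_TCE;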
On 03/04/2025 1:58 pm, Jan Beulich wrote:
> On 03.04.2025 14:44, Teddy Astie wrote:
>> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
>> ---
>> RFC:
>> - is this change actually safe?
>
> Well, before getting here with reading I was already about to ask this
> very question. It's really you who needs to prove it.
>
>> - should we add a tce/no-tce option to opt in to / out of this feature?
>
> Unless we're entirely certain we got this right and didn't overlook any
> corner case, perhaps better to do so.

To bring across a quote of mine from Mattermost:

"I'm reasonably sure our TLB handling algorithm is safe for it, following
the PCID work we did for Meltdown"

But proving this is hard.

Some history: INVLPG flushing the entire paging-structure cache (non-leaf
mappings) was a last-minute "fix" to keep Windows working on the
Pentium(?), where it had started using INVLPG from the 486(?) but with a
logical error.

AMD's TCE feature is "that's a hefty hit to keep around, so here's an
option for the behaviour one would more reasonably expect from INVLPG".

Anyway. I have a suspicion that Intel's INVPCID no longer followed the
INVLPG behaviour anyway, and that we were forced to account for that.
However, I'm struggling to find confirmation one way or the other in the
SDM.

Another mitigating factor is that, because we use recursive pagetables, we
have to upgrade an INVLPG into a full flush anyway if we edited non-leaf
entries.

As to a cmdline option, there's cpuid=no-tce if we really really need it,
but I don't think we want a dedicated TCE option.

>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -372,6 +372,9 @@ void asmlinkage start_secondary(void *unused)
>>
>>      microcode_update_one();
>>
>> +    if ( boot_cpu_has(X86_FEATURE_TCE) )
>> +        write_efer(read_efer() | EFER_TCE);
>
> Same here. But I wonder if you couldn't set the bit in trampoline_efer.

Yes, do set it in trampoline_efer, and drop this hunk.

~Andrew
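To illustrate the "mitigating factor" above: the sketch below is not Xen's
actual flush code, and modified_nonleaf/addr are made-up names, but it
captures the invariant Andrew describes, namely that edits to non-leaf
entries never rely on INVLPG's paging-structure-cache behaviour in the
first place.

    /* Sketch of the invariant, not real Xen code. */
    if ( modified_nonleaf )
        write_cr3(read_cr3());   /* Flush everything non-global; TCE
                                    semantics are moot. */
    else                         /* Leaf-only edit: INVLPG suffices. */
        asm volatile ( "invlpg %0"
                       :: "m" (*(const char *)addr) : "memory" );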
On 03/04/2025 at 16:08, Andrew Cooper wrote:
> On 03/04/2025 1:58 pm, Jan Beulich wrote:
>> On 03.04.2025 14:44, Teddy Astie wrote:
>>> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
>>> ---
>>> RFC:
>>> - is this change actually safe?
>> Well, before getting here with reading I was already about to ask this
>> very question. It's really you who needs to prove it.
>>
>>> - should we add a tce/no-tce option to opt in to / out of this feature?
>> Unless we're entirely certain we got this right and didn't overlook any
>> corner case, perhaps better to do so.
>
> To bring across a quote of mine from Mattermost:
>
> "I'm reasonably sure our TLB handling algorithm is safe for it,
> following the PCID work we did for Meltdown"
>
> But proving this is hard.
>
> Some history: INVLPG flushing the entire paging-structure cache
> (non-leaf mappings) was a last-minute "fix" to keep Windows working on
> the Pentium(?), where it had started using INVLPG from the 486(?) but
> with a logical error.
>
> AMD's TCE feature is "that's a hefty hit to keep around, so here's an
> option for the behaviour one would more reasonably expect from INVLPG".
>
> Anyway. I have a suspicion that Intel's INVPCID no longer followed the
> INVLPG behaviour anyway, and that we were forced to account for that.
> However, I'm struggling to find confirmation one way or the other in
> the SDM.
>
> Another mitigating factor is that, because we use recursive pagetables,
> we have to upgrade an INVLPG into a full flush anyway if we edited
> non-leaf entries.

Yes, while proving it on the hypervisor side may be doable, I am quite
unsure about PV guests. Some calls to HYPERVISOR_mmuext_op incidentally
invoke INVLPG and the like, which could be affected by this change, as the
guest may "assume" some behavioural aspects of INVLPG.

> As to a cmdline option, there's cpuid=no-tce if we really really need
> it, but I don't think we want a dedicated TCE option.
>
>>> --- a/xen/arch/x86/smpboot.c
>>> +++ b/xen/arch/x86/smpboot.c
>>> @@ -372,6 +372,9 @@ void asmlinkage start_secondary(void *unused)
>>>
>>>      microcode_update_one();
>>>
>>> +    if ( boot_cpu_has(X86_FEATURE_TCE) )
>>> +        write_efer(read_efer() | EFER_TCE);
>> Same here. But I wonder if you couldn't set the bit in trampoline_efer.
>
> Yes, do set it in trampoline_efer, and drop this hunk.

Will do.

> If you add this, ...
>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -2008,6 +2008,13 @@ void asmlinkage __init noreturn __start_xen(void)
>>      if ( cpu_has_pku )
>>          set_in_cr4(X86_CR4_PKE);
>>
>> +    if ( boot_cpu_has(X86_FEATURE_TCE) )
>
> ... then please also use it.

Yes, I forgot to change it.

---

Aside from enabling this flag for Xen and PV guests, it can be useful to
expose it to the guests. While doing so currently won't change anything,
as most of the related instructions are trapped and managed by the
hypervisor, it does affect the behaviour of in-guest INVLPGB if enabled in
the VMCB.

> ~Andrew

Teddy

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
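The kind of call Teddy is referring to looks like this from inside a PV
guest — a hedged illustration against the public interface (struct layout
per xen/include/public/xen.h); flush_one_va is a hypothetical helper, and
the header/hypercall-wrapper paths vary by guest OS.

    #include <xen/xen.h>   /* public interface; path differs per guest */

    /* Hypothetical PV-guest helper delegating a one-address TLB flush. */
    static void flush_one_va(unsigned long addr)
    {
        struct mmuext_op op = {
            .cmd = MMUEXT_INVLPG_LOCAL,
            .arg1.linear_addr = addr,
        };

        /* Xen decides how to honour this: INVLPG, INVPCID, or a full
         * flush — the guest cannot observe which was used. */
        HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
    }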
On 03/04/2025 7:36 pm, Teddy Astie wrote:
> Yes, while proving it on the hypervisor side may be doable, I am quite
> unsure about PV guests. Some calls to HYPERVISOR_mmuext_op incidentally
> invoke INVLPG and the like, which could be affected by this change, as
> the guest may "assume" some behavioural aspects of INVLPG.

I wouldn't worry about PV guests. They have to delegate TLB flushing to
Xen anyway, and already don't get to choose whether Xen uses INVLPG, or
INVPCID, or something else to perform the requested action. We've e.g.
switched from INVLPG to INVPCID as a consequence of Meltdown, and nothing
exploded[1].

> Aside from enabling this flag for Xen and PV guests, it can be useful
> to expose it to the guests. While doing so currently won't change
> anything, as most of the related instructions are trapped and managed
> by the hypervisor, it does affect the behaviour of in-guest INVLPGB if
> enabled in the VMCB.

Good point. Linux 6.14 does now use it when available.

You should split this patch in two.

The first patch exposes it to guests, so use an H tag in cpufeatureset.h
(available in HAP domains by default), and adjust hvm_efer_valid(). I
think that's all you need to do, although remember CHANGELOG.md.

Then the second patch turns it on for Xen.

~Andrew

[1] Well, XSA-292 was a spectacular explosion, but we fixed that.
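Concretely, the first of those two patches would look roughly like the
sketch below. This is an untested reading of Andrew's suggestion: the
p->extd.tce policy-field name is an assumption derived from the
cpufeatureset entry, and the guard is modelled on hvm_efer_valid()'s
existing per-bit checks.

    /* cpufeatureset.h: 'H' exposes the bit to HVM (HAP) guests by default. */
    XEN_CPUFEATURE(TCE,           3*32+17) /*H  Translation Cache Extension */

    /* hvm.c, in hvm_efer_valid(): accept EFER.TCE only when the policy
     * offers the feature (field name assumed, not confirmed). */
    if ( (value & EFER_TCE) && !p->extd.tce )
        return "TCE without feature";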
diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index 05399fb9c9..ab6d07b767 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -114,6 +114,7 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_xop             boot_cpu_has(X86_FEATURE_XOP)
 #define cpu_has_skinit          boot_cpu_has(X86_FEATURE_SKINIT)
 #define cpu_has_fma4            boot_cpu_has(X86_FEATURE_FMA4)
+#define cpu_has_tce             boot_cpu_has(X86_FEATURE_TCE)
 #define cpu_has_tbm             boot_cpu_has(X86_FEATURE_TBM)
 
 /* CPUID level 0x0000000D:1.eax */
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 22d9e76e55..d7b3a4bc40 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -200,6 +200,7 @@
 #define EFER_NXE                (_AC(1, ULL) << 11) /* No Execute Enable */
 #define EFER_SVME               (_AC(1, ULL) << 12) /* Secure Virtual Machine Enable */
 #define EFER_FFXSE              (_AC(1, ULL) << 14) /* Fast FXSAVE/FXRSTOR */
+#define EFER_TCE                (_AC(1, ULL) << 15) /* Translation Cache Extensions */
 #define EFER_AIBRSE             (_AC(1, ULL) << 21) /* Automatic IBRS Enable */
 
 #define EFER_KNOWN_MASK \
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d70abb7e0c..96f200f853 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2008,6 +2008,13 @@ void asmlinkage __init noreturn __start_xen(void)
     if ( cpu_has_pku )
         set_in_cr4(X86_CR4_PKE);
 
+    if ( boot_cpu_has(X86_FEATURE_TCE) )
+    {
+        printk("Enabling AMD TCE\n");
+
+        write_efer(read_efer() | EFER_TCE);
+    }
+
     if ( opt_invpcid && cpu_has_invpcid )
         use_invpcid = true;
 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 54207e6d88..fbd1710720 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -372,6 +372,9 @@ void asmlinkage start_secondary(void *unused)
 
     microcode_update_one();
 
+    if ( boot_cpu_has(X86_FEATURE_TCE) )
+        write_efer(read_efer() | EFER_TCE);
+
     /*
      * If any speculative control MSRs are available, apply Xen's default
      * settings. Note: These MSRs may only become available after loading
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index cc6e984a88..a0c8d561fb 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -170,6 +170,7 @@ XEN_CPUFEATURE(SKINIT,        3*32+12) /*   SKINIT/STGI instructions */
 XEN_CPUFEATURE(WDT,           3*32+13) /*   Watchdog timer */
 XEN_CPUFEATURE(LWP,           3*32+15) /*   Light Weight Profiling */
 XEN_CPUFEATURE(FMA4,          3*32+16) /*A  4 operands MAC instructions */
+XEN_CPUFEATURE(TCE,           3*32+17) /*   Translation Cache Extension support */
 XEN_CPUFEATURE(NODEID_MSR,    3*32+19) /*   NodeId MSR */
 XEN_CPUFEATURE(TBM,           3*32+21) /*A  trailing bit manipulations */
 XEN_CPUFEATURE(TOPOEXT,       3*32+22) /*   topology extensions CPUID leafs */