Message ID | 20240204-delay-verw-v7-1-59be2d704cb2@linux.intel.com (mailing list archive)
---|---
State | New, archived
Series | Delay VERW
On Sun, Feb 04, 2024 at 11:18:59PM -0800, Pawan Gupta wrote:
>  .popsection
> +
> +/*
> + * Defines the VERW operand that is disguised as entry code so that

"Define..."

> + * it can be referenced with KPTI enabled. This ensures VERW can be

"Ensure..."

But committer can fix those.

> + * used late in exit-to-user path after page tables are switched.
> + */
> +.pushsection .entry.text, "ax"
> +
> +.align L1_CACHE_BYTES, 0xcc
> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> +	UNWIND_HINT_UNDEFINED
> +	ANNOTATE_NOENDBR
> +	.word __KERNEL_DS
> +.align L1_CACHE_BYTES, 0xcc
> +SYM_CODE_END(mds_verw_sel);
> +/* For KVM */
> +EXPORT_SYMBOL_GPL(mds_verw_sel);
> +
> +.popsection
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index fdf723b6f6d0..2b62cdd8dd12 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -95,7 +95,7 @@
>  #define X86_FEATURE_SYSENTER32		( 3*32+15) /* "" sysenter in IA32 userspace */
>  #define X86_FEATURE_REP_GOOD		( 3*32+16) /* REP microcode works well */
>  #define X86_FEATURE_AMD_LBR_V2		( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> -/* FREE, was #define X86_FEATURE_LFENCE_RDTSC		( 3*32+18) "" LFENCE synchronizes RDTSC */
> +#define X86_FEATURE_CLEAR_CPU_BUF	( 3*32+18) /* "" Clear CPU buffers using VERW */
>  #define X86_FEATURE_ACC_POWER		( 3*32+19) /* AMD Accumulated Power Mechanism */
>  #define X86_FEATURE_NOPL		( 3*32+20) /* The NOPL (0F 1F) instructions */
>  #define X86_FEATURE_ALWAYS		( 3*32+21) /* "" Always-present feature */
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 262e65539f83..ec85dfe67123 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -315,6 +315,21 @@
>  #endif
>  .endm
>
> +/*
> + * Macros to execute VERW instruction that mitigate transient data sampling
> + * attacks such as MDS. On affected systems a microcode update overloaded VERW
> + * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
> + *
> + * Note: Only the memory operand variant of VERW clears the CPU buffers.
> + */
> +.macro EXEC_VERW

I think I asked this already:

Why isn't this called simply "VERW"?

There's no better name as this is basically the insn itself...

> +	verw _ASM_RIP(mds_verw_sel)
> +.endm
On Fri, Feb 09, 2024 at 06:28:43PM +0100, Borislav Petkov wrote:
> On Sun, Feb 04, 2024 at 11:18:59PM -0800, Pawan Gupta wrote:
> >  .popsection
> > +
> > +/*
> > + * Defines the VERW operand that is disguised as entry code so that
>
> "Define..."
>
> > + * it can be referenced with KPTI enabled. This ensures VERW can be
>
> "Ensure..."
>
> But committer can fix those.
>
> > + * used late in exit-to-user path after page tables are switched.
> > + */
> > +.pushsection .entry.text, "ax"
> > +
> > +.align L1_CACHE_BYTES, 0xcc
> > +SYM_CODE_START_NOALIGN(mds_verw_sel)
> > +	UNWIND_HINT_UNDEFINED
> > +	ANNOTATE_NOENDBR
> > +	.word __KERNEL_DS
> > +.align L1_CACHE_BYTES, 0xcc
> > +SYM_CODE_END(mds_verw_sel);
> > +/* For KVM */
> > +EXPORT_SYMBOL_GPL(mds_verw_sel);
> > +
> > +.popsection
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index fdf723b6f6d0..2b62cdd8dd12 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -95,7 +95,7 @@
> >  #define X86_FEATURE_SYSENTER32		( 3*32+15) /* "" sysenter in IA32 userspace */
> >  #define X86_FEATURE_REP_GOOD		( 3*32+16) /* REP microcode works well */
> >  #define X86_FEATURE_AMD_LBR_V2		( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> > -/* FREE, was #define X86_FEATURE_LFENCE_RDTSC		( 3*32+18) "" LFENCE synchronizes RDTSC */
> > +#define X86_FEATURE_CLEAR_CPU_BUF	( 3*32+18) /* "" Clear CPU buffers using VERW */
> >  #define X86_FEATURE_ACC_POWER		( 3*32+19) /* AMD Accumulated Power Mechanism */
> >  #define X86_FEATURE_NOPL		( 3*32+20) /* The NOPL (0F 1F) instructions */
> >  #define X86_FEATURE_ALWAYS		( 3*32+21) /* "" Always-present feature */
> > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > index 262e65539f83..ec85dfe67123 100644
> > --- a/arch/x86/include/asm/nospec-branch.h
> > +++ b/arch/x86/include/asm/nospec-branch.h
> > @@ -315,6 +315,21 @@
> >  #endif
> >  .endm
> >
> > +/*
> > + * Macros to execute VERW instruction that mitigate transient data sampling
> > + * attacks such as MDS. On affected systems a microcode update overloaded VERW
> > + * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
> > + *
> > + * Note: Only the memory operand variant of VERW clears the CPU buffers.
> > + */
> > +.macro EXEC_VERW
>
> I think I asked this already:

Sorry, I can't seem to find that on lore. (Though, there was a comment on
avoiding the macro altogether, to which I replied that it complicates
32-bit.)

> Why isn't this called simply "VERW"?
>
> There's no better name as this is basically the insn itself...

Agree.

> > +	verw _ASM_RIP(mds_verw_sel)

But, in this case the instruction needs a special operand, and the build
fails with the macro name VERW:

  AS      arch/x86/entry/entry.o
  AS      arch/x86/entry/entry_64.o
arch/x86/entry/entry_64.S: Assembler messages:
arch/x86/entry/entry_64.S:164: Error: too many positional arguments
arch/x86/entry/entry_64.S:577: Error: too many positional arguments
arch/x86/entry/entry_64.S:728: Error: too many positional arguments
arch/x86/entry/entry_64.S:1479: Error: too many positional arguments
make[4]: *** [scripts/Makefile.build:361: arch/x86/entry/entry_64.o] Error 1
make[3]: *** [scripts/Makefile.build:481: arch/x86/entry] Error 2
make[2]: *** [scripts/Makefile.build:481: arch/x86] Error 2

> > +.endm

Perhaps s/EXEC_VERW/_VERW/ ?

Thanks for the review.
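The failure above is generic assembler behavior rather than anything
kernel-specific: GAS matches macro names case-insensitively, so a macro
named VERW shadows the verw mnemonic inside its own body, and the
zero-parameter macro then rejects the memory operand as an extra positional
argument. A minimal standalone sketch that reproduces the same error,
assuming plain GNU as on x86-64 (the label name and selector value here are
hypothetical, not from the patch):

	/* verw-shadow.S -- "as verw-shadow.S" fails with
	 * "too many positional arguments", as in the build log above */
	.macro VERW			/* macro named like the insn, no parameters */
		verw sel(%rip)		/* matched (case-insensitively) as a macro
					 * invocation with one argument, not as the
					 * verw instruction */
	.endm

	.text
sel:	.word 0x28			/* hypothetical selector value */

	VERW				/* expanding the body here raises the error */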
On Fri, Feb 09, 2024 at 11:06:02AM -0800, Pawan Gupta wrote:
> (Though, there was a comment on avoiding the macro altogether, to which
> I replied that it complicates 32-bit.)

Hmm, so this seems to build the respective entry_{32,64}.S TUs fine:

---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index e81cabcb758f..7d1e5fe66495 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -313,12 +313,9 @@
  *
  * Note: Only the memory operand variant of VERW clears the CPU buffers.
  */
-.macro EXEC_VERW
-	verw _ASM_RIP(mds_verw_sel)
-.endm

 .macro CLEAR_CPU_BUFFERS
-	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
+	ALTERNATIVE "", __stringify(verw _ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
 .endm

 #else /* __ASSEMBLY__ */
---

and looking at the asm:

64-bit:

# 317 "./arch/x86/include/asm/nospec-branch.h"
.macro CLEAR_CPU_BUFFERS
 ALTERNATIVE "", "verw mds_verw_sel (% rip)", ( 3*32+18)
.endm

  23:	0f 00 2d 00 00 00 00 	verw   0x0(%rip)        # 2a <.altinstr_replacement+0x2a>

32-bit:

.macro CLEAR_CPU_BUFFERS
 ALTERNATIVE "", "verw mds_verw_sel", ( 3*32+18)
.endm

 13d:	0f 00 2d 00 00 00 00 	verw   0x0

It makes sense. So what complications do you mean?
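For context, the reason the same one-liner degrades cleanly on 32-bit is the
_ASM_RIP() wrapper visible in both expansions above; a paraphrased sketch of
its definition (an approximation of arch/x86/include/asm/asm.h on the
assembly side, not the verbatim kernel text):

	#ifdef CONFIG_X86_64
	# define _ASM_RIP(x)	x (% rip)	/* RIP-relative reference */
	#else
	# define _ASM_RIP(x)	x		/* plain absolute address on 32-bit */
	#endif

which is exactly why the 64-bit expansion reads "verw mds_verw_sel (% rip)"
while the 32-bit one is a bare "verw mds_verw_sel".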
On Fri, Feb 09, 2024 at 08:57:04PM +0100, Borislav Petkov wrote:
> On Fri, Feb 09, 2024 at 11:06:02AM -0800, Pawan Gupta wrote:
> > (Though, there was a comment on avoiding the macro altogether, to which
> > I replied that it complicates 32-bit.)
>
> Hmm, so this seems to build the respective entry_{32,64}.S TUs fine:
>
> ---
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index e81cabcb758f..7d1e5fe66495 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -313,12 +313,9 @@
>   *
>   * Note: Only the memory operand variant of VERW clears the CPU buffers.
>   */
> -.macro EXEC_VERW
> -	verw _ASM_RIP(mds_verw_sel)
> -.endm
>
>  .macro CLEAR_CPU_BUFFERS
> -	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
> +	ALTERNATIVE "", __stringify(verw _ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
>  .endm

This works, thanks. My mistake, I tried your earlier suggestion without
playing with it:

> > +.macro CLEAR_CPU_BUFFERS
> > +	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
> > +.endm
>
> Why can't this simply be:
>
> .macro CLEAR_CPU_BUFFERS
> 	ALTERNATIVE "", "verw mds_verw_sel(%rip)", X86_FEATURE_CLEAR_CPU_BUF
> .endm
>
> without that silly EXEC_VERW macro?
>
> --
> Regards/Gruss,
>     Boris.
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 8c8d38f0cb1d..bd8e77c5a375 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -6,6 +6,9 @@
 #include <linux/export.h>
 #include <linux/linkage.h>
 #include <asm/msr-index.h>
+#include <asm/unwind_hints.h>
+#include <asm/segment.h>
+#include <asm/cache.h>

 .pushsection .noinstr.text, "ax"

@@ -20,3 +23,22 @@ SYM_FUNC_END(entry_ibpb)
 EXPORT_SYMBOL_GPL(entry_ibpb);

 .popsection
+
+/*
+ * Defines the VERW operand that is disguised as entry code so that
+ * it can be referenced with KPTI enabled. This ensures VERW can be
+ * used late in exit-to-user path after page tables are switched.
+ */
+.pushsection .entry.text, "ax"
+
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_START_NOALIGN(mds_verw_sel)
+	UNWIND_HINT_UNDEFINED
+	ANNOTATE_NOENDBR
+	.word __KERNEL_DS
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_END(mds_verw_sel);
+/* For KVM */
+EXPORT_SYMBOL_GPL(mds_verw_sel);
+
+.popsection
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index fdf723b6f6d0..2b62cdd8dd12 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -95,7 +95,7 @@
 #define X86_FEATURE_SYSENTER32		( 3*32+15) /* "" sysenter in IA32 userspace */
 #define X86_FEATURE_REP_GOOD		( 3*32+16) /* REP microcode works well */
 #define X86_FEATURE_AMD_LBR_V2		( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
-/* FREE, was #define X86_FEATURE_LFENCE_RDTSC		( 3*32+18) "" LFENCE synchronizes RDTSC */
+#define X86_FEATURE_CLEAR_CPU_BUF	( 3*32+18) /* "" Clear CPU buffers using VERW */
 #define X86_FEATURE_ACC_POWER		( 3*32+19) /* AMD Accumulated Power Mechanism */
 #define X86_FEATURE_NOPL		( 3*32+20) /* The NOPL (0F 1F) instructions */
 #define X86_FEATURE_ALWAYS		( 3*32+21) /* "" Always-present feature */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 262e65539f83..ec85dfe67123 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -315,6 +315,21 @@
 #endif
 .endm

+/*
+ * Macros to execute VERW instruction that mitigate transient data sampling
+ * attacks such as MDS. On affected systems a microcode update overloaded VERW
+ * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
+ *
+ * Note: Only the memory operand variant of VERW clears the CPU buffers.
+ */
+.macro EXEC_VERW
+	verw _ASM_RIP(mds_verw_sel)
+.endm
+
+.macro CLEAR_CPU_BUFFERS
+	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
+.endm
+
 #else /* __ASSEMBLY__ */

 #define ANNOTATE_RETPOLINE_SAFE					\
@@ -536,6 +551,8 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);

 DECLARE_STATIC_KEY_FALSE(mmio_stale_data_clear);

+extern u16 mds_verw_sel;
+
 #include <asm/segment.h>

 /**
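How the two macros are meant to be consumed: CLEAR_CPU_BUFFERS assembles to
nothing until alternatives patching sees X86_FEATURE_CLEAR_CPU_BUF set, at
which point it becomes the single VERW. An illustrative sketch of the
intended call site in the 64-bit syscall return path follows; the actual
hunks land in later patches of this series, so the exact placement here is
an assumption:

	/* tail of the exit-to-user path -- sketch only */
	swapgs
	CLEAR_CPU_BUFFERS	/* empty by default; patched at boot to
				 * "verw mds_verw_sel(%rip)" when
				 * X86_FEATURE_CLEAR_CPU_BUF is set */
	sysretq

Because mds_verw_sel lives in .entry.text, the RIP-relative reference still
resolves after the KPTI page-table switch, which is what allows the VERW to
sit this late.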
MDS mitigation requires clearing the CPU buffers before returning to user.
This needs to be done late in the exit-to-user path. The current location of
VERW leaves a possibility of kernel data ending up in CPU buffers for memory
accesses done after VERW, such as:

  1. Kernel data accessed by an NMI between VERW and return-to-user can
     remain in CPU buffers since NMI returning to kernel does not execute
     VERW to clear CPU buffers.

  2. Alyssa reported that after VERW is executed,
     CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system call.
     Memory accesses during stack scrubbing can move kernel stack contents
     into CPU buffers.

  3. When caller-saved registers are restored after a return from a function
     executing VERW, the kernel stack accesses can remain in CPU buffers
     (since they occur after VERW).

To fix this, VERW needs to be moved very late in the exit-to-user path.

In preparation for moving VERW to entry/exit asm code, create macros that
can be used in asm. Also make VERW patching depend on a new feature flag
X86_FEATURE_CLEAR_CPU_BUF.

Reported-by: Alyssa Milburn <alyssa.milburn@intel.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry.S               | 22 ++++++++++++++++++++++
 arch/x86/include/asm/cpufeatures.h   |  2 +-
 arch/x86/include/asm/nospec-branch.h | 17 +++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)
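Note that ( 3*32+18) sits in a Linux-defined word of the feature bitmap, so
X86_FEATURE_CLEAR_CPU_BUF is a synthetic flag with no CPUID bit behind it,
and nothing sets it in this patch. A sketch of how a later patch in the
series would arm it from mitigation selection; the condition and call site
shown here are illustrative assumptions (the real logic lives in
arch/x86/kernel/cpu/bugs.c):

	/* sketch: force the synthetic flag so alternatives patching turns
	 * CLEAR_CPU_BUFFERS into the VERW (exact logic is in a later patch) */
	if (boot_cpu_has_bug(X86_BUG_MDS) && mds_mitigation != MDS_MITIGATION_OFF)
		setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);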