Message ID: 20200805021059.1331-1-ricardo.neri-calderon@linux.intel.com (mailing list archive)
State: New, archived
Series: [v2] x86/cpu: Use SERIALIZE in sync_core() when available
On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
> The SERIALIZE instruction gives software a way to force the processor to
> complete all modifications to flags, registers and memory from previous
> instructions and drain all buffered writes to memory before the next
> instruction is fetched and executed. Thus, it serves the purpose of
> sync_core(). Use it when available.
>
> Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
> invariance in alternatives. The iret-to-self does not comply with such
> invariance. Thus, it cannot be used inside alternative code. Instead, use
> an alternative that jumps to SERIALIZE when available.
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Cathy Zhang <cathy.zhang@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Kyung Min Park <kyung.min.park@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: linux-edac@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> This is a v2 from my initial submission [1]. The first three patches of
> the series have been merged in Linus' tree. Hence, I am submitting only
> this patch for review.
>
> [1]. https://lkml.org/lkml/2020/7/27/8
>
> Changes since v1:
>  * Support SERIALIZE using alternative runtime patching.
>    (Peter Zijlstra, H. Peter Anvin)
>  * Added a note to specify which version of binutils supports SERIALIZE.
>    (Peter Zijlstra)
>  * Verified that (::: "memory") is used. (H. Peter Anvin)
> ---
>  arch/x86/include/asm/special_insns.h |  2 ++
>  arch/x86/include/asm/sync_core.h     | 10 +++++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 59a3e13204c3..25cd67801dda 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -10,6 +10,8 @@
>  #include <linux/irqflags.h>
>  #include <linux/jump_label.h>
>
> +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
> +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
>  /*
>   * Volatile isn't enough to prevent the compiler from reordering the
>   * read/write functions for the control registers and messing everything up.
> diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
> index fdb5b356e59b..201ea3d9a6bd 100644
> --- a/arch/x86/include/asm/sync_core.h
> +++ b/arch/x86/include/asm/sync_core.h
> @@ -5,15 +5,19 @@
>  #include <linux/preempt.h>
>  #include <asm/processor.h>
>  #include <asm/cpufeature.h>
> +#include <asm/special_insns.h>
>
>  #ifdef CONFIG_X86_32
>  static inline void iret_to_self(void)
>  {
>  	asm volatile (
> +		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>  		"pushfl\n\t"
>  		"pushl %%cs\n\t"
>  		"pushl $1f\n\t"
>  		"iret\n\t"
> +		"2:\n\t"
> +		__ASM_SERIALIZE "\n"
>  		"1:"
>  		: ASM_CALL_CONSTRAINT : : "memory");
>  }
> @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
>  	unsigned int tmp;
>
>  	asm volatile (
> +		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)

Why is this and above stuck inside the asm statement?

Why can't you simply do:

	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
		asm volatile(__ASM_SERIALIZE ::: "memory");
		return;
	}

on function entry instead of making it more unreadable for no particular
reason?
On August 4, 2020 9:48:40 PM PDT, Borislav Petkov <bp@suse.de> wrote:
>On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
>> The SERIALIZE instruction gives software a way to force the processor to
>> complete all modifications to flags, registers and memory from previous
>> instructions and drain all buffered writes to memory before the next
>> instruction is fetched and executed. Thus, it serves the purpose of
>> sync_core(). Use it when available.
>>
>> Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
>> invariance in alternatives. The iret-to-self does not comply with such
>> invariance. Thus, it cannot be used inside alternative code. Instead, use
>> an alternative that jumps to SERIALIZE when available.
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Cathy Zhang <cathy.zhang@intel.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Fenghua Yu <fenghua.yu@intel.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Kyung Min Park <kyung.min.park@intel.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
>> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: linux-edac@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Suggested-by: Andy Lutomirski <luto@kernel.org>
>> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
>> ---
>> This is a v2 from my initial submission [1]. The first three patches of
>> the series have been merged in Linus' tree. Hence, I am submitting only
>> this patch for review.
>>
>> [1]. https://lkml.org/lkml/2020/7/27/8
>>
>> Changes since v1:
>>  * Support SERIALIZE using alternative runtime patching.
>>    (Peter Zijlstra, H. Peter Anvin)
>>  * Added a note to specify which version of binutils supports SERIALIZE.
>>    (Peter Zijlstra)
>>  * Verified that (::: "memory") is used. (H. Peter Anvin)
>> ---
>>  arch/x86/include/asm/special_insns.h |  2 ++
>>  arch/x86/include/asm/sync_core.h     | 10 +++++++++-
>>  2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
>> index 59a3e13204c3..25cd67801dda 100644
>> --- a/arch/x86/include/asm/special_insns.h
>> +++ b/arch/x86/include/asm/special_insns.h
>> @@ -10,6 +10,8 @@
>>  #include <linux/irqflags.h>
>>  #include <linux/jump_label.h>
>>
>> +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
>> +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
>>  /*
>>   * Volatile isn't enough to prevent the compiler from reordering the
>>   * read/write functions for the control registers and messing everything up.
>> diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
>> index fdb5b356e59b..201ea3d9a6bd 100644
>> --- a/arch/x86/include/asm/sync_core.h
>> +++ b/arch/x86/include/asm/sync_core.h
>> @@ -5,15 +5,19 @@
>>  #include <linux/preempt.h>
>>  #include <asm/processor.h>
>>  #include <asm/cpufeature.h>
>> +#include <asm/special_insns.h>
>>
>>  #ifdef CONFIG_X86_32
>>  static inline void iret_to_self(void)
>>  {
>>  	asm volatile (
>> +		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>>  		"pushfl\n\t"
>>  		"pushl %%cs\n\t"
>>  		"pushl $1f\n\t"
>>  		"iret\n\t"
>> +		"2:\n\t"
>> +		__ASM_SERIALIZE "\n"
>>  		"1:"
>>  		: ASM_CALL_CONSTRAINT : : "memory");
>>  }
>> @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
>>  	unsigned int tmp;
>>
>>  	asm volatile (
>> +		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>
>Why is this and above stuck inside the asm statement?
>
>Why can't you simply do:
>
>	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
>		asm volatile(__ASM_SERIALIZE ::: "memory");
>		return;
>	}
>
>on function entry instead of making it more unreadable for no particular
>reason?

Because why use an alternative to jump over one instruction?

I personally would prefer to have the IRET put out of line and have the
call/jmp replaced by SERIALIZE inline.
On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@zytor.com wrote:
> Because why use an alternative to jump over one instruction?
>
> I personally would prefer to have the IRET put out of line

Can't yet - SERIALIZE CPUs are a minority at the moment.

> and have the call/jmp replaced by SERIALIZE inline.

Well, we could do:

	alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);

and avoid all kinds of jumping. Alternatives get padded so there
would be a couple of NOPs following when SERIALIZE gets patched in
but it shouldn't be a problem. I guess one needs to look at what gcc
generates...
On August 4, 2020 10:08:08 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@zytor.com wrote:
>> Because why use an alternative to jump over one instruction?
>>
>> I personally would prefer to have the IRET put out of line
>
>Can't yet - SERIALIZE CPUs are a minority at the moment.
>
>> and have the call/jmp replaced by SERIALIZE inline.
>
>Well, we could do:
>
>	alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
>
>and avoid all kinds of jumping. Alternatives get padded so there
>would be a couple of NOPs following when SERIALIZE gets patched in
>but it shouldn't be a problem. I guess one needs to look at what gcc
>generates...

I didn't say behind a trap. IRET is a control transfer instruction, and
slow, so putting it out of line really isn't unreasonable. Can even do a
call to a common handler.
On Wed, Aug 05, 2020 at 06:48:40AM +0200, Borislav Petkov wrote:
> On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
> > The SERIALIZE instruction gives software a way to force the processor to
> > complete all modifications to flags, registers and memory from previous
> > instructions and drain all buffered writes to memory before the next
> > instruction is fetched and executed. Thus, it serves the purpose of
> > sync_core(). Use it when available.
> >
> > Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
> > invariance in alternatives. The iret-to-self does not comply with such
> > invariance. Thus, it cannot be used inside alternative code. Instead, use
> > an alternative that jumps to SERIALIZE when available.
> >
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Cathy Zhang <cathy.zhang@intel.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Kyung Min Park <kyung.min.park@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> > Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> > Cc: linux-edac@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Suggested-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > This is a v2 from my initial submission [1]. The first three patches of
> > the series have been merged in Linus' tree. Hence, I am submitting only
> > this patch for review.
> >
> > [1]. https://lkml.org/lkml/2020/7/27/8
> >
> > Changes since v1:
> >  * Support SERIALIZE using alternative runtime patching.
> >    (Peter Zijlstra, H. Peter Anvin)
> >  * Added a note to specify which version of binutils supports SERIALIZE.
> >    (Peter Zijlstra)
> >  * Verified that (::: "memory") is used. (H. Peter Anvin)
> > ---
> >  arch/x86/include/asm/special_insns.h |  2 ++
> >  arch/x86/include/asm/sync_core.h     | 10 +++++++++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 59a3e13204c3..25cd67801dda 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -10,6 +10,8 @@
> >  #include <linux/irqflags.h>
> >  #include <linux/jump_label.h>
> >
> > +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
> > +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
> >  /*
> >   * Volatile isn't enough to prevent the compiler from reordering the
> >   * read/write functions for the control registers and messing everything up.
> > diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
> > index fdb5b356e59b..201ea3d9a6bd 100644
> > --- a/arch/x86/include/asm/sync_core.h
> > +++ b/arch/x86/include/asm/sync_core.h
> > @@ -5,15 +5,19 @@
> >  #include <linux/preempt.h>
> >  #include <asm/processor.h>
> >  #include <asm/cpufeature.h>
> > +#include <asm/special_insns.h>
> >
> >  #ifdef CONFIG_X86_32
> >  static inline void iret_to_self(void)
> >  {
> >  	asm volatile (
> > +		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
> >  		"pushfl\n\t"
> >  		"pushl %%cs\n\t"
> >  		"pushl $1f\n\t"
> >  		"iret\n\t"
> > +		"2:\n\t"
> > +		__ASM_SERIALIZE "\n"
> >  		"1:"
> >  		: ASM_CALL_CONSTRAINT : : "memory");
> >  }
> > @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
> >  	unsigned int tmp;
> >
> >  	asm volatile (
> > +		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>
> Why is this and above stuck inside the asm statement?
>
> Why can't you simply do:
>
> 	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> 		asm volatile(__ASM_SERIALIZE ::: "memory");
> 		return;
> 	}
>
> on function entry instead of making it more unreadable for no particular
> reason?

In my first submission I had implemented it as you describe. The only
difference was that I used boot_cpu_has() instead of static_cpu_has(),
as the latter has a comment stating that: "Use static_cpu_has() only in
fast paths (...) boot_cpu_has() is already fast enough for the majority
of cases...". sync_core_before_usermode() already handles what I think
are the critical paths.

Thanks and BR,
Ricardo
On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@zytor.com wrote:
> > Because why use an alternative to jump over one instruction?
> >
> > I personally would prefer to have the IRET put out of line
>
> Can't yet - SERIALIZE CPUs are a minority at the moment.
>
> > and have the call/jmp replaced by SERIALIZE inline.
>
> Well, we could do:
>
> 	alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
>
> and avoid all kinds of jumping. Alternatives get padded so there
> would be a couple of NOPs following when SERIALIZE gets patched in
> but it shouldn't be a problem. I guess one needs to look at what gcc
> generates...

But the IRET-TO-SELF code has instructions which modify the stack. This
would violate stack invariance in alternatives as enforced in commit
7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
gives warnings as follows:

  arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
  alternative modifies stack

Perhaps in this specific case it does not matter as the changes in the
stack will be undone by IRET. However, using alternative_io would require
adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
IMHO, it wouldn't look good.

So maybe the best approach is to implement as you suggested using
static_cpu_has()?

Thanks and BR,
Ricardo
On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> > On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@zytor.com wrote:
> > > Because why use an alternative to jump over one instruction?
> > >
> > > I personally would prefer to have the IRET put out of line
> >
> > Can't yet - SERIALIZE CPUs are a minority at the moment.
> >
> > > and have the call/jmp replaced by SERIALIZE inline.
> >
> > Well, we could do:
> >
> > 	alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
> >
> > and avoid all kinds of jumping. Alternatives get padded so there
> > would be a couple of NOPs following when SERIALIZE gets patched in
> > but it shouldn't be a problem. I guess one needs to look at what gcc
> > generates...
>
> But the IRET-TO-SELF code has instruction which modify the stack. This
> would violate stack invariance in alternatives as enforced in commit
> 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
> gives warnings as follows:
>
> arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
> alternative modifies stack
>
> Perhaps in this specific case it does not matter as the changes in the
> stack will be undone by IRET. However, using alternative_io would require
> adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
> IMHO, it wouldn't look good.
>
> So maybe the best approach is to implement as you suggested using
> static_cpu_has()?

I agree. Let's keep it simple.

Honestly, I think the right solution is to have iret_to_self() in
actual asm and invoke it from C as needed. IRET is *slow* -- trying to
optimize it at all is silly. The big optimization was switching from
CPUID to IRET, since CPUID is slooooooooooooooooooow in virtual
environments, whereas IRET is merely sloooooooow and SERIALIZE is
probably just sloooow. (I once benchmarked it. IIRC the winning version
on my laptop is MOV to CR2 on bare metal and IRET in a Xen PV guest.
This optimization was not obviously worthwhile.)

>
> Thanks and BR,
> Ricardo
On Wed, Aug 05, 2020 at 11:28:31AM -0700, Andy Lutomirski wrote:
> On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
> >
> > On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> > > On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@zytor.com wrote:
> > > > Because why use an alternative to jump over one instruction?
> > > >
> > > > I personally would prefer to have the IRET put out of line
> > >
> > > Can't yet - SERIALIZE CPUs are a minority at the moment.
> > >
> > > > and have the call/jmp replaced by SERIALIZE inline.
> > >
> > > Well, we could do:
> > >
> > > 	alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
> > >
> > > and avoid all kinds of jumping. Alternatives get padded so there
> > > would be a couple of NOPs following when SERIALIZE gets patched in
> > > but it shouldn't be a problem. I guess one needs to look at what gcc
> > > generates...
> >
> > But the IRET-TO-SELF code has instruction which modify the stack. This
> > would violate stack invariance in alternatives as enforced in commit
> > 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
> > gives warnings as follows:
> >
> > arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
> > alternative modifies stack
> >
> > Perhaps in this specific case it does not matter as the changes in the
> > stack will be undone by IRET. However, using alternative_io would require
> > adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
> > IMHO, it wouldn't look good.
> >
> > So maybe the best approach is to implement as you suggested using
> > static_cpu_has()?
>
> I agree. Let's keep it simple.
>
> Honestly, I think the right solution is to have iret_to_self() in
> actual asm and invoke it from C as needed.

Do you mean anything different from what we have already [1]? If I
understand your comment correctly, we have exactly that: an
iret_to_self() asm implementation invoked from C.

Thanks and BR,
Ricardo

[1]. https://lore.kernel.org/lkml/20200727043132.15082-4-ricardo.neri-calderon@linux.intel.com/
> On Aug 5, 2020, at 12:11 PM, Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Aug 05, 2020 at 11:28:31AM -0700, Andy Lutomirski wrote:
>>> On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
>>> <ricardo.neri-calderon@linux.intel.com> wrote:
>>>
>>> On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
>>>> On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@zytor.com wrote:
>>>>> Because why use an alternative to jump over one instruction?
>>>>>
>>>>> I personally would prefer to have the IRET put out of line
>>>>
>>>> Can't yet - SERIALIZE CPUs are a minority at the moment.
>>>>
>>>>> and have the call/jmp replaced by SERIALIZE inline.
>>>>
>>>> Well, we could do:
>>>>
>>>> 	alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
>>>>
>>>> and avoid all kinds of jumping. Alternatives get padded so there
>>>> would be a couple of NOPs following when SERIALIZE gets patched in
>>>> but it shouldn't be a problem. I guess one needs to look at what gcc
>>>> generates...
>>>
>>> But the IRET-TO-SELF code has instruction which modify the stack. This
>>> would violate stack invariance in alternatives as enforced in commit
>>> 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
>>> gives warnings as follows:
>>>
>>> arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
>>> alternative modifies stack
>>>
>>> Perhaps in this specific case it does not matter as the changes in the
>>> stack will be undone by IRET. However, using alternative_io would require
>>> adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
>>> IMHO, it wouldn't look good.
>>>
>>> So maybe the best approach is to implement as you suggested using
>>> static_cpu_has()?
>>
>> I agree. Let's keep it simple.
>>
>> Honestly, I think the right solution is to have iret_to_self() in
>> actual asm and invoke it from C as needed.
>
> Do you mean anything different from what we have already [1]? If I
> understand your comment correctly, we have exactly that: an
> iret_to_self() asm implementation invoked from C.

I meant asm as in a .S file. But the code we have is fine for this
purpose, at least for now.

>
> Thanks and BR,
> Ricardo
>
> [1]. https://lore.kernel.org/lkml/20200727043132.15082-4-ricardo.neri-calderon@linux.intel.com/
> I meant asm as in a .S file. But the code we have is fine for this purpose, at least for now.
There seem to be some drivers that call sync_core:
drivers/misc/sgi-gru/grufault.c: sync_core();
drivers/misc/sgi-gru/grufault.c: sync_core(); /* make sure we are have current data */
drivers/misc/sgi-gru/gruhandles.c: sync_core();
drivers/misc/sgi-gru/gruhandles.c: sync_core();
drivers/misc/sgi-gru/grukservices.c: sync_core();
So if you go this path some day be sure to EXPORT the iret_to_self() function.
-Tony
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 59a3e13204c3..25cd67801dda 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -10,6 +10,8 @@
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
 
+/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
+#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
 /*
  * Volatile isn't enough to prevent the compiler from reordering the
  * read/write functions for the control registers and messing everything up.
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index fdb5b356e59b..201ea3d9a6bd 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -5,15 +5,19 @@
 #include <linux/preempt.h>
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
+#include <asm/special_insns.h>
 
 #ifdef CONFIG_X86_32
 static inline void iret_to_self(void)
 {
 	asm volatile (
+		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
 		"pushfl\n\t"
 		"pushl %%cs\n\t"
 		"pushl $1f\n\t"
 		"iret\n\t"
+		"2:\n\t"
+		__ASM_SERIALIZE "\n"
 		"1:"
 		: ASM_CALL_CONSTRAINT : : "memory");
 }
@@ -23,6 +27,7 @@ static inline void iret_to_self(void)
 	unsigned int tmp;
 
 	asm volatile (
+		ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
 		"mov %%ss, %0\n\t"
 		"pushq %q0\n\t"
 		"pushq %%rsp\n\t"
@@ -32,6 +37,8 @@ static inline void iret_to_self(void)
 		"pushq %q0\n\t"
 		"pushq $1f\n\t"
 		"iretq\n\t"
+		"2:\n\t"
+		__ASM_SERIALIZE "\n"
 		"1:"
 		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
 }
@@ -54,7 +61,8 @@ static inline void iret_to_self(void)
 static inline void sync_core(void)
 {
 	/*
-	 * There are quite a few ways to do this. IRET-to-self is nice
+	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
+	 * there are quite a few ways to do this. IRET-to-self is nice
 	 * because it works on every CPU, at any CPL (so it's compatible
 	 * with paravirtualization), and it never exits to a hypervisor.
	 * The only down sides are that it's a bit slow (it seems to be
The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.

Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
invariance in alternatives. The iret-to-self does not comply with such
invariance. Thus, it cannot be used inside alternative code. Instead, use
an alternative that jumps to SERIALIZE when available.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cathy Zhang <cathy.zhang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kyung Min Park <kyung.min.park@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
This is a v2 from my initial submission [1]. The first three patches of
the series have been merged in Linus' tree. Hence, I am submitting only
this patch for review.

[1]. https://lkml.org/lkml/2020/7/27/8

Changes since v1:
 * Support SERIALIZE using alternative runtime patching.
   (Peter Zijlstra, H. Peter Anvin)
 * Added a note to specify which version of binutils supports SERIALIZE.
   (Peter Zijlstra)
 * Verified that (::: "memory") is used. (H. Peter Anvin)
---
 arch/x86/include/asm/special_insns.h |  2 ++
 arch/x86/include/asm/sync_core.h     | 10 +++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)