Message ID | 20200806192531.25136-1-ricardo.neri-calderon@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] x86/cpu: Use SERIALIZE in sync_core() when available | expand |
On 8/6/20 12:25 PM, Ricardo Neri wrote: > static inline void sync_core(void) > { > /* > - * There are quite a few ways to do this. IRET-to-self is nice > + * Hardware can do this for us if SERIALIZE is available. Otherwise, > + * there are quite a few ways to do this. IRET-to-self is nice This seems to imply that things other than SERIALIZE aren't the hardware doing this. All of these methods are equally architecturally serializing *and* provided by the hardware. I also don't quite get the point of separating the comments from the code. Shouldn't this be: /* * The SERIALIZE instruction is the most straightforward way to * do this but it not universally available. */ if (static_cpu_has(X86_FEATURE_SERIALIZE)) { asm volatile(__ASM_SERIALIZE ::: "memory"); return; } /* * For all other processors, IRET-to-self is nice ... */ iret_to_self();
On Thu, Aug 06, 2020 at 12:57:26PM -0700, Dave Hansen wrote: > On 8/6/20 12:25 PM, Ricardo Neri wrote: > > static inline void sync_core(void) > > { > > /* > > - * There are quite a few ways to do this. IRET-to-self is nice > > + * Hardware can do this for us if SERIALIZE is available. Otherwise, > > + * there are quite a few ways to do this. IRET-to-self is nice > > This seems to imply that things other than SERIALIZE aren't the hardware > doing this. All of these methods are equally architecturally > serializing *and* provided by the hardware. Indeed, I can see how the wording may imply that. > > I also don't quite get the point of separating the comments from the > code. Shouldn't this be: > > /* > * The SERIALIZE instruction is the most straightforward way to > * do this but it not universally available. > */ I regard the comment as describing all the considered options to for architectural serialization. What about if I add SERIALIZE as another option? I propose to discuss it towards the end of the comment: /* * There are quite a few ways to do this. IRET-to-self is nice * because it works on every CPU, at any CPL (so it's compatible * with paravirtualization), and it never exits to a hypervisor. * The only down sides are that it's a bit slow (it seems to be * a bit more than 2x slower than the fastest options) and that * it unmasks NMIs. The "push %cs" is needed because, in * paravirtual environments, __KERNEL_CS may not be a valid CS * value when we do IRET directly. * * In case NMI unmasking or performance ever becomes a problem, * the next best option appears to be MOV-to-CR2 and an * unconditional jump. That sequence also works on all CPUs, * but it will fault at CPL3 (i.e. Xen PV). * * CPUID is the conventional way, but it's nasty: it doesn't * exist on some 486-like CPUs, and it usually exits to a * hypervisor. * * The SERIALIZE instruction is the most straightforward way to * do this as it does not clobber registers or exit to a * hypervisor. However, it is not universally available. * * Like all of Linux's memory ordering operations, this is a * compiler barrier as well. */ What do you think? > if (static_cpu_has(X86_FEATURE_SERIALIZE)) { > asm volatile(__ASM_SERIALIZE ::: "memory"); > return; > } > > /* > * For all other processors, IRET-to-self is nice ... > */ > iret_to_self();
On 8/6/20 4:04 PM, Ricardo Neri wrote: > * CPUID is the conventional way, but it's nasty: it doesn't > * exist on some 486-like CPUs, and it usually exits to a > * hypervisor. > * > * The SERIALIZE instruction is the most straightforward way to > * do this as it does not clobber registers or exit to a > * hypervisor. However, it is not universally available. > * > * Like all of Linux's memory ordering operations, this is a > * compiler barrier as well. > */ > > What do you think? I like what I suggested. :) SERIALIZE is best where available. Do it first, comment it by itself. Then, go into the long discussion of the other alternatives. They only make sense when SERIALIZE isn't there, and the logic for selection there is substantially more complicated.
On Thu, Aug 06, 2020 at 04:08:47PM -0700, Dave Hansen wrote: > On 8/6/20 4:04 PM, Ricardo Neri wrote: > > * CPUID is the conventional way, but it's nasty: it doesn't > > * exist on some 486-like CPUs, and it usually exits to a > > * hypervisor. > > * > > * The SERIALIZE instruction is the most straightforward way to > > * do this as it does not clobber registers or exit to a > > * hypervisor. However, it is not universally available. > > * > > * Like all of Linux's memory ordering operations, this is a > > * compiler barrier as well. > > */ > > > > What do you think? > > I like what I suggested. :) > > SERIALIZE is best where available. Do it first, comment it by itself. > > Then, go into the long discussion of the other alternatives. They only > make sense when SERIALIZE isn't there, and the logic for selection there > is substantially more complicated. Sure Dave, I think this layout makes sense. I will rework the comments. Thanks and BR, Ricardo
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 59a3e13204c3..25cd67801dda 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -10,6 +10,8 @@ #include <linux/irqflags.h> #include <linux/jump_label.h> +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */ +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8" /* * Volatile isn't enough to prevent the compiler from reordering the * read/write functions for the control registers and messing everything up. diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h index fdb5b356e59b..b74580413a0b 100644 --- a/arch/x86/include/asm/sync_core.h +++ b/arch/x86/include/asm/sync_core.h @@ -5,6 +5,7 @@ #include <linux/preempt.h> #include <asm/processor.h> #include <asm/cpufeature.h> +#include <asm/special_insns.h> #ifdef CONFIG_X86_32 static inline void iret_to_self(void) @@ -54,7 +55,8 @@ static inline void iret_to_self(void) static inline void sync_core(void) { /* - * There are quite a few ways to do this. IRET-to-self is nice + * Hardware can do this for us if SERIALIZE is available. Otherwise, + * there are quite a few ways to do this. IRET-to-self is nice * because it works on every CPU, at any CPL (so it's compatible * with paravirtualization), and it never exits to a hypervisor. * The only down sides are that it's a bit slow (it seems to be @@ -75,6 +77,11 @@ static inline void sync_core(void) * Like all of Linux's memory ordering operations, this is a * compiler barrier as well. */ + if (static_cpu_has(X86_FEATURE_SERIALIZE)) { + asm volatile(__ASM_SERIALIZE ::: "memory"); + return; + } + iret_to_self(); }