Message ID | 20170718223333.110371-17-thgarnie@google.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier <thgarnie@google.com> wrote:
> Percpu uses a clever design where the .percpu ELF section has a virtual
> address of zero and the relocation code avoids relocating specific
> symbols. It makes the code simple and easily adaptable with or without
> SMP support.
>
> This design is incompatible with PIE because generated code always tries to
> access the zero virtual address relative to the default mapping address.
> It becomes impossible when KASLR is configured to go below -2G. This
> patch solves this problem by removing the zero mapping and adapting the GS
> base to be relative to the expected address. These changes are done only
> when PIE is enabled. The original implementation is kept as-is
> by default.

The reason the per-cpu section is zero-based on x86-64 is to
work around GCC hardcoding the stack protector canary at %gs:40.  So
this patch is incompatible with CONFIG_STACK_PROTECTOR.

--
Brian Gerst

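As an illustration of the %gs:40 constraint described above, here is a minimal C sketch. It assumes a simplified stand-in for the kernel's irq_stack_union of that era; the names `demo_irq_stack_union`, `DEMO_IRQ_STACK_SIZE` and `demo_canary_offset` are hypothetical and only serve the example. It shows why the canary sits at offset 40 only if the union is the first object of a zero-based per-cpu area.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size, for illustration only. */
#define DEMO_IRQ_STACK_SIZE 16384

/*
 * Simplified stand-in for irq_stack_union: 40 bytes of padding followed
 * by the canary, so that the canary lands at offset 40 of the union.
 */
union demo_irq_stack_union {
	char irq_stack[DEMO_IRQ_STACK_SIZE];
	struct {
		char gs_base[40];           /* padding up to offset 40 */
		unsigned long stack_canary; /* the word GCC reads at %gs:40 */
	};
};

/* Compile-time check of the layout the compiler-generated code relies on. */
static_assert(offsetof(union demo_irq_stack_union, stack_canary) == 40,
	      "canary must sit at offset 40");

/*
 * Offset of the canary from the start of the per-cpu area, assuming the
 * union is the first per-cpu object and the section starts at address 0.
 */
static size_t demo_canary_offset(void)
{
	return offsetof(union demo_irq_stack_union, stack_canary);
}
```

If the per-cpu section is no longer zero-based, the symbol address of the union is non-zero, and a GS base that makes symbol-relative accesses work no longer makes the fixed %gs:40 access hit the canary.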
On Tue, Jul 18, 2017 at 8:08 PM, Brian Gerst <brgerst@gmail.com> wrote:
> On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> Percpu uses a clever design where the .percpu ELF section has a virtual
>> address of zero and the relocation code avoids relocating specific
>> symbols. It makes the code simple and easily adaptable with or without
>> SMP support.
>>
>> This design is incompatible with PIE because generated code always tries to
>> access the zero virtual address relative to the default mapping address.
>> It becomes impossible when KASLR is configured to go below -2G. This
>> patch solves this problem by removing the zero mapping and adapting the GS
>> base to be relative to the expected address. These changes are done only
>> when PIE is enabled. The original implementation is kept as-is
>> by default.
>
> The reason the per-cpu section is zero-based on x86-64 is to
> work around GCC hardcoding the stack protector canary at %gs:40.  So
> this patch is incompatible with CONFIG_STACK_PROTECTOR.

Ok, that makes sense. I don't want this feature to not work with
CONFIG_CC_STACKPROTECTOR*. One way to fix that would be adding a GDT
entry for gs so gs:40 points to the correct memory address and
gs:[rip+XX] works correctly through the MSR.

Given the separate discussion on mcmodel, I am going first to check if
we can move from PIE to PIC with a mcmodel=small or medium that would
remove the percpu change requirement. I tried before without success
but I understand better percpu and other components so maybe I can
make it work.

Thanks a lot for the feedback.

>
> --
> Brian Gerst

On 07/19/17 11:26, Thomas Garnier wrote:
> On Tue, Jul 18, 2017 at 8:08 PM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier <thgarnie@google.com> wrote:
>>> Percpu uses a clever design where the .percpu ELF section has a virtual
>>> address of zero and the relocation code avoids relocating specific
>>> symbols. It makes the code simple and easily adaptable with or without
>>> SMP support.
>>>
>>> This design is incompatible with PIE because generated code always tries to
>>> access the zero virtual address relative to the default mapping address.
>>> It becomes impossible when KASLR is configured to go below -2G. This
>>> patch solves this problem by removing the zero mapping and adapting the GS
>>> base to be relative to the expected address. These changes are done only
>>> when PIE is enabled. The original implementation is kept as-is
>>> by default.
>>
>> The reason the per-cpu section is zero-based on x86-64 is to
>> work around GCC hardcoding the stack protector canary at %gs:40.  So
>> this patch is incompatible with CONFIG_STACK_PROTECTOR.
>
> Ok, that makes sense. I don't want this feature to not work with
> CONFIG_CC_STACKPROTECTOR*. One way to fix that would be adding a GDT
> entry for gs so gs:40 points to the correct memory address and
> gs:[rip+XX] works correctly through the MSR.

What are you talking about?  A GDT entry and the MSR do the same thing,
except that a GDT entry is limited to an offset of 0-0xffffffff (which
doesn't work for us, obviously.)

> Given the separate
> discussion on mcmodel, I am going first to check if we can move from
> PIE to PIC with a mcmodel=small or medium that would remove the percpu
> change requirement. I tried before without success but I understand
> better percpu and other components so maybe I can make it work.

>> This is silly.  The right thing for PIE is to be explicitly absolute,
>> without (%rip).  The use of (%rip) memory references for percpu is just
>> an optimization.
>
> I agree that it is odd but that's how the compiler generates code. I
> will re-explore PIC options with mcmodel=small or medium, as mentioned
> on other threads.

Why should the way the compiler generates code affect the way we do things
in assembly?

That being said, the compiler now has support for generating this kind
of code explicitly via the __seg_gs pointer modifier.  That should let
us drop the __percpu_prefix and just use variables directly.  I suspect
we want to declare percpu variables as "volatile __seg_gs" to account
for the possibility of CPU switches.

Older compilers won't be able to work with this, of course, but I think
that it is acceptable for those older compilers to not be able to
support PIE.

	-hpa

On 07/19/17 16:33, H. Peter Anvin wrote:
>>
>> I agree that it is odd but that's how the compiler generates code. I
>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>> on other threads.
>
> Why should the way the compiler generates code affect the way we do things
> in assembly?
>
> That being said, the compiler now has support for generating this kind
> of code explicitly via the __seg_gs pointer modifier.  That should let
> us drop the __percpu_prefix and just use variables directly.  I suspect
> we want to declare percpu variables as "volatile __seg_gs" to account
> for the possibility of CPU switches.
>
> Older compilers won't be able to work with this, of course, but I think
> that it is acceptable for those older compilers to not be able to
> support PIE.
>

Grump.  It turns out that the compiler doesn't do the right thing for
symbols marked with the __seg_[fg]s markers.  __thread does the right
thing, but __thread a) has %fs: hard-coded, still, and b) I believe can
still cache %seg:0 arbitrarily long.

	-hpa

On 07/19/17 19:21, H. Peter Anvin wrote:
> On 07/19/17 16:33, H. Peter Anvin wrote:
>>>
>>> I agree that it is odd but that's how the compiler generates code. I
>>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>>> on other threads.
>>
>> Why should the way the compiler generates code affect the way we do things
>> in assembly?
>>
>> That being said, the compiler now has support for generating this kind
>> of code explicitly via the __seg_gs pointer modifier.  That should let
>> us drop the __percpu_prefix and just use variables directly.  I suspect
>> we want to declare percpu variables as "volatile __seg_gs" to account
>> for the possibility of CPU switches.
>>
>> Older compilers won't be able to work with this, of course, but I think
>> that it is acceptable for those older compilers to not be able to
>> support PIE.
>>
>
> Grump.  It turns out that the compiler doesn't do the right thing for
> symbols marked with the __seg_[fg]s markers.  __thread does the right
> thing, but __thread a) has %fs: hard-coded, still, and b) I believe can
> still cache %seg:0 arbitrarily long.

I filed this bug report for gcc:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

It might still be possible to work around this by playing really ugly
games with __thread, but I haven't yet figured out how best to do that.

	-hpa

On Wed, Jul 19, 2017 at 4:33 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/19/17 11:26, Thomas Garnier wrote:
>> On Tue, Jul 18, 2017 at 8:08 PM, Brian Gerst <brgerst@gmail.com> wrote:
>>> On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier <thgarnie@google.com> wrote:
>>>> Percpu uses a clever design where the .percpu ELF section has a virtual
>>>> address of zero and the relocation code avoids relocating specific
>>>> symbols. It makes the code simple and easily adaptable with or without
>>>> SMP support.
>>>>
>>>> This design is incompatible with PIE because generated code always tries to
>>>> access the zero virtual address relative to the default mapping address.
>>>> It becomes impossible when KASLR is configured to go below -2G. This
>>>> patch solves this problem by removing the zero mapping and adapting the GS
>>>> base to be relative to the expected address. These changes are done only
>>>> when PIE is enabled. The original implementation is kept as-is
>>>> by default.
>>>
>>> The reason the per-cpu section is zero-based on x86-64 is to
>>> work around GCC hardcoding the stack protector canary at %gs:40.  So
>>> this patch is incompatible with CONFIG_STACK_PROTECTOR.
>>
>> Ok, that makes sense. I don't want this feature to not work with
>> CONFIG_CC_STACKPROTECTOR*. One way to fix that would be adding a GDT
>> entry for gs so gs:40 points to the correct memory address and
>> gs:[rip+XX] works correctly through the MSR.
>
> What are you talking about?  A GDT entry and the MSR do the same thing,
> except that a GDT entry is limited to an offset of 0-0xffffffff (which
> doesn't work for us, obviously.)
>

A GDT entry would allow gs:0x40 to be valid while all gs:[rip+XX]
addresses use the MSR.

I didn't test it but that was used on the RFG mitigation [1]. The fs
segment register was used for both thread storage and shadow stack.

[1] http://xlab.tencent.com/en/2016/11/02/return-flow-guard/

>> Given the separate
>> discussion on mcmodel, I am going first to check if we can move from
>> PIE to PIC with a mcmodel=small or medium that would remove the percpu
>> change requirement. I tried before without success but I understand
>> better percpu and other components so maybe I can make it work.
>
>>> This is silly.  The right thing for PIE is to be explicitly absolute,
>>> without (%rip).  The use of (%rip) memory references for percpu is just
>>> an optimization.
>>
>> I agree that it is odd but that's how the compiler generates code. I
>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>> on other threads.
>
> Why should the way the compiler generates code affect the way we do things
> in assembly?
>
> That being said, the compiler now has support for generating this kind
> of code explicitly via the __seg_gs pointer modifier.  That should let
> us drop the __percpu_prefix and just use variables directly.  I suspect
> we want to declare percpu variables as "volatile __seg_gs" to account
> for the possibility of CPU switches.
>
> Older compilers won't be able to work with this, of course, but I think
> that it is acceptable for those older compilers to not be able to
> support PIE.
>
>         -hpa
>

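The 0-0xffffffff limit on a GDT-provided base that was raised earlier in the thread follows from the layout of a legacy 8-byte segment descriptor, which only has room for 32 base bits. The sketch below packs and unpacks just the base field (limit and access bits are left as zero); the function names are hypothetical and the example address is an arbitrary kernel-style value chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store a base address in the three base fields of an 8-byte descriptor. */
static uint64_t demo_desc_set_base(uint64_t base)
{
	uint64_t d = 0;

	d |= (base & 0xffffULL) << 16;       /* base 15:0  -> bits 16..31 */
	d |= ((base >> 16) & 0xffULL) << 32; /* base 23:16 -> bits 32..39 */
	d |= ((base >> 24) & 0xffULL) << 56; /* base 31:24 -> bits 56..63 */
	return d;                            /* bits 63:32 of base are lost */
}

/* Recover the (at most 32-bit) base stored in a descriptor. */
static uint64_t demo_desc_get_base(uint64_t d)
{
	return ((d >> 16) & 0xffffULL) |
	       (((d >> 32) & 0xffULL) << 16) |
	       (((d >> 56) & 0xffULL) << 24);
}

/* A 64-bit kernel address does not survive the round trip. */
static void demo_desc_selfcheck(void)
{
	uint64_t kaddr = 0xffff880012345678ULL; /* illustrative value */

	assert(demo_desc_get_base(demo_desc_set_base(kaddr)) == 0x12345678ULL);
}
```

A full 64-bit GS base therefore has to be written through the MSR, which is what the patch does with `wrmsrl(MSR_GS_BASE, ...)`.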
On Thu, Jul 20, 2017 at 7:26 AM, Thomas Garnier <thgarnie@google.com> wrote:
> On Wed, Jul 19, 2017 at 4:33 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 07/19/17 11:26, Thomas Garnier wrote:
>>> On Tue, Jul 18, 2017 at 8:08 PM, Brian Gerst <brgerst@gmail.com> wrote:
>>>> On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier <thgarnie@google.com> wrote:
>>>>> Percpu uses a clever design where the .percpu ELF section has a virtual
>>>>> address of zero and the relocation code avoids relocating specific
>>>>> symbols. It makes the code simple and easily adaptable with or without
>>>>> SMP support.
>>>>>
>>>>> This design is incompatible with PIE because generated code always tries to
>>>>> access the zero virtual address relative to the default mapping address.
>>>>> It becomes impossible when KASLR is configured to go below -2G. This
>>>>> patch solves this problem by removing the zero mapping and adapting the GS
>>>>> base to be relative to the expected address. These changes are done only
>>>>> when PIE is enabled. The original implementation is kept as-is
>>>>> by default.
>>>>
>>>> The reason the per-cpu section is zero-based on x86-64 is to
>>>> work around GCC hardcoding the stack protector canary at %gs:40.  So
>>>> this patch is incompatible with CONFIG_STACK_PROTECTOR.
>>>
>>> Ok, that makes sense. I don't want this feature to not work with
>>> CONFIG_CC_STACKPROTECTOR*. One way to fix that would be adding a GDT
>>> entry for gs so gs:40 points to the correct memory address and
>>> gs:[rip+XX] works correctly through the MSR.
>>
>> What are you talking about?  A GDT entry and the MSR do the same thing,
>> except that a GDT entry is limited to an offset of 0-0xffffffff (which
>> doesn't work for us, obviously.)
>>
>
> A GDT entry would allow gs:0x40 to be valid while all gs:[rip+XX]
> addresses use the MSR.
>
> I didn't test it but that was used on the RFG mitigation [1]. The fs
> segment register was used for both thread storage and shadow stack.
>
> [1] http://xlab.tencent.com/en/2016/11/02/return-flow-guard/
>

Small update on that.

I noticed that we not only have the problem of gs:0x40 not being
accessible: the compiler will also default to the fs register if
mcmodel=kernel is not set.

On the next patch set, I am going to add support for
-mstack-protector-guard=global so a global variable can be used
instead of the segment register. Similar approach to ARM/ARM64.

Following this patch, I will work with gcc and llvm to add
-mstack-protector-reg=<segment register> support similar to PowerPC.
This way we can have gs used even without mcmodel=kernel. Once that's
an option, I can set up the GDT as described in the previous email
(similar to RFG).

Let me know what you think about this approach.

>>> Given the separate
>>> discussion on mcmodel, I am going first to check if we can move from
>>> PIE to PIC with a mcmodel=small or medium that would remove the percpu
>>> change requirement. I tried before without success but I understand
>>> better percpu and other components so maybe I can make it work.
>>
>>>> This is silly.  The right thing for PIE is to be explicitly absolute,
>>>> without (%rip).  The use of (%rip) memory references for percpu is just
>>>> an optimization.
>>>
>>> I agree that it is odd but that's how the compiler generates code. I
>>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>>> on other threads.
>>
>> Why should the way the compiler generates code affect the way we do things
>> in assembly?
>>
>> That being said, the compiler now has support for generating this kind
>> of code explicitly via the __seg_gs pointer modifier.  That should let
>> us drop the __percpu_prefix and just use variables directly.  I suspect
>> we want to declare percpu variables as "volatile __seg_gs" to account
>> for the possibility of CPU switches.
>>
>> Older compilers won't be able to work with this, of course, but I think
>> that it is acceptable for those older compilers to not be able to
>> support PIE.
>>
>>         -hpa
>>
>
>
>
> --
> Thomas

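For readers unfamiliar with the "global" guard mode mentioned above, the following is a hand-written sketch of what a stack protector does when the reference canary lives in a single global variable instead of behind a segment register. It is not compiler output; `demo_stack_guard`, `demo_guarded_copy` and the frame layout are hypothetical names and choices made for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical global guard; one value shared by every CPU and task. */
static unsigned long demo_stack_guard = 0x5eedc0deUL;

/*
 * Mimic a protected function: place a copy of the guard after a buffer,
 * run the body, then compare the copy with the global on the way out.
 * Returns 1 if the canary is intact and 0 if it was overwritten.
 */
static int demo_guarded_copy(const char *src, size_t n)
{
	struct {
		char buf[16];
		unsigned long canary;
	} frame;

	frame.canary = demo_stack_guard;     /* "prologue": store the guard */

	if (n > sizeof(frame))               /* stay inside the demo frame */
		n = sizeof(frame);
	memcpy(&frame, src, n);              /* n > 16 spills into canary */

	return frame.canary == demo_stack_guard; /* "epilogue": verify */
}

/* Usage: a short copy keeps the canary, an oversized one clobbers it. */
static void demo_guard_selfcheck(void)
{
	char big[24];

	memset(big, 'A', sizeof(big));
	assert(demo_guarded_copy("hi", 3) == 1);
	assert(demo_guarded_copy(big, sizeof(big)) == 0);
}
```

Because the guard is one global value rather than a per-task value swapped on context switch, leaking it once is enough to defeat the check everywhere.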
On Wed, Aug 2, 2017 at 9:42 AM, Thomas Garnier <thgarnie@google.com> wrote:
> I noticed that we not only have the problem of gs:0x40 not being
> accessible: the compiler will also default to the fs register if
> mcmodel=kernel is not set.
>
> On the next patch set, I am going to add support for
> -mstack-protector-guard=global so a global variable can be used
> instead of the segment register. Similar approach to ARM/ARM64.

While this is probably understood, I have to point out that this would
be a major regression for the stack protection on x86.

> Following this patch, I will work with gcc and llvm to add
> -mstack-protector-reg=<segment register> support similar to PowerPC.
> This way we can have gs used even without mcmodel=kernel. Once that's
> an option, I can set up the GDT as described in the previous email
> (similar to RFG).

It would be much nicer if we could teach gcc about the percpu area
instead. This would let us solve the global stack protector problem on
the other architectures:
http://www.openwall.com/lists/kernel-hardening/2017/06/27/6

-Kees

On Wed, Aug 2, 2017 at 9:56 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Aug 2, 2017 at 9:42 AM, Thomas Garnier <thgarnie@google.com> wrote:
>> I noticed that we not only have the problem of gs:0x40 not being
>> accessible: the compiler will also default to the fs register if
>> mcmodel=kernel is not set.
>>
>> On the next patch set, I am going to add support for
>> -mstack-protector-guard=global so a global variable can be used
>> instead of the segment register. Similar approach to ARM/ARM64.
>
> While this is probably understood, I have to point out that this would
> be a major regression for the stack protection on x86.

I agree, the optimal solution will be using updated gcc/clang.

>
>> Following this patch, I will work with gcc and llvm to add
>> -mstack-protector-reg=<segment register> support similar to PowerPC.
>> This way we can have gs used even without mcmodel=kernel. Once that's
>> an option, I can set up the GDT as described in the previous email
>> (similar to RFG).
>
> It would be much nicer if we could teach gcc about the percpu area
> instead. This would let us solve the global stack protector problem on
> the other architectures:
> http://www.openwall.com/lists/kernel-hardening/2017/06/27/6

Yes, while I am looking at gcc I will take a look at other
architectures to see if I can help there too.

>
> -Kees
>
> --
> Kees Cook
> Pixel Security

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 691c4755269b..be198c0a2a8c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -388,7 +388,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 	/* restore callee-saved registers */
@@ -739,7 +739,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9fa03604b2b3..862eb771f0e5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,9 +4,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
 #define __percpu_mov_op		movq
+#define __percpu_rel		(%rip)
 #else
 #define __percpu_seg		fs
 #define __percpu_mov_op		movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -27,10 +29,14 @@
 #define PER_CPU(var, reg)						\
 	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
 	lea var(reg), reg
-#define PER_CPU_VAR(var)	%__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)	%__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)	%__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)	__percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)	var
+#define PER_CPU_VAR(var)	(var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)	var
 #endif	/* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -208,27 +214,34 @@ do {									\
 	pfo_ret__;					\
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)			\
 ({							\
 	typeof(var) pfo_ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
+		asm(op "b "__percpu_stable_arg ",%0"	\
 		    : "=q" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
+		asm(op "w "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
+		asm(op "l "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
+		asm(op "q "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b95cd94ca97b..31300767ec0f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -480,7 +480,9 @@ void load_percpu_segment(int cpu)
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+	wrmsrl(MSR_GS_BASE,
+	       (unsigned long)per_cpu(irq_stack_union.gs_base, cpu) -
+	       (unsigned long)__per_cpu_start);
 #endif
 	load_stack_canary_segment();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 7e4f7a83a15a..4d0a7e68bfe8 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -256,7 +256,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
+#ifdef CONFIG_X86_PIE
+	.quad	0
+#else
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+#endif
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 10edd1e69a68..ce1c58a29def 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -25,7 +25,7 @@
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_X86_PIE)
 #define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
 #else
 #define BOOT_PERCPU_OFFSET 0
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index c8a3b61be0aa..77f1b0622539 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -183,9 +183,14 @@ SECTIONS
 	/*
 	 * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
 	 * output PHDR, so the next output section - .init.text - should
-	 * start another segment - init.
+	 * start another segment - init. For Position Independent Code, the
+	 * per-cpu section cannot be zero-based because everything is relative.
 	 */
+#ifdef CONFIG_X86_PIE
+	PERCPU_SECTION(INTERNODE_CACHE_BYTES)
+#else
 	PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+#endif
 	ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
 	       "per-CPU data too large - increase CONFIG_PHYSICAL_START")
 #endif
@@ -361,7 +366,11 @@ SECTIONS
  * Per-cpu symbols which need to be offset from __per_cpu_load
  * for the boot processor.
  */
+#ifdef CONFIG_X86_PIE
+#define INIT_PER_CPU(x) init_per_cpu__##x = x
+#else
 #define INIT_PER_CPU(x) init_per_cpu__##x = x + __per_cpu_load
+#endif
 INIT_PER_CPU(gdt_page);
 INIT_PER_CPU(irq_stack_union);
 
@@ -371,7 +380,7 @@ INIT_PER_CPU(irq_stack_union);
 . = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_X86_PIE)
 . = ASSERT((irq_stack_union == 0),
            "irq_stack_union is not at start of per-cpu area");
 #endif
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 9b330242e740..254950604ae4 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -33,13 +33,13 @@ ENTRY(this_cpu_cmpxchg16b_emu)
 	pushfq
 	cli
 
-	cmpq PER_CPU_VAR((%rsi)), %rax
+	cmpq PER_CPU_VAR_ABS((%rsi)), %rax
 	jne .Lnot_same
-	cmpq PER_CPU_VAR(8(%rsi)), %rdx
+	cmpq PER_CPU_VAR_ABS(8(%rsi)), %rdx
 	jne .Lnot_same
 
-	movq %rbx, PER_CPU_VAR((%rsi))
-	movq %rcx, PER_CPU_VAR(8(%rsi))
+	movq %rbx, PER_CPU_VAR_ABS((%rsi))
+	movq %rcx, PER_CPU_VAR_ABS(8(%rsi))
 
 	popfq
 	mov $1, %al
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index eff224df813f..40410969fd3c 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -26,7 +26,7 @@
 ENTRY(xen_irq_enable_direct)
 	FRAME_BEGIN
 	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
 	/*
 	 * Preempt here doesn't matter because that will deal with any
@@ -35,7 +35,7 @@ ENTRY(xen_irq_enable_direct)
 	 */
 
 	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jz 1f
 
 2:	call check_events
@@ -52,7 +52,7 @@ ENDPATCH(xen_irq_enable_direct)
  * non-zero.
  */
 ENTRY(xen_irq_disable_direct)
-	movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 ENDPATCH(xen_irq_disable_direct)
 	ret
 	ENDPROC(xen_irq_disable_direct)
@@ -68,7 +68,7 @@ ENDPATCH(xen_irq_disable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 ENTRY(xen_save_fl_direct)
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	setz %ah
 	addb %ah, %ah
 ENDPATCH(xen_save_fl_direct)
@@ -91,7 +91,7 @@ ENTRY(xen_restore_fl_direct)
 #else
 	testb $X86_EFLAGS_IF>>8, %ah
 #endif
-	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	setz PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	/*
 	 * Preempt here doesn't matter because that will deal with any
 	 * pending interrupts.  The pending check may end up being run
@@ -99,7 +99,7 @@ ENTRY(xen_restore_fl_direct)
 	 */
 
 	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jnz 1f
 2:	call check_events
 1:
diff --git a/init/Kconfig b/init/Kconfig
index 8514b25db21c..4fb5d6fc2c4f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1201,7 +1201,7 @@ config KALLSYMS_ALL
 config KALLSYMS_ABSOLUTE_PERCPU
 	bool
 	depends on KALLSYMS
-	default X86_64 && SMP
+	default X86_64 && SMP && !X86_PIE
 
 config KALLSYMS_BASE_RELATIVE
 	bool

Percpu uses a clever design where the .percpu ELF section has a virtual
address of zero and the relocation code avoids relocating specific
symbols. It makes the code simple and easily adaptable with or without
SMP support.

This design is incompatible with PIE because generated code always tries to
access the zero virtual address relative to the default mapping address.
It becomes impossible when KASLR is configured to go below -2G. This
patch solves this problem by removing the zero mapping and adapting the GS
base to be relative to the expected address. These changes are done only
when PIE is enabled. The original implementation is kept as-is
by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE given
percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S      |  4 ++--
 arch/x86/include/asm/percpu.h  | 25 +++++++++++++++++++------
 arch/x86/kernel/cpu/common.c   |  4 +++-
 arch/x86/kernel/head_64.S      |  4 ++++
 arch/x86/kernel/setup_percpu.c |  2 +-
 arch/x86/kernel/vmlinux.lds.S  | 13 +++++++++++--
 arch/x86/lib/cmpxchg16b_emu.S  |  8 ++++----
 arch/x86/xen/xen-asm.S         | 12 ++++++------
 init/Kconfig                   |  2 +-
 9 files changed, 51 insertions(+), 23 deletions(-)
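
The "GS base relative to the expected address" adjustment described in the commit message can be checked with plain integer arithmetic. The sketch below is only a model: the function names and the example addresses are hypothetical, and it assumes that a per-cpu access resolves to GS base plus the symbol's link-time address.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-based layout: symbol values are offsets, GS base is the CPU area. */
static uint64_t demo_gs_base_zero_based(uint64_t cpu_area)
{
	return cpu_area;
}

/*
 * PIE layout: symbols carry real addresses starting at per_cpu_start, so
 * the GS base is biased by that start address (modulo 2^64).
 */
static uint64_t demo_gs_base_pie(uint64_t cpu_area, uint64_t per_cpu_start)
{
	return cpu_area - per_cpu_start;
}

/* Effective address of a %gs:sym access. */
static uint64_t demo_gs_resolve(uint64_t gs_base, uint64_t sym)
{
	return gs_base + sym;
}

/* Both layouts reach the same byte of the CPU's private area. */
static void demo_gs_selfcheck(void)
{
	uint64_t area  = 0xffff88003fc00000ULL; /* illustrative CPU area */
	uint64_t start = 0xffffffff82a00000ULL; /* illustrative section start */

	assert(demo_gs_resolve(demo_gs_base_zero_based(area), 0x40) ==
	       demo_gs_resolve(demo_gs_base_pie(area, start), start + 0x40));
}
```

The model also shows the problem raised in the review: with the biased base, a fixed access such as %gs:40 resolves to `area - start + 40`, not to the canary inside the CPU area.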