mbox series

[v2,0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

Message ID 20231012201743.292149-1-ubizjak@gmail.com (mailing list archive)
Headers show
Series Introduce %rip-relative addressing to PER_CPU_VAR macro | expand

Message

Uros Bizjak Oct. 12, 2023, 8:12 p.m. UTC
The following patch series introduces %rip-relative addressing to
PER_CPU_VAR macro. Instruction with %rip-relative address operand is
one byte shorter than its absolute address counterpart and is also
compatible with position independent executable (-fpie) build.

The first three patches are cleanups that fix various inconsistencies
throughout the assembly code.

The last patch introduces (%rip) suffix and uses it for x86_64 target,
resulting in a small code size decrease:

   text    data     bss     dec     hex filename
25510677        4386685  808388 30705750        1d48856 vmlinux-new.o
25510629        4386685  808388 30705702        1d48826 vmlinux-old.o

Patch series is against current mainline and can be applied independently
of ongoing percpu work.

v2: Introduce PER_CPU_ARG macro to conditionally enable
    segment registers in cmpxchg{8,16}b_emu.S for CONFIG_SMP.

Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>

Uros Bizjak (4):
  x86/percpu: Introduce PER_CPU_ARG and use it in cmpxchg{8,16}b_emu.S
  x86/percpu: Correct PER_CPU_VAR usage to include symbol and its addend
  x86/percpu, xen: Correct PER_CPU_VAR usage to include symbol and its
    addend
  x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR macro

 arch/x86/entry/calling.h      |  2 +-
 arch/x86/entry/entry_32.S     |  2 +-
 arch/x86/entry/entry_64.S     |  2 +-
 arch/x86/include/asm/percpu.h | 10 +++++++---
 arch/x86/kernel/head_64.S     |  2 +-
 arch/x86/lib/cmpxchg16b_emu.S | 12 ++++++------
 arch/x86/lib/cmpxchg8b_emu.S  | 24 ++++++++++++++++++------
 arch/x86/xen/xen-asm.S        | 10 +++++-----
 8 files changed, 40 insertions(+), 24 deletions(-)

Comments

Dave Hansen Oct. 12, 2023, 8:53 p.m. UTC | #1
On 10/12/23 13:12, Uros Bizjak wrote:
> The last patch introduces (%rip) suffix and uses it for x86_64 target,
> resulting in a small code size decrease: text data bss dec hex filename
> 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
> 808388 30705702 1d48826 vmlinux-old.o

I feel like I'm missing some of the motivation here.

50 bytes is great and all, but it isn't without the cost of changing
some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
confusion.

Are there some other side benefits?  What else does this enable?
Uros Bizjak Oct. 12, 2023, 8:59 p.m. UTC | #2
On Thu, Oct 12, 2023 at 10:53 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/12/23 13:12, Uros Bizjak wrote:
> > The last patch introduces (%rip) suffix and uses it for x86_64 target,
> > resulting in a small code size decrease: text data bss dec hex filename
> > 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
> > 808388 30705702 1d48826 vmlinux-old.o
>
> I feel like I'm missing some of the motivation here.
>
> 50 bytes is great and all, but it isn't without the cost of changing
> some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
> confusion.
>
> Are there some other side benefits?  What else does this enable?

These changes are necessary to build the kernel as Position
Independent Executable (PIE) on x86_64 [1]. And since I was working in
percpu area I thought that it was worth implementing them.

[1] https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/

Uros.
H. Peter Anvin Oct. 12, 2023, 9:08 p.m. UTC | #3
On 10/12/23 13:59, Uros Bizjak wrote:
> On Thu, Oct 12, 2023 at 10:53 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 10/12/23 13:12, Uros Bizjak wrote:
>>> The last patch introduces (%rip) suffix and uses it for x86_64 target,
>>> resulting in a small code size decrease: text data bss dec hex filename
>>> 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
>>> 808388 30705702 1d48826 vmlinux-old.o
>>
>> I feel like I'm missing some of the motivation here.
>>
>> 50 bytes is great and all, but it isn't without the cost of changing
>> some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
>> confusion.
>>
>> Are there some other side benefits?  What else does this enable?
> 
> These changes are necessary to build the kernel as Position
> Independent Executable (PIE) on x86_64 [1]. And since I was working in
> percpu area I thought that it was worth implementing them.
> 
> [1] https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
> 

Are you PIC-adjusting the percpu variables as well?

	-hpa
Uros Bizjak Oct. 12, 2023, 9:17 p.m. UTC | #4
On Thu, Oct 12, 2023 at 11:08 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 10/12/23 13:59, Uros Bizjak wrote:
> > On Thu, Oct 12, 2023 at 10:53 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >>
> >> On 10/12/23 13:12, Uros Bizjak wrote:
> >>> The last patch introduces (%rip) suffix and uses it for x86_64 target,
> >>> resulting in a small code size decrease: text data bss dec hex filename
> >>> 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
> >>> 808388 30705702 1d48826 vmlinux-old.o
> >>
> >> I feel like I'm missing some of the motivation here.
> >>
> >> 50 bytes is great and all, but it isn't without the cost of changing
> >> some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
> >> confusion.
> >>
> >> Are there some other side benefits?  What else does this enable?
> >
> > These changes are necessary to build the kernel as Position
> > Independent Executable (PIE) on x86_64 [1]. And since I was working in
> > percpu area I thought that it was worth implementing them.
> >
> > [1] https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/
> >
>
> Are you PIC-adjusting the percpu variables as well?

After this patch (and after fixing percpu_stable_op to use "a" operand
modifier on GCC), the only *one* remaining absolute reference to
percpu variable remain in xen-head.S, where:

    movq    $INIT_PER_CPU_VAR(fixed_percpu_data),%rax

should be changed to use leaq.

All others should then be (%rip)-relative.

Uros.
H. Peter Anvin Oct. 12, 2023, 9:21 p.m. UTC | #5
On 10/12/23 14:17, Uros Bizjak wrote:
>>
>> Are you PIC-adjusting the percpu variables as well?
> 
> After this patch (and after fixing percpu_stable_op to use "a" operand
> modifier on GCC), the only *one* remaining absolute reference to
> percpu variable remain in xen-head.S, where:
> 
>      movq    $INIT_PER_CPU_VAR(fixed_percpu_data),%rax
> 
> should be changed to use leaq.
> 
> All others should then be (%rip)-relative.
> 

I mean, the symbols themselves are relative, not absolute?

	-hpa
Uros Bizjak Oct. 12, 2023, 10:44 p.m. UTC | #6
On Thu, Oct 12, 2023 at 11:22 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 10/12/23 14:17, Uros Bizjak wrote:
> >>
> >> Are you PIC-adjusting the percpu variables as well?
> >
> > After this patch (and after fixing percpu_stable_op to use "a" operand
> > modifier on GCC), the only *one* remaining absolute reference to
> > percpu variable remain in xen-head.S, where:
> >
> >      movq    $INIT_PER_CPU_VAR(fixed_percpu_data),%rax
> >
> > should be changed to use leaq.
> >
> > All others should then be (%rip)-relative.
> >
>
> I mean, the symbols themselves are relative, not absolute?

The reference to the symbol is relative to the segment register, but
absolute to the location of the instruction. If the executable changes
location, then instruction moves around  and reference is not valid
anymore. (%rip)-relative reference compensate for changed location of
the instruction.

Uros.