mbox series

[RFC,-next,v2,0/4] arm64/ftrace: support dynamic trampoline

Message ID 20220316100132.244849-1-bobo.shaobowang@huawei.com (mailing list archive)
Headers show
Series arm64/ftrace: support dynamic trampoline | expand

Message

Wangshaobo (bobo) March 16, 2022, 10:01 a.m. UTC
This implements dynamic trampoline in ARM64, as reference said, we
complete whole design of supporting long jump in dynamic trampoline:

   .text section:
     funcA:         |    funcA():        funcB():|
      `->  +-----+  |    |   ...         mov x9  |
           | ... |  |    |   adrp   <-   bl  <>  |
           | nop |  |    |   mov
           | nop |  |    |   br   x16 ---+
     funcB | nop |  |                    | ftrace_(regs_)caller_tramp:
      `->  +-----+  |                    `--> +---------------------+
           | nop |  |                         | ...                 |
           | nop |  |       ftrace callsite   +---------------------+
           | ... |  |                `----->  | PLT entry:          |
           | nop |  |                         |       adrp          |
           | nop |  |                         |       add           |
    funcC: | nop |  | ftrace graph callsite   |       br   x16      |
      `->  +-----+  |                `----->  +---------------------+
           | nop |  |                         | ...                 |
           | nop |  |                         +---------------------+

But there is still a tricky problem that is how to adjust tracing ip,
waiting to be solved:

For ARM64, somecases there may be extra instructions inserted into the
head of tracable functions(but not all) by compiler, for instance BTI[1].

This dump vmlinux with CONFIG_BTI=y:

(1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop
    ffffffc0080100e0:       d53cd042        mrs     x2, tpidr_el2
            ...
    ffffffc0080100f0:       d503201f        nop     //__mcount_loc tells the rec->ip
    ffffffc0080100f4:       d503201f        nop
    ffffffc0080100f8:       d503201f        nop

    ffffffc0080100fc <gic_handle_irq>:
    ffffffc0080100fc:       d503245f        bti     c
    ffffffc008010100:       d503201f        nop
    ffffffc008010104:       d503201f        nop     //we adjust origin rec->ip+5 to here
    ffffffc008010108:       d503233f        paciasp
(2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop
    ffff8000080137d4:       d503201f        nop
    ffff8000080137d8:       d503201f        nop
    ffff8000080137dc:       d503201f        nop
    
    ffff8000080137e0 <name_to_dev_t.part.0>:
    ffff8000080137e0:       d503201f        nop
    ffff8000080137e4:       d503201f        nop
    ffff8000080137e8:       d503233f        paciasp

So at this time we have no idea to identify rec->ip for each tracable function.

we are looking forward to follow-up discussions.

References:
[1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti
[2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/

Cheng Jian (4):
  arm64: introduce aarch64_insn_gen_load_literal
  arm64/ftrace: introduce ftrace dynamic trampoline entrances
  arm64/ftrace: support dynamically allocated trampolines
  arm64/ftrace: implement long jump for dynamic trampolines

 arch/arm64/Makefile              |   2 +-
 arch/arm64/include/asm/ftrace.h  |  10 +-
 arch/arm64/include/asm/insn.h    |   6 +
 arch/arm64/include/asm/module.h  |   9 +
 arch/arm64/kernel/entry-ftrace.S |  88 ++++++--
 arch/arm64/kernel/ftrace.c       | 366 ++++++++++++++++++++++++++++---
 arch/arm64/kernel/module-plts.c  |  50 +++++
 arch/arm64/lib/insn.c            |  49 +++++
 8 files changed, 532 insertions(+), 48 deletions(-)

Comments

Steven Rostedt March 16, 2022, 2:29 p.m. UTC | #1
On Wed, 16 Mar 2022 18:01:28 +0800
Wang ShaoBo <bobo.shaobowang@huawei.com> wrote:

> This implements dynamic trampoline in ARM64, as reference said, we
> complete whole design of supporting long jump in dynamic trampoline:
> 
>    .text section:
>      funcA:         |    funcA():        funcB():|
>       `->  +-----+  |    |   ...         mov x9  |
>            | ... |  |    |   adrp   <-   bl  <>  |
>            | nop |  |    |   mov
>            | nop |  |    |   br   x16 ---+
>      funcB | nop |  |                    | ftrace_(regs_)caller_tramp:
>       `->  +-----+  |                    `--> +---------------------+
>            | nop |  |                         | ...                 |
>            | nop |  |       ftrace callsite   +---------------------+
>            | ... |  |                `----->  | PLT entry:          |
>            | nop |  |                         |       adrp          |
>            | nop |  |                         |       add           |
>     funcC: | nop |  | ftrace graph callsite   |       br   x16      |
>       `->  +-----+  |                `----->  +---------------------+
>            | nop |  |                         | ...                 |
>            | nop |  |                         +---------------------+
> 
> But there is still a tricky problem that is how to adjust tracing ip,
> waiting to be solved:
> 
> For ARM64, somecases there may be extra instructions inserted into the
> head of tracable functions(but not all) by compiler, for instance BTI[1].
> 
> This dump vmlinux with CONFIG_BTI=y:
> 
> (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop
>     ffffffc0080100e0:       d53cd042        mrs     x2, tpidr_el2
>             ...
>     ffffffc0080100f0:       d503201f        nop     //__mcount_loc tells the rec->ip
>     ffffffc0080100f4:       d503201f        nop
>     ffffffc0080100f8:       d503201f        nop
> 
>     ffffffc0080100fc <gic_handle_irq>:
>     ffffffc0080100fc:       d503245f        bti     c
>     ffffffc008010100:       d503201f        nop
>     ffffffc008010104:       d503201f        nop     //we adjust origin rec->ip+5 to here
>     ffffffc008010108:       d503233f        paciasp
> (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop
>     ffff8000080137d4:       d503201f        nop
>     ffff8000080137d8:       d503201f        nop
>     ffff8000080137dc:       d503201f        nop
>     
>     ffff8000080137e0 <name_to_dev_t.part.0>:
>     ffff8000080137e0:       d503201f        nop
>     ffff8000080137e4:       d503201f        nop
>     ffff8000080137e8:       d503233f        paciasp
> 
> So at this time we have no idea to identify rec->ip for each tracable function.

This looks like the same issue that Peter Zijlstra is handling for IBT on
x86, which I think can be useful for you too.

  https://lore.kernel.org/all/20220308153011.021123062@infradead.org/

Specifically this patch:

  https://lore.kernel.org/all/20220308154318.227581603@infradead.org/

Which modifies the ftrace_location() to return the rec->ip if you pass in
the ip of the function and kallsyms returns that the ip passed in has an
offset of zero.

-- Steve

> 
> we are looking forward to follow-up discussions.
> 
> References:
> [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti
> [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/
> 
> Cheng Jian (4):
>   arm64: introduce aarch64_insn_gen_load_literal
>   arm64/ftrace: introduce ftrace dynamic trampoline entrances
>   arm64/ftrace: support dynamically allocated trampolines
>   arm64/ftrace: implement long jump for dynamic trampolines
> 
>  arch/arm64/Makefile              |   2 +-
>  arch/arm64/include/asm/ftrace.h  |  10 +-
>  arch/arm64/include/asm/insn.h    |   6 +
>  arch/arm64/include/asm/module.h  |   9 +
>  arch/arm64/kernel/entry-ftrace.S |  88 ++++++--
>  arch/arm64/kernel/ftrace.c       | 366 ++++++++++++++++++++++++++++---
>  arch/arm64/kernel/module-plts.c  |  50 +++++
>  arch/arm64/lib/insn.c            |  49 +++++
>  8 files changed, 532 insertions(+), 48 deletions(-)
>
Steven Rostedt April 20, 2022, 6:11 p.m. UTC | #2
Is this going anywhere?

-- Steve


On Wed, 16 Mar 2022 18:01:28 +0800
Wang ShaoBo <bobo.shaobowang@huawei.com> wrote:

> This implements dynamic trampoline in ARM64, as reference said, we
> complete whole design of supporting long jump in dynamic trampoline:
> 
>    .text section:
>      funcA:         |    funcA():        funcB():|
>       `->  +-----+  |    |   ...         mov x9  |
>            | ... |  |    |   adrp   <-   bl  <>  |
>            | nop |  |    |   mov
>            | nop |  |    |   br   x16 ---+
>      funcB | nop |  |                    | ftrace_(regs_)caller_tramp:
>       `->  +-----+  |                    `--> +---------------------+
>            | nop |  |                         | ...                 |
>            | nop |  |       ftrace callsite   +---------------------+
>            | ... |  |                `----->  | PLT entry:          |
>            | nop |  |                         |       adrp          |
>            | nop |  |                         |       add           |
>     funcC: | nop |  | ftrace graph callsite   |       br   x16      |
>       `->  +-----+  |                `----->  +---------------------+
>            | nop |  |                         | ...                 |
>            | nop |  |                         +---------------------+
> 
> But there is still a tricky problem that is how to adjust tracing ip,
> waiting to be solved:
> 
> For ARM64, somecases there may be extra instructions inserted into the
> head of tracable functions(but not all) by compiler, for instance BTI[1].
> 
> This dump vmlinux with CONFIG_BTI=y:
> 
> (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop
>     ffffffc0080100e0:       d53cd042        mrs     x2, tpidr_el2
>             ...
>     ffffffc0080100f0:       d503201f        nop     //__mcount_loc tells the rec->ip
>     ffffffc0080100f4:       d503201f        nop
>     ffffffc0080100f8:       d503201f        nop
> 
>     ffffffc0080100fc <gic_handle_irq>:
>     ffffffc0080100fc:       d503245f        bti     c
>     ffffffc008010100:       d503201f        nop
>     ffffffc008010104:       d503201f        nop     //we adjust origin rec->ip+5 to here
>     ffffffc008010108:       d503233f        paciasp
> (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop
>     ffff8000080137d4:       d503201f        nop
>     ffff8000080137d8:       d503201f        nop
>     ffff8000080137dc:       d503201f        nop
>     
>     ffff8000080137e0 <name_to_dev_t.part.0>:
>     ffff8000080137e0:       d503201f        nop
>     ffff8000080137e4:       d503201f        nop
>     ffff8000080137e8:       d503233f        paciasp
> 
> So at this time we have no idea to identify rec->ip for each tracable function.
> 
> we are looking forward to follow-up discussions.
> 
> References:
> [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti
> [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/
> 
> Cheng Jian (4):
>   arm64: introduce aarch64_insn_gen_load_literal
>   arm64/ftrace: introduce ftrace dynamic trampoline entrances
>   arm64/ftrace: support dynamically allocated trampolines
>   arm64/ftrace: implement long jump for dynamic trampolines
> 
>  arch/arm64/Makefile              |   2 +-
>  arch/arm64/include/asm/ftrace.h  |  10 +-
>  arch/arm64/include/asm/insn.h    |   6 +
>  arch/arm64/include/asm/module.h  |   9 +
>  arch/arm64/kernel/entry-ftrace.S |  88 ++++++--
>  arch/arm64/kernel/ftrace.c       | 366 ++++++++++++++++++++++++++++---
>  arch/arm64/kernel/module-plts.c  |  50 +++++
>  arch/arm64/lib/insn.c            |  49 +++++
>  8 files changed, 532 insertions(+), 48 deletions(-)
>
Wangshaobo (bobo) April 21, 2022, 1:13 a.m. UTC | #3
在 2022/4/21 2:11, Steven Rostedt 写道:
> Is this going anywhere?
>
> -- Steve

Not yet, Steve, ftrace_location() looks has no help to find a right 
rec->ip in our case,

ftrace_location() can find a right rec->ip when input ip is in the range 
between

sym+0 and sym+$end, but our question is how to  identify rec->ip from 
__mcount_loc,

this changed the patchable entry before bti to after in gcc:

    [1] https://reviews.llvm.org/D73680

gcc tells the place of first nop of the 5 NOPs when using 
-fpatchable-function-entry=5,3,

but not tells the first nop after bti, so we don't know how to adjust 
our rec->ip for ftrace.

>
>
> On Wed, 16 Mar 2022 18:01:28 +0800
> Wang ShaoBo <bobo.shaobowang@huawei.com> wrote:
>
>> This implements dynamic trampoline in ARM64, as reference said, we
>> complete whole design of supporting long jump in dynamic trampoline:
>>
>>     .text section:
>>       funcA:         |    funcA():        funcB():|
>>        `->  +-----+  |    |   ...         mov x9  |
>>             | ... |  |    |   adrp   <-   bl  <>  |
>>             | nop |  |    |   mov
>>             | nop |  |    |   br   x16 ---+
>>       funcB | nop |  |                    | ftrace_(regs_)caller_tramp:
>>        `->  +-----+  |                    `--> +---------------------+
>>             | nop |  |                         | ...                 |
>>             | nop |  |       ftrace callsite   +---------------------+
>>             | ... |  |                `----->  | PLT entry:          |
>>             | nop |  |                         |       adrp          |
>>             | nop |  |                         |       add           |
>>      funcC: | nop |  | ftrace graph callsite   |       br   x16      |
>>        `->  +-----+  |                `----->  +---------------------+
>>             | nop |  |                         | ...                 |
>>             | nop |  |                         +---------------------+
>>
>> But there is still a tricky problem that is how to adjust tracing ip,
>> waiting to be solved:
>>
>> For ARM64, somecases there may be extra instructions inserted into the
>> head of tracable functions(but not all) by compiler, for instance BTI[1].
>>
>> This dump vmlinux with CONFIG_BTI=y:
>>
>> (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop
>>      ffffffc0080100e0:       d53cd042        mrs     x2, tpidr_el2
>>              ...
>>      ffffffc0080100f0:       d503201f        nop     //__mcount_loc tells the rec->ip
>>      ffffffc0080100f4:       d503201f        nop
>>      ffffffc0080100f8:       d503201f        nop
>>
>>      ffffffc0080100fc <gic_handle_irq>:
>>      ffffffc0080100fc:       d503245f        bti     c
>>      ffffffc008010100:       d503201f        nop
>>      ffffffc008010104:       d503201f        nop     //we adjust origin rec->ip+5 to here
>>      ffffffc008010108:       d503233f        paciasp
>> (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop
>>      ffff8000080137d4:       d503201f        nop
>>      ffff8000080137d8:       d503201f        nop
>>      ffff8000080137dc:       d503201f        nop
>>      
>>      ffff8000080137e0 <name_to_dev_t.part.0>:
>>      ffff8000080137e0:       d503201f        nop
>>      ffff8000080137e4:       d503201f        nop
>>      ffff8000080137e8:       d503233f        paciasp
>>
>> So at this time we have no idea to identify rec->ip for each tracable function.
>>
>> we are looking forward to follow-up discussions.
>>
>> References:
>> [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti
>> [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/
>>
>> Cheng Jian (4):
>>    arm64: introduce aarch64_insn_gen_load_literal
>>    arm64/ftrace: introduce ftrace dynamic trampoline entrances
>>    arm64/ftrace: support dynamically allocated trampolines
>>    arm64/ftrace: implement long jump for dynamic trampolines
>>
>>   arch/arm64/Makefile              |   2 +-
>>   arch/arm64/include/asm/ftrace.h  |  10 +-
>>   arch/arm64/include/asm/insn.h    |   6 +
>>   arch/arm64/include/asm/module.h  |   9 +
>>   arch/arm64/kernel/entry-ftrace.S |  88 ++++++--
>>   arch/arm64/kernel/ftrace.c       | 366 ++++++++++++++++++++++++++++---
>>   arch/arm64/kernel/module-plts.c  |  50 +++++
>>   arch/arm64/lib/insn.c            |  49 +++++
>>   8 files changed, 532 insertions(+), 48 deletions(-)
>>
> .
Steven Rostedt April 21, 2022, 12:37 p.m. UTC | #4
On Thu, 21 Apr 2022 09:13:01 +0800
"Wangshaobo (bobo)" <bobo.shaobowang@huawei.com> wrote:

> Not yet, Steve, ftrace_location() looks has no help to find a right 
> rec->ip in our case,
> 
> ftrace_location() can find a right rec->ip when input ip is in the range 
> between
> 
> sym+0 and sym+$end, but our question is how to  identify rec->ip from 
> __mcount_loc,

Are you saying that the "ftrace location" is not between sym+0 and sym+$end?

> 
> this changed the patchable entry before bti to after in gcc:
> 
>     [1] https://reviews.llvm.org/D73680
> 
> gcc tells the place of first nop of the 5 NOPs when using 
> -fpatchable-function-entry=5,3,
> 
> but not tells the first nop after bti, so we don't know how to adjust 
> our rec->ip for ftrace.

OK, so I do not understand how the compiler is injecting bti with mcount
calls, so I'll just walk away for now ;-)

-- Steve
Mark Rutland April 21, 2022, 12:53 p.m. UTC | #5
On Wed, Mar 16, 2022 at 06:01:28PM +0800, Wang ShaoBo wrote:
> This implements dynamic trampoline in ARM64, as reference said, we
> complete whole design of supporting long jump in dynamic trampoline:
> 
>    .text section:
>      funcA:         |    funcA():        funcB():|
>       `->  +-----+  |    |   ...         mov x9  |
>            | ... |  |    |   adrp   <-   bl  <>  |
>            | nop |  |    |   mov
>            | nop |  |    |   br   x16 ---+
>      funcB | nop |  |                    | ftrace_(regs_)caller_tramp:
>       `->  +-----+  |                    `--> +---------------------+
>            | nop |  |                         | ...                 |
>            | nop |  |       ftrace callsite   +---------------------+
>            | ... |  |                `----->  | PLT entry:          |
>            | nop |  |                         |       adrp          |
>            | nop |  |                         |       add           |
>     funcC: | nop |  | ftrace graph callsite   |       br   x16      |
>       `->  +-----+  |                `----->  +---------------------+
>            | nop |  |                         | ...                 |
>            | nop |  |                         +---------------------+
> 
> But there is still a tricky problem that is how to adjust tracing ip,
> waiting to be solved:
> 
> For ARM64, somecases there may be extra instructions inserted into the
> head of tracable functions(but not all) by compiler, for instance BTI[1].
> 
> This dump vmlinux with CONFIG_BTI=y:
> 
> (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop
>     ffffffc0080100e0:       d53cd042        mrs     x2, tpidr_el2
>             ...
>     ffffffc0080100f0:       d503201f        nop     //__mcount_loc tells the rec->ip
>     ffffffc0080100f4:       d503201f        nop
>     ffffffc0080100f8:       d503201f        nop
> 
>     ffffffc0080100fc <gic_handle_irq>:
>     ffffffc0080100fc:       d503245f        bti     c
>     ffffffc008010100:       d503201f        nop
>     ffffffc008010104:       d503201f        nop     //we adjust origin rec->ip+5 to here
>     ffffffc008010108:       d503233f        paciasp
> (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop
>     ffff8000080137d4:       d503201f        nop
>     ffff8000080137d8:       d503201f        nop
>     ffff8000080137dc:       d503201f        nop
>     
>     ffff8000080137e0 <name_to_dev_t.part.0>:
>     ffff8000080137e0:       d503201f        nop
>     ffff8000080137e4:       d503201f        nop
>     ffff8000080137e8:       d503233f        paciasp
> 
> So at this time we have no idea to identify rec->ip for each tracable function.

When I had looked into this in the past, I had assumed we could figure
this out at ftrace_init_nop() time, and adjust rec->ip there.

However, placing code *before* the function entry point will also break
stacktracing and will require adjustment, and I'd been intending to
clean up the stacktrace code first, so I haven't looked at that in a
while.

I'll take a look at the rest of the series shortly.

Thanks,
Mark.

> 
> we are looking forward to follow-up discussions.
> 
> References:
> [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti
> [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/
> 
> Cheng Jian (4):
>   arm64: introduce aarch64_insn_gen_load_literal
>   arm64/ftrace: introduce ftrace dynamic trampoline entrances
>   arm64/ftrace: support dynamically allocated trampolines
>   arm64/ftrace: implement long jump for dynamic trampolines
> 
>  arch/arm64/Makefile              |   2 +-
>  arch/arm64/include/asm/ftrace.h  |  10 +-
>  arch/arm64/include/asm/insn.h    |   6 +
>  arch/arm64/include/asm/module.h  |   9 +
>  arch/arm64/kernel/entry-ftrace.S |  88 ++++++--
>  arch/arm64/kernel/ftrace.c       | 366 ++++++++++++++++++++++++++++---
>  arch/arm64/kernel/module-plts.c  |  50 +++++
>  arch/arm64/lib/insn.c            |  49 +++++
>  8 files changed, 532 insertions(+), 48 deletions(-)
> 
> -- 
> 2.25.1
>
Mark Rutland May 25, 2022, 12:45 p.m. UTC | #6
On Thu, Apr 21, 2022 at 08:37:58AM -0400, Steven Rostedt wrote:
> On Thu, 21 Apr 2022 09:13:01 +0800
> "Wangshaobo (bobo)" <bobo.shaobowang@huawei.com> wrote:
> 
> > Not yet, Steve, ftrace_location() looks has no help to find a right 
> > rec->ip in our case,
> > 
> > ftrace_location() can find a right rec->ip when input ip is in the range 
> > between
> > 
> > sym+0 and sym+$end, but our question is how to  identify rec->ip from 
> > __mcount_loc,
> 
> Are you saying that the "ftrace location" is not between sym+0 and sym+$end?

IIUC yes -- this series as-is moves the call to the trampoline *before* sym+0.

Among other things that completely wrecks backtracing, so I'd *really* like to
avoid that (hance my suggested alternative).

> > this changed the patchable entry before bti to after in gcc:
> > 
> >     [1] https://reviews.llvm.org/D73680
> > 
> > gcc tells the place of first nop of the 5 NOPs when using 
> > -fpatchable-function-entry=5,3,
> > 
> > but not tells the first nop after bti, so we don't know how to adjust 
> > our rec->ip for ftrace.
> 
> OK, so I do not understand how the compiler is injecting bti with mcount
> calls, so I'll just walk away for now ;-)

When using BTI, the compiler has to drop a BTI *at* the function entry point
(i.e. sym+0) for any function that can be called indirectly, but can omit this
when the function is only directly called (which is the case for most functions
created via insterprocedural specialization, or for a number of static
functions).

Today, when we pass:

	-fpatchable-function-entry=2

... the compiler places 2 NOPs *after* any BTI, and records the location of the
first NOP. So the two cases we get are:

	__func_without_bti:
		NOP		<--- recorded location
		NOP

	__func_with_bti:
		BTI
		NOP		<--- recorded location
		NOP

... which works just fine, since either sym+0 or sym+4 are reasonable
locations for the patch-site to live.

However, if we were to pass:

	-fpatchable-function-entry=5,3

... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI,
still recording the location of the first NOP. So in the two cases we get:

		NOP		<--- recorded location
		NOP
		NOP
	__func_without_bti:
		NOP
		NOP

		NOP		<--- recorded location
		NOP
		NOP
	__func_with_bti:
		BTI
		NOP
		NOP

... so where we want to patch one of the later nops to banch to a pre-function
NOP, we need to know whether or not the compiler generated a BTI. We can
discover discover that either by:

* Checking whether the recorded location is at sym+0 (no BTI) or sym+4 (BTI).

* Reading the instruction before the recorded location, and seeing if this is a
  BTI.

... and depending on how we handle thigns the two cases *might* need different
trampolines.

Thanks,
Mark.
Steven Rostedt May 25, 2022, 1:58 p.m. UTC | #7
On Wed, 25 May 2022 13:45:13 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

> ... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI,
> still recording the location of the first NOP. So in the two cases we get:
> 
> 		NOP		<--- recorded location
> 		NOP
> 		NOP
> 	__func_without_bti:
> 		NOP
> 		NOP
> 
> 		NOP		<--- recorded location
> 		NOP
> 		NOP
> 	__func_with_bti:
> 		BTI
> 		NOP
> 		NOP

Are you saying that the above "recorded location" is what we have in
mcount_loc section? If that's the case, we will need to modify it to point
to something that kallsyms will recognize (ie. sym+0 or greater). Because
that will cause set_ftrace_filter to fail as well.

-- Steve


> 
> ... so where we want to patch one of the later nops to banch to a pre-function
> NOP, we need to know whether or not the compiler generated a BTI. We can
> discover discover that either by:
> 
> * Checking whether the recorded location is at sym+0 (no BTI) or sym+4 (BTI).
> 
> * Reading the instruction before the recorded location, and seeing if this is a
>   BTI.
> 
> ... and depending on how we handle thigns the two cases *might* need different
> trampolines.
Mark Rutland May 25, 2022, 5:26 p.m. UTC | #8
On Wed, May 25, 2022 at 09:58:45AM -0400, Steven Rostedt wrote:
> On Wed, 25 May 2022 13:45:13 +0100
> Mark Rutland <mark.rutland@arm.com> wrote:
> 
> > ... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI,
> > still recording the location of the first NOP. So in the two cases we get:
> > 
> > 		NOP		<--- recorded location
> > 		NOP
> > 		NOP
> > 	__func_without_bti:
> > 		NOP
> > 		NOP
> > 
> > 		NOP		<--- recorded location
> > 		NOP
> > 		NOP
> > 	__func_with_bti:
> > 		BTI
> > 		NOP
> > 		NOP
> 
> Are you saying that the above "recorded location" is what we have in
> mcount_loc section?

Yes; I'm saying that with this series, the compiler would record that into the
mcount_loc section.

Note that's not necessarily what goes into rec->ip, which we can adjust at
initialization time to be within the function. We'd need to record the
presence/absence of the BTI somewhere (I guess in dyn_arch_ftrace).

> If that's the case, we will need to modify it to point to something that
> kallsyms will recognize (ie. sym+0 or greater). Because that will cause
> set_ftrace_filter to fail as well.

Yup, understood. Like I mentioned it also wrecks the unwinder and would make it
really hard to implement RELIABLE_STACKTRACE.

Just to be clear, I don't think we should follow this specific approach. I just
wrote the examples to clarify what was being proposed.

Thanks,
Mark.