Message ID | 20220316100132.244849-1-bobo.shaobowang@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | arm64/ftrace: support dynamic trampoline | expand |
On Wed, 16 Mar 2022 18:01:28 +0800 Wang ShaoBo <bobo.shaobowang@huawei.com> wrote: > This implements dynamic trampoline in ARM64, as reference said, we > complete whole design of supporting long jump in dynamic trampoline: > > .text section: > funcA: | funcA(): funcB():| > `-> +-----+ | | ... mov x9 | > | ... | | | adrp <- bl <> | > | nop | | | mov > | nop | | | br x16 ---+ > funcB | nop | | | ftrace_(regs_)caller_tramp: > `-> +-----+ | `--> +---------------------+ > | nop | | | ... | > | nop | | ftrace callsite +---------------------+ > | ... | | `-----> | PLT entry: | > | nop | | | adrp | > | nop | | | add | > funcC: | nop | | ftrace graph callsite | br x16 | > `-> +-----+ | `-----> +---------------------+ > | nop | | | ... | > | nop | | +---------------------+ > > But there is still a tricky problem that is how to adjust tracing ip, > waiting to be solved: > > For ARM64, somecases there may be extra instructions inserted into the > head of tracable functions(but not all) by compiler, for instance BTI[1]. > > This dump vmlinux with CONFIG_BTI=y: > > (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop > ffffffc0080100e0: d53cd042 mrs x2, tpidr_el2 > ... > ffffffc0080100f0: d503201f nop //__mcount_loc tells the rec->ip > ffffffc0080100f4: d503201f nop > ffffffc0080100f8: d503201f nop > > ffffffc0080100fc <gic_handle_irq>: > ffffffc0080100fc: d503245f bti c > ffffffc008010100: d503201f nop > ffffffc008010104: d503201f nop //we adjust origin rec->ip+5 to here > ffffffc008010108: d503233f paciasp > (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop > ffff8000080137d4: d503201f nop > ffff8000080137d8: d503201f nop > ffff8000080137dc: d503201f nop > > ffff8000080137e0 <name_to_dev_t.part.0>: > ffff8000080137e0: d503201f nop > ffff8000080137e4: d503201f nop > ffff8000080137e8: d503233f paciasp > > So at this time we have no idea to identify rec->ip for each tracable function. This looks like the same issue that Peter Zijlstra is handling for IBT on x86, which I think can be useful for you too. https://lore.kernel.org/all/20220308153011.021123062@infradead.org/ Specifically this patch: https://lore.kernel.org/all/20220308154318.227581603@infradead.org/ Which modifies the ftrace_location() to return the rec->ip if you pass in the ip of the function and kallsyms returns that the ip passed in has an offset of zero. -- Steve > > we are looking forward to follow-up discussions. > > References: > [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti > [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/ > > Cheng Jian (4): > arm64: introduce aarch64_insn_gen_load_literal > arm64/ftrace: introduce ftrace dynamic trampoline entrances > arm64/ftrace: support dynamically allocated trampolines > arm64/ftrace: implement long jump for dynamic trampolines > > arch/arm64/Makefile | 2 +- > arch/arm64/include/asm/ftrace.h | 10 +- > arch/arm64/include/asm/insn.h | 6 + > arch/arm64/include/asm/module.h | 9 + > arch/arm64/kernel/entry-ftrace.S | 88 ++++++-- > arch/arm64/kernel/ftrace.c | 366 ++++++++++++++++++++++++++++--- > arch/arm64/kernel/module-plts.c | 50 +++++ > arch/arm64/lib/insn.c | 49 +++++ > 8 files changed, 532 insertions(+), 48 deletions(-) >
Is this going anywhere? -- Steve On Wed, 16 Mar 2022 18:01:28 +0800 Wang ShaoBo <bobo.shaobowang@huawei.com> wrote: > This implements dynamic trampoline in ARM64, as reference said, we > complete whole design of supporting long jump in dynamic trampoline: > > .text section: > funcA: | funcA(): funcB():| > `-> +-----+ | | ... mov x9 | > | ... | | | adrp <- bl <> | > | nop | | | mov > | nop | | | br x16 ---+ > funcB | nop | | | ftrace_(regs_)caller_tramp: > `-> +-----+ | `--> +---------------------+ > | nop | | | ... | > | nop | | ftrace callsite +---------------------+ > | ... | | `-----> | PLT entry: | > | nop | | | adrp | > | nop | | | add | > funcC: | nop | | ftrace graph callsite | br x16 | > `-> +-----+ | `-----> +---------------------+ > | nop | | | ... | > | nop | | +---------------------+ > > But there is still a tricky problem that is how to adjust tracing ip, > waiting to be solved: > > For ARM64, somecases there may be extra instructions inserted into the > head of tracable functions(but not all) by compiler, for instance BTI[1]. > > This dump vmlinux with CONFIG_BTI=y: > > (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop > ffffffc0080100e0: d53cd042 mrs x2, tpidr_el2 > ... > ffffffc0080100f0: d503201f nop //__mcount_loc tells the rec->ip > ffffffc0080100f4: d503201f nop > ffffffc0080100f8: d503201f nop > > ffffffc0080100fc <gic_handle_irq>: > ffffffc0080100fc: d503245f bti c > ffffffc008010100: d503201f nop > ffffffc008010104: d503201f nop //we adjust origin rec->ip+5 to here > ffffffc008010108: d503233f paciasp > (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop > ffff8000080137d4: d503201f nop > ffff8000080137d8: d503201f nop > ffff8000080137dc: d503201f nop > > ffff8000080137e0 <name_to_dev_t.part.0>: > ffff8000080137e0: d503201f nop > ffff8000080137e4: d503201f nop > ffff8000080137e8: d503233f paciasp > > So at this time we have no idea to identify rec->ip for each tracable function. > > we are looking forward to follow-up discussions. > > References: > [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti > [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/ > > Cheng Jian (4): > arm64: introduce aarch64_insn_gen_load_literal > arm64/ftrace: introduce ftrace dynamic trampoline entrances > arm64/ftrace: support dynamically allocated trampolines > arm64/ftrace: implement long jump for dynamic trampolines > > arch/arm64/Makefile | 2 +- > arch/arm64/include/asm/ftrace.h | 10 +- > arch/arm64/include/asm/insn.h | 6 + > arch/arm64/include/asm/module.h | 9 + > arch/arm64/kernel/entry-ftrace.S | 88 ++++++-- > arch/arm64/kernel/ftrace.c | 366 ++++++++++++++++++++++++++++--- > arch/arm64/kernel/module-plts.c | 50 +++++ > arch/arm64/lib/insn.c | 49 +++++ > 8 files changed, 532 insertions(+), 48 deletions(-) >
在 2022/4/21 2:11, Steven Rostedt 写道: > Is this going anywhere? > > -- Steve Not yet, Steve, ftrace_location() looks has no help to find a right rec->ip in our case, ftrace_location() can find a right rec->ip when input ip is in the range between sym+0 and sym+$end, but our question is how to identify rec->ip from __mcount_loc, this changed the patchable entry before bti to after in gcc: [1] https://reviews.llvm.org/D73680 gcc tells the place of first nop of the 5 NOPs when using -fpatchable-function-entry=5,3, but not tells the first nop after bti, so we don't know how to adjust our rec->ip for ftrace. > > > On Wed, 16 Mar 2022 18:01:28 +0800 > Wang ShaoBo <bobo.shaobowang@huawei.com> wrote: > >> This implements dynamic trampoline in ARM64, as reference said, we >> complete whole design of supporting long jump in dynamic trampoline: >> >> .text section: >> funcA: | funcA(): funcB():| >> `-> +-----+ | | ... mov x9 | >> | ... | | | adrp <- bl <> | >> | nop | | | mov >> | nop | | | br x16 ---+ >> funcB | nop | | | ftrace_(regs_)caller_tramp: >> `-> +-----+ | `--> +---------------------+ >> | nop | | | ... | >> | nop | | ftrace callsite +---------------------+ >> | ... | | `-----> | PLT entry: | >> | nop | | | adrp | >> | nop | | | add | >> funcC: | nop | | ftrace graph callsite | br x16 | >> `-> +-----+ | `-----> +---------------------+ >> | nop | | | ... | >> | nop | | +---------------------+ >> >> But there is still a tricky problem that is how to adjust tracing ip, >> waiting to be solved: >> >> For ARM64, somecases there may be extra instructions inserted into the >> head of tracable functions(but not all) by compiler, for instance BTI[1]. >> >> This dump vmlinux with CONFIG_BTI=y: >> >> (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop >> ffffffc0080100e0: d53cd042 mrs x2, tpidr_el2 >> ... >> ffffffc0080100f0: d503201f nop //__mcount_loc tells the rec->ip >> ffffffc0080100f4: d503201f nop >> ffffffc0080100f8: d503201f nop >> >> ffffffc0080100fc <gic_handle_irq>: >> ffffffc0080100fc: d503245f bti c >> ffffffc008010100: d503201f nop >> ffffffc008010104: d503201f nop //we adjust origin rec->ip+5 to here >> ffffffc008010108: d503233f paciasp >> (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop >> ffff8000080137d4: d503201f nop >> ffff8000080137d8: d503201f nop >> ffff8000080137dc: d503201f nop >> >> ffff8000080137e0 <name_to_dev_t.part.0>: >> ffff8000080137e0: d503201f nop >> ffff8000080137e4: d503201f nop >> ffff8000080137e8: d503233f paciasp >> >> So at this time we have no idea to identify rec->ip for each tracable function. >> >> we are looking forward to follow-up discussions. >> >> References: >> [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti >> [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/ >> >> Cheng Jian (4): >> arm64: introduce aarch64_insn_gen_load_literal >> arm64/ftrace: introduce ftrace dynamic trampoline entrances >> arm64/ftrace: support dynamically allocated trampolines >> arm64/ftrace: implement long jump for dynamic trampolines >> >> arch/arm64/Makefile | 2 +- >> arch/arm64/include/asm/ftrace.h | 10 +- >> arch/arm64/include/asm/insn.h | 6 + >> arch/arm64/include/asm/module.h | 9 + >> arch/arm64/kernel/entry-ftrace.S | 88 ++++++-- >> arch/arm64/kernel/ftrace.c | 366 ++++++++++++++++++++++++++++--- >> arch/arm64/kernel/module-plts.c | 50 +++++ >> arch/arm64/lib/insn.c | 49 +++++ >> 8 files changed, 532 insertions(+), 48 deletions(-) >> > .
On Thu, 21 Apr 2022 09:13:01 +0800 "Wangshaobo (bobo)" <bobo.shaobowang@huawei.com> wrote: > Not yet, Steve, ftrace_location() looks has no help to find a right > rec->ip in our case, > > ftrace_location() can find a right rec->ip when input ip is in the range > between > > sym+0 and sym+$end, but our question is how to identify rec->ip from > __mcount_loc, Are you saying that the "ftrace location" is not between sym+0 and sym+$end? > > this changed the patchable entry before bti to after in gcc: > > [1] https://reviews.llvm.org/D73680 > > gcc tells the place of first nop of the 5 NOPs when using > -fpatchable-function-entry=5,3, > > but not tells the first nop after bti, so we don't know how to adjust > our rec->ip for ftrace. OK, so I do not understand how the compiler is injecting bti with mcount calls, so I'll just walk away for now ;-) -- Steve
On Wed, Mar 16, 2022 at 06:01:28PM +0800, Wang ShaoBo wrote: > This implements dynamic trampoline in ARM64, as reference said, we > complete whole design of supporting long jump in dynamic trampoline: > > .text section: > funcA: | funcA(): funcB():| > `-> +-----+ | | ... mov x9 | > | ... | | | adrp <- bl <> | > | nop | | | mov > | nop | | | br x16 ---+ > funcB | nop | | | ftrace_(regs_)caller_tramp: > `-> +-----+ | `--> +---------------------+ > | nop | | | ... | > | nop | | ftrace callsite +---------------------+ > | ... | | `-----> | PLT entry: | > | nop | | | adrp | > | nop | | | add | > funcC: | nop | | ftrace graph callsite | br x16 | > `-> +-----+ | `-----> +---------------------+ > | nop | | | ... | > | nop | | +---------------------+ > > But there is still a tricky problem that is how to adjust tracing ip, > waiting to be solved: > > For ARM64, somecases there may be extra instructions inserted into the > head of tracable functions(but not all) by compiler, for instance BTI[1]. > > This dump vmlinux with CONFIG_BTI=y: > > (1) function gic_handle_irq has bti in its head, so we adjust rec->ip+=5 to last nop > ffffffc0080100e0: d53cd042 mrs x2, tpidr_el2 > ... > ffffffc0080100f0: d503201f nop //__mcount_loc tells the rec->ip > ffffffc0080100f4: d503201f nop > ffffffc0080100f8: d503201f nop > > ffffffc0080100fc <gic_handle_irq>: > ffffffc0080100fc: d503245f bti c > ffffffc008010100: d503201f nop > ffffffc008010104: d503201f nop //we adjust origin rec->ip+5 to here > ffffffc008010108: d503233f paciasp > (2) name_to_dev_t.part.0 do not has bti in its head, so we should adjust rec->ip+=4 to last nop > ffff8000080137d4: d503201f nop > ffff8000080137d8: d503201f nop > ffff8000080137dc: d503201f nop > > ffff8000080137e0 <name_to_dev_t.part.0>: > ffff8000080137e0: d503201f nop > ffff8000080137e4: d503201f nop > ffff8000080137e8: d503233f paciasp > > So at this time we have no idea to identify rec->ip for each tracable function. When I had looked into this in the past, I had assumed we could figure this out at ftrace_init_nop() time, and adjust rec->ip there. However, placing code *before* the function entry point will also break stacktracing and will require adjustment, and I'd been intending to clean up the stacktrace code first, so I haven't looked at that in a while. I'll take a look at the rest of the series shortly. Thanks, Mark. > > we are looking forward to follow-up discussions. > > References: > [1] https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-general-instructions/bti > [2] https://lore.kernel.org/linux-arm-kernel/20200109142736.1122-1-cj.chengjian@huawei.com/ > > Cheng Jian (4): > arm64: introduce aarch64_insn_gen_load_literal > arm64/ftrace: introduce ftrace dynamic trampoline entrances > arm64/ftrace: support dynamically allocated trampolines > arm64/ftrace: implement long jump for dynamic trampolines > > arch/arm64/Makefile | 2 +- > arch/arm64/include/asm/ftrace.h | 10 +- > arch/arm64/include/asm/insn.h | 6 + > arch/arm64/include/asm/module.h | 9 + > arch/arm64/kernel/entry-ftrace.S | 88 ++++++-- > arch/arm64/kernel/ftrace.c | 366 ++++++++++++++++++++++++++++--- > arch/arm64/kernel/module-plts.c | 50 +++++ > arch/arm64/lib/insn.c | 49 +++++ > 8 files changed, 532 insertions(+), 48 deletions(-) > > -- > 2.25.1 >
On Thu, Apr 21, 2022 at 08:37:58AM -0400, Steven Rostedt wrote: > On Thu, 21 Apr 2022 09:13:01 +0800 > "Wangshaobo (bobo)" <bobo.shaobowang@huawei.com> wrote: > > > Not yet, Steve, ftrace_location() looks has no help to find a right > > rec->ip in our case, > > > > ftrace_location() can find a right rec->ip when input ip is in the range > > between > > > > sym+0 and sym+$end, but our question is how to identify rec->ip from > > __mcount_loc, > > Are you saying that the "ftrace location" is not between sym+0 and sym+$end? IIUC yes -- this series as-is moves the call to the trampoline *before* sym+0. Among other things that completely wrecks backtracing, so I'd *really* like to avoid that (hance my suggested alternative). > > this changed the patchable entry before bti to after in gcc: > > > > [1] https://reviews.llvm.org/D73680 > > > > gcc tells the place of first nop of the 5 NOPs when using > > -fpatchable-function-entry=5,3, > > > > but not tells the first nop after bti, so we don't know how to adjust > > our rec->ip for ftrace. > > OK, so I do not understand how the compiler is injecting bti with mcount > calls, so I'll just walk away for now ;-) When using BTI, the compiler has to drop a BTI *at* the function entry point (i.e. sym+0) for any function that can be called indirectly, but can omit this when the function is only directly called (which is the case for most functions created via insterprocedural specialization, or for a number of static functions). Today, when we pass: -fpatchable-function-entry=2 ... the compiler places 2 NOPs *after* any BTI, and records the location of the first NOP. So the two cases we get are: __func_without_bti: NOP <--- recorded location NOP __func_with_bti: BTI NOP <--- recorded location NOP ... which works just fine, since either sym+0 or sym+4 are reasonable locations for the patch-site to live. However, if we were to pass: -fpatchable-function-entry=5,3 ... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI, still recording the location of the first NOP. So in the two cases we get: NOP <--- recorded location NOP NOP __func_without_bti: NOP NOP NOP <--- recorded location NOP NOP __func_with_bti: BTI NOP NOP ... so where we want to patch one of the later nops to banch to a pre-function NOP, we need to know whether or not the compiler generated a BTI. We can discover discover that either by: * Checking whether the recorded location is at sym+0 (no BTI) or sym+4 (BTI). * Reading the instruction before the recorded location, and seeing if this is a BTI. ... and depending on how we handle thigns the two cases *might* need different trampolines. Thanks, Mark.
On Wed, 25 May 2022 13:45:13 +0100 Mark Rutland <mark.rutland@arm.com> wrote: > ... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI, > still recording the location of the first NOP. So in the two cases we get: > > NOP <--- recorded location > NOP > NOP > __func_without_bti: > NOP > NOP > > NOP <--- recorded location > NOP > NOP > __func_with_bti: > BTI > NOP > NOP Are you saying that the above "recorded location" is what we have in mcount_loc section? If that's the case, we will need to modify it to point to something that kallsyms will recognize (ie. sym+0 or greater). Because that will cause set_ftrace_filter to fail as well. -- Steve > > ... so where we want to patch one of the later nops to banch to a pre-function > NOP, we need to know whether or not the compiler generated a BTI. We can > discover discover that either by: > > * Checking whether the recorded location is at sym+0 (no BTI) or sym+4 (BTI). > > * Reading the instruction before the recorded location, and seeing if this is a > BTI. > > ... and depending on how we handle thigns the two cases *might* need different > trampolines.
On Wed, May 25, 2022 at 09:58:45AM -0400, Steven Rostedt wrote: > On Wed, 25 May 2022 13:45:13 +0100 > Mark Rutland <mark.rutland@arm.com> wrote: > > > ... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI, > > still recording the location of the first NOP. So in the two cases we get: > > > > NOP <--- recorded location > > NOP > > NOP > > __func_without_bti: > > NOP > > NOP > > > > NOP <--- recorded location > > NOP > > NOP > > __func_with_bti: > > BTI > > NOP > > NOP > > Are you saying that the above "recorded location" is what we have in > mcount_loc section? Yes; I'm saying that with this series, the compiler would record that into the mcount_loc section. Note that's not necessarily what goes into rec->ip, which we can adjust at initialization time to be within the function. We'd need to record the presence/absence of the BTI somewhere (I guess in dyn_arch_ftrace). > If that's the case, we will need to modify it to point to something that > kallsyms will recognize (ie. sym+0 or greater). Because that will cause > set_ftrace_filter to fail as well. Yup, understood. Like I mentioned it also wrecks the unwinder and would make it really hard to implement RELIABLE_STACKTRACE. Just to be clear, I don't think we should follow this specific approach. I just wrote the examples to clarify what was being proposed. Thanks, Mark.