Message ID | 20230627111612.761164-1-suagrfillet@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | riscv: Optimize function trace | expand |
Ping... 在 2023/6/27 19:16, Song Shuai 写道: > Changes in V11: > > - append a patch that makes the DIRECT_CALL samples support RV32I in > this series fixing the rv32 build failure reported by Palmer > > - validated with ftrace boottime selftest and manual sample modules test > in qemu-system for RV32I and RV64I > > This series optimizes function trace. The first 3 independent > patches has been picked in the V7 version of this series, the > subsequent version continues the following 4 patches: > > select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY [1] (patch 1) > ========================================================== > > In RISC-V, -fpatchable-function-entry option is used to support > dynamic ftrace in this commit afc76b8b8011 ("riscv: Using > PATCHABLE_FUNCTION_ENTRY instead of MCOUNT"). So recordmcount > don't have to be called to create the __mcount_loc section before > the vmlinux linking. > > Here selects FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY to tell > Makefile not to run recordmcount. > > Make function graph use ftrace directly [2] (patch 2) > ======================================================== > > In RISC-V architecture, when we enable the ftrace_graph tracer on some > functions, the function tracings on other functions will suffer extra > graph tracing work. In essence, graph_ops isn't limited by its func_hash > due to the global ftrace_graph_[regs]_call label. That should be > corrected. > > What inspires me is the commit 0c0593b45c9b ("x86/ftrace: Make function > graph use ftrace directly") that uses graph_ops::func function to > install return_hooker and makes the function called against its > func_hash. > > Add WITH_DIRECT_CALLS support [3] (patch 3, 4) > ============================================== > > This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V. > SAMPLE_FTRACE_DIRECT and SAMPLE_FTRACE_DIRECT_MULTI are also included > here as the samples for testing DIRECT_CALLS related interface. > > First, select the DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide > register_ftrace_direct[_multi] interfaces allowing user to register > the customed trampoline (direct_caller) as the mcount for one or > more target functions. And modify_ftrace_direct[_multi] are also > provided for modify direct_caller. > > At the same time, the samples in ./samples/ftrace/ can be built > as kerenl module for testing these interfaces with SAMPLE_FTRACE_DIRECT > and SAMPLE_FTRACE_DIRECT_MULTI selected. > > Second, to make the direct_caller and the other ftrace hooks > (eg. function/fgraph tracer, k[ret]probes) co-exist, a temporary > register > are nominated to store the address of direct_caller in > ftrace_regs_caller. > After the setting of the address direct_caller by direct_ops->func and > the RESTORE_REGS in ftrace_regs_caller, direct_caller will be jumped to > by the `jr` inst. > > The series's old changes related these patches > ========================================== > > Changes in v10: > https://lore.kernel.org/all/20230511093234.3123181-1-suagrfillet@gmail.com/ > > - add Acked-by from Björn Töpel in patch 2 and patch 4 > - replace `move` with `mv` in patch3 > - prettify patch 2/4 with proper tabs > > Changes in v9: > https://lore.kernel.org/linux-riscv/20230510101857.2953955-1-suagrfillet@gmail.com/ > > 1. add Acked-by from Björn Töpel in patch 1 > > 2. rebase patch2/patch3 on Linux v6.4-rc1 > > - patch 2: to make the `SAVE_ABI_REGS` configurable, revert the > modification of mcount-dyn.S from commit (45b32b946a97 "riscv: > entry: Consolidate general regs saving/restoring") > > - patch 3: to pass the trace_selftest, add the implement of > `ftrace_stub_direct_tramp` from commit (fee86a4ed536 "ftrace: > selftest: remove broken trace_direct_tramp") ; and fixup the context > conflict in Kconfig > > Changes in v8: > https://lore.kernel.org/linux-riscv/20230324033342.3177979-1-suagrfillet@gmail.com/ > - Fix incorrect address values in the 4nd patch > - Rebased on v6.3-rc2 > > Changes in v7: > https://lore.kernel.org/linux-riscv/20230112090603.1295340-1-guoren@kernel.org/ > - Fixup RESTORE_ABI_REGS by remove PT_T0(sp) overwrite. > - Add FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY [1] > - Fixup kconfig with HAVE_SAMPLE_FTRACE_DIRECT & > HAVE_SAMPLE_FTRACE_DIRECT_MULTI > > Changes in v6: > https://lore.kernel.org/linux-riscv/20230107133549.4192639-1-guoren@kernel.org/ > - Replace 8 with MCOUNT_INSN_SIZE > - Replace "REG_L a1, PT_RA(sp)" with "mv a1, ra" > - Add Evgenii Shatokhin comment > > Changes in v5: > https://lore.kernel.org/linux-riscv/20221208091244.203407-1-guoren@kernel.org/ > - Sort Kconfig entries in alphabetical order. > > Changes in v4: > https://lore.kernel.org/linux-riscv/20221129033230.255947-1-guoren@kernel.org/ > - Include [3] for maintenance. [Song Shuai] > > Changes in V3: > https://lore.kernel.org/linux-riscv/20221123153950.2911981-1-guoren@kernel.org/ > - Include [2] for maintenance. [Song Shuai] > > [1]: https://lore.kernel.org/linux-riscv/CAAYs2=j3Eak9vU6xbAw0zPuoh00rh8v5C2U3fePkokZFibWs2g@mail.gmail.com/T/#t > [2]: https://lore.kernel.org/lkml/20221120084230.910152-1-suagrfillet@gmail.com/ > [3]: https://lore.kernel.org/linux-riscv/20221123142025.1504030-1-suagrfillet@gmail.com/ > > Song Shuai (5): > riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY > riscv: ftrace: Add ftrace_graph_func > riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support > samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] > samples: ftrace: Make the riscv samples support RV32I > > arch/riscv/Kconfig | 4 + > arch/riscv/include/asm/ftrace.h | 19 +- > arch/riscv/kernel/ftrace.c | 30 ++- > arch/riscv/kernel/mcount-dyn.S | 200 ++++++++++++++++---- > samples/ftrace/ftrace-direct-modify.c | 35 ++++ > samples/ftrace/ftrace-direct-multi-modify.c | 41 ++++ > samples/ftrace/ftrace-direct-multi.c | 25 +++ > samples/ftrace/ftrace-direct-too.c | 28 +++ > samples/ftrace/ftrace-direct.c | 24 +++ > 9 files changed, 350 insertions(+), 56 deletions(-) >
On Thu, Jul 06, 2023 at 05:35:49PM +0800, Song Shuai wrote: > Ping... A context-less ping is not very helpful - what are you looking for here? More reviews? For example, someone to look at 5/5? > 在 2023/6/27 19:16, Song Shuai 写道: If it's application you want, you sent the patch only last week - which was during the merge window, making it unlikely to be applied. Either way, please try to explain what it is that you are looking for when you do a ping! Cheers, Conor.
在 2023/7/6 17:53, Conor Dooley 写道: > On Thu, Jul 06, 2023 at 05:35:49PM +0800, Song Shuai wrote: >> Ping... > > A context-less ping is not very helpful - what are you looking for here? > More reviews? For example, someone to look at 5/5? > Sorry for the context-less ping. I hoped someone could look at the 5th patch. >> 在 2023/6/27 19:16, Song Shuai 写道: > > If it's application you want, you sent the patch only last week - which > was during the merge window, making it unlikely to be applied. > > Either way, please try to explain what it is that you are looking for > when you do a ping! Thanks for your correction, I'll follow this thread after the merge window closes. > > Cheers, > Conor. >
Song Shuai <suagrfillet@gmail.com> writes: [...] > Add WITH_DIRECT_CALLS support [3] (patch 3, 4) > ============================================== We've had some offlist discussions, so here's some input for a wider audience! Most importantly, this is for Palmer, so that this series is not merged until a proper BPF trampoline fix is in place. Note that what's currently usable from BPF trampoline *works*. It's when this series is added that it breaks. TL;DR This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which enables fentry/fexit BPF trampoline support. Unfortunately the fexit/BPF_TRAMP_F_SKIP_FRAME parts of the RV BPF trampoline breaks with this addition, and need to be addressed *prior* merging this series. An easy way to reproduce, is just calling any of the kselftest tests that uses fexit patching. The issue is around the nop seld, and how a call is done; The nop sled (patchable-function-entry) size changed from 16B to 8B in commit 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half"), but BPF code still uses the old 16B. So it'll work for BPF programs, but not for regular kernel functions. An example: | ffffffff80fa4150 <bpf_fentry_test1>: | ffffffff80fa4150: 0001 nop | ffffffff80fa4152: 0001 nop | ffffffff80fa4154: 0001 nop | ffffffff80fa4156: 0001 nop | ffffffff80fa4158: 1141 add sp,sp,-16 | ffffffff80fa415a: e422 sd s0,8(sp) | ffffffff80fa415c: 0800 add s0,sp,16 | ffffffff80fa415e: 6422 ld s0,8(sp) | ffffffff80fa4160: 2505 addw a0,a0,1 | ffffffff80fa4162: 0141 add sp,sp,16 | ffffffff80fa4164: 8082 ret is patched to: | ffffffff80fa4150: f70c0297 auipc t0,-150208512 | ffffffff80fa4154: eb0282e7 jalr t0,t0,-336 The return address to bpf_fentry_test1 is stored in t0 at BPF trampoline entry. Return to the *parent* is in ra. The trampline has to deal with this. For BPF_TRAMP_F_SKIP_FRAME/CALL_ORIG, the BPF trampoline will skip too many bytes, and not correctly handle parent calls. Further; The BPF trampoline currently has a different way of patching the nops for BPF programs, than what ftrace does. That should be changed to match what ftrace does (auipc/jalr t0). To summarize: * Align BPF nop sled with patchable-function-entry: 8B. * Adapt BPF trampoline for 8B nop sleds. * Adapt BPF trampoline t0 return, ra parent scheme. Cheers, Björn
On Wed, 12 Jul 2023 11:11:08 PDT (-0700), bjorn@kernel.org wrote: > Song Shuai <suagrfillet@gmail.com> writes: > > [...] > >> Add WITH_DIRECT_CALLS support [3] (patch 3, 4) >> ============================================== > > We've had some offlist discussions, so here's some input for a wider > audience! Most importantly, this is for Palmer, so that this series is > not merged until a proper BPF trampoline fix is in place. > > Note that what's currently usable from BPF trampoline *works*. It's > when this series is added that it breaks. > > TL;DR This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which enables > fentry/fexit BPF trampoline support. Unfortunately the > fexit/BPF_TRAMP_F_SKIP_FRAME parts of the RV BPF trampoline breaks > with this addition, and need to be addressed *prior* merging this > series. An easy way to reproduce, is just calling any of the kselftest > tests that uses fexit patching. > > The issue is around the nop seld, and how a call is done; The nop sled > (patchable-function-entry) size changed from 16B to 8B in commit > 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half"), but > BPF code still uses the old 16B. So it'll work for BPF programs, but not > for regular kernel functions. > > An example: > > | ffffffff80fa4150 <bpf_fentry_test1>: > | ffffffff80fa4150: 0001 nop > | ffffffff80fa4152: 0001 nop > | ffffffff80fa4154: 0001 nop > | ffffffff80fa4156: 0001 nop > | ffffffff80fa4158: 1141 add sp,sp,-16 > | ffffffff80fa415a: e422 sd s0,8(sp) > | ffffffff80fa415c: 0800 add s0,sp,16 > | ffffffff80fa415e: 6422 ld s0,8(sp) > | ffffffff80fa4160: 2505 addw a0,a0,1 > | ffffffff80fa4162: 0141 add sp,sp,16 > | ffffffff80fa4164: 8082 ret > > is patched to: > > | ffffffff80fa4150: f70c0297 auipc t0,-150208512 > | ffffffff80fa4154: eb0282e7 jalr t0,t0,-336 > > The return address to bpf_fentry_test1 is stored in t0 at BPF > trampoline entry. Return to the *parent* is in ra. The trampline has > to deal with this. > > For BPF_TRAMP_F_SKIP_FRAME/CALL_ORIG, the BPF trampoline will skip too > many bytes, and not correctly handle parent calls. > > Further; The BPF trampoline currently has a different way of patching > the nops for BPF programs, than what ftrace does. That should be changed > to match what ftrace does (auipc/jalr t0). > > To summarize: > * Align BPF nop sled with patchable-function-entry: 8B. > * Adapt BPF trampoline for 8B nop sleds. > * Adapt BPF trampoline t0 return, ra parent scheme. Thanks for digging into this one, I agree we need to sort out the BPF breakages before we merge this. Sounds like there's a rabbit hole here, but hopefully we can get it sorted out. I've dropped this from patchwork and such, as we'll need at least another spin. > Cheers, > Björn
On 2023/7/13 2:11, Björn Töpel wrote: > Song Shuai <suagrfillet@gmail.com> writes: > > [...] > >> Add WITH_DIRECT_CALLS support [3] (patch 3, 4) >> ============================================== > > We've had some offlist discussions, so here's some input for a wider > audience! Most importantly, this is for Palmer, so that this series is > not merged until a proper BPF trampoline fix is in place. > > Note that what's currently usable from BPF trampoline *works*. It's > when this series is added that it breaks. > > TL;DR This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which enables > fentry/fexit BPF trampoline support. Unfortunately the > fexit/BPF_TRAMP_F_SKIP_FRAME parts of the RV BPF trampoline breaks > with this addition, and need to be addressed *prior* merging this > series. An easy way to reproduce, is just calling any of the kselftest > tests that uses fexit patching. > > The issue is around the nop seld, and how a call is done; The nop sled > (patchable-function-entry) size changed from 16B to 8B in commit > 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half"), but > BPF code still uses the old 16B. So it'll work for BPF programs, but not > for regular kernel functions. > > An example: > > | ffffffff80fa4150 <bpf_fentry_test1>: > | ffffffff80fa4150: 0001 nop > | ffffffff80fa4152: 0001 nop > | ffffffff80fa4154: 0001 nop > | ffffffff80fa4156: 0001 nop > | ffffffff80fa4158: 1141 add sp,sp,-16 > | ffffffff80fa415a: e422 sd s0,8(sp) > | ffffffff80fa415c: 0800 add s0,sp,16 > | ffffffff80fa415e: 6422 ld s0,8(sp) > | ffffffff80fa4160: 2505 addw a0,a0,1 > | ffffffff80fa4162: 0141 add sp,sp,16 > | ffffffff80fa4164: 8082 ret > > is patched to: > > | ffffffff80fa4150: f70c0297 auipc t0,-150208512 > | ffffffff80fa4154: eb0282e7 jalr t0,t0,-336 > > The return address to bpf_fentry_test1 is stored in t0 at BPF > trampoline entry. Return to the *parent* is in ra. The trampline has > to deal with this. > > For BPF_TRAMP_F_SKIP_FRAME/CALL_ORIG, the BPF trampoline will skip too > many bytes, and not correctly handle parent calls. > > Further; The BPF trampoline currently has a different way of patching > the nops for BPF programs, than what ftrace does. That should be changed > to match what ftrace does (auipc/jalr t0). > > To summarize: > * Align BPF nop sled with patchable-function-entry: 8B. > * Adapt BPF trampoline for 8B nop sleds. > * Adapt BPF trampoline t0 return, ra parent scheme. > Thanks Björn, I make a adaptation as follows, looking forward to your review. https://lore.kernel.org/bpf/20230715090137.2141358-1-pulehui@huaweicloud.com/ > > Cheers, > Björn > >
Palmer Dabbelt <palmer@rivosinc.com> writes: > On Wed, 12 Jul 2023 11:11:08 PDT (-0700), bjorn@kernel.org wrote: >> Song Shuai <suagrfillet@gmail.com> writes: >> >> [...] >> >>> Add WITH_DIRECT_CALLS support [3] (patch 3, 4) >>> ============================================== >> >> We've had some offlist discussions, so here's some input for a wider >> audience! Most importantly, this is for Palmer, so that this series is >> not merged until a proper BPF trampoline fix is in place. >> >> Note that what's currently usable from BPF trampoline *works*. It's >> when this series is added that it breaks. >> >> TL;DR This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which enables >> fentry/fexit BPF trampoline support. Unfortunately the >> fexit/BPF_TRAMP_F_SKIP_FRAME parts of the RV BPF trampoline breaks >> with this addition, and need to be addressed *prior* merging this >> series. An easy way to reproduce, is just calling any of the kselftest >> tests that uses fexit patching. >> >> The issue is around the nop seld, and how a call is done; The nop sled >> (patchable-function-entry) size changed from 16B to 8B in commit >> 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half"), but >> BPF code still uses the old 16B. So it'll work for BPF programs, but not >> for regular kernel functions. >> >> An example: >> >> | ffffffff80fa4150 <bpf_fentry_test1>: >> | ffffffff80fa4150: 0001 nop >> | ffffffff80fa4152: 0001 nop >> | ffffffff80fa4154: 0001 nop >> | ffffffff80fa4156: 0001 nop >> | ffffffff80fa4158: 1141 add sp,sp,-16 >> | ffffffff80fa415a: e422 sd s0,8(sp) >> | ffffffff80fa415c: 0800 add s0,sp,16 >> | ffffffff80fa415e: 6422 ld s0,8(sp) >> | ffffffff80fa4160: 2505 addw a0,a0,1 >> | ffffffff80fa4162: 0141 add sp,sp,16 >> | ffffffff80fa4164: 8082 ret >> >> is patched to: >> >> | ffffffff80fa4150: f70c0297 auipc t0,-150208512 >> | ffffffff80fa4154: eb0282e7 jalr t0,t0,-336 >> >> The return address to bpf_fentry_test1 is stored in t0 at BPF >> trampoline entry. Return to the *parent* is in ra. The trampline has >> to deal with this. >> >> For BPF_TRAMP_F_SKIP_FRAME/CALL_ORIG, the BPF trampoline will skip too >> many bytes, and not correctly handle parent calls. >> >> Further; The BPF trampoline currently has a different way of patching >> the nops for BPF programs, than what ftrace does. That should be changed >> to match what ftrace does (auipc/jalr t0). >> >> To summarize: >> * Align BPF nop sled with patchable-function-entry: 8B. >> * Adapt BPF trampoline for 8B nop sleds. >> * Adapt BPF trampoline t0 return, ra parent scheme. > > Thanks for digging into this one, I agree we need to sort out the BPF > breakages before we merge this. Sounds like there's a rabbit hole here, > but hopefully we can get it sorted out. > > I've dropped this from patchwork and such, as we'll need at least > another spin. Palmer, The needed BPF patch is upstream in the bpf-next tree, and has been for a couple of weeks. I think this series is a candidate for RISC-V -next! It would help RISC-V BPF a lot in terms of completeness. Björn
Björn Töpel <bjorn@kernel.org> writes: > Palmer Dabbelt <palmer@rivosinc.com> writes: > >> On Wed, 12 Jul 2023 11:11:08 PDT (-0700), bjorn@kernel.org wrote: >>> Song Shuai <suagrfillet@gmail.com> writes: >>> >>> [...] >>> >>>> Add WITH_DIRECT_CALLS support [3] (patch 3, 4) >>>> ============================================== >>> >>> We've had some offlist discussions, so here's some input for a wider >>> audience! Most importantly, this is for Palmer, so that this series is >>> not merged until a proper BPF trampoline fix is in place. >>> >>> Note that what's currently usable from BPF trampoline *works*. It's >>> when this series is added that it breaks. >>> >>> TL;DR This series adds DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which enables >>> fentry/fexit BPF trampoline support. Unfortunately the >>> fexit/BPF_TRAMP_F_SKIP_FRAME parts of the RV BPF trampoline breaks >>> with this addition, and need to be addressed *prior* merging this >>> series. An easy way to reproduce, is just calling any of the kselftest >>> tests that uses fexit patching. >>> >>> The issue is around the nop seld, and how a call is done; The nop sled >>> (patchable-function-entry) size changed from 16B to 8B in commit >>> 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half"), but >>> BPF code still uses the old 16B. So it'll work for BPF programs, but not >>> for regular kernel functions. >>> >>> An example: >>> >>> | ffffffff80fa4150 <bpf_fentry_test1>: >>> | ffffffff80fa4150: 0001 nop >>> | ffffffff80fa4152: 0001 nop >>> | ffffffff80fa4154: 0001 nop >>> | ffffffff80fa4156: 0001 nop >>> | ffffffff80fa4158: 1141 add sp,sp,-16 >>> | ffffffff80fa415a: e422 sd s0,8(sp) >>> | ffffffff80fa415c: 0800 add s0,sp,16 >>> | ffffffff80fa415e: 6422 ld s0,8(sp) >>> | ffffffff80fa4160: 2505 addw a0,a0,1 >>> | ffffffff80fa4162: 0141 add sp,sp,16 >>> | ffffffff80fa4164: 8082 ret >>> >>> is patched to: >>> >>> | ffffffff80fa4150: f70c0297 auipc t0,-150208512 >>> | ffffffff80fa4154: eb0282e7 jalr t0,t0,-336 >>> >>> The return address to bpf_fentry_test1 is stored in t0 at BPF >>> trampoline entry. Return to the *parent* is in ra. The trampline has >>> to deal with this. >>> >>> For BPF_TRAMP_F_SKIP_FRAME/CALL_ORIG, the BPF trampoline will skip too >>> many bytes, and not correctly handle parent calls. >>> >>> Further; The BPF trampoline currently has a different way of patching >>> the nops for BPF programs, than what ftrace does. That should be changed >>> to match what ftrace does (auipc/jalr t0). >>> >>> To summarize: >>> * Align BPF nop sled with patchable-function-entry: 8B. >>> * Adapt BPF trampoline for 8B nop sleds. >>> * Adapt BPF trampoline t0 return, ra parent scheme. >> >> Thanks for digging into this one, I agree we need to sort out the BPF >> breakages before we merge this. Sounds like there's a rabbit hole here, >> but hopefully we can get it sorted out. >> >> I've dropped this from patchwork and such, as we'll need at least >> another spin. > > Palmer, > > The needed BPF patch is upstream in the bpf-next tree, and has been for > a couple of weeks. > > I think this series is a candidate for RISC-V -next! It would help > RISC-V BPF a lot in terms of completeness. Palmer, The needed fix for BPF is now in Linus' tree, commit 25ad10658dc1 ("riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework"). IOW, this ftrace series can be merged now. Björn