Message ID | 20220123221932.537060-1-jolsa@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | BPF |
Series | [1/3] perf/bpf: Remove prologue generation |
Context | Check | Description |
---|---|---|
bpf/vmtest-bpf-PR | fail | merge-conflict |
netdev/tree_selection | success | Not a local patch |
On Sun, Jan 23, 2022 at 2:19 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> Removing code for ebpf program prologue generation.
>
> The prologue code was used to get data for extra arguments specified
> in program section name, like:
>
>   SEC("lock_page=__lock_page page->flags")
>   int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
>   {
>          return 1;
>   }
>
> This code is using deprecated libbpf API and blocks its removal.
>
> This feature was not documented and broken for some time without
> anyone complaining, also original authors are not responding,
> so I'm removing it.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/Makefile.config     |  11 -
>  tools/perf/builtin-record.c    |  14 -
>  tools/perf/util/bpf-loader.c   | 242 +---------------
>  tools/perf/util/bpf-prologue.c | 508 ---------------------------------
>  tools/perf/util/bpf-prologue.h |  37 ---
>  5 files changed, 1 insertion(+), 811 deletions(-)

Love the stats! Thanks for taking this on!

>  delete mode 100644 tools/perf/util/bpf-prologue.c
>  delete mode 100644 tools/perf/util/bpf-prologue.h
>

[...]
On Mon, Jan 24, 2022 at 12:24 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Jan 23, 2022 at 2:19 PM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > Removing code for ebpf program prologue generation.
> >
> > [...]
> >
> >  5 files changed, 1 insertion(+), 811 deletions(-)
>
> Love the stats! Thanks for taking this on!
>

Hi,

Was this ever applied? If not, are there any blockers? I assume this
will go through the perf tree, right?

[...]
On Tue, Feb 01, 2022 at 05:01:38PM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 24, 2022 at 12:24 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
> >
> > Love the stats! Thanks for taking this on!
> >
>
> Hi,
>
> Was this ever applied? If not, are there any blockers? I assume this
> will go through the perf tree, right?

I'll go through it today.

[...]
On Wed, Feb 2, 2022 at 2:08 AM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
>
> On Tue, Feb 01, 2022 at 05:01:38PM -0800, Andrii Nakryiko wrote:
> > [...]
> >
> > Was this ever applied? If not, are there any blockers? I assume this
> > will go through the perf tree, right?
>
> I'll go thru it today.

Great, thank you!

> [...]
>
> --
>
> - Arnaldo
On Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa wrote:
> Removing code for ebpf program prologue generation.
>
> The prologue code was used to get data for extra arguments specified
> in program section name, like:
>
>   SEC("lock_page=__lock_page page->flags")
>   int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
>   {
>          return 1;
>   }
>
> This code is using deprecated libbpf API and blocks its removal.
>
> This feature was not documented and broken for some time without
> anyone complaining, also original authors are not responding,
> so I'm removing it.

So, the example below breaks. How hard would it be to move the deprecated
APIs to perf, like was done in some other cases?

- Arnaldo

Before:

[root@quaco perf]# cat tools/perf/examples/bpf/5sec.c
// SPDX-License-Identifier: GPL-2.0
/*
    Description:

    . Disable strace like syscall tracing (--no-syscalls), or try tracing
      just some (-e *sleep).

    . Attach a filter function to a kernel function, returning when it should
      be considered, i.e. appear on the output.

    . Run it system wide, so that any sleep of >= 5 seconds and < than 6
      seconds gets caught.

    . Ask for callgraphs using DWARF info, so that userspace can be unwound

    . While this is running, run something like "sleep 5s".

    . If we decide to add tv_nsec as well, then it becomes:

      int probe(hrtimer_nanosleep, rqtp->tv_sec rqtp->tv_nsec)(void *ctx, int err, long sec, long nsec)

      I.e. add where it comes from (rqtp->tv_nsec) and where it will be
      accessible in the function body (nsec)

    # perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c/call-graph=dwarf/
         0.000 perf_bpf_probe:func:(ffffffff9811b5f0) tv_sec=5
                                           hrtimer_nanosleep ([kernel.kallsyms])
                                           __x64_sys_nanosleep ([kernel.kallsyms])
                                           do_syscall_64 ([kernel.kallsyms])
                                           entry_SYSCALL_64 ([kernel.kallsyms])
                                           __GI___nanosleep (/usr/lib64/libc-2.26.so)
                                           rpl_nanosleep (/usr/bin/sleep)
                                           xnanosleep (/usr/bin/sleep)
                                           main (/usr/bin/sleep)
                                           __libc_start_main (/usr/lib64/libc-2.26.so)
                                           _start (/usr/bin/sleep)
    ^C#

   Copyright (C) 2018 Red Hat, Inc., Arnaldo Carvalho de Melo <acme@redhat.com>
*/

#include <bpf.h>

#define NSEC_PER_SEC	1000000000L

int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
{
	return sec / NSEC_PER_SEC == 5ULL;
}

license(GPL);
[root@quaco perf]# perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s
     0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1994947936, rqtp: 5000000000)
[root@quaco perf]#

After:

[root@quaco perf]# perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s
event syntax error: 'tools/perf/examples/bpf/5sec.c'
                     \___ Permission denied

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@quaco perf]# perf trace -v -e tools/perf/examples/bpf/5sec.c sleep 5s
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/5.15.18-200.fc35.x86_64/build
set env: KBUILD_DIR=/lib/modules/5.15.18-200.fc35.x86_64/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/11/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h
set env: NR_CPUS=8
set env: LINUX_VERSION_CODE=0x50f12
set env: CLANG_EXEC=/usr/lib64/ccache/clang
set env: CLANG_OPTIONS=-g
set env: KERNEL_INC_OPTIONS=-nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/11/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/5.15.18-200.fc35.x86_64/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/5sec.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/lib64/ccache/clang -D__KERNEL__ -D__NR_CPUS__=8 -DLINUX_VERSION_CODE=0x50f12 -g -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/11/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/5.15.18-200.fc35.x86_64/build -c /home/acme/git/perf/tools/perf/examples/bpf/5sec.c -target bpf -O2 -o -
libbpf: loading object 'tools/perf/examples/bpf/5sec.c' from buffer
libbpf: elf: section(3) hrtimer_nanosleep=hrtimer_nanosleep rqtp, size 64, link 0, flags 6, type=1
libbpf: sec 'hrtimer_nanosleep=hrtimer_nanosleep rqtp': found program 'hrtimer_nanosleep' at insn offset 0 (0 bytes), code size 8 insns (64 bytes)
libbpf: elf: section(4) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/5sec.c is GPL
libbpf: elf: section(5) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/5sec.c is 50f12
libbpf: elf: section(11) .BTF, size 558, link 0, flags 0, type=1
libbpf: elf: section(13) .BTF.ext, size 112, link 0, flags 0, type=1
libbpf: elf: section(20) .symtab, size 288, link 1, flags 0, type=2
libbpf: looking for externs among 12 symbols...
libbpf: collected 0 externs total
libbpf: prog 'hrtimer_nanosleep': unrecognized ELF section name 'hrtimer_nanosleep=hrtimer_nanosleep rqtp'
LLVM: dumping tools/perf/examples/bpf/5sec.o
bpf: config program 'hrtimer_nanosleep=hrtimer_nanosleep rqtp'
symbol:hrtimer_nanosleep file:(null) line:0 offset:0 return:0 lazy:(null)
parsing arg: rqtp into rqtp
bpf: config 'hrtimer_nanosleep=hrtimer_nanosleep rqtp' is ok
Looking at the vmlinux_path (8 entries long)
Using /usr/lib/debug/lib/modules/5.15.18-200.fc35.x86_64/vmlinux for symbols
Open Debuginfo file: /usr/lib/debug/.build-id/a5/6896963dc51b426302a1f1147842fb8f288ef2.debug
Try to find probe point from debuginfo.
Opening /sys/kernel/tracing//README write=0
Matched function: hrtimer_nanosleep [1af0959]
Probe point found: hrtimer_nanosleep+0
Searching 'rqtp' variable in context.
Converting variable rqtp into trace event.
rqtp type is long long int.
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//kprobe_events write=1
Writing event: p:perf_bpf_probe/hrtimer_nanosleep _text+1540768 rqtp=%di:s64
libbpf: prog 'hrtimer_nanosleep': BPF program load failed: Permission denied
libbpf: prog 'hrtimer_nanosleep': -- BEGIN PROG LOAD LOG --
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
; int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
0: (18) r1 = 0xfffffffed5fa0e00
; return sec / NSEC_PER_SEC == 5ULL;
2: (0f) r3 += r1
R3 !read_ok
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: failed to load program 'hrtimer_nanosleep'
libbpf: failed to load object 'tools/perf/examples/bpf/5sec.c'
bpf: load objects failed: err=-13: (Permission denied)
event syntax error: 'tools/perf/examples/bpf/5sec.c'
                     \___ Permission denied

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
Opening /sys/kernel/tracing//kprobe_events write=1
Opening /sys/kernel/tracing//uprobe_events write=1
Parsing probe_events: p:perf_bpf_probe/hrtimer_nanosleep _text+1540768 rqtp=%di:s64
Group:perf_bpf_probe Event:hrtimer_nanosleep probe:p
Writing event: -:perf_bpf_probe/hrtimer_nanosleep
[root@quaco perf]#
On Thu, Feb 10, 2022 at 04:18:10PM -0300, Arnaldo Carvalho de Melo wrote:
> On Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa wrote:
> > Removing code for ebpf program prologue generation.
> >
> > [...]
>
> So, the example below breaks, how hard would be to move the deprecated
> APIs to perf like was done in some other cases?
>
> - Arnaldo
>
> Before:
>
> [root@quaco perf]# cat tools/perf/examples/bpf/5sec.c
> [...]
> #include <bpf.h>
>
> #define NSEC_PER_SEC	1000000000L
>
> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> {
> 	  return sec / NSEC_PER_SEC == 5ULL;
> }

that sucks ;-) I'll check if we can re-implement it as we discussed earlier,
however below is a workaround showing how to do it without the prologue support

jirka


---
diff --git a/tools/perf/examples/bpf/5sec.c b/tools/perf/examples/bpf/5sec.c
index e6b6181c6dc6..734d39debdb8 100644
--- a/tools/perf/examples/bpf/5sec.c
+++ b/tools/perf/examples/bpf/5sec.c
@@ -43,9 +43,17 @@
 
 #define NSEC_PER_SEC	1000000000L
 
-int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
+struct pt_regs {
+	long di;
+};
+
+SEC("function=hrtimer_nanosleep")
+int krava(struct pt_regs *ctx)
 {
-	return sec / NSEC_PER_SEC == 5ULL;
+	unsigned long arg;
+
+	probe_read_kernel(&arg, sizeof(arg), __builtin_preserve_access_index(&ctx->di));
+	return arg / NSEC_PER_SEC == 5ULL;
 }
 
 license(GPL);
diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h
index b422aeef5339..b7d6d2fc8342 100644
--- a/tools/perf/include/bpf/bpf.h
+++ b/tools/perf/include/bpf/bpf.h
@@ -64,6 +64,7 @@ int _version SEC("version") = LINUX_VERSION_CODE;
 
 static int (*probe_read)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read;
 static int (*probe_read_str)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read_str;
+static long (*probe_read_kernel)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) BPF_FUNC_probe_read_kernel;
 
 static int (*perf_event_output)(void *, struct bpf_map *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output;
 
diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
index 96c8ef60f4f8..9274a3373847 100644
--- a/tools/perf/util/llvm-utils.c
+++ b/tools/perf/util/llvm-utils.c
@@ -25,7 +25,7 @@
 		"$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \
 		"-Wno-unused-value -Wno-pointer-sign "	\
 		"-working-directory $WORKING_DIR "	\
-		"-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
+		"-g -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
 
 struct llvm_param llvm_param = {
 	.clang_path = "clang",
On Thu, Feb 10, 2022 at 1:31 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Feb 10, 2022 at 04:18:10PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa wrote:
> > > Removing code for ebpf program prologue generation.
> > >
> > > [...]
> >
> > So, the example below breaks, how hard would be to move the deprecated
> > APIs to perf like was done in some other cases?
> >

Just copy/pasting libbpf code won't work. But there are three parts:

1. bpf_(program|map|object)__set_priv(). There is no equivalent API,
but perf can maintain this private data by building a hashmap where the
key is the bpf_object/bpf_map/bpf_program pointer itself. Annoying but
very straightforward to replace.

2. For prologue generation, bpf_program__set_prep() doesn't have a
direct equivalent. But program cloning and adjustment of the code can
be achieved through the bpf_program__insns()/bpf_program__insn_cnt() API:
load one "prototype" program, get its underlying insns and clone
programs as necessary. After that, bpf_prog_load() would be used to
load those cloned programs into the kernel.

3. Those *very* custom SEC() definitions will be possible for perf to
handle once [0] lands (I'll send a new revision tomorrow, probably).
You'll be able to register your own "fallback" handler with
libbpf_register_prog_handler(NULL, ...).

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=611491&state=*

Sorry, it's not a straightforward copy/paste, but I hope this helps a bit.

> > - Arnaldo
> >
> > Before:
> >
> > [...]
>
> that sucks ;-) I'll check if we can re-implement as we discussed earlier,
> however below is workaround how to do it without the prologue support
>
> jirka
>
> [...]
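Point 1 above (private data keyed by the object pointer) can be sketched in plain C. This is only a minimal illustration under stated assumptions, not perf or libbpf code: `priv_map`, `priv_map_set()`, `priv_map_get()` and `priv_map_clear()` are hypothetical names, the bucket count is arbitrary, and the key is treated as an opaque `const void *` that in perf would presumably be a `struct bpf_program *`, `struct bpf_map *` or `struct bpf_object *`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: bucket count chosen arbitrarily. */
#define PRIV_BUCKETS 64

struct priv_entry {
	const void *key;		/* e.g. a struct bpf_program pointer */
	void *priv;			/* caller-owned private data */
	struct priv_entry *next;	/* chaining for colliding keys */
};

struct priv_map {
	struct priv_entry *buckets[PRIV_BUCKETS];
};

/* Hash the pointer value itself; low bits are dropped since they are mostly alignment. */
static size_t priv_hash(const void *key)
{
	return (size_t)(((uintptr_t)key >> 4) % PRIV_BUCKETS);
}

/* Associate priv with key, replacing any previous value. Returns 0, or -1 on allocation failure. */
static int priv_map_set(struct priv_map *m, const void *key, void *priv)
{
	size_t b = priv_hash(key);
	struct priv_entry *e;

	for (e = m->buckets[b]; e; e = e->next) {
		if (e->key == key) {
			e->priv = priv;
			return 0;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->key = key;
	e->priv = priv;
	e->next = m->buckets[b];
	m->buckets[b] = e;
	return 0;
}

/* Return the private data stored for key, or NULL if none was set. */
static void *priv_map_get(const struct priv_map *m, const void *key)
{
	const struct priv_entry *e;

	for (e = m->buckets[priv_hash(key)]; e; e = e->next)
		if (e->key == key)
			return e->priv;
	return NULL;
}

/* Free all entries; the private data itself stays owned by the caller. */
static void priv_map_clear(struct priv_map *m)
{
	for (size_t b = 0; b < PRIV_BUCKETS; b++) {
		struct priv_entry *e = m->buckets[b];

		while (e) {
			struct priv_entry *next = e->next;

			free(e);
			e = next;
		}
		m->buckets[b] = NULL;
	}
}
```

A zero-initialized `struct priv_map` is an empty map. One difference from the deprecated set_priv API, as far as this sketch goes, is that nothing clears the entries automatically when the bpf_object is closed, so the caller would have to call something like `priv_map_clear()` itself.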
On Thu, Feb 10, 2022 at 09:28:51PM -0800, Andrii Nakryiko wrote:
> On Thu, Feb 10, 2022 at 1:31 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > [...]
>
> Just copy/pasting libbpf code won't work. But there are three parts:
>
> 1. bpf_(program|map|object)__set_priv(). There is no equivalent API,
> but perf can maintain this private data by building hashmap where the
> key is bpf_object/bpf_map/bpf_program pointer itself. Annoying but
> very straightforward to replace.
>
> 2. For prologue generation, bpf_program__set_prep() doesn't have a
> direct equivalent. But program cloning and adjustment of the code can
> be achieved through bpf_program__insns()/bpf_program__insn_cnt() API
> to load one "prototype" program, gets its underlying insns and clone
> programs as necessary. After that, bpf_prog_load() would be used to
> load those cloned programs into kernel.

hm, I can't see how to clone a program.. so we need to end up with
several copies of the single program defined in the object.. I can
get its instructions with bpf_program__insns, but how do I add more
programs with these instructions customized/prefixed?

thanks,
jirka

>
> 3. Those *very* custom SEC() definitions will be possible for perf to
> handle once [0] lands (I'll send new revision tomorrow, probably).
> You'll be able to register your own "fallback" handler with
> libbpf_register_prog_handler(NULL, ...).
>
>   [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=611491&state=*
>
> Sorry, it's not a straightforward copy/paste, but I hope this helps a bit.
>
> [...]
On Sun, Feb 13, 2022 at 7:02 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Feb 10, 2022 at 09:28:51PM -0800, Andrii Nakryiko wrote:
> > [...]
> >
> > 2. For prologue generation, bpf_program__set_prep() doesn't have a
> > direct equivalent. But program cloning and adjustment of the code can
> > be achieved through bpf_program__insns()/bpf_program__insn_cnt() API
> > to load one "prototype" program, gets its underlying insns and clone
> > programs as necessary. After that, bpf_prog_load() would be used to
> > load those cloned programs into kernel.
>
> hm, I can't see how to clone a program.. so we need to end up with
> several copies of the single program defined in the object.. I can
> get its intructions with bpf_program__insns, but how do I add more
> programs with these instructions customized/prefixed?

You can't add those clones back to bpf_object, of course. But after
grabbing (and modifying) instructions, you can use the bpf_prog_load()
low-level API to create BPF programs and get their FDs back. You'll
have to keep track of those prog FDs separately from libbpf's struct
bpf_object.

>
> thanks,
> jirka
>
> [...]
diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h > > > index b422aeef5339..b7d6d2fc8342 100644 > > > --- a/tools/perf/include/bpf/bpf.h > > > +++ b/tools/perf/include/bpf/bpf.h > > > @@ -64,6 +64,7 @@ int _version SEC("version") = LINUX_VERSION_CODE; > > > > > > static int (*probe_read)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read; > > > static int (*probe_read_str)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read_str; > > > +static long (*probe_read_kernel)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) BPF_FUNC_probe_read_kernel; > > > > > > static int (*perf_event_output)(void *, struct bpf_map *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output; > > > > > > diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c > > > index 96c8ef60f4f8..9274a3373847 100644 > > > --- a/tools/perf/util/llvm-utils.c > > > +++ b/tools/perf/util/llvm-utils.c > > > @@ -25,7 +25,7 @@ > > > "$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \ > > > "-Wno-unused-value -Wno-pointer-sign " \ > > > "-working-directory $WORKING_DIR " \ > > > - "-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE" > > > + "-g -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE" > > > > > > struct llvm_param llvm_param = { > > > .clang_path = "clang",
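Point 1 in the reply above (keeping the private data in a table keyed by the bpf_object/bpf_map/bpf_program pointer instead of calling the deprecated __set_priv() helpers) could be sketched roughly as below. This is only an illustration under stated assumptions: priv_table, priv_set() and priv_get() are hypothetical names, not existing perf or libbpf API, and a small fixed-size open-addressing table is used just to keep the example self-contained. Real perf code would more likely reuse its existing hashmap implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in perf or libbpf. */
#define PRIV_SLOTS 64	/* power of two; fixed size keeps the sketch short */

struct priv_entry {
	const void *key;	/* bpf_program/bpf_map/bpf_object pointer */
	void *priv;		/* caller-owned private data */
};

/* Must be zero-initialized before first use. */
struct priv_table {
	struct priv_entry slots[PRIV_SLOTS];
};

/* Pointers are usually aligned, so drop the low bits before masking. */
static size_t priv_hash(const void *key)
{
	return (size_t)((uintptr_t)key >> 4) & (PRIV_SLOTS - 1);
}

/* Insert or update; returns 0 on success, -1 on NULL key or full table. */
static int priv_set(struct priv_table *t, const void *key, void *priv)
{
	size_t i, idx;

	if (!key)
		return -1;

	idx = priv_hash(key);
	for (i = 0; i < PRIV_SLOTS; i++) {
		struct priv_entry *e = &t->slots[(idx + i) & (PRIV_SLOTS - 1)];

		if (!e->key || e->key == key) {
			e->key = key;
			e->priv = priv;
			return 0;
		}
	}
	return -1;
}

/* Look up the private data for a pointer; NULL if nothing was stored. */
static void *priv_get(const struct priv_table *t, const void *key)
{
	size_t i, idx;

	if (!key)
		return NULL;

	idx = priv_hash(key);
	for (i = 0; i < PRIV_SLOTS; i++) {
		const struct priv_entry *e = &t->slots[(idx + i) & (PRIV_SLOTS - 1)];

		if (!e->key)
			return NULL;	/* entries are never deleted, so an empty slot ends the probe */
		if (e->key == key)
			return e->priv;
	}
	return NULL;
}
```

With something like this, a call such as bpf_program__priv(prog) would turn into priv_get(&table, prog), and the cleanup that clear_prog_priv() does today would have to be driven by perf itself when the object is closed.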
On Thu, Feb 10, 2022 at 1:31 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > On Thu, Feb 10, 2022 at 04:18:10PM -0300, Arnaldo Carvalho de Melo wrote: > > Em Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa escreveu: > > > Removing code for ebpf program prologue generation. > > > > > > The prologue code was used to get data for extra arguments specified > > > in program section name, like: > > > > > > SEC("lock_page=__lock_page page->flags") > > > int lock_page(struct pt_regs *ctx, int err, unsigned long flags) > > > { > > > return 1; > > > } > > > > > > This code is using deprecated libbpf API and blocks its removal. > > > > > > This feature was not documented and broken for some time without > > > anyone complaining, also original authors are not responding, > > > so I'm removing it. > > > > So, the example below breaks, how hard would be to move the deprecated > > APIs to perf like was done in some other cases? > > > > - Arnaldo > > > > Before: > > > > [root@quaco perf]# cat tools/perf/examples/bpf/5sec.c > > // SPDX-License-Identifier: GPL-2.0 > > /* > > Description: > > > > . Disable strace like syscall tracing (--no-syscalls), or try tracing > > just some (-e *sleep). > > > > . Attach a filter function to a kernel function, returning when it should > > be considered, i.e. appear on the output. > > > > . Run it system wide, so that any sleep of >= 5 seconds and < than 6 > > seconds gets caught. > > > > . Ask for callgraphs using DWARF info, so that userspace can be unwound > > > > . While this is running, run something like "sleep 5s". > > > > . If we decide to add tv_nsec as well, then it becomes: > > > > int probe(hrtimer_nanosleep, rqtp->tv_sec rqtp->tv_nsec)(void *ctx, int err, long sec, long nsec) > > > > I.e. 
add where it comes from (rqtp->tv_nsec) and where it will be > > accessible in the function body (nsec) > > > > # perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c/call-graph=dwarf/ > > 0.000 perf_bpf_probe:func:(ffffffff9811b5f0) tv_sec=5 > > hrtimer_nanosleep ([kernel.kallsyms]) > > __x64_sys_nanosleep ([kernel.kallsyms]) > > do_syscall_64 ([kernel.kallsyms]) > > entry_SYSCALL_64 ([kernel.kallsyms]) > > __GI___nanosleep (/usr/lib64/libc-2.26.so) > > rpl_nanosleep (/usr/bin/sleep) > > xnanosleep (/usr/bin/sleep) > > main (/usr/bin/sleep) > > __libc_start_main (/usr/lib64/libc-2.26.so) > > _start (/usr/bin/sleep) > > ^C# > > > > Copyright (C) 2018 Red Hat, Inc., Arnaldo Carvalho de Melo <acme@redhat.com> > > */ > > > > #include <bpf.h> > > > > #define NSEC_PER_SEC 1000000000L > > > > int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec) > > { > > return sec / NSEC_PER_SEC == 5ULL; > > } > > that sucks ;-) I'll check if we can re-implement as we discussed earlier, > however below is workaround how to do it without the prologue support > I'm a bit confused. I thought that we discussed dropping custom SEC() definitions from perf as no one knew if anyone is even using this feature and when Jiri tried it didn't even work ([0]). Is this broken sample some other important use case that we are sure is used by people and thus must be preserved? Or dropping this custom SEC() business and prologue generation still on the table? 
> jirka > > > --- > diff --git a/tools/perf/examples/bpf/5sec.c b/tools/perf/examples/bpf/5sec.c > index e6b6181c6dc6..734d39debdb8 100644 > --- a/tools/perf/examples/bpf/5sec.c > +++ b/tools/perf/examples/bpf/5sec.c > @@ -43,9 +43,17 @@ > > #define NSEC_PER_SEC 1000000000L > > -int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec) > +struct pt_regs { > + long di; > +}; > + > +SEC("function=hrtimer_nanosleep") > +int krava(struct pt_regs *ctx) > { > - return sec / NSEC_PER_SEC == 5ULL; > + unsigned long arg; > + > + probe_read_kernel(&arg, sizeof(arg), __builtin_preserve_access_index(&ctx->di)); > + return arg / NSEC_PER_SEC == 5ULL; > } > > license(GPL); > diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h > index b422aeef5339..b7d6d2fc8342 100644 > --- a/tools/perf/include/bpf/bpf.h > +++ b/tools/perf/include/bpf/bpf.h > @@ -64,6 +64,7 @@ int _version SEC("version") = LINUX_VERSION_CODE; > > static int (*probe_read)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read; > static int (*probe_read_str)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read_str; > +static long (*probe_read_kernel)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) BPF_FUNC_probe_read_kernel; > > static int (*perf_event_output)(void *, struct bpf_map *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output; > > diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c > index 96c8ef60f4f8..9274a3373847 100644 > --- a/tools/perf/util/llvm-utils.c > +++ b/tools/perf/util/llvm-utils.c > @@ -25,7 +25,7 @@ > "$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \ > "-Wno-unused-value -Wno-pointer-sign " \ > "-working-directory $WORKING_DIR " \ > - "-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE" > + "-g -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE" > > struct llvm_param llvm_param = { > .clang_path = "clang",
On Sun, Feb 13, 2022 at 10:02:38PM -0800, Andrii Nakryiko wrote: > On Thu, Feb 10, 2022 at 1:31 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > On Thu, Feb 10, 2022 at 04:18:10PM -0300, Arnaldo Carvalho de Melo wrote: > > > Em Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa escreveu: > > > > Removing code for ebpf program prologue generation. > > > > > > > > The prologue code was used to get data for extra arguments specified > > > > in program section name, like: > > > > > > > > SEC("lock_page=__lock_page page->flags") > > > > int lock_page(struct pt_regs *ctx, int err, unsigned long flags) > > > > { > > > > return 1; > > > > } > > > > > > > > This code is using deprecated libbpf API and blocks its removal. > > > > > > > > This feature was not documented and broken for some time without > > > > anyone complaining, also original authors are not responding, > > > > so I'm removing it. > > > > > > So, the example below breaks, how hard would be to move the deprecated > > > APIs to perf like was done in some other cases? > > > > > > - Arnaldo > > > > > > Before: > > > > > > [root@quaco perf]# cat tools/perf/examples/bpf/5sec.c > > > // SPDX-License-Identifier: GPL-2.0 > > > /* > > > Description: > > > > > > . Disable strace like syscall tracing (--no-syscalls), or try tracing > > > just some (-e *sleep). > > > > > > . Attach a filter function to a kernel function, returning when it should > > > be considered, i.e. appear on the output. > > > > > > . Run it system wide, so that any sleep of >= 5 seconds and < than 6 > > > seconds gets caught. > > > > > > . Ask for callgraphs using DWARF info, so that userspace can be unwound > > > > > > . While this is running, run something like "sleep 5s". > > > > > > . If we decide to add tv_nsec as well, then it becomes: > > > > > > int probe(hrtimer_nanosleep, rqtp->tv_sec rqtp->tv_nsec)(void *ctx, int err, long sec, long nsec) > > > > > > I.e. 
add where it comes from (rqtp->tv_nsec) and where it will be > > > accessible in the function body (nsec) > > > > > > # perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c/call-graph=dwarf/ > > > 0.000 perf_bpf_probe:func:(ffffffff9811b5f0) tv_sec=5 > > > hrtimer_nanosleep ([kernel.kallsyms]) > > > __x64_sys_nanosleep ([kernel.kallsyms]) > > > do_syscall_64 ([kernel.kallsyms]) > > > entry_SYSCALL_64 ([kernel.kallsyms]) > > > __GI___nanosleep (/usr/lib64/libc-2.26.so) > > > rpl_nanosleep (/usr/bin/sleep) > > > xnanosleep (/usr/bin/sleep) > > > main (/usr/bin/sleep) > > > __libc_start_main (/usr/lib64/libc-2.26.so) > > > _start (/usr/bin/sleep) > > > ^C# > > > > > > Copyright (C) 2018 Red Hat, Inc., Arnaldo Carvalho de Melo <acme@redhat.com> > > > */ > > > > > > #include <bpf.h> > > > > > > #define NSEC_PER_SEC 1000000000L > > > > > > int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec) > > > { > > > return sec / NSEC_PER_SEC == 5ULL; > > > } > > > > that sucks ;-) I'll check if we can re-implement as we discussed earlier, > > however below is workaround how to do it without the prologue support > > > > I'm a bit confused. I thought that we discussed dropping custom SEC() > definitions from perf as no one knew if anyone is even using this > feature and when Jiri tried it didn't even work ([0]). Is this broken > sample some other important use case that we are sure is used by > people and thus must be preserved? I overlooked the script Arnaldo found in perf, so the feature is not as private as we thought and there could be people using it. We discussed it and I'll give adding the support another try. > > Or dropping this custom SEC() business and prologue generation still > on the table? 
I was thinking another way could be to 'mark it' somehow on our side as deprecated with workaround below and remove it after some time jirka > > > jirka > > > > > > --- > > diff --git a/tools/perf/examples/bpf/5sec.c b/tools/perf/examples/bpf/5sec.c > > index e6b6181c6dc6..734d39debdb8 100644 > > --- a/tools/perf/examples/bpf/5sec.c > > +++ b/tools/perf/examples/bpf/5sec.c > > @@ -43,9 +43,17 @@ > > > > #define NSEC_PER_SEC 1000000000L > > > > -int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec) > > +struct pt_regs { > > + long di; > > +}; > > + > > +SEC("function=hrtimer_nanosleep") > > +int krava(struct pt_regs *ctx) > > { > > - return sec / NSEC_PER_SEC == 5ULL; > > + unsigned long arg; > > + > > + probe_read_kernel(&arg, sizeof(arg), __builtin_preserve_access_index(&ctx->di)); > > + return arg / NSEC_PER_SEC == 5ULL; > > } > > > > license(GPL); > > diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h > > index b422aeef5339..b7d6d2fc8342 100644 > > --- a/tools/perf/include/bpf/bpf.h > > +++ b/tools/perf/include/bpf/bpf.h > > @@ -64,6 +64,7 @@ int _version SEC("version") = LINUX_VERSION_CODE; > > > > static int (*probe_read)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read; > > static int (*probe_read_str)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read_str; > > +static long (*probe_read_kernel)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) BPF_FUNC_probe_read_kernel; > > > > static int (*perf_event_output)(void *, struct bpf_map *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output; > > > > diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c > > index 96c8ef60f4f8..9274a3373847 100644 > > --- a/tools/perf/util/llvm-utils.c > > +++ b/tools/perf/util/llvm-utils.c > > @@ -25,7 +25,7 @@ > > "$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \ > > "-Wno-unused-value -Wno-pointer-sign " \ > > "-working-directory 
$WORKING_DIR " \ > > - "-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE" > > + "-g -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE" > > > > struct llvm_param llvm_param = { > > .clang_path = "clang",
On Sun, Feb 13, 2022 at 09:57:15PM -0800, Andrii Nakryiko wrote: > On Sun, Feb 13, 2022 at 7:02 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > On Thu, Feb 10, 2022 at 09:28:51PM -0800, Andrii Nakryiko wrote: > > > On Thu, Feb 10, 2022 at 1:31 PM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > > > On Thu, Feb 10, 2022 at 04:18:10PM -0300, Arnaldo Carvalho de Melo wrote: > > > > > Em Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa escreveu: > > > > > > Removing code for ebpf program prologue generation. > > > > > > > > > > > > The prologue code was used to get data for extra arguments specified > > > > > > in program section name, like: > > > > > > > > > > > > SEC("lock_page=__lock_page page->flags") > > > > > > int lock_page(struct pt_regs *ctx, int err, unsigned long flags) > > > > > > { > > > > > > return 1; > > > > > > } > > > > > > > > > > > > This code is using deprecated libbpf API and blocks its removal. > > > > > > > > > > > > This feature was not documented and broken for some time without > > > > > > anyone complaining, also original authors are not responding, > > > > > > so I'm removing it. > > > > > > > > > > So, the example below breaks, how hard would be to move the deprecated > > > > > APIs to perf like was done in some other cases? > > > > > > > > > > > Just copy/pasting libbpf code won't work. But there are three parts: > > > > > > 1. bpf_(program|map|object)__set_priv(). There is no equivalent API, > > > but perf can maintain this private data by building hashmap where the > > > key is bpf_object/bpf_map/bpf_program pointer itself. Annoying but > > > very straightforward to replace. > > > > > > 2. For prologue generation, bpf_program__set_prep() doesn't have a > > > direct equivalent. But program cloning and adjustment of the code can > > > be achieved through bpf_program__insns()/bpf_program__insn_cnt() API > > > to load one "prototype" program, gets its underlying insns and clone > > > programs as necessary. 
After that, bpf_prog_load() would be used to > > > load those cloned programs into kernel. > > > > hm, I can't see how to clone a program.. so we need to end up with > > several copies of the single program defined in the object.. I can > > get its intructions with bpf_program__insns, but how do I add more > > programs with these instructions customized/prefixed? > > You can't add those clones back to bpf_object, of course. But after > grabbing (and modifying) instructions, you can use bpf_prog_load() > low-level API to create BPF programs and get their FDs back. You'll > have to keep track of those prog FDs separately from libbpf' struct > bpf_object. OK, so the clones get loaded on the side with bpf_prog_load(). Thanks, jirka
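The "grab and modify the instructions" step discussed above can be sketched as a plain buffer operation: allocate room for prologue plus original program, copy both in, and hand the result to the loader. The sketch below is a minimal illustration, not perf code. struct sketch_insn is a stand-in with the same 8-byte layout as struct bpf_insn so that the example builds without kernel headers, and clone_with_prologue() is a hypothetical helper name. In real code the original instructions would come from bpf_program__insns()/bpf_program__insn_cnt(), and each resulting buffer would be passed to bpf_prog_load(), with perf tracking the returned FDs itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for struct bpf_insn (same 8-byte layout), used only so the
 * sketch compiles without <linux/bpf.h>. Hypothetical name.
 */
struct sketch_insn {
	uint8_t code;	/* opcode */
	uint8_t regs;	/* dst_reg in the low nibble, src_reg in the high one */
	int16_t off;	/* signed offset */
	int32_t imm;	/* signed immediate */
};

/*
 * Build one clone: prologue instructions followed by the unmodified
 * original program. Returns a malloc()ed buffer the caller must free(),
 * or NULL on bad arguments, allocation failure, or when the result would
 * exceed max_cnt instructions.
 */
static struct sketch_insn *
clone_with_prologue(const struct sketch_insn *prologue, size_t prologue_cnt,
		    const struct sketch_insn *orig, size_t orig_cnt,
		    size_t max_cnt, size_t *new_cnt)
{
	struct sketch_insn *buf;

	if (!orig || !orig_cnt || !new_cnt || (prologue_cnt && !prologue))
		return NULL;

	/* Written this way to avoid overflow in prologue_cnt + orig_cnt. */
	if (prologue_cnt > max_cnt || orig_cnt > max_cnt - prologue_cnt)
		return NULL;

	buf = malloc((prologue_cnt + orig_cnt) * sizeof(*buf));
	if (!buf)
		return NULL;

	if (prologue_cnt)
		memcpy(buf, prologue, prologue_cnt * sizeof(*buf));
	/* Same idea as the memcpy() in the removed preproc_gen_prologue(). */
	memcpy(buf + prologue_cnt, orig, orig_cnt * sizeof(*buf));

	*new_cnt = prologue_cnt + orig_cnt;
	return buf;
}
```

Because BPF jump offsets are relative to the jumping instruction, prefixing the program this way should not require touching the original instructions, which is also what the removed code relied on. One clone per distinct argument layout would be built, matching what map_prologue() computed as "types".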
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index 96ad944ca6a8..d9ff537d999e 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -556,17 +556,6 @@ ifndef NO_LIBELF endif endif - ifndef NO_DWARF - ifdef PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET - CFLAGS += -DHAVE_BPF_PROLOGUE - $(call detected,CONFIG_BPF_PROLOGUE) - else - msg := $(warning BPF prologue is not supported by architecture $(SRCARCH), missing regs_query_register_offset()); - endif - else - msg := $(warning DWARF support is off, BPF prologue is disabled); - endif - endif # NO_LIBBPF endif # NO_LIBELF diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index bb716c953d02..1a8111fdff2e 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -2696,20 +2696,6 @@ int cmd_record(int argc, const char **argv) set_nobuild('\0', "clang-path", true); set_nobuild('\0', "clang-opt", true); # undef set_nobuild -#endif - -#ifndef HAVE_BPF_PROLOGUE -# if !defined (HAVE_DWARF_SUPPORT) -# define REASON "NO_DWARF=1" -# elif !defined (HAVE_LIBBPF_SUPPORT) -# define REASON "NO_LIBBPF=1" -# else -# define REASON "this architecture doesn't support BPF prologue" -# endif -# define set_nobuild(s, l, c) set_option_nobuild(record_options, s, l, REASON, c) - set_nobuild('\0', "vmlinux", true); -# undef set_nobuild -# undef REASON #endif rec->opts.affinity = PERF_AFFINITY_SYS; diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index 7ecfaac7536a..f9f329a48892 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -18,7 +18,6 @@ #include "debug.h" #include "evlist.h" #include "bpf-loader.h" -#include "bpf-prologue.h" #include "probe-event.h" #include "probe-finder.h" // for MAX_PROBES #include "parse-events.h" @@ -43,10 +42,6 @@ struct bpf_prog_priv { char *sys_name; char *evt_name; struct perf_probe_event pev; - bool need_prologue; - struct bpf_insn *insns_buf; - int nr_types; - int *type_mapping; 
}; static bool libbpf_initialized; @@ -128,8 +123,6 @@ clear_prog_priv(struct bpf_program *prog __maybe_unused, struct bpf_prog_priv *priv = _priv; cleanup_perf_probe_events(&priv->pev, 1); - zfree(&priv->insns_buf); - zfree(&priv->type_mapping); zfree(&priv->sys_name); zfree(&priv->evt_name); free(priv); @@ -412,220 +405,6 @@ static int bpf__prepare_probe(void) return err; } -static int -preproc_gen_prologue(struct bpf_program *prog, int n, - struct bpf_insn *orig_insns, int orig_insns_cnt, - struct bpf_prog_prep_result *res) -{ - struct bpf_prog_priv *priv = bpf_program__priv(prog); - struct probe_trace_event *tev; - struct perf_probe_event *pev; - struct bpf_insn *buf; - size_t prologue_cnt = 0; - int i, err; - - if (IS_ERR_OR_NULL(priv) || priv->is_tp) - goto errout; - - pev = &priv->pev; - - if (n < 0 || n >= priv->nr_types) - goto errout; - - /* Find a tev belongs to that type */ - for (i = 0; i < pev->ntevs; i++) { - if (priv->type_mapping[i] == n) - break; - } - - if (i >= pev->ntevs) { - pr_debug("Internal error: prologue type %d not found\n", n); - return -BPF_LOADER_ERRNO__PROLOGUE; - } - - tev = &pev->tevs[i]; - - buf = priv->insns_buf; - err = bpf__gen_prologue(tev->args, tev->nargs, - buf, &prologue_cnt, - BPF_MAXINSNS - orig_insns_cnt); - if (err) { - const char *title; - - title = bpf_program__section_name(prog); - pr_debug("Failed to generate prologue for program %s\n", - title); - return err; - } - - memcpy(&buf[prologue_cnt], orig_insns, - sizeof(struct bpf_insn) * orig_insns_cnt); - - res->new_insn_ptr = buf; - res->new_insn_cnt = prologue_cnt + orig_insns_cnt; - res->pfd = NULL; - return 0; - -errout: - pr_debug("Internal error in preproc_gen_prologue\n"); - return -BPF_LOADER_ERRNO__PROLOGUE; -} - -/* - * compare_tev_args is reflexive, transitive and antisymmetric. - * I can proof it but this margin is too narrow to contain. 
- */ -static int compare_tev_args(const void *ptev1, const void *ptev2) -{ - int i, ret; - const struct probe_trace_event *tev1 = - *(const struct probe_trace_event **)ptev1; - const struct probe_trace_event *tev2 = - *(const struct probe_trace_event **)ptev2; - - ret = tev2->nargs - tev1->nargs; - if (ret) - return ret; - - for (i = 0; i < tev1->nargs; i++) { - struct probe_trace_arg *arg1, *arg2; - struct probe_trace_arg_ref *ref1, *ref2; - - arg1 = &tev1->args[i]; - arg2 = &tev2->args[i]; - - ret = strcmp(arg1->value, arg2->value); - if (ret) - return ret; - - ref1 = arg1->ref; - ref2 = arg2->ref; - - while (ref1 && ref2) { - ret = ref2->offset - ref1->offset; - if (ret) - return ret; - - ref1 = ref1->next; - ref2 = ref2->next; - } - - if (ref1 || ref2) - return ref2 ? 1 : -1; - } - - return 0; -} - -/* - * Assign a type number to each tevs in a pev. - * mapping is an array with same slots as tevs in that pev. - * nr_types will be set to number of types. - */ -static int map_prologue(struct perf_probe_event *pev, int *mapping, - int *nr_types) -{ - int i, type = 0; - struct probe_trace_event **ptevs; - - size_t array_sz = sizeof(*ptevs) * pev->ntevs; - - ptevs = malloc(array_sz); - if (!ptevs) { - pr_debug("Not enough memory: alloc ptevs failed\n"); - return -ENOMEM; - } - - pr_debug("In map_prologue, ntevs=%d\n", pev->ntevs); - for (i = 0; i < pev->ntevs; i++) - ptevs[i] = &pev->tevs[i]; - - qsort(ptevs, pev->ntevs, sizeof(*ptevs), - compare_tev_args); - - for (i = 0; i < pev->ntevs; i++) { - int n; - - n = ptevs[i] - pev->tevs; - if (i == 0) { - mapping[n] = type; - pr_debug("mapping[%d]=%d\n", n, type); - continue; - } - - if (compare_tev_args(ptevs + i, ptevs + i - 1) == 0) - mapping[n] = type; - else - mapping[n] = ++type; - - pr_debug("mapping[%d]=%d\n", n, mapping[n]); - } - free(ptevs); - *nr_types = type + 1; - - return 0; -} - -static int hook_load_preprocessor(struct bpf_program *prog) -{ - struct bpf_prog_priv *priv = bpf_program__priv(prog); - 
struct perf_probe_event *pev; - bool need_prologue = false; - int err, i; - - if (IS_ERR_OR_NULL(priv)) { - pr_debug("Internal error when hook preprocessor\n"); - return -BPF_LOADER_ERRNO__INTERNAL; - } - - if (priv->is_tp) { - priv->need_prologue = false; - return 0; - } - - pev = &priv->pev; - for (i = 0; i < pev->ntevs; i++) { - struct probe_trace_event *tev = &pev->tevs[i]; - - if (tev->nargs > 0) { - need_prologue = true; - break; - } - } - - /* - * Since all tevs don't have argument, we don't need generate - * prologue. - */ - if (!need_prologue) { - priv->need_prologue = false; - return 0; - } - - priv->need_prologue = true; - priv->insns_buf = malloc(sizeof(struct bpf_insn) * BPF_MAXINSNS); - if (!priv->insns_buf) { - pr_debug("Not enough memory: alloc insns_buf failed\n"); - return -ENOMEM; - } - - priv->type_mapping = malloc(sizeof(int) * pev->ntevs); - if (!priv->type_mapping) { - pr_debug("Not enough memory: alloc type_mapping failed\n"); - return -ENOMEM; - } - memset(priv->type_mapping, -1, - sizeof(int) * pev->ntevs); - - err = map_prologue(pev, priv->type_mapping, &priv->nr_types); - if (err) - return err; - - err = bpf_program__set_prep(prog, priv->nr_types, - preproc_gen_prologue); - return err; -} - int bpf__probe(struct bpf_object *obj) { int err = 0; @@ -672,18 +451,6 @@ int bpf__probe(struct bpf_object *obj) pr_debug("bpf_probe: failed to apply perf probe events\n"); goto out; } - - /* - * After probing, let's consider prologue, which - * adds program fetcher to BPF programs. - * - * hook_load_preprocessor() hooks pre-processor - * to bpf_program, let it generate prologue - * dynamically during loading. - */ - err = hook_load_preprocessor(prog); - if (err) - goto out; } out: return err < 0 ? 
err : 0; @@ -776,14 +543,7 @@ int bpf__foreach_event(struct bpf_object *obj, for (i = 0; i < pev->ntevs; i++) { tev = &pev->tevs[i]; - if (priv->need_prologue) { - int type = priv->type_mapping[i]; - - fd = bpf_program__nth_fd(prog, type); - } else { - fd = bpf_program__fd(prog); - } - + fd = bpf_program__fd(prog); if (fd < 0) { pr_debug("bpf: failed to get file descriptor\n"); return fd; diff --git a/tools/perf/util/bpf-prologue.c b/tools/perf/util/bpf-prologue.c deleted file mode 100644 index 9887ae09242d..000000000000 --- a/tools/perf/util/bpf-prologue.c +++ /dev/null @@ -1,508 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * bpf-prologue.c - * - * Copyright (C) 2015 He Kuang <hekuang@huawei.com> - * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> - * Copyright (C) 2015 Huawei Inc. - */ - -#include <bpf/libbpf.h> -#include "debug.h" -#include "bpf-loader.h" -#include "bpf-prologue.h" -#include "probe-finder.h" -#include <errno.h> -#include <stdlib.h> -#include <dwarf-regs.h> -#include <linux/filter.h> - -#define BPF_REG_SIZE 8 - -#define JMP_TO_ERROR_CODE -1 -#define JMP_TO_SUCCESS_CODE -2 -#define JMP_TO_USER_CODE -3 - -struct bpf_insn_pos { - struct bpf_insn *begin; - struct bpf_insn *end; - struct bpf_insn *pos; -}; - -static inline int -pos_get_cnt(struct bpf_insn_pos *pos) -{ - return pos->pos - pos->begin; -} - -static int -append_insn(struct bpf_insn new_insn, struct bpf_insn_pos *pos) -{ - if (!pos->pos) - return -BPF_LOADER_ERRNO__PROLOGUE2BIG; - - if (pos->pos + 1 >= pos->end) { - pr_err("bpf prologue: prologue too long\n"); - pos->pos = NULL; - return -BPF_LOADER_ERRNO__PROLOGUE2BIG; - } - - *(pos->pos)++ = new_insn; - return 0; -} - -static int -check_pos(struct bpf_insn_pos *pos) -{ - if (!pos->pos || pos->pos >= pos->end) - return -BPF_LOADER_ERRNO__PROLOGUE2BIG; - return 0; -} - -/* - * Convert type string (u8/u16/u32/u64/s8/s16/s32/s64 ..., see - * Documentation/trace/kprobetrace.rst) to size field of BPF_LDX_MEM - * instruction 
(BPF_{B,H,W,DW}). - */ -static int -argtype_to_ldx_size(const char *type) -{ - int arg_size = type ? atoi(&type[1]) : 64; - - switch (arg_size) { - case 8: - return BPF_B; - case 16: - return BPF_H; - case 32: - return BPF_W; - case 64: - default: - return BPF_DW; - } -} - -static const char * -insn_sz_to_str(int insn_sz) -{ - switch (insn_sz) { - case BPF_B: - return "BPF_B"; - case BPF_H: - return "BPF_H"; - case BPF_W: - return "BPF_W"; - case BPF_DW: - return "BPF_DW"; - default: - return "UNKNOWN"; - } -} - -/* Give it a shorter name */ -#define ins(i, p) append_insn((i), (p)) - -/* - * Give a register name (in 'reg'), generate instruction to - * load register into an eBPF register rd: - * 'ldd target_reg, offset(ctx_reg)', where: - * ctx_reg is pre initialized to pointer of 'struct pt_regs'. - */ -static int -gen_ldx_reg_from_ctx(struct bpf_insn_pos *pos, int ctx_reg, - const char *reg, int target_reg) -{ - int offset = regs_query_register_offset(reg); - - if (offset < 0) { - pr_err("bpf: prologue: failed to get register %s\n", - reg); - return offset; - } - ins(BPF_LDX_MEM(BPF_DW, target_reg, ctx_reg, offset), pos); - - return check_pos(pos); -} - -/* - * Generate a BPF_FUNC_probe_read function call. 
- * - * src_base_addr_reg is a register holding base address, - * dst_addr_reg is a register holding dest address (on stack), - * result is: - * - * *[dst_addr_reg] = *([src_base_addr_reg] + offset) - * - * Arguments of BPF_FUNC_probe_read: - * ARG1: ptr to stack (dest) - * ARG2: size (8) - * ARG3: unsafe ptr (src) - */ -static int -gen_read_mem(struct bpf_insn_pos *pos, - int src_base_addr_reg, - int dst_addr_reg, - long offset, - int probeid) -{ - /* mov arg3, src_base_addr_reg */ - if (src_base_addr_reg != BPF_REG_ARG3) - ins(BPF_MOV64_REG(BPF_REG_ARG3, src_base_addr_reg), pos); - /* add arg3, #offset */ - if (offset) - ins(BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG3, offset), pos); - - /* mov arg2, #reg_size */ - ins(BPF_ALU64_IMM(BPF_MOV, BPF_REG_ARG2, BPF_REG_SIZE), pos); - - /* mov arg1, dst_addr_reg */ - if (dst_addr_reg != BPF_REG_ARG1) - ins(BPF_MOV64_REG(BPF_REG_ARG1, dst_addr_reg), pos); - - /* Call probe_read */ - ins(BPF_EMIT_CALL(probeid), pos); - /* - * Error processing: if read fail, goto error code, - * will be relocated. Target should be the start of - * error processing code. - */ - ins(BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, JMP_TO_ERROR_CODE), - pos); - - return check_pos(pos); -} - -/* - * Each arg should be bare register. Fetch and save them into argument - * registers (r3 - r5). - * - * BPF_REG_1 should have been initialized with pointer to - * 'struct pt_regs'. - */ -static int -gen_prologue_fastpath(struct bpf_insn_pos *pos, - struct probe_trace_arg *args, int nargs) -{ - int i, err = 0; - - for (i = 0; i < nargs; i++) { - err = gen_ldx_reg_from_ctx(pos, BPF_REG_1, args[i].value, - BPF_PROLOGUE_START_ARG_REG + i); - if (err) - goto errout; - } - - return check_pos(pos); -errout: - return err; -} - -/* - * Slow path: - * At least one argument has the form of 'offset($rx)'. - * - * Following code first stores them into stack, then loads all of then - * to r2 - r5. 
- * Before final loading, the final result should be: - * - * low address - * BPF_REG_FP - 24 ARG3 - * BPF_REG_FP - 16 ARG2 - * BPF_REG_FP - 8 ARG1 - * BPF_REG_FP - * high address - * - * For each argument (described as: offn(...off2(off1(reg)))), - * generates following code: - * - * r7 <- fp - * r7 <- r7 - stack_offset // Ideal code should initialize r7 using - * // fp before generating args. However, - * // eBPF won't regard r7 as stack pointer - * // if it is generated by minus 8 from - * // another stack pointer except fp. - * // This is why we have to set r7 - * // to fp for each variable. - * r3 <- value of 'reg'-> generated using gen_ldx_reg_from_ctx() - * (r7) <- r3 // skip following instructions for bare reg - * r3 <- r3 + off1 . // skip if off1 == 0 - * r2 <- 8 \ - * r1 <- r7 |-> generated by gen_read_mem() - * call probe_read / - * jnei r0, 0, err ./ - * r3 <- (r7) - * r3 <- r3 + off2 . // skip if off2 == 0 - * r2 <- 8 \ // r2 may be broken by probe_read, so set again - * r1 <- r7 |-> generated by gen_read_mem() - * call probe_read / - * jnei r0, 0, err ./ - * ... - */ -static int -gen_prologue_slowpath(struct bpf_insn_pos *pos, - struct probe_trace_arg *args, int nargs) -{ - int err, i, probeid; - - for (i = 0; i < nargs; i++) { - struct probe_trace_arg *arg = &args[i]; - const char *reg = arg->value; - struct probe_trace_arg_ref *ref = NULL; - int stack_offset = (i + 1) * -8; - - pr_debug("prologue: fetch arg %d, base reg is %s\n", - i, reg); - - /* value of base register is stored into ARG3 */ - err = gen_ldx_reg_from_ctx(pos, BPF_REG_CTX, reg, - BPF_REG_ARG3); - if (err) { - pr_err("prologue: failed to get offset of register %s\n", - reg); - goto errout; - } - - /* Make r7 the stack pointer. */ - ins(BPF_MOV64_REG(BPF_REG_7, BPF_REG_FP), pos); - /* r7 += -8 */ - ins(BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, stack_offset), pos); - /* - * Store r3 (base register) onto stack - * Ensure fp[offset] is set. 
-		 * fp is the only valid base register when storing
-		 * into stack. We are not allowed to use r7 as base
-		 * register here.
-		 */
-		ins(BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_ARG3,
-				stack_offset), pos);
-
-		ref = arg->ref;
-		probeid = BPF_FUNC_probe_read_kernel;
-		while (ref) {
-			pr_debug("prologue: arg %d: offset %ld\n",
-				 i, ref->offset);
-
-			if (ref->user_access)
-				probeid = BPF_FUNC_probe_read_user;
-
-			err = gen_read_mem(pos, BPF_REG_3, BPF_REG_7,
-					   ref->offset, probeid);
-			if (err) {
-				pr_err("prologue: failed to generate probe_read function call\n");
-				goto errout;
-			}
-
-			ref = ref->next;
-			/*
-			 * Load previous result into ARG3. Use
-			 * BPF_REG_FP instead of r7 because verifier
-			 * allows FP based addressing only.
-			 */
-			if (ref)
-				ins(BPF_LDX_MEM(BPF_DW, BPF_REG_ARG3,
-						BPF_REG_FP, stack_offset), pos);
-		}
-	}
-
-	/* Final pass: read to registers */
-	for (i = 0; i < nargs; i++) {
-		int insn_sz = (args[i].ref) ? argtype_to_ldx_size(args[i].type) : BPF_DW;
-
-		pr_debug("prologue: load arg %d, insn_sz is %s\n",
-			 i, insn_sz_to_str(insn_sz));
-		ins(BPF_LDX_MEM(insn_sz, BPF_PROLOGUE_START_ARG_REG + i,
-				BPF_REG_FP, -BPF_REG_SIZE * (i + 1)), pos);
-	}
-
-	ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_SUCCESS_CODE), pos);
-
-	return check_pos(pos);
-errout:
-	return err;
-}
-
-static int
-prologue_relocate(struct bpf_insn_pos *pos, struct bpf_insn *error_code,
-		  struct bpf_insn *success_code, struct bpf_insn *user_code)
-{
-	struct bpf_insn *insn;
-
-	if (check_pos(pos))
-		return -BPF_LOADER_ERRNO__PROLOGUE2BIG;
-
-	for (insn = pos->begin; insn < pos->pos; insn++) {
-		struct bpf_insn *target;
-		u8 class = BPF_CLASS(insn->code);
-		u8 opcode;
-
-		if (class != BPF_JMP)
-			continue;
-		opcode = BPF_OP(insn->code);
-		if (opcode == BPF_CALL)
-			continue;
-
-		switch (insn->off) {
-		case JMP_TO_ERROR_CODE:
-			target = error_code;
-			break;
-		case JMP_TO_SUCCESS_CODE:
-			target = success_code;
-			break;
-		case JMP_TO_USER_CODE:
-			target = user_code;
-			break;
-		default:
-			pr_err("bpf prologue: internal error: relocation failed\n");
-			return -BPF_LOADER_ERRNO__PROLOGUE;
-		}
-
-		insn->off = target - (insn + 1);
-	}
-	return 0;
-}
-
-int bpf__gen_prologue(struct probe_trace_arg *args, int nargs,
-		      struct bpf_insn *new_prog, size_t *new_cnt,
-		      size_t cnt_space)
-{
-	struct bpf_insn *success_code = NULL;
-	struct bpf_insn *error_code = NULL;
-	struct bpf_insn *user_code = NULL;
-	struct bpf_insn_pos pos;
-	bool fastpath = true;
-	int err = 0, i;
-
-	if (!new_prog || !new_cnt)
-		return -EINVAL;
-
-	if (cnt_space > BPF_MAXINSNS)
-		cnt_space = BPF_MAXINSNS;
-
-	pos.begin = new_prog;
-	pos.end = new_prog + cnt_space;
-	pos.pos = new_prog;
-
-	if (!nargs) {
-		ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 0),
-		    &pos);
-
-		if (check_pos(&pos))
-			goto errout;
-
-		*new_cnt = pos_get_cnt(&pos);
-		return 0;
-	}
-
-	if (nargs > BPF_PROLOGUE_MAX_ARGS) {
-		pr_warning("bpf: prologue: %d arguments are dropped\n",
-			   nargs - BPF_PROLOGUE_MAX_ARGS);
-		nargs = BPF_PROLOGUE_MAX_ARGS;
-	}
-
-	/* First pass: validation */
-	for (i = 0; i < nargs; i++) {
-		struct probe_trace_arg_ref *ref = args[i].ref;
-
-		if (args[i].value[0] == '@') {
-			/* TODO: fetch global variable */
-			pr_err("bpf: prologue: global %s%+ld not support\n",
-				args[i].value, ref ? ref->offset : 0);
-			return -ENOTSUP;
-		}
-
-		while (ref) {
-			/* fastpath is true if all args has ref == NULL */
-			fastpath = false;
-
-			/*
-			 * Instruction encodes immediate value using
-			 * s32, ref->offset is long. On systems which
-			 * can't fill long in s32, refuse to process if
-			 * ref->offset too large (or small).
-			 */
-#ifdef __LP64__
-#define OFFSET_MAX	((1LL << 31) - 1)
-#define OFFSET_MIN	((1LL << 31) * -1)
-			if (ref->offset > OFFSET_MAX ||
-					ref->offset < OFFSET_MIN) {
-				pr_err("bpf: prologue: offset out of bound: %ld\n",
-				       ref->offset);
-				return -BPF_LOADER_ERRNO__PROLOGUEOOB;
-			}
-#endif
-			ref = ref->next;
-		}
-	}
-	pr_debug("prologue: pass validation\n");
-
-	if (fastpath) {
-		/* If all variables are registers... */
-		pr_debug("prologue: fast path\n");
-		err = gen_prologue_fastpath(&pos, args, nargs);
-		if (err)
-			goto errout;
-	} else {
-		pr_debug("prologue: slow path\n");
-
-		/* Initialization: move ctx to a callee saved register. */
-		ins(BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1), &pos);
-
-		err = gen_prologue_slowpath(&pos, args, nargs);
-		if (err)
-			goto errout;
-		/*
-		 * start of ERROR_CODE (only slow pass needs error code)
-		 *   mov r2 <- 1  // r2 is error number
-		 *   mov r3 <- 0  // r3, r4... should be touched or
-		 *                // verifier would complain
-		 *   mov r4 <- 0
-		 *   ...
-		 *   goto usercode
-		 */
-		error_code = pos.pos;
-		ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 1),
-		    &pos);
-
-		for (i = 0; i < nargs; i++)
-			ins(BPF_ALU64_IMM(BPF_MOV,
-					  BPF_PROLOGUE_START_ARG_REG + i,
-					  0),
-			    &pos);
-		ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_USER_CODE),
-				&pos);
-	}
-
-	/*
-	 * start of SUCCESS_CODE:
-	 *   mov r2 <- 0
-	 *   goto usercode  // skip
-	 */
-	success_code = pos.pos;
-	ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 0), &pos);
-
-	/*
-	 * start of USER_CODE:
-	 *   Restore ctx to r1
-	 */
-	user_code = pos.pos;
-	if (!fastpath) {
-		/*
-		 * Only slow path needs restoring of ctx. In fast path,
-		 * register are loaded directly from r1.
-		 */
-		ins(BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_CTX), &pos);
-		err = prologue_relocate(&pos, error_code, success_code,
-					user_code);
-		if (err)
-			goto errout;
-	}
-
-	err = check_pos(&pos);
-	if (err)
-		goto errout;
-
-	*new_cnt = pos_get_cnt(&pos);
-	return 0;
-errout:
-	return err;
-}
diff --git a/tools/perf/util/bpf-prologue.h b/tools/perf/util/bpf-prologue.h
deleted file mode 100644
index c50c7358009f..000000000000
--- a/tools/perf/util/bpf-prologue.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (C) 2015, He Kuang <hekuang@huawei.com>
- * Copyright (C) 2015, Huawei Inc.
- */
-#ifndef __BPF_PROLOGUE_H
-#define __BPF_PROLOGUE_H
-
-#include <linux/compiler.h>
-#include <linux/filter.h>
-#include "probe-event.h"
-
-#define BPF_PROLOGUE_MAX_ARGS 3
-#define BPF_PROLOGUE_START_ARG_REG BPF_REG_3
-#define BPF_PROLOGUE_FETCH_RESULT_REG BPF_REG_2
-
-#ifdef HAVE_BPF_PROLOGUE
-int bpf__gen_prologue(struct probe_trace_arg *args, int nargs,
-		      struct bpf_insn *new_prog, size_t *new_cnt,
-		      size_t cnt_space);
-#else
-#include <errno.h>
-
-static inline int
-bpf__gen_prologue(struct probe_trace_arg *args __maybe_unused,
-		  int nargs __maybe_unused,
-		  struct bpf_insn *new_prog __maybe_unused,
-		  size_t *new_cnt,
-		  size_t cnt_space __maybe_unused)
-{
-	if (!new_cnt)
-		return -EINVAL;
-	*new_cnt = 0;
-	return -ENOTSUP;
-}
-#endif
-#endif /* __BPF_PROLOGUE_H */
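For readers skimming the removed file: prologue_relocate() fixes up jumps that were emitted with placeholder offsets (JMP_TO_ERROR_CODE and friends) once the final positions of the error, success and user code are known. The following is only a simplified sketch of that idea, not the removed code itself. The sketch_* names, the struct layout and the placeholder values are assumptions made for illustration; the real code works on struct bpf_insn and inspects BPF_CLASS()/BPF_OP().

```c
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for illustration only.
 * The placeholder values below are assumed, not taken from the patch.
 */
enum {
	SKETCH_JMP_TO_ERROR   = -1,
	SKETCH_JMP_TO_SUCCESS = -2,
	SKETCH_JMP_TO_USER    = -3,
};

struct sketch_insn {
	int is_jump;	/* non-zero for a jump that needs fixing up */
	int16_t off;	/* placeholder before, relative offset after */
};

/* Returns 0 on success, -1 if a jump carries an unknown placeholder. */
static int sketch_relocate(struct sketch_insn *begin, struct sketch_insn *end,
			   struct sketch_insn *error_code,
			   struct sketch_insn *success_code,
			   struct sketch_insn *user_code)
{
	struct sketch_insn *insn;

	for (insn = begin; insn < end; insn++) {
		struct sketch_insn *target;

		if (!insn->is_jump)
			continue;

		switch (insn->off) {
		case SKETCH_JMP_TO_ERROR:
			target = error_code;
			break;
		case SKETCH_JMP_TO_SUCCESS:
			target = success_code;
			break;
		case SKETCH_JMP_TO_USER:
			target = user_code;
			break;
		default:
			return -1;
		}

		/* Offsets are relative to the instruction after the jump. */
		insn->off = (int16_t)(target - (insn + 1));
	}
	return 0;
}
```

A caller would build the instruction array first, record where each of the three blocks starts, and only then run the fix-up pass, which is the same order bpf__gen_prologue() follows in the diff above.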
Removing code for ebpf program prologue generation.

The prologue code was used to get data for extra arguments specified
in program section name, like:

  SEC("lock_page=__lock_page page->flags")
  int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
  {
         return 1;
  }

This code is using deprecated libbpf API and blocks its removal.

This feature was not documented and broken for some time without
anyone complaining, also original authors are not responding,
so I'm removing it.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Makefile.config     |  11 -
 tools/perf/builtin-record.c    |  14 -
 tools/perf/util/bpf-loader.c   | 242 +---------------
 tools/perf/util/bpf-prologue.c | 508 ---------------------------------
 tools/perf/util/bpf-prologue.h |  37 ---
 5 files changed, 1 insertion(+), 811 deletions(-)
 delete mode 100644 tools/perf/util/bpf-prologue.c
 delete mode 100644 tools/perf/util/bpf-prologue.h
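As a rough illustration of the section-name convention in the example above, the string can be read as "event=function" followed by optional argument expressions. The helper below is a hypothetical sketch of that split and is not perf's actual parser; the sketch_sec struct, the function name and the buffer sizes are all assumptions made for this example.

```c
#include <string.h>

/* Hypothetical container for the pieces of a section name. */
struct sketch_sec {
	char event[64];	/* text before '=' */
	char func[64];	/* probed function, up to the first space */
	char args[128];	/* remaining argument expressions, may be empty */
};

/* Returns 0 on success, -1 on malformed or oversized input. */
static int sketch_parse_sec(const char *sec, struct sketch_sec *out)
{
	const char *eq = strchr(sec, '=');
	const char *sp;
	size_t n;

	if (!eq || eq == sec)
		return -1;

	/* Event name: everything before '='. */
	n = (size_t)(eq - sec);
	if (n >= sizeof(out->event))
		return -1;
	memcpy(out->event, sec, n);
	out->event[n] = '\0';

	/* Function name: from after '=' to the first space, if any. */
	eq++;
	sp = strchr(eq, ' ');
	n = sp ? (size_t)(sp - eq) : strlen(eq);
	if (n == 0 || n >= sizeof(out->func))
		return -1;
	memcpy(out->func, eq, n);
	out->func[n] = '\0';

	/* Argument expressions: whatever follows the first space. */
	out->args[0] = '\0';
	if (sp) {
		sp++;
		if (strlen(sp) >= sizeof(out->args))
			return -1;
		strcpy(out->args, sp);
	}
	return 0;
}
```

For the section name quoted in the commit message, this would yield the event "lock_page", the function "__lock_page" and the argument expression "page->flags", which is the value the removed prologue tried to fetch into the extra program argument.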