[RFC,00/13] kprobe/bpf: Add support to attach multiple kprobes

Message ID 20220104080943.113249-1-jolsa@kernel.org
Series kprobe/bpf: Add support to attach multiple kprobes

Message

Jiri Olsa Jan. 4, 2022, 8:09 a.m. UTC
hi,
adding support to attach multiple kprobes within single syscall
and speed up attachment of many kprobes.

The previous attempt [1] wasn't fast enough, so coming with new
approach that adds new kprobe interface.

The attachment speed of this approach (tested in bpftrace)
is now comparable to ftrace tracer attachment speed.. fast ;-)

The limit of this approach is forced by using ftrace as attach
layer, so it allows only kprobes on function's entry (plus
return probes).

This patchset contains:
  - kprobes support to register multiple kprobes with current
    kprobe API (patches 1 - 7)
  - bpf support to create new kprobe link allowing to attach
    multiple addresses (patches 8 - 13)

We don't need to care about multiple probes on the same functions
because it's taken care of at the ftrace_ops layer.

Also available at:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  kprobe/multi

thanks,
jirka

[1] https://lore.kernel.org/bpf/20211124084119.260239-1-jolsa@kernel.org/


---
Jiri Olsa (13):
      ftrace: Add ftrace_set_filter_ips function
      kprobe: Keep traced function address
      kprobe: Add support to register multiple ftrace kprobes
      kprobe: Add support to register multiple ftrace kretprobes
      kprobe: Allow to get traced function address for multi ftrace kprobes
      samples/kprobes: Add support for multi kprobe interface
      samples/kprobes: Add support for multi kretprobe interface
      bpf: Add kprobe link for attaching raw kprobes
      libbpf: Add libbpf__kallsyms_parse function
      libbpf: Add bpf_link_create support for multi kprobes
      libbpf: Add bpf_program__attach_kprobe_opts for multi kprobes
      selftest/bpf: Add raw kprobe attach test
      selftest/bpf: Add bpf_cookie test for raw_k[ret]probe

 arch/Kconfig                                             |   3 ++
 arch/x86/Kconfig                                         |   1 +
 arch/x86/kernel/kprobes/ftrace.c                         |  51 +++++++++++++-----
 include/linux/bpf_types.h                                |   1 +
 include/linux/ftrace.h                                   |   3 ++
 include/linux/kprobes.h                                  |  55 ++++++++++++++++++++
 include/uapi/linux/bpf.h                                 |  12 +++++
 kernel/bpf/syscall.c                                     | 191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/kprobes.c                                         | 264 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 kernel/trace/bpf_trace.c                                 |   7 ++-
 kernel/trace/ftrace.c                                    |  53 +++++++++++++++----
 samples/kprobes/kprobe_example.c                         |  47 +++++++++++++++--
 samples/kprobes/kretprobe_example.c                      |  43 +++++++++++++++-
 tools/include/uapi/linux/bpf.h                           |  12 +++++
 tools/lib/bpf/bpf.c                                      |   5 ++
 tools/lib/bpf/bpf.h                                      |   7 ++-
 tools/lib/bpf/libbpf.c                                   | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 tools/lib/bpf/libbpf_internal.h                          |   5 ++
 tools/testing/selftests/bpf/prog_tests/bpf_cookie.c      |  42 +++++++++++++++
 tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c |  92 +++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/get_func_ip_test.c     |   4 +-
 tools/testing/selftests/bpf/progs/raw_kprobe.c           |  58 +++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_bpf_cookie.c      |  24 ++++++++-
 23 files changed, 1062 insertions(+), 104 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/raw_kprobe.c

Comments

Alexei Starovoitov Jan. 4, 2022, 6:53 p.m. UTC | #1
On Tue, Jan 4, 2022 at 12:09 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> hi,
> adding support to attach multiple kprobes within single syscall
> and speed up attachment of many kprobes.
>
> The previous attempt [1] wasn't fast enough, so coming with new
> approach that adds new kprobe interface.
>
> The attachment speed of this approach (tested in bpftrace)
> is now comparable to ftrace tracer attachment speed.. fast ;-)

What are the absolute numbers?
How quickly a single bpf prog can attach to 1k kprobes?
Jiri Olsa Jan. 5, 2022, 9:15 a.m. UTC | #2
On Tue, Jan 04, 2022 at 10:53:19AM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 4, 2022 at 12:09 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > hi,
> > adding support to attach multiple kprobes within single syscall
> > and speed up attachment of many kprobes.
> >
> > The previous attempt [1] wasn't fast enough, so coming with new
> > approach that adds new kprobe interface.
> >
> > The attachment speed of this approach (tested in bpftrace)
> > is now comparable to ftrace tracer attachment speed.. fast ;-)
> 
> What are the absolute numbers?
> How quickly a single bpf prog can attach to 1k kprobes?
> 

I'd need to write a special tool for exactly 1k kprobes,
we could do a benchmark selftest for that

I tested following counts with current bpftrace interface for now
(note it includes both attach and detach)


2 seconds for 673 kprobes:

	# perf stat -e cycles:u,cycles:k ./src/bpftrace  -e 'kprobe:kvm* {  } i:ms:10 { printf("KRAVA\n"); exit() }' 
	Attaching 2 probes...
	Attaching 673 functions
	KRAVA


	 Performance counter stats for './src/bpftrace -e kprobe:kvm* {  } i:ms:10 { printf("KRAVA\n"); exit() }':

	     1,695,142,901      cycles:u                                                    
	     1,909,616,944      cycles:k                                                    

	       1.990434019 seconds time elapsed

	       0.767746000 seconds user
	       0.921166000 seconds sys


5 seconds for 3337 kprobes:

	# perf stat -e cycles:u,cycles:k ./src/bpftrace  -e 'kprobe:x* {  } i:ms:10 { printf("KRAVA\n"); exit() }' 
	Attaching 2 probes...
	Attaching 3337 functions
	KRAVA


	 Performance counter stats for './src/bpftrace -e kprobe:x* {  } i:ms:10 { printf("KRAVA\n"); exit() }':

	     1,731,646,061      cycles:u                                                    
	     9,815,306,940      cycles:k                                                    

	       5.196176904 seconds time elapsed

	       0.780508000 seconds user
	       4.078170000 seconds sys


a lot of the time above is spent in kallsyms:

	    42.70%  bpftrace  [kernel.kallsyms]     [k] kallsyms_expand_symbol.constprop.0
	     5.11%  bpftrace  [kernel.kallsyms]     [k] insn_get_prefixes.part.0
	     3.91%  bpftrace  [kernel.kallsyms]     [k] insn_decode
	     3.09%  bpftrace  [kernel.kallsyms]     [k] arch_jump_entry_size
	     1.98%  bpftrace  [kernel.kallsyms]     [k] __lock_acquire
	     1.51%  bpftrace  [kernel.kallsyms]     [k] static_call_text_reserved


by checking if the address is on the kprobe blacklist:

	    42.70%  bpftrace  [kernel.kallsyms]     [k] kallsyms_expand_symbol.constprop.0
		    |
		    ---kallsyms_expand_symbol.constprop.0
		       |          
			--42.22%--kallsyms_lookup_name
				  within_kprobe_blacklist.part.0
				  check_kprobe_address
				  register_kprobe
				  bpf_kprobe_link_attach
				  __sys_bpf
				  __x64_sys_bpf
				  do_syscall_64
				  entry_SYSCALL_64_after_hwframe
				  syscall
				  bpftrace::AttachedProbe::attach_kprobe


I could revive that patch that did bsearch on kallsyms or we could
add 'do-not-check-kprobe-blacklist' unsafe mode to get more speed
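
The call graph above shows the cost coming from kallsyms_lookup_name(), which walks the symbol table, expanding each compressed name, until it finds a match. A minimal userspace sketch of the bsearch idea, assuming a copy of the symbol table sorted by name is available; struct sym and sorted_lookup_name() are hypothetical names for illustration, not the interface of the earlier patch:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry; not the kernel's compressed kallsyms format. */
struct sym {
	const char *name;
	unsigned long addr;
};

/* bsearch() comparator: the key is a symbol name, each element a struct sym. */
static int cmp_sym_name(const void *key, const void *elem)
{
	const char *name = key;
	const struct sym *s = elem;

	return strcmp(name, s->name);
}

/*
 * O(log n) name -> address lookup, assuming @syms is sorted by name in
 * strcmp() order. Returns 0 when the name is not found, like
 * kallsyms_lookup_name(). kallsyms can contain duplicate names; this
 * returns whichever match bsearch() lands on.
 */
static unsigned long sorted_lookup_name(const struct sym *syms, size_t n,
					const char *name)
{
	const struct sym *s = bsearch(name, syms, n, sizeof(*syms), cmp_sym_name);

	return s ? s->addr : 0;
}
```

The trade-off is keeping a second, name-sorted index alive next to kallsyms, in exchange for turning each blacklist check into a logarithmic lookup.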

jirka
Masami Hiramatsu (Google) Jan. 5, 2022, 3:24 p.m. UTC | #3
On Tue,  4 Jan 2022 09:09:30 +0100
Jiri Olsa <jolsa@redhat.com> wrote:

> hi,
> adding support to attach multiple kprobes within single syscall
> and speed up attachment of many kprobes.
> 
> The previous attempt [1] wasn't fast enough, so coming with new
> approach that adds new kprobe interface.

Yes, since register_kprobes() just registers multiple kprobes on
array. This is designed for dozens of kprobes.

> The attachment speed of this approach (tested in bpftrace)
> is now comparable to ftrace tracer attachment speed.. fast ;-)

Yes, because that is ftrace, not kprobes.

> The limit of this approach is forced by using ftrace as attach
> layer, so it allows only kprobes on function's entry (plus
> return probes).

Note that you also need to multiply the number of instances.

> 
> This patchset contains:
>   - kprobes support to register multiple kprobes with current
>     kprobe API (patches 1 - 7)
>   - bpf support to create new kprobe link allowing to attach
>     multiple addresses (patches 8 - 13)
> 
> We don't need to care about multiple probes on the same functions
> because it's taken care of at the ftrace_ops layer.

Hmm, I think there may be a time to split the "kprobe as an 
interface for the software breakpoint" and "kprobe as a wrapper
interface for the callbacks of various instrumentations", like
'raw_kprobe'(or kswbp) and 'kprobes'.
And this may be called as 'fprobe' as ftrace_ops wrapper.
(But if the bpf is enough flexible, this kind of intermediate layer
 may not be needed, it can use ftrace_ops directly, eventually)

Jiri, have you already considered to use ftrace_ops from the
bpf directly? Are there any issues?
(bpf depends on 'kprobe' widely?)

Thank you,
Jiri Olsa Jan. 6, 2022, 8:29 a.m. UTC | #4
On Thu, Jan 06, 2022 at 12:24:35AM +0900, Masami Hiramatsu wrote:
> On Tue,  4 Jan 2022 09:09:30 +0100
> Jiri Olsa <jolsa@redhat.com> wrote:
> 
> > hi,
> > adding support to attach multiple kprobes within single syscall
> > and speed up attachment of many kprobes.
> > 
> > The previous attempt [1] wasn't fast enough, so coming with new
> > approach that adds new kprobe interface.
> 
> Yes, since register_kprobes() just registers multiple kprobes on
> array. This is designed for dozens of kprobes.
> 
> > The attachment speed of this approach (tested in bpftrace)
> > is now comparable to ftrace tracer attachment speed.. fast ;-)
> 
> Yes, because that is ftrace, not kprobes.
> 
> > The limit of this approach is forced by using ftrace as attach
> > layer, so it allows only kprobes on function's entry (plus
> > return probes).
> 
> Note that you also need to multiply the number of instances.
> 
> > 
> > This patchset contains:
> >   - kprobes support to register multiple kprobes with current
> >     kprobe API (patches 1 - 7)
> >   - bpf support to create new kprobe link allowing to attach
> >     multiple addresses (patches 8 - 13)
> > 
> > We don't need to care about multiple probes on the same functions
> > because it's taken care of at the ftrace_ops layer.
> 
> Hmm, I think there may be a time to split the "kprobe as an 
> interface for the software breakpoint" and "kprobe as a wrapper
> interface for the callbacks of various instrumentations", like
> 'raw_kprobe'(or kswbp) and 'kprobes'.
> And this may be called as 'fprobe' as ftrace_ops wrapper.
> (But if the bpf is enough flexible, this kind of intermediate layer
>  may not be needed, it can use ftrace_ops directly, eventually)
> 
> Jiri, have you already considered to use ftrace_ops from the
> bpf directly? Are there any issues?
> (bpf depends on 'kprobe' widely?)

at the moment there's no ftrace public interface for the return
probe merged in, so to get the kretprobe working I had to use
kprobe interface

but.. there are patches Steven shared some time ago, that do that
and make graph_ops available as kernel interface

I recall we considered graph_ops interface before as common attach
layer for trampolines, which was bad, but it might actually make
sense for kprobes

I'll need to check it in more details but I think both graph_ops and
kprobe do about similar thing wrt hooking return probe, so it should
be comparable.. and they are already doing the same for the entry hook,
because kprobe is mostly using ftrace for that

we would not need to introduce new program type - kprobe programs
should be able to run from ftrace callbacks just fine

so we would have:
  - kprobe type programs attaching to:
  - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer

jirka
Masami Hiramatsu (Google) Jan. 6, 2022, 1:59 p.m. UTC | #5
On Thu, 6 Jan 2022 09:29:02 +0100
Jiri Olsa <jolsa@redhat.com> wrote:

> On Thu, Jan 06, 2022 at 12:24:35AM +0900, Masami Hiramatsu wrote:
> > On Tue,  4 Jan 2022 09:09:30 +0100
> > Jiri Olsa <jolsa@redhat.com> wrote:
> > 
> > > hi,
> > > adding support to attach multiple kprobes within single syscall
> > > and speed up attachment of many kprobes.
> > > 
> > > The previous attempt [1] wasn't fast enough, so coming with new
> > > approach that adds new kprobe interface.
> > 
> > Yes, since register_kprobes() just registers multiple kprobes on
> > array. This is designed for dozens of kprobes.
> > 
> > > The attachment speed of this approach (tested in bpftrace)
> > > is now comparable to ftrace tracer attachment speed.. fast ;-)
> > 
> > Yes, because that is ftrace, not kprobes.
> > 
> > > The limit of this approach is forced by using ftrace as attach
> > > layer, so it allows only kprobes on function's entry (plus
> > > return probes).
> > 
> > Note that you also need to multiply the number of instances.
> > 
> > > 
> > > This patchset contains:
> > >   - kprobes support to register multiple kprobes with current
> > >     kprobe API (patches 1 - 7)
> > >   - bpf support to create new kprobe link allowing to attach
> > >     multiple addresses (patches 8 - 13)
> > > 
> > > We don't need to care about multiple probes on the same functions
> > > because it's taken care of at the ftrace_ops layer.
> > 
> > Hmm, I think there may be a time to split the "kprobe as an 
> > interface for the software breakpoint" and "kprobe as a wrapper
> > interface for the callbacks of various instrumentations", like
> > 'raw_kprobe'(or kswbp) and 'kprobes'.
> > And this may be called as 'fprobe' as ftrace_ops wrapper.
> > (But if the bpf is enough flexible, this kind of intermediate layer
> >  may not be needed, it can use ftrace_ops directly, eventually)
> > 
> > Jiri, have you already considered to use ftrace_ops from the
> > bpf directly? Are there any issues?
> > (bpf depends on 'kprobe' widely?)
> 
> at the moment there's no ftrace public interface for the return
> probe merged in, so to get the kretprobe working I had to use
> kprobe interface

Yeah, I found that too. We have to ask Steve to salvage it ;)

> but.. there are patches Steven shared some time ago, that do that
> and make graph_ops available as kernel interface
> 
> I recall we considered graph_ops interface before as common attach
> layer for trampolines, which was bad, but it might actually make
> sense for kprobes

I started working on making 'fprobe' which will provide multiple
function probe with similar interface of kprobes. See attached
patch. Then you can use it in bpf, maybe with a union like

union {
	struct kprobe kp;	// for function body
	struct fprobe fp;	// for function entry and return
};

At this moment, fprobe only supports entry_handler, but when we
re-start the generic graph_ops interface, it is easy to expand
to support exit_handler.
If this works, I think kretprobe can be phased out, since at that
moment, kprobe_event can replace it with the fprobe exit_handler.
(This is a benefit of decoupling the instrumentation layer from
the event layer. It can choose the best way without changing
user interface.)

> I'll need to check it in more details but I think both graph_ops and
> kprobe do about similar thing wrt hooking return probe, so it should
> be comparable.. and they are already doing the same for the entry hook,
> because kprobe is mostly using ftrace for that
> 
> we would not need to introduce new program type - kprobe programs
> should be able to run from ftrace callbacks just fine

That seems to bind your mind. The program type is just a programming
'model' of the bpf. You can choose the best implementation to provide
equal functionality. 'kprobe' in bpf is just a name that you call some
instrumentations which can probe kernel code.

Thank you,

> 
> so we would have:
>   - kprobe type programs attaching to:
>   - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
> 
> jirka
>
Jiri Olsa Jan. 6, 2022, 2:57 p.m. UTC | #6
On Thu, Jan 06, 2022 at 10:59:43PM +0900, Masami Hiramatsu wrote:

SNIP

> > > 
> > > Hmm, I think there may be a time to split the "kprobe as an 
> > > interface for the software breakpoint" and "kprobe as a wrapper
> > > interface for the callbacks of various instrumentations", like
> > > 'raw_kprobe'(or kswbp) and 'kprobes'.
> > > And this may be called as 'fprobe' as ftrace_ops wrapper.
> > > (But if the bpf is enough flexible, this kind of intermediate layer
> > >  may not be needed, it can use ftrace_ops directly, eventually)
> > > 
> > > Jiri, have you already considered to use ftrace_ops from the
> > > bpf directly? Are there any issues?
> > > (bpf depends on 'kprobe' widely?)
> > 
> > at the moment there's no ftrace public interface for the return
> > probe merged in, so to get the kretprobe working I had to use
> > kprobe interface
> 
> Yeah, I found that too. We have to ask Steve to salvage it ;)

I rebased those patches on upstream code about half a year ago,
so it should be easy to revive them

> 
> > but.. there are patches Steven shared some time ago, that do that
> > and make graph_ops available as kernel interface
> > 
> > I recall we considered graph_ops interface before as common attach
> > layer for trampolines, which was bad, but it might actually make
> > sense for kprobes
> 
> I started working on making 'fprobe' which will provide multiple
> function probe with similar interface of kprobes. See attached
> patch. Then you can use it in bpf, maybe with a union like
> 
> union {
> 	struct kprobe kp;	// for function body
> 	struct fprobe fp;	// for function entry and return
> };
> 
> At this moment, fprobe only supports entry_handler, but when we
> re-start the generic graph_ops interface, it is easy to expand
> to support exit_handler.
> If this works, I think kretprobe can be phased out, since at that
> moment, kprobe_event can replace it with the fprobe exit_handler.
> (This is a benefit of decoupling the instrumentation layer from
> the event layer. It can choose the best way without changing
> user interface.)
> 

I can resend out graph_ops patches if you want to base
it directly on that

> > I'll need to check it in more details but I think both graph_ops and
> > kprobe do about similar thing wrt hooking return probe, so it should
> > be comparable.. and they are already doing the same for the entry hook,
> > because kprobe is mostly using ftrace for that
> > 
> > we would not need to introduce new program type - kprobe programs
> > should be able to run from ftrace callbacks just fine
> 
> That seems to bind your mind. The program type is just a programming
> 'model' of the bpf. You can choose the best implementation to provide
> equal functionality. 'kprobe' in bpf is just a name that you call some
> instrumentations which can probe kernel code.

I don't want to introduce new type, there's some dependencies
in bpf verifier and helpers code we'd need to handle for that

I'm looking for solution for current kprobe bpf program type
to be registered for multiple addresses quickly

> 
> Thank you,
> 
> > 
> > so we would have:
> >   - kprobe type programs attaching to:
> >   - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
> > 
> > jirka
> > 
> 
> 
> -- 
> Masami Hiramatsu <mhiramat@kernel.org>

> From 269b86597c166d6d4c5dd564168237603533165a Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Thu, 6 Jan 2022 15:40:36 +0900
> Subject: [PATCH] fprobe: Add ftrace based probe APIs
> 
> The fprobe is a wrapper API for ftrace function tracer.
> Unlike kprobes, this probe only supports the function entry, but
> it can probe multiple functions by one fprobe. The usage is almost
> same as the kprobe, user will specify the function names by
> fprobe::syms, the number of syms by fprobe::nsyms, and the user
> handler by fprobe::handler.
> 
> struct fprobe fprobe = { 0 };
> const char *targets[] = {"func1", "func2", "func3"};
> 
> fprobe.handler = user_handler;
> fprobe.nsyms = ARRAY_SIZE(targets);
> fprobe.syms = targets;
> 
> ret = register_fprobe(&fprobe);
> ...
> 
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---
>  include/linux/fprobes.h |  52 ++++++++++++++++
>  kernel/trace/Kconfig    |  10 ++++
>  kernel/trace/Makefile   |   1 +
>  kernel/trace/fprobes.c  | 128 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 191 insertions(+)
>  create mode 100644 include/linux/fprobes.h
>  create mode 100644 kernel/trace/fprobes.c
> 
> diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h
> new file mode 100644
> index 000000000000..22db748bf491
> --- /dev/null
> +++ b/include/linux/fprobes.h
> @@ -0,0 +1,52 @@
> +#ifndef _LINUX_FPROBES_H
> +#define _LINUX_FPROBES_H
> +/* Simple ftrace probe wrapper */
> +
> +#include <linux/compiler.h>
> +#include <linux/ftrace.h>
> +
> +struct fprobe {
> +	const char		**syms;
> +	unsigned long		*addrs;

could you add an array of user data for each addr/sym?

SNIP

> +static int populate_func_addresses(struct fprobe *fp)
> +{
> +	unsigned int i;
> +
> +	fp->addrs = kmalloc(sizeof(void *) * fp->nsyms, GFP_KERNEL);
> +	if (!fp->addrs)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < fp->nsyms; i++) {
> +		fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> +		if (!fp->addrs[i]) {
> +			kfree(fp->addrs);
> +			fp->addrs = NULL;
> +			return -ENOENT;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * register_fprobe - Register fprobe to ftrace
> + * @fp: A fprobe data structure to be registered.
> + *
> + * This expects the user set @fp::syms or @fp::addrs (not both),
> + * @fp::nsyms (number of entries of @fp::syms or @fp::addrs) and
> + * @fp::handler. Other fields are initialized by this function.
> + */
> +int register_fprobe(struct fprobe *fp)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	if (!fp)
> +		return -EINVAL;
> +
> +	if (!fp->nsyms || (!fp->syms && !fp->addrs) || (fp->syms && fp->addrs))
> +		return -EINVAL;
> +
> +	if (fp->syms) {
> +		ret = populate_func_addresses(fp);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	fp->ftrace.func = fprobe_handler;
> +	fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;
> +
> +	for (i = 0; i < fp->nsyms; i++) {
> +		ret = ftrace_set_filter_ip(&fp->ftrace, fp->addrs[i], 0, 0);
> +		if (ret < 0)
> +			goto error;
> +	}

I introduced ftrace_set_filter_ips, because a loop like the one above was slow:
  https://lore.kernel.org/bpf/20211118112455.475349-4-jolsa@kernel.org/
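
To see why a per-address loop scales badly, here is a toy cost model, not kernel code. It assumes each ftrace_set_filter_ip() call copies the ops' current filter hash before adding the new address, so one call costs O(current size); the function names are hypothetical and other per-call costs (locking, validation) are ignored:

```c
#include <stddef.h>

/*
 * Toy model: count how many existing filter entries get copied.
 * One update per address, like calling ftrace_set_filter_ip() in a loop.
 */
static unsigned long long copies_per_ip(size_t n, size_t existing)
{
	unsigned long long copied = 0;
	size_t size = existing;

	for (size_t i = 0; i < n; i++) {
		copied += size;	/* copy the current filter set */
		size++;		/* then add one address */
	}
	return copied;
}

/* One update carrying all n addresses, like a single batched call. */
static unsigned long long copies_batched(size_t n, size_t existing)
{
	(void)n;		/* all n addresses are added to one copy */
	return existing;	/* the old filter set is copied once */
}
```

Starting from an empty filter, the loop copies n(n-1)/2 entries in this model: 5,566,116 for 3337 addresses, versus none for one batched update.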

thanks,
jirka

> +
> +	fp->nmissed = 0;
> +	ret = register_ftrace_function(&fp->ftrace);
> +	if (!ret)
> +		return ret;
> +
> +error:
> +	if (fp->syms) {
> +		kfree(fp->addrs);
> +		fp->addrs = NULL;
> +	}
> +
> +	return ret;
> +}
> +
> +/**
> + * unregister_fprobe - Unregister fprobe from ftrace
> + * @fp: A fprobe data structure to be unregistered.
> + */
> +int unregister_fprobe(struct fprobe *fp)
> +{
> +	int ret;
> +
> +	if (!fp)
> +		return -EINVAL;
> +
> +	if (!fp->nsyms || !fp->addrs)
> +		return -EINVAL;
> +
> +	ret = unregister_ftrace_function(&fp->ftrace);
> +
> +	if (fp->syms) {
> +		/* fp->addrs is allocated by register_fprobe() */
> +		kfree(fp->addrs);
> +		fp->addrs = NULL;
> +	}
> +
> +	return ret;
> +}
> -- 
> 2.25.1
>
Steven Rostedt Jan. 6, 2022, 3:02 p.m. UTC | #7
On Thu, 6 Jan 2022 22:59:43 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > at the moment there's no ftrace public interface for the return
> > probe merged in, so to get the kretprobe working I had to use
> > kprobe interface  
> 
> Yeah, I found that too. We have to ask Steve to salvage it ;)

I have one more week of being unemployed (and I'm done with my office
renovation), so perhaps I'll start looking into this.

This was the work to merge function graph tracer with kretprobes, right?

-- Steve
Alexei Starovoitov Jan. 6, 2022, 5:40 p.m. UTC | #8
On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> That seems to bind your mind. The program type is just a programming
> 'model' of the bpf. You can choose the best implementation to provide
> equal functionality. 'kprobe' in bpf is just a name that you call some
> instrumentations which can probe kernel code.

No. We're not going to call it "fprobe" or any other name.
From bpf user's pov it's going to be "multi attach kprobe",
because this is how everyone got to know kprobes.
The 99% usage is at the beginning of the funcs.
When users say "kprobe" they don't care how kernel attaches it.
The func entry limitation for "multi attach kprobe" is a no-brainer.

And we need both "multi attach kprobe" and "multi attach kretprobe"
at the same time. It's no go to implement one first and the other
some time later.
Masami Hiramatsu (Google) Jan. 6, 2022, 11:52 p.m. UTC | #9
On Thu, 6 Jan 2022 09:40:17 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > That seems to bind your mind. The program type is just a programming
> > 'model' of the bpf. You can choose the best implementation to provide
> > equal functionality. 'kprobe' in bpf is just a name that you call some
> > instrumentations which can probe kernel code.
> 
> No. We're not going to call it "fprobe" or any other name.
> From bpf user's pov it's going to be "multi attach kprobe",
> because this is how everyone got to know kprobes.
> The 99% usage is at the beginning of the funcs.
> When users say "kprobe" they don't care how kernel attaches it.
> The func entry limitation for "multi attach kprobe" is a no-brainer.

Agreed. I think I might have misled you. From the bpf user's pov, it will always be
shown as 'multi attached kprobes (but only for the function entry)';
'fprobe' is the kernel internal API name.

> And we need both "multi attach kprobe" and "multi attach kretprobe"
> at the same time. It's no go to implement one first and the other
> some time later.

You can provide the interface to user space, but the kernel implementation
is optimized step by step. We can start it with using real multiple
kretprobes, and then, switch to 'fprobe' after integrating fgraph
callback. :)

Thank you,
Alexei Starovoitov Jan. 7, 2022, 12:20 a.m. UTC | #10
On Thu, Jan 6, 2022 at 3:52 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Thu, 6 Jan 2022 09:40:17 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > > That seems to bind your mind. The program type is just a programming
> > > 'model' of the bpf. You can choose the best implementation to provide
> > > equal functionality. 'kprobe' in bpf is just a name that you call some
> > > instrumentations which can probe kernel code.
> >
> > No. We're not going to call it "fprobe" or any other name.
> > From bpf user's pov it's going to be "multi attach kprobe",
> > because this is how everyone got to know kprobes.
> > The 99% usage is at the beginning of the funcs.
> > When users say "kprobe" they don't care how kernel attaches it.
> > The func entry limitation for "multi attach kprobe" is a no-brainer.
>
> Agreed. I think I might have misled you. From the bpf user's pov, it will always be
> shown as 'multi attached kprobes (but only for the function entry)';
> 'fprobe' is the kernel internal API name.
>
> > And we need both "multi attach kprobe" and "multi attach kretprobe"
> > at the same time. It's no go to implement one first and the other
> > some time later.
>
> You can provide the interface to user space, but the kernel implementation
> is optimized step by step. We can start it with using real multiple
> kretprobes, and then, switch to 'fprobe' after integrating fgraph
> callback. :)

Sounds good to me.
My point was that users often want to say:
"profile speed of all foo* functions".
To perform such a command a tracer would need to
attach kprobes and kretprobes to all such functions.
The speed of attach/detach has to be fast.
Currently tracers artificially limit the regex just because
attach/detach is so slow that the user will likely Ctrl-C
instead of waiting for many seconds.
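
Resolving a pattern like foo* into the address list for one multi-attach call could look roughly like the sketch below. It assumes the /proc/kallsyms line format "address type name [module]", keeps only text symbols (types 't'/'T'), and uses POSIX fnmatch() for the glob; the function names and the caller-provided output array are illustrative assumptions and do not reproduce the libbpf__kallsyms_parse() helper from this series:

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one line in the /proc/kallsyms format ("addr type name [module]").
 * Returns 1 and stores the address if it is a text symbol whose name
 * matches @glob, 0 otherwise.
 */
static int kallsyms_line_matches(const char *line, const char *glob,
				 unsigned long *addr)
{
	unsigned long a;
	char type;
	char name[128];

	if (sscanf(line, "%lx %c %127s", &a, &type, name) != 3)
		return 0;
	if (tolower((unsigned char)type) != 't')	/* code symbols only */
		return 0;
	if (fnmatch(glob, name, 0) != 0)
		return 0;
	*addr = a;
	return 1;
}

/* Collect up to @max matching addresses for a single attach request. */
static size_t collect_glob_addrs(const char *const *lines, size_t nlines,
				 const char *glob, unsigned long *addrs,
				 size_t max)
{
	size_t cnt = 0;

	assert(addrs != NULL || max == 0);
	for (size_t i = 0; i < nlines && cnt < max; i++) {
		if (kallsyms_line_matches(lines[i], glob, &addrs[cnt]))
			cnt++;
	}
	return cnt;
}
```

In practice the lines would come from /proc/kallsyms, where addresses read as zero unless the reader has sufficient privileges; the resulting array is what a tracer would hand to one multi-kprobe attach instead of one syscall per function.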
Masami Hiramatsu (Google) Jan. 7, 2022, 5:42 a.m. UTC | #11
On Thu, 6 Jan 2022 15:57:10 +0100
Jiri Olsa <jolsa@redhat.com> wrote:

> On Thu, Jan 06, 2022 at 10:59:43PM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > > > 
> > > > Hmm, I think there may be a time to split the "kprobe as an 
> > > > interface for the software breakpoint" and "kprobe as a wrapper
> > > > interface for the callbacks of various instrumentations", like
> > > > 'raw_kprobe'(or kswbp) and 'kprobes'.
> > > > And this may be called as 'fprobe' as ftrace_ops wrapper.
> > > > (But if the bpf is enough flexible, this kind of intermediate layer
> > > >  may not be needed, it can use ftrace_ops directly, eventually)
> > > > 
> > > > Jiri, have you already considered to use ftrace_ops from the
> > > > bpf directly? Are there any issues?
> > > > (bpf depends on 'kprobe' widely?)
> > > 
> > > at the moment there's no ftrace public interface for the return
> > > probe merged in, so to get the kretprobe working I had to use
> > > kprobe interface
> > 
> > Yeah, I found that too. We have to ask Steve to salvage it ;)
> 
> I rebased those patches on upstream code about half a year ago,
> so it should be easy to revive them

Nice! :)

> 
> > 
> > > but.. there are patches Steven shared some time ago, that do that
> > > and make graph_ops available as kernel interface
> > > 
> > > I recall we considered graph_ops interface before as common attach
> > > layer for trampolines, which was bad, but it might actually make
> > > sense for kprobes
> > 
> > I started working on making 'fprobe' which will provide multiple
> > function probe with similar interface of kprobes. See attached
> > patch. Then you can use it in bpf, maybe with a union like
> > 
> > union {
> > 	struct kprobe kp;	// for function body
> > 	struct fprobe fp;	// for function entry and return
> > };
> > 
> > At this moment, fprobe only supports entry_handler, but when we
> > re-start the generic graph_ops interface, it is easy to expand
> > to support exit_handler.
> > If this works, I think kretprobe can be phased out, since at that
> > moment, kprobe_event can replace it with the fprobe exit_handler.
> > (This is a benefit of decoupling the instrumentation layer from
> > the event layer. It can choose the best way without changing
> > user interface.)
> > 
> 
> I can resend out graph_ops patches if you want to base
> it directly on that

Yes, that's very helpful. Now I'm considering to use it (or via fprobe)
from kretprobes like ftrace-based kprobe.

> > > I'll need to check it in more details but I think both graph_ops and
> > > kprobe do about similar thing wrt hooking return probe, so it should
> > > be comparable.. and they are already doing the same for the entry hook,
> > > because kprobe is mostly using ftrace for that
> > > 
> > > we would not need to introduce new program type - kprobe programs
> > > should be able to run from ftrace callbacks just fine
> > 
> > That seems to bind your mind. The program type is just a programming
> > 'model' of the bpf. You can choose the best implementation to provide
> > equal functionality. 'kprobe' in bpf is just a name that you call some
> > instrumentations which can probe kernel code.
> 
> I don't want to introduce a new type; there are some dependencies
> in the bpf verifier and helpers code we'd need to handle for that
> 
> I'm looking for a solution that lets the current kprobe bpf program
> type be registered for multiple addresses quickly

Yes, as I replied to Alex, the bpf program type itself stays 'kprobe'.
For example, you've introduced bpf_kprobe_link in [8/13]:

struct bpf_kprobe_link {
	struct bpf_link link;
	union {
		struct kretprobe rp;
		struct fprobe fp;
	};
	bool is_return;
	bool is_fentry;
	kprobe_opcode_t **addrs;
	u32 cnt;
	u64 bpf_cookie;
};

If all "addrs" are function entries, ::fp will be used.
If cnt == 1, ::rp will be used.

> > > so we would have:
> > >   - kprobe type programs attaching to:
> > >     - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
> > > 
> > > jirka
> > > 
> > 
> > 
> > -- 
> > Masami Hiramatsu <mhiramat@kernel.org>
> 
> > From 269b86597c166d6d4c5dd564168237603533165a Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Date: Thu, 6 Jan 2022 15:40:36 +0900
> > Subject: [PATCH] fprobe: Add ftrace based probe APIs
> > 
> > The fprobe is a wrapper API for ftrace function tracer.
> > Unlike kprobes, this probes only supports the function entry, but
> > it can probe multiple functions by one fprobe. The usage is almost
> > same as the kprobe, user will specify the function names by
> > fprobe::syms, the number of syms by fprobe::nsyms, and the user
> > handler by fprobe::handler.
> > 
> > struct fprobe fprobe = { 0 };
> > const char *targets[] = {"func1", "func2", "func3"};
> > 
> > fprobe.handler = user_handler;
> > fprobe.nsyms = ARRAY_SIZE(targets);
> > fprobe.syms = targets;
> > 
> > ret = register_fprobe(&fprobe);
> > ...
> > 
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> >  include/linux/fprobes.h |  52 ++++++++++++++++
> >  kernel/trace/Kconfig    |  10 ++++
> >  kernel/trace/Makefile   |   1 +
> >  kernel/trace/fprobes.c  | 128 ++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 191 insertions(+)
> >  create mode 100644 include/linux/fprobes.h
> >  create mode 100644 kernel/trace/fprobes.c
> > 
> > diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h
> > new file mode 100644
> > index 000000000000..22db748bf491
> > --- /dev/null
> > +++ b/include/linux/fprobes.h
> > @@ -0,0 +1,52 @@
> > +#ifndef _LINUX_FPROBES_H
> > +#define _LINUX_FPROBES_H
> > +/* Simple ftrace probe wrapper */
> > +
> > +#include <linux/compiler.h>
> > +#include <linux/ftrace.h>
> > +
> > +struct fprobe {
> > +	const char		**syms;
> > +	unsigned long		*addrs;
> 
> could you add an array of user data for each addr/sym?

OK, something like this?

	void	**user_data;

But note that finding the entry corresponding to a specific address
takes O(N). To reduce the overhead, we may need to sort the array
in advance (e.g. when registering it).
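A minimal userspace sketch of the sort-then-search idea, under stated assumptions: it packs each address with its data in one hypothetical `struct fprobe_entry` (a swapped-in layout; the proposal above uses a separate `void **user_data` array, which would need a custom swap step to stay aligned with `addrs`). The array is sorted once at registration with `qsort()`, and each hit is resolved with `bsearch()` in O(log N). The type and helper names are illustrative only; in-kernel code would use the `sort()` and `bsearch()` library helpers instead.

```c
#include <stdlib.h>

/*
 * Hypothetical (addr, user_data) pair: keeping both in one element means
 * sorting by addr keeps each user_data aligned with its address.
 */
struct fprobe_entry {
	unsigned long addr;
	void *user_data;
};

/* Compare by address without subtraction (the difference may not fit in int). */
static int cmp_entry(const void *a, const void *b)
{
	const struct fprobe_entry *ea = a, *eb = b;

	if (ea->addr < eb->addr)
		return -1;
	return ea->addr > eb->addr;
}

/* Done once at registration time: O(N log N). */
static void sort_entries(struct fprobe_entry *ents, size_t n)
{
	qsort(ents, n, sizeof(*ents), cmp_entry);
}

/* Done on every hit: O(log N) instead of a linear scan. NULL if not found. */
static void *lookup_user_data(const struct fprobe_entry *ents, size_t n,
			      unsigned long addr)
{
	const struct fprobe_entry key = { .addr = addr };
	const struct fprobe_entry *e;

	e = bsearch(&key, ents, n, sizeof(*ents), cmp_entry);
	return e ? e->user_data : NULL;
}
```

The registration path would call sort_entries() once, and the handler would call lookup_user_data() with the traced function's address; the trade-off is paying the sort cost up front to keep the per-hit cost low.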

> 
> SNIP
> 
> > +static int populate_func_addresses(struct fprobe *fp)
> > +{
> > +	unsigned int i;
> > +
> > +	fp->addrs = kcalloc(fp->nsyms, sizeof(*fp->addrs), GFP_KERNEL);
> > +	if (!fp->addrs)
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < fp->nsyms; i++) {
> > +		fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> > +		if (!fp->addrs[i]) {
> > +			kfree(fp->addrs);
> > +			fp->addrs = NULL;
> > +			return -ENOENT;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * register_fprobe - Register fprobe to ftrace
> > + * @fp: A fprobe data structure to be registered.
> > + *
> > + * This expects the user to set @fp::syms or @fp::addrs (not both),
> > + * @fp::nsyms (number of entries of @fp::syms or @fp::addrs) and
> > + * @fp::handler. Other fields are initialized by this function.
> > + */
> > +int register_fprobe(struct fprobe *fp)
> > +{
> > +	unsigned int i;
> > +	int ret;
> > +
> > +	if (!fp)
> > +		return -EINVAL;
> > +
> > +	if (!fp->nsyms || (!fp->syms && !fp->addrs) || (fp->syms && fp->addrs))
> > +		return -EINVAL;
> > +
> > +	if (fp->syms) {
> > +		ret = populate_func_addresses(fp);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	fp->ftrace.func = fprobe_handler;
> > +	fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;
> > +
> > +	for (i = 0; i < fp->nsyms; i++) {
> > +		ret = ftrace_set_filter_ip(&fp->ftrace, fp->addrs[i], 0, 0);
> > +		if (ret < 0)
> > +			goto error;
> > +	}
> 
> I introduced ftrace_set_filter_ips because a loop like the one above was slow:
>   https://lore.kernel.org/bpf/20211118112455.475349-4-jolsa@kernel.org/

Ah, thanks for noticing!

Thank you,

> 
> thanks,
> jirka
> 
> > +
> > +	fp->nmissed = 0;
> > +	ret = register_ftrace_function(&fp->ftrace);
> > +	if (!ret)
> > +		return ret;
> > +
> > +error:
> > +	if (fp->syms) {
> > +		kfree(fp->addrs);
> > +		fp->addrs = NULL;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * unregister_fprobe - Unregister fprobe from ftrace
> > + * @fp: A fprobe data structure to be unregistered.
> > + */
> > +int unregister_fprobe(struct fprobe *fp)
> > +{
> > +	int ret;
> > +
> > +	if (!fp)
> > +		return -EINVAL;
> > +
> > +	if (!fp->nsyms || !fp->addrs)
> > +		return -EINVAL;
> > +
> > +	ret = unregister_ftrace_function(&fp->ftrace);
> > +
> > +	if (fp->syms) {
> > +		/* fp->addrs is allocated by register_fprobe() */
> > +		kfree(fp->addrs);
> > +		fp->addrs = NULL;
> > +	}
> > +
> > +	return ret;
> > +}
> > -- 
> > 2.25.1
> > 
>
Masami Hiramatsu (Google) Jan. 7, 2022, 12:55 p.m. UTC | #12
On Thu, 6 Jan 2022 16:20:05 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Jan 6, 2022 at 3:52 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Thu, 6 Jan 2022 09:40:17 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > >
> > > > That seems to be limiting your thinking. The program type is just a
> > > > programming 'model' of bpf. You can choose the best implementation that
> > > > provides equivalent functionality. 'kprobe' in bpf is just the name you
> > > > give to instrumentation that can probe kernel code.
> > >
> > > No. We're not going to call it "fprobe" or any other name.
> > > From bpf user's pov it's going to be "multi attach kprobe",
> > > because this is how everyone got to know kprobes.
> > > 99% of usage is at the beginning of the funcs.
> > > When users say "kprobe" they don't care how kernel attaches it.
> > > The func entry limitation for "multi attach kprobe" is a no-brainer.
> >
> > Agreed. I think I may have misled you. From the bpf user pov, it will
> > always be shown as 'multi attached kprobes (but only for the function entry)';
> > 'fprobe' is the kernel-internal API name.
> >
> > > And we need both "multi attach kprobe" and "multi attach kretprobe"
> > > at the same time. It's a no-go to implement one first and the other
> > > some time later.
> >
> > You can provide the interface to user space first, and optimize the kernel
> > implementation step by step. We can start by using real multiple
> > kretprobes, and then switch to 'fprobe' after integrating the fgraph
> > callback. :)
> 
> Sounds good to me.
> My point was that users often want to say:
> "profile speed of all foo* functions".
> To perform such a command a tracer would need to
> attach kprobes and kretprobes to all such functions.
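The first step such a tracer performs is expanding the pattern into a symbol list, which can then be handed to one multi-attach registration instead of N separate attaches. A minimal userspace sketch, assuming the candidate names have already been read (e.g. from `available_filter_functions` under the tracefs mount), using POSIX `fnmatch()`; the helper name `match_symbols()` is hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Collect every symbol name matching a glob such as "foo*".
 * A real tracer would read the names from tracefs or /proc/kallsyms;
 * here they are passed in. Returns the number of matches stored in
 * out (never more than out_len).
 */
static size_t match_symbols(const char *pattern, const char *const *syms,
			    size_t nsyms, const char **out, size_t out_len)
{
	size_t i, n = 0;

	for (i = 0; i < nsyms && n < out_len; i++) {
		/* fnmatch() returns 0 on a match. */
		if (fnmatch(pattern, syms[i], 0) == 0)
			out[n++] = syms[i];
	}
	return n;
}
```

The resulting array and count map naturally onto a multi-attach interface such as fprobe::syms and fprobe::nsyms in the patch above, so the whole set is registered in a single call.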

Yeah, I know. That has been an issue for more than 10 years,
since systemtap. :)

> The speed of attach/detach has to be fast.

Yes, that's why I provided register/unregister_kprobes(),
but it seems that is not enough (and maybe not optimized enough,
since all the handlers are the same)

> Currently tracers artificially limit the regex just because
> attach/detach is so slow that the user will likely Ctrl-C
> instead of waiting for many seconds.

Ah, OK.
Anyway, I would also like to fix that issue. If the user wants
only function entry/exit, it should be done by ftrace. But
since the syntax (and the user's mental model) says it should be
done via 'kprobe', such requests need to be converted to ftrace
transparently.

Thank you,
Masami Hiramatsu (Google) Jan. 11, 2022, 3 p.m. UTC | #13
Hi Jiri,

Here is a short series of patches that shows what I proposed
in my reply to your series.

This introduces the fprobe, a function entry/exit probe with
support for multiple probe points. This also introduces the rethook
for hooking function returns, which I cloned from kretprobe.

I also rewrote your [08/13] bpf patch to use this fprobe instead
of kprobes. I haven't tested that one, but the sample module seems
to work. Please test the bpf part with your libbpf updates.

BTW, while implementing the fprobe, I introduced per-probe-point
private data, but I'm not sure why you need it. It seems
that data is not used by bpf...

If this looks good to you, I would like to proceed with
the rethook and rewrite kretprobe to use the rethook to
hook the functions. That should be much cleaner (and make it easier
to prepare for the fgraph tracer integration)

Thank you,

---

Jiri Olsa (1):
      bpf: Add kprobe link for attaching raw kprobes

Masami Hiramatsu (5):
      fprobe: Add ftrace based probe APIs
      rethook: Add a generic return hook
      rethook: x86: Add rethook x86 implementation
      fprobe: Add exit_handler support
      fprobe: Add sample program for fprobe


 arch/x86/Kconfig                |    1 
 arch/x86/kernel/Makefile        |    1 
 arch/x86/kernel/rethook.c       |  115 ++++++++++++++++++++
 include/linux/bpf_types.h       |    1 
 include/linux/fprobes.h         |   75 +++++++++++++
 include/linux/rethook.h         |   74 +++++++++++++
 include/linux/sched.h           |    3 +
 include/uapi/linux/bpf.h        |   12 ++
 kernel/bpf/syscall.c            |  199 +++++++++++++++++++++++++++++++++-
 kernel/exit.c                   |    2 
 kernel/fork.c                   |    3 +
 kernel/trace/Kconfig            |   22 ++++
 kernel/trace/Makefile           |    2 
 kernel/trace/fprobes.c          |  187 ++++++++++++++++++++++++++++++++
 kernel/trace/rethook.c          |  226 +++++++++++++++++++++++++++++++++++++++
 samples/Kconfig                 |    6 +
 samples/Makefile                |    1 
 samples/fprobe/Makefile         |    3 +
 samples/fprobe/fprobe_example.c |  103 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h  |   12 ++
 20 files changed, 1043 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/rethook.c
 create mode 100644 include/linux/fprobes.h
 create mode 100644 include/linux/rethook.h
 create mode 100644 kernel/trace/fprobes.c
 create mode 100644 kernel/trace/rethook.c
 create mode 100644 samples/fprobe/Makefile
 create mode 100644 samples/fprobe/fprobe_example.c

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>