Message ID | 20241026154629.593041-2-mathieu.desnoyers@efficios.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFC,v3,1/3] tracing: Introduce tracepoint extended structure | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Sat, 26 Oct 2024 11:46:28 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > Introduce a "syscall" flag within the extended structure to know whether > a tracepoint needs rcu tasks trace grace period before reclaim. > This can be queried using tracepoint_is_syscall(). > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Cc: Michael Jeanson <mjeanson@efficios.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Masami Hiramatsu <mhiramat@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Alexei Starovoitov <ast@kernel.org> > Cc: Yonghong Song <yhs@fb.com> > Cc: Paul E. McKenney <paulmck@kernel.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > Cc: Namhyung Kim <namhyung@kernel.org> > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > Cc: bpf@vger.kernel.org > Cc: Joel Fernandes <joel@joelfernandes.org> > Cc: Jordan Rife <jrife@google.com> > --- > include/linux/tracepoint-defs.h | 2 ++ > include/linux/tracepoint.h | 24 ++++++++++++++++++++++++ > include/trace/define_trace.h | 2 +- > 3 files changed, 27 insertions(+), 1 deletion(-) > > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h > index 967c08d9da84..53119e074c87 100644 > --- a/include/linux/tracepoint-defs.h > +++ b/include/linux/tracepoint-defs.h > @@ -32,6 +32,8 @@ struct tracepoint_func { > struct tracepoint_ext { > int (*regfunc)(void); > void (*unregfunc)(void); > + /* Flags. */ > + unsigned int syscall:1; I wonder if we should call it "sleepable" instead? For this patch set do we really care if it's a system call or not? It's really if the tracepoint is sleepable or not that's the issue. System calls are just one user of it, there may be more in the future, and the changes to BPF will still be needed. Other than that, I think this could work. -- Steve > }; > > struct tracepoint { > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index 83dc24ee8b13..93e70bc64533 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -104,6 +104,12 @@ void for_each_tracepoint_in_module(struct module *mod, > * tracepoint_synchronize_unregister must be called between the last tracepoint > * probe unregistration and the end of module exit to make sure there is no > * caller executing a probe when it is freed. > + * > + * An alternative is to use the following for batch reclaim associated > + * with a given tracepoint: > + * > + * - tracepoint_is_syscall() == false: call_rcu() > + * - tracepoint_is_syscall() == true: call_rcu_tasks_trace() > */ > #ifdef CONFIG_TRACEPOINTS > static inline void tracepoint_synchronize_unregister(void) > @@ -111,9 +117,17 @@ static inline void tracepoint_synchronize_unregister(void) > synchronize_rcu_tasks_trace(); > synchronize_rcu(); > } > +static inline bool tracepoint_is_syscall(struct tracepoint *tp) > +{ > + return tp->ext && tp->ext->syscall; > +} > #else > static inline void tracepoint_synchronize_unregister(void) > { } > +static inline bool tracepoint_is_syscall(struct tracepoint *tp) > +{ > + return false; > +} > #endif > > #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS > @@ -345,6 +359,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > struct tracepoint_ext __tracepoint_ext_##_name = { \ > .regfunc = _reg, \ > .unregfunc = _unreg, \ > + .syscall = false, \ > + }; \ > + __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); > + > +#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args) \ > + struct tracepoint_ext __tracepoint_ext_##_name = { \ > + .regfunc = _reg, \ > + .unregfunc = _unreg, \ > + .syscall = true, \ > }; \ > __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); > > @@ -389,6 +412,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > #define __DECLARE_TRACE_SYSCALL __DECLARE_TRACE > > #define DEFINE_TRACE_FN(name, reg, unreg, proto, args) > +#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args) > #define DEFINE_TRACE(name, proto, args) > #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) > #define EXPORT_TRACEPOINT_SYMBOL(name) > diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h > index ff5fa17a6259..63fea2218afa 100644 > --- a/include/trace/define_trace.h > +++ b/include/trace/define_trace.h > @@ -48,7 +48,7 @@ > > #undef TRACE_EVENT_SYSCALL > #define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \ > - DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args)) > + DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args)) > > #undef TRACE_EVENT_NOP > #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print)
On 2024-10-26 20:08, Steven Rostedt wrote: > On Sat, 26 Oct 2024 11:46:28 -0400 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> Introduce a "syscall" flag within the extended structure to know whether >> a tracepoint needs rcu tasks trace grace period before reclaim. >> This can be queried using tracepoint_is_syscall(). >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> >> Cc: Michael Jeanson <mjeanson@efficios.com> >> Cc: Steven Rostedt <rostedt@goodmis.org> >> Cc: Masami Hiramatsu <mhiramat@kernel.org> >> Cc: Peter Zijlstra <peterz@infradead.org> >> Cc: Alexei Starovoitov <ast@kernel.org> >> Cc: Yonghong Song <yhs@fb.com> >> Cc: Paul E. McKenney <paulmck@kernel.org> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> >> Cc: Mark Rutland <mark.rutland@arm.com> >> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> >> Cc: Namhyung Kim <namhyung@kernel.org> >> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> >> Cc: bpf@vger.kernel.org >> Cc: Joel Fernandes <joel@joelfernandes.org> >> Cc: Jordan Rife <jrife@google.com> >> --- >> include/linux/tracepoint-defs.h | 2 ++ >> include/linux/tracepoint.h | 24 ++++++++++++++++++++++++ >> include/trace/define_trace.h | 2 +- >> 3 files changed, 27 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h >> index 967c08d9da84..53119e074c87 100644 >> --- a/include/linux/tracepoint-defs.h >> +++ b/include/linux/tracepoint-defs.h >> @@ -32,6 +32,8 @@ struct tracepoint_func { >> struct tracepoint_ext { >> int (*regfunc)(void); >> void (*unregfunc)(void); >> + /* Flags. */ >> + unsigned int syscall:1; > > I wonder if we should call it "sleepable" instead? For this patch set > do we really care if it's a system call or not? It's really if the > tracepoint is sleepable or not that's the issue. System calls are just > one user of it, there may be more in the future, and the changes to BPF > will still be needed. Remember that syscall tracepoint probes are allowed to handle page faults, but should not generally block, otherwise it would postpone the grace periods of all RCU tasks trace users. So naming this "sleepable" would be misleading, because probes are not allowed general blocking, just to handle page faults. If we look at the history of this tracepoint feature, we went with the following naming over the various versions of the patch series: 1) Sleepable tracepoints: until we understood that we just want to allow page fault, not general sleeping, so we needed to change the name, 2) Faultable tracepoints: until Linus requested that we aim for something that is specific to system calls, rather than a generic thing. https://lore.kernel.org/lkml/CAHk-=wggDLDeTKbhb5hh--x=-DQd69v41137M72m6NOTmbD-cw@mail.gmail.com/ 3) Syscall tracepoints: This is what we currently have. > Other than that, I think this could work. Calling this field "sleepable" would be misleading. Calling it "faultable" would be a better fit, but based on Linus' request, I'm tempted to stick with "syscall" for now. Your concern is to name this in a way that is general and future-proof. Linus' point was to make it syscall-specific rather than general. My position is that we should wait until we face other use-cases (if we even do) before consider changing the naming from "syscall" to something more generic. Thanks, Mathieu > > -- Steve > > >> }; >> >> struct tracepoint { >> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h >> index 83dc24ee8b13..93e70bc64533 100644 >> --- a/include/linux/tracepoint.h >> +++ b/include/linux/tracepoint.h >> @@ -104,6 +104,12 @@ void for_each_tracepoint_in_module(struct module *mod, >> * tracepoint_synchronize_unregister must be called between the last tracepoint >> * probe unregistration and the end of module exit to make sure there is no >> * caller executing a probe when it is freed. >> + * >> + * An alternative is to use the following for batch reclaim associated >> + * with a given tracepoint: >> + * >> + * - tracepoint_is_syscall() == false: call_rcu() >> + * - tracepoint_is_syscall() == true: call_rcu_tasks_trace() >> */ >> #ifdef CONFIG_TRACEPOINTS >> static inline void tracepoint_synchronize_unregister(void) >> @@ -111,9 +117,17 @@ static inline void tracepoint_synchronize_unregister(void) >> synchronize_rcu_tasks_trace(); >> synchronize_rcu(); >> } >> +static inline bool tracepoint_is_syscall(struct tracepoint *tp) >> +{ >> + return tp->ext && tp->ext->syscall; >> +} >> #else >> static inline void tracepoint_synchronize_unregister(void) >> { } >> +static inline bool tracepoint_is_syscall(struct tracepoint *tp) >> +{ >> + return false; >> +} >> #endif >> >> #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS >> @@ -345,6 +359,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) >> struct tracepoint_ext __tracepoint_ext_##_name = { \ >> .regfunc = _reg, \ >> .unregfunc = _unreg, \ >> + .syscall = false, \ >> + }; \ >> + __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); >> + >> +#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args) \ >> + struct tracepoint_ext __tracepoint_ext_##_name = { \ >> + .regfunc = _reg, \ >> + .unregfunc = _unreg, \ >> + .syscall = true, \ >> }; \ >> __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); >> >> @@ -389,6 +412,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) >> #define __DECLARE_TRACE_SYSCALL __DECLARE_TRACE >> >> #define DEFINE_TRACE_FN(name, reg, unreg, proto, args) >> +#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args) >> #define DEFINE_TRACE(name, proto, args) >> #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) >> #define EXPORT_TRACEPOINT_SYMBOL(name) >> diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h >> index ff5fa17a6259..63fea2218afa 100644 >> --- a/include/trace/define_trace.h >> +++ b/include/trace/define_trace.h >> @@ -48,7 +48,7 @@ >> >> #undef TRACE_EVENT_SYSCALL >> #define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \ >> - DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args)) >> + DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args)) >> >> #undef TRACE_EVENT_NOP >> #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print) >
On Sat, 26 Oct 2024 20:08:40 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Sat, 26 Oct 2024 11:46:28 -0400 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > Introduce a "syscall" flag within the extended structure to know whether > > a tracepoint needs rcu tasks trace grace period before reclaim. > > This can be queried using tracepoint_is_syscall(). > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Cc: Michael Jeanson <mjeanson@efficios.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Masami Hiramatsu <mhiramat@kernel.org> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Alexei Starovoitov <ast@kernel.org> > > Cc: Yonghong Song <yhs@fb.com> > > Cc: Paul E. McKenney <paulmck@kernel.org> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > Cc: Namhyung Kim <namhyung@kernel.org> > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > > Cc: bpf@vger.kernel.org > > Cc: Joel Fernandes <joel@joelfernandes.org> > > Cc: Jordan Rife <jrife@google.com> > > --- > > include/linux/tracepoint-defs.h | 2 ++ > > include/linux/tracepoint.h | 24 ++++++++++++++++++++++++ > > include/trace/define_trace.h | 2 +- > > 3 files changed, 27 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h > > index 967c08d9da84..53119e074c87 100644 > > --- a/include/linux/tracepoint-defs.h > > +++ b/include/linux/tracepoint-defs.h > > @@ -32,6 +32,8 @@ struct tracepoint_func { > > struct tracepoint_ext { > > int (*regfunc)(void); > > void (*unregfunc)(void); > > + /* Flags. */ > > + unsigned int syscall:1; > > I wonder if we should call it "sleepable" instead? For this patch set > do we really care if it's a system call or not? It's really if the > tracepoint is sleepable or not that's the issue. System calls are just > one user of it, there may be more in the future, and the changes to BPF > will still be needed. I agree with this. Even if currently we restrict only syscall events can be sleep, "tracepoint_is_syscall()" requires to add comment to explain why on all call sites e.g. /* * The syscall event is only sleepable event, so we ensure it is * syscall event for checking sleepable or not. */ If it called tracepoint_is_sleepable(), we don't need such comment. Thank you, > > Other than that, I think this could work. > > -- Steve > > > > }; > > > > struct tracepoint { > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index 83dc24ee8b13..93e70bc64533 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -104,6 +104,12 @@ void for_each_tracepoint_in_module(struct module *mod, > > * tracepoint_synchronize_unregister must be called between the last tracepoint > > * probe unregistration and the end of module exit to make sure there is no > > * caller executing a probe when it is freed. > > + * > > + * An alternative is to use the following for batch reclaim associated > > + * with a given tracepoint: > > + * > > + * - tracepoint_is_syscall() == false: call_rcu() > > + * - tracepoint_is_syscall() == true: call_rcu_tasks_trace() > > */ > > #ifdef CONFIG_TRACEPOINTS > > static inline void tracepoint_synchronize_unregister(void) > > @@ -111,9 +117,17 @@ static inline void tracepoint_synchronize_unregister(void) > > synchronize_rcu_tasks_trace(); > > synchronize_rcu(); > > } > > +static inline bool tracepoint_is_syscall(struct tracepoint *tp) > > +{ > > + return tp->ext && tp->ext->syscall; > > +} > > #else > > static inline void tracepoint_synchronize_unregister(void) > > { } > > +static inline bool tracepoint_is_syscall(struct tracepoint *tp) > > +{ > > + return false; > > +} > > #endif > > > > #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS > > @@ -345,6 +359,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > struct tracepoint_ext __tracepoint_ext_##_name = { \ > > .regfunc = _reg, \ > > .unregfunc = _unreg, \ > > + .syscall = false, \ > > + }; \ > > + __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); > > + > > +#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args) \ > > + struct tracepoint_ext __tracepoint_ext_##_name = { \ > > + .regfunc = _reg, \ > > + .unregfunc = _unreg, \ > > + .syscall = true, \ > > }; \ > > __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); > > > > @@ -389,6 +412,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > #define __DECLARE_TRACE_SYSCALL __DECLARE_TRACE > > > > #define DEFINE_TRACE_FN(name, reg, unreg, proto, args) > > +#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args) > > #define DEFINE_TRACE(name, proto, args) > > #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) > > #define EXPORT_TRACEPOINT_SYMBOL(name) > > diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h > > index ff5fa17a6259..63fea2218afa 100644 > > --- a/include/trace/define_trace.h > > +++ b/include/trace/define_trace.h > > @@ -48,7 +48,7 @@ > > > > #undef TRACE_EVENT_SYSCALL > > #define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \ > > - DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args)) > > + DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args)) > > > > #undef TRACE_EVENT_NOP > > #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print) >
On Sun, Oct 27, 2024 at 7:19 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Sat, 26 Oct 2024 20:08:40 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Sat, 26 Oct 2024 11:46:28 -0400 > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > > > Introduce a "syscall" flag within the extended structure to know whether > > > a tracepoint needs rcu tasks trace grace period before reclaim. > > > This can be queried using tracepoint_is_syscall(). > > > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > > Cc: Michael Jeanson <mjeanson@efficios.com> > > > Cc: Steven Rostedt <rostedt@goodmis.org> > > > Cc: Masami Hiramatsu <mhiramat@kernel.org> > > > Cc: Peter Zijlstra <peterz@infradead.org> > > > Cc: Alexei Starovoitov <ast@kernel.org> > > > Cc: Yonghong Song <yhs@fb.com> > > > Cc: Paul E. McKenney <paulmck@kernel.org> > > > Cc: Ingo Molnar <mingo@redhat.com> > > > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > > > Cc: Mark Rutland <mark.rutland@arm.com> > > > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > > > Cc: Namhyung Kim <namhyung@kernel.org> > > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > > > Cc: bpf@vger.kernel.org > > > Cc: Joel Fernandes <joel@joelfernandes.org> > > > Cc: Jordan Rife <jrife@google.com> > > > --- > > > include/linux/tracepoint-defs.h | 2 ++ > > > include/linux/tracepoint.h | 24 ++++++++++++++++++++++++ > > > include/trace/define_trace.h | 2 +- > > > 3 files changed, 27 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h > > > index 967c08d9da84..53119e074c87 100644 > > > --- a/include/linux/tracepoint-defs.h > > > +++ b/include/linux/tracepoint-defs.h > > > @@ -32,6 +32,8 @@ struct tracepoint_func { > > > struct tracepoint_ext { > > > int (*regfunc)(void); > > > void (*unregfunc)(void); > > > + /* Flags. */ > > > + unsigned int syscall:1; > > > > I wonder if we should call it "sleepable" instead? For this patch set > > do we really care if it's a system call or not? It's really if the > > tracepoint is sleepable or not that's the issue. System calls are just > > one user of it, there may be more in the future, and the changes to BPF > > will still be needed. > > I agree with this. Even if currently we restrict only syscall events > can be sleep, "tracepoint_is_syscall()" requires to add comment to > explain why on all call sites e.g. > +1 to naming this "sleepable" (or at least "faultable"). BPF world uses "sleepable BPF" terminology for BPF programs and attachment hooks that can take page fault (and wait/sleep waiting for those to be handled), so this would be consistent with that. Also, from BPF standpoint this will be advertised as attaching to sleepable tracepoints regardless, so "syscall" terminology is too specific and misleading, because while current set of tracepoints are syscall-specific, the important part is taking page fault, no tracing syscalls. > /* > * The syscall event is only sleepable event, so we ensure it is > * syscall event for checking sleepable or not. > */ > > If it called tracepoint_is_sleepable(), we don't need such comment. > > Thank you, > > > > > Other than that, I think this could work. > > > > -- Steve > > > > > > > }; > > > > > > struct tracepoint { > > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > > index 83dc24ee8b13..93e70bc64533 100644 > > > --- a/include/linux/tracepoint.h > > > +++ b/include/linux/tracepoint.h > > > @@ -104,6 +104,12 @@ void for_each_tracepoint_in_module(struct module *mod, > > > * tracepoint_synchronize_unregister must be called between the last tracepoint > > > * probe unregistration and the end of module exit to make sure there is no > > > * caller executing a probe when it is freed. > > > + * > > > + * An alternative is to use the following for batch reclaim associated > > > + * with a given tracepoint: > > > + * > > > + * - tracepoint_is_syscall() == false: call_rcu() > > > + * - tracepoint_is_syscall() == true: call_rcu_tasks_trace() > > > */ > > > #ifdef CONFIG_TRACEPOINTS > > > static inline void tracepoint_synchronize_unregister(void) > > > @@ -111,9 +117,17 @@ static inline void tracepoint_synchronize_unregister(void) > > > synchronize_rcu_tasks_trace(); > > > synchronize_rcu(); > > > } > > > +static inline bool tracepoint_is_syscall(struct tracepoint *tp) > > > +{ > > > + return tp->ext && tp->ext->syscall; > > > +} > > > #else > > > static inline void tracepoint_synchronize_unregister(void) > > > { } > > > +static inline bool tracepoint_is_syscall(struct tracepoint *tp) > > > +{ > > > + return false; > > > +} > > > #endif > > > > > > #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS > > > @@ -345,6 +359,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > > struct tracepoint_ext __tracepoint_ext_##_name = { \ > > > .regfunc = _reg, \ > > > .unregfunc = _unreg, \ > > > + .syscall = false, \ > > > + }; \ > > > + __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); > > > + > > > +#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args) \ > > > + struct tracepoint_ext __tracepoint_ext_##_name = { \ > > > + .regfunc = _reg, \ > > > + .unregfunc = _unreg, \ > > > + .syscall = true, \ > > > }; \ > > > __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); > > > > > > @@ -389,6 +412,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > > #define __DECLARE_TRACE_SYSCALL __DECLARE_TRACE > > > > > > #define DEFINE_TRACE_FN(name, reg, unreg, proto, args) > > > +#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args) > > > #define DEFINE_TRACE(name, proto, args) > > > #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) > > > #define EXPORT_TRACEPOINT_SYMBOL(name) > > > diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h > > > index ff5fa17a6259..63fea2218afa 100644 > > > --- a/include/trace/define_trace.h > > > +++ b/include/trace/define_trace.h > > > @@ -48,7 +48,7 @@ > > > > > > #undef TRACE_EVENT_SYSCALL > > > #define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \ > > > - DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args)) > > > + DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args)) > > > > > > #undef TRACE_EVENT_NOP > > > #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print) > > > > > -- > Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Sun, 27 Oct 2024 08:30:54 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > > I wonder if we should call it "sleepable" instead? For this patch set > > do we really care if it's a system call or not? It's really if the > > tracepoint is sleepable or not that's the issue. System calls are just > > one user of it, there may be more in the future, and the changes to BPF > > will still be needed. > > Remember that syscall tracepoint probes are allowed to handle page > faults, but should not generally block, otherwise it would postpone the > grace periods of all RCU tasks trace users. > > So naming this "sleepable" would be misleading, because probes are > not allowed general blocking, just to handle page faults. I'm fine with "faultable" too. > > If we look at the history of this tracepoint feature, we went with > the following naming over the various versions of the patch series: > > 1) Sleepable tracepoints: until we understood that we just want to > allow page fault, not general sleeping, so we needed to change > the name, > > 2) Faultable tracepoints: until Linus requested that we aim for > something that is specific to system calls, rather than a generic > thing. > > https://lore.kernel.org/lkml/CAHk-=wggDLDeTKbhb5hh--x=-DQd69v41137M72m6NOTmbD-cw@mail.gmail.com/ Reading that thread again, I believe that Linus was talking more about all the infrastructure going around to make a special "faultable" tracepoint (I could be wrong, and Linus may correct me here). When in fact, the only user is system calls. But from the BPF POV, it doesn't care if it's a system call, it cares that it is faultable, and the check should be on that. Having BPF check if it's a system call is requiring that BPF knows the implementation details of system call tracepoints. But if it knows it is faultable, then it needs to do something special. > > 3) Syscall tracepoints: This is what we currently have. > > > Other than that, I think this could work. > > Calling this field "sleepable" would be misleading. Calling it "faultable" > would be a better fit, but based on Linus' request, I'm tempted to stick > with "syscall" for now. > > Your concern is to name this in a way that is general and future-proof. > Linus' point was to make it syscall-specific rather than general. My > position is that we should wait until we face other use-cases (if we > even do) before consider changing the naming from "syscall" to something > more generic. Yes, but that was for the infrastructure itself. It really doesnt' make sense that BPF needs to know which type of tracepoint can fault. That's telling BPF, you need to know the implementation of this type of tracepoint. -- Steve
On 2024-10-28 01:06, Steven Rostedt wrote: > On Sun, 27 Oct 2024 08:30:54 -0400 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >>> >>> I wonder if we should call it "sleepable" instead? For this patch set >>> do we really care if it's a system call or not? It's really if the >>> tracepoint is sleepable or not that's the issue. System calls are just >>> one user of it, there may be more in the future, and the changes to BPF >>> will still be needed. >> >> Remember that syscall tracepoint probes are allowed to handle page >> faults, but should not generally block, otherwise it would postpone the >> grace periods of all RCU tasks trace users. >> >> So naming this "sleepable" would be misleading, because probes are >> not allowed general blocking, just to handle page faults. > > I'm fine with "faultable" too. > >> >> If we look at the history of this tracepoint feature, we went with >> the following naming over the various versions of the patch series: >> >> 1) Sleepable tracepoints: until we understood that we just want to >> allow page fault, not general sleeping, so we needed to change >> the name, >> >> 2) Faultable tracepoints: until Linus requested that we aim for >> something that is specific to system calls, rather than a generic >> thing. >> >> https://lore.kernel.org/lkml/CAHk-=wggDLDeTKbhb5hh--x=-DQd69v41137M72m6NOTmbD-cw@mail.gmail.com/ > > Reading that thread again, I believe that Linus was talking more about > all the infrastructure going around to make a special "faultable" > tracepoint (I could be wrong, and Linus may correct me here). When in > fact, the only user is system calls. But from the BPF POV, it doesn't > care if it's a system call, it cares that it is faultable, and the > check should be on that. Having BPF check if it's a system call is > requiring that BPF knows the implementation details of system call > tracepoints. But if it knows it is faultable, then it needs to do > something special. It might just be that, indeed. Considering the overwhelming preference for something a little more general (sleepable/faultable vs syscall), I am tempted to go for "tracepoint_is_faultable()". > >> >> 3) Syscall tracepoints: This is what we currently have. >> >>> Other than that, I think this could work. >> >> Calling this field "sleepable" would be misleading. Calling it "faultable" >> would be a better fit, but based on Linus' request, I'm tempted to stick >> with "syscall" for now. >> >> Your concern is to name this in a way that is general and future-proof. >> Linus' point was to make it syscall-specific rather than general. My >> position is that we should wait until we face other use-cases (if we >> even do) before consider changing the naming from "syscall" to something >> more generic. > > Yes, but that was for the infrastructure itself. It really doesnt' make > sense that BPF needs to know which type of tracepoint can fault. That's > telling BPF, you need to know the implementation of this type of > tracepoint. OK, I'll use tracepoint_is_faultable() and a "faultable" field name, and see how people react. I really prefer "faultable" to "sleepable" here, because I envision that in the future we may introduce tracepoints which are really able to sleep (general blocking), for instance though use of hazard pointers to protect a list iteration. (if there is ever a need for it) Thanks, Mathieu
On 2024-10-27 21:23, Andrii Nakryiko wrote: > On Sun, Oct 27, 2024 at 7:19 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: [...] >>>> include/linux/tracepoint-defs.h | 2 ++ >>>> include/linux/tracepoint.h | 24 ++++++++++++++++++++++++ >>>> include/trace/define_trace.h | 2 +- >>>> 3 files changed, 27 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h >>>> index 967c08d9da84..53119e074c87 100644 >>>> --- a/include/linux/tracepoint-defs.h >>>> +++ b/include/linux/tracepoint-defs.h >>>> @@ -32,6 +32,8 @@ struct tracepoint_func { >>>> struct tracepoint_ext { >>>> int (*regfunc)(void); >>>> void (*unregfunc)(void); >>>> + /* Flags. */ >>>> + unsigned int syscall:1; >>> >>> I wonder if we should call it "sleepable" instead? For this patch set >>> do we really care if it's a system call or not? It's really if the >>> tracepoint is sleepable or not that's the issue. System calls are just >>> one user of it, there may be more in the future, and the changes to BPF >>> will still be needed. >> >> I agree with this. Even if currently we restrict only syscall events >> can be sleep, "tracepoint_is_syscall()" requires to add comment to >> explain why on all call sites e.g. >> > > +1 to naming this "sleepable" (or at least "faultable"). BPF world > uses "sleepable BPF" terminology for BPF programs and attachment hooks > that can take page fault (and wait/sleep waiting for those to be > handled), so this would be consistent with that. Also, from BPF > standpoint this will be advertised as attaching to sleepable > tracepoints regardless, so "syscall" terminology is too specific and > misleading, because while current set of tracepoints are > syscall-specific, the important part is taking page fault, no tracing > syscalls. +1 for "faultable". Thanks, Mathieu
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h index 967c08d9da84..53119e074c87 100644 --- a/include/linux/tracepoint-defs.h +++ b/include/linux/tracepoint-defs.h @@ -32,6 +32,8 @@ struct tracepoint_func { struct tracepoint_ext { int (*regfunc)(void); void (*unregfunc)(void); + /* Flags. */ + unsigned int syscall:1; }; struct tracepoint { diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 83dc24ee8b13..93e70bc64533 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -104,6 +104,12 @@ void for_each_tracepoint_in_module(struct module *mod, * tracepoint_synchronize_unregister must be called between the last tracepoint * probe unregistration and the end of module exit to make sure there is no * caller executing a probe when it is freed. + * + * An alternative is to use the following for batch reclaim associated + * with a given tracepoint: + * + * - tracepoint_is_syscall() == false: call_rcu() + * - tracepoint_is_syscall() == true: call_rcu_tasks_trace() */ #ifdef CONFIG_TRACEPOINTS static inline void tracepoint_synchronize_unregister(void) @@ -111,9 +117,17 @@ static inline void tracepoint_synchronize_unregister(void) synchronize_rcu_tasks_trace(); synchronize_rcu(); } +static inline bool tracepoint_is_syscall(struct tracepoint *tp) +{ + return tp->ext && tp->ext->syscall; +} #else static inline void tracepoint_synchronize_unregister(void) { } +static inline bool tracepoint_is_syscall(struct tracepoint *tp) +{ + return false; +} #endif #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS @@ -345,6 +359,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) struct tracepoint_ext __tracepoint_ext_##_name = { \ .regfunc = _reg, \ .unregfunc = _unreg, \ + .syscall = false, \ + }; \ + __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); + +#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args) \ + struct tracepoint_ext __tracepoint_ext_##_name = { \ + .regfunc = _reg, \ + .unregfunc = _unreg, \ + .syscall = true, \ }; \ __DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args)); @@ -389,6 +412,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) #define __DECLARE_TRACE_SYSCALL __DECLARE_TRACE #define DEFINE_TRACE_FN(name, reg, unreg, proto, args) +#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args) #define DEFINE_TRACE(name, proto, args) #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) #define EXPORT_TRACEPOINT_SYMBOL(name) diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h index ff5fa17a6259..63fea2218afa 100644 --- a/include/trace/define_trace.h +++ b/include/trace/define_trace.h @@ -48,7 +48,7 @@ #undef TRACE_EVENT_SYSCALL #define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \ - DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args)) + DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args)) #undef TRACE_EVENT_NOP #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print)
Introduce a "syscall" flag within the extended structure to know whether a tracepoint needs rcu tasks trace grace period before reclaim. This can be queried using tracepoint_is_syscall(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> --- include/linux/tracepoint-defs.h | 2 ++ include/linux/tracepoint.h | 24 ++++++++++++++++++++++++ include/trace/define_trace.h | 2 +- 3 files changed, 27 insertions(+), 1 deletion(-)