diff mbox series

[v1,6/8] tracing/ftrace: Add might_fault check to syscall probes

Message ID 20241003151638.1608537-7-mathieu.desnoyers@efficios.com (mailing list archive)
State Superseded
Headers show
Series tracing: Allow system call tracepoints to handle page faults | expand

Commit Message

Mathieu Desnoyers Oct. 3, 2024, 3:16 p.m. UTC
Add a might_fault() check to validate that the ftrace sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
 include/trace/trace_events.h  | 1 +
 kernel/trace/trace_syscalls.c | 2 ++
 2 files changed, 3 insertions(+)

Comments

Steven Rostedt Oct. 3, 2024, 10:36 p.m. UTC | #1
On Thu,  3 Oct 2024 11:16:36 -0400
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
> index 0228d9ed94a3..e0d4850b0d77 100644
> --- a/include/trace/trace_events.h
> +++ b/include/trace/trace_events.h
> @@ -446,6 +446,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
>  static notrace void							\
>  trace_event_raw_event_##call(void *__data, proto)			\
>  {									\
> +	might_fault();							\

I don't think we want "might_fault()" here, as this is called for every
tracepoint that is created by the TRACE_EVENT() macro. That means, there's
going to be plenty of locations this gets called at that do not allow faults.

-- Steve


>  	guard(preempt_notrace)();					\
>  	do_trace_event_raw_event_##call(__data, args);			\
>  }
Mathieu Desnoyers Oct. 4, 2024, 12:11 a.m. UTC | #2
On 2024-10-04 00:36, Steven Rostedt wrote:
> On Thu,  3 Oct 2024 11:16:36 -0400
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
>> index 0228d9ed94a3..e0d4850b0d77 100644
>> --- a/include/trace/trace_events.h
>> +++ b/include/trace/trace_events.h
>> @@ -446,6 +446,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
>>   static notrace void							\
>>   trace_event_raw_event_##call(void *__data, proto)			\
>>   {									\
>> +	might_fault();							\
> 
> I don't think we want "might_fault()" here, as this is called for every
> tracepoint that is created by the TRACE_EVENT() macro. That means, there's
> going to be plenty of locations this gets called at that do not allow faults.

Here is the full context where this line applies:

#undef DECLARE_EVENT_SYSCALL_CLASS
#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
                       PARAMS(assign), PARAMS(print))                    \
static notrace void                                                     \
trace_event_raw_event_##call(void *__data, proto)                       \
{                                                                       \
         might_fault();                                                  \
         guard(preempt_notrace)();                                       \
         do_trace_event_raw_event_##call(__data, args);                  \
}

Not an issue, since it's only for syscall tracepoints.

Thanks,

Mathieu

> 
> -- Steve
> 
> 
>>   	guard(preempt_notrace)();					\
>>   	do_trace_event_raw_event_##call(__data, args);			\
>>   }
diff mbox series

Patch

diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 0228d9ed94a3..e0d4850b0d77 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -446,6 +446,7 @@  __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
 static notrace void							\
 trace_event_raw_event_##call(void *__data, proto)			\
 {									\
+	might_fault();							\
 	guard(preempt_notrace)();					\
 	do_trace_event_raw_event_##call(__data, args);			\
 }
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index edcfa47446c7..89d7e4c57b5b 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -303,6 +303,7 @@  static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
@@ -348,6 +349,7 @@  static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);