Message ID | 20241128115319.305523-1-chen.dylane@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | tracing: Add WARN_ON_ONCE for syscall_nr check | expand |
On Thu, 28 Nov 2024 19:53:19 +0800 Tao Chen <chen.dylane@gmail.com> wrote: > Now, x86_64 kernel not support to trace syscall for ia32 syscall. > As a result, there is no any trace output when tracing a ia32 task. > Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for > syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give > some message. So on a system that has "panic_on_warn" set and they trace a 32 bit system call, it will cause their system to crash. Is that the intended behavior? WARN*() is for self testing the kernel to detect real bugs, not to inform users that something isn't supported. BIG NAK! -- Steve
在 2024/11/28 20:46, Steven Rostedt 写道: > On Thu, 28 Nov 2024 19:53:19 +0800 > Tao Chen <chen.dylane@gmail.com> wrote: > >> Now, x86_64 kernel not support to trace syscall for ia32 syscall. >> As a result, there is no any trace output when tracing a ia32 task. >> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for >> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give >> some message. > > So on a system that has "panic_on_warn" set and they trace a 32 bit > system call, it will cause their system to crash. Is that the intended > behavior? > > WARN*() is for self testing the kernel to detect real bugs, not to > inform users that something isn't supported. > > BIG NAK! > > -- Steve Hi, Steve, thank you for your reply, as you say, so what about pr_warn_once api just to print something ?
On 2024-11-28 08:15, Tao Chen wrote: > 在 2024/11/28 20:46, Steven Rostedt 写道: >> On Thu, 28 Nov 2024 19:53:19 +0800 >> Tao Chen <chen.dylane@gmail.com> wrote: >> >>> Now, x86_64 kernel not support to trace syscall for ia32 syscall. >>> As a result, there is no any trace output when tracing a ia32 task. >>> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for >>> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give >>> some message. >> >> So on a system that has "panic_on_warn" set and they trace a 32 bit >> system call, it will cause their system to crash. Is that the intended >> behavior? >> >> WARN*() is for self testing the kernel to detect real bugs, not to >> inform users that something isn't supported. >> >> BIG NAK! >> >> -- Steve > > Hi, Steve, thank you for your reply, as you say, so what about > pr_warn_once api just to print something ? > I understand that explicitly enabling a system call and observing that ia32 system calls are just not traced by ftrace and perf can be surprising for end users. But printing a warning on the tracing path for an unimplemented tracing feature is just wrong. Also, AFAIU the warning will trigger if an ia32 program issues any system call when any unrelated system call tracing is enabled, including non-compat syscalls. If you want to check something and return an error, it would be in tracefs where the users interact with files that represent ia32 system calls to try to list/enable them. However, given the exposed files have nothing that mention whether they apply to non-compat or compat system calls, it's understandable that an end user would think all system calls are traced, including compat system calls. So I'm not sure how to solve that in ftrace/perf without actually implementing the missing feature the way the tracefs ABI is exposed. If your end goal is to trace ia32 system call, you may want to try the lttng-modules kernel tracer [1], which has supported compat system call tracing for the past 12 years (at least since lttng-modules 2.0). Thanks, Mathieu [1] https://lttng.org/
On Thu, 28 Nov 2024 21:15:31 +0800 Tao Chen <chen.dylane@gmail.com> wrote: > Hi, Steve, thank you for your reply, as you say, so what about > pr_warn_once api just to print something ? A better solution is for there to be a return code or something where the tracers (perf or ftrace) can record in the trace that the system call is not supported. -- Steve
On 2024-11-28 10:03, Steven Rostedt wrote: > On Thu, 28 Nov 2024 21:15:31 +0800 > Tao Chen <chen.dylane@gmail.com> wrote: > >> Hi, Steve, thank you for your reply, as you say, so what about >> pr_warn_once api just to print something ? > > A better solution is for there to be a return code or something where the > tracers (perf or ftrace) can record in the trace that the system call is > not supported. Just be careful not to spam the traces uselessly. E.g. if only the openat syscall is enabled, you'd only want to be made aware of the ia32 openat, not all ia32 syscalls. Thanks, Mathieu
On Thu, 28 Nov 2024 11:02:31 -0500 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > A better solution is for there to be a return code or something where the > > tracers (perf or ftrace) can record in the trace that the system call is > > not supported. > > Just be careful not to spam the traces uselessly. E.g. if only the > openat syscall is enabled, you'd only want to be made aware of the > ia32 openat, not all ia32 syscalls. Why not? If you ask to trace something that isn't able to be traced, add something in the buffer. It's not totally useless information. If anything, you know that a task is making ia32 system calls, and how many and when they are doing so. Why make it more complex than it has to be. To do it only once, you need to add the accounting logic to make sure each trace gets notified about it. Not to mention if the event gets dropped. If the user doesn't want this in their buffer, then they should filter it out. -- Steve
在 2024/11/28 22:27, Mathieu Desnoyers 写道: > On 2024-11-28 08:15, Tao Chen wrote: >> 在 2024/11/28 20:46, Steven Rostedt 写道: >>> On Thu, 28 Nov 2024 19:53:19 +0800 >>> Tao Chen <chen.dylane@gmail.com> wrote: >>> >>>> Now, x86_64 kernel not support to trace syscall for ia32 syscall. >>>> As a result, there is no any trace output when tracing a ia32 task. >>>> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for >>>> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give >>>> some message. >>> >>> So on a system that has "panic_on_warn" set and they trace a 32 bit >>> system call, it will cause their system to crash. Is that the intended >>> behavior? >>> >>> WARN*() is for self testing the kernel to detect real bugs, not to >>> inform users that something isn't supported. >>> >>> BIG NAK! >>> >>> -- Steve >> >> Hi, Steve, thank you for your reply, as you say, so what about >> pr_warn_once api just to print something ? >> > > I understand that explicitly enabling a system call and observing > that ia32 system calls are just not traced by ftrace and perf can > be surprising for end users. But printing a warning on the tracing > path for an unimplemented tracing feature is just wrong. > The initial problem was that I used eBPF SEC("tp/syscalls/sys_enter_sendto") to capture system calls and found no results for i32 programs. So here, i just wanted to remind users that on a 64-bit system, i32 system calls are not supported, to avoid confusion. > Also, AFAIU the warning will trigger if an ia32 program issues any > system call when any unrelated system call tracing is enabled, > including non-compat syscalls. > > If you want to check something and return an error, it would > be in tracefs where the users interact with files that represent > ia32 system calls to try to list/enable them. However, given the > exposed files have nothing that mention whether they apply to > non-compat or compat system calls, it's understandable that an > end user would think all system calls are traced, including > compat system calls. So I'm not sure how to solve that in ftrace/perf > without actually implementing the missing feature the way the > tracefs ABI is exposed. > > If your end goal is to trace ia32 system call, you may want to try the > lttng-modules kernel tracer [1], which has supported compat system call > tracing for the past 12 years (at least since lttng-modules 2.0). Thank you for your recommendation. I'll take a look at how lttng implements it. Actually, SEC("tp/raw_syscall/sys_enter") can also capure it. > > Thanks, > > Mathieu > > [1] https://lttng.org/ >
在 2024/11/28 23:03, Steven Rostedt 写道: > On Thu, 28 Nov 2024 21:15:31 +0800 > Tao Chen <chen.dylane@gmail.com> wrote: > >> Hi, Steve, thank you for your reply, as you say, so what about >> pr_warn_once api just to print something ? > > A better solution is for there to be a return code or something where the > tracers (perf or ftrace) can record in the trace that the system call is > not supported. > > -- Steve Hi,Steve,thank you for your suggestion. I'll see if it can be implemented.
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index 785733245ead..19c3335e7d84 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -300,7 +300,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id) int size; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0 || syscall_nr >= NR_syscalls) + if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls)) return; /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */ @@ -339,7 +339,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret) int syscall_nr; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0 || syscall_nr >= NR_syscalls) + if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls)) return; /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */ @@ -585,7 +585,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id) int size; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0 || syscall_nr >= NR_syscalls) + if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls)) return; if (!test_bit(syscall_nr, enabled_perf_enter_syscalls)) return; @@ -687,7 +687,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret) int size; syscall_nr = trace_get_syscall_nr(current, regs); - if (syscall_nr < 0 || syscall_nr >= NR_syscalls) + if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls)) return; if (!test_bit(syscall_nr, enabled_perf_exit_syscalls)) return;
Now, x86_64 kernel not support to trace syscall for ia32 syscall. As a result, there is no any trace output when tracing a ia32 task. Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give some message. #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> int main(void) { int ret = open("tmp.txt", 0); return 0; } gcc -m32 open32 open.c echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_enter_open/enable ./open32 [ 109.214595] ------------[ cut here ]------------ [ 109.215625] WARNING: CPU: 0 PID: 170 at kernel/trace/trace_syscalls.c:304 ftrace_syscall_enter+0x55/0x1c0 [ 109.217111] Modules linked in: [ 109.218190] CPU: 0 UID: 0 PID: 170 Comm: open32 Not tainted 6.12.0-rc4-virtme #10] Signed-off-by: Tao Chen <chen.dylane@gmail.com> --- kernel/trace/trace_syscalls.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)