diff mbox series

tracing: Add WARN_ON_ONCE for syscall_nr check

Message ID 20241128115319.305523-1-chen.dylane@gmail.com (mailing list archive)
State New
Headers show
Series tracing: Add WARN_ON_ONCE for syscall_nr check | expand

Commit Message

Tao Chen Nov. 28, 2024, 11:53 a.m. UTC
Now, x86_64 kernel not support to trace syscall for ia32 syscall.
As a result, there is no any trace output when tracing a ia32 task.
Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for
syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give
some message.

 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>

 int main(void)
 {
	int ret = open("tmp.txt", 0);
	return 0;
 }

gcc -m32 open32 open.c
echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_enter_open/enable
./open32

[  109.214595] ------------[ cut here ]------------
[  109.215625] WARNING: CPU: 0 PID: 170 at
kernel/trace/trace_syscalls.c:304 ftrace_syscall_enter+0x55/0x1c0
[  109.217111] Modules linked in:
[  109.218190] CPU: 0 UID: 0 PID: 170 Comm: open32 Not tainted
6.12.0-rc4-virtme #10]

Signed-off-by: Tao Chen <chen.dylane@gmail.com>
---
 kernel/trace/trace_syscalls.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Steven Rostedt Nov. 28, 2024, 12:46 p.m. UTC | #1
On Thu, 28 Nov 2024 19:53:19 +0800
Tao Chen <chen.dylane@gmail.com> wrote:

> Now, x86_64 kernel not support to trace syscall for ia32 syscall.
> As a result, there is no any trace output when tracing a ia32 task.
> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for
> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give
> some message.

So on a system that has "panic_on_warn" set and they trace a 32 bit
system call, it will cause their system to crash. Is that the intended
behavior?

WARN*() is for self testing the kernel to detect real bugs, not to
inform users that something isn't supported.

BIG NAK!

-- Steve
Tao Chen Nov. 28, 2024, 1:15 p.m. UTC | #2
在 2024/11/28 20:46, Steven Rostedt 写道:
> On Thu, 28 Nov 2024 19:53:19 +0800
> Tao Chen <chen.dylane@gmail.com> wrote:
> 
>> Now, x86_64 kernel not support to trace syscall for ia32 syscall.
>> As a result, there is no any trace output when tracing a ia32 task.
>> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for
>> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give
>> some message.
> 
> So on a system that has "panic_on_warn" set and they trace a 32 bit
> system call, it will cause their system to crash. Is that the intended
> behavior?
> 
> WARN*() is for self testing the kernel to detect real bugs, not to
> inform users that something isn't supported.
> 
> BIG NAK!
> 
> -- Steve

Hi, Steve, thank you for your reply, as you say, so what about 
pr_warn_once api just to print something ?
Mathieu Desnoyers Nov. 28, 2024, 2:27 p.m. UTC | #3
On 2024-11-28 08:15, Tao Chen wrote:
> 在 2024/11/28 20:46, Steven Rostedt 写道:
>> On Thu, 28 Nov 2024 19:53:19 +0800
>> Tao Chen <chen.dylane@gmail.com> wrote:
>>
>>> Now, x86_64 kernel not support to trace syscall for ia32 syscall.
>>> As a result, there is no any trace output when tracing a ia32 task.
>>> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for
>>> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give
>>> some message.
>>
>> So on a system that has "panic_on_warn" set and they trace a 32 bit
>> system call, it will cause their system to crash. Is that the intended
>> behavior?
>>
>> WARN*() is for self testing the kernel to detect real bugs, not to
>> inform users that something isn't supported.
>>
>> BIG NAK!
>>
>> -- Steve
> 
> Hi, Steve, thank you for your reply, as you say, so what about 
> pr_warn_once api just to print something ?
> 

I understand that explicitly enabling a system call and observing
that ia32 system calls are just not traced by ftrace and perf can
be surprising for end users. But printing a warning on the tracing
path for an unimplemented tracing feature is just wrong.

Also, AFAIU the warning will trigger if an ia32 program issues any
system call when any unrelated system call tracing is enabled,
including non-compat syscalls.

If you want to check something and return an error, it would
be in tracefs where the users interact with files that represent
ia32 system calls to try to list/enable them. However, given the
exposed files have nothing that mention whether they apply to
non-compat or compat system calls, it's understandable that an
end user would think all system calls are traced, including
compat system calls. So I'm not sure how to solve that in ftrace/perf
without actually implementing the missing feature the way the
tracefs ABI is exposed.

If your end goal is to trace ia32 system call, you may want to try the
lttng-modules kernel tracer [1], which has supported compat system call
tracing for the past 12 years (at least since lttng-modules 2.0).

Thanks,

Mathieu

[1] https://lttng.org/
Steven Rostedt Nov. 28, 2024, 3:03 p.m. UTC | #4
On Thu, 28 Nov 2024 21:15:31 +0800
Tao Chen <chen.dylane@gmail.com> wrote:

> Hi, Steve, thank you for your reply, as you say, so what about 
> pr_warn_once api just to print something ?

A better solution is for there to be a return code or something where the
tracers (perf or ftrace) can record in the trace that the system call is
not supported.

-- Steve
Mathieu Desnoyers Nov. 28, 2024, 4:02 p.m. UTC | #5
On 2024-11-28 10:03, Steven Rostedt wrote:
> On Thu, 28 Nov 2024 21:15:31 +0800
> Tao Chen <chen.dylane@gmail.com> wrote:
> 
>> Hi, Steve, thank you for your reply, as you say, so what about
>> pr_warn_once api just to print something ?
> 
> A better solution is for there to be a return code or something where the
> tracers (perf or ftrace) can record in the trace that the system call is
> not supported.

Just be careful not to spam the traces uselessly. E.g. if only the
openat syscall is enabled, you'd only want to be made aware of the
ia32 openat, not all ia32 syscalls.

Thanks,

Mathieu
Steven Rostedt Nov. 28, 2024, 4:52 p.m. UTC | #6
On Thu, 28 Nov 2024 11:02:31 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > A better solution is for there to be a return code or something where the
> > tracers (perf or ftrace) can record in the trace that the system call is
> > not supported.  
> 
> Just be careful not to spam the traces uselessly. E.g. if only the
> openat syscall is enabled, you'd only want to be made aware of the
> ia32 openat, not all ia32 syscalls.

Why not? If you ask to trace something that isn't able to be traced,
add something in the buffer. It's not totally useless information. If
anything, you know that a task is making ia32 system calls, and how
many and when they are doing so.

Why make it more complex than it has to be. To do it only once, you
need to add the accounting logic to make sure each trace gets notified
about it. Not to mention if the event gets dropped.

If the user doesn't want this in their buffer, then they should filter
it out.

-- Steve
Tao Chen Nov. 29, 2024, 2:48 a.m. UTC | #7
在 2024/11/28 22:27, Mathieu Desnoyers 写道:
> On 2024-11-28 08:15, Tao Chen wrote:
>> 在 2024/11/28 20:46, Steven Rostedt 写道:
>>> On Thu, 28 Nov 2024 19:53:19 +0800
>>> Tao Chen <chen.dylane@gmail.com> wrote:
>>>
>>>> Now, x86_64 kernel not support to trace syscall for ia32 syscall.
>>>> As a result, there is no any trace output when tracing a ia32 task.
>>>> Like unreg_event_syscall_enter, add a WARN_ON_ONCE judgment for
>>>> syscall_nr in perf_syscall_enter and ftrace_syscall_enter to give
>>>> some message.
>>>
>>> So on a system that has "panic_on_warn" set and they trace a 32 bit
>>> system call, it will cause their system to crash. Is that the intended
>>> behavior?
>>>
>>> WARN*() is for self testing the kernel to detect real bugs, not to
>>> inform users that something isn't supported.
>>>
>>> BIG NAK!
>>>
>>> -- Steve
>>
>> Hi, Steve, thank you for your reply, as you say, so what about 
>> pr_warn_once api just to print something ?
>>
> 
> I understand that explicitly enabling a system call and observing
> that ia32 system calls are just not traced by ftrace and perf can
> be surprising for end users. But printing a warning on the tracing
> path for an unimplemented tracing feature is just wrong.
> 

The initial problem was that I used eBPF 
SEC("tp/syscalls/sys_enter_sendto") to capture system calls and found no 
results for i32 programs. So here, i just wanted to remind users that on 
a 64-bit system, i32 system calls are not supported, to avoid confusion.

> Also, AFAIU the warning will trigger if an ia32 program issues any
> system call when any unrelated system call tracing is enabled,
> including non-compat syscalls.
> 
> If you want to check something and return an error, it would
> be in tracefs where the users interact with files that represent
> ia32 system calls to try to list/enable them. However, given the
> exposed files have nothing that mention whether they apply to
> non-compat or compat system calls, it's understandable that an
> end user would think all system calls are traced, including
> compat system calls. So I'm not sure how to solve that in ftrace/perf
> without actually implementing the missing feature the way the
> tracefs ABI is exposed.
> 
> If your end goal is to trace ia32 system call, you may want to try the
> lttng-modules kernel tracer [1], which has supported compat system call
> tracing for the past 12 years (at least since lttng-modules 2.0).

Thank you for your recommendation. I'll take a look at how lttng 
implements it. Actually, SEC("tp/raw_syscall/sys_enter") can also
capure it.

> 
> Thanks,
> 
> Mathieu
> 
> [1] https://lttng.org/
>
Tao Chen Nov. 29, 2024, 2:51 a.m. UTC | #8
在 2024/11/28 23:03, Steven Rostedt 写道:
> On Thu, 28 Nov 2024 21:15:31 +0800
> Tao Chen <chen.dylane@gmail.com> wrote:
> 
>> Hi, Steve, thank you for your reply, as you say, so what about
>> pr_warn_once api just to print something ?
> 
> A better solution is for there to be a return code or something where the
> tracers (perf or ftrace) can record in the trace that the system call is
> not supported.
> 
> -- Steve

Hi,Steve,thank you for your suggestion. I'll see if it can be implemented.
diff mbox series

Patch

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 785733245ead..19c3335e7d84 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -300,7 +300,7 @@  static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	int size;
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
-	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
+	if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls))
 		return;
 
 	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
@@ -339,7 +339,7 @@  static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	int syscall_nr;
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
-	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
+	if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls))
 		return;
 
 	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */
@@ -585,7 +585,7 @@  static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	int size;
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
-	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
+	if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls))
 		return;
 	if (!test_bit(syscall_nr, enabled_perf_enter_syscalls))
 		return;
@@ -687,7 +687,7 @@  static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	int size;
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
-	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
+	if (WARN_ON_ONCE(syscall_nr < 0 || syscall_nr >= NR_syscalls))
 		return;
 	if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
 		return;