[RFC,0/2] Add basic tracing support for m68k

Message ID 20241021-add-m68k-tracing-support-v1-0-0883d704525b@yoseli.org (mailing list archive)

Message

Jean-Michel Hautbois Oct. 21, 2024, 9:44 a.m. UTC
In order to debug latency issues, I wanted to use ftrace on my M54418
ColdFire. Sadly, it is not supported yet.
Thanks to Steven's pointers [1], it does not sound too difficult.

This small series adds basic functions to make it work, and for the few
tests I could do, it seems to be working well.
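
For the STACKTRACE side, the idea is roughly the following. This is only a
minimal sketch, assuming classic %a6 frame-pointer unwinding and the legacy
save_stack_trace() interface; the actual patch body may differ:

```
/* Sketch of arch/m68k/kernel/stacktrace.c -- illustrative only. */
#include <linux/export.h>
#include <linux/kernel.h>
#include <linux/sched/task_stack.h>
#include <linux/stacktrace.h>

void save_stack_trace(struct stack_trace *trace)
{
	unsigned long stack = (unsigned long)task_stack_page(current);
	/* Assumes each frame stores { saved %a6, return address }. */
	unsigned long *fp = __builtin_frame_address(0);

	while ((unsigned long)fp >= stack &&
	       (unsigned long)fp < stack + THREAD_SIZE - 2 * sizeof(long)) {
		unsigned long addr = fp[1];	/* return address */

		if (!__kernel_text_address(addr))
			break;
		if (trace->skip > 0)
			trace->skip--;
		else if (trace->nr_entries < trace->max_entries)
			trace->entries[trace->nr_entries++] = addr;
		fp = (unsigned long *)fp[0];	/* caller's saved %a6 */
	}
}
EXPORT_SYMBOL_GPL(save_stack_trace);
```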

Here is a simple output I get:

```

            bash-232     [000] d..3.   947.629000: thread_noise:     bash:232 start 947.629000000 duration 0 ns
      timerlat/0-274     [000] .....   947.629000: #51598 context thread timer_latency    409280 ns
            bash-232     [000] d.h..   947.630000: #51599 context    irq timer_latency    110720 ns
            bash-232     [000] dnh1.   947.630000: irq_noise: timer:206 start 947.629000000 duration 1000000 ns
            bash-232     [000] d..3.   947.630000: thread_noise:     bash:232 start 947.630000000 duration 0 ns
      timerlat/0-274     [000] .....   947.630000: #51599 context thread timer_latency    407168 ns
            bash-232     [000] d.h..   947.631000: #51600 context    irq timer_latency    108608 ns
            bash-232     [000] dnh1.   947.631000: irq_noise: timer:206 start 947.630000000 duration 1000000 ns
            bash-232     [000] d..3.   947.631000: thread_noise:     bash:232 start 947.631000000 duration 0 ns
      timerlat/0-274     [000] .....   947.631000: #51600 context thread timer_latency    401472 ns
```

I am very interested in any relevant tests to run (switch events?
Anything else?) to improve the series (and the platform :-)).

I am quite sure I missed a *lot* of things, but it seems to do what I
need :-). I am posting it as an RFC for now, in particular because I
added a new file and I am not sure this is the proper way.

Thanks for your remarks and improvements!

[1]: https://lore.kernel.org/linux-m68k/20241018124511.70d29198@gandalf.local.home

Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org>
---
Jean-Michel Hautbois (2):
      m68k: Add tracirqs
      arch: m68k: Add STACKTRACE support

 arch/m68k/Kconfig             |  6 ++++
 arch/m68k/kernel/Makefile     |  1 +
 arch/m68k/kernel/irq.c        |  2 ++
 arch/m68k/kernel/stacktrace.c | 70 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 79 insertions(+)
---
base-commit: 42f7652d3eb527d03665b09edac47f85fb600924
change-id: 20241021-add-m68k-tracing-support-36c18d2233d8

Best regards,

Comments

Jean-Michel Hautbois Nov. 15, 2024, 8:26 a.m. UTC | #1
Hi all,

On 21/10/2024 11:44, Jean-Michel Hautbois wrote:
> In order to debug latency issues, I wanted to use ftrace on my M54418
> coldfire. Sadly, it is not supported yet.
> Thanks to Steven [1] it does not sound too difficult.
> 
> This small series adds basic functions to make it work, and for the few
> tests I could do, it seems to be working well.

I did a few tests (well, a *lot* to be honest :-)) after adding HR
timer support to my m68k ColdFire kernel.

Now, I think ftrace shows a nice issue, but I suspect my stack trace is
not correct?

# wakeup_rt latency trace v1.1.5 on 6.12.0-rc4-00116-g2f00ff61fd82-dirty
# --------------------------------------------------------------------
# latency: 451 us, #6/6, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: irq/104-enet-fe-109 (uid:0 nice:0 policy:1 rt_prio:50)
#    -----------------
#
#                    _------=> CPU#
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched
#                  || / _---=> hardirq/softirq
#                  ||| / _--=> preempt-depth
#                  |||| / _-=> migrate-disable
#                  ||||| /     delay
#  cmd     pid     |||||| time  |   caller
#     \   /        ||||||  \    |    /
  telnetd-224       0dnh4.   14us+:      224:120:R   + [000]     109: 49:R irq/104-enet-fe
  telnetd-224       0dnh4.   50us+: <stack trace>
  telnetd-224       0dnh4.   65us!: 0
  telnetd-224       0d..3.  396us+: __traceiter_sched_switch
  telnetd-224       0d..3.  418us+:      224:120:R ==> [000]     109: 49:R irq/104-enet-fe
  telnetd-224       0d..3.  440us : <stack trace>


Nevertheless, it sounds like a really high latency for wake_up().

I have a custom driver which basically gets an IRQ and calls wake_up() on
a read() call. Under high CPU usage, this wake_up() can take more than
1 ms, even with a FIFO/99 priority for my kernel thread!
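
The pattern is basically the following; a minimal sketch with purely
hypothetical names, not the actual driver code:

```
#include <linux/fs.h>
#include <linux/interrupt.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(my_wq);
static bool my_data_ready;

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	my_data_ready = true;
	wake_up(&my_wq);	/* the latency is measured from here ... */
	return IRQ_HANDLED;
}

static ssize_t my_read(struct file *file, char __user *buf,
		       size_t len, loff_t *off)
{
	/* ... to the point where this reader actually runs again. */
	if (wait_event_interruptible(my_wq, my_data_ready))
		return -ERESTARTSYS;
	my_data_ready = false;
	/* copy_to_user() of the payload elided */
	return len;
}
```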

Does that ring any bell?
I can obviously do more tests if that helps getting to the bottom of the issue :-).

Thanks !

> Here is a simple output I get:
> 
> ```
> 
>              bash-232     [000] d..3.   947.629000: thread_noise:     bash:232 start 947.629000000 duration 0 ns
>        timerlat/0-274     [000] .....   947.629000: #51598 context thread timer_latency    409280 ns
>              bash-232     [000] d.h..   947.630000: #51599 context    irq timer_latency    110720 ns
>              bash-232     [000] dnh1.   947.630000: irq_noise: timer:206 start 947.629000000 duration 1000000 ns
>              bash-232     [000] d..3.   947.630000: thread_noise:     bash:232 start 947.630000000 duration 0 ns
>        timerlat/0-274     [000] .....   947.630000: #51599 context thread timer_latency    407168 ns
>              bash-232     [000] d.h..   947.631000: #51600 context    irq timer_latency    108608 ns
>              bash-232     [000] dnh1.   947.631000: irq_noise: timer:206 start 947.630000000 duration 1000000 ns
>              bash-232     [000] d..3.   947.631000: thread_noise:     bash:232 start 947.631000000 duration 0 ns
>        timerlat/0-274     [000] .....   947.631000: #51600 context thread timer_latency    401472 ns
> ```
> 
> I am very interested by any relevant test to do (switch events ?
> Anything else ?) to improve the series (and the platform :-)).
> 
> I am quite sure I missed a *lot* of things, but it seems to do what I
> need :-). I post it as RFC for now, in particular because I added a new
> file, and I am not sure if it is the proper way.
> 
> Thanks for your remarks and improvements !
> 
> [1]: https://lore.kernel.org/linux-m68k/20241018124511.70d29198@gandalf.local.home
> 
> Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org>
> ---
> Jean-Michel Hautbois (2):
>        m68k: Add tracirqs
>        arch: m68k: Add STACKTRACE support
> 
>   arch/m68k/Kconfig             |  6 ++++
>   arch/m68k/kernel/Makefile     |  1 +
>   arch/m68k/kernel/irq.c        |  2 ++
>   arch/m68k/kernel/stacktrace.c | 70 +++++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 79 insertions(+)
> ---
> base-commit: 42f7652d3eb527d03665b09edac47f85fb600924
> change-id: 20241021-add-m68k-tracing-support-36c18d2233d8
> 
> Best regards,
Steven Rostedt Nov. 15, 2024, 3:25 p.m. UTC | #2
On Fri, 15 Nov 2024 09:26:07 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:

> Nevertheless it sounds like a really high latency for wake_up().
> 
> I have a custom driver which basically gets an IRQ, and calls wake_up on 
> a read() call. This wake_up() on a high cpu usage can be more than 1ms ! 
> Even with a fifo/99 priority for my kernel thread !
> 
> I don't know if it rings any bell ?
> I can obviously do more tests if it can help getting down to the issue :-).

Try running timerlat.

-- Steve
Jean-Michel Hautbois Nov. 15, 2024, 3:33 p.m. UTC | #3
Hi Steve,

On 15/11/2024 16:25, Steven Rostedt wrote:
> On Fri, 15 Nov 2024 09:26:07 +0100
> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> 
>> Nevertheless it sounds like a really high latency for wake_up().
>>
>> I have a custom driver which basically gets an IRQ, and calls wake_up on
>> a read() call. This wake_up() on a high cpu usage can be more than 1ms !
>> Even with a fifo/99 priority for my kernel thread !
>>
>> I don't know if it rings any bell ?
>> I can obviously do more tests if it can help getting down to the issue :-).
> 
> Try running timerlat.

Thanks !
Here is what I get:
# echo timerlat > current_tracer
# echo 1 > events/osnoise/enable
# echo 25 > osnoise/stop_tracing_total_us
# tail -10 trace
             bash-224     [000] d.h..   153.268917: #77645 context    irq timer_latency     45056 ns
             bash-224     [000] dnh..   153.268987: irq_noise: timer:206 start 153.268879083 duration 93957 ns
             bash-224     [000] d....   153.269056: thread_noise: bash:224 start 153.268905324 duration 71045 ns
       timerlat/0-271     [000] .....   153.269103: #77645 context thread timer_latency    230656 ns
             bash-224     [000] d.h..   153.269735: irq_noise: timer:206 start 153.269613847 duration 103558 ns
             bash-224     [000] d.h..   153.269911: #77646 context    irq timer_latency     40640 ns
             bash-224     [000] dnh..   153.269982: irq_noise: timer:206 start 153.269875367 duration 93190 ns
             bash-224     [000] d....   153.270053: thread_noise: bash:224 start 153.269900969 duration 72709 ns
       timerlat/0-271     [000] .....   153.270100: #77646 context thread timer_latency    227008 ns
       timerlat/0-271     [000] .....   153.270155: timerlat_main: stop tracing hit on cpu 0

It looks awful, right ?
JM
Steven Rostedt Nov. 15, 2024, 7:55 p.m. UTC | #4
On Fri, 15 Nov 2024 16:33:06 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:

> Hi Steve,
> 
> On 15/11/2024 16:25, Steven Rostedt wrote:
> > On Fri, 15 Nov 2024 09:26:07 +0100
> > Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> >   
> >> Nevertheless it sounds like a really high latency for wake_up().
> >>
> >> I have a custom driver which basically gets an IRQ, and calls wake_up on
> >> a read() call. This wake_up() on a high cpu usage can be more than 1ms !
> >> Even with a fifo/99 priority for my kernel thread !
> >>
> >> I don't know if it rings any bell ?
> >> I can obviously do more tests if it can help getting down to the issue :-).  
> > 
> > Try running timerlat.  
> 
> Thanks !
> Here is what I get:
> # echo timerlat > current_tracer
> # echo 1 > events/osnoise/enable
> # echo 25 > osnoise/stop_tracing_total_us
> # tail -10 trace
>              bash-224     [000] d.h..   153.268917: #77645 context  irq timer_latency     45056 ns
>              bash-224     [000] dnh..   153.268987: irq_noise: timer:206  start 153.268879083 duration 93957 ns
>              bash-224     [000] d....   153.269056: thread_noise:  bash:224 start 153.268905324 duration 71045 ns
>        timerlat/0-271     [000] .....   153.269103: #77645 context thread timer_latency    230656 ns
>              bash-224     [000] d.h..   153.269735: irq_noise: timer:206 start 153.269613847 duration 103558 ns
>              bash-224     [000] d.h..   153.269911: #77646 context irq timer_latency     40640 ns
>              bash-224     [000] dnh..   153.269982: irq_noise: timer:206 start 153.269875367 duration 93190 ns
>              bash-224     [000] d....   153.270053: thread_noise: bash:224 start 153.269900969 duration 72709 ns
>        timerlat/0-271     [000] .....   153.270100: #77646 context thread timer_latency    227008 ns
>        timerlat/0-271     [000] .....   153.270155: timerlat_main: stop tracing hit on cpu 0
> 
> It looks awful, right ?

awful is relative ;-) If that was on x86, I would say it was bad.

Also check out rtla (in tools/tracing/rtla).

 # rtla timerlat top

                                     Timer Latency                                                                                       
  0 00:00:23   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)      |    Ret user Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max |      cur       min       avg       max
  0 #21515     |      580       573       599       827 |      598       585       641       860 |      606       589       659       889
  1 #21513     |      461       452       477       506 |      480       315       521       561 |      488       384       539       575
  2 #10827     |      962       961       988      1016 |      983       805      1026      1075 |      989       940      1045      1081
  3 #21512     |       68        61        86       313 |       87        73       128       190 |       95        79       146       338
  4 #21510     |      254       246       271       464 |      273        70       315       473 |      281       183       333       477
  5 #21509     |      397       388       414       441 |      416       215       457       649 |      424       338       475       655
  6 #21508     |      496       496       522       566 |      509       322       563       605 |      515       424       579       611
  7 #21507     |      658       648       675       702 |      676       471       717       757 |      684       627       735       763
---------------|----------------------------------------|----------------------------------------|---------------------------------------
ALL #161401 e0 |                 61       472      1016 |                 70       514      1075 |                 79       532      1081


Or on a virtual machine running on my x86 server:

 # rtla timerlat top

                                     Timer Latency                                                                                       
  0 00:00:19   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)      |    Ret user Timer Latency (us)
CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max |      cur       min       avg       max
  0 #4600      |        4         3      3104      3223 |       41        41      3131      3262 |       46        46      3143      3304
  1 #4589      |     3678         3      3675      3742 |     3705        15      3703      3814 |     3718        22      3716      3828
  2 #4571      |     3118        56      3135      3233 |     3145        79      3162      3260 |     3157        91      3175      3275
  3 #4579      |     3509         3      3514      3625 |     3535         7      3540      3665 |     3547         9      3553      3702
  4 #4676      |        2         1      2912      4016 |       10         5      2940      4044 |       15         7      2952      4060
  5 #4572      |     3446         9      3458      3517 |     3475        32      3486      3562 |     3488        41      3499      3576
  6 #4692      |     3948         0      3767      4002 |     3984         5      3794      4079 |     4000         7      3806      4108
  7 #4580      |     3141         2      3144      3269 |     3168        20      3171      3296 |     3180        30      3184      3332
---------------|----------------------------------------|----------------------------------------|---------------------------------------
ALL #36859  e0 |                  0      3339      4016 |                  5      3366      4079 |                  7      3379      4108

That has some really poor numbers.

-- Steve
Jean-Michel Hautbois Nov. 18, 2024, 10:11 a.m. UTC | #5
Hi Steve,

On 15/11/2024 20:55, Steven Rostedt wrote:
> On Fri, 15 Nov 2024 16:33:06 +0100
> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> 
>> Hi Steve,
>>
>> On 15/11/2024 16:25, Steven Rostedt wrote:
>>> On Fri, 15 Nov 2024 09:26:07 +0100
>>> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
>>>    
>>>> Nevertheless it sounds like a really high latency for wake_up().
>>>>
>>>> I have a custom driver which basically gets an IRQ, and calls wake_up on
>>>> a read() call. This wake_up() on a high cpu usage can be more than 1ms !
>>>> Even with a fifo/99 priority for my kernel thread !
>>>>
>>>> I don't know if it rings any bell ?
>>>> I can obviously do more tests if it can help getting down to the issue :-).
>>>
>>> Try running timerlat.
>>
>> Thanks !
>> Here is what I get:
>> # echo timerlat > current_tracer
>> # echo 1 > events/osnoise/enable
>> # echo 25 > osnoise/stop_tracing_total_us
>> # tail -10 trace
>>               bash-224     [000] d.h..   153.268917: #77645 context  irq timer_latency     45056 ns
>>               bash-224     [000] dnh..   153.268987: irq_noise: timer:206  start 153.268879083 duration 93957 ns
>>               bash-224     [000] d....   153.269056: thread_noise:  bash:224 start 153.268905324 duration 71045 ns
>>         timerlat/0-271     [000] .....   153.269103: #77645 context thread timer_latency    230656 ns
>>               bash-224     [000] d.h..   153.269735: irq_noise: timer:206 start 153.269613847 duration 103558 ns
>>               bash-224     [000] d.h..   153.269911: #77646 context irq timer_latency     40640 ns
>>               bash-224     [000] dnh..   153.269982: irq_noise: timer:206 start 153.269875367 duration 93190 ns
>>               bash-224     [000] d....   153.270053: thread_noise: bash:224 start 153.269900969 duration 72709 ns
>>         timerlat/0-271     [000] .....   153.270100: #77646 context thread timer_latency    227008 ns
>>         timerlat/0-271     [000] .....   153.270155: timerlat_main: stop tracing hit on cpu 0
>>
>> It looks awful, right ?
> 
> awful is relative ;-) If that was on x86, I would say it was bad.
> 
> Also check out rtla (in tools/tracing/rtla).

Thanks! I knew it only by name, so I watched a presentation recorded
at an OSS Summit by Daniel Bristot de Oliveira, who wrote it, and it is
really impressive!

I had to modify the source code a bit, as it does not compile with my 
uclibc toolchain:
diff --git a/tools/tracing/rtla/Makefile.rtla b/tools/tracing/rtla/Makefile.rtla
index cc1d6b615475..b22016a88d09 100644
--- a/tools/tracing/rtla/Makefile.rtla
+++ b/tools/tracing/rtla/Makefile.rtla
@@ -15,7 +15,7 @@ $(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
 $(call allow-override,LDCONFIG,ldconfig)
 export CC AR STRIP PKG_CONFIG LD_SO_CONF_PATH LDCONFIG
 
-FOPTS          := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong   \
+FOPTS          := -flto=auto -ffat-lto-objects -fexceptions \
                -fasynchronous-unwind-tables -fstack-clash-protection
 WOPTS          := -O -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2             \
                -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
diff --git a/tools/tracing/rtla/src/timerlat_u.c b/tools/tracing/rtla/src/timerlat_u.c
index 01dbf9a6b5a5..92ad2388b123 100644
--- a/tools/tracing/rtla/src/timerlat_u.c
+++ b/tools/tracing/rtla/src/timerlat_u.c
@@ -15,10 +15,16 @@
  #include <pthread.h>
  #include <sys/wait.h>
  #include <sys/prctl.h>
+#include <sys/syscall.h>

  #include "utils.h"
  #include "timerlat_u.h"

+static inline pid_t gettid(void)
+{
+       return syscall(SYS_gettid);
+}
+
  /*
   * This is the user-space main for the tool timerlatu/ threads.
   *
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index 9ac71a66840c..b754dc1016a4 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -229,6 +229,9 @@ long parse_ns_duration(char *val)
  #elif __s390x__
  # define __NR_sched_setattr    345
  # define __NR_sched_getattr    346
+#elif __m68k__
+# define __NR_sched_setattr    349
+# define __NR_sched_getattr    350
  #endif

  #define SCHED_DEADLINE         6

But it is not enough: executing rtla still fails with a segfault.
I can dump a core, but I could not manage to build gdb for my board, so
I can't debug it (I don't know how to debug a core dump without gdb!).

JM
Steven Rostedt Nov. 18, 2024, 8:20 p.m. UTC | #6
[ Added Tomas as he knows this code better than I do ]

On Mon, 18 Nov 2024 11:11:48 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:

> Hi Steve,
> 
> On 15/11/2024 20:55, Steven Rostedt wrote:
> > On Fri, 15 Nov 2024 16:33:06 +0100
> > Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> >   
> >> Hi Steve,
> >>
> >> On 15/11/2024 16:25, Steven Rostedt wrote:  
> >>> On Fri, 15 Nov 2024 09:26:07 +0100
> >>> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> >>>      
> >>>> Nevertheless it sounds like a really high latency for wake_up().
> >>>>
> >>>> I have a custom driver which basically gets an IRQ, and calls wake_up on
> >>>> a read() call. This wake_up() on a high cpu usage can be more than 1ms !
> >>>> Even with a fifo/99 priority for my kernel thread !
> >>>>
> >>>> I don't know if it rings any bell ?
> >>>> I can obviously do more tests if it can help getting down to the issue :-).  
> >>>
> >>> Try running timerlat.  
> >>
> >> Thanks !
> >> Here is what I get:
> >> # echo timerlat > current_tracer
> >> # echo 1 > events/osnoise/enable
> >> # echo 25 > osnoise/stop_tracing_total_us
> >> # tail -10 trace
> >>               bash-224     [000] d.h..   153.268917: #77645 context  irq timer_latency     45056 ns
> >>               bash-224     [000] dnh..   153.268987: irq_noise: timer:206  start 153.268879083 duration 93957 ns
> >>               bash-224     [000] d....   153.269056: thread_noise:  bash:224 start 153.268905324 duration 71045 ns
> >>         timerlat/0-271     [000] .....   153.269103: #77645 context thread timer_latency    230656 ns
> >>               bash-224     [000] d.h..   153.269735: irq_noise: timer:206 start 153.269613847 duration 103558 ns
> >>               bash-224     [000] d.h..   153.269911: #77646 context irq timer_latency     40640 ns
> >>               bash-224     [000] dnh..   153.269982: irq_noise: timer:206 start 153.269875367 duration 93190 ns
> >>               bash-224     [000] d....   153.270053: thread_noise: bash:224 start 153.269900969 duration 72709 ns
> >>         timerlat/0-271     [000] .....   153.270100: #77646 context thread timer_latency    227008 ns
> >>         timerlat/0-271     [000] .....   153.270155: timerlat_main: stop tracing hit on cpu 0
> >>
> >> It looks awful, right ?  
> > 
> > awful is relative ;-) If that was on x86, I would say it was bad.
> > 
> > Also check out rtla (in tools/tracing/rtla).  
> 
> Thanks ! I knew it only by name, so I watched a presentation recorded 
> during OSS summit given by Daniel Bristot de Oliveira who wrote it and 
> it is really impressive !
> 
> I had to modify the source code a bit, as it does not compile with my 
> uclibc toolchain:
> diff --git a/tools/tracing/rtla/Makefile.rtla b/tools/tracing/rtla/Makefile.rtla
> index cc1d6b615475..b22016a88d09 100644
> --- a/tools/tracing/rtla/Makefile.rtla
> +++ b/tools/tracing/rtla/Makefile.rtla
> @@ -15,7 +15,7 @@ $(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
>  $(call allow-override,LDCONFIG,ldconfig)
>  export CC AR STRIP PKG_CONFIG LD_SO_CONF_PATH LDCONFIG
> 
> -FOPTS          := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong   \
> +FOPTS          := -flto=auto -ffat-lto-objects -fexceptions \
>                 -fasynchronous-unwind-tables -fstack-clash-protection
>  WOPTS          := -O -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2             \
>                 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized

I'm not sure what the consequence of the above would be. Perhaps Daniel
just copied this from other code?

> diff --git a/tools/tracing/rtla/src/timerlat_u.c b/tools/tracing/rtla/src/timerlat_u.c
> index 01dbf9a6b5a5..92ad2388b123 100644
> --- a/tools/tracing/rtla/src/timerlat_u.c
> +++ b/tools/tracing/rtla/src/timerlat_u.c
> @@ -15,10 +15,16 @@
>   #include <pthread.h>
>   #include <sys/wait.h>
>   #include <sys/prctl.h>
> +#include <sys/syscall.h>
> 
>   #include "utils.h"
>   #include "timerlat_u.h"
> 
> +static inline pid_t gettid(void)
> +{
> +       return syscall(SYS_gettid);
> +}
> +
>   /*
>    * This is the user-space main for the tool timerlatu/ threads.
>    *
> diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
> index 9ac71a66840c..b754dc1016a4 100644
> --- a/tools/tracing/rtla/src/utils.c
> +++ b/tools/tracing/rtla/src/utils.c
> @@ -229,6 +229,9 @@ long parse_ns_duration(char *val)
>   #elif __s390x__
>   # define __NR_sched_setattr    345
>   # define __NR_sched_getattr    346
> +#elif __m68k__
> +# define __NR_sched_setattr    349
> +# define __NR_sched_getattr    350
>   #endif
> 
>   #define SCHED_DEADLINE         6
> 
> But it is not enough, as executing rtla fails with a segfault.
> I can dump a core, but I could not manage to build gdb for my board so I 
> can't debug it (I don't know how to debug a coredump without gdb !).

printf()!  That's how I debug things without gdb ;-)

-- Steve
Jean-Michel Hautbois Nov. 19, 2024, 2:24 p.m. UTC | #7
Hi Steve !

On 18/11/2024 21:20, Steven Rostedt wrote:
> 
> [ Added Tomas as he knows this code better than I do ]

Thanks !

> 
> On Mon, 18 Nov 2024 11:11:48 +0100
> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> 
>> Hi Steve,
>>
>> On 15/11/2024 20:55, Steven Rostedt wrote:
>>> On Fri, 15 Nov 2024 16:33:06 +0100
>>> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
>>>    
>>>> Hi Steve,
>>>>
>>>> On 15/11/2024 16:25, Steven Rostedt wrote:
>>>>> On Fri, 15 Nov 2024 09:26:07 +0100
>>>>> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
>>>>>       
>>>>>> Nevertheless it sounds like a really high latency for wake_up().
>>>>>>
>>>>>> I have a custom driver which basically gets an IRQ, and calls wake_up on
>>>>>> a read() call. This wake_up() on a high cpu usage can be more than 1ms !
>>>>>> Even with a fifo/99 priority for my kernel thread !
>>>>>>
>>>>>> I don't know if it rings any bell ?
>>>>>> I can obviously do more tests if it can help getting down to the issue :-).
>>>>>
>>>>> Try running timerlat.
>>>>
>>>> Thanks !
>>>> Here is what I get:
>>>> # echo timerlat > current_tracer
>>>> # echo 1 > events/osnoise/enable
>>>> # echo 25 > osnoise/stop_tracing_total_us
>>>> # tail -10 trace
>>>>                bash-224     [000] d.h..   153.268917: #77645 context  irq timer_latency     45056 ns
>>>>                bash-224     [000] dnh..   153.268987: irq_noise: timer:206  start 153.268879083 duration 93957 ns
>>>>                bash-224     [000] d....   153.269056: thread_noise:  bash:224 start 153.268905324 duration 71045 ns
>>>>          timerlat/0-271     [000] .....   153.269103: #77645 context thread timer_latency    230656 ns
>>>>                bash-224     [000] d.h..   153.269735: irq_noise: timer:206 start 153.269613847 duration 103558 ns
>>>>                bash-224     [000] d.h..   153.269911: #77646 context irq timer_latency     40640 ns
>>>>                bash-224     [000] dnh..   153.269982: irq_noise: timer:206 start 153.269875367 duration 93190 ns
>>>>                bash-224     [000] d....   153.270053: thread_noise: bash:224 start 153.269900969 duration 72709 ns
>>>>          timerlat/0-271     [000] .....   153.270100: #77646 context thread timer_latency    227008 ns
>>>>          timerlat/0-271     [000] .....   153.270155: timerlat_main: stop tracing hit on cpu 0
>>>>
>>>> It looks awful, right ?
>>>
>>> awful is relative ;-) If that was on x86, I would say it was bad.
>>>
>>> Also check out rtla (in tools/tracing/rtla).
>>
>> Thanks ! I knew it only by name, so I watched a presentation recorded
>> during OSS summit given by Daniel Bristot de Oliveira who wrote it and
>> it is really impressive !
>>
>> I had to modify the source code a bit, as it does not compile with my
>> uclibc toolchain:
>> diff --git a/tools/tracing/rtla/Makefile.rtla b/tools/tracing/rtla/Makefile.rtla
>> index cc1d6b615475..b22016a88d09 100644
>> --- a/tools/tracing/rtla/Makefile.rtla
>> +++ b/tools/tracing/rtla/Makefile.rtla
>> @@ -15,7 +15,7 @@ $(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
>>    $(call allow-override,LDCONFIG,ldconfig)
>>    export CC AR STRIP PKG_CONFIG LD_SO_CONF_PATH LDCONFIG
>>
>> -FOPTS          := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong   \
>> +FOPTS          := -flto=auto -ffat-lto-objects -fexceptions \
>>                  -fasynchronous-unwind-tables -fstack-clash-protection
>>  WOPTS          := -O -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2             \
>>                  -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
> 
> I'm not sure what the consequence of the above would be. Perhaps Daniel
> just copied this from other code?
> 
>> diff --git a/tools/tracing/rtla/src/timerlat_u.c b/tools/tracing/rtla/src/timerlat_u.c
>> index 01dbf9a6b5a5..92ad2388b123 100644
>> --- a/tools/tracing/rtla/src/timerlat_u.c
>> +++ b/tools/tracing/rtla/src/timerlat_u.c
>> @@ -15,10 +15,16 @@
>>    #include <pthread.h>
>>    #include <sys/wait.h>
>>    #include <sys/prctl.h>
>> +#include <sys/syscall.h>
>>
>>    #include "utils.h"
>>    #include "timerlat_u.h"
>>
>> +static inline pid_t gettid(void)
>> +{
>> +       return syscall(SYS_gettid);
>> +}
>> +
>>    /*
>>     * This is the user-space main for the tool timerlatu/ threads.
>>     *
>> diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
>> index 9ac71a66840c..b754dc1016a4 100644
>> --- a/tools/tracing/rtla/src/utils.c
>> +++ b/tools/tracing/rtla/src/utils.c
>> @@ -229,6 +229,9 @@ long parse_ns_duration(char *val)
>>    #elif __s390x__
>>    # define __NR_sched_setattr    345
>>    # define __NR_sched_getattr    346
>> +#elif __m68k__
>> +# define __NR_sched_setattr    349
>> +# define __NR_sched_getattr    350
>>    #endif
>>
>>    #define SCHED_DEADLINE         6
>>
>> But it is not enough, as executing rtla fails with a segfault.
>> I can dump a core, but I could not manage to build gdb for my board so I
>> can't debug it (I don't know how to debug a coredump without gdb !).
> 
> printf()!  That's how I debug things without gdb ;-)

Indeed, printf() gave me clues!
It appears to be a bug in libtracefs (v1.8.1). rtla segfaults when 
calling tracefs_local_events() in trace_instance_init().

Debugging libtracefs pointed me to the load_events() function, and the 
segfault happens after tep_parse_event() is called for 
"/sys/kernel/debug/tracing/events/vmscan/mm_vmscan_write_folio/format".

Going through the calls I get to event_read_print_args().
I changed libtraceevent log level to get the warnings, and it says:
libtraceevent: Resource temporarily unavailable
   unknown op '.'
Segmentation fault

JM
Steven Rostedt Nov. 19, 2024, 3:26 p.m. UTC | #8
On Tue, 19 Nov 2024 15:24:00 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:

> Indeed printf gave me clues !
> It appears to be a bug in libtracefs (v1.8.1). rtla segfaults when 
> calling tracefs_local_events() in trace_instance_init().
> 
> Debugging libtracefs pointed me to the load_events() function, and the 
> segfault happens after tep_parse_event() is called for 
> "/sys/kernel/debug/tracing/events/vmscan/mm_vmscan_write_folio/format".
> 
> Going through the calls I get to event_read_print_args().
> I changed libtraceevent log level to get the warnings, and it says:
> libtraceevent: Resource temporarily unavailable
>    unknown op '.'
> Segmentation fault

Can you do me a favor and send me privately a tarball of:

 # cp -r /sys/kernel/tracing/events /tmp/events
 # cd /tmp
 # tar -cvjf events.tar.bz2 events

You can't call tar on the /sys/kernel/tracing files as those are pseudo
files with size of zero, and tar will just record empty files :-p

-- Steve
Steven Rostedt Nov. 19, 2024, 4:28 p.m. UTC | #9
On Tue, 19 Nov 2024 10:26:31 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Can you do me a favor and send me privately a tarball of:
> 
>  # cp -r /sys/kernel/tracing/events /tmp/events
>  # cd /tmp
>  # tar -cvjf events.tar.bz2 events
> 
> You can't call tar on the /sys/kernel/tracing files as those are pseudo
> files with size of zero, and tar will just record empty files :-p

It crashes on parsing this:

name: mm_vmscan_write_folio
ID: 198
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long pfn;	offset:8;	size:4;	signed:0;
	field:int reclaim_flags;	offset:12;	size:4;	signed:1;

print fmt: "page=%p pfn=0x%lx flags=%s", (mem_map + ((REC->pfn) - (m68k_memory[0].addr >> 13))), REC->pfn, (REC->reclaim_flags) ? __print_flags(REC->reclaim_flags, "|", {0x0001u, "RECLAIM_WB_ANON"}, {0x0002u, "RECLAIM_WB_FILE"}, {0x0010u, "RECLAIM_WB_MIXED"}, {0x0004u, "RECLAIM_WB_SYNC"}, {0x0008u, "RECLAIM_WB_ASYNC"} ) : "RECLAIM_WB_NONE"


It shouldn't crash, but it also found a bug in your code ;-)

You reference two variables that are not part of the event:

 "mem_map" and "m68k_memory[0].addr"

Do these variables ever change? Because the TP_printk() part of the
TRACE_EVENT() macro is called a long time after the event is recorded. It
could be seconds, minutes, days, or even months (and, though unlikely,
possibly years) later.

The event takes place and runs the TP_fast_assign() to record the event in
the ring buffer. Then some time later, when you read the "trace" file, the
TP_printk() portion gets run. If you wait months before reading that, it is
executed months later.

Now you have "mem_map" and "m68k_memory[0].addr" in that output that gets
run months after the fact. Are they constant throughout the boot?

Another issue is that user space has no idea what those values are, so it
cannot print them. Currently the code crashes because you are the first one
to reference a global variable from a trace event's print fmt. That should
probably be fixed to simply fail to parse the event and ignore the print
format logic (which defaults to just printing the raw fields).
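
The usual way around this is to resolve anything that depends on global
state in TP_fast_assign(), so that TP_printk() only touches fields that
were recorded in the ring buffer. A minimal sketch of that pattern (the
field names follow the format above, but this is not the actual in-tree
event):

```
TRACE_EVENT(mm_vmscan_write_folio,

	TP_PROTO(struct folio *folio, int reclaim_flags),

	TP_ARGS(folio, reclaim_flags),

	TP_STRUCT__entry(
		__field(void *,		page)	/* resolved at record time */
		__field(unsigned long,	pfn)
		__field(int,		reclaim_flags)
	),

	TP_fast_assign(
		__entry->page		= folio_page(folio, 0);
		__entry->pfn		= folio_pfn(folio);
		__entry->reclaim_flags	= reclaim_flags;
	),

	/* No global variables here, so user space can parse it. */
	TP_printk("page=%p pfn=0x%lx flags=0x%x",
		  __entry->page, __entry->pfn, __entry->reclaim_flags)
);
```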

-- Steve
Steven Rostedt Nov. 19, 2024, 4:44 p.m. UTC | #10
On Tue, 19 Nov 2024 11:28:50 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> print fmt: "page=%p pfn=0x%lx flags=%s", (mem_map + ((REC->pfn) - (m68k_memory[0].addr >> 13))), REC->pfn, (REC->reclaim_flags) ? __print_flags(REC->reclaim_flags, "|", {0x0001u, "RECLAIM_WB_ANON"}, {0x0002u, "RECLAIM_WB_FILE"}, {0x0010u, "RECLAIM_WB_MIXED"}, {0x0004u, "RECLAIM_WB_SYNC"}, {0x0008u, "RECLAIM_WB_ASYNC"} ) : "RECLAIM_WB_NONE"

Running gdb on this, I found that it crashed on the "m68k_memory[0].addr"
part, because the parser failed to bail out on the '.' operator that it
does not know how to parse.

This patch should stop the crash (against libtraceevent)

-- Steve

diff --git a/src/event-parse.c b/src/event-parse.c
index 0427061603db..a6da8f04cbf3 100644
--- a/src/event-parse.c
+++ b/src/event-parse.c
@@ -2498,6 +2498,10 @@ process_op(struct tep_event *event, struct tep_print_arg *arg, char **tok)
 
 		/* higher prios need to be closer to the root */
 		prio = get_op_prio(*tok);
+		if (prio < 0) {
+			token = *tok;
+			goto out_free;
+		}
 
 		if (prio > arg->op.prio)
 			return process_op(event, arg, tok);
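
(The idea: get_op_prio() returns a negative value for an operator it does
not recognize, such as the '.' here, so process_op() now bails out instead
of using that negative value as a priority.)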
Jean-Michel Hautbois Nov. 19, 2024, 6:06 p.m. UTC | #11
Hi Steve,

On 19/11/2024 17:28, Steven Rostedt wrote:
> On Tue, 19 Nov 2024 10:26:31 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> Can you do me a favor and send me privately a tarball of:
>>
>>   # cp -r /sys/kernel/tracing/events /tmp/events
>>   # cd /tmp
>>   # tar -cvjf events.tar.bz2 events
>>
>> You can't call tar on the /sys/kernel/tracing files as those are pseudo
>> files with size of zero, and tar will just record empty files :-p
> 
> It crashes on parsing this:
> 
> name: mm_vmscan_write_folio
> ID: 198
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 
> 	field:unsigned long pfn;	offset:8;	size:4;	signed:0;
> 	field:int reclaim_flags;	offset:12;	size:4;	signed:1;
> 
> print fmt: "page=%p pfn=0x%lx flags=%s", (mem_map + ((REC->pfn) - (m68k_memory[0].addr >> 13))), REC->pfn, (REC->reclaim_flags) ? __print_flags(REC->reclaim_flags, "|", {0x0001u, "RECLAIM_WB_ANON"}, {0x0002u, "RECLAIM_WB_FILE"}, {0x0010u, "RECLAIM_WB_MIXED"}, {0x0004u, "RECLAIM_WB_SYNC"}, {0x0008u, "RECLAIM_WB_ASYNC"} ) : "RECLAIM_WB_NONE"
> 
> 
> It shouldn't crash, but it also found a bug in your code ;-)

"In my code" is a really big assumption :-).

> You reference two variables that are not part of the event:
> 
>   "mem_map" and "m68k_memory[0].addr"
> 
> Do these variables ever change? Because the TP_printk() part of the
> TRACE_EVENT() macro is called a long time after the event is recorded. It
> could be seconds, minutes, days or even months (and unlikely possibly
> years) later.

I am really not the best person to answer this.
AFAIK, those never change.

> 
> The event takes place and runs the TP_fast_assign() to record the event in
> the ring buffer. Then some time later, when you read the "trace" file, the
> TP_printk() portion gets run. If you wait months before reading that, it is
> executed months later.
> 
> Now you have "mem_map" and "m68k_memory[0].addr" in that output that gets
> run months after the fact. Are they constant throughout the boot?

I don't know.

> Now another issue is that user space has no idea what those values are. Now
> user space can not print the values. Currently the code crashes because you
> are the first one to reference a global value from a trace event print fmt.
> That should probably be fixed to simply fail to parse the event and ignore
> the print format logic (which defaults to just printing the raw fields).

The patch you sent works...
But, it fails a bit later:
Dispatching timerlat u procs
starting loop
User-space timerlat pid 230 on cpu 0
Segmentation fault


> 
> -- Steve
>
Steven Rostedt Nov. 19, 2024, 6:10 p.m. UTC | #12
On Tue, 19 Nov 2024 19:06:45 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> > 
> > It shouldn't crash, but it also found a bug in your code ;-)  
> 
> "In my code" is a really big assumption :-).

Well, not you personally; I meant "your" as in the m68k code.

> 
> > You reference two variables that are not part of the event:
> > 
> >   "mem_map" and "m68k_memory[0].addr"
> > 
> > Do these variables ever change? Because the TP_printk() part of the
> > TRACE_EVENT() macro is called a long time after the event is recorded. It
> > could be seconds, minutes, days or even months (and unlikely possibly
> > years) later.  
> 
> I am really not the best placed to answer.
> AFAIK, it sounds like those are never changing.

That would mean they are OK and will not corrupt the trace, but it will be
meaningless for tools like perf and trace-cmd.

> 
> > 
> > The event takes place and runs the TP_fast_assign() to record the event in
> > the ring buffer. Then some time later, when you read the "trace" file, the
> > TP_printk() portion gets run. If you wait months before reading that, it is
> > executed months later.
> > 
> > Now you have "mem_map" and "m68k_memory[0].addr" in that output that gets
> > run months after the fact. Are they constant throughout the boot?  
> 
> I don't know.
> 
> > Now another issue is that user space has no idea what those values are. Now
> > user space can not print the values. Currently the code crashes because you
> > are the first one to reference a global value from a trace event print fmt.
> > That should probably be fixed to simply fail to parse the event and ignore
> > the print format logic (which defaults to just printing the raw fields).  
> 
> The patch you sent works...
> But, it fails a bit later:
> Dispatching timerlat u procs
> starting loop
> User-space timerlat pid 230 on cpu 0
> Segmentation fault
> 

More printk? ;-)

-- Steve
Michael Schmitz Nov. 19, 2024, 6:25 p.m. UTC | #13
Hi Steve,

On 20/11/24 07:06, Jean-Michel Hautbois wrote:
>
>> You reference two variables that are not part of the event:
>>
>>   "mem_map" and "m68k_memory[0].addr"
>>
>> Do these variables ever change? Because the TP_printk() part of the
>> TRACE_EVENT() macro is called a long time after the event is 
>> recorded. It
>> could be seconds, minutes, days or even months (and unlikely possibly
>> years) later.
>
> I am really not the best placed to answer.
> AFAIK, it sounds like those are never changing.

m68k_memory[0].addr never changes (that segment is usually where the 
kernel is loaded to, and can't be hotplugged).

mem_map is equal to NODE_DATA(0)->node_mem_map on m68k 
(mm/mm_init.c:__init alloc_node_mem_map()) and won't change either.

Cheers,

     Michael

>
>>
>> The event takes place and runs the TP_fast_assign() to record the 
>> event in
>> the ring buffer. Then some time later, when you read the "trace" 
>> file, the
>> TP_printk() portion gets run. If you wait months before reading that, 
>> it is
>> executed months later.
>>
>> Now you have "mem_map" and "m68k_memory[0].addr" in that output that 
>> gets
>> run months after the fact. Are they constant throughout the boot?
>
> I don't know.
>
>> Now another issue is that user space has no idea what those values 
>> are. Now
>> user space can not print the values. Currently the code crashes 
>> because you
>> are the first one to reference a global value from a trace event 
>> print fmt.
>> That should probably be fixed to simply fail to parse the event and 
>> ignore
>> the print format logic (which defaults to just printing the raw fields).
>
> The patch you sent works...
> But, it fails a bit later:
> Dispatching timerlat u procs
> starting loop
> User-space timerlat pid 230 on cpu 0
> Segmentation fault
>
>
>>
>> -- Steve
>>
>
>
Jean-Michel Hautbois Nov. 20, 2024, 11:47 a.m. UTC | #14
Hi Steve,

On 19/11/2024 19:10, Steven Rostedt wrote:
> On Tue, 19 Nov 2024 19:06:45 +0100
> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
>>>
>>> It shouldn't crash, but it also found a bug in your code ;-)
>>
>> "In my code" is a really big assumption :-).
> 
> Well, not you personally; I meant "your" as in the m68k code.
> 
>>
>>> You reference two variables that are not part of the event:
>>>
>>>    "mem_map" and "m68k_memory[0].addr"
>>>
>>> Do these variables ever change? Because the TP_printk() part of the
>>> TRACE_EVENT() macro is called a long time after the event is recorded. It
>>> could be seconds, minutes, days or even months (and unlikely possibly
>>> years) later.
>>
>> I am really not the best placed to answer.
>> AFAIK, it sounds like those are never changing.
> 
> That would mean they are OK and will not corrupt the trace, but it will be
> meaningless for tools like perf and trace-cmd.
> 
>>
>>>
>>> The event takes place and runs the TP_fast_assign() to record the event in
>>> the ring buffer. Then some time later, when you read the "trace" file, the
>>> TP_printk() portion gets run. If you wait months before reading that, it is
>>> executed months later.
>>>
>>> Now you have "mem_map" and "m68k_memory[0].addr" in that output that gets
>>> run months after the fact. Are they constant throughout the boot?
>>
>> I don't know.
>>
>>> Now another issue is that user space has no idea what those values are. Now
>>> user space can not print the values. Currently the code crashes because you
>>> are the first one to reference a global value from a trace event print fmt.
>>> That should probably be fixed to simply fail to parse the event and ignore
>>> the print format logic (which defaults to just printing the raw fields).
>>
>> The patch you sent works...
>> But, it fails a bit later:
>> Dispatching timerlat u procs
>> starting loop
>> User-space timerlat pid 230 on cpu 0
>> Segmentation fault
>>
> 
> More printk? ;-)

Indeed, but the result is not straightforward this time :-(.

Long story short: it fails at the kbuffer_load_subbuffer() call in
read_cpu_pages().

I added printf() calls to the kbuffer helpers in libtraceevent, and it
ends at:
__read_long_4: call read_4 at 0x600230c2
__read_4_sw: ptr=0x8044e2ac

static unsigned int __read_4_sw(void *ptr)
{
	printf("%s: ptr=%p, value: %08x\n", __func__, ptr, *(unsigned int *)ptr);
	unsigned int data = *(unsigned int *)ptr;
	printf("%s: data=%08x\n", __func__, data);

	return swap_4(data);
}

As soon as ptr is dereferenced, the segfault appears.
ptr should be OK though, as the address is valid AFAIK...

I must say that now I am stuck :-(.

Thanks,
JM
Steven Rostedt Nov. 20, 2024, 3:31 p.m. UTC | #15
On Wed, 20 Nov 2024 12:47:19 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:

> Long story short: it fails at kbuffer_load_subbuffer() call in
> read_cpu_pages().
> 
> I added printf() calls to the kbuffer helpers in libtraceevent, and it ends at:
> __read_long_4: call read_4 at 0x600230c2
> __read_4_sw: ptr=0x8044e2ac
> 
> static unsigned int __read_4_sw(void *ptr)
> {
> 	printf("%s: ptr=%p, value: %08x\n", __func__, ptr, *(unsigned int *)ptr);
> 	unsigned int data = *(unsigned int *)ptr;
> 	printf("%s: data=%08x\n", __func__, data);
> 
> 	return swap_4(data);
> }
> 
> As soon as ptr is dereferenced, the segfault appears.
> ptr should be ok though, as the address is valid afaik...

But you don't know which ptr it failed on, right?

If dereferencing a pointer will crash, the below line:

 	printf("%s: ptr=%p, value: %08x\n", __func__, ptr, *(unsigned int *)ptr);

Will crash before printing, because you are dereferencing ptr. Perhaps you
should change this to:

 	printf("%s: ptr=%p\n" value: %08x\n", __func__, ptr);
	printf("    value: %08x\n", *(unsigned int *)ptr);

And that way you will see what 'ptr' is before the crash. Or did you do
that already?

-- Steve


> 
> I must say that now I am stuck :-(.
Jean-Michel Hautbois Nov. 20, 2024, 3:59 p.m. UTC | #16
Hi Steve,

On 20/11/2024 16:31, Steven Rostedt wrote:
> On Wed, 20 Nov 2024 12:47:19 +0100
> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> 
>> Long story short: it fails at kbuffer_load_subbuffer() call in
>> read_cpu_pages().
>>
>> I added printf() calls to the kbuffer helpers in libtraceevent, and it ends at:
>> __read_long_4: call read_4 at 0x600230c2
>> __read_4_sw: ptr=0x8044e2ac
>>
>> static unsigned int __read_4_sw(void *ptr)
>> {
>> 	printf("%s: ptr=%p, value: %08x\n", __func__, ptr, *(unsigned int *)ptr);
>> 	unsigned int data = *(unsigned int *)ptr;
>> 	printf("%s: data=%08x\n", __func__, data);
>>
>> 	return swap_4(data);
>> }
>>
>> As soon as ptr is dereferenced, the segfault appears.
>> ptr should be ok though, as the address is valid afaik...
> 
> But you don't know which ptr it failed on, right?
> 
> If dereferencing a pointer will crash, the below line:
> 
>   	printf("%s: ptr=%p, value: %08x\n", __func__, ptr, *(unsigned int *)ptr);
> 
> Will crash before printing, because you are dereferencing ptr. Perhaps you
> should change this to:
> 
>   	printf("%s: ptr=%p\n" value: %08x\n", __func__, ptr);
> 	printf("    value: %08x\n", *(unsigned int *)ptr);
> 
> And that way you will see what 'ptr' is before the crash. Or did you do
> that already?

Yes, I did; sorry, I thought it was in the previous dump :-(.

kbuffer_load_subbuffer: kbuf 0x8001d520, ptr 0x8025e2a0, call read_8
read_8
swap_8
kbuffer_load_subbuffer: kbuf 0x8001d520, ptr 0x8025e2a8, read_long
read_long: call read_long at 0x60022ef4 with 0x8025e2a8
__read_long_4: call read_4 at 0x600230de with 0x8025e2a8
__read_4_sw with 0x8025e2a8
__read_4_sw: ptr=0x8025e2a8, value: 00001ff0
__read_4_sw: data=00001ff0
swap_4 for 00001ff0
__read_long_4: --> read_4 at 0x600230de: f01f0000
__read_4_sw with 0x8025e2a8
__read_4_sw: ptr=0x8025e2a8, value: 00001ff0
__read_4_sw: data=00001ff0
swap_4 for 00001ff0
read_long: --> read_long at 0x60022ef4: f01f0000
__read_long_4: call read_4 at 0x600230de with 0x8025e2a8
__read_4_sw with 0x8025e2a8
__read_4_sw: ptr=0x8025e2a8, value: 00001ff0
__read_4_sw: data=00001ff0
swap_4 for 00001ff0
__read_long_4: --> read_4 at 0x600230de: f01f0000
__read_4_sw with 0x8025e2a8
__read_4_sw: ptr=0x8025e2a8, value: 00001ff0
__read_4_sw: data=00001ff0
swap_4 for 00001ff0
kbuffer_load_subbuffer: --> read_long,  flags=f01f0000
kbuffer_load_subbuffer: --> size=1f0000
kbuffer_load_subbuffer: kbuf->data=0x8025e2ac, kbuf->size=1f0000
kbuffer_load_subbuffer: kbuf 0x8001d520, ptr 0x8044e2ac, read_long lost_events
read_long: call read_long at 0x60022ef4 with 0x8044e2ac
__read_long_4: call read_4 at 0x600230de with 0x8044e2ac
__read_4_sw with 0x8044e2ac
Segmentation fault

JM
Steven Rostedt Nov. 20, 2024, 4:43 p.m. UTC | #17
On Wed, 20 Nov 2024 16:59:55 +0100
Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:

> > And that way you will see what 'ptr' is before the crash. Or did you do
> > that already?  
> 
> Yes, I did, sorry I thought it was in the previous dump :-(.

Can you see if this makes a difference?

Patch libtracefs:

diff --git a/src/tracefs-events.c b/src/tracefs-events.c
index 77d1ba89b038..19ea3b3f8d36 100644
--- a/src/tracefs-events.c
+++ b/src/tracefs-events.c
@@ -274,7 +274,7 @@ static int open_cpu_files(struct tracefs_instance *instance, cpu_set_t *cpus,
 		if (snapshot)
 			tcpu = tracefs_cpu_snapshot_open(instance, cpu, true);
 		else
-			tcpu = tracefs_cpu_open_mapped(instance, cpu, true);
+			tcpu = tracefs_cpu_open(instance, cpu, true);
 		if (!tcpu)
 			goto error;
 

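FYI, tracefs_cpu_open_mapped() tries to use the new memory-mapped
ring-buffer interface, while tracefs_cpu_open() falls back to plain reads
of the per-CPU trace files, so this should tell us whether the mmap path
is what breaks on m68k.
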
-- Steve
Jean-Michel Hautbois Nov. 20, 2024, 4:51 p.m. UTC | #18
On 20/11/2024 17:43, Steven Rostedt wrote:
> On Wed, 20 Nov 2024 16:59:55 +0100
> Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> wrote:
> 
>>> And that way you will see what 'ptr' is before the crash. Or did you do
>>> that already?
>>
>> Yes, I did, sorry I thought it was in the previous dump :-(.
> 
> Can you see if this makes a difference?
> 
> Patch libtracefs:
> 
> diff --git a/src/tracefs-events.c b/src/tracefs-events.c
> index 77d1ba89b038..19ea3b3f8d36 100644
> --- a/src/tracefs-events.c
> +++ b/src/tracefs-events.c
> @@ -274,7 +274,7 @@ static int open_cpu_files(struct tracefs_instance *instance, cpu_set_t *cpus,
>   		if (snapshot)
>   			tcpu = tracefs_cpu_snapshot_open(instance, cpu, true);
>   		else
> -			tcpu = tracefs_cpu_open_mapped(instance, cpu, true);
> +			tcpu = tracefs_cpu_open(instance, cpu, true);
>   		if (!tcpu)
>   			goto error;
>   

Nope. Nice try :-)!

If you need me to add a few specific printf()s, I can run it.

JM