mbox series

[v5,0/9] Add OPTPROBES feature on RISCV

Message ID 20221224114315.850130-1-chenguokai17@mails.ucas.ac.cn (mailing list archive)
Headers show
Series Add OPTPROBES feature on RISCV | expand

Message

Xim Dec. 24, 2022, 11:43 a.m. UTC
Add jump optimization support for RISC-V.

Replaces ebreak instructions used by normal kprobes with an
auipc+jalr instruction pair, at the aim of suppressing the probe-hit
overhead.

All known optprobe-capable RISC architectures have been using a single
jump or branch instructions while this patch chooses not. RISC-V has a
quite limited jump range (4KB or 2MB) for both its branch and jump
instructions, which prevent optimizations from supporting probes that
spread all over the kernel.

Auipc-jalr instruction pair is introduced with a much wider jump range
(4GB), where auipc loads the upper 12 bits to a free register and jalr
Deaconappends the lower 20 bits to form a 32 bit immediate. Note that
returns from probe handler requires another free register. As kprobes
can appear almost anywhere inside the kernel, the free register should
be found in a generic way, not depending on calling convention or any
other regulations.

The algorithm for finding the free register is inspired by the register
renaming in modern processors. From the perspective of register
renaming,
a register could be represented as two different registers if two
neighbour
instructions both write to it but no one ever reads. Extending this
fact,
a register is considered to be free if there is no read before its next
write in the execution flow. We are free to change its value without
interfering normal execution.

Static analysis shows that 51% instructions of the kernel (default
config)
is capable of being replaced i.e. one free register can be found at both
the start and end of replaced instruction pairs while the replaced
instructions can be directly executed. We also made an efficiency test
on Gem 5 RISCV which shows a more than 5x speedup on breakpoint-based
implementation.

Contribution:
Chen Guokai invents the algorithm of searching free register, evaluate
the ratio of optimizaion, the basic function support RVI kernel binary.
Liao Chang adds the support for hybrid RVI and RVC kernel binary, fix
some bugs with different kernel configure, refactor out entire feature
into some individual patches.

v5:
1. Correct known nits
2. Enable the usage of unused caller-saved registers
3. Append an efficiency test result on Gem 5

v4:
Correct the sequence of Signed-off-by and Co-developed-by.

v3:
1. Support of hybrid RVI and RVC kernel binary.
2. Refactor out entire feature into some individual patches.

v2:
1. Adjust comments
2. Remove improper copyright
3. Clean up format issues that is no common practice
4. Extract common definition of instruction decoder
5. Fix race issue in SMP platform.

v1:
Chen Guokai contribute the basic functionality code.

Chen Guokai (1):
  riscv/kprobe: Search free registers from unused caller-saved ones

Liao Chang (8):
  riscv/kprobe: Prepare the skeleton to implement RISCV OPTPROBES
    feature
  riscv/kprobe: Allocate detour buffer from module area
  riscv/kprobe: Prepare the skeleton to prepare optimized kprobe
  riscv/kprobe: Add common RVI and RVC instruction decoder code
  riscv/kprobe: Search free register(s) to clobber for 'AUIPC/JALR'
  riscv/kprobe: Add code to check if kprobe can be optimized
  riscv/kprobe: Prepare detour buffer for optimized kprobe
  riscv/kprobe: Patch AUIPC/JALR pair to optimize kprobe

 arch/riscv/Kconfig                        |   1 +
 arch/riscv/include/asm/bug.h              |   5 +-
 arch/riscv/include/asm/kprobes.h          |  49 ++
 arch/riscv/include/asm/patch.h            |   1 +
 arch/riscv/kernel/patch.c                 |  23 +-
 arch/riscv/kernel/probes/Makefile         |   1 +
 arch/riscv/kernel/probes/decode-insn.h    | 153 +++++
 arch/riscv/kernel/probes/kprobes.c        |  24 +
 arch/riscv/kernel/probes/opt.c            | 693 ++++++++++++++++++++++
 arch/riscv/kernel/probes/opt_trampoline.S | 137 +++++
 arch/riscv/kernel/probes/simulate-insn.h  |  41 ++
 11 files changed, 1123 insertions(+), 5 deletions(-)
 create mode 100644 arch/riscv/kernel/probes/opt.c
 create mode 100644 arch/riscv/kernel/probes/opt_trampoline.S

Comments

Björn Töpel Jan. 2, 2023, 6:02 p.m. UTC | #1
Chen Guokai <chenguokai17@mails.ucas.ac.cn> writes:

> Add jump optimization support for RISC-V.

Thank you for continuing to work on the series! I took the series for a
spin, and ran into a number of issues that makes me wonder how you test
the series, and how the testing is different from my runs.

I'll outline the general/big issues here, and leave the specifics per-patch.

I've done simple testing, using "Kprobe-based Event Tracing"
(CONFIG_KPROBE_EVENTS=y) via tracefs.

All the tests were run on commit 88603b6dc419 ("Linux 6.2-rc2") with the
series applied. All the bugs were trigged by setting different probes to
do_sys_openat2. Code:

do_sys_openat2:
...snip...
ffffffff802d138c:       89aa                    c.mv    s3,a0    // +44
ffffffff802d138e:       892e                    c.mv    s2,a1    // +46
ffffffff802d1390:       8532                    c.mv    a0,a2
ffffffff802d1392:       fa040593                addi    a1,s0,-96
ffffffff802d1396:       84b2                    c.mv    s1,a2
ffffffff802d1398:       fa043023                sd      zero,-96(s0)
ffffffff802d139c:       fa043423                sd      zero,-88(s0)
ffffffff802d13a0:       fa042823                sw      zero,-80(s0)
ffffffff802d13a4:       00000097                auipc   ra,0x0
...snip...


1. Fail to register kprobe to c.mv

Add a kprobe:
  echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events

register_kprobe returns -22 (EINVAL). This is due to a bug in the
instruction decoder. I've sent to fix upstream [1].

2. (with [1] applied) Oops when register a probe

Add a kprobe:
  echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events

You get a splat:
  Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
  Oops [#1]
  Modules linked in:
  CPU: 1 PID: 242 Comm: bash Tainted: G        W          6.2.0-rc2-00010-g09ff1aa7b1f9-dirty #14
  Hardware name: riscv-virtio,qemu (DT)
  epc : riscv_probe_decode_insn+0x16a/0x192
   ra : riscv_probe_decode_insn+0x32/0x192
  epc : ffffffff8127b2bc ra : ffffffff8127b184 sp : ff2000000173bac0
   gp : ffffffff82533f70 tp : ff60000086ab2b40 t0 : 0000000000000000
   t1 : 0000000000000850 t2 : 65646f6365642054 s0 : ff2000000173bae0
   s1 : 0000000000000017 a0 : 000000000000e001 a1 : 000000000000003f
   a2 : 0000000000009002 a3 : 0000000000000017 a4 : 000000000000c001
   a5 : ffffffff8127b38a a6 : ff6000047d666000 a7 : 0000000000040000
   s2 : 0000000000000000 s3 : 0000000000000006 s4 : ff6000008558f718
   s5 : ff6000008558f718 s6 : 0000000000000001 s7 : ff6000008558f768
   s8 : 0000000000000007 s9 : 0000000000000003 s10: 0000000000000002
   s11: 00aaaaaad62baf78 t3 : 0000000000000000 t4 : 8dd70b0100000000
   t5 : ffffffffffffe000 t6 : ff2000000173b8c8
  status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000f
  [<ffffffff81257e48>] arch_prepare_optimized_kprobe+0xc2/0x4ec
  [<ffffffff8125b420>] alloc_aggr_kprobe+0x5c/0x6a
  [<ffffffff8125ba0a>] register_kprobe+0x5dc/0x6a2
  [<ffffffff8016f266>] __register_trace_kprobe.part.0+0x98/0xbc
  [<ffffffff80170544>] __trace_kprobe_create+0x6ea/0xbcc
  [<ffffffff80176cee>] trace_probe_create+0x6c/0x7c
  [<ffffffff8016f1a2>] create_or_delete_trace_kprobe+0x24/0x50
  [<ffffffff80150642>] trace_parse_run_command+0x9e/0x12a
  [<ffffffff8016f176>] probes_write+0x18/0x20
  [<ffffffff802d494a>] vfs_write+0xca/0x41e
  [<ffffffff802d4f96>] ksys_write+0x70/0xee
  [<ffffffff802d5036>] sys_write+0x22/0x2a
  [<ffffffff80004196>] ret_from_syscall+0x0/0x2

This is because a call to riscv_probe_decode_insn(probe_opcode_t *addr,
struct arch_probe_insn *api), where api is NULL (and tripping over
auipc). Should be a common scenario...

3. No bound check for instructions

Add a probe to a non-valid instruction (in the middle of addi):
  echo 'p 0xffffffff802d1394' > /sys/kernel/debug/tracing/kprobe_events

You get the same splat as above from the auipc NULL-pointer, but the
"half" addi-instruction is parsed as a correct instruction.

4. Lockdep splat

Might be false positive; When enabling a probe, e.g.
  echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable


  ======================================================
  WARNING: possible circular locking dependency detected
  
  ------------------------------------------------------
  bash/244 is trying to acquire lock:                                                     
  ffffffff8223ee90 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x2c/0x54
                                                                                          
  but task is already holding lock:                                                       
  ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
                                                                                          
  which lock already depends on the new lock. 
                                                                                          
                                                                                          
  the existing dependency chain (in reverse order) is:         
                                                                                          
  -> #1 (text_mutex){+.+.}-{3:3}:                                                         
         lock_acquire+0x10a/0x328                                                         
         __mutex_lock+0xa8/0x770                                                          
         mutex_lock_nested+0x28/0x30                                                      
         register_kprobe+0x3ae/0x5ea
         __register_trace_kprobe.part.0+0x98/0xbc
         __trace_kprobe_create+0x6ea/0xbcc
         trace_probe_create+0x6c/0x7c
         create_or_delete_trace_kprobe+0x24/0x50
         trace_parse_run_command+0x9e/0x12a
         probes_write+0x18/0x20
         vfs_write+0xca/0x41e
         ksys_write+0x70/0xee
         sys_write+0x22/0x2a
         ret_from_syscall+0x0/0x2
  
  -> #0 (cpu_hotplug_lock){++++}-{0:0}:
         check_noncircular+0x122/0x13a
         __lock_acquire+0x1058/0x20e4
         lock_acquire+0x10a/0x328
         cpus_read_lock+0x4c/0x11c
         stop_machine+0x2c/0x54
         arch_ftrace_update_code+0x2e/0x4c
         ftrace_startup+0xd0/0x15e
         register_ftrace_function+0x32/0x7c
         arm_kprobe+0x132/0x198
         enable_kprobe+0x9c/0xc0
         enable_trace_kprobe+0x6e/0xea
         kprobe_register+0x64/0x6c
         __ftrace_event_enable_disable+0x72/0x246
         event_enable_write+0x94/0xe4
         vfs_write+0xca/0x41e
         ksys_write+0x70/0xee
         sys_write+0x22/0x2a
         ret_from_syscall+0x0/0x2
  
  other info that might help us debug this:
  
   Possible unsafe locking scenario:
  
         CPU0                    CPU1
         ----                    ----
    lock(text_mutex);
                                 lock(cpu_hotplug_lock);
                                 lock(text_mutex);
    lock(cpu_hotplug_lock);
  
   *** DEADLOCK ***
  
  5 locks held by bash/244:
   #0: ff60000080f49438 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x70/0xee
   #1: ffffffff822d9468 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0x7c/0xe4
   #2: ffffffff822d3fa8 (kprobe_mutex){+.+.}-{3:3}, at: enable_kprobe+0x32/0xc0
   #3: ffffffff822d56d8 (ftrace_lock){+.+.}-{3:3}, at: register_ftrace_function+0x26/0x7c
   #4: ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
  
  stack backtrace:
  CPU: 2 PID: 244 Comm: bash Not tainted 6.2.0-rc1-00008-g544b2c59fd81 #1
  Hardware name: riscv-virtio,qemu (DT)
  Call Trace:
  [<ffffffff80006e80>] dump_backtrace+0x30/0x38
  [<ffffffff81256e82>] show_stack+0x40/0x4c
  [<ffffffff8126e054>] dump_stack_lvl+0x62/0x84
  [<ffffffff8126e08e>] dump_stack+0x18/0x20
  [<ffffffff8009b37e>] print_circular_bug+0x2ac/0x318
  [<ffffffff8009b50c>] check_noncircular+0x122/0x13a
  [<ffffffff8009e020>] __lock_acquire+0x1058/0x20e4
  [<ffffffff8009f90c>] lock_acquire+0x10a/0x328
  [<ffffffff8002fb8a>] cpus_read_lock+0x4c/0x11c
  [<ffffffff8011ed60>] stop_machine+0x2c/0x54
  [<ffffffff8013aec6>] arch_ftrace_update_code+0x2e/0x4c
  [<ffffffff8013e796>] ftrace_startup+0xd0/0x15e
  [<ffffffff8013e856>] register_ftrace_function+0x32/0x7c
  [<ffffffff8012f928>] arm_kprobe+0x132/0x198
  [<ffffffff8012fa2a>] enable_kprobe+0x9c/0xc0
  [<ffffffff8016ff62>] enable_trace_kprobe+0x6e/0xea
  [<ffffffff801700da>] kprobe_register+0x64/0x6c
  [<ffffffff8015eba6>] __ftrace_event_enable_disable+0x72/0x246
  [<ffffffff8015eeea>] event_enable_write+0x94/0xe4
  [<ffffffff802d5e1a>] vfs_write+0xca/0x41e
  [<ffffffff802d6466>] ksys_write+0x70/0xee
  [<ffffffff802d6506>] sys_write+0x22/0x2a
  [<ffffffff80004196>] ret_from_syscall+0x0/0x2


5. 32b support?

I've noticed that there code supports rv32. Is this tested? Does regular
kprobes work on 32b?


Thanks,
Björn


[1] https://lore.kernel.org/linux-riscv/20230102160748.1307289-1-bjorn@kernel.org/
Xim Jan. 4, 2023, 8:30 a.m. UTC | #2
Hi Björn,

Thanks for your detailed review! I made tests mainly on some syscall/timer related
functions where these issues were not triggered. I will check all these issues as well
as comments that spread per-patch before a new version of patch set is sent.

FYI the 32b support is included and was tested with mostly same cases as the 64b one.

Regards,
Guokai Chen

> 2023年1月3日 02:02,Björn Töpel <bjorn@kernel.org> 写道:
> 
> Chen Guokai <chenguokai17@mails.ucas.ac.cn> writes:
> 
>> Add jump optimization support for RISC-V.
> 
> Thank you for continuing to work on the series! I took the series for a
> spin, and ran into a number of issues that makes me wonder how you test
> the series, and how the testing is different from my runs.
> 
> I'll outline the general/big issues here, and leave the specifics per-patch.
> 
> I've done simple testing, using "Kprobe-based Event Tracing"
> (CONFIG_KPROBE_EVENTS=y) via tracefs.
> 
> All the tests were run on commit 88603b6dc419 ("Linux 6.2-rc2") with the
> series applied. All the bugs were trigged by setting different probes to
> do_sys_openat2. Code:
> 
> do_sys_openat2:
> ...snip...
> ffffffff802d138c:       89aa                    c.mv    s3,a0    // +44
> ffffffff802d138e:       892e                    c.mv    s2,a1    // +46
> ffffffff802d1390:       8532                    c.mv    a0,a2
> ffffffff802d1392:       fa040593                addi    a1,s0,-96
> ffffffff802d1396:       84b2                    c.mv    s1,a2
> ffffffff802d1398:       fa043023                sd      zero,-96(s0)
> ffffffff802d139c:       fa043423                sd      zero,-88(s0)
> ffffffff802d13a0:       fa042823                sw      zero,-80(s0)
> ffffffff802d13a4:       00000097                auipc   ra,0x0
> ...snip...
> 
> 
> 1. Fail to register kprobe to c.mv
> 
> Add a kprobe:
>  echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
> 
> register_kprobe returns -22 (EINVAL). This is due to a bug in the
> instruction decoder. I've sent to fix upstream [1].
> 
> 2. (with [1] applied) Oops when register a probe
> 
> Add a kprobe:
>  echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
> 
> You get a splat:
>  Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
>  Oops [#1]
>  Modules linked in:
>  CPU: 1 PID: 242 Comm: bash Tainted: G        W          6.2.0-rc2-00010-g09ff1aa7b1f9-dirty #14
>  Hardware name: riscv-virtio,qemu (DT)
>  epc : riscv_probe_decode_insn+0x16a/0x192
>   ra : riscv_probe_decode_insn+0x32/0x192
>  epc : ffffffff8127b2bc ra : ffffffff8127b184 sp : ff2000000173bac0
>   gp : ffffffff82533f70 tp : ff60000086ab2b40 t0 : 0000000000000000
>   t1 : 0000000000000850 t2 : 65646f6365642054 s0 : ff2000000173bae0
>   s1 : 0000000000000017 a0 : 000000000000e001 a1 : 000000000000003f
>   a2 : 0000000000009002 a3 : 0000000000000017 a4 : 000000000000c001
>   a5 : ffffffff8127b38a a6 : ff6000047d666000 a7 : 0000000000040000
>   s2 : 0000000000000000 s3 : 0000000000000006 s4 : ff6000008558f718
>   s5 : ff6000008558f718 s6 : 0000000000000001 s7 : ff6000008558f768
>   s8 : 0000000000000007 s9 : 0000000000000003 s10: 0000000000000002
>   s11: 00aaaaaad62baf78 t3 : 0000000000000000 t4 : 8dd70b0100000000
>   t5 : ffffffffffffe000 t6 : ff2000000173b8c8
>  status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000f
>  [<ffffffff81257e48>] arch_prepare_optimized_kprobe+0xc2/0x4ec
>  [<ffffffff8125b420>] alloc_aggr_kprobe+0x5c/0x6a
>  [<ffffffff8125ba0a>] register_kprobe+0x5dc/0x6a2
>  [<ffffffff8016f266>] __register_trace_kprobe.part.0+0x98/0xbc
>  [<ffffffff80170544>] __trace_kprobe_create+0x6ea/0xbcc
>  [<ffffffff80176cee>] trace_probe_create+0x6c/0x7c
>  [<ffffffff8016f1a2>] create_or_delete_trace_kprobe+0x24/0x50
>  [<ffffffff80150642>] trace_parse_run_command+0x9e/0x12a
>  [<ffffffff8016f176>] probes_write+0x18/0x20
>  [<ffffffff802d494a>] vfs_write+0xca/0x41e
>  [<ffffffff802d4f96>] ksys_write+0x70/0xee
>  [<ffffffff802d5036>] sys_write+0x22/0x2a
>  [<ffffffff80004196>] ret_from_syscall+0x0/0x2
> 
> This is because a call to riscv_probe_decode_insn(probe_opcode_t *addr,
> struct arch_probe_insn *api), where api is NULL (and tripping over
> auipc). Should be a common scenario...
> 
> 3. No bound check for instructions
> 
> Add a probe to a non-valid instruction (in the middle of addi):
>  echo 'p 0xffffffff802d1394' > /sys/kernel/debug/tracing/kprobe_events
> 
> You get the same splat as above from the auipc NULL-pointer, but the
> "half" addi-instruction is parsed as a correct instruction.
> 
> 4. Lockdep splat
> 
> Might be false positive; When enabling a probe, e.g.
>  echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
> 
> 
>  ======================================================
>  WARNING: possible circular locking dependency detected
> 
>  ------------------------------------------------------
>  bash/244 is trying to acquire lock:                                                     
>  ffffffff8223ee90 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x2c/0x54
> 
>  but task is already holding lock:                                                       
>  ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
> 
>  which lock already depends on the new lock. 
> 
> 
>  the existing dependency chain (in reverse order) is:         
> 
>  -> #1 (text_mutex){+.+.}-{3:3}:                                                         
>         lock_acquire+0x10a/0x328                                                         
>         __mutex_lock+0xa8/0x770                                                          
>         mutex_lock_nested+0x28/0x30                                                      
>         register_kprobe+0x3ae/0x5ea
>         __register_trace_kprobe.part.0+0x98/0xbc
>         __trace_kprobe_create+0x6ea/0xbcc
>         trace_probe_create+0x6c/0x7c
>         create_or_delete_trace_kprobe+0x24/0x50
>         trace_parse_run_command+0x9e/0x12a
>         probes_write+0x18/0x20
>         vfs_write+0xca/0x41e
>         ksys_write+0x70/0xee
>         sys_write+0x22/0x2a
>         ret_from_syscall+0x0/0x2
> 
>  -> #0 (cpu_hotplug_lock){++++}-{0:0}:
>         check_noncircular+0x122/0x13a
>         __lock_acquire+0x1058/0x20e4
>         lock_acquire+0x10a/0x328
>         cpus_read_lock+0x4c/0x11c
>         stop_machine+0x2c/0x54
>         arch_ftrace_update_code+0x2e/0x4c
>         ftrace_startup+0xd0/0x15e
>         register_ftrace_function+0x32/0x7c
>         arm_kprobe+0x132/0x198
>         enable_kprobe+0x9c/0xc0
>         enable_trace_kprobe+0x6e/0xea
>         kprobe_register+0x64/0x6c
>         __ftrace_event_enable_disable+0x72/0x246
>         event_enable_write+0x94/0xe4
>         vfs_write+0xca/0x41e
>         ksys_write+0x70/0xee
>         sys_write+0x22/0x2a
>         ret_from_syscall+0x0/0x2
> 
>  other info that might help us debug this:
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(text_mutex);
>                                 lock(cpu_hotplug_lock);
>                                 lock(text_mutex);
>    lock(cpu_hotplug_lock);
> 
>   *** DEADLOCK ***
> 
>  5 locks held by bash/244:
>   #0: ff60000080f49438 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x70/0xee
>   #1: ffffffff822d9468 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0x7c/0xe4
>   #2: ffffffff822d3fa8 (kprobe_mutex){+.+.}-{3:3}, at: enable_kprobe+0x32/0xc0
>   #3: ffffffff822d56d8 (ftrace_lock){+.+.}-{3:3}, at: register_ftrace_function+0x26/0x7c
>   #4: ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
> 
>  stack backtrace:
>  CPU: 2 PID: 244 Comm: bash Not tainted 6.2.0-rc1-00008-g544b2c59fd81 #1
>  Hardware name: riscv-virtio,qemu (DT)
>  Call Trace:
>  [<ffffffff80006e80>] dump_backtrace+0x30/0x38
>  [<ffffffff81256e82>] show_stack+0x40/0x4c
>  [<ffffffff8126e054>] dump_stack_lvl+0x62/0x84
>  [<ffffffff8126e08e>] dump_stack+0x18/0x20
>  [<ffffffff8009b37e>] print_circular_bug+0x2ac/0x318
>  [<ffffffff8009b50c>] check_noncircular+0x122/0x13a
>  [<ffffffff8009e020>] __lock_acquire+0x1058/0x20e4
>  [<ffffffff8009f90c>] lock_acquire+0x10a/0x328
>  [<ffffffff8002fb8a>] cpus_read_lock+0x4c/0x11c
>  [<ffffffff8011ed60>] stop_machine+0x2c/0x54
>  [<ffffffff8013aec6>] arch_ftrace_update_code+0x2e/0x4c
>  [<ffffffff8013e796>] ftrace_startup+0xd0/0x15e
>  [<ffffffff8013e856>] register_ftrace_function+0x32/0x7c
>  [<ffffffff8012f928>] arm_kprobe+0x132/0x198
>  [<ffffffff8012fa2a>] enable_kprobe+0x9c/0xc0
>  [<ffffffff8016ff62>] enable_trace_kprobe+0x6e/0xea
>  [<ffffffff801700da>] kprobe_register+0x64/0x6c
>  [<ffffffff8015eba6>] __ftrace_event_enable_disable+0x72/0x246
>  [<ffffffff8015eeea>] event_enable_write+0x94/0xe4
>  [<ffffffff802d5e1a>] vfs_write+0xca/0x41e
>  [<ffffffff802d6466>] ksys_write+0x70/0xee
>  [<ffffffff802d6506>] sys_write+0x22/0x2a
>  [<ffffffff80004196>] ret_from_syscall+0x0/0x2
> 
> 
> 5. 32b support?
> 
> I've noticed that there code supports rv32. Is this tested? Does regular
> kprobes work on 32b?
> 
> 
> Thanks,
> Björn
> 
> 
> [1] https://lore.kernel.org/linux-riscv/20230102160748.1307289-1-bjorn@kernel.org/
>
Liao, Chang Jan. 4, 2023, 8:34 a.m. UTC | #3
Hi,Björn,appreciate for your review and testing about this feature.

在 2023/1/3 2:02, Björn Töpel 写道:
> Chen Guokai <chenguokai17@mails.ucas.ac.cn> writes:
> 
>> Add jump optimization support for RISC-V.
> 
> Thank you for continuing to work on the series! I took the series for a
> spin, and ran into a number of issues that makes me wonder how you test
> the series, and how the testing is different from my runs.

I have pick some kernel functions to test this series, which means all optprobe
are install at entry of function, i guess the instruction pattern is not versatile
enough for my testcases leads to some bugs are not discovered.

Do you think it is good idea to test this feature via binary ftracetest and the
kprobe related tc scripts in tools/testing/ftrace directory?

Thanks.

> 
> I'll outline the general/big issues here, and leave the specifics per-patch.
> 
> I've done simple testing, using "Kprobe-based Event Tracing"
> (CONFIG_KPROBE_EVENTS=y) via tracefs.
> 
> All the tests were run on commit 88603b6dc419 ("Linux 6.2-rc2") with the
> series applied. All the bugs were trigged by setting different probes to
> do_sys_openat2. Code:
> 
> do_sys_openat2:
> ...snip...
> ffffffff802d138c:       89aa                    c.mv    s3,a0    // +44
> ffffffff802d138e:       892e                    c.mv    s2,a1    // +46
> ffffffff802d1390:       8532                    c.mv    a0,a2
> ffffffff802d1392:       fa040593                addi    a1,s0,-96
> ffffffff802d1396:       84b2                    c.mv    s1,a2
> ffffffff802d1398:       fa043023                sd      zero,-96(s0)
> ffffffff802d139c:       fa043423                sd      zero,-88(s0)
> ffffffff802d13a0:       fa042823                sw      zero,-80(s0)
> ffffffff802d13a4:       00000097                auipc   ra,0x0
> ...snip...
> 
> 
> 1. Fail to register kprobe to c.mv
> 
> Add a kprobe:
>   echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
> 
> register_kprobe returns -22 (EINVAL). This is due to a bug in the
> instruction decoder. I've sent to fix upstream [1].
> 
> 2. (with [1] applied) Oops when register a probe
> 
> Add a kprobe:
>   echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
> 
> You get a splat:
>   Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
>   Oops [#1]
>   Modules linked in:
>   CPU: 1 PID: 242 Comm: bash Tainted: G        W          6.2.0-rc2-00010-g09ff1aa7b1f9-dirty #14
>   Hardware name: riscv-virtio,qemu (DT)
>   epc : riscv_probe_decode_insn+0x16a/0x192
>    ra : riscv_probe_decode_insn+0x32/0x192
>   epc : ffffffff8127b2bc ra : ffffffff8127b184 sp : ff2000000173bac0
>    gp : ffffffff82533f70 tp : ff60000086ab2b40 t0 : 0000000000000000
>    t1 : 0000000000000850 t2 : 65646f6365642054 s0 : ff2000000173bae0
>    s1 : 0000000000000017 a0 : 000000000000e001 a1 : 000000000000003f
>    a2 : 0000000000009002 a3 : 0000000000000017 a4 : 000000000000c001
>    a5 : ffffffff8127b38a a6 : ff6000047d666000 a7 : 0000000000040000
>    s2 : 0000000000000000 s3 : 0000000000000006 s4 : ff6000008558f718
>    s5 : ff6000008558f718 s6 : 0000000000000001 s7 : ff6000008558f768
>    s8 : 0000000000000007 s9 : 0000000000000003 s10: 0000000000000002
>    s11: 00aaaaaad62baf78 t3 : 0000000000000000 t4 : 8dd70b0100000000
>    t5 : ffffffffffffe000 t6 : ff2000000173b8c8
>   status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000f
>   [<ffffffff81257e48>] arch_prepare_optimized_kprobe+0xc2/0x4ec
>   [<ffffffff8125b420>] alloc_aggr_kprobe+0x5c/0x6a
>   [<ffffffff8125ba0a>] register_kprobe+0x5dc/0x6a2
>   [<ffffffff8016f266>] __register_trace_kprobe.part.0+0x98/0xbc
>   [<ffffffff80170544>] __trace_kprobe_create+0x6ea/0xbcc
>   [<ffffffff80176cee>] trace_probe_create+0x6c/0x7c
>   [<ffffffff8016f1a2>] create_or_delete_trace_kprobe+0x24/0x50
>   [<ffffffff80150642>] trace_parse_run_command+0x9e/0x12a
>   [<ffffffff8016f176>] probes_write+0x18/0x20
>   [<ffffffff802d494a>] vfs_write+0xca/0x41e
>   [<ffffffff802d4f96>] ksys_write+0x70/0xee
>   [<ffffffff802d5036>] sys_write+0x22/0x2a
>   [<ffffffff80004196>] ret_from_syscall+0x0/0x2
> 
> This is because a call to riscv_probe_decode_insn(probe_opcode_t *addr,
> struct arch_probe_insn *api), where api is NULL (and tripping over
> auipc). Should be a common scenario...
> 
> 3. No bound check for instructions
> 
> Add a probe to a non-valid instruction (in the middle of addi):
>   echo 'p 0xffffffff802d1394' > /sys/kernel/debug/tracing/kprobe_events
> 
> You get the same splat as above from the auipc NULL-pointer, but the
> "half" addi-instruction is parsed as a correct instruction.
> 
> 4. Lockdep splat
> 
> Might be false positive; When enabling a probe, e.g.
>   echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
> 
> 
>   ======================================================
>   WARNING: possible circular locking dependency detected
>   
>   ------------------------------------------------------
>   bash/244 is trying to acquire lock:                                                     
>   ffffffff8223ee90 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x2c/0x54
>                                                                                           
>   but task is already holding lock:                                                       
>   ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
>                                                                                           
>   which lock already depends on the new lock. 
>                                                                                           
>                                                                                           
>   the existing dependency chain (in reverse order) is:         
>                                                                                           
>   -> #1 (text_mutex){+.+.}-{3:3}:                                                         
>          lock_acquire+0x10a/0x328                                                         
>          __mutex_lock+0xa8/0x770                                                          
>          mutex_lock_nested+0x28/0x30                                                      
>          register_kprobe+0x3ae/0x5ea
>          __register_trace_kprobe.part.0+0x98/0xbc
>          __trace_kprobe_create+0x6ea/0xbcc
>          trace_probe_create+0x6c/0x7c
>          create_or_delete_trace_kprobe+0x24/0x50
>          trace_parse_run_command+0x9e/0x12a
>          probes_write+0x18/0x20
>          vfs_write+0xca/0x41e
>          ksys_write+0x70/0xee
>          sys_write+0x22/0x2a
>          ret_from_syscall+0x0/0x2
>   
>   -> #0 (cpu_hotplug_lock){++++}-{0:0}:
>          check_noncircular+0x122/0x13a
>          __lock_acquire+0x1058/0x20e4
>          lock_acquire+0x10a/0x328
>          cpus_read_lock+0x4c/0x11c
>          stop_machine+0x2c/0x54
>          arch_ftrace_update_code+0x2e/0x4c
>          ftrace_startup+0xd0/0x15e
>          register_ftrace_function+0x32/0x7c
>          arm_kprobe+0x132/0x198
>          enable_kprobe+0x9c/0xc0
>          enable_trace_kprobe+0x6e/0xea
>          kprobe_register+0x64/0x6c
>          __ftrace_event_enable_disable+0x72/0x246
>          event_enable_write+0x94/0xe4
>          vfs_write+0xca/0x41e
>          ksys_write+0x70/0xee
>          sys_write+0x22/0x2a
>          ret_from_syscall+0x0/0x2
>   
>   other info that might help us debug this:

Need to study this backtrace further, but at first glance, i guess CONFIG_DYNAMIC_FTRACE is enabled on your kernel, right?
If so, all krpobe is installed via ftrace stub, then kprobe optimiztion occur in the ftrace trampoline code, and it also
a corner case to current optprobe implementation.

>   
>    Possible unsafe locking scenario:
>   
>          CPU0                    CPU1
>          ----                    ----
>     lock(text_mutex);
>                                  lock(cpu_hotplug_lock);
>                                  lock(text_mutex);
>     lock(cpu_hotplug_lock);
>   
>    *** DEADLOCK ***
>   
>   5 locks held by bash/244:
>    #0: ff60000080f49438 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x70/0xee
>    #1: ffffffff822d9468 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0x7c/0xe4
>    #2: ffffffff822d3fa8 (kprobe_mutex){+.+.}-{3:3}, at: enable_kprobe+0x32/0xc0
>    #3: ffffffff822d56d8 (ftrace_lock){+.+.}-{3:3}, at: register_ftrace_function+0x26/0x7c
>    #4: ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
>   
>   stack backtrace:
>   CPU: 2 PID: 244 Comm: bash Not tainted 6.2.0-rc1-00008-g544b2c59fd81 #1
>   Hardware name: riscv-virtio,qemu (DT)
>   Call Trace:
>   [<ffffffff80006e80>] dump_backtrace+0x30/0x38
>   [<ffffffff81256e82>] show_stack+0x40/0x4c
>   [<ffffffff8126e054>] dump_stack_lvl+0x62/0x84
>   [<ffffffff8126e08e>] dump_stack+0x18/0x20
>   [<ffffffff8009b37e>] print_circular_bug+0x2ac/0x318
>   [<ffffffff8009b50c>] check_noncircular+0x122/0x13a
>   [<ffffffff8009e020>] __lock_acquire+0x1058/0x20e4
>   [<ffffffff8009f90c>] lock_acquire+0x10a/0x328
>   [<ffffffff8002fb8a>] cpus_read_lock+0x4c/0x11c
>   [<ffffffff8011ed60>] stop_machine+0x2c/0x54
>   [<ffffffff8013aec6>] arch_ftrace_update_code+0x2e/0x4c
>   [<ffffffff8013e796>] ftrace_startup+0xd0/0x15e
>   [<ffffffff8013e856>] register_ftrace_function+0x32/0x7c
>   [<ffffffff8012f928>] arm_kprobe+0x132/0x198
>   [<ffffffff8012fa2a>] enable_kprobe+0x9c/0xc0
>   [<ffffffff8016ff62>] enable_trace_kprobe+0x6e/0xea
>   [<ffffffff801700da>] kprobe_register+0x64/0x6c
>   [<ffffffff8015eba6>] __ftrace_event_enable_disable+0x72/0x246
>   [<ffffffff8015eeea>] event_enable_write+0x94/0xe4
>   [<ffffffff802d5e1a>] vfs_write+0xca/0x41e
>   [<ffffffff802d6466>] ksys_write+0x70/0xee
>   [<ffffffff802d6506>] sys_write+0x22/0x2a
>   [<ffffffff80004196>] ret_from_syscall+0x0/0x2
> 

My comment is same as the last one.

> 
> 5. 32b support?
> 
> I've noticed that there code supports rv32. Is this tested? Does regular
> kprobes work on 32b?

Not yet, i will test on rv32.

>
> 
> Thanks,
> Björn
> 
> 
> [1] https://lore.kernel.org/linux-riscv/20230102160748.1307289-1-bjorn@kernel.org/
> 
>
Björn Töpel Jan. 4, 2023, 8:45 a.m. UTC | #4
Xim <chenguokai17@mails.ucas.ac.cn> writes:

> Hi Björn,
>
> Thanks for your detailed review! I made tests mainly on some syscall/timer related
> functions where these issues were not triggered. I will check all these issues as well
> as comments that spread per-patch before a new version of patch set is sent.
>
> FYI the 32b support is included and was tested with mostly same cases as the 64b one.

Ok! Thank you for clarifying!


Björn
Björn Töpel Jan. 4, 2023, 8:53 a.m. UTC | #5
"liaochang (A)" <liaochang1@huawei.com> writes:

> Hi,Björn,appreciate for your review and testing about this feature.

Thank you for the hard work!

> 在 2023/1/3 2:02, Björn Töpel 写道:
>> Chen Guokai <chenguokai17@mails.ucas.ac.cn> writes:
>> 
>>> Add jump optimization support for RISC-V.
>> 
>> Thank you for continuing to work on the series! I took the series for a
>> spin, and ran into a number of issues that makes me wonder how you test
>> the series, and how the testing is different from my runs.
>
> I have pick some kernel functions to test this series, which means all optprobe
> are install at entry of function, i guess the instruction pattern is not versatile
> enough for my testcases leads to some bugs are not discovered.
>
> Do you think it is good idea to test this feature via binary ftracetest and the
> kprobe related tc scripts in tools/testing/ftrace directory?

Definitely! Both running all tests in tools/testing/selftests/ftrace and
with the CONFIG_KPROBES_SANITY_TEST module.

[...]

>> 4. Lockdep splat

[...]

> Need to study this backtrace further, but at first glance, i guess CONFIG_DYNAMIC_FTRACE is enabled on your kernel, right?
> If so, all krpobe is installed via ftrace stub, then kprobe optimiztion occur in the ftrace trampoline code, and it also
> a corner case to current optprobe implementation.

Yes, CONFIG_DYNAMIC_FTRACE is on. My kernel config was simply:
  make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- defconfig
  make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- kselftest-merge


Thanks,
Björn