Message ID: 20201021073839.43935-1-zong.li@sifive.com (mailing list archive)
State:      New, archived
Series:     stop_machine: Mark functions as notrace
On Wed, 21 Oct 2020 15:38:39 +0800
Zong Li <zong.li@sifive.com> wrote:

> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashed when switching
> tracer on RISC-V.
>
> The following patches added calls to these two functions, fix it by
> adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")

I really do not like to add "notrace" to core functions because a single
architecture has issues with it. Why does RISCV have problems with these
functions but no other architecture does?

NACK from me until it is shown that these are issues for a broader set of
architectures. It sounds to me like you are treating a symptom and not the
disease.

-- Steve

> Signed-off-by: Zong Li <zong.li@sifive.com>
> ---
>  kernel/rcu/tree.c     | 2 +-
>  kernel/stop_machine.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
>   *
>   * The caller must have disabled interrupts and must not be idle.
>   */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
>  {
>  	int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
>  	set_state(msdata, msdata->state + 1);
>  }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
>  {
>  	cpu_relax();
>  }
On Wed, 21 Oct 2020 10:12:16 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > multi_cpu_stop()")
>
> I really do not like to add "notrace" to core functions because a single
> architecture has issues with it. Why does RISCV have problems with these
> functions but no other architecture does?
>
> NACK from me until it is shown that these are issues for a broader set of
> architectures.

After looking at the two above fixes, I take back my NACK ;-)

One of them duplicates an already notraced function, so that looks fine.
The other makes a static function global, which could cause issues as well.

After further review:

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve
On Wed, Oct 21, 2020 at 10:15:22AM -0400, Steven Rostedt wrote:
> On Wed, 21 Oct 2020 10:12:16 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > multi_cpu_stop()")
> >
> > I really do not like to add "notrace" to core functions because a single
> > architecture has issues with it. Why does RISCV have problems with these
> > functions but no other architecture does?
> >
> > NACK from me until it is shown that these are issues for a broader set of
> > architectures.
>
> After looking at the two above fixes, I take back my NACK ;-)
>
> One of them duplicates an already notraced function, so that looks fine.
> The other makes a static function global, which could cause issues as well.
>
> After further review:
>
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

If someone else would like to take this:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

Or let me know if you would like me to take it, target v5.11.

							Thanx, Paul
On Wed, 21 Oct 2020 08:44:56 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> Or let me know if you would like me to take it, target v5.11.
I'm not sure if these can wait, as I believe they are fixing a regression
with RISCV function tracing.
Probably best to have them go through the RISCV tree.
-- Steve
On Wed, Oct 21, 2020 at 11:54:51AM -0400, Steven Rostedt wrote:
> On Wed, 21 Oct 2020 08:44:56 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > Or let me know if you would like me to take it, target v5.11.
>
> I'm not sure if these can wait, as I believe they are fixing a regression
> with RISCV function tracing.
>
> Probably best to have them go through the RISCV tree.

Works for me!

							Thanx, Paul
On Wed, Oct 21, 2020 at 12:38 AM Zong Li <zong.li@sifive.com> wrote:
>
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashed when switching
> tracer on RISC-V.
>
> The following patches added calls to these two functions, fix it by
> adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
>
> Signed-off-by: Zong Li <zong.li@sifive.com>
> ---
>  kernel/rcu/tree.c     | 2 +-
>  kernel/stop_machine.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
>   *
>   * The caller must have disabled interrupts and must not be idle.
>   */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
>  {
>  	int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
>  	set_state(msdata, msdata->state + 1);
>  }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
>  {
>  	cpu_relax();
>  }
> --
> 2.28.0
>

Thanks for the fix. FWIW,

Tested-by: Atish Patra <atish.patra@wdc.com>

Can you update the bugzilla as well ?
https://bugzilla.kernel.org/show_bug.cgi?id=209317

--
Regards,
Atish
On 21/10/2020 08:38, Zong Li wrote:
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashed when switching
> tracer on RISC-V.
>
> The following patches added calls to these two functions, fix it by
> adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
>
> Signed-off-by: Zong Li <zong.li@sifive.com>
> ---
>  kernel/rcu/tree.c     | 2 +-
>  kernel/stop_machine.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
>   *
>   * The caller must have disabled interrupts and must not be idle.
>   */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
>  {
>  	int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
>  	set_state(msdata, msdata->state + 1);
>  }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
>  {
>  	cpu_relax();
>  }
>

Apologies for taking so long to reply, I needed to test this on several
devices.

This not only fixes the ftrace issue I see on RISC-V but also a ftrace
hang issue on ARM64 in 5.8 too.

Tested-by: Colin Ian King <colin.king@canonical.com>

Many thanks!
On Sat, Oct 24, 2020 at 3:29 AM Colin Ian King <colin.king@canonical.com> wrote:
>
> On 21/10/2020 08:38, Zong Li wrote:
> > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > as notrace"), some architectures assume that the stopped CPUs don't make
> > function calls to traceable functions when they are in the stopped
> > state. For example, it causes unexpected kernel crashed when switching
> > tracer on RISC-V.
> >
> > The following patches added calls to these two functions, fix it by
> > adding the notrace annotations.
> >
> > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > multi_cpu_stop()")
> >
> > Signed-off-by: Zong Li <zong.li@sifive.com>
> > ---
> >  kernel/rcu/tree.c     | 2 +-
> >  kernel/stop_machine.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 06895ef85d69..2a52f42f64b6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> >   *
> >   * The caller must have disabled interrupts and must not be idle.
> >   */
> > -void rcu_momentary_dyntick_idle(void)
> > +notrace void rcu_momentary_dyntick_idle(void)
> >  {
> >  	int special;
> >
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index 865bb0228ab6..890b79cf0e7c 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> >  	set_state(msdata, msdata->state + 1);
> >  }
> >
> > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> >  {
> >  	cpu_relax();
> >  }
> >
>
> Apologies for taking so long to reply, I needed to test this on several
> devices.
>
> This not only fixes the ftrace issue I see on RISC-V but also a ftrace
> hang issue on ARM64 in 5.8 too.
>
> Tested-by: Colin Ian King <colin.king@canonical.com>
>
> Many thanks!

Many thanks all for reviewing and testing.

Hi Palmer,

As Steven suggested, could you help to pick up this patch in RISC-V tree?
Hi Zong & Atish,

In our 2 harts c910 chip, we found:

echo function > /sys/kernel/debug/tracing/current_tracer
echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo function > /sys/kernel/debug/tracing/current_tracer
echo function_graph > /sys/kernel/debug/tracing/current_tracer

Then one core halted at stop_machine_yield:

arch_cpu_idle () at arch/riscv/kernel/process.c:39
39              local_irq_enable();
(gdb) i th
  Id   Target Id         Frame
* 1    Thread 1 (CPU#0)  arch_cpu_idle () at arch/riscv/kernel/process.c:39
  2    Thread 2 (CPU#1)  stop_machine_yield (cpumask=0xffffffe001371fa8
    <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0  stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>)
    at ./arch/riscv/include/asm/vdso/processor.h:12
12              __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));

With your patch, it's solved. For this patch, I'll give:

Tested-by: Guo Ren <guoren@kernel.org>

But that's not enough, we still need:

diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 226ccce..12b8808 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
  *
  * Return: None
  */
-void sbi_remote_fence_i(const unsigned long *hart_mask)
+void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
 {
 	__sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
 		     hart_mask, 0, 0, 0, 0);
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 400b945d..9467d987 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -9,12 +9,12 @@

 #include <asm/sbi.h>

-static void ipi_remote_fence_i(void *info)
+static void notrace ipi_remote_fence_i(void *info)
 {
 	return local_flush_icache_all();
 }

-void flush_icache_all(void)
+void notrace flush_icache_all(void)
 {
 	if (IS_ENABLED(CONFIG_RISCV_SBI))
 		sbi_remote_fence_i(NULL);

Because:

(gdb) bt
#0  flush_icache_all () at arch/riscv/mm/cacheflush.c:20
#1  0xffffffe00020473a in patch_text_nosync (addr=<optimized out>,
    insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
#2  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
    addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
#3  0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020,
    enable=true) at kernel/trace/ftrace.c:2503
#4  0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>)
    at kernel/trace/ftrace.c:2530
#5  0xffffffe0002ca24a in ftrace_modify_all_code (command=9)
    at kernel/trace/ftrace.c:2677
#6  0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>)
    at kernel/trace/ftrace.c:2703
#7  0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
#8  0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>)
    at kernel/stop_machine.c:491
#9  0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
#10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040)
    at kernel/kthread.c:292
#11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236

On Wed, Oct 21, 2020 at 3:38 PM Zong Li <zong.li@sifive.com> wrote:
>
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashed when switching
> tracer on RISC-V.
>
> The following patches added calls to these two functions, fix it by
> adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
>
> Signed-off-by: Zong Li <zong.li@sifive.com>
> ---
>  kernel/rcu/tree.c     | 2 +-
>  kernel/stop_machine.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
>   *
>   * The caller must have disabled interrupts and must not be idle.
>   */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
>  {
>  	int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> 	set_state(msdata, msdata->state + 1);
>  }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
>  {
>  	cpu_relax();
>  }
> --
> 2.28.0
>
On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote:
>
> Hi Zong & Atish,
>
> In our 2 harts c910 chip, we found:
>
> echo function > /sys/kernel/debug/tracing/current_tracer
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> echo function > /sys/kernel/debug/tracing/current_tracer
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
>
> Then one core halted at stop_machine_yield:
> arch_cpu_idle () at arch/riscv/kernel/process.c:39
> 39              local_irq_enable();
> (gdb) i th
>   Id   Target Id         Frame
> * 1    Thread 1 (CPU#0)  arch_cpu_idle () at arch/riscv/kernel/process.c:39
>   2    Thread 2 (CPU#1)  stop_machine_yield
> (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> ./arch/riscv/include/asm/vdso/processor.h:12
> (gdb) thread 2
> [Switching to thread 2 (Thread 2)]
> #0  stop_machine_yield (cpumask=0xffffffe001371fa8
> <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> 12              __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
>
> With your patch, it's solved. For this patch, I'll give:
> Tested by: Guo Ren <guoren@kernel.org>
>
> But that's not enough, we still need:
>
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index 226ccce..12b8808 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
>   *
>   * Return: None
>   */
> -void sbi_remote_fence_i(const unsigned long *hart_mask)
> +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
>  {
>         __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
>                      hart_mask, 0, 0, 0, 0);
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 400b945d..9467d987 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -9,12 +9,12 @@
>
>  #include <asm/sbi.h>
>
> -static void ipi_remote_fence_i(void *info)
> +static void notrace ipi_remote_fence_i(void *info)
>  {
>         return local_flush_icache_all();
>  }
>
> -void flush_icache_all(void)
> +void notrace flush_icache_all(void)
>  {
>         if (IS_ENABLED(CONFIG_RISCV_SBI))
>                 sbi_remote_fence_i(NULL);
>

Did you see any issue if these functions are not marked as notrace ?

As per Zong's explanation, the issue was that the other harts already
fetched the next 2 nops and executed 1 while kernel patching replaced
other with one of the auipc + jalr pair.

@Zong can correct me if I am wrong.

These functions are too far ahead. Can it cause such issues ? If yes,
then we need to mark each and every function that can be invoked from
patch_text_nosync and are not inlined.

That includes copy_to_kernel_nofault, __sbi_rfence_v02,
__sbi_rfence_v02_call, sbi_ecall.

Few of these functions may be inlined by compiler. Can we depend on that ?

> Because:
> (gdb) bt
> #0  flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> #1  0xffffffe00020473a in patch_text_nosync (addr=<optimized out>,
>     insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> #2  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
>     addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> #3  0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020,
>     enable=true) at kernel/trace/ftrace.c:2503
> #4  0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>)
>     at kernel/trace/ftrace.c:2530
> #5  0xffffffe0002ca24a in ftrace_modify_all_code (command=9)
>     at kernel/trace/ftrace.c:2677
> #6  0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>)
>     at kernel/trace/ftrace.c:2703
> #7  0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
> #8  0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>)
>     at kernel/stop_machine.c:491
> #9  0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
> #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040)
>     at kernel/kthread.c:292
> #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
>
> On Wed, Oct 21, 2020 at 3:38 PM Zong Li <zong.li@sifive.com> wrote:
> >
> > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > as notrace"), some architectures assume that the stopped CPUs don't make
> > function calls to traceable functions when they are in the stopped
> > state. For example, it causes unexpected kernel crashed when switching
> > tracer on RISC-V.
> >
> > The following patches added calls to these two functions, fix it by
> > adding the notrace annotations.
> >
> > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > multi_cpu_stop()")
> >
> > Signed-off-by: Zong Li <zong.li@sifive.com>
> > ---
> >  kernel/rcu/tree.c     | 2 +-
> >  kernel/stop_machine.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 06895ef85d69..2a52f42f64b6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> >   *
> >   * The caller must have disabled interrupts and must not be idle.
> >   */
> > -void rcu_momentary_dyntick_idle(void)
> > +notrace void rcu_momentary_dyntick_idle(void)
> >  {
> >  	int special;
> >
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index 865bb0228ab6..890b79cf0e7c 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> >  	set_state(msdata, msdata->state + 1);
> >  }
> >
> > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> >  {
> >  	cpu_relax();
> >  }
> > --
> > 2.28.0
> >
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > Hi Zong & Atish,
> >
> > In our 2 harts c910 chip, we found:
> >
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> >
> > Then one core halted at stop_machine_yield:
> > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > 39              local_irq_enable();
> > (gdb) i th
> >   Id   Target Id         Frame
> > * 1    Thread 1 (CPU#0)  arch_cpu_idle () at arch/riscv/kernel/process.c:39
> >   2    Thread 2 (CPU#1)  stop_machine_yield
> > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> > ./arch/riscv/include/asm/vdso/processor.h:12
> > (gdb) thread 2
> > [Switching to thread 2 (Thread 2)]
> > #0  stop_machine_yield (cpumask=0xffffffe001371fa8
> > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > 12              __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> >
> > With your patch, it's solved. For this patch, I'll give:
> > Tested by: Guo Ren <guoren@kernel.org>
> >
> > But that's not enough, we still need:
> >
> > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > index 226ccce..12b8808 100644
> > --- a/arch/riscv/kernel/sbi.c
> > +++ b/arch/riscv/kernel/sbi.c
> > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> >   *
> >   * Return: None
> >   */
> > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> >  {
> >         __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> >                      hart_mask, 0, 0, 0, 0);
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 400b945d..9467d987 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -9,12 +9,12 @@
> >
> >  #include <asm/sbi.h>
> >
> > -static void ipi_remote_fence_i(void *info)
> > +static void notrace ipi_remote_fence_i(void *info)
> >  {
> >         return local_flush_icache_all();
> >  }
> >
> > -void flush_icache_all(void)
> > +void notrace flush_icache_all(void)
> >  {
> >         if (IS_ENABLED(CONFIG_RISCV_SBI))
> >                 sbi_remote_fence_i(NULL);
> >
>
> Did you see any issue if these functions are not marked as notrace ?
>
> As per Zong's explanation, the issue was that the other harts already
> fetched the next 2 nops and executed 1 while kernel patching replaced
> other with one of the auipc + jalr pair.
>
> @Zong can correct me if I am wrong.
>
> These functions are too far ahead. Can it cause such issues ? If yes,
> then we need to mark each and every function that can be invoked from
> patch_text_nosync and are not inlined.
>
> That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> __sbi_rfence_v02_call, sbi_ecall.
>
> Few of these functions may be inlined by compiler. Can we depend on that ?
>
> > Because:
> > (gdb) bt
> > #0  flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > #1  0xffffffe00020473a in patch_text_nosync (addr=<optimized out>,
> >     insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > #2  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> >     addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > #3  0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020,
> >     enable=true) at kernel/trace/ftrace.c:2503
> > #4  0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>)
> >     at kernel/trace/ftrace.c:2530
> > #5  0xffffffe0002ca24a in ftrace_modify_all_code (command=9)
> >     at kernel/trace/ftrace.c:2677
> > #6  0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>)
> >     at kernel/trace/ftrace.c:2703
> > #7  0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
> > #8  0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>)
> >     at kernel/stop_machine.c:491
> > #9  0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
> > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040)
> >     at kernel/kthread.c:292
> > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236

It seems to me that the problem happens on the waiting threads, it doesn't
cause the issue on the patching code thread, so it is OK that these
functions are traceable. I probably don't figure out all possible
situations, do you find any issue and reason to change the annotation of
these functions?

> > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <zong.li@sifive.com> wrote:
> > >
> > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > > as notrace"), some architectures assume that the stopped CPUs don't make
> > > function calls to traceable functions when they are in the stopped
> > > state. For example, it causes unexpected kernel crashed when switching
> > > tracer on RISC-V.
> > >
> > > The following patches added calls to these two functions, fix it by
> > > adding the notrace annotations.
> > >
> > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > multi_cpu_stop()")
> > >
> > > Signed-off-by: Zong Li <zong.li@sifive.com>
> > > ---
> > >  kernel/rcu/tree.c     | 2 +-
> > >  kernel/stop_machine.c | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 06895ef85d69..2a52f42f64b6 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > >   *
> > >   * The caller must have disabled interrupts and must not be idle.
> > >   */
> > > -void rcu_momentary_dyntick_idle(void)
> > > +notrace void rcu_momentary_dyntick_idle(void)
> > >  {
> > >  	int special;
> > >
> > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > > index 865bb0228ab6..890b79cf0e7c 100644
> > > --- a/kernel/stop_machine.c
> > > +++ b/kernel/stop_machine.c
> > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > >  	set_state(msdata, msdata->state + 1);
> > >  }
> > >
> > > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > >  {
> > >  	cpu_relax();
> > >  }
> > > --
> > > 2.28.0
> > >
> >
> > --
> > Best Regards
> >  Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/
>
> --
> Regards,
> Atish
On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > Hi Zong & Atish,
> >
> > In our 2 harts c910 chip, we found:
> >
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> >
> > Then one core halted at stop_machine_yield:
> > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > 39              local_irq_enable();
> > (gdb) i th
> >   Id   Target Id         Frame
> > * 1    Thread 1 (CPU#0)  arch_cpu_idle () at arch/riscv/kernel/process.c:39
> >   2    Thread 2 (CPU#1)  stop_machine_yield
> > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> > ./arch/riscv/include/asm/vdso/processor.h:12
> > (gdb) thread 2
> > [Switching to thread 2 (Thread 2)]
> > #0  stop_machine_yield (cpumask=0xffffffe001371fa8
> > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > 12              __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> >
> > With your patch, it's solved. For this patch, I'll give:
> > Tested by: Guo Ren <guoren@kernel.org>
> >
> > But that's not enough, we still need:
> >
> > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > index 226ccce..12b8808 100644
> > --- a/arch/riscv/kernel/sbi.c
> > +++ b/arch/riscv/kernel/sbi.c
> > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> >   *
> >   * Return: None
> >   */
> > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> >  {
> >         __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> >                      hart_mask, 0, 0, 0, 0);
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 400b945d..9467d987 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -9,12 +9,12 @@
> >
> >  #include <asm/sbi.h>
> >
> > -static void ipi_remote_fence_i(void *info)
> > +static void notrace ipi_remote_fence_i(void *info)
> >  {
> >         return local_flush_icache_all();
> >  }
> >
> > -void flush_icache_all(void)
> > +void notrace flush_icache_all(void)
> >  {
> >         if (IS_ENABLED(CONFIG_RISCV_SBI))
> >                 sbi_remote_fence_i(NULL);
> >
>
> Did you see any issue if these functions are not marked as notrace ?

Welcome to Buildroot
buildroot login: root
#
# cat /proc/cpuinfo
processor       : 0
hart            : 0
isa             : rv64imafdcsu
mmu             : sv39
#
# echo function > /sys/kernel/debug/tracing/current_tracer
[   45.234334] Unable to handle kernel paging request at virtual address ffffffd38ae80900
[   45.242313] Oops [#1]
[   45.244600] Modules linked in:
[   45.247678] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-00025-g9b7db83-dirty #215
[   45.255797] epc: ffffffe00021689a ra : ffffffe00021718e sp : ffffffe01afabb58
[   45.262955]  gp : ffffffe00136afa0 tp : ffffffe01af94d00 t0 : 0000000000000002
[   45.270200]  t1 : 0000000000000000 t2 : 0000000000000001 s0 : ffffffe01afabc08
[   45.277443]  s1 : ffffffe0013718a8 a0 : 0000000000000000 a1 : ffffffe01afabba8
[   45.284686]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : c4c16ad38ae80900
[   45.291929]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
[   45.299173]  s2 : 0000000000000001 s3 : ffffffe000206a60 s4 : ffffffe000206a60
[   45.306415]  s5 : 00000000000009ec s6 : ffffffe0013718a8 s7 : c4c16ad38ae80900
[   45.313658]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001
[   45.320902]  s11: 0000000000000003 t3 : 0000000000000001 t4 : ffffffffd192fe79
[   45.328144]  t5 : ffffffffb8f80000 t6 : 0000000000040000
[   45.333472] status: 0000000200000100 badaddr: ffffffd38ae80900 cause: 000000000000000f
[   45.341514] ---[ end trace d95102172248fdcf ]---
[   45.346176] note: migration/0[11] exited with preempt_count 1

(gdb) x /2i $pc
=> 0xffffffe00021689a <__do_proc_dointvec+196>: sd      zero,0(s7)
   0xffffffe00021689e <__do_proc_dointvec+200>: li      s11,0
(gdb) bt
#0  __do_proc_dointvec (tbl_data=0x0, table=0xffffffe01afabba8, write=0,
    buffer=0x0, lenp=0x7bf897061f9a0800, ppos=0x4, conv=0x0, data=0x52464e43)
    at kernel/sysctl.c:581
#1  0xffffffe00021718e in do_proc_dointvec (data=<optimized out>,
    conv=<optimized out>, ppos=<optimized out>, lenp=<optimized out>,
    buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
    at kernel/sysctl.c:964
#2  proc_dointvec_minmax (ppos=<optimized out>, lenp=<optimized out>,
    buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
    at kernel/sysctl.c:964
#3  proc_do_static_key (table=<optimized out>, write=1, buffer=0x0,
    lenp=0x0, ppos=0x7bf897061f9a0800) at kernel/sysctl.c:1643
#4  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
    addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
#5  0xffffffe0002c9c04 in __ftrace_replace_code (rec=0xffffffe01ae40c30,
    enable=3) at kernel/trace/ftrace.c:2503
#6  0xffffffe0002ca0b2 in ftrace_replace_code (mod_flags=<optimized out>)
    at kernel/trace/ftrace.c:2530
#7  0xffffffe0002ca26a in ftrace_modify_all_code (command=5)
    at kernel/trace/ftrace.c:2677
#8  0xffffffe0002ca30e in __ftrace_modify_code (data=<optimized out>)
    at kernel/trace/ftrace.c:2703
#9  0xffffffe0002c13b0 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
#10 0xffffffe0002c0fde in cpu_stopper_thread (cpu=<optimized out>)
    at kernel/stop_machine.c:491
#11 0xffffffe0002343de in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
#12 0xffffffe00022f8b4 in kthread (_create=0xffffffe01af0c040)
    at kernel/kthread.c:292
#13 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236

   0xffffffe00020678a <+114>:   auipc   ra,0xffffe
   0xffffffe00020678e <+118>:   jalr    -118(ra) # 0xffffffe000204714 <patch_text_nosync>
   0xffffffe000206792 <+122>:   snez    a0,a0

(gdb) disassemble patch_text_nosync
Dump of assembler code for function patch_text_nosync:
   0xffffffe000204714 <+0>:     addi    sp,sp,-32
   0xffffffe000204716 <+2>:     sd      s0,16(sp)
   0xffffffe000204718 <+4>:     sd      ra,24(sp)
   0xffffffe00020471a <+6>:     addi    s0,sp,32
   0xffffffe00020471c <+8>:     auipc   ra,0x0
   0xffffffe000204720 <+12>:    jalr    -384(ra) # 0xffffffe00020459c <patch_insn_write>
   0xffffffe000204724 <+16>:    beqz    a0,0xffffffe00020472e <patch_text_nosync+26>
   0xffffffe000204726 <+18>:    ld      ra,24(sp)
   0xffffffe000204728 <+20>:    ld      s0,16(sp)
   0xffffffe00020472a <+22>:    addi    sp,sp,32
   0xffffffe00020472c <+24>:    ret
   0xffffffe00020472e <+26>:    sd      a0,-24(s0)
   0xffffffe000204732 <+30>:    auipc   ra,0x4
   0xffffffe000204736 <+34>:    jalr    -1464(ra) # 0xffffffe00020817a <flush_icache_all>
   0xffffffe00020473a <+38>:    ld      a0,-24(s0)
   0xffffffe00020473e <+42>:    ld      ra,24(sp)
   0xffffffe000204740 <+44>:    ld      s0,16(sp)
   0xffffffe000204742 <+46>:    addi    sp,sp,32
   0xffffffe000204744 <+48>:    ret

(gdb) disassemble flush_icache_all-4
Dump of assembler code for function flush_icache_all:
   0xffffffe00020817a <+0>:     addi    sp,sp,-8
   0xffffffe00020817c <+2>:     sd      ra,0(sp)
   0xffffffe00020817e <+4>:     auipc   ra,0xfffff
   0xffffffe000208182 <+8>:     jalr    -1822(ra) # 0xffffffe000206a60 <ftrace_caller>
   0xffffffe000208186 <+12>:    ld      ra,0(sp)
   0xffffffe000208188 <+14>:    addi    sp,sp,8
   0xffffffe00020818a <+0>:     addi    sp,sp,-16
   0xffffffe00020818c <+2>:     sd      s0,0(sp)
   0xffffffe00020818e <+4>:     sd      ra,8(sp)
   0xffffffe000208190 <+6>:     addi    s0,sp,16
   0xffffffe000208192 <+8>:     li      a0,0
   0xffffffe000208194 <+10>:    auipc   ra,0xfffff
   0xffffffe000208198 <+14>:    jalr    -410(ra) # 0xffffffe000206ffa <sbi_remote_fence_i>
   0xffffffe00020819c <+18>:    ld      s0,0(sp)
   0xffffffe00020819e <+20>:    ld      ra,8(sp)
   0xffffffe0002081a0 <+22>:    addi    sp,sp,16
   0xffffffe0002081a2 <+24>:    ret

(gdb) frame 5
#5  0xffffffe0002c9c04 in __ftrace_replace_code (rec=0xffffffe01ae40c30,
    enable=3) at kernel/trace/ftrace.c:2503
2503                            return ftrace_make_call(rec, ftrace_addr);
(gdb) p /x rec->ip
$2 = 0xffffffe00020817a -> flush_icache_all !

Look at what happens when we modify flush_icache_all's patchable entry
with ftrace_caller:
 1. Insert the ftrace_caller call at the flush_icache_all entry.
 2. Call flush_icache_all to sync the I/D-cache — but flush_icache_all is
    the very function we've just modified, and it is not yet ready to be
    called!

> As per Zong's explanation, the issue was that the other harts already
> fetched the next 2 nops and executed 1 while kernel patching replaced
> other with one of the auipc + jalr pair.
>
> @Zong can correct me if I am wrong.
>
> These functions are too far ahead. Can it cause such issues ?
If yes,
> then we need to mark each and every function
> that can be invoked from patch_text_nosync and are not inlined.
>
> That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> __sbi_rfence_v02_call, sbi_ecall.

Yes, mark all of them.

> Few of these functions may be inlined by compiler. Can we depend on that ?

It works, but we'd better give notrace for them.

> > Because:
> > (gdb) bt
> > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > nable=true) at kernel/trace/ftrace.c:2503
> > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > out>) at kernel/trace/ftrace.c:2530
> > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > /trace/ftrace.c:2677
> > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > kernel/trace/ftrace.c:2703
> > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > e.c:224
> > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > el/stop_machine.c:491
> > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > c:165
> > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > el/kthread.c:292
> > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> >
> > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <zong.li@sifive.com> wrote:
> > >
> > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > > as notrace"), some architectures assume that the stopped CPUs don't make
> > > function calls to traceable functions when they are in the stopped
> > > state.
For example, it causes unexpected kernel crashed when switching > > > tracer on RISC-V. > > > > > > The following patches added calls to these two functions, fix it by > > > adding the notrace annotations. > > > > > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield") > > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in > > > multi_cpu_stop()") > > > > > > Signed-off-by: Zong Li <zong.li@sifive.com> > > > --- > > > kernel/rcu/tree.c | 2 +- > > > kernel/stop_machine.c | 2 +- > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 06895ef85d69..2a52f42f64b6 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu) > > > * > > > * The caller must have disabled interrupts and must not be idle. > > > */ > > > -void rcu_momentary_dyntick_idle(void) > > > +notrace void rcu_momentary_dyntick_idle(void) > > > { > > > int special; > > > > > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > > > index 865bb0228ab6..890b79cf0e7c 100644 > > > --- a/kernel/stop_machine.c > > > +++ b/kernel/stop_machine.c > > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata) > > > set_state(msdata, msdata->state + 1); > > > } > > > > > > -void __weak stop_machine_yield(const struct cpumask *cpumask) > > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask) > > > { > > > cpu_relax(); > > > } > > > -- > > > 2.28.0 > > > > > > > > > -- > > Best Regards > > Guo Ren > > > > ML: https://lore.kernel.org/linux-csky/ > > > > -- > Regards, > Atish -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
On Thu, Oct 29, 2020 at 10:34 AM Zong Li <zong.li@sifive.com> wrote: > > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <atishp@atishpatra.org> wrote: > > > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote: > > > > > > Hi Zong & Atish, > > > > > > In our 2 harts c910 chip, we found: > > > > > > echo function > /sys/kernel/debug/tracing/current_tracer > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > echo function > /sys/kernel/debug/tracing/current_tracer > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > > > > Then one core halted at stop_machine_yield: > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39 > > > 39 local_irq_enable(); > > > (gdb) i th > > > Id Target Id Frame > > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39 > > > 2 Thread 2 (CPU#1) stop_machine_yield > > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at > > > ./arch/riscv/include/asm/vdso/processor.h:12 > > > (gdb) thread 2 > > > [Switching to thread 2 (Thread 2)] > > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8 > > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12 > > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy)); > > > > > > With your patch, it's solved. 
For this patch, I'll give: > > > Tested by: Guo Ren <guoren@kernel.org> > > > > > > But that's not enough, we still need: > > > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c > > > index 226ccce..12b8808 100644 > > > --- a/arch/riscv/kernel/sbi.c > > > +++ b/arch/riscv/kernel/sbi.c > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi); > > > * > > > * Return: None > > > */ > > > -void sbi_remote_fence_i(const unsigned long *hart_mask) > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask) > > > { > > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I, > > > hart_mask, 0, 0, 0, 0); > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c > > > index 400b945d..9467d987 100644 > > > --- a/arch/riscv/mm/cacheflush.c > > > +++ b/arch/riscv/mm/cacheflush.c > > > @@ -9,12 +9,12 @@ > > > > > > #include <asm/sbi.h> > > > > > > -static void ipi_remote_fence_i(void *info) > > > +static void notrace ipi_remote_fence_i(void *info) > > > { > > > return local_flush_icache_all(); > > > } > > > > > > -void flush_icache_all(void) > > > +void notrace flush_icache_all(void) > > > { > > > if (IS_ENABLED(CONFIG_RISCV_SBI)) > > > sbi_remote_fence_i(NULL); > > > > > > > Did you see any issue if these functions are not marked as notrace ? > > > > As per Zong's explanation, the issue was that the other harts already > > fetched the next 2 nops and > > executed 1 while kernel patching replaced other with one of the auipc > > + jalr pair. > > > > @Zong can correct me if I am wrong. > > > > These functions are too far ahead. Can it cause such issues ? If yes, > > then we need to mark each and every function > > that can be invoked from patch_text_nosync and are not inlined. > > > > That includes copy_to_kernel_nofault, __sbi_rfence_v02, > > __sbi_rfence_v02_call, sbi_ecall. > > > > Few of these functions may be inlined by compiler. Can we depend on that ? 
> >
> > > Because:
> > > (gdb) bt
> > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > > nable=true) at kernel/trace/ftrace.c:2503
> > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > > out>) at kernel/trace/ftrace.c:2530
> > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > > /trace/ftrace.c:2677
> > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > > kernel/trace/ftrace.c:2703
> > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > > e.c:224
> > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > > el/stop_machine.c:491
> > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > > c:165
> > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > > el/kthread.c:292
> > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> > >
> It seems to me that the problem happens on the waiting threads, it

No, that is the call trace to show ftrace_make_call ->
flush_icache_all and we should give notrace on the whole path.

> doesn't cause the issue on the patching code thread, so it is OK that
> these functions are traceable. I probably don't figure out all
> possible situations, do you find any issue and reason to change the
> annotation of these functions?
> > > > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <zong.li@sifive.com> wrote: > > > > > > > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions > > > > as notrace"), some architectures assume that the stopped CPUs don't make > > > > function calls to traceable functions when they are in the stopped > > > > state. For example, it causes unexpected kernel crashed when switching > > > > tracer on RISC-V. > > > > > > > > The following patches added calls to these two functions, fix it by > > > > adding the notrace annotations. > > > > > > > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield") > > > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in > > > > multi_cpu_stop()") > > > > > > > > Signed-off-by: Zong Li <zong.li@sifive.com> > > > > --- > > > > kernel/rcu/tree.c | 2 +- > > > > kernel/stop_machine.c | 2 +- > > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index 06895ef85d69..2a52f42f64b6 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu) > > > > * > > > > * The caller must have disabled interrupts and must not be idle. 
> > > > */ > > > > -void rcu_momentary_dyntick_idle(void) > > > > +notrace void rcu_momentary_dyntick_idle(void) > > > > { > > > > int special; > > > > > > > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > > > > index 865bb0228ab6..890b79cf0e7c 100644 > > > > --- a/kernel/stop_machine.c > > > > +++ b/kernel/stop_machine.c > > > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata) > > > > set_state(msdata, msdata->state + 1); > > > > } > > > > > > > > -void __weak stop_machine_yield(const struct cpumask *cpumask) > > > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask) > > > > { > > > > cpu_relax(); > > > > } > > > > -- > > > > 2.28.0 > > > > > > > > > > > > > -- > > > Best Regards > > > Guo Ren > > > > > > ML: https://lore.kernel.org/linux-csky/ > > > > > > > > -- > > Regards, > > Atish
On Thu, Oct 29, 2020 at 9:06 AM Guo Ren <guoren@kernel.org> wrote: > > On Thu, Oct 29, 2020 at 10:34 AM Zong Li <zong.li@sifive.com> wrote: > > > > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <atishp@atishpatra.org> wrote: > > > > > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote: > > > > > > > > Hi Zong & Atish, > > > > > > > > In our 2 harts c910 chip, we found: > > > > > > > > echo function > /sys/kernel/debug/tracing/current_tracer > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > > echo function > /sys/kernel/debug/tracing/current_tracer > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > > > > > > Then one core halted at stop_machine_yield: > > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39 > > > > 39 local_irq_enable(); > > > > (gdb) i th > > > > Id Target Id Frame > > > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39 > > > > 2 Thread 2 (CPU#1) stop_machine_yield > > > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at > > > > ./arch/riscv/include/asm/vdso/processor.h:12 > > > > (gdb) thread 2 > > > > [Switching to thread 2 (Thread 2)] > > > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8 > > > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12 > > > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy)); > > > > > > > > With your patch, it's solved. 
For this patch, I'll give: > > > > Tested by: Guo Ren <guoren@kernel.org> > > > > > > > > But that's not enough, we still need: > > > > > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c > > > > index 226ccce..12b8808 100644 > > > > --- a/arch/riscv/kernel/sbi.c > > > > +++ b/arch/riscv/kernel/sbi.c > > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi); > > > > * > > > > * Return: None > > > > */ > > > > -void sbi_remote_fence_i(const unsigned long *hart_mask) > > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask) > > > > { > > > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I, > > > > hart_mask, 0, 0, 0, 0); > > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c > > > > index 400b945d..9467d987 100644 > > > > --- a/arch/riscv/mm/cacheflush.c > > > > +++ b/arch/riscv/mm/cacheflush.c > > > > @@ -9,12 +9,12 @@ > > > > > > > > #include <asm/sbi.h> > > > > > > > > -static void ipi_remote_fence_i(void *info) > > > > +static void notrace ipi_remote_fence_i(void *info) > > > > { > > > > return local_flush_icache_all(); > > > > } > > > > > > > > -void flush_icache_all(void) > > > > +void notrace flush_icache_all(void) > > > > { > > > > if (IS_ENABLED(CONFIG_RISCV_SBI)) > > > > sbi_remote_fence_i(NULL); > > > > > > > > > > Did you see any issue if these functions are not marked as notrace ? > > > > > > As per Zong's explanation, the issue was that the other harts already > > > fetched the next 2 nops and > > > executed 1 while kernel patching replaced other with one of the auipc > > > + jalr pair. > > > > > > @Zong can correct me if I am wrong. > > > > > > These functions are too far ahead. Can it cause such issues ? If yes, > > > then we need to mark each and every function > > > that can be invoked from patch_text_nosync and are not inlined. > > > > > > That includes copy_to_kernel_nofault, __sbi_rfence_v02, > > > __sbi_rfence_v02_call, sbi_ecall. > > > > > > Few of these functions may be inlined by compiler. 
Can we depend on that ? > > > > > > > Because: > > > > (gdb) bt > > > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20 > > > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns= > > > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96 > > > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>, > > > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109 > > > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e > > > > nable=true) at kernel/trace/ftrace.c:2503 > > > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized > > > > out>) at kernel/trace/ftrace.c:2530 > > > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel > > > > /trace/ftrace.c:2677 > > > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at > > > > kernel/trace/ftrace.c:2703 > > > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin > > > > e.c:224 > > > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern > > > > el/stop_machine.c:491 > > > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot. > > > > c:165 > > > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern > > > > el/kthread.c:292 > > > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236 > > > > > > > > It seems to me that the problem happens on the waiting threads, it > No, that is the call trace to show ftrace_make_call -> > flush_icache_all and we should give notrace on the whole path. > Hmm. I am curious to understand how other architectures avoid this problem. Is it a bigger issue in RISC-V because we have to switch privilege mode to sync I/D cache ? > > doesn't cause the issue on the patching code thread, so it is OK that > > these functions are traceable. 
I probably don't figure out all > > possible situations, do you find any issue and reason to change the > > annotation of these functions? > > > > > > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <zong.li@sifive.com> wrote: > > > > > > > > > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions > > > > > as notrace"), some architectures assume that the stopped CPUs don't make > > > > > function calls to traceable functions when they are in the stopped > > > > > state. For example, it causes unexpected kernel crashed when switching > > > > > tracer on RISC-V. > > > > > > > > > > The following patches added calls to these two functions, fix it by > > > > > adding the notrace annotations. > > > > > > > > > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield") > > > > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in > > > > > multi_cpu_stop()") > > > > > > > > > > Signed-off-by: Zong Li <zong.li@sifive.com> > > > > > --- > > > > > kernel/rcu/tree.c | 2 +- > > > > > kernel/stop_machine.c | 2 +- > > > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index 06895ef85d69..2a52f42f64b6 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu) > > > > > * > > > > > * The caller must have disabled interrupts and must not be idle. 
> > > > > */ > > > > > -void rcu_momentary_dyntick_idle(void) > > > > > +notrace void rcu_momentary_dyntick_idle(void) > > > > > { > > > > > int special; > > > > > > > > > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > > > > > index 865bb0228ab6..890b79cf0e7c 100644 > > > > > --- a/kernel/stop_machine.c > > > > > +++ b/kernel/stop_machine.c > > > > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata) > > > > > set_state(msdata, msdata->state + 1); > > > > > } > > > > > > > > > > -void __weak stop_machine_yield(const struct cpumask *cpumask) > > > > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask) > > > > > { > > > > > cpu_relax(); > > > > > } > > > > > -- > > > > > 2.28.0 > > > > > > > > > > > > > > > > > -- > > > > Best Regards > > > > Guo Ren > > > > > > > > ML: https://lore.kernel.org/linux-csky/ > > > > > > > > > > > > -- > > > Regards, > > > Atish > > > > -- > Best Regards > Guo Ren > > ML: https://lore.kernel.org/linux-csky/
On Fri, Oct 30, 2020 at 2:46 AM Atish Patra <atishp@atishpatra.org> wrote: > > On Thu, Oct 29, 2020 at 9:06 AM Guo Ren <guoren@kernel.org> wrote: > > > > On Thu, Oct 29, 2020 at 10:34 AM Zong Li <zong.li@sifive.com> wrote: > > > > > > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <atishp@atishpatra.org> wrote: > > > > > > > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote: > > > > > > > > > > Hi Zong & Atish, > > > > > > > > > > In our 2 harts c910 chip, we found: > > > > > > > > > > echo function > /sys/kernel/debug/tracing/current_tracer > > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > > > echo function > /sys/kernel/debug/tracing/current_tracer > > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > > > > > > > > > Then one core halted at stop_machine_yield: > > > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39 > > > > > 39 local_irq_enable(); > > > > > (gdb) i th > > > > > Id Target Id Frame > > > > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39 > > > > > 2 Thread 2 (CPU#1) stop_machine_yield > > > > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at > > > > > ./arch/riscv/include/asm/vdso/processor.h:12 > > > > > (gdb) thread 2 > > > > > [Switching to thread 2 (Thread 2)] > > > > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8 > > > > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12 > > > > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy)); > > > > > > > > > > With your patch, it's solved. 
For this patch, I'll give: > > > > > Tested by: Guo Ren <guoren@kernel.org> > > > > > > > > > > But that's not enough, we still need: > > > > > > > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c > > > > > index 226ccce..12b8808 100644 > > > > > --- a/arch/riscv/kernel/sbi.c > > > > > +++ b/arch/riscv/kernel/sbi.c > > > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi); > > > > > * > > > > > * Return: None > > > > > */ > > > > > -void sbi_remote_fence_i(const unsigned long *hart_mask) > > > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask) > > > > > { > > > > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I, > > > > > hart_mask, 0, 0, 0, 0); > > > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c > > > > > index 400b945d..9467d987 100644 > > > > > --- a/arch/riscv/mm/cacheflush.c > > > > > +++ b/arch/riscv/mm/cacheflush.c > > > > > @@ -9,12 +9,12 @@ > > > > > > > > > > #include <asm/sbi.h> > > > > > > > > > > -static void ipi_remote_fence_i(void *info) > > > > > +static void notrace ipi_remote_fence_i(void *info) > > > > > { > > > > > return local_flush_icache_all(); > > > > > } > > > > > > > > > > -void flush_icache_all(void) > > > > > +void notrace flush_icache_all(void) > > > > > { > > > > > if (IS_ENABLED(CONFIG_RISCV_SBI)) > > > > > sbi_remote_fence_i(NULL); > > > > > > > > > > > > > Did you see any issue if these functions are not marked as notrace ? > > > > > > > > As per Zong's explanation, the issue was that the other harts already > > > > fetched the next 2 nops and > > > > executed 1 while kernel patching replaced other with one of the auipc > > > > + jalr pair. > > > > > > > > @Zong can correct me if I am wrong. > > > > > > > > These functions are too far ahead. Can it cause such issues ? If yes, > > > > then we need to mark each and every function > > > > that can be invoked from patch_text_nosync and are not inlined. 
> > > > > > > > That includes copy_to_kernel_nofault, __sbi_rfence_v02, > > > > __sbi_rfence_v02_call, sbi_ecall. > > > > > > > > Few of these functions may be inlined by compiler. Can we depend on that ? > > > > > > > > > Because: > > > > > (gdb) bt > > > > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20 > > > > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns= > > > > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96 > > > > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>, > > > > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109 > > > > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e > > > > > nable=true) at kernel/trace/ftrace.c:2503 > > > > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized > > > > > out>) at kernel/trace/ftrace.c:2530 > > > > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel > > > > > /trace/ftrace.c:2677 > > > > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at > > > > > kernel/trace/ftrace.c:2703 > > > > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin > > > > > e.c:224 > > > > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern > > > > > el/stop_machine.c:491 > > > > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot. > > > > > c:165 > > > > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern > > > > > el/kthread.c:292 > > > > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236 > > > > > > > > > > > It seems to me that the problem happens on the waiting threads, it > > No, that is the call trace to show ftrace_make_call -> > > flush_icache_all and we should give notrace on the whole path. > > > > Hmm. I am curious to understand how other architectures avoid this problem. 
For arm64:

static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
			      bool validate)
{
	u32 replaced;
...
	if (aarch64_insn_patch_text_nosync((void *)pc, new))
		return -EPERM;

int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
{
	u32 *tp = addr;
	int ret;

	/* A64 instructions must be word aligned */
	if ((uintptr_t)tp & 0x3)
		return -EINVAL;

	ret = aarch64_insn_write(tp, insn);
	if (ret == 0)
		__flush_icache_range((uintptr_t)tp,
				     (uintptr_t)tp + AARCH64_INSN_SIZE);

Look at arm64: they use the __kprobes flag, and I guess it would also
prevent ftrace call sites. __flush_icache_range is written in asm, so
there is no possible ftrace call site.

> Is it a bigger issue in RISC-V because we have to switch privilege
> mode to sync I/D cache ?

We should sync the I/D cache in S-mode because we need the virtual
address. For c910 we've added icache broadcast invalidate instructions
by physical address and by virtual address.

The current linux/arch/riscv I/D cache sync is very expensive.
On Thu, Oct 29, 2020 at 8:28 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Fri, Oct 30, 2020 at 2:46 AM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > On Thu, Oct 29, 2020 at 9:06 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Thu, Oct 29, 2020 at 10:34 AM Zong Li <zong.li@sifive.com> wrote:
> > > >
> > > > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <atishp@atishpatra.org> wrote:
> > > > >
> > > > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <guoren@kernel.org> wrote:
> > > > > >
> > > > > > Hi Zong & Atish,
> > > > > >
> > > > > > In our 2 harts c910 chip, we found:
> > > > > >
> > > > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > > >
> > > > > > Then one core halted at stop_machine_yield:
> > > > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > > > 39              local_irq_enable();
> > > > > > (gdb) i th
> > > > > >   Id   Target Id         Frame
> > > > > > * 1    Thread 1 (CPU#0)  arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > > >   2    Thread 2 (CPU#1)  stop_machine_yield (cpumask=0xffffffe001371fa8
> > > > > >        <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > > > > > (gdb) thread 2
> > > > > > [Switching to thread 2 (Thread 2)]
> > > > > > #0  stop_machine_yield (cpumask=0xffffffe001371fa8
> > > > > >     <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > > > > > 12              __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > > > > >
> > > > > > With your patch, it's solved. For this patch, I'll give:
> > > > > > Tested-by: Guo Ren <guoren@kernel.org>
> > > > > >
> > > > > > But that's not enough, we still need:
> > > > > >
> > > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > > > index 226ccce..12b8808 100644
> > > > > > --- a/arch/riscv/kernel/sbi.c
> > > > > > +++ b/arch/riscv/kernel/sbi.c
> > > > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > > > > >   *
> > > > > >   * Return: None
> > > > > >   */
> > > > > > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > > >  {
> > > > > >         __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > > > > >                      hart_mask, 0, 0, 0, 0);
> > > > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > > > > > index 400b945d..9467d987 100644
> > > > > > --- a/arch/riscv/mm/cacheflush.c
> > > > > > +++ b/arch/riscv/mm/cacheflush.c
> > > > > > @@ -9,12 +9,12 @@
> > > > > >
> > > > > >  #include <asm/sbi.h>
> > > > > >
> > > > > > -static void ipi_remote_fence_i(void *info)
> > > > > > +static void notrace ipi_remote_fence_i(void *info)
> > > > > >  {
> > > > > >         return local_flush_icache_all();
> > > > > >  }
> > > > > >
> > > > > > -void flush_icache_all(void)
> > > > > > +void notrace flush_icache_all(void)
> > > > > >  {
> > > > > >         if (IS_ENABLED(CONFIG_RISCV_SBI))
> > > > > >                 sbi_remote_fence_i(NULL);
> > > > > >
> > > > >
> > > > > Did you see any issue if these functions are not marked as notrace ?
> > > > >
> > > > > As per Zong's explanation, the issue was that the other harts already
> > > > > fetched the next 2 nops and executed 1 while kernel patching replaced
> > > > > other with one of the auipc + jalr pair.
> > > > >
> > > > > @Zong can correct me if I am wrong.
> > > > >
> > > > > These functions are too far ahead. Can it cause such issues ? If yes,
> > > > > then we need to mark each and every function that can be invoked from
> > > > > patch_text_nosync and are not inlined.
> > > > >
> > > > > That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> > > > > __sbi_rfence_v02_call, sbi_ecall.
> > > > >
> > > > > Few of these functions may be inlined by compiler. Can we depend on that ?
> > > > >
> > > > > > Because:
> > > > > > (gdb) bt
> > > > > > #0  flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > > > > > #1  0xffffffe00020473a in patch_text_nosync (addr=<optimized out>,
> > > > > >     insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > > > > > #2  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > > > > >     addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > > > > > #3  0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020,
> > > > > >     enable=true) at kernel/trace/ftrace.c:2503
> > > > > > #4  0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>)
> > > > > >     at kernel/trace/ftrace.c:2530
> > > > > > #5  0xffffffe0002ca24a in ftrace_modify_all_code (command=9)
> > > > > >     at kernel/trace/ftrace.c:2677
> > > > > > #6  0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>)
> > > > > >     at kernel/trace/ftrace.c:2703
> > > > > > #7  0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
> > > > > > #8  0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>)
> > > > > >     at kernel/stop_machine.c:491
> > > > > > #9  0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
> > > > > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040)
> > > > > >     at kernel/kthread.c:292
> > > > > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> > > > > >
> > > >
> > > > It seems to me that the problem happens on the waiting threads, it
> > > No, that is the call trace to show ftrace_make_call ->
> > > flush_icache_all and we should give notrace on the whole path.
> > >
> >
> > Hmm. I am curious to understand how other architectures avoid this problem.
> for arm64
> static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
>                               bool validate)
> {
>         u32 replaced;
> ...
>         if (aarch64_insn_patch_text_nosync((void *)pc, new))
>                 return -EPERM;
>
> int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
> {
>         u32 *tp = addr;
>         int ret;
>
>         /* A64 instructions must be word aligned */
>         if ((uintptr_t)tp & 0x3)
>                 return -EINVAL;
>
>         ret = aarch64_insn_write(tp, insn);
>         if (ret == 0)
>                 __flush_icache_range((uintptr_t)tp,
>                                      (uintptr_t)tp + AARCH64_INSN_SIZE);
>
> Look at arm64, they __kprobes flag and I guess it would also prevent
> ftrace call site.
>

Are you sure about that ? __kprobes puts the code in .kprobes.text section
which is under whitelist sections in recordmcount.pl & recordmcount.c.

> __flush_icache_range is written in asm and no possible ftrace call site.
>
> > Is it a bigger issue in RISC-V because we have to switch privilege
> > mode to sync I/D cache ?
> We should sync I/D cache at s-mode because we need virtual address.
> For c910 we've added icache broadcast invalid instructions by physical
> address and virtual address.
>
> Current linux/arch/riscv I/D cache sync is so expensive.
>

Yes. It is a known fact. Unfortunately, the RISC-V specification doesn't
allow any other method yet. I hope the specification is modified to allow
some method to sync I/D cache from S-mode soon.

> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
On Fri, 30 Oct 2020 14:47:56 -0700
Atish Patra <atishp@atishpatra.org> wrote:

> > Look at arm64, they __kprobes flag and I guess it would also prevent
> > ftrace call site.
> >
>
> Are you sure about that ? __kprobes puts the code in .kprobes.text section
> which is under whitelist sections in recordmcount.pl & recordmcount.c.

Correct, ftrace can trace functions marked with __kprobes. That said, the
instruction you are looking at here, is in a file that is blacklisted from
recordmcount.

  CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)

All ftrace flags are removed from the compiling of insn.c, and every
function in that file will not be traced.

-- Steve
On Sat, Oct 31, 2020 at 8:28 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 30 Oct 2020 14:47:56 -0700
> Atish Patra <atishp@atishpatra.org> wrote:
>
> > > Look at arm64, they __kprobes flag and I guess it would also prevent
> > > ftrace call site.
> > >
> >
> > Are you sure about that ? __kprobes puts the code in .kprobes.text section
> > which is under whitelist sections in recordmcount.pl & recordmcount.c.
>
> Correct, ftrace can trace functions marked with __kprobes. That said,
I guess wrong, thx for correct me.

> the instruction you are looking at here, is in a file that is
> blacklisted from recordmcount.
>
> CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
>
> All ftrace flags are removed from the compiling of insn.c, and every
> function in that file will not be traced.

Yes, arm64 prevents the whole file from ftrace. My patch just use
notrace flag setting on some functions.

@Atish How do think:

  CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
  CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
On Sat, Oct 31, 2020 at 12:42 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Sat, Oct 31, 2020 at 8:28 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 30 Oct 2020 14:47:56 -0700
> > Atish Patra <atishp@atishpatra.org> wrote:
> >
> > > > Look at arm64, they __kprobes flag and I guess it would also prevent
> > > > ftrace call site.
> > > >
> > >
> > > Are you sure about that ? __kprobes puts the code in .kprobes.text section
> > > which is under whitelist sections in recordmcount.pl & recordmcount.c.
> >
> > Correct, ftrace can trace functions marked with __kprobes. That said,
> I guess wrong, thx for correct me.
>
> > the instruction you are looking at here, is in a file that is
> > blacklisted from recordmcount.
> >
> > CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
> >
> > All ftrace flags are removed from the compiling of insn.c, and every
> > function in that file will not be traced.
> Yes, arm64 prevents the whole file from ftrace. My patch just use
> notrace flag setting on some functions.
>
> @Atish How do think:
> CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
> CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
>

Looks good to me. What should be done for copy_to_kernel_nofault ?
That is also in the calling path.

> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
On Tue, Nov 3, 2020 at 11:33 PM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Sat, Oct 31, 2020 at 12:42 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Sat, Oct 31, 2020 at 8:28 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Fri, 30 Oct 2020 14:47:56 -0700
> > > Atish Patra <atishp@atishpatra.org> wrote:
> > >
> > > > > Look at arm64, they __kprobes flag and I guess it would also prevent
> > > > > ftrace call site.
> > > > >
> > > >
> > > > Are you sure about that ? __kprobes puts the code in .kprobes.text section
> > > > which is under whitelist sections in recordmcount.pl & recordmcount.c.
> > >
> > > Correct, ftrace can trace functions marked with __kprobes. That said,
> > I guess wrong, thx for correct me.
> >
> > > the instruction you are looking at here, is in a file that is
> > > blacklisted from recordmcount.
> > >
> > > CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
> > >
> > > All ftrace flags are removed from the compiling of insn.c, and every
> > > function in that file will not be traced.
> > Yes, arm64 prevents the whole file from ftrace. My patch just use
> > notrace flag setting on some functions.
> >
> > @Atish How do think:
> > CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
> > CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
> >
>
> Looks good to me. What should be done for copy_to_kernel_nofault ?
> That is also in the calling path.

There is no nops' entry in the prologue of copy_to_kernel_nofault.

>>>>
000000000000007c <.LVL6>:
        }
      7c:       6105                    addi    sp,sp,32
      7e:       8082                    ret

0000000000000080 <copy_to_user_nofault>:
 *
 * Safely write to address @dst from the buffer at @src.  If a kernel fault
 * happens, handle that and return -EFAULT.
 */
long copy_to_user_nofault(void __user *dst, const void *src, size_t size)
{
      80:       1101                    addi    sp,sp,-32
      82:       e822                    sd      s0,16(sp)
      84:       ec06                    sd      ra,24(sp)
      86:       e426                    sd      s1,8(sp)
      88:       e04a                    sd      s2,0(sp)
      8a:       1000                    addi    s0,sp,32
<<<<

>>>>
cmd_mm/maccess.o := /root/source/riscv-tools/install_64gc/bin/riscv64-unknown-linux-gnu-gcc -Wp,-MMD,mm/.maccess.o.d -nostdinc -isystem /root/source/riscv-tools/install_64gc/bin/../lib/gcc/riscv64-unknown-linux-gnu/8.4.0/include -I./arch/riscv/include -I./arch/riscv/include/generated -I./include -I./arch/riscv/include/uapi -I./arch/riscv/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ ***-DCC_USING_PATCHABLE_FUNCTION_ENTRY*** -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 -mabi=lp64 -march=rv64imac -mno-save-restore -DCONFIG_PAGE_OFFSET=0xffffffe000000000 -mcmodel=medany -fno-omit-frame-pointer -mstrict-align -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -O2 --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -g ***-fpatchable-function-entry=8*** -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -fmacro-prefix-map=./= -Wno-packed-not-aligned -DKBUILD_MODFILE='"mm/maccess"' -DKBUILD_BASENAME='"maccess"' -DKBUILD_MODNAME='"maccess"' -c -o mm/maccess.o mm/maccess.c
<<<<

But copy_from_user_nofault has:

000000000000007c <.LVL6>:
        }
      7c:       6105                    addi    sp,sp,32
      7e:       8082                    ret

0000000000000080 <copy_to_user_nofault>:
 *
 * Safely write to address @dst from the buffer at @src.  If a kernel fault
 * happens, handle that and return -EFAULT.
 */
long copy_to_user_nofault(void __user *dst, const void *src, size_t size)
{
      80:       1101                    addi    sp,sp,-32
      82:       e822                    sd      s0,16(sp)
      84:       ec06                    sd      ra,24(sp)
      86:       e426                    sd      s1,8(sp)
      88:       e04a                    sd      s2,0(sp)
      8a:       1000                    addi    s0,sp,32

I think it's a gcc problem, but satisfy our ftrace requirement.

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 06895ef85d69..2a52f42f64b6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
  *
  * The caller must have disabled interrupts and must not be idle.
  */
-void rcu_momentary_dyntick_idle(void)
+notrace void rcu_momentary_dyntick_idle(void)
 {
 	int special;
 
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 865bb0228ab6..890b79cf0e7c 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
 	set_state(msdata, msdata->state + 1);
 }
 
-void __weak stop_machine_yield(const struct cpumask *cpumask)
+notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
 {
 	cpu_relax();
 }
Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
as notrace"), some architectures assume that the stopped CPUs don't make
function calls to traceable functions when they are in the stopped
state. For example, it causes unexpected kernel crashes when switching
tracers on RISC-V.

The following commits added calls to these two functions; fix it by
adding the notrace annotations.

Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
multi_cpu_stop()")

Signed-off-by: Zong Li <zong.li@sifive.com>
---
 kernel/rcu/tree.c     | 2 +-
 kernel/stop_machine.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)