diff mbox series

[v3] riscv: patch: Fixup lockdep warning in stop_machine

Message ID 20230130232659.3374212-1-changbin.du@huawei.com (mailing list archive)
State Superseded
Delegated to: Palmer Dabbelt
Headers show
Series [v3] riscv: patch: Fixup lockdep warning in stop_machine | expand

Checks

Context Check Description
conchuod/cover_letter success Single patches do not need cover letters
conchuod/tree_selection success Guessed tree name to be fixes
conchuod/fixes_present success Fixes tag present in non-next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 13 and now 13
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 4 this patch: 4
conchuod/alphanumeric_selects success Out of order selects before the patch: 57 and now 57
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 2 this patch: 2
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch warning WARNING: 'runing' may be misspelled - perhaps 'running'? WARNING: From:/Signed-off-by: email address mismatch: 'From: Changbin Du <changbin.du@gmail.com>' != 'Signed-off-by: Changbin Du <changbin.du@huawei.com>'
conchuod/source_inline success Was 0 now: 0
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success Fixes tag looks correct
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Changbin Du Jan. 30, 2023, 11:26 p.m. UTC
From: Changbin Du <changbin.du@gmail.com>

The task of ftrace_arch_code_modify(_post)_prepare() caller is
stop_machine, whose caller and work thread are of different tasks. The
lockdep checker needs the same task context, or it's wrong. That means
it's a bug here to use lockdep_assert_held because we don't guarantee
the same task context.

kernel/locking/lockdep.c:
int __lock_is_held(const struct lockdep_map *lock, int read)
{
        struct task_struct *curr = current;
        int i;

        for (i = 0; i < curr->lockdep_depth; i++) {
			^^^^^^^^^^^^^^^^^^^
                struct held_lock *hlock = curr->held_locks + i;
					  ^^^^^^^^^^^^^^^^
                if (match_held_lock(hlock, lock)) {
                        if (read == -1 || !!hlock->read == read)
                                return LOCK_STATE_HELD;

The __lock_is_held depends on current held_locks records; if
stop_machine makes the checker runing on another task, that's wrong.

Here is the log:
[   15.761523] ------------[ cut here ]------------
[   15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
[   15.763258] Modules linked in:
[   15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
[   15.765339] Hardware name: riscv-virtio,qemu (DT)
[   15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
[   15.766711] epc : patch_insn_write+0x72/0x364
[   15.767011]  ra : patch_insn_write+0x70/0x364
[   15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
[   15.767622]  gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
[   15.767919]  t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
[   15.768238]  s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
[   15.768537]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[   15.768837]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[   15.769139]  s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
[   15.769447]  s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
[   15.769740]  s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
[   15.770027]  s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
[   15.770323]  t5 : ffffffff819af098 t6 : ff2000000067ba28
[   15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[   15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
[   15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
[   15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
[   15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
[   15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
[   15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
[   15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
[   15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
[   15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
[   15.773471] ---[ end trace 0000000000000000 ]---

By the way, this also fixes the same issue for patch_text().

Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Changbin Du <changbin.du@huawei.com>
---
Changes in v3:
 - denote this also fixes function patch_text().

Changes in v2:
 - Rewrite commit log with lockdep explanation [Guo Ren]
 - Rebase on v6.1 [Guo Ren]

v1:
https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/
---
 arch/riscv/kernel/patch.c | 7 -------
 1 file changed, 7 deletions(-)

Comments

Conor Dooley Jan. 30, 2023, 3:09 p.m. UTC | #1
Hey Changbin,

On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote:
> From: Changbin Du <changbin.du@gmail.com>
> 
> The task of ftrace_arch_code_modify(_post)_prepare() caller is
> stop_machine, whose caller and work thread are of different tasks. The
> lockdep checker needs the same task context, or it's wrong. That means
> it's a bug here to use lockdep_assert_held because we don't guarantee
> the same task context.
> 
> kernel/locking/lockdep.c:
> int __lock_is_held(const struct lockdep_map *lock, int read)
> {
>         struct task_struct *curr = current;
>         int i;
> 
>         for (i = 0; i < curr->lockdep_depth; i++) {
> 			^^^^^^^^^^^^^^^^^^^
>                 struct held_lock *hlock = curr->held_locks + i;
> 					  ^^^^^^^^^^^^^^^^
>                 if (match_held_lock(hlock, lock)) {
>                         if (read == -1 || !!hlock->read == read)
>                                 return LOCK_STATE_HELD;
> 
> The __lock_is_held depends on current held_locks records; if
> stop_machine makes the checker runing on another task, that's wrong.
> 
> Here is the log:
> [   15.761523] ------------[ cut here ]------------
> [   15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
> [   15.763258] Modules linked in:
> [   15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
> [   15.765339] Hardware name: riscv-virtio,qemu (DT)
> [   15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
> [   15.766711] epc : patch_insn_write+0x72/0x364
> [   15.767011]  ra : patch_insn_write+0x70/0x364
> [   15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
> [   15.767622]  gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
> [   15.767919]  t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
> [   15.768238]  s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
> [   15.768537]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [   15.768837]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> [   15.769139]  s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
> [   15.769447]  s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
> [   15.769740]  s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
> [   15.770027]  s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
> [   15.770323]  t5 : ffffffff819af098 t6 : ff2000000067ba28
> [   15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> [   15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
> [   15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
> [   15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
> [   15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
> [   15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
> [   15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
> [   15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
> [   15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
> [   15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
> [   15.773471] ---[ end trace 0000000000000000 ]---

FWIW, you can always crop the [15.192321] stuff out of commit messages,
as it just adds noise.

> By the way, this also fixes the same issue for patch_text().
> 
> Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
> Co-developed-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Cc: Zong Li <zong.li@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Signed-off-by: Changbin Du <changbin.du@huawei.com>
> ---
> Changes in v3:
>  - denote this also fixes function patch_text().
> 
> Changes in v2:
>  - Rewrite commit log with lockdep explanation [Guo Ren]
>  - Rebase on v6.1 [Guo Ren]
> 
> v1:
> https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/
> ---
>  arch/riscv/kernel/patch.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 765004b60513..8619706f8dfd 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
>  	bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
>  	int ret;
>  
> -	/*
> -	 * Before reaching here, it was expected to lock the text_mutex
> -	 * already, so we don't need to give another lock here and could
> -	 * ensure that it was safe between each cores.
> -	 */
> -	lockdep_assert_held(&text_mutex);

I must admit, patches like this do concern me a little, as a someone
unfamiliar with the world of probing and tracing.
Seeing an explicit check that the lock was held, leads me to believe
that the original author (Zong Li I think) thought that the text_mutex
lock was insufficient.
Do you think that their fear is unfounded? Explaining why it is safe to
remove this assertion in the commit message would go a long way towards
easing my anxiety!

Also, why delete the comment altogether? The comment provides some
information that doesn't appear to become invalid, even with the
assertion removed?

Thanks,
Conor.

> -
>  	if (across_pages)
>  		patch_map(addr + len, FIX_TEXT_POKE1);
>  
> -- 
> 2.25.1
> 
>
Guo Ren Jan. 31, 2023, 7:26 a.m. UTC | #2
On Mon, Jan 30, 2023 at 11:10 PM Conor Dooley
<conor.dooley@microchip.com> wrote:
>
> Hey Changbin,
>
> On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote:
> > From: Changbin Du <changbin.du@gmail.com>
> >
> > The task of ftrace_arch_code_modify(_post)_prepare() caller is
> > stop_machine, whose caller and work thread are of different tasks. The
> > lockdep checker needs the same task context, or it's wrong. That means
> > it's a bug here to use lockdep_assert_held because we don't guarantee
> > the same task context.
> >
> > kernel/locking/lockdep.c:
> > int __lock_is_held(const struct lockdep_map *lock, int read)
> > {
> >         struct task_struct *curr = current;
> >         int i;
> >
> >         for (i = 0; i < curr->lockdep_depth; i++) {
> >                       ^^^^^^^^^^^^^^^^^^^
> >                 struct held_lock *hlock = curr->held_locks + i;
> >                                         ^^^^^^^^^^^^^^^^
> >                 if (match_held_lock(hlock, lock)) {
> >                         if (read == -1 || !!hlock->read == read)
> >                                 return LOCK_STATE_HELD;
> >
> > The __lock_is_held depends on current held_locks records; if
> > stop_machine makes the checker runing on another task, that's wrong.
> >
> > Here is the log:
> > [   15.761523] ------------[ cut here ]------------
> > [   15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
> > [   15.763258] Modules linked in:
> > [   15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
> > [   15.765339] Hardware name: riscv-virtio,qemu (DT)
> > [   15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
> > [   15.766711] epc : patch_insn_write+0x72/0x364
> > [   15.767011]  ra : patch_insn_write+0x70/0x364
> > [   15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
> > [   15.767622]  gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
> > [   15.767919]  t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
> > [   15.768238]  s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
> > [   15.768537]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> > [   15.768837]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> > [   15.769139]  s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
> > [   15.769447]  s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
> > [   15.769740]  s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
> > [   15.770027]  s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
> > [   15.770323]  t5 : ffffffff819af098 t6 : ff2000000067ba28
> > [   15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [   15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
> > [   15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
> > [   15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
> > [   15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
> > [   15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
> > [   15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
> > [   15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
> > [   15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
> > [   15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
> > [   15.773471] ---[ end trace 0000000000000000 ]---
>
> FWIW, you can always crop the [15.192321] stuff out of commit messages,
> as it just adds noise.
>
> > By the way, this also fixes the same issue for patch_text().
> >
> > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
> > Co-developed-by: Guo Ren <guoren@kernel.org>
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > Cc: Zong Li <zong.li@sifive.com>
> > Cc: Palmer Dabbelt <palmer@dabbelt.com>
> > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> > ---
> > Changes in v3:
> >  - denote this also fixes function patch_text().
> >
> > Changes in v2:
> >  - Rewrite commit log with lockdep explanation [Guo Ren]
> >  - Rebase on v6.1 [Guo Ren]
> >
> > v1:
> > https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/
> > ---
> >  arch/riscv/kernel/patch.c | 7 -------
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index 765004b60513..8619706f8dfd 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> >       bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> >       int ret;
> >
> > -     /*
> > -      * Before reaching here, it was expected to lock the text_mutex
> > -      * already, so we don't need to give another lock here and could
> > -      * ensure that it was safe between each cores.
> > -      */
> > -     lockdep_assert_held(&text_mutex);
>
> I must admit, patches like this do concern me a little, as a someone
> unfamiliar with the world of probing and tracing.
> Seeing an explicit check that the lock was held, leads me to believe
> that the original author (Zong Li I think) thought that the text_mutex
> lock was insufficient.
> Do you think that their fear is unfounded? Explaining why it is safe to
> remove this assertion in the commit message would go a long way towards
> easing my anxiety!
>
> Also, why delete the comment altogether? The comment provides some
> information that doesn't appear to become invalid, even with the
> assertion removed?
Stop_machine separated the mutex context and made a lockdep warning.
So text_mutex can't be used here. We need to find another check
solution. I agree with the patch.

>
> Thanks,
> Conor.
>
> > -
> >       if (across_pages)
> >               patch_map(addr + len, FIX_TEXT_POKE1);
> >
> > --
> > 2.25.1
> >
> >



--
Best Regards
 Guo Ren
Conor Dooley Jan. 31, 2023, 7:50 a.m. UTC | #3
On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> On Mon, Jan 30, 2023 at 11:10 PM Conor Dooley
> <conor.dooley@microchip.com> wrote:
> >
> > Hey Changbin,
> >
> > On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote:
> > > From: Changbin Du <changbin.du@gmail.com>
> > >
> > > The task of ftrace_arch_code_modify(_post)_prepare() caller is
> > > stop_machine, whose caller and work thread are of different tasks. The
> > > lockdep checker needs the same task context, or it's wrong. That means
> > > it's a bug here to use lockdep_assert_held because we don't guarantee
> > > the same task context.
> > >
> > > kernel/locking/lockdep.c:
> > > int __lock_is_held(const struct lockdep_map *lock, int read)
> > > {
> > >         struct task_struct *curr = current;
> > >         int i;
> > >
> > >         for (i = 0; i < curr->lockdep_depth; i++) {
> > >                       ^^^^^^^^^^^^^^^^^^^
> > >                 struct held_lock *hlock = curr->held_locks + i;
> > >                                         ^^^^^^^^^^^^^^^^
> > >                 if (match_held_lock(hlock, lock)) {
> > >                         if (read == -1 || !!hlock->read == read)
> > >                                 return LOCK_STATE_HELD;
> > >
> > > The __lock_is_held depends on current held_locks records; if
> > > stop_machine makes the checker runing on another task, that's wrong.
> > >
> > > Here is the log:
> > > [   15.761523] ------------[ cut here ]------------
> > > [   15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364
> > > [   15.763258] Modules linked in:
> > > [   15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377
> > > [   15.765339] Hardware name: riscv-virtio,qemu (DT)
> > > [   15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2
> > > [   15.766711] epc : patch_insn_write+0x72/0x364
> > > [   15.767011]  ra : patch_insn_write+0x70/0x364
> > > [   15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0
> > > [   15.767622]  gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000
> > > [   15.767919]  t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0
> > > [   15.768238]  s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000
> > > [   15.768537]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> > > [   15.768837]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> > > [   15.769139]  s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff
> > > [   15.769447]  s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003
> > > [   15.769740]  s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003
> > > [   15.770027]  s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097
> > > [   15.770323]  t5 : ffffffff819af098 t6 : ff2000000067ba28
> > > [   15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > [   15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a
> > > [   15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a
> > > [   15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c
> > > [   15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c
> > > [   15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192
> > > [   15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c
> > > [   15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc
> > > [   15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8
> > > [   15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14
> > > [   15.773471] ---[ end trace 0000000000000000 ]---
> >
> > FWIW, you can always crop the [15.192321] stuff out of commit messages,
> > as it just adds noise.
> >
> > > By the way, this also fixes the same issue for patch_text().
> > >
> > > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock")
> > > Co-developed-by: Guo Ren <guoren@kernel.org>
> > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > Cc: Zong Li <zong.li@sifive.com>
> > > Cc: Palmer Dabbelt <palmer@dabbelt.com>
> > > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> > > ---
> > > Changes in v3:
> > >  - denote this also fixes function patch_text().
> > >
> > > Changes in v2:
> > >  - Rewrite commit log with lockdep explanation [Guo Ren]
> > >  - Rebase on v6.1 [Guo Ren]
> > >
> > > v1:
> > > https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/
> > > ---
> > >  arch/riscv/kernel/patch.c | 7 -------
> > >  1 file changed, 7 deletions(-)
> > >
> > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > > index 765004b60513..8619706f8dfd 100644
> > > --- a/arch/riscv/kernel/patch.c
> > > +++ b/arch/riscv/kernel/patch.c
> > > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> > >       bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> > >       int ret;
> > >
> > > -     /*
> > > -      * Before reaching here, it was expected to lock the text_mutex
> > > -      * already, so we don't need to give another lock here and could
> > > -      * ensure that it was safe between each cores.
> > > -      */
> > > -     lockdep_assert_held(&text_mutex);
> >
> > I must admit, patches like this do concern me a little, as a someone
> > unfamiliar with the world of probing and tracing.
> > Seeing an explicit check that the lock was held, leads me to believe
> > that the original author (Zong Li I think) thought that the text_mutex
> > lock was insufficient.
> > Do you think that their fear is unfounded? Explaining why it is safe to
> > remove this assertion in the commit message would go a long way towards
> > easing my anxiety!
> >
> > Also, why delete the comment altogether? The comment provides some
> > information that doesn't appear to become invalid, even with the
> > assertion removed?
> Stop_machine separated the mutex context and made a lockdep warning.
> So text_mutex can't be used here. We need to find another check
> solution. I agree with the patch.

Whether or not you agree with the change is not the point (with your SoB
I'd hope you agree with it).
I understand that you two are trying to fix a false positive lockdep
warning, but what I am asking for an explanation as to why the original
author's fear is unfounded.
Surely, having added the assertion, they were not thinking of the same
code path that you guys are hitting the false positive on?

Perhaps Zong themselves can tell us what the original fear was?

Thanks,
Conor.
Conor Dooley Feb. 1, 2023, 2:01 p.m. UTC | #4
On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> [snip]
> > > > >
> > > > > -     /*
> > > > > -      * Before reaching here, it was expected to lock the text_mutex
> > > > > -      * already, so we don't need to give another lock here and could
> > > > > -      * ensure that it was safe between each cores.
> > > > > -      */
> > > > > -     lockdep_assert_held(&text_mutex);
> > > >
> > > > I must admit, patches like this do concern me a little, as a someone
> > > > unfamiliar with the world of probing and tracing.
> > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > lock was insufficient.
> > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > remove this assertion in the commit message would go a long way towards
> > > > easing my anxiety!
> > > >
> > > > Also, why delete the comment altogether? The comment provides some
> > > > information that doesn't appear to become invalid, even with the
> > > > assertion removed?
> > > Stop_machine separated the mutex context and made a lockdep warning.
> > > So text_mutex can't be used here. We need to find another check
> > > solution. I agree with the patch.
> > 
> > Whether or not you agree with the change is not the point (with your SoB
> > I'd hope you agree with it).
> > I understand that you two are trying to fix a false positive lockdep
> > warning, but what I am asking for an explanation as to why the original
> > author's fear is unfounded.
> > Surely, having added the assertion, they were not thinking of the same
> > code path that you guys are hitting the false positive on?
> > 
> The assertion is reasonable since the fixmap entry is shared. The text_mutex
> does should be held before entering that function. But the false positive cases
> make some functions (ftrace for example) difficult to use due to warning log
> storm.
> 
> Either the lockdep should be fixed for stop_machine, or remove the assertion
> simply now (we can keep the comments). (or do the assertion conditionly?)

How would you suggest checking it conditionally?

Also, if you persist in removing the assertion, there is a comment in
arch/riscv/kernel/ftrace.c that would need to be updated. (L129-ish)

The comment you removed in this patch seems valid both before and after
though, so I don't see a compelling reason for its removal.

> And this is not a riscv only problem but common for architectures which use
> stop_machine to patch text. (arm for example)
> 
> > Perhaps Zong themselves can tell us what the original fear was?
Changbin Du Feb. 1, 2023, 9 p.m. UTC | #5
On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
[snip]
> > > >
> > > > -     /*
> > > > -      * Before reaching here, it was expected to lock the text_mutex
> > > > -      * already, so we don't need to give another lock here and could
> > > > -      * ensure that it was safe between each cores.
> > > > -      */
> > > > -     lockdep_assert_held(&text_mutex);
> > >
> > > I must admit, patches like this do concern me a little, as a someone
> > > unfamiliar with the world of probing and tracing.
> > > Seeing an explicit check that the lock was held, leads me to believe
> > > that the original author (Zong Li I think) thought that the text_mutex
> > > lock was insufficient.
> > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > remove this assertion in the commit message would go a long way towards
> > > easing my anxiety!
> > >
> > > Also, why delete the comment altogether? The comment provides some
> > > information that doesn't appear to become invalid, even with the
> > > assertion removed?
> > Stop_machine separated the mutex context and made a lockdep warning.
> > So text_mutex can't be used here. We need to find another check
> > solution. I agree with the patch.
> 
> Whether or not you agree with the change is not the point (with your SoB
> I'd hope you agree with it).
> I understand that you two are trying to fix a false positive lockdep
> warning, but what I am asking for an explanation as to why the original
> author's fear is unfounded.
> Surely, having added the assertion, they were not thinking of the same
> code path that you guys are hitting the false positive on?
> 
The assertion is reasonable since the fixmap entry is shared. The text_mutex
does should be held before entering that function. But the false positive cases
make some functions (ftrace for example) difficult to use due to warning log
storm.

Either the lockdep should be fixed for stop_machine, or remove the assertion
simply now (we can keep the comments). (or do the assertion conditionly?)

And this is not a riscv only problem but common for architectures which use
stop_machine to patch text. (arm for example)

> Perhaps Zong themselves can tell us what the original fear was?
> 
> Thanks,
> Conor.
Conor Dooley Feb. 2, 2023, 8:01 a.m. UTC | #6
On Fri, Feb 03, 2023 at 07:00:43AM +0800, Changbin Du wrote:

btw, something is wrong with your mail client or host machine.
Everything that you are sending is timestamped in the future,
as it is currently 15:57 on the 2nd in UTC+8.

> On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote:
> > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> > > [snip]
> > > > > > >
> > > > > > > -     /*
> > > > > > > -      * Before reaching here, it was expected to lock the text_mutex
> > > > > > > -      * already, so we don't need to give another lock here and could
> > > > > > > -      * ensure that it was safe between each cores.
> > > > > > > -      */
> > > > > > > -     lockdep_assert_held(&text_mutex);
> > > > > >
> > > > > > I must admit, patches like this do concern me a little, as a someone
> > > > > > unfamiliar with the world of probing and tracing.
> > > > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > > > lock was insufficient.
> > > > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > > > remove this assertion in the commit message would go a long way towards
> > > > > > easing my anxiety!
> > > > > >
> > > > > > Also, why delete the comment altogether? The comment provides some
> > > > > > information that doesn't appear to become invalid, even with the
> > > > > > assertion removed?
> > > > > Stop_machine separated the mutex context and made a lockdep warning.
> > > > > So text_mutex can't be used here. We need to find another check
> > > > > solution. I agree with the patch.
> > > > 
> > > > Whether or not you agree with the change is not the point (with your SoB
> > > > I'd hope you agree with it).
> > > > I understand that you two are trying to fix a false positive lockdep
> > > > warning, but what I am asking for an explanation as to why the original
> > > > author's fear is unfounded.
> > > > Surely, having added the assertion, they were not thinking of the same
> > > > code path that you guys are hitting the false positive on?
> > > > 
> > > The assertion is reasonable since the fixmap entry is shared. The text_mutex
> > > does should be held before entering that function. But the false positive cases
> > > make some functions (ftrace for example) difficult to use due to warning log
> > > storm.
> > > 
> > > Either the lockdep should be fixed for stop_machine, or remove the assertion
> > > simply now (we can keep the comments). (or do the assertion conditionly?)
> > 
> > How would you suggest checking it conditionally?
> >
> Please refer to a early patch from Palmer Dabbelt.
> https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/

Oh cool, thanks for that.
Why not resend that approach, with your suggested fixup for
ftrace_init_nop() then?
It looks more complex, but is less worrisome & has an R-b from Steven
already.

Thanks,
Conor.
Changbin Du Feb. 2, 2023, 11:39 a.m. UTC | #7
On Thu, Feb 02, 2023 at 08:01:44AM +0000, Conor Dooley wrote:
> On Fri, Feb 03, 2023 at 07:00:43AM +0800, Changbin Du wrote:
> 
> btw, something is wrong with your mail client or host machine.
> Everything that you are sending is timestamped in the future,
> as it is currently 15:57 on the 2nd in UTC+8.
>
hmm, my machine goes 12 hours ahead. Thanks for your remainding.

> > On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote:
> > > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> > > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> > > > [snip]
> > > > > > > >
> > > > > > > > -     /*
> > > > > > > > -      * Before reaching here, it was expected to lock the text_mutex
> > > > > > > > -      * already, so we don't need to give another lock here and could
> > > > > > > > -      * ensure that it was safe between each cores.
> > > > > > > > -      */
> > > > > > > > -     lockdep_assert_held(&text_mutex);
> > > > > > >
> > > > > > > I must admit, patches like this do concern me a little, as a someone
> > > > > > > unfamiliar with the world of probing and tracing.
> > > > > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > > > > lock was insufficient.
> > > > > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > > > > remove this assertion in the commit message would go a long way towards
> > > > > > > easing my anxiety!
> > > > > > >
> > > > > > > Also, why delete the comment altogether? The comment provides some
> > > > > > > information that doesn't appear to become invalid, even with the
> > > > > > > assertion removed?
> > > > > > Stop_machine separated the mutex context and made a lockdep warning.
> > > > > > So text_mutex can't be used here. We need to find another check
> > > > > > solution. I agree with the patch.
> > > > > 
> > > > > Whether or not you agree with the change is not the point (with your SoB
> > > > > I'd hope you agree with it).
> > > > > I understand that you two are trying to fix a false positive lockdep
> > > > > warning, but what I am asking for an explanation as to why the original
> > > > > author's fear is unfounded.
> > > > > Surely, having added the assertion, they were not thinking of the same
> > > > > code path that you guys are hitting the false positive on?
> > > > > 
> > > > The assertion is reasonable since the fixmap entry is shared. The text_mutex
> > > > does should be held before entering that function. But the false positive cases
> > > > make some functions (ftrace for example) difficult to use due to warning log
> > > > storm.
> > > > 
> > > > Either the lockdep should be fixed for stop_machine, or remove the assertion
> > > > simply now (we can keep the comments). (or do the assertion conditionly?)
> > > 
> > > How would you suggest checking it conditionally?
> > >
> > Please refer to a early patch from Palmer Dabbelt.
> > https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
> 
> Oh cool, thanks for that.
> Why not resend that approach, with your suggested fixup for
> ftrace_init_nop() then?
> It looks more complex, but is less worrisome & has an R-b from Steven
> already.
> 
Personally I don't like the complex change, because the clients of text patching
api are very limited. So I think it's not hard to take care of it. (The arm code
also doesn't have the assertion)

So I would leave the decision making to maintainers :) Anyway I will send V4 as
a candidate.

> Thanks,
> Conor.
>
Changbin Du Feb. 2, 2023, 11 p.m. UTC | #8
On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote:
> On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote:
> > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote:
> > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote:
> > [snip]
> > > > > >
> > > > > > -     /*
> > > > > > -      * Before reaching here, it was expected to lock the text_mutex
> > > > > > -      * already, so we don't need to give another lock here and could
> > > > > > -      * ensure that it was safe between each cores.
> > > > > > -      */
> > > > > > -     lockdep_assert_held(&text_mutex);
> > > > >
> > > > > I must admit, patches like this do concern me a little, as a someone
> > > > > unfamiliar with the world of probing and tracing.
> > > > > Seeing an explicit check that the lock was held, leads me to believe
> > > > > that the original author (Zong Li I think) thought that the text_mutex
> > > > > lock was insufficient.
> > > > > Do you think that their fear is unfounded? Explaining why it is safe to
> > > > > remove this assertion in the commit message would go a long way towards
> > > > > easing my anxiety!
> > > > >
> > > > > Also, why delete the comment altogether? The comment provides some
> > > > > information that doesn't appear to become invalid, even with the
> > > > > assertion removed?
> > > > Stop_machine separated the mutex context and made a lockdep warning.
> > > > So text_mutex can't be used here. We need to find another check
> > > > solution. I agree with the patch.
> > > 
> > > Whether or not you agree with the change is not the point (with your SoB
> > > I'd hope you agree with it).
> > > I understand that you two are trying to fix a false positive lockdep
> > > warning, but what I am asking for an explanation as to why the original
> > > author's fear is unfounded.
> > > Surely, having added the assertion, they were not thinking of the same
> > > code path that you guys are hitting the false positive on?
> > > 
> > The assertion is reasonable since the fixmap entry is shared. The text_mutex
> > does should be held before entering that function. But the false positive cases
> > make some functions (ftrace for example) difficult to use due to warning log
> > storm.
> > 
> > Either the lockdep should be fixed for stop_machine, or remove the assertion
> > simply now (we can keep the comments). (or do the assertion conditionly?)
> 
> How would you suggest checking it conditionally?
>
Please refer to a early patch from Palmer Dabbelt.
https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/

> Also, if you persist in removing the assertion, there is a comment in
> arch/riscv/kernel/ftrace.c that would need to be updated. (L129-ish)
> 
No problem.

> The comment you removed in this patch seems valid both before and after
> though, so I don't see a compelling reason for its removal.
We all agreed. The key is to get rid of false positive case.

> 
> > And this is not a riscv only problem but common for architectures which use
> > stop_machine to patch text. (arm for example)
> > 
> > > Perhaps Zong themselves can tell us what the original fear was?
>
diff mbox series

Patch

diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 765004b60513..8619706f8dfd 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -55,13 +55,6 @@  static int patch_insn_write(void *addr, const void *insn, size_t len)
 	bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
 	int ret;
 
-	/*
-	 * Before reaching here, it was expected to lock the text_mutex
-	 * already, so we don't need to give another lock here and could
-	 * ensure that it was safe between each cores.
-	 */
-	lockdep_assert_held(&text_mutex);
-
 	if (across_pages)
 		patch_map(addr + len, FIX_TEXT_POKE1);