Message ID | 20230130232659.3374212-1-changbin.du@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Palmer Dabbelt |
Headers | show |
Series | [v3] riscv: patch: Fixup lockdep warning in stop_machine | expand |
Hey Changbin, On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote: > From: Changbin Du <changbin.du@gmail.com> > > The task of ftrace_arch_code_modify(_post)_prepare() caller is > stop_machine, whose caller and work thread are of different tasks. The > lockdep checker needs the same task context, or it's wrong. That means > it's a bug here to use lockdep_assert_held because we don't guarantee > the same task context. > > kernel/locking/lockdep.c: > int __lock_is_held(const struct lockdep_map *lock, int read) > { > struct task_struct *curr = current; > int i; > > for (i = 0; i < curr->lockdep_depth; i++) { > ^^^^^^^^^^^^^^^^^^^ > struct held_lock *hlock = curr->held_locks + i; > ^^^^^^^^^^^^^^^^ > if (match_held_lock(hlock, lock)) { > if (read == -1 || !!hlock->read == read) > return LOCK_STATE_HELD; > > The __lock_is_held depends on current held_locks records; if > stop_machine makes the checker runing on another task, that's wrong. > > Here is the log: > [ 15.761523] ------------[ cut here ]------------ > [ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364 > [ 15.763258] Modules linked in: > [ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377 > [ 15.765339] Hardware name: riscv-virtio,qemu (DT) > [ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2 > [ 15.766711] epc : patch_insn_write+0x72/0x364 > [ 15.767011] ra : patch_insn_write+0x70/0x364 > [ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0 > [ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000 > [ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0 > [ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000 > [ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 > [ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 > [ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff > [ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003 > [ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003 > [ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097 > [ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28 > [ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > [ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a > [ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a > [ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c > [ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c > [ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192 > [ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c > [ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc > [ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8 > [ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14 > [ 15.773471] ---[ end trace 0000000000000000 ]--- FWIW, you can always crop the [15.192321] stuff out of commit messages, as it just adds noise. > By the way, this also fixes the same issue for patch_text(). > > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock") > Co-developed-by: Guo Ren <guoren@kernel.org> > Signed-off-by: Guo Ren <guoren@kernel.org> > Cc: Zong Li <zong.li@sifive.com> > Cc: Palmer Dabbelt <palmer@dabbelt.com> > Signed-off-by: Changbin Du <changbin.du@huawei.com> > --- > Changes in v3: > - denote this also fixes function patch_text(). > > Changes in v2: > - Rewrite commit log with lockdep explanation [Guo Ren] > - Rebase on v6.1 [Guo Ren] > > v1: > https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/ > --- > arch/riscv/kernel/patch.c | 7 ------- > 1 file changed, 7 deletions(-) > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > index 765004b60513..8619706f8dfd 100644 > --- a/arch/riscv/kernel/patch.c > +++ b/arch/riscv/kernel/patch.c > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len) > bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE; > int ret; > > - /* > - * Before reaching here, it was expected to lock the text_mutex > - * already, so we don't need to give another lock here and could > - * ensure that it was safe between each cores. > - */ > - lockdep_assert_held(&text_mutex); I must admit, patches like this do concern me a little, as a someone unfamiliar with the world of probing and tracing. Seeing an explicit check that the lock was held, leads me to believe that the original author (Zong Li I think) thought that the text_mutex lock was insufficient. Do you think that their fear is unfounded? Explaining why it is safe to remove this assertion in the commit message would go a long way towards easing my anxiety! Also, why delete the comment altogether? The comment provides some information that doesn't appear to become invalid, even with the assertion removed? Thanks, Conor. > - > if (across_pages) > patch_map(addr + len, FIX_TEXT_POKE1); > > -- > 2.25.1 > >
On Mon, Jan 30, 2023 at 11:10 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > Hey Changbin, > > On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote: > > From: Changbin Du <changbin.du@gmail.com> > > > > The task of ftrace_arch_code_modify(_post)_prepare() caller is > > stop_machine, whose caller and work thread are of different tasks. The > > lockdep checker needs the same task context, or it's wrong. That means > > it's a bug here to use lockdep_assert_held because we don't guarantee > > the same task context. > > > > kernel/locking/lockdep.c: > > int __lock_is_held(const struct lockdep_map *lock, int read) > > { > > struct task_struct *curr = current; > > int i; > > > > for (i = 0; i < curr->lockdep_depth; i++) { > > ^^^^^^^^^^^^^^^^^^^ > > struct held_lock *hlock = curr->held_locks + i; > > ^^^^^^^^^^^^^^^^ > > if (match_held_lock(hlock, lock)) { > > if (read == -1 || !!hlock->read == read) > > return LOCK_STATE_HELD; > > > > The __lock_is_held depends on current held_locks records; if > > stop_machine makes the checker runing on another task, that's wrong. > > > > Here is the log: > > [ 15.761523] ------------[ cut here ]------------ > > [ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364 > > [ 15.763258] Modules linked in: > > [ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377 > > [ 15.765339] Hardware name: riscv-virtio,qemu (DT) > > [ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2 > > [ 15.766711] epc : patch_insn_write+0x72/0x364 > > [ 15.767011] ra : patch_insn_write+0x70/0x364 > > [ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0 > > [ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000 > > [ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0 > > [ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000 > > [ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 > > [ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 > > [ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff > > [ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003 > > [ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003 > > [ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097 > > [ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28 > > [ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > > [ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a > > [ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a > > [ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c > > [ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c > > [ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192 > > [ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c > > [ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc > > [ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8 > > [ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14 > > [ 15.773471] ---[ end trace 0000000000000000 ]--- > > FWIW, you can always crop the [15.192321] stuff out of commit messages, > as it just adds noise. > > > By the way, this also fixes the same issue for patch_text(). > > > > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock") > > Co-developed-by: Guo Ren <guoren@kernel.org> > > Signed-off-by: Guo Ren <guoren@kernel.org> > > Cc: Zong Li <zong.li@sifive.com> > > Cc: Palmer Dabbelt <palmer@dabbelt.com> > > Signed-off-by: Changbin Du <changbin.du@huawei.com> > > --- > > Changes in v3: > > - denote this also fixes function patch_text(). > > > > Changes in v2: > > - Rewrite commit log with lockdep explanation [Guo Ren] > > - Rebase on v6.1 [Guo Ren] > > > > v1: > > https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/ > > --- > > arch/riscv/kernel/patch.c | 7 ------- > > 1 file changed, 7 deletions(-) > > > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > > index 765004b60513..8619706f8dfd 100644 > > --- a/arch/riscv/kernel/patch.c > > +++ b/arch/riscv/kernel/patch.c > > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len) > > bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE; > > int ret; > > > > - /* > > - * Before reaching here, it was expected to lock the text_mutex > > - * already, so we don't need to give another lock here and could > > - * ensure that it was safe between each cores. > > - */ > > - lockdep_assert_held(&text_mutex); > > I must admit, patches like this do concern me a little, as a someone > unfamiliar with the world of probing and tracing. > Seeing an explicit check that the lock was held, leads me to believe > that the original author (Zong Li I think) thought that the text_mutex > lock was insufficient. > Do you think that their fear is unfounded? Explaining why it is safe to > remove this assertion in the commit message would go a long way towards > easing my anxiety! > > Also, why delete the comment altogether? The comment provides some > information that doesn't appear to become invalid, even with the > assertion removed? Stop_machine separated the mutex context and made a lockdep warning. So text_mutex can't be used here. We need to find another check solution. I agree with the patch. > > Thanks, > Conor. > > > - > > if (across_pages) > > patch_map(addr + len, FIX_TEXT_POKE1); > > > > -- > > 2.25.1 > > > > -- Best Regards Guo Ren
On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote: > On Mon, Jan 30, 2023 at 11:10 PM Conor Dooley > <conor.dooley@microchip.com> wrote: > > > > Hey Changbin, > > > > On Tue, Jan 31, 2023 at 07:26:59AM +0800, Changbin Du wrote: > > > From: Changbin Du <changbin.du@gmail.com> > > > > > > The task of ftrace_arch_code_modify(_post)_prepare() caller is > > > stop_machine, whose caller and work thread are of different tasks. The > > > lockdep checker needs the same task context, or it's wrong. That means > > > it's a bug here to use lockdep_assert_held because we don't guarantee > > > the same task context. > > > > > > kernel/locking/lockdep.c: > > > int __lock_is_held(const struct lockdep_map *lock, int read) > > > { > > > struct task_struct *curr = current; > > > int i; > > > > > > for (i = 0; i < curr->lockdep_depth; i++) { > > > ^^^^^^^^^^^^^^^^^^^ > > > struct held_lock *hlock = curr->held_locks + i; > > > ^^^^^^^^^^^^^^^^ > > > if (match_held_lock(hlock, lock)) { > > > if (read == -1 || !!hlock->read == read) > > > return LOCK_STATE_HELD; > > > > > > The __lock_is_held depends on current held_locks records; if > > > stop_machine makes the checker runing on another task, that's wrong. > > > > > > Here is the log: > > > [ 15.761523] ------------[ cut here ]------------ > > > [ 15.762125] WARNING: CPU: 0 PID: 15 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x72/0x364 > > > [ 15.763258] Modules linked in: > > > [ 15.764154] CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.1.0-rc1-00014-g66924be85884-dirty #377 > > > [ 15.765339] Hardware name: riscv-virtio,qemu (DT) > > > [ 15.765985] Stopper: multi_cpu_stop+0x0/0x192 <- stop_cpus.constprop.0+0x90/0xe2 > > > [ 15.766711] epc : patch_insn_write+0x72/0x364 > > > [ 15.767011] ra : patch_insn_write+0x70/0x364 > > > [ 15.767276] epc : ffffffff8000721e ra : ffffffff8000721c sp : ff2000000067bca0 > > > [ 15.767622] gp : ffffffff81603f90 tp : ff60000002432a00 t0 : 7300000000000000 > > > [ 15.767919] t1 : 0000000000000000 t2 : 73695f6b636f6c5f s0 : ff2000000067bcf0 > > > [ 15.768238] s1 : 0000000000000008 a0 : 0000000000000000 a1 : 0000000000000000 > > > [ 15.768537] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 > > > [ 15.768837] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 > > > [ 15.769139] s2 : ffffffff80009faa s3 : ff2000000067bd10 s4 : ffffffffffffffff > > > [ 15.769447] s5 : 0000000000000001 s6 : 0000000000000001 s7 : 0000000000000003 > > > [ 15.769740] s8 : 0000000000000002 s9 : 0000000000000004 s10: 0000000000000003 > > > [ 15.770027] s11: 0000000000000002 t3 : 0000000000000000 t4 : ffffffff819af097 > > > [ 15.770323] t5 : ffffffff819af098 t6 : ff2000000067ba28 > > > [ 15.770574] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > > > [ 15.771102] [<ffffffff80007520>] patch_text_nosync+0x10/0x3a > > > [ 15.771421] [<ffffffff80009c66>] ftrace_update_ftrace_func+0x74/0x10a > > > [ 15.771704] [<ffffffff800fa17e>] ftrace_modify_all_code+0xb0/0x16c > > > [ 15.771958] [<ffffffff800fa24c>] __ftrace_modify_code+0x12/0x1c > > > [ 15.772196] [<ffffffff800e110e>] multi_cpu_stop+0x14a/0x192 > > > [ 15.772454] [<ffffffff800e0a34>] cpu_stopper_thread+0x96/0x14c > > > [ 15.772699] [<ffffffff8003f4ea>] smpboot_thread_fn+0xf8/0x1cc > > > [ 15.772945] [<ffffffff8003ac9c>] kthread+0xe2/0xf8 > > > [ 15.773160] [<ffffffff80003e98>] ret_from_exception+0x0/0x14 > > > [ 15.773471] ---[ end trace 0000000000000000 ]--- > > > > FWIW, you can always crop the [15.192321] stuff out of commit messages, > > as it just adds noise. > > > > > By the way, this also fixes the same issue for patch_text(). > > > > > > Fixes: 0ff7c3b33127 ("riscv: Use text_mutex instead of patch_lock") > > > Co-developed-by: Guo Ren <guoren@kernel.org> > > > Signed-off-by: Guo Ren <guoren@kernel.org> > > > Cc: Zong Li <zong.li@sifive.com> > > > Cc: Palmer Dabbelt <palmer@dabbelt.com> > > > Signed-off-by: Changbin Du <changbin.du@huawei.com> > > > --- > > > Changes in v3: > > > - denote this also fixes function patch_text(). > > > > > > Changes in v2: > > > - Rewrite commit log with lockdep explanation [Guo Ren] > > > - Rebase on v6.1 [Guo Ren] > > > > > > v1: > > > https://lore.kernel.org/linux-riscv/20210417023532.354714-1-changbin.du@gmail.com/ > > > --- > > > arch/riscv/kernel/patch.c | 7 ------- > > > 1 file changed, 7 deletions(-) > > > > > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > > > index 765004b60513..8619706f8dfd 100644 > > > --- a/arch/riscv/kernel/patch.c > > > +++ b/arch/riscv/kernel/patch.c > > > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len) > > > bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE; > > > int ret; > > > > > > - /* > > > - * Before reaching here, it was expected to lock the text_mutex > > > - * already, so we don't need to give another lock here and could > > > - * ensure that it was safe between each cores. > > > - */ > > > - lockdep_assert_held(&text_mutex); > > > > I must admit, patches like this do concern me a little, as a someone > > unfamiliar with the world of probing and tracing. > > Seeing an explicit check that the lock was held, leads me to believe > > that the original author (Zong Li I think) thought that the text_mutex > > lock was insufficient. > > Do you think that their fear is unfounded? Explaining why it is safe to > > remove this assertion in the commit message would go a long way towards > > easing my anxiety! > > > > Also, why delete the comment altogether? The comment provides some > > information that doesn't appear to become invalid, even with the > > assertion removed? > Stop_machine separated the mutex context and made a lockdep warning. > So text_mutex can't be used here. We need to find another check > solution. I agree with the patch. Whether or not you agree with the change is not the point (with your SoB I'd hope you agree with it). I understand that you two are trying to fix a false positive lockdep warning, but what I am asking for an explanation as to why the original author's fear is unfounded. Surely, having added the assertion, they were not thinking of the same code path that you guys are hitting the false positive on? Perhaps Zong themselves can tell us what the original fear was? Thanks, Conor.
On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote: > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote: > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote: > [snip] > > > > > > > > > > - /* > > > > > - * Before reaching here, it was expected to lock the text_mutex > > > > > - * already, so we don't need to give another lock here and could > > > > > - * ensure that it was safe between each cores. > > > > > - */ > > > > > - lockdep_assert_held(&text_mutex); > > > > > > > > I must admit, patches like this do concern me a little, as a someone > > > > unfamiliar with the world of probing and tracing. > > > > Seeing an explicit check that the lock was held, leads me to believe > > > > that the original author (Zong Li I think) thought that the text_mutex > > > > lock was insufficient. > > > > Do you think that their fear is unfounded? Explaining why it is safe to > > > > remove this assertion in the commit message would go a long way towards > > > > easing my anxiety! > > > > > > > > Also, why delete the comment altogether? The comment provides some > > > > information that doesn't appear to become invalid, even with the > > > > assertion removed? > > > Stop_machine separated the mutex context and made a lockdep warning. > > > So text_mutex can't be used here. We need to find another check > > > solution. I agree with the patch. > > > > Whether or not you agree with the change is not the point (with your SoB > > I'd hope you agree with it). > > I understand that you two are trying to fix a false positive lockdep > > warning, but what I am asking for an explanation as to why the original > > author's fear is unfounded. > > Surely, having added the assertion, they were not thinking of the same > > code path that you guys are hitting the false positive on? > > > The assertion is reasonable since the fixmap entry is shared. The text_mutex > does should be held before entering that function. But the false positive cases > make some functions (ftrace for example) difficult to use due to warning log > storm. > > Either the lockdep should be fixed for stop_machine, or remove the assertion > simply now (we can keep the comments). (or do the assertion conditionly?) How would you suggest checking it conditionally? Also, if you persist in removing the assertion, there is a comment in arch/riscv/kernel/ftrace.c that would need to be updated. (L129-ish) The comment you removed in this patch seems valid both before and after though, so I don't see a compelling reason for its removal. > And this is not a riscv only problem but common for architectures which use > stop_machine to patch text. (arm for example) > > > Perhaps Zong themselves can tell us what the original fear was?
On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote: > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote: [snip] > > > > > > > > - /* > > > > - * Before reaching here, it was expected to lock the text_mutex > > > > - * already, so we don't need to give another lock here and could > > > > - * ensure that it was safe between each cores. > > > > - */ > > > > - lockdep_assert_held(&text_mutex); > > > > > > I must admit, patches like this do concern me a little, as a someone > > > unfamiliar with the world of probing and tracing. > > > Seeing an explicit check that the lock was held, leads me to believe > > > that the original author (Zong Li I think) thought that the text_mutex > > > lock was insufficient. > > > Do you think that their fear is unfounded? Explaining why it is safe to > > > remove this assertion in the commit message would go a long way towards > > > easing my anxiety! > > > > > > Also, why delete the comment altogether? The comment provides some > > > information that doesn't appear to become invalid, even with the > > > assertion removed? > > Stop_machine separated the mutex context and made a lockdep warning. > > So text_mutex can't be used here. We need to find another check > > solution. I agree with the patch. > > Whether or not you agree with the change is not the point (with your SoB > I'd hope you agree with it). > I understand that you two are trying to fix a false positive lockdep > warning, but what I am asking for an explanation as to why the original > author's fear is unfounded. > Surely, having added the assertion, they were not thinking of the same > code path that you guys are hitting the false positive on? > The assertion is reasonable since the fixmap entry is shared. The text_mutex does should be held before entering that function. But the false positive cases make some functions (ftrace for example) difficult to use due to warning log storm. Either the lockdep should be fixed for stop_machine, or remove the assertion simply now (we can keep the comments). (or do the assertion conditionly?) And this is not a riscv only problem but common for architectures which use stop_machine to patch text. (arm for example) > Perhaps Zong themselves can tell us what the original fear was? > > Thanks, > Conor.
On Fri, Feb 03, 2023 at 07:00:43AM +0800, Changbin Du wrote: btw, something is wrong with your mail client or host machine. Everything that you are sending is timestamped in the future, as it is currently 15:57 on the 2nd in UTC+8. > On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote: > > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote: > > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote: > > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote: > > > [snip] > > > > > > > > > > > > > > - /* > > > > > > > - * Before reaching here, it was expected to lock the text_mutex > > > > > > > - * already, so we don't need to give another lock here and could > > > > > > > - * ensure that it was safe between each cores. > > > > > > > - */ > > > > > > > - lockdep_assert_held(&text_mutex); > > > > > > > > > > > > I must admit, patches like this do concern me a little, as a someone > > > > > > unfamiliar with the world of probing and tracing. > > > > > > Seeing an explicit check that the lock was held, leads me to believe > > > > > > that the original author (Zong Li I think) thought that the text_mutex > > > > > > lock was insufficient. > > > > > > Do you think that their fear is unfounded? Explaining why it is safe to > > > > > > remove this assertion in the commit message would go a long way towards > > > > > > easing my anxiety! > > > > > > > > > > > > Also, why delete the comment altogether? The comment provides some > > > > > > information that doesn't appear to become invalid, even with the > > > > > > assertion removed? > > > > > Stop_machine separated the mutex context and made a lockdep warning. > > > > > So text_mutex can't be used here. We need to find another check > > > > > solution. I agree with the patch. > > > > > > > > Whether or not you agree with the change is not the point (with your SoB > > > > I'd hope you agree with it). > > > > I understand that you two are trying to fix a false positive lockdep > > > > warning, but what I am asking for an explanation as to why the original > > > > author's fear is unfounded. > > > > Surely, having added the assertion, they were not thinking of the same > > > > code path that you guys are hitting the false positive on? > > > > > > > The assertion is reasonable since the fixmap entry is shared. The text_mutex > > > does should be held before entering that function. But the false positive cases > > > make some functions (ftrace for example) difficult to use due to warning log > > > storm. > > > > > > Either the lockdep should be fixed for stop_machine, or remove the assertion > > > simply now (we can keep the comments). (or do the assertion conditionly?) > > > > How would you suggest checking it conditionally? > > > Please refer to a early patch from Palmer Dabbelt. > https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/ Oh cool, thanks for that. Why not resend that approach, with your suggested fixup for ftrace_init_nop() then? It looks more complex, but is less worrisome & has an R-b from Steven already. Thanks, Conor.
On Thu, Feb 02, 2023 at 08:01:44AM +0000, Conor Dooley wrote: > On Fri, Feb 03, 2023 at 07:00:43AM +0800, Changbin Du wrote: > > btw, something is wrong with your mail client or host machine. > Everything that you are sending is timestamped in the future, > as it is currently 15:57 on the 2nd in UTC+8. > hmm, my machine goes 12 hours ahead. Thanks for your remainding. > > On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote: > > > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote: > > > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote: > > > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote: > > > > [snip] > > > > > > > > > > > > > > > > - /* > > > > > > > > - * Before reaching here, it was expected to lock the text_mutex > > > > > > > > - * already, so we don't need to give another lock here and could > > > > > > > > - * ensure that it was safe between each cores. > > > > > > > > - */ > > > > > > > > - lockdep_assert_held(&text_mutex); > > > > > > > > > > > > > > I must admit, patches like this do concern me a little, as a someone > > > > > > > unfamiliar with the world of probing and tracing. > > > > > > > Seeing an explicit check that the lock was held, leads me to believe > > > > > > > that the original author (Zong Li I think) thought that the text_mutex > > > > > > > lock was insufficient. > > > > > > > Do you think that their fear is unfounded? Explaining why it is safe to > > > > > > > remove this assertion in the commit message would go a long way towards > > > > > > > easing my anxiety! > > > > > > > > > > > > > > Also, why delete the comment altogether? The comment provides some > > > > > > > information that doesn't appear to become invalid, even with the > > > > > > > assertion removed? > > > > > > Stop_machine separated the mutex context and made a lockdep warning. > > > > > > So text_mutex can't be used here. We need to find another check > > > > > > solution. I agree with the patch. > > > > > > > > > > Whether or not you agree with the change is not the point (with your SoB > > > > > I'd hope you agree with it). > > > > > I understand that you two are trying to fix a false positive lockdep > > > > > warning, but what I am asking for an explanation as to why the original > > > > > author's fear is unfounded. > > > > > Surely, having added the assertion, they were not thinking of the same > > > > > code path that you guys are hitting the false positive on? > > > > > > > > > The assertion is reasonable since the fixmap entry is shared. The text_mutex > > > > does should be held before entering that function. But the false positive cases > > > > make some functions (ftrace for example) difficult to use due to warning log > > > > storm. > > > > > > > > Either the lockdep should be fixed for stop_machine, or remove the assertion > > > > simply now (we can keep the comments). (or do the assertion conditionly?) > > > > > > How would you suggest checking it conditionally? > > > > > Please refer to a early patch from Palmer Dabbelt. > > https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/ > > Oh cool, thanks for that. > Why not resend that approach, with your suggested fixup for > ftrace_init_nop() then? > It looks more complex, but is less worrisome & has an R-b from Steven > already. > Personally I don't like the complex change, because the clients of text patching api are very limited. So I think it's not hard to take care of it. (The arm code also doesn't have the assertion) So I would leave the decision making to maintainers :) Anyway I will send V4 as a candidate. > Thanks, > Conor. >
On Wed, Feb 01, 2023 at 02:01:07PM +0000, Conor Dooley wrote: > On Thu, Feb 02, 2023 at 05:00:31AM +0800, Changbin Du wrote: > > On Tue, Jan 31, 2023 at 07:50:20AM +0000, Conor Dooley wrote: > > > On Tue, Jan 31, 2023 at 03:26:33PM +0800, Guo Ren wrote: > > [snip] > > > > > > > > > > > > - /* > > > > > > - * Before reaching here, it was expected to lock the text_mutex > > > > > > - * already, so we don't need to give another lock here and could > > > > > > - * ensure that it was safe between each cores. > > > > > > - */ > > > > > > - lockdep_assert_held(&text_mutex); > > > > > > > > > > I must admit, patches like this do concern me a little, as a someone > > > > > unfamiliar with the world of probing and tracing. > > > > > Seeing an explicit check that the lock was held, leads me to believe > > > > > that the original author (Zong Li I think) thought that the text_mutex > > > > > lock was insufficient. > > > > > Do you think that their fear is unfounded? Explaining why it is safe to > > > > > remove this assertion in the commit message would go a long way towards > > > > > easing my anxiety! > > > > > > > > > > Also, why delete the comment altogether? The comment provides some > > > > > information that doesn't appear to become invalid, even with the > > > > > assertion removed? > > > > Stop_machine separated the mutex context and made a lockdep warning. > > > > So text_mutex can't be used here. We need to find another check > > > > solution. I agree with the patch. > > > > > > Whether or not you agree with the change is not the point (with your SoB > > > I'd hope you agree with it). > > > I understand that you two are trying to fix a false positive lockdep > > > warning, but what I am asking for an explanation as to why the original > > > author's fear is unfounded. > > > Surely, having added the assertion, they were not thinking of the same > > > code path that you guys are hitting the false positive on? > > > > > The assertion is reasonable since the fixmap entry is shared. The text_mutex > > does should be held before entering that function. But the false positive cases > > make some functions (ftrace for example) difficult to use due to warning log > > storm. > > > > Either the lockdep should be fixed for stop_machine, or remove the assertion > > simply now (we can keep the comments). (or do the assertion conditionly?) > > How would you suggest checking it conditionally? > Please refer to a early patch from Palmer Dabbelt. https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/ > Also, if you persist in removing the assertion, there is a comment in > arch/riscv/kernel/ftrace.c that would need to be updated. (L129-ish) > No problem. > The comment you removed in this patch seems valid both before and after > though, so I don't see a compelling reason for its removal. We all agreed. The key is to get rid of false positive case. > > > And this is not a riscv only problem but common for architectures which use > > stop_machine to patch text. (arm for example) > > > > > Perhaps Zong themselves can tell us what the original fear was? >
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c index 765004b60513..8619706f8dfd 100644 --- a/arch/riscv/kernel/patch.c +++ b/arch/riscv/kernel/patch.c @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len) bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE; int ret; - /* - * Before reaching here, it was expected to lock the text_mutex - * already, so we don't need to give another lock here and could - * ensure that it was safe between each cores. - */ - lockdep_assert_held(&text_mutex); - if (across_pages) patch_map(addr + len, FIX_TEXT_POKE1);