Message ID | 20201028182614.13655-1-cai@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64/smp: Move rcu_cpu_starting() earlier | expand |
On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > enough in the CPU-hotplug onlining process, which results in lockdep > splats as follows: > > WARNING: suspicious RCU usage > ----------------------------- > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > no locks held by swapper/1/0. > > Call trace: > dump_backtrace+0x0/0x3c8 > show_stack+0x14/0x60 > dump_stack+0x14c/0x1c4 > lockdep_rcu_suspicious+0x134/0x14c > __lock_acquire+0x1c30/0x2600 > lock_acquire+0x274/0xc48 > _raw_spin_lock+0xc8/0x140 > vprintk_emit+0x90/0x3d0 > vprintk_default+0x34/0x40 > vprintk_func+0x378/0x590 > printk+0xa8/0xd4 > __cpuinfo_store_cpu+0x71c/0x868 > cpuinfo_store_cpu+0x2c/0xc8 > secondary_start_kernel+0x244/0x318 > > This is avoided by moving the call to rcu_cpu_starting up near the > beginning of the secondary_start_kernel() function. > > Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ > Signed-off-by: Qian Cai <cai@redhat.com> Interesting way to compute "cpu" earlier in the code, but nevertheless: Acked-by: Paul E. McKenney <paulmck@kernel.org> > --- > arch/arm64/kernel/smp.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 82e75fc2c903..09c96f57818c 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -222,6 +222,7 @@ asmlinkage notrace void secondary_start_kernel(void) > if (system_uses_irq_prio_masking()) > init_gic_priority_masking(); > > + rcu_cpu_starting(cpu); > preempt_disable(); > trace_hardirqs_off(); > > -- > 2.28.0 >
On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > enough in the CPU-hotplug onlining process, which results in lockdep > splats as follows: > > WARNING: suspicious RCU usage > ----------------------------- > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > no locks held by swapper/1/0. > > Call trace: > dump_backtrace+0x0/0x3c8 > show_stack+0x14/0x60 > dump_stack+0x14c/0x1c4 > lockdep_rcu_suspicious+0x134/0x14c > __lock_acquire+0x1c30/0x2600 > lock_acquire+0x274/0xc48 > _raw_spin_lock+0xc8/0x140 > vprintk_emit+0x90/0x3d0 > vprintk_default+0x34/0x40 > vprintk_func+0x378/0x590 > printk+0xa8/0xd4 > __cpuinfo_store_cpu+0x71c/0x868 > cpuinfo_store_cpu+0x2c/0xc8 > secondary_start_kernel+0x244/0x318 > > This is avoided by moving the call to rcu_cpu_starting up near the > beginning of the secondary_start_kernel() function. Hmm, it's not really a move though -- we'll end up calling this thing twice afaict. It would be better to make sure we've called notify_cpu_starting() early enough. Can we do that instead? Will
On Thu, 2020-10-29 at 09:10 +0000, Will Deacon wrote: > On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > enough in the CPU-hotplug onlining process, which results in lockdep > > splats as follows: > > > > WARNING: suspicious RCU usage > > ----------------------------- > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > other info that might help us debug this: > > > > RCU used illegally from offline CPU! > > rcu_scheduler_active = 1, debug_locks = 1 > > no locks held by swapper/1/0. > > > > Call trace: > > dump_backtrace+0x0/0x3c8 > > show_stack+0x14/0x60 > > dump_stack+0x14c/0x1c4 > > lockdep_rcu_suspicious+0x134/0x14c > > __lock_acquire+0x1c30/0x2600 > > lock_acquire+0x274/0xc48 > > _raw_spin_lock+0xc8/0x140 > > vprintk_emit+0x90/0x3d0 > > vprintk_default+0x34/0x40 > > vprintk_func+0x378/0x590 > > printk+0xa8/0xd4 > > __cpuinfo_store_cpu+0x71c/0x868 > > cpuinfo_store_cpu+0x2c/0xc8 > > secondary_start_kernel+0x244/0x318 > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > beginning of the secondary_start_kernel() function. > > Hmm, it's not really a move though -- we'll end up calling this thing twice > afaict. It would be better to make sure we've called notify_cpu_starting() > early enough. Can we do that instead? Paul mentioned that it is fine to call rcu_cpu_starting() multiple times, and Peter mentioned that CPU bringup is complicated. Thus, I thought about doing something safe here. I tested a bit of patch below which seems fine, but I can't tell for sure if it is safe. Any suggestion? --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -224,6 +224,7 @@ asmlinkage notrace void secondary_start_kernel(void) preempt_disable(); trace_hardirqs_off(); + notify_cpu_starting(cpu); /* * If the system has established the capabilities, make sure @@ -244,7 +245,6 @@ asmlinkage notrace void secondary_start_kernel(void) /* * Enable GIC and timers. */ - notify_cpu_starting(cpu); ipi_setup(cpu);
On Thu, Oct 29, 2020 at 09:10:45AM +0000, Will Deacon wrote: > On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > enough in the CPU-hotplug onlining process, which results in lockdep > > splats as follows: > > > > WARNING: suspicious RCU usage > > ----------------------------- > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > other info that might help us debug this: > > > > RCU used illegally from offline CPU! > > rcu_scheduler_active = 1, debug_locks = 1 > > no locks held by swapper/1/0. > > > > Call trace: > > dump_backtrace+0x0/0x3c8 > > show_stack+0x14/0x60 > > dump_stack+0x14c/0x1c4 > > lockdep_rcu_suspicious+0x134/0x14c > > __lock_acquire+0x1c30/0x2600 > > lock_acquire+0x274/0xc48 > > _raw_spin_lock+0xc8/0x140 > > vprintk_emit+0x90/0x3d0 > > vprintk_default+0x34/0x40 > > vprintk_func+0x378/0x590 > > printk+0xa8/0xd4 > > __cpuinfo_store_cpu+0x71c/0x868 > > cpuinfo_store_cpu+0x2c/0xc8 > > secondary_start_kernel+0x244/0x318 > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > beginning of the secondary_start_kernel() function. > > Hmm, it's not really a move though -- we'll end up calling this thing twice > afaict. It would be better to make sure we've called notify_cpu_starting() > early enough. Can we do that instead? It uses a per-CPU variable so that RCU pays attention only to the first call to rcu_cpu_starting() if there is more than one of them. This is even intentional, due to there being a generic arch-independent call to rcu_cpu_starting() in notify_cpu_starting(). So multiple calls to rcu_cpu_starting() are fine by design. Thanx, Paul
On Thu, Oct 29, 2020 at 09:17:35AM -0400, Qian Cai wrote: > On Thu, 2020-10-29 at 09:10 +0000, Will Deacon wrote: > > On Wed, Oct 28, 2020 at 02:26:14PM -0400, Qian Cai wrote: > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > splats as follows: > > > > > > WARNING: suspicious RCU usage > > > ----------------------------- > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > > > other info that might help us debug this: > > > > > > RCU used illegally from offline CPU! > > > rcu_scheduler_active = 1, debug_locks = 1 > > > no locks held by swapper/1/0. > > > > > > Call trace: > > > dump_backtrace+0x0/0x3c8 > > > show_stack+0x14/0x60 > > > dump_stack+0x14c/0x1c4 > > > lockdep_rcu_suspicious+0x134/0x14c > > > __lock_acquire+0x1c30/0x2600 > > > lock_acquire+0x274/0xc48 > > > _raw_spin_lock+0xc8/0x140 > > > vprintk_emit+0x90/0x3d0 > > > vprintk_default+0x34/0x40 > > > vprintk_func+0x378/0x590 > > > printk+0xa8/0xd4 > > > __cpuinfo_store_cpu+0x71c/0x868 > > > cpuinfo_store_cpu+0x2c/0xc8 > > > secondary_start_kernel+0x244/0x318 > > > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > > beginning of the secondary_start_kernel() function. > > > > Hmm, it's not really a move though -- we'll end up calling this thing twice > > afaict. It would be better to make sure we've called notify_cpu_starting() > > early enough. Can we do that instead? > > Paul mentioned that it is fine to call rcu_cpu_starting() multiple times, and > Peter mentioned that CPU bringup is complicated. Thus, I thought about doing > something safe here. > > I tested a bit of patch below which seems fine, but I can't tell for sure if it > is safe. Any suggestion? No, you're right -- this does look dodgy as I think we'll end up kicking the CPU notifiers before things like CPU errata have been figured out. So I'll pick up your original patch with Paul's ack. Thanks! Will
On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > enough in the CPU-hotplug onlining process, which results in lockdep > splats as follows: > > WARNING: suspicious RCU usage > ----------------------------- > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > [...] Applied to arm64 (for-next/fixes), thanks! [1/1] arm64/smp: Move rcu_cpu_starting() earlier https://git.kernel.org/arm64/c/ce3d31ad3cac Cheers,
On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > enough in the CPU-hotplug onlining process, which results in lockdep > > splats as follows: > > > > WARNING: suspicious RCU usage > > ----------------------------- > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > [...] > > Applied to arm64 (for-next/fixes), thanks! > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > https://git.kernel.org/arm64/c/ce3d31ad3cac Hmm, this patch has caused a regression in the case that we fail to online a CPU because it has incompatible CPU features and so we park it in cpu_die_early(). We now get an endless spew of RCU stalls because the core will never come online, but is being tracked by RCU. So I'm tempted to revert this and live with the lockdep warning while we figure out a proper fix. What's the correct say to undo rcu_cpu_starting(), given that we cannot invoke the full hotplug machinery here? Is it correct to call rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the CPU doing cpu_up(), or should we do something else? Will
On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > splats as follows: > > > > > > WARNING: suspicious RCU usage > > > ----------------------------- > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > > > [...] > > > > Applied to arm64 (for-next/fixes), thanks! > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > Hmm, this patch has caused a regression in the case that we fail to > online a CPU because it has incompatible CPU features and so we park it > in cpu_die_early(). We now get an endless spew of RCU stalls because the > core will never come online, but is being tracked by RCU. So I'm tempted > to revert this and live with the lockdep warning while we figure out a > proper fix. > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > invoke the full hotplug machinery here? Is it correct to call > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > CPU doing cpu_up(), or should we do something else? It looks to me that rcu_report_dead() does the opposite of rcu_cpu_starting(), so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to rewind, Paul?
On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not early > > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > > splats as follows: > > > > > > > > WARNING: suspicious RCU usage > > > > ----------------------------- > > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > > > > > [...] > > > > > > Applied to arm64 (for-next/fixes), thanks! > > > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > > > Hmm, this patch has caused a regression in the case that we fail to > > online a CPU because it has incompatible CPU features and so we park it > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > core will never come online, but is being tracked by RCU. So I'm tempted > > to revert this and live with the lockdep warning while we figure out a > > proper fix. > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > invoke the full hotplug machinery here? Is it correct to call > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > CPU doing cpu_up(), or should we do something else? > It looks to me that rcu_report_dead() does the opposite of rcu_cpu_starting(), > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to rewind, > Paul? Yes, rcu_report_dead() should do the trick. Presumably the earlier online-time CPU-hotplug notifiers are also unwound? Thanx, Paul
On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote: > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not > > > > > early > > > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > > > splats as follows: > > > > > > > > > > WARNING: suspicious RCU usage > > > > > ----------------------------- > > > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader > > > > > section!! > > > > > > > > > > [...] > > > > > > > > Applied to arm64 (for-next/fixes), thanks! > > > > > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > > > > > Hmm, this patch has caused a regression in the case that we fail to > > > online a CPU because it has incompatible CPU features and so we park it > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > > core will never come online, but is being tracked by RCU. So I'm tempted > > > to revert this and live with the lockdep warning while we figure out a > > > proper fix. > > > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > > invoke the full hotplug machinery here? Is it correct to call > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > > CPU doing cpu_up(), or should we do something else? > > It looks to me that rcu_report_dead() does the opposite of > > rcu_cpu_starting(), > > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to > > rewind, > > Paul? > > Yes, rcu_report_dead() should do the trick. Presumably the earlier > online-time CPU-hotplug notifiers are also unwound? I don't think that is an issue here. cpu_die_early() set CPU_STUCK_IN_KERNEL, and then __cpu_up() will see a timeout waiting for the AP online and then deal with CPU_STUCK_IN_KERNEL according. Thus, something like this? I don't see anything in rcu_report_dead() depends on CONFIG_HOTPLUG_CPU=y. diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 09c96f57818c..10729d2d6084 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -421,6 +421,8 @@ void cpu_die_early(void) update_cpu_boot_status(CPU_STUCK_IN_KERNEL); + rcu_report_dead(cpu); + cpu_park_loop(); } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 2a52f42f64b6..bd04b09b84b3 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu) smp_mb(); /* Ensure RCU read-side usage follows above initialization. */ } -#ifdef CONFIG_HOTPLUG_CPU /* * The outgoing function has no further need of RCU, so remove it from * the rcu_node tree's ->qsmaskinitnext bit masks. @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu) rdp->cpu_started = false; } +#ifdef CONFIG_HOTPLUG_CPU /* * The outgoing CPU has just passed through the dying-idle state, and we * are being invoked from the CPU that was IPIed to continue the offline
On Thu, Nov 05, 2020 at 09:15:24PM -0500, Qian Cai wrote: > On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote: > > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > > > On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote: > > > > > On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote: > > > > > > The call to rcu_cpu_starting() in secondary_start_kernel() is not > > > > > > early > > > > > > enough in the CPU-hotplug onlining process, which results in lockdep > > > > > > splats as follows: > > > > > > > > > > > > WARNING: suspicious RCU usage > > > > > > ----------------------------- > > > > > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader > > > > > > section!! > > > > > > > > > > > > [...] > > > > > > > > > > Applied to arm64 (for-next/fixes), thanks! > > > > > > > > > > [1/1] arm64/smp: Move rcu_cpu_starting() earlier > > > > > https://git.kernel.org/arm64/c/ce3d31ad3cac > > > > > > > > Hmm, this patch has caused a regression in the case that we fail to > > > > online a CPU because it has incompatible CPU features and so we park it > > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > > > core will never come online, but is being tracked by RCU. So I'm tempted > > > > to revert this and live with the lockdep warning while we figure out a > > > > proper fix. > > > > > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > > > invoke the full hotplug machinery here? Is it correct to call > > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > > > CPU doing cpu_up(), or should we do something else? > > > It looks to me that rcu_report_dead() does the opposite of > > > rcu_cpu_starting(), > > > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to > > > rewind, > > > Paul? > > > > Yes, rcu_report_dead() should do the trick. Presumably the earlier > > online-time CPU-hotplug notifiers are also unwound? > I don't think that is an issue here. cpu_die_early() set CPU_STUCK_IN_KERNEL, > and then __cpu_up() will see a timeout waiting for the AP online and then deal > with CPU_STUCK_IN_KERNEL according. Thus, something like this? I don't see > anything in rcu_report_dead() depends on CONFIG_HOTPLUG_CPU=y. If this works for the ARM folks, it seems like a reasonable approach to me. I cannot reasonably test this because not only do I not have an ARM system, I don't have a system on which a kernel can be built with CONFIG_HOTPLUG_CPU=n, so I must rely on others' testing. Thanx, Paul > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 09c96f57818c..10729d2d6084 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -421,6 +421,8 @@ void cpu_die_early(void) > > update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > + rcu_report_dead(cpu); > + > cpu_park_loop(); > } > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 2a52f42f64b6..bd04b09b84b3 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu) > smp_mb(); /* Ensure RCU read-side usage follows above initialization. */ > } > > -#ifdef CONFIG_HOTPLUG_CPU > /* > * The outgoing function has no further need of RCU, so remove it from > * the rcu_node tree's ->qsmaskinitnext bit masks. > @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu) > rdp->cpu_started = false; > } > > +#ifdef CONFIG_HOTPLUG_CPU > /* > * The outgoing CPU has just passed through the dying-idle state, and we > * are being invoked from the CPU that was IPIed to continue the offline >
On Thu, Nov 05, 2020 at 09:15:24PM -0500, Qian Cai wrote: > On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote: > > On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote: > > > On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote: > > > > Hmm, this patch has caused a regression in the case that we fail to > > > > online a CPU because it has incompatible CPU features and so we park it > > > > in cpu_die_early(). We now get an endless spew of RCU stalls because the > > > > core will never come online, but is being tracked by RCU. So I'm tempted > > > > to revert this and live with the lockdep warning while we figure out a > > > > proper fix. > > > > > > > > What's the correct say to undo rcu_cpu_starting(), given that we cannot > > > > invoke the full hotplug machinery here? Is it correct to call > > > > rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the > > > > CPU doing cpu_up(), or should we do something else? > > > It looks to me that rcu_report_dead() does the opposite of > > > rcu_cpu_starting(), > > > so lift rcu_report_dead() out of CONFIG_HOTPLUG_CPU and use it there to > > > rewind, > > > Paul? > > > > Yes, rcu_report_dead() should do the trick. Presumably the earlier > > online-time CPU-hotplug notifiers are also unwound? > I don't think that is an issue here. cpu_die_early() set CPU_STUCK_IN_KERNEL, > and then __cpu_up() will see a timeout waiting for the AP online and then deal > with CPU_STUCK_IN_KERNEL according. Thus, something like this? I don't see > anything in rcu_report_dead() depends on CONFIG_HOTPLUG_CPU=y. Cheers both for suggesting rcu_report_dead(). > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 09c96f57818c..10729d2d6084 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -421,6 +421,8 @@ void cpu_die_early(void) > > update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > + rcu_report_dead(cpu); I think this is in the wrong place, see: https://lore.kernel.org/r/20201106103602.9849-1-will@kernel.org which seems to fix the problem for me. Will
On Fri, 2020-11-06 at 10:37 +0000, Will Deacon wrote: > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > index 09c96f57818c..10729d2d6084 100644 > > --- a/arch/arm64/kernel/smp.c > > +++ b/arch/arm64/kernel/smp.c > > @@ -421,6 +421,8 @@ void cpu_die_early(void) > > > > update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > > > + rcu_report_dead(cpu); > > I think this is in the wrong place, see: > > https://lore.kernel.org/r/20201106103602.9849-1-will@kernel.org > > which seems to fix the problem for me. Ah, I had not realized that cpu_psci_cpu_die() could no return. Your patchset looks good to me.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 82e75fc2c903..09c96f57818c 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -222,6 +222,7 @@ asmlinkage notrace void secondary_start_kernel(void) if (system_uses_irq_prio_masking()) init_gic_priority_masking(); + rcu_cpu_starting(cpu); preempt_disable(); trace_hardirqs_off();
The call to rcu_cpu_starting() in secondary_start_kernel() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call trace: dump_backtrace+0x0/0x3c8 show_stack+0x14/0x60 dump_stack+0x14c/0x1c4 lockdep_rcu_suspicious+0x134/0x14c __lock_acquire+0x1c30/0x2600 lock_acquire+0x274/0xc48 _raw_spin_lock+0xc8/0x140 vprintk_emit+0x90/0x3d0 vprintk_default+0x34/0x40 vprintk_func+0x378/0x590 printk+0xa8/0xd4 __cpuinfo_store_cpu+0x71c/0x868 cpuinfo_store_cpu+0x2c/0xc8 secondary_start_kernel+0x244/0x318 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the secondary_start_kernel() function. Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Signed-off-by: Qian Cai <cai@redhat.com> --- arch/arm64/kernel/smp.c | 1 + 1 file changed, 1 insertion(+)