diff mbox series

[v1,3/3] arm64: disable irq on cpu shutdown flow

Message ID 1606486531-25719-4-git-send-email-hanks.chen@mediatek.com (mailing list archive)
State New, archived
Headers show
Series [v1,1/3] irqchip/gic: enable irq target all | expand

Commit Message

Hanks Chen Nov. 27, 2020, 2:15 p.m. UTC
Disable irq on cpu shutdown flow to ensure interrupts
did not bother this cpu after status as offline.

To avoid suspicious RCU usage
(0)[0:swapper/0]RCU used illegally from offline CPU! ...
(0)[0:swapper/0]lockdep: [name:lockdep&]cpu_id = 0, cpu_is_offline = 1

Signed-off-by: Hanks Chen <hanks.chen@mediatek.com>
---
 arch/arm64/kernel/smp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Marc Zyngier Nov. 27, 2020, 6:27 p.m. UTC | #1
On 2020-11-27 14:15, Hanks Chen wrote:
> Disable irq on cpu shutdown flow to ensure interrupts
> did not bother this cpu after status as offline.
> 
> To avoid suspicious RCU usage
> (0)[0:swapper/0]RCU used illegally from offline CPU! ...
> (0)[0:swapper/0]lockdep: [name:lockdep&]cpu_id = 0, cpu_is_offline = 1

This needs to be explained *a lot* more .

My hunch is that because a CPU going offline can still receive 
interrupts
thanks to your interrupt broadcast hack, you break some the core 
expectations,
and RCU shouts at you.

If that's indeed the case, I don't think the architecture code needs 
fixing
(or at least, not for that).

> 
> Signed-off-by: Hanks Chen <hanks.chen@mediatek.com>
> ---
>  arch/arm64/kernel/smp.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 82e75fc2c903..27a6553fa86f 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -308,6 +308,12 @@ int __cpu_disable(void)
>  	remove_cpu_topology(cpu);
>  	numa_remove_cpu(cpu);
> 
> +	/*
> +	 * we disable irq here to ensure interrupts
> +	 * did not bother this cpu after status as offline.
> +	 */
> +	local_irq_disable();
> +
>  	/*
>  	 * Take this CPU offline.  Once we clear this, we can't return,
>  	 * and we must not schedule until we're ready to give up the cpu.

Conveniently, the code that takes care of migrating the interrupts is
just below this comment.  Which strongly suggests that the interrupt
migration is broken by your earlier patch.

> @@ -842,9 +848,10 @@ void arch_irq_work_raise(void)
> 
>  static void local_cpu_stop(void)
>  {
> +	local_daif_mask();
> +
>  	set_cpu_online(smp_processor_id(), false);
> 
> -	local_daif_mask();
>  	sdei_mask_local_cpu();
>  	cpu_park_loop();
>  }

What problem are you addressing here?

Thanks,

         M.
diff mbox series

Patch

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 82e75fc2c903..27a6553fa86f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -308,6 +308,12 @@  int __cpu_disable(void)
 	remove_cpu_topology(cpu);
 	numa_remove_cpu(cpu);
 
+	/*
+	 * we disable irq here to ensure interrupts
+	 * did not bother this cpu after status as offline.
+	 */
+	local_irq_disable();
+
 	/*
 	 * Take this CPU offline.  Once we clear this, we can't return,
 	 * and we must not schedule until we're ready to give up the cpu.
@@ -842,9 +848,10 @@  void arch_irq_work_raise(void)
 
 static void local_cpu_stop(void)
 {
+	local_daif_mask();
+
 	set_cpu_online(smp_processor_id(), false);
 
-	local_daif_mask();
 	sdei_mask_local_cpu();
 	cpu_park_loop();
 }