diff mbox

[REPOST] ARM: smp: Move clear_tasks_mm_cpumask() call to __cpu_die()

Message ID 20180504112939.25493-1-bigeasy@linutronix.de (mailing list archive)
State New, archived
Headers show

Commit Message

Sebastian Andrzej Siewior May 4, 2018, 11:29 a.m. UTC
From: Grygorii Strashko <grygorii.strashko@ti.com>

Suspending a CPU on a RT kernel results in the following backtrace:

| Disabling non-boot CPUs ...
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
| in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
| INFO: lockdep is turned off.
| irq event stamp: 122
| hardirqs last  enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
| hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
|  CPU: 1 PID: 18 Comm: migration/1 Tainted: G        W       4.1.4-rt3-01046-g96ac8da #204
| Hardware name: Generic DRA74X (Flattened Device Tree)
| [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
| [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
| [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
| [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
| [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
| [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
| [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
| [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
| [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
| [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
| [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
| [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
| [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
| CPU1: shutdown

The root cause of above backtrace is task_lock() which takes a sleeping
lock on -RT.

To fix the issue, move clear_tasks_mm_cpumask() call from __cpu_disable()
to __cpu_die() which is called on the thread which is asking for a target
CPU to be shutdown. In addition, this change restores CPU hotplug
functionality on ARM CPU1 can be unplugged/plugged many times.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/1441995683-30817-1-git-send-email-grygorii.strashko@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: slighty edited the commit message]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/arm/kernel/smp.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Russell King (Oracle) May 8, 2018, 11:45 a.m. UTC | #1
On Fri, May 04, 2018 at 01:29:39PM +0200, Sebastian Andrzej Siewior wrote:
> From: Grygorii Strashko <grygorii.strashko@ti.com>
> 
> Suspending a CPU on a RT kernel results in the following backtrace:
> 
> | Disabling non-boot CPUs ...
> | BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
> | in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
> | INFO: lockdep is turned off.
> | irq event stamp: 122
> | hardirqs last  enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
> | hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
> |  CPU: 1 PID: 18 Comm: migration/1 Tainted: G        W       4.1.4-rt3-01046-g96ac8da #204
> | Hardware name: Generic DRA74X (Flattened Device Tree)
> | [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
> | [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
> | [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
> | [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
> | [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
> | [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
> | [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
> | [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
> | [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
> | [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
> | [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
> | [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
> | [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
> | CPU1: shutdown
> 
> The root cause of above backtrace is task_lock() which takes a sleeping
> lock on -RT.
> 
> To fix the issue, move clear_tasks_mm_cpumask() call from __cpu_disable()
> to __cpu_die() which is called on the thread which is asking for a target
> CPU to be shutdown. In addition, this change restores CPU hotplug
> functionality on ARM CPU1 can be unplugged/plugged many times.

This looks fine to me - all the paths that use the mm_cpumask() check
that the CPU is still marked online.

Please send to the patch system, thanks.

> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: <linux-arm-kernel@lists.infradead.org>
> Cc: Sekhar Nori <nsekhar@ti.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Link: http://lkml.kernel.org/r/1441995683-30817-1-git-send-email-grygorii.strashko@ti.com
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [bigeasy: slighty edited the commit message]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/arm/kernel/smp.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -236,8 +236,6 @@ int __cpu_disable(void)
>  	flush_cache_louis();
>  	local_flush_tlb_all();
>  
> -	clear_tasks_mm_cpumask(cpu);
> -
>  	return 0;
>  }
>  
> @@ -255,6 +253,7 @@ void __cpu_die(unsigned int cpu)
>  	}
>  	pr_debug("CPU%u: shutdown\n", cpu);
>  
> +	clear_tasks_mm_cpumask(cpu);
>  	/*
>  	 * platform_cpu_kill() is generally expected to do the powering off
>  	 * and/or cutting of clocks to the dying CPU.  Optionally, this may
diff mbox

Patch

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -236,8 +236,6 @@  int __cpu_disable(void)
 	flush_cache_louis();
 	local_flush_tlb_all();
 
-	clear_tasks_mm_cpumask(cpu);
-
 	return 0;
 }
 
@@ -255,6 +253,7 @@  void __cpu_die(unsigned int cpu)
 	}
 	pr_debug("CPU%u: shutdown\n", cpu);
 
+	clear_tasks_mm_cpumask(cpu);
 	/*
 	 * platform_cpu_kill() is generally expected to do the powering off
 	 * and/or cutting of clocks to the dying CPU.  Optionally, this may