diff mbox series

[V3,01/11] ARM: Disable FIQs (but not IRQs) on CPUs shutdown paths

Message ID 20220819221731.480795-2-gpiccoli@igalia.com (mailing list archive)
State Not Applicable
Headers show
Series The panic notifiers refactor - fixes/clean-ups (V3) | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch, async

Commit Message

Guilherme G. Piccoli Aug. 19, 2022, 10:17 p.m. UTC
Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
is responsible for that. IRQs are architecturally masked when we
take an interrupt, but FIQs are high priority than IRQs, hence they
aren't masked. With that said, it makes sense to disable FIQs here,
but there's no need for (re-)disabling IRQs.

More than that: there is an alternative path for disabling CPUs,
in the form of function crash_smp_send_stop(), which is used for
kexec/panic path. This function relies on a SMP call that also
triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
without disabling FIQs. This might lead to odd scenarios, like
early interrupts in the boot of kexec'd kernel or even interrupts
in secondary "disabled" CPUs while the main one still works in the
panic path and assumes all secondary CPUs are (really!) off.

So, let's disable FIQs in both paths and *not* disable IRQs a second
time, since they are already masked in both paths by the architecture.
This way, we keep both CPU quiesce paths consistent and safe.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

---

V3:
- No changes.

V2:
- Small wording improvement (thanks Michael Kelley);
- Only disable FIQs, since IRQs are masked by architecture
definition when we take an interrupt. Thanks a lot Russell
and Marc for the discussion [0].

Should we add a Fixes tag here? If so, maybe the proper target is:
b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI")

[0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@shell.armlinux.org.uk/


 arch/arm/kernel/machine_kexec.c | 2 ++
 arch/arm/kernel/smp.c           | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

Comments

Guilherme G. Piccoli Sept. 18, 2022, 1:58 p.m. UTC | #1
On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. IRQs are architecturally masked when we
> take an interrupt, but FIQs are high priority than IRQs, hence they
> aren't masked. With that said, it makes sense to disable FIQs here,
> but there's no need for (re-)disabling IRQs.
> 
> More than that: there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), which is used for
> kexec/panic path. This function relies on a SMP call that also
> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
> without disabling FIQs. This might lead to odd scenarios, like
> early interrupts in the boot of kexec'd kernel or even interrupts
> in secondary "disabled" CPUs while the main one still works in the
> panic path and assumes all secondary CPUs are (really!) off.
> 
> So, let's disable FIQs in both paths and *not* disable IRQs a second
> time, since they are already masked in both paths by the architecture.
> This way, we keep both CPU quiesce paths consistent and safe.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Michael Kelley <mikelley@microsoft.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> ---
> 
> V3:
> - No changes.
> 
> V2:
> - Small wording improvement (thanks Michael Kelley);
> - Only disable FIQs, since IRQs are masked by architecture
> definition when we take an interrupt. Thanks a lot Russell
> and Marc for the discussion [0].
> 
> Should we add a Fixes tag here? If so, maybe the proper target is:
> b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI")
> 
> [0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@shell.armlinux.org.uk/
> 
> 
>  arch/arm/kernel/machine_kexec.c | 2 ++
>  arch/arm/kernel/smp.c           | 5 ++---
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..0b482bcb97f7 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -77,6 +77,8 @@ void machine_crash_nonpanic_core(void *unused)
>  {
>  	struct pt_regs regs;
>  
> +	local_fiq_disable();
> +
>  	crash_setup_regs(&regs, get_irq_regs());
>  	printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has crashed\n",
>  	       smp_processor_id());
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 978db2d96b44..36e6efad89f3 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -600,6 +600,8 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
>   */
>  static void ipi_cpu_stop(unsigned int cpu)
>  {
> +	local_fiq_disable();
> +
>  	if (system_state <= SYSTEM_RUNNING) {
>  		raw_spin_lock(&stop_lock);
>  		pr_crit("CPU%u: stopping\n", cpu);
> @@ -609,9 +611,6 @@ static void ipi_cpu_stop(unsigned int cpu)
>  
>  	set_cpu_online(cpu, false);
>  
> -	local_fiq_disable();
> -	local_irq_disable();
> -
>  	while (1) {
>  		cpu_relax();
>  		wfe();

[+CC all ARM folks I could find]

Hi folks, sorry for the ping. Any reviews are greatly appreciated in
this one - this is the V3 of the series but I never got any specific
review for this patch.

Based on a previous discussion with Russell and Marc [0], seems this one
makes sense and fixes an inconsistency related to kdump/panic.

Really appreciate reviews or at least some indication on which path I
should take to get this potentially merged.

Cheers,


Guilherme


[0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@shell.armlinux.org.uk/
Guilherme G. Piccoli Oct. 17, 2022, 2 p.m. UTC | #2
On 18/09/2022 10:58, Guilherme G. Piccoli wrote:
> On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
>> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
>> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
>> is responsible for that. IRQs are architecturally masked when we
>> take an interrupt, but FIQs are high priority than IRQs, hence they
>> aren't masked. With that said, it makes sense to disable FIQs here,
>> but there's no need for (re-)disabling IRQs.
>>
>> More than that: there is an alternative path for disabling CPUs,
>> in the form of function crash_smp_send_stop(), which is used for
>> kexec/panic path. This function relies on a SMP call that also
>> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
>> without disabling FIQs. This might lead to odd scenarios, like
>> early interrupts in the boot of kexec'd kernel or even interrupts
>> in secondary "disabled" CPUs while the main one still works in the
>> panic path and assumes all secondary CPUs are (really!) off.
>>
>> So, let's disable FIQs in both paths and *not* disable IRQs a second
>> time, since they are already masked in both paths by the architecture.
>> This way, we keep both CPU quiesce paths consistent and safe.
>>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Michael Kelley <mikelley@microsoft.com>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>>

Monthly ping - let me know if there's something I should improve in
order this fix is considered!
Thanks,


Guilherme
Russell King (Oracle) Oct. 17, 2022, 2:17 p.m. UTC | #3
On Mon, Oct 17, 2022 at 11:00:46AM -0300, Guilherme G. Piccoli wrote:
> On 18/09/2022 10:58, Guilherme G. Piccoli wrote:
> > On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
> >> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> >> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> >> is responsible for that. IRQs are architecturally masked when we
> >> take an interrupt, but FIQs are high priority than IRQs, hence they
> >> aren't masked. With that said, it makes sense to disable FIQs here,
> >> but there's no need for (re-)disabling IRQs.
> >>
> >> More than that: there is an alternative path for disabling CPUs,
> >> in the form of function crash_smp_send_stop(), which is used for
> >> kexec/panic path. This function relies on a SMP call that also
> >> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
> >> without disabling FIQs. This might lead to odd scenarios, like
> >> early interrupts in the boot of kexec'd kernel or even interrupts
> >> in secondary "disabled" CPUs while the main one still works in the
> >> panic path and assumes all secondary CPUs are (really!) off.
> >>
> >> So, let's disable FIQs in both paths and *not* disable IRQs a second
> >> time, since they are already masked in both paths by the architecture.
> >> This way, we keep both CPU quiesce paths consistent and safe.
> >>
> >> Cc: Marc Zyngier <maz@kernel.org>
> >> Cc: Michael Kelley <mikelley@microsoft.com>
> >> Cc: Russell King <linux@armlinux.org.uk>
> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> >>
> 
> Monthly ping - let me know if there's something I should improve in
> order this fix is considered!

Patches don't get applied unless they end up in the patch system.
Thanks.
Guilherme G. Piccoli Oct. 17, 2022, 2:50 p.m. UTC | #4
On 17/10/2022 11:17, Russell King (Oracle) wrote:
> [...]
>> Monthly ping - let me know if there's something I should improve in
>> order this fix is considered!
> 
> Patches don't get applied unless they end up in the patch system.
> Thanks.
> 

Thanks Russell! Can you show me some documentation on how should I send
the patches to this patch system? My understanding based in the
MAINTAINERS file is that we should send the arm32 patches to you + arm ML.

Cheers,


Guilherme
Russell King (Oracle) Oct. 17, 2022, 5:47 p.m. UTC | #5
On Mon, Oct 17, 2022 at 11:50:05AM -0300, Guilherme G. Piccoli wrote:
> On 17/10/2022 11:17, Russell King (Oracle) wrote:
> > [...]
> >> Monthly ping - let me know if there's something I should improve in
> >> order this fix is considered!
> > 
> > Patches don't get applied unless they end up in the patch system.
> > Thanks.
> > 
> 
> Thanks Russell! Can you show me some documentation on how should I send
> the patches to this patch system? My understanding based in the
> MAINTAINERS file is that we should send the arm32 patches to you + arm ML.

Look below in my signature --.
                             |
			     v
Guilherme G. Piccoli Oct. 17, 2022, 7:43 p.m. UTC | #6
On 17/10/2022 14:47, Russell King (Oracle) wrote:
> On Mon, Oct 17, 2022 at 11:50:05AM -0300, Guilherme G. Piccoli wrote:
>> On 17/10/2022 11:17, Russell King (Oracle) wrote:
>>> [...]
>>>> Monthly ping - let me know if there's something I should improve in
>>>> order this fix is considered!
>>>
>>> Patches don't get applied unless they end up in the patch system.
>>> Thanks.
>>>
>>
>> Thanks Russell! Can you show me some documentation on how should I send
>> the patches to this patch system? My understanding based in the
>> MAINTAINERS file is that we should send the arm32 patches to you + arm ML.
> 
> Look below in my signature --.
>                              |
> 			     v


Thank you! It seems I was able to submit it properly now:
https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9257/1

Cheers,


Guilherme
diff mbox series

Patch

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..0b482bcb97f7 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -77,6 +77,8 @@  void machine_crash_nonpanic_core(void *unused)
 {
 	struct pt_regs regs;
 
+	local_fiq_disable();
+
 	crash_setup_regs(&regs, get_irq_regs());
 	printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has crashed\n",
 	       smp_processor_id());
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 978db2d96b44..36e6efad89f3 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -600,6 +600,8 @@  static DEFINE_RAW_SPINLOCK(stop_lock);
  */
 static void ipi_cpu_stop(unsigned int cpu)
 {
+	local_fiq_disable();
+
 	if (system_state <= SYSTEM_RUNNING) {
 		raw_spin_lock(&stop_lock);
 		pr_crit("CPU%u: stopping\n", cpu);
@@ -609,9 +611,6 @@  static void ipi_cpu_stop(unsigned int cpu)
 
 	set_cpu_online(cpu, false);
 
-	local_fiq_disable();
-	local_irq_disable();
-
 	while (1) {
 		cpu_relax();
 		wfe();