Message ID | 20220819221731.480795-2-gpiccoli@igalia.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | The panic notifiers refactor - fixes/clean-ups (V3) | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch, async |
On 19/08/2022 19:17, Guilherme G. Piccoli wrote: > Currently the regular CPU shutdown path for ARM disables IRQs/FIQs > in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which > is responsible for that. IRQs are architecturally masked when we > take an interrupt, but FIQs are high priority than IRQs, hence they > aren't masked. With that said, it makes sense to disable FIQs here, > but there's no need for (re-)disabling IRQs. > > More than that: there is an alternative path for disabling CPUs, > in the form of function crash_smp_send_stop(), which is used for > kexec/panic path. This function relies on a SMP call that also > triggers a busy-wait loop [at machine_crash_nonpanic_core()], but > without disabling FIQs. This might lead to odd scenarios, like > early interrupts in the boot of kexec'd kernel or even interrupts > in secondary "disabled" CPUs while the main one still works in the > panic path and assumes all secondary CPUs are (really!) off. > > So, let's disable FIQs in both paths and *not* disable IRQs a second > time, since they are already masked in both paths by the architecture. > This way, we keep both CPU quiesce paths consistent and safe. > > Cc: Marc Zyngier <maz@kernel.org> > Cc: Michael Kelley <mikelley@microsoft.com> > Cc: Russell King <linux@armlinux.org.uk> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > > --- > > V3: > - No changes. > > V2: > - Small wording improvement (thanks Michael Kelley); > - Only disable FIQs, since IRQs are masked by architecture > definition when we take an interrupt. Thanks a lot Russell > and Marc for the discussion [0]. > > Should we add a Fixes tag here? If so, maybe the proper target is: > b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI") > > [0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@shell.armlinux.org.uk/ > > > arch/arm/kernel/machine_kexec.c | 2 ++ > arch/arm/kernel/smp.c | 5 ++--- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index f567032a09c0..0b482bcb97f7 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -77,6 +77,8 @@ void machine_crash_nonpanic_core(void *unused) > { > struct pt_regs regs; > > + local_fiq_disable(); > + > crash_setup_regs(®s, get_irq_regs()); > printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has crashed\n", > smp_processor_id()); > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 978db2d96b44..36e6efad89f3 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -600,6 +600,8 @@ static DEFINE_RAW_SPINLOCK(stop_lock); > */ > static void ipi_cpu_stop(unsigned int cpu) > { > + local_fiq_disable(); > + > if (system_state <= SYSTEM_RUNNING) { > raw_spin_lock(&stop_lock); > pr_crit("CPU%u: stopping\n", cpu); > @@ -609,9 +611,6 @@ static void ipi_cpu_stop(unsigned int cpu) > > set_cpu_online(cpu, false); > > - local_fiq_disable(); > - local_irq_disable(); > - > while (1) { > cpu_relax(); > wfe(); [+CC all ARM folks I could find] Hi folks, sorry for the ping. Any reviews are greatly appreciated in this one - this is the V3 of the series but I never got any specific review for this patch. Based on a previous discussion with Russell and Marc [0], seems this one makes sense and fixes an inconsistency related to kdump/panic. Really appreciate reviews or at least some indication on which path I should take to get this potentially merged. Cheers, Guilherme [0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@shell.armlinux.org.uk/
On 18/09/2022 10:58, Guilherme G. Piccoli wrote: > On 19/08/2022 19:17, Guilherme G. Piccoli wrote: >> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs >> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which >> is responsible for that. IRQs are architecturally masked when we >> take an interrupt, but FIQs are high priority than IRQs, hence they >> aren't masked. With that said, it makes sense to disable FIQs here, >> but there's no need for (re-)disabling IRQs. >> >> More than that: there is an alternative path for disabling CPUs, >> in the form of function crash_smp_send_stop(), which is used for >> kexec/panic path. This function relies on a SMP call that also >> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but >> without disabling FIQs. This might lead to odd scenarios, like >> early interrupts in the boot of kexec'd kernel or even interrupts >> in secondary "disabled" CPUs while the main one still works in the >> panic path and assumes all secondary CPUs are (really!) off. >> >> So, let's disable FIQs in both paths and *not* disable IRQs a second >> time, since they are already masked in both paths by the architecture. >> This way, we keep both CPU quiesce paths consistent and safe. >> >> Cc: Marc Zyngier <maz@kernel.org> >> Cc: Michael Kelley <mikelley@microsoft.com> >> Cc: Russell King <linux@armlinux.org.uk> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> >> Monthly ping - let me know if there's something I should improve in order this fix is considered! Thanks, Guilherme
On Mon, Oct 17, 2022 at 11:00:46AM -0300, Guilherme G. Piccoli wrote: > On 18/09/2022 10:58, Guilherme G. Piccoli wrote: > > On 19/08/2022 19:17, Guilherme G. Piccoli wrote: > >> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs > >> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which > >> is responsible for that. IRQs are architecturally masked when we > >> take an interrupt, but FIQs are high priority than IRQs, hence they > >> aren't masked. With that said, it makes sense to disable FIQs here, > >> but there's no need for (re-)disabling IRQs. > >> > >> More than that: there is an alternative path for disabling CPUs, > >> in the form of function crash_smp_send_stop(), which is used for > >> kexec/panic path. This function relies on a SMP call that also > >> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but > >> without disabling FIQs. This might lead to odd scenarios, like > >> early interrupts in the boot of kexec'd kernel or even interrupts > >> in secondary "disabled" CPUs while the main one still works in the > >> panic path and assumes all secondary CPUs are (really!) off. > >> > >> So, let's disable FIQs in both paths and *not* disable IRQs a second > >> time, since they are already masked in both paths by the architecture. > >> This way, we keep both CPU quiesce paths consistent and safe. > >> > >> Cc: Marc Zyngier <maz@kernel.org> > >> Cc: Michael Kelley <mikelley@microsoft.com> > >> Cc: Russell King <linux@armlinux.org.uk> > >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > >> > > Monthly ping - let me know if there's something I should improve in > order this fix is considered! Patches don't get applied unless they end up in the patch system. Thanks.
On 17/10/2022 11:17, Russell King (Oracle) wrote: > [...] >> Monthly ping - let me know if there's something I should improve in >> order this fix is considered! > > Patches don't get applied unless they end up in the patch system. > Thanks. > Thanks Russell! Can you show me some documentation on how should I send the patches to this patch system? My understanding based in the MAINTAINERS file is that we should send the arm32 patches to you + arm ML. Cheers, Guilherme
On Mon, Oct 17, 2022 at 11:50:05AM -0300, Guilherme G. Piccoli wrote: > On 17/10/2022 11:17, Russell King (Oracle) wrote: > > [...] > >> Monthly ping - let me know if there's something I should improve in > >> order this fix is considered! > > > > Patches don't get applied unless they end up in the patch system. > > Thanks. > > > > Thanks Russell! Can you show me some documentation on how should I send > the patches to this patch system? My understanding based in the > MAINTAINERS file is that we should send the arm32 patches to you + arm ML. Look below in my signature --. | v
On 17/10/2022 14:47, Russell King (Oracle) wrote: > On Mon, Oct 17, 2022 at 11:50:05AM -0300, Guilherme G. Piccoli wrote: >> On 17/10/2022 11:17, Russell King (Oracle) wrote: >>> [...] >>>> Monthly ping - let me know if there's something I should improve in >>>> order this fix is considered! >>> >>> Patches don't get applied unless they end up in the patch system. >>> Thanks. >>> >> >> Thanks Russell! Can you show me some documentation on how should I send >> the patches to this patch system? My understanding based in the >> MAINTAINERS file is that we should send the arm32 patches to you + arm ML. > > Look below in my signature --. > | > v Thank you! It seems I was able to submit it properly now: https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9257/1 Cheers, Guilherme
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index f567032a09c0..0b482bcb97f7 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -77,6 +77,8 @@ void machine_crash_nonpanic_core(void *unused) { struct pt_regs regs; + local_fiq_disable(); + crash_setup_regs(®s, get_irq_regs()); printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has crashed\n", smp_processor_id()); diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 978db2d96b44..36e6efad89f3 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -600,6 +600,8 @@ static DEFINE_RAW_SPINLOCK(stop_lock); */ static void ipi_cpu_stop(unsigned int cpu) { + local_fiq_disable(); + if (system_state <= SYSTEM_RUNNING) { raw_spin_lock(&stop_lock); pr_crit("CPU%u: stopping\n", cpu); @@ -609,9 +611,6 @@ static void ipi_cpu_stop(unsigned int cpu) set_cpu_online(cpu, false); - local_fiq_disable(); - local_irq_disable(); - while (1) { cpu_relax(); wfe();
Currently the regular CPU shutdown path for ARM disables IRQs/FIQs in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which is responsible for that. IRQs are architecturally masked when we take an interrupt, but FIQs are high priority than IRQs, hence they aren't masked. With that said, it makes sense to disable FIQs here, but there's no need for (re-)disabling IRQs. More than that: there is an alternative path for disabling CPUs, in the form of function crash_smp_send_stop(), which is used for kexec/panic path. This function relies on a SMP call that also triggers a busy-wait loop [at machine_crash_nonpanic_core()], but without disabling FIQs. This might lead to odd scenarios, like early interrupts in the boot of kexec'd kernel or even interrupts in secondary "disabled" CPUs while the main one still works in the panic path and assumes all secondary CPUs are (really!) off. So, let's disable FIQs in both paths and *not* disable IRQs a second time, since they are already masked in both paths by the architecture. This way, we keep both CPU quiesce paths consistent and safe. Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> --- V3: - No changes. V2: - Small wording improvement (thanks Michael Kelley); - Only disable FIQs, since IRQs are masked by architecture definition when we take an interrupt. Thanks a lot Russell and Marc for the discussion [0]. Should we add a Fixes tag here? If so, maybe the proper target is: b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI") [0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@shell.armlinux.org.uk/ arch/arm/kernel/machine_kexec.c | 2 ++ arch/arm/kernel/smp.c | 5 ++--- 2 files changed, 4 insertions(+), 3 deletions(-)