Message ID | 20220427224924.592546-3-gpiccoli@igalia.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | The panic notifiers refactor | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch, async |
From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM > > Currently the regular CPU shutdown path for ARM disables IRQs/FIQs > in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which > is responsible for that. This makes sense, since we're turning off > such CPUs, putting them in an endless busy-wait loop. > > Problem is that there is an alternative path for disabling CPUs, > in the form of function crash_smp_send_stop(), used for kexec/panic > paths. This functions relies in a SMP call that also triggers a s/functions relies in/function relies on/ > busy-wait loop [at machine_crash_nonpanic_core()], but *without* > disabling interrupts. This might lead to odd scenarios, like early > interrupts in the boot of kexec'd kernel or even interrupts in > other CPUs while the main one still works in the panic path and > assumes all secondary CPUs are (really!) off. > > This patch mimics the ipi_cpu_stop() interrupt disable mechanism > in the crash CPU shutdown path, hence disabling IRQs/FIQs in all > secondary CPUs in the kexec/panic path as well. > > Cc: Marc Zyngier <maz@kernel.org> > Cc: Russell King <linux@armlinux.org.uk> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > arch/arm/kernel/machine_kexec.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index f567032a09c0..ef788ee00519 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused) > set_cpu_online(smp_processor_id(), false); > atomic_dec(&waiting_for_crash_ipi); > > + local_fiq_disable(); > + local_irq_disable(); > + > while (1) { > cpu_relax(); > wfe(); > -- > 2.36.0
On Wed, 27 Apr 2022 23:48:56 +0100, "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote: > > Currently the regular CPU shutdown path for ARM disables IRQs/FIQs > in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which > is responsible for that. This makes sense, since we're turning off > such CPUs, putting them in an endless busy-wait loop. > > Problem is that there is an alternative path for disabling CPUs, > in the form of function crash_smp_send_stop(), used for kexec/panic > paths. This functions relies in a SMP call that also triggers a > busy-wait loop [at machine_crash_nonpanic_core()], but *without* > disabling interrupts. This might lead to odd scenarios, like early > interrupts in the boot of kexec'd kernel or even interrupts in > other CPUs while the main one still works in the panic path and > assumes all secondary CPUs are (really!) off. > > This patch mimics the ipi_cpu_stop() interrupt disable mechanism > in the crash CPU shutdown path, hence disabling IRQs/FIQs in all > secondary CPUs in the kexec/panic path as well. > > Cc: Marc Zyngier <maz@kernel.org> > Cc: Russell King <linux@armlinux.org.uk> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > arch/arm/kernel/machine_kexec.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index f567032a09c0..ef788ee00519 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused) > set_cpu_online(smp_processor_id(), false); > atomic_dec(&waiting_for_crash_ipi); > > + local_fiq_disable(); > + local_irq_disable(); > + My expectations would be that, since we're getting here using an IPI, interrupts are already masked. So what reenabled them the first place? Thanks, M.
Thanks Marc and Michael for the review/discussion. On 29/04/2022 15:20, Marc Zyngier wrote: > [...] > My expectations would be that, since we're getting here using an IPI, > interrupts are already masked. So what reenabled them the first place? > > Thanks, > > M. > Marc, I did some investigation in the code (and tried/failed in the ARM documentation as well heh), but this is still not 100% clear for me. You're saying IPI calls disable IRQs/FIQs by default in the the target CPUs? Where does it happen? I'm a bit confused if this a processor mechanism, or it's in code. Looking the smp_send_stop() in arch/arm/, it does IPI the CPUs, with the flag IPI_CPU_STOP, eventually calling ipi_cpu_stop(), and the latter does disable IRQ/FIQ in code - that's where I stole my code from. But crash_smp_send_stop() is different, it seems to IPI the other CPUs with the flag IPI_CALL_FUNC, which leads to calling generic_smp_call_function_interrupt() - does it disable interrupts/FIQs as well? I couldn't find it. Appreciate your clarifications about that, thanks again. Cheers, Guilherme
On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote: > Thanks Marc and Michael for the review/discussion. > > On 29/04/2022 15:20, Marc Zyngier wrote: > > [...] > > > My expectations would be that, since we're getting here using an IPI, > > interrupts are already masked. So what reenabled them the first place? > > > > Thanks, > > > > M. > > > > Marc, I did some investigation in the code (and tried/failed in the ARM > documentation as well heh), but this is still not 100% clear for me. > > You're saying IPI calls disable IRQs/FIQs by default in the the target > CPUs? Where does it happen? I'm a bit confused if this a processor > mechanism, or it's in code. When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are themselves interrupts, so IRQs will be masked while the IPI is being processed. Therefore, there should be no need to re-disable the already disabled interrupts. > But crash_smp_send_stop() is different, it seems to IPI the other CPUs > with the flag IPI_CALL_FUNC, which leads to calling > generic_smp_call_function_interrupt() - does it disable interrupts/FIQs > as well? I couldn't find it. It's buried in the architecture behaviour. When the CPU takes an interrupt and jumps to the interrupt vector in the vectors page, it is architecturally defined that interrupts will be disabled. If they weren't architecturally disabled at this point, then as soon as the first instruction is processed (at the interrupt vector, likely a branch) the CPU would immediately take another jump to the interrupt vector, and this process would continue indefinitely, making interrupt handling utterly useless. So, you won't find an explicit instruction in the code path from the vectors to the IPI handler that disables interrupts - because it's written into the architecture that this is what must happen. IRQs are a lower priority than FIQs, so FIQs remain unmasked.
On 29/04/2022 18:45, Russell King (Oracle) wrote: > [...] >> Marc, I did some investigation in the code (and tried/failed in the ARM >> documentation as well heh), but this is still not 100% clear for me. >> >> You're saying IPI calls disable IRQs/FIQs by default in the the target >> CPUs? Where does it happen? I'm a bit confused if this a processor >> mechanism, or it's in code. > > When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are > themselves interrupts, so IRQs will be masked while the IPI is being > processed. Therefore, there should be no need to re-disable the > already disabled interrupts. > >> But crash_smp_send_stop() is different, it seems to IPI the other CPUs >> with the flag IPI_CALL_FUNC, which leads to calling >> generic_smp_call_function_interrupt() - does it disable interrupts/FIQs >> as well? I couldn't find it. > > It's buried in the architecture behaviour. When the CPU takes an > interrupt and jumps to the interrupt vector in the vectors page, it is > architecturally defined that interrupts will be disabled. If they > weren't architecturally disabled at this point, then as soon as the > first instruction is processed (at the interrupt vector, likely a > branch) the CPU would immediately take another jump to the interrupt > vector, and this process would continue indefinitely, making interrupt > handling utterly useless. > > So, you won't find an explicit instruction in the code path from the > vectors to the IPI handler that disables interrupts - because it's > written into the architecture that this is what must happen. > > IRQs are a lower priority than FIQs, so FIQs remain unmasked. > Thanks a lot for the *great* explanation Russell, much appreciated. So, this leads to the both following questions: a) Shall we then change the patch to only disable FIQs, since it's panic path and we don't want secondary CPUs getting interrupted, but only spinning quietly "forever"? b) How about cleaning ipi_cpu_stop() then, by dropping the call to local_irq_disable() there, to avoid the double IRQ disabling? Thanks, Guilherme
On Fri, 29 Apr 2022 22:45:14 +0100, "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > > On Fri, Apr 29, 2022 at 06:38:19PM -0300, Guilherme G. Piccoli wrote: > > Thanks Marc and Michael for the review/discussion. > > > > On 29/04/2022 15:20, Marc Zyngier wrote: > > > [...] > > > > > My expectations would be that, since we're getting here using an IPI, > > > interrupts are already masked. So what reenabled them the first place? > > > > > > Thanks, > > > > > > M. > > > > > > > Marc, I did some investigation in the code (and tried/failed in the ARM > > documentation as well heh), but this is still not 100% clear for me. > > > > You're saying IPI calls disable IRQs/FIQs by default in the the target > > CPUs? Where does it happen? I'm a bit confused if this a processor > > mechanism, or it's in code. > > When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are > themselves interrupts, so IRQs will be masked while the IPI is being > processed. Therefore, there should be no need to re-disable the > already disabled interrupts. > > > But crash_smp_send_stop() is different, it seems to IPI the other CPUs > > with the flag IPI_CALL_FUNC, which leads to calling > > generic_smp_call_function_interrupt() - does it disable interrupts/FIQs > > as well? I couldn't find it. > > It's buried in the architecture behaviour. When the CPU takes an > interrupt and jumps to the interrupt vector in the vectors page, it is > architecturally defined that interrupts will be disabled. If they > weren't architecturally disabled at this point, then as soon as the > first instruction is processed (at the interrupt vector, likely a > branch) the CPU would immediately take another jump to the interrupt > vector, and this process would continue indefinitely, making interrupt > handling utterly useless. > > So, you won't find an explicit instruction in the code path from the > vectors to the IPI handler that disables interrupts - because it's > written into the architecture that this is what must happen. > > IRQs are a lower priority than FIQs, so FIQs remain unmasked. Ah, you're of course right. That's one of the huge differences between AArch32 and AArch64, where the former has per target mode masking rules, and the later masks everything on entry... M.
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index f567032a09c0..ef788ee00519 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused) set_cpu_online(smp_processor_id(), false); atomic_dec(&waiting_for_crash_ipi); + local_fiq_disable(); + local_irq_disable(); + while (1) { cpu_relax(); wfe();
Currently the regular CPU shutdown path for ARM disables IRQs/FIQs in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which is responsible for that. This makes sense, since we're turning off such CPUs, putting them in an endless busy-wait loop. Problem is that there is an alternative path for disabling CPUs, in the form of function crash_smp_send_stop(), used for kexec/panic paths. This functions relies in a SMP call that also triggers a busy-wait loop [at machine_crash_nonpanic_core()], but *without* disabling interrupts. This might lead to odd scenarios, like early interrupts in the boot of kexec'd kernel or even interrupts in other CPUs while the main one still works in the panic path and assumes all secondary CPUs are (really!) off. This patch mimics the ipi_cpu_stop() interrupt disable mechanism in the crash CPU shutdown path, hence disabling IRQs/FIQs in all secondary CPUs in the kexec/panic path as well. Cc: Marc Zyngier <maz@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> --- arch/arm/kernel/machine_kexec.c | 3 +++ 1 file changed, 3 insertions(+)