| Message ID | 8f53a31cf8bbfdd73eb289e078addc31c5a19fcf.1457977403.git.geoff@infradead.org (mailing list archive) |
|---|---|
| State | New, archived |
Hi!

On 14/03/16 17:48, Geoff Levand wrote:
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>
> Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> and save registers' status in per-cpu ELF notes before starting crash
> dump kernel. See kernel_kexec().
> Even if not all secondary cpus have shut down, we do kdump anyway.
>
> As we don't have to make non-boot(crashed) cpus offline (to preserve
> correct status of cpus at crash dump) before shutting down, this patch
> also adds a variant of smp_send_stop().
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>

> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index b1adc51..76402c6cd 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -701,6 +705,28 @@ static void ipi_cpu_stop(unsigned int cpu)
> cpu_relax();
> }
>
> +static atomic_t waiting_for_crash_ipi;
> +
> +static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> +{
> + crash_save_cpu(regs, cpu);
> +
> + raw_spin_lock(&stop_lock);
> + pr_debug("CPU%u: stopping\n", cpu);
> + raw_spin_unlock(&stop_lock);
> +
> + atomic_dec(&waiting_for_crash_ipi);
> +
> + local_irq_disable();

Aren't irqs already disabled here? - or is this a 'just make sure'....

> +
> + if (cpu_ops[cpu]->cpu_die)
> + cpu_ops[cpu]->cpu_die(cpu);
> +
> + /* just in case */
> + while (1)
> + wfi();
> +}
> +
> /*
> * Main handler for inter-processor interrupts
> */
> @@ -731,6 +757,12 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
> irq_exit();
> break;
>
> + case IPI_CPU_CRASH_STOP:
> + irq_enter();
> + ipi_cpu_crash_stop(cpu, regs);
> + irq_exit();

This made me jump: irq_exit() may end up in the __do_softirq() (with irqs turned
back on!) ... but these lines are impossible to reach. Maybe get the compiler to
enforce this with an unreachable() instead?

> + break;
> +
> #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
> case IPI_TIMER:
> irq_enter();
> @@ -791,6 +823,30 @@ void smp_send_stop(void)
> pr_warning("SMP: failed to stop secondary CPUs\n");
> }
>
> +void smp_send_crash_stop(void)
> +{
> + cpumask_t mask;
> + unsigned long timeout;
> +
> + if (num_online_cpus() == 1)
> + return;
> +
> + cpumask_copy(&mask, cpu_online_mask);
> + cpumask_clear_cpu(smp_processor_id(), &mask);
> +
> + atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> +
> + smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
> +
> + /* Wait up to one second for other CPUs to stop */
> + timeout = USEC_PER_SEC;
> + while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
> + udelay(1);
> +
> + if (atomic_read(&waiting_for_crash_ipi) > 0)
> + pr_warn("SMP: failed to stop secondary CPUs\n");
> +}
> +
> /*
> * not supported here
> */
>

Thanks,

James
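For reference, the change suggested in the last comment would amount to something like the sketch below (an illustration of the idea only, not code from the posted series): since ipi_cpu_crash_stop() never returns, the irq_exit() and break after it can never execute, and marking the spot with the kernel's unreachable() macro records that for the compiler.

	case IPI_CPU_CRASH_STOP:
		irq_enter();
		ipi_cpu_crash_stop(cpu, regs);

		/*
		 * ipi_cpu_crash_stop() never returns, so irq_exit() (and any
		 * softirq processing it could trigger) cannot happen here.
		 */
		unreachable();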
Hi!

On 18/03/16 18:08, James Morse wrote:
> On 14/03/16 17:48, Geoff Levand wrote:
>> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>>
>> Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
>> and save registers' status in per-cpu ELF notes before starting crash
>> dump kernel. See kernel_kexec().
>> Even if not all secondary cpus have shut down, we do kdump anyway.
>>
>> As we don't have to make non-boot(crashed) cpus offline (to preserve
>> correct status of cpus at crash dump) before shutting down, this patch
>> also adds a variant of smp_send_stop().

>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index b1adc51..76402c6cd 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -701,6 +705,28 @@ static void ipi_cpu_stop(unsigned int cpu)
>> cpu_relax();
>> }
>>
>> +static atomic_t waiting_for_crash_ipi;
>> +
>> +static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
>> +{
>> + crash_save_cpu(regs, cpu);
>> +
>> + raw_spin_lock(&stop_lock);
>> + pr_debug("CPU%u: stopping\n", cpu);
>> + raw_spin_unlock(&stop_lock);
>> +
>> + atomic_dec(&waiting_for_crash_ipi);
>> +
>> + local_irq_disable();
>> +
>> + if (cpu_ops[cpu]->cpu_die)
>> + cpu_ops[cpu]->cpu_die(cpu);
>> +
>> + /* just in case */
>> + while (1)
>> + wfi();

Having thought about this some more: I don't think spinning like this is safe.
We need to spin with the MMU turned off, otherwise this core will pollute the
kdump kernel with TLB entries from the old page tables.

Suzuki added code to
catch this happening with cpu hotplug (grep CPU_STUCK_IN_KERNEL in
arm64/for-next/core), but that won't help here. If 'CPU_STUCK_IN_KERNEL' was set
by a core, I don't think we can kexec/kdump for this reason.

Something like cpu_die() for spin-table is needed, naively I think it should
turn the MMU off, and jump back into the secondary_holding_pen, but the core
would still be stuck in the kernel, and the memory addresses associated with
secondary_holding_pen can't be re-used. (which is fine for kdump, but not kexec)

Thanks,

James
On Fri, Mar 18, 2016 at 06:08:15PM +0000, James Morse wrote:
> Hi!
>
> On 14/03/16 17:48, Geoff Levand wrote:
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >
> > Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> > and save registers' status in per-cpu ELF notes before starting crash
> > dump kernel. See kernel_kexec().
> > Even if not all secondary cpus have shut down, we do kdump anyway.
> >
> > As we don't have to make non-boot(crashed) cpus offline (to preserve
> > correct status of cpus at crash dump) before shutting down, this patch
> > also adds a variant of smp_send_stop().
> >
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index b1adc51..76402c6cd 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -701,6 +705,28 @@ static void ipi_cpu_stop(unsigned int cpu)
> > cpu_relax();
> > }
> >
> > +static atomic_t waiting_for_crash_ipi;
> > +
> > +static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> > +{
> > + crash_save_cpu(regs, cpu);
> > +
> > + raw_spin_lock(&stop_lock);
> > + pr_debug("CPU%u: stopping\n", cpu);
> > + raw_spin_unlock(&stop_lock);
> > +
> > + atomic_dec(&waiting_for_crash_ipi);
> > +
> > + local_irq_disable();
>
> Aren't irqs already disabled here? - or is this a 'just make sure'....

Well, it also exists in ipi_cpu_stop() ...

> > +
> > + if (cpu_ops[cpu]->cpu_die)
> > + cpu_ops[cpu]->cpu_die(cpu);
> > +
> > + /* just in case */
> > + while (1)
> > + wfi();
> > +}
> > +
> > /*
> > * Main handler for inter-processor interrupts
> > */
> > @@ -731,6 +757,12 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
> > irq_exit();
> > break;
> >
> > + case IPI_CPU_CRASH_STOP:
> > + irq_enter();
> > + ipi_cpu_crash_stop(cpu, regs);
> > + irq_exit();
>
> This made me jump: irq_exit() may end up in the __do_softirq() (with irqs turned
> back on!) ... but these lines are impossible to reach. Maybe get the compiler to
> enforce this with an unreachable() instead?

I'm not sure how effective unreachable() is here, but OK I will add it.

Thanks,
-Takahiro AKASHI

> > + break;
> > +
> > #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
> > case IPI_TIMER:
> > irq_enter();
> > @@ -791,6 +823,30 @@ void smp_send_stop(void)
> > pr_warning("SMP: failed to stop secondary CPUs\n");
> > }
> >
> > +void smp_send_crash_stop(void)
> > +{
> > + cpumask_t mask;
> > + unsigned long timeout;
> > +
> > + if (num_online_cpus() == 1)
> > + return;
> > +
> > + cpumask_copy(&mask, cpu_online_mask);
> > + cpumask_clear_cpu(smp_processor_id(), &mask);
> > +
> > + atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> > +
> > + smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
> > +
> > + /* Wait up to one second for other CPUs to stop */
> > + timeout = USEC_PER_SEC;
> > + while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
> > + udelay(1);
> > +
> > + if (atomic_read(&waiting_for_crash_ipi) > 0)
> > + pr_warn("SMP: failed to stop secondary CPUs\n");
> > +}
> > +
> > /*
> > * not supported here
> > */
> >
>
> Thanks,
>
> James
>
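For comparison, the existing ipi_cpu_stop() that Takahiro points to looked roughly like this at the time (reproduced from memory as a sketch, so details may differ slightly); it uses the same belt-and-braces local_irq_disable() before parking the CPU:

	static void ipi_cpu_stop(unsigned int cpu)
	{
		if (system_state == SYSTEM_BOOTING ||
		    system_state == SYSTEM_RUNNING) {
			raw_spin_lock(&stop_lock);
			pr_crit("CPU%u: stopping\n", cpu);
			dump_stack();
			raw_spin_unlock(&stop_lock);
		}

		set_cpu_online(cpu, false);

		/* same 'just make sure' disable being discussed above */
		local_irq_disable();

		while (1)
			cpu_relax();
	}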
On Mon, Mar 21, 2016 at 01:29:28PM +0000, James Morse wrote:
> Hi!
>
> On 18/03/16 18:08, James Morse wrote:
> > On 14/03/16 17:48, Geoff Levand wrote:
> >> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >>
> >> Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
> >> and save registers' status in per-cpu ELF notes before starting crash
> >> dump kernel. See kernel_kexec().
> >> Even if not all secondary cpus have shut down, we do kdump anyway.
> >>
> >> As we don't have to make non-boot(crashed) cpus offline (to preserve
> >> correct status of cpus at crash dump) before shutting down, this patch
> >> also adds a variant of smp_send_stop().
>
> >> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> >> index b1adc51..76402c6cd 100644
> >> --- a/arch/arm64/kernel/smp.c
> >> +++ b/arch/arm64/kernel/smp.c
> >> @@ -701,6 +705,28 @@ static void ipi_cpu_stop(unsigned int cpu)
> >> cpu_relax();
> >> }
> >>
> >> +static atomic_t waiting_for_crash_ipi;
> >> +
> >> +static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
> >> +{
> >> + crash_save_cpu(regs, cpu);
> >> +
> >> + raw_spin_lock(&stop_lock);
> >> + pr_debug("CPU%u: stopping\n", cpu);
> >> + raw_spin_unlock(&stop_lock);
> >> +
> >> + atomic_dec(&waiting_for_crash_ipi);
> >> +
> >> + local_irq_disable();
> >> +
> >> + if (cpu_ops[cpu]->cpu_die)
> >> + cpu_ops[cpu]->cpu_die(cpu);
> >> +
> >> + /* just in case */
> >> + while (1)
> >> + wfi();
>
> Having thought about this some more: I don't think spinning like this is safe.
> We need to spin with the MMU turned off, otherwise this core will pollute the
> kdump kernel with TLB entries from the old page tables.

I think that wfi() will never wake up since local interrupts are disabled
here. So how can it pollute the kdump kernel?

> Suzuki added code to
> catch this happening with cpu hotplug (grep CPU_STUCK_IN_KERNEL in
> arm64/for-next/core), but that won't help here. If 'CPU_STUCK_IN_KERNEL' was set
> by a core, I don't think we can kexec/kdump for this reason.

I will need to look into Suzuki's code.

> Something like cpu_die() for spin-table is needed, naively I think it should
> turn the MMU off, and jump back into the secondary_holding_pen, but the core
> would still be stuck in the kernel, and the memory addresses associated with
> secondary_holding_pen can't be re-used. (which is fine for kdump, but not kexec)

Please note that the code is exercised only in kdump case through
machine_crash_shutdown().

Thanks,
-Takahiro AKASHI

>
> Thanks,
>
> James
>
On 31/03/16 08:57, AKASHI Takahiro wrote:
> On Mon, Mar 21, 2016 at 01:29:28PM +0000, James Morse wrote:
>> Hi!
>>
>> On 18/03/16 18:08, James Morse wrote:
>>> On 14/03/16 17:48, Geoff Levand wrote:
>>>> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>>>>
>>>> Primary kernel calls machine_crash_shutdown() to shut down non-boot cpus
>>>> and save registers' status in per-cpu ELF notes before starting crash
>>>> dump kernel. See kernel_kexec().
>>>> Even if not all secondary cpus have shut down, we do kdump anyway.
>>>>
>>>> As we don't have to make non-boot(crashed) cpus offline (to preserve
>>>> correct status of cpus at crash dump) before shutting down, this patch
>>>> also adds a variant of smp_send_stop().
>>
>>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>>> index b1adc51..76402c6cd 100644
>>>> --- a/arch/arm64/kernel/smp.c
>>>> +++ b/arch/arm64/kernel/smp.c
>>>> @@ -701,6 +705,28 @@ static void ipi_cpu_stop(unsigned int cpu)
>>>> cpu_relax();
>>>> }
>>>>
>>>> +static atomic_t waiting_for_crash_ipi;
>>>> +
>>>> +static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
>>>> +{
>>>> + crash_save_cpu(regs, cpu);
>>>> +
>>>> + raw_spin_lock(&stop_lock);
>>>> + pr_debug("CPU%u: stopping\n", cpu);
>>>> + raw_spin_unlock(&stop_lock);
>>>> +
>>>> + atomic_dec(&waiting_for_crash_ipi);
>>>> +
>>>> + local_irq_disable();
>>>> +
>>>> + if (cpu_ops[cpu]->cpu_die)
>>>> + cpu_ops[cpu]->cpu_die(cpu);
>>>> +
>>>> + /* just in case */
>>>> + while (1)
>>>> + wfi();
>>
>> Having thought about this some more: I don't think spinning like this is safe.
>> We need to spin with the MMU turned off, otherwise this core will pollute the
>> kdump kernel with TLB entries from the old page tables.
>
> I think that wfi() will never wake up since local interrupts are disabled
> here. So how can it pollute the kdump kernel?

Having interrupts disabled doesn't prevent an exit from WFI. Quite the
opposite, actually. It is designed to wake-up the core when something
happens on the external interface.

Thanks,

	M.
On Thu, Mar 31, 2016 at 09:12:32AM +0100, Marc Zyngier wrote:
> On 31/03/16 08:57, AKASHI Takahiro wrote:
> > On Mon, Mar 21, 2016 at 01:29:28PM +0000, James Morse wrote:
> >> On 18/03/16 18:08, James Morse wrote:
> >>> On 14/03/16 17:48, Geoff Levand wrote:
> >>>> + /* just in case */
> >>>> + while (1)
> >>>> + wfi();
> >>
> >> Having thought about this some more: I don't think spinning like this is safe.
> >> We need to spin with the MMU turned off, otherwise this core will pollute the
> >> kdump kernel with TLB entries from the old page tables.
> >
> > I think that wfi() will never wake up since local interrupts are disabled
> > here. So how can it pollute the kdump kernel?
>
> Having interrupts disabled doesn't prevent an exit from WFI. Quite the
> opposite, actually. It is designed to wake-up the core when something
> happens on the external interface.

Further, WFI is a hint, and may simply act as a NOP.

The ARM ARM calls this out (see "D1.17.2" Wait For Interrupt in ARM DDI
0487A.i):

    Because the architecture permits a PE to leave the low-power
    state for any reason, it is permissible for a PE to treat WFI as
    a NOP, but this is not recommended for lowest power operation.

Thanks,
Mark.
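For context, wfi() in the arm64 kernel is only a thin wrapper around the hint instruction, roughly as below (from arch/arm64/include/asm/barrier.h of that era, reproduced from memory). Whether the core actually enters a low-power state, and for how long, is entirely implementation defined, so the surrounding while (1) loop keeps fetching and executing from the old kernel's mappings whenever the core is awake.

	/* a hint to the CPU; it may stop the clock, or do nothing at all */
	#define wfi()	asm volatile("wfi" : : : "memory")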
On 31/03/16 08:46, AKASHI Takahiro wrote:
> James Morse wrote:
> > This made me jump: irq_exit() may end up in the __do_softirq() (with irqs turned
> > back on!) ... but these lines are impossible to reach. Maybe get the compiler to
> > enforce this with an unreachable() instead?
>
> I'm not sure how effective unreachable() is here, but OK I will add it.

I thought '__builtin_unreachable()' would generate a warning if it was
reachable, but from [0] it looks like it just suppresses warnings. You're
right, it won't help.

Thanks,

James

[0] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
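For reference, in kernels of this era the unreachable() macro was approximately the following (the gcc variant from include/linux/compiler-gcc.h and the generic fallback, shown from memory). It is purely an optimisation and documentation hint; nothing at build time verifies that the location really is unreachable, which is why it cannot enforce anything here.

	/* gcc: tell the optimiser that control flow never reaches this point */
	#define unreachable()	__builtin_unreachable()

	/* fallback used when the compiler has no such builtin */
	#ifndef unreachable
	# define unreachable()	do { } while (1)
	#endif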
On Thu, Mar 31, 2016 at 11:10:38AM +0100, Mark Rutland wrote:
> On Thu, Mar 31, 2016 at 09:12:32AM +0100, Marc Zyngier wrote:
> > On 31/03/16 08:57, AKASHI Takahiro wrote:
> > > On Mon, Mar 21, 2016 at 01:29:28PM +0000, James Morse wrote:
> > >> On 18/03/16 18:08, James Morse wrote:
> > >>> On 14/03/16 17:48, Geoff Levand wrote:
> > >>>> + /* just in case */
> > >>>> + while (1)
> > >>>> + wfi();
> > >>
> > >> Having thought about this some more: I don't think spinning like this is safe.
> > >> We need to spin with the MMU turned off, otherwise this core will pollute the
> > >> kdump kernel with TLB entries from the old page tables.
> > >
> > > I think that wfi() will never wake up since local interrupts are disabled
> > > here. So how can it pollute the kdump kernel?
> >
> > Having interrupts disabled doesn't prevent an exit from WFI. Quite the
> > opposite, actually. It is designed to wake-up the core when something
> > happens on the external interface.
>
> Further, WFI is a hint, and may simply act as a NOP.

Ah, OK. But even so, none of interrupt handlers (nor other code) will
be executed after cpu wakes up, and the memory won't be polluted.
Or do I still miss something?

-Takahiro AKASHI

> The ARM ARM calls this out (see "D1.17.2" Wait For Interrupt in ARM DDI
> 0487A.i):
> Because the architecture permits a PE to leave the low-power
> state for any reason, it is permissible for a PE to treat WFI as
> a NOP, but this is not recommended for lowest power operation.
>
> Thanks,
> Mark.
On Fri, Apr 01, 2016 at 05:45:09PM +0900, AKASHI Takahiro wrote:
> On Thu, Mar 31, 2016 at 11:10:38AM +0100, Mark Rutland wrote:
> > On Thu, Mar 31, 2016 at 09:12:32AM +0100, Marc Zyngier wrote:
> > > On 31/03/16 08:57, AKASHI Takahiro wrote:
> > > > On Mon, Mar 21, 2016 at 01:29:28PM +0000, James Morse wrote:
> > > >> On 18/03/16 18:08, James Morse wrote:
> > > >>> On 14/03/16 17:48, Geoff Levand wrote:
> > > >>>> + /* just in case */
> > > >>>> + while (1)
> > > >>>> + wfi();
> > > >>
> > > >> Having thought about this some more: I don't think spinning like this is safe.
> > > >> We need to spin with the MMU turned off, otherwise this core will pollute the
> > > >> kdump kernel with TLB entries from the old page tables.
> > > >
> > > > I think that wfi() will never wake up since local interrupts are disabled
> > > > here. So how can it pollute the kdump kernel?
> > >
> > > Having interrupts disabled doesn't prevent an exit from WFI. Quite the
> > > opposite, actually. It is designed to wake-up the core when something
> > > happens on the external interface.
> >
> > Further, WFI is a hint, and may simply act as a NOP.
>
> Ah, OK. But even so, none of interrupt handlers (nor other code) will
> be executed after cpu wakes up, and the memory won't be polluted.

The code comprising the while(1) loop will be executed, and TLB walks,
speculative fetches, etc may occur regardless.

We don't share TLB entries between cores (we don't have support for
ARMv8.2s CnP), and the kdump kernel should be running in a carveout from
main memory (which IIUC is not mapped by the original kernel). So
normally, this would not be a problem.

However, if there is a problem with the page tables (e.g. entries
erroneously point into a PA range the kdump kernel is using), then
unavoidable background memory traffic from the CPU may cause problems
for the kdump kernel.

Thanks,
Mark.
Mark,

On Fri, Apr 01, 2016 at 10:36:35AM +0100, Mark Rutland wrote:
> On Fri, Apr 01, 2016 at 05:45:09PM +0900, AKASHI Takahiro wrote:
> > On Thu, Mar 31, 2016 at 11:10:38AM +0100, Mark Rutland wrote:
> > > On Thu, Mar 31, 2016 at 09:12:32AM +0100, Marc Zyngier wrote:
> > > > On 31/03/16 08:57, AKASHI Takahiro wrote:
> > > > > On Mon, Mar 21, 2016 at 01:29:28PM +0000, James Morse wrote:
> > > > >> On 18/03/16 18:08, James Morse wrote:
> > > > >>> On 14/03/16 17:48, Geoff Levand wrote:
> > > > >>>> + /* just in case */
> > > > >>>> + while (1)
> > > > >>>> + wfi();
> > > > >>
> > > > >> Having thought about this some more: I don't think spinning like this is safe.
> > > > >> We need to spin with the MMU turned off, otherwise this core will pollute the
> > > > >> kdump kernel with TLB entries from the old page tables.
> > > > >
> > > > > I think that wfi() will never wake up since local interrupts are disabled
> > > > > here. So how can it pollute the kdump kernel?
> > > >
> > > > Having interrupts disabled doesn't prevent an exit from WFI. Quite the
> > > > opposite, actually. It is designed to wake-up the core when something
> > > > happens on the external interface.
> > >
> > > Further, WFI is a hint, and may simply act as a NOP.
> >
> > Ah, OK. But even so, none of interrupt handlers (nor other code) will
> > be executed after cpu wakes up, and the memory won't be polluted.
>
> The code comprising the while(1) loop will be executed, and TLB walks,
> speculative fetches, etc may occur regardless.
>
> We don't share TLB entries between cores (we don't have support for
> ARMv8.2s CnP), and the kdump kernel should be running in a carveout from
> main memory (which IIUC is not mapped by the original kernel). So
> normally, this would not be a problem.

In fact, the memory region for crash kernel will be just memblock_reserve()'d.
We can't do memblock_remove() here because that region is expected to exist
as part of system memory. (For details, see kimage_load_crash_segment()
in kexec_core.c.)

Should we remove it from kernel mapping?

> However, if there is a problem with the page tables (e.g. entries
> erroneously point into a PA range the kdump kernel is using), then
> unavoidable background memory traffic from the CPU may cause problems
> for the kdump kernel.

But the traffic generated by the cpu will concentrate on the area around
the while loop, ipi_cpu_crash_stop(), in the *crashed* kernel's memory.
(I'm not sure about speculative behaviors.) So I don't think it will
hurt kdump kernel.

Thanks,
-Takahiro AKASHI

> Thanks,
> Mark.
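As a rough illustration of what "just memblock_reserve()'d" means here: the crash-kernel carveout is reserved along the lines of the sketch below (a simplified example following the pattern used on other architectures; reserve_crashkernel() and the omitted base-address allocation are illustrative, not the arm64 code under discussion). The region stays part of System RAM and keeps its kernel (linear) mapping; it is only protected from the page allocator.

	static void __init reserve_crashkernel(void)
	{
		unsigned long long crash_base, crash_size;

		/* parse "crashkernel=" from the command line */
		if (parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
				      &crash_size, &crash_base) || !crash_size)
			return;

		/* (allocating a base when none was specified is omitted here) */

		/* reserved, but still mapped and still reported as System RAM */
		memblock_reserve(crash_base, crash_size);

		crashk_res.start = crash_base;
		crashk_res.end   = crash_base + crash_size - 1;
	}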
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index a57601f..8740297 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI 5
+#define NR_IPI 6
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 04744dc..2f089b3 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -30,6 +30,10 @@
 
 #ifndef __ASSEMBLY__
 
+#ifndef CONFIG_KEXEC_CORE
+#define crash_save_cpu(regs, cpu)
+#endif
+
 /**
  * crash_setup_regs() - save registers for the panic kernel
  *
@@ -40,7 +44,37 @@
 static inline void crash_setup_regs(struct pt_regs *newregs,
 				    struct pt_regs *oldregs)
 {
-	/* Empty routine needed to avoid build errors. */
+	if (oldregs) {
+		memcpy(newregs, oldregs, sizeof(*newregs));
+	} else {
+		__asm__ __volatile__ (
+			"stp	 x0,  x1, [%3, #16 * 0]\n"
+			"stp	 x2,  x3, [%3, #16 * 1]\n"
+			"stp	 x4,  x5, [%3, #16 * 2]\n"
+			"stp	 x6,  x7, [%3, #16 * 3]\n"
+			"stp	 x8,  x9, [%3, #16 * 4]\n"
+			"stp	x10, x11, [%3, #16 * 5]\n"
+			"stp	x12, x13, [%3, #16 * 6]\n"
+			"stp	x14, x15, [%3, #16 * 7]\n"
+			"stp	x16, x17, [%3, #16 * 8]\n"
+			"stp	x18, x19, [%3, #16 * 9]\n"
+			"stp	x20, x21, [%3, #16 * 10]\n"
+			"stp	x22, x23, [%3, #16 * 11]\n"
+			"stp	x24, x25, [%3, #16 * 12]\n"
+			"stp	x26, x27, [%3, #16 * 13]\n"
+			"stp	x28, x29, [%3, #16 * 14]\n"
+			"str	x30, [%3, #16 * 15]\n"
+			"mov	%0, sp\n"
+			"adr	%1, 1f\n"
+			"mrs	%2, spsr_el1\n"
+			"1:"
+			: "=r" (newregs->sp),
+			  "=r" (newregs->pc),
+			  "=r" (newregs->pstate)
+			: "r" (&newregs->regs)
+			: "memory"
+		);
+	}
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d9c3d6a..0e42ece 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -69,4 +69,9 @@ extern int __cpu_disable(void);
 extern void __cpu_die(unsigned int cpu);
 extern void cpu_die(void);
 
+/*
+ * for crash dump
+ */
+extern void smp_send_crash_stop(void);
+
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 75f6696..8651b27 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -10,6 +10,9 @@
  */
 
 #include <linux/highmem.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/libfdt_env.h>
 #include <linux/of_fdt.h>
@@ -25,6 +28,7 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;
 
+bool in_crash_kexec;
 static unsigned long kimage_start;
 
 /**
@@ -150,7 +154,7 @@ void machine_kexec(struct kimage *kimage)
 	phys_addr_t reboot_code_buffer_phys;
 	void *reboot_code_buffer;
 
-	BUG_ON(num_online_cpus() > 1);
+	BUG_ON((num_online_cpus() > 1) && !WARN_ON(in_crash_kexec));
 
 	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
 	reboot_code_buffer = kmap(kimage->control_code_page);
@@ -209,13 +213,58 @@ void machine_kexec(struct kimage *kimage)
 	 * relocation is complete.
 	 */
 
-	cpu_soft_restart(is_hyp_mode_available(),
+	cpu_soft_restart(in_crash_kexec ? 0 : is_hyp_mode_available(),
 		reboot_code_buffer_phys, kimage->head, kimage_start, 0);
 
 	BUG(); /* Should never get here. */
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-crashing cpus and save registers
+ */
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	/* Empty routine needed to avoid build errors. */
+	local_irq_disable();
+
+	in_crash_kexec = true;
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	/* for crashing cpu */
+	crash_save_cpu(regs, smp_processor_id());
+	machine_kexec_mask_interrupts();
+
+	pr_info("Starting crashdump kernel...\n");
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..76402c6cd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include <linux/completion.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
+#include <linux/kexec.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -44,6 +45,7 @@
 #include <asm/cpu.h>
 #include <asm/cputype.h>
 #include <asm/cpu_ops.h>
+#include <asm/kexec.h>
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -68,6 +70,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 };
@@ -625,6 +628,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_RESCHEDULE, "Rescheduling interrupts"),
 	S(IPI_CALL_FUNC, "Function call interrupts"),
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
+	S(IPI_CPU_CRASH_STOP, "CPU stop (for crash dump) interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
 };
@@ -701,6 +705,28 @@ static void ipi_cpu_stop(unsigned int cpu)
 		cpu_relax();
 }
 
+static atomic_t waiting_for_crash_ipi;
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+	crash_save_cpu(regs, cpu);
+
+	raw_spin_lock(&stop_lock);
+	pr_debug("CPU%u: stopping\n", cpu);
+	raw_spin_unlock(&stop_lock);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+
+	/* just in case */
+	while (1)
+		wfi();
+}
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -731,6 +757,12 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 
+	case IPI_CPU_CRASH_STOP:
+		irq_enter();
+		ipi_cpu_crash_stop(cpu, regs);
+		irq_exit();
+		break;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
 		irq_enter();
@@ -791,6 +823,30 @@ void smp_send_stop(void)
 		pr_warning("SMP: failed to stop secondary CPUs\n");
 }
 
+void smp_send_crash_stop(void)
+{
+	cpumask_t mask;
+	unsigned long timeout;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+
+	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("SMP: failed to stop secondary CPUs\n");
+}
+
 /*
  * not supported here
  */
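For context on where the new hooks sit in the overall flow: on a panic, the generic code in kernel/kexec_core.c (shown below in simplified form, from memory; the real function also guards against panic races) saves the crashing CPU's registers via crash_setup_regs(), then calls the architecture's machine_crash_shutdown(), which with this patch stops the secondaries via smp_send_crash_stop() and saves their state, before jumping into the loaded crash kernel.

	void crash_kexec(struct pt_regs *regs)
	{
		/* trylock: this may run from a panic, so never sleep here */
		if (mutex_trylock(&kexec_mutex)) {
			if (kexec_crash_image) {
				struct pt_regs fixed_regs;

				crash_setup_regs(&fixed_regs, regs);
				crash_save_vmcoreinfo();
				machine_crash_shutdown(&fixed_regs);
				machine_kexec(kexec_crash_image);
			}
			mutex_unlock(&kexec_mutex);
		}
	}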