Message ID | 20240625160718.v2.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first | expand |
Hi Doug, On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote: > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) > > void smp_send_stop(void) > { > + static unsigned long stop_in_progress; > + cpumask_t mask; > unsigned long timeout; > > - if (num_other_online_cpus()) { > - cpumask_t mask; > + /* > + * If this cpu is the only one alive at this point in time, online or > + * not, there are no stop messages to be sent around, so just back out. > + */ > + if (num_other_online_cpus() == 0) > + goto skip_ipi; > > - cpumask_copy(&mask, cpu_online_mask); > - cpumask_clear_cpu(smp_processor_id(), &mask); > + /* Only proceed if this is the first CPU to reach this code */ > + if (test_and_set_bit(0, &stop_in_progress)) > + return; > > - if (system_state <= SYSTEM_RUNNING) > - pr_crit("SMP: stopping secondary CPUs\n"); > - smp_cross_call(&mask, IPI_CPU_STOP); > - } > + cpumask_copy(&mask, cpu_online_mask); > + cpumask_clear_cpu(smp_processor_id(), &mask); > > - /* Wait up to one second for other CPUs to stop */ > + if (system_state <= SYSTEM_RUNNING) > + pr_crit("SMP: stopping secondary CPUs\n"); > + > + /* > + * Start with a normal IPI and wait up to one second for other CPUs to > + * stop. We do this first because it gives other processors a chance > + * to exit critical sections / drop locks and makes the rest of the > + * stop process (especially console flush) more robust. > + */ > + smp_cross_call(&mask, IPI_CPU_STOP); I realise you've moved this out of crash_smp_send_stop() and it looks like we inherited the code from x86, but do you know how this serialise against CPU hotplug operations? I've spent the last 20m looking at the code and I can't see what prevents other CPUs from coming and going while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. Will
Hi, On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote: > > Hi Doug, > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote: > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) > > > > void smp_send_stop(void) > > { > > + static unsigned long stop_in_progress; > > + cpumask_t mask; > > unsigned long timeout; > > > > - if (num_other_online_cpus()) { > > - cpumask_t mask; > > + /* > > + * If this cpu is the only one alive at this point in time, online or > > + * not, there are no stop messages to be sent around, so just back out. > > + */ > > + if (num_other_online_cpus() == 0) > > + goto skip_ipi; > > > > - cpumask_copy(&mask, cpu_online_mask); > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > + /* Only proceed if this is the first CPU to reach this code */ > > + if (test_and_set_bit(0, &stop_in_progress)) > > + return; > > > > - if (system_state <= SYSTEM_RUNNING) > > - pr_crit("SMP: stopping secondary CPUs\n"); > > - smp_cross_call(&mask, IPI_CPU_STOP); > > - } > > + cpumask_copy(&mask, cpu_online_mask); > > + cpumask_clear_cpu(smp_processor_id(), &mask); > > > > - /* Wait up to one second for other CPUs to stop */ > > + if (system_state <= SYSTEM_RUNNING) > > + pr_crit("SMP: stopping secondary CPUs\n"); > > + > > + /* > > + * Start with a normal IPI and wait up to one second for other CPUs to > > + * stop. We do this first because it gives other processors a chance > > + * to exit critical sections / drop locks and makes the rest of the > > + * stop process (especially console flush) more robust. > > + */ > > + smp_cross_call(&mask, IPI_CPU_STOP); > > I realise you've moved this out of crash_smp_send_stop() and it looks > like we inherited the code from x86, but do you know how this serialise > against CPU hotplug operations? I've spent the last 20m looking at the > code and I can't see what prevents other CPUs from coming and going > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. I don't think there is anything. ...and it's not just this code either. It sure looks like nmi_trigger_cpumask_backtrace() has the same problem. I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not such a big deal because: 1. If a CPU goes away then we'll just time out 2. If a CPU shows up then we'll skip backtracing it, but we were sending backtraces at an instant in time anyway. In the case of smp_send_stop() it's probably fine if a CPU goes away because, again, we'll just timeout. ...but if a CPU shows up then that's not super ideal. Maybe it doesn't cause problems in practice but it does feel like it should be fixed. -Doug
On Mon, Aug 05, 2024 at 10:13:11AM -0700, Doug Anderson wrote: > On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote: > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote: > > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) > > > > > > void smp_send_stop(void) > > > { > > > + static unsigned long stop_in_progress; > > > + cpumask_t mask; > > > unsigned long timeout; > > > > > > - if (num_other_online_cpus()) { > > > - cpumask_t mask; > > > + /* > > > + * If this cpu is the only one alive at this point in time, online or > > > + * not, there are no stop messages to be sent around, so just back out. > > > + */ > > > + if (num_other_online_cpus() == 0) > > > + goto skip_ipi; > > > > > > - cpumask_copy(&mask, cpu_online_mask); > > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > > + /* Only proceed if this is the first CPU to reach this code */ > > > + if (test_and_set_bit(0, &stop_in_progress)) > > > + return; > > > > > > - if (system_state <= SYSTEM_RUNNING) > > > - pr_crit("SMP: stopping secondary CPUs\n"); > > > - smp_cross_call(&mask, IPI_CPU_STOP); > > > - } > > > + cpumask_copy(&mask, cpu_online_mask); > > > + cpumask_clear_cpu(smp_processor_id(), &mask); > > > > > > - /* Wait up to one second for other CPUs to stop */ > > > + if (system_state <= SYSTEM_RUNNING) > > > + pr_crit("SMP: stopping secondary CPUs\n"); > > > + > > > + /* > > > + * Start with a normal IPI and wait up to one second for other CPUs to > > > + * stop. We do this first because it gives other processors a chance > > > + * to exit critical sections / drop locks and makes the rest of the > > > + * stop process (especially console flush) more robust. > > > + */ > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > like we inherited the code from x86, but do you know how this serialise > > against CPU hotplug operations? I've spent the last 20m looking at the > > code and I can't see what prevents other CPUs from coming and going > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > I don't think there is anything. ...and it's not just this code > either. It sure looks like nmi_trigger_cpumask_backtrace() has the > same problem. > > I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not > such a big deal because: > 1. If a CPU goes away then we'll just time out > 2. If a CPU shows up then we'll skip backtracing it, but we were > sending backtraces at an instant in time anyway. > > In the case of smp_send_stop() it's probably fine if a CPU goes away > because, again, we'll just timeout. ...but if a CPU shows up then > that's not super ideal. Maybe it doesn't cause problems in practice > but it does feel like it should be fixed. On the other hand, I really don't fancy taking a CPU hotplug mutex on the panic() path, so it's hard to know what's best. I suppose one thing we could do is recompute the mask before sending the NMI, which should make it a little more robust in that case. It still feels fragile, but it's no worse than the existing code, I suppose. Will
On Mon, Aug 5, 2024 at 11:13 AM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote: > > > > Hi Doug, > > > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote: > > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) > > > > > > void smp_send_stop(void) > > > { > > > + static unsigned long stop_in_progress; > > > + cpumask_t mask; > > > unsigned long timeout; > > > > > > - if (num_other_online_cpus()) { > > > - cpumask_t mask; > > > + /* > > > + * If this cpu is the only one alive at this point in time, online or > > > + * not, there are no stop messages to be sent around, so just back out. > > > + */ > > > + if (num_other_online_cpus() == 0) > > > + goto skip_ipi; > > > > > > - cpumask_copy(&mask, cpu_online_mask); > > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > > + /* Only proceed if this is the first CPU to reach this code */ > > > + if (test_and_set_bit(0, &stop_in_progress)) > > > + return; > > > > > > - if (system_state <= SYSTEM_RUNNING) > > > - pr_crit("SMP: stopping secondary CPUs\n"); > > > - smp_cross_call(&mask, IPI_CPU_STOP); > > > - } > > > + cpumask_copy(&mask, cpu_online_mask); > > > + cpumask_clear_cpu(smp_processor_id(), &mask); > > > > > > - /* Wait up to one second for other CPUs to stop */ > > > + if (system_state <= SYSTEM_RUNNING) > > > + pr_crit("SMP: stopping secondary CPUs\n"); > > > + > > > + /* > > > + * Start with a normal IPI and wait up to one second for other CPUs to > > > + * stop. We do this first because it gives other processors a chance > > > + * to exit critical sections / drop locks and makes the rest of the > > > + * stop process (especially console flush) more robust. > > > + */ > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > like we inherited the code from x86, but do you know how this serialise > > against CPU hotplug operations? I've spent the last 20m looking at the > > code and I can't see what prevents other CPUs from coming and going > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > I don't think there is anything. ...and it's not just this code > either. I agree -- I noticed this a while ago [1], and I added cpus_read_lock/unlock() on the caller side, because having the locks inside callees can be a problem for callers who can't sleep. [1] https://lore.kernel.org/linux-mm/ZnkGps-vQbiynNwP@google.com/ Also, do the handlers always see `crash_stop` set by the sender? If not, would it be a problem? (In the above link, it has to do extra work to make sure that handlers eventually see any updated values.) > It sure looks like nmi_trigger_cpumask_backtrace() has the > same problem. > > I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not > such a big deal because: > 1. If a CPU goes away then we'll just time out > 2. If a CPU shows up then we'll skip backtracing it, but we were > sending backtraces at an instant in time anyway. > > In the case of smp_send_stop() it's probably fine if a CPU goes away > because, again, we'll just timeout. ...but if a CPU shows up then > that's not super ideal. Maybe it doesn't cause problems in practice > but it does feel like it should be fixed. > > -Doug
Hi, On Mon, Aug 5, 2024 at 10:24 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Aug 05, 2024 at 10:13:11AM -0700, Doug Anderson wrote: > > On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote: > > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote: > > > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) > > > > > > > > void smp_send_stop(void) > > > > { > > > > + static unsigned long stop_in_progress; > > > > + cpumask_t mask; > > > > unsigned long timeout; > > > > > > > > - if (num_other_online_cpus()) { > > > > - cpumask_t mask; > > > > + /* > > > > + * If this cpu is the only one alive at this point in time, online or > > > > + * not, there are no stop messages to be sent around, so just back out. > > > > + */ > > > > + if (num_other_online_cpus() == 0) > > > > + goto skip_ipi; > > > > > > > > - cpumask_copy(&mask, cpu_online_mask); > > > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > > > + /* Only proceed if this is the first CPU to reach this code */ > > > > + if (test_and_set_bit(0, &stop_in_progress)) > > > > + return; > > > > > > > > - if (system_state <= SYSTEM_RUNNING) > > > > - pr_crit("SMP: stopping secondary CPUs\n"); > > > > - smp_cross_call(&mask, IPI_CPU_STOP); > > > > - } > > > > + cpumask_copy(&mask, cpu_online_mask); > > > > + cpumask_clear_cpu(smp_processor_id(), &mask); > > > > > > > > - /* Wait up to one second for other CPUs to stop */ > > > > + if (system_state <= SYSTEM_RUNNING) > > > > + pr_crit("SMP: stopping secondary CPUs\n"); > > > > + > > > > + /* > > > > + * Start with a normal IPI and wait up to one second for other CPUs to > > > > + * stop. We do this first because it gives other processors a chance > > > > + * to exit critical sections / drop locks and makes the rest of the > > > > + * stop process (especially console flush) more robust. > > > > + */ > > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > > like we inherited the code from x86, but do you know how this serialise > > > against CPU hotplug operations? I've spent the last 20m looking at the > > > code and I can't see what prevents other CPUs from coming and going > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > > > I don't think there is anything. ...and it's not just this code > > either. It sure looks like nmi_trigger_cpumask_backtrace() has the > > same problem. > > > > I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not > > such a big deal because: > > 1. If a CPU goes away then we'll just time out > > 2. If a CPU shows up then we'll skip backtracing it, but we were > > sending backtraces at an instant in time anyway. > > > > In the case of smp_send_stop() it's probably fine if a CPU goes away > > because, again, we'll just timeout. ...but if a CPU shows up then > > that's not super ideal. Maybe it doesn't cause problems in practice > > but it does feel like it should be fixed. > > On the other hand, I really don't fancy taking a CPU hotplug mutex on > the panic() path, so it's hard to know what's best. > > I suppose one thing we could do is recompute the mask before sending the > NMI, which should make it a little more robust in that case. It still > feels fragile, but it's no worse than the existing code, I suppose. Happy to do this and send a new version if you want. Just let me know. Basically it's just replacing `cpumask_and(&mask, &mask, cpu_online_mask)` with `cpumask_copy(&mask, cpu_online_mask); cpumask_clear_cpu(smp_processor_id(), &mask);` in the case where we fallback to NMI. If you're happy with the patch I'm also happy if you make this change while applying. BTW: if you want to dig into what's (possibly) another preexisting race condition that I think I noticed while coding this up, I suspect that "kcrash" will end up lacking info on a CPU any time we end up in the panic_smp_self_stop() case. Trying to do something about that felt a bit far afield for me, though. -Doug
Hi, On Mon, Aug 5, 2024 at 10:26 AM Yu Zhao <yuzhao@google.com> wrote: > > > > > + /* > > > > + * Start with a normal IPI and wait up to one second for other CPUs to > > > > + * stop. We do this first because it gives other processors a chance > > > > + * to exit critical sections / drop locks and makes the rest of the > > > > + * stop process (especially console flush) more robust. > > > > + */ > > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > > like we inherited the code from x86, but do you know how this serialise > > > against CPU hotplug operations? I've spent the last 20m looking at the > > > code and I can't see what prevents other CPUs from coming and going > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > > > I don't think there is anything. ...and it's not just this code > > either. > > I agree -- I noticed this a while ago [1], and I added > cpus_read_lock/unlock() on the caller side, because having the locks > inside callees can be a problem for callers who can't sleep. > > [1] https://lore.kernel.org/linux-mm/ZnkGps-vQbiynNwP@google.com/ Sounds reasonable. I'm happy to review a patch doing that. > Also, do the handlers always see `crash_stop` set by the sender? If > not, would it be a problem? (In the above link, it has to do extra > work to make sure that handlers eventually see any updated values.) I _think_ this is OK. It's been a while since I wrote the original patch but I seem to remember thinking that I didn't need to do anything special. Tracing through the code again, I see in gic_ipi_send_mask() the comment: /* * Ensure that stores to Normal memory are visible to the * other CPUs before they observe us issuing the IPI. */ dmb(ishst); ...so I think that means we're fine, right?
On Mon, Aug 5, 2024 at 2:29 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Mon, Aug 5, 2024 at 10:26 AM Yu Zhao <yuzhao@google.com> wrote: > > > > > > > + /* > > > > > + * Start with a normal IPI and wait up to one second for other CPUs to > > > > > + * stop. We do this first because it gives other processors a chance > > > > > + * to exit critical sections / drop locks and makes the rest of the > > > > > + * stop process (especially console flush) more robust. > > > > > + */ > > > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > > > like we inherited the code from x86, but do you know how this serialise > > > > against CPU hotplug operations? I've spent the last 20m looking at the > > > > code and I can't see what prevents other CPUs from coming and going > > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > > > > > I don't think there is anything. ...and it's not just this code > > > either. > > > > I agree -- I noticed this a while ago [1], and I added > > cpus_read_lock/unlock() on the caller side, because having the locks > > inside callees can be a problem for callers who can't sleep. > > > > [1] https://lore.kernel.org/linux-mm/ZnkGps-vQbiynNwP@google.com/ > > Sounds reasonable. I'm happy to review a patch doing that. Thanks. > > Also, do the handlers always see `crash_stop` set by the sender? If > > not, would it be a problem? (In the above link, it has to do extra > > work to make sure that handlers eventually see any updated values.) > > I _think_ this is OK. It's been a while since I wrote the original > patch but I seem to remember thinking that I didn't need to do > anything special. Tracing through the code again, I see in > gic_ipi_send_mask() the comment: > > /* > * Ensure that stores to Normal memory are visible to the > * other CPUs before they observe us issuing the IPI. > */ > dmb(ishst); > > ...so I think that means we're fine, right? Nice -- I didn't know that.
On Mon, Aug 05, 2024 at 01:14:12PM -0700, Doug Anderson wrote: > On Mon, Aug 5, 2024 at 10:24 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Aug 05, 2024 at 10:13:11AM -0700, Doug Anderson wrote: > > > On Mon, Aug 5, 2024 at 9:53 AM Will Deacon <will@kernel.org> wrote: > > > > On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote: > > > > > @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) > > > > > > > > > > void smp_send_stop(void) > > > > > { > > > > > + static unsigned long stop_in_progress; > > > > > + cpumask_t mask; > > > > > unsigned long timeout; > > > > > > > > > > - if (num_other_online_cpus()) { > > > > > - cpumask_t mask; > > > > > + /* > > > > > + * If this cpu is the only one alive at this point in time, online or > > > > > + * not, there are no stop messages to be sent around, so just back out. > > > > > + */ > > > > > + if (num_other_online_cpus() == 0) > > > > > + goto skip_ipi; > > > > > > > > > > - cpumask_copy(&mask, cpu_online_mask); > > > > > - cpumask_clear_cpu(smp_processor_id(), &mask); > > > > > + /* Only proceed if this is the first CPU to reach this code */ > > > > > + if (test_and_set_bit(0, &stop_in_progress)) > > > > > + return; > > > > > > > > > > - if (system_state <= SYSTEM_RUNNING) > > > > > - pr_crit("SMP: stopping secondary CPUs\n"); > > > > > - smp_cross_call(&mask, IPI_CPU_STOP); > > > > > - } > > > > > + cpumask_copy(&mask, cpu_online_mask); > > > > > + cpumask_clear_cpu(smp_processor_id(), &mask); > > > > > > > > > > - /* Wait up to one second for other CPUs to stop */ > > > > > + if (system_state <= SYSTEM_RUNNING) > > > > > + pr_crit("SMP: stopping secondary CPUs\n"); > > > > > + > > > > > + /* > > > > > + * Start with a normal IPI and wait up to one second for other CPUs to > > > > > + * stop. We do this first because it gives other processors a chance > > > > > + * to exit critical sections / drop locks and makes the rest of the > > > > > + * stop process (especially console flush) more robust. > > > > > + */ > > > > > + smp_cross_call(&mask, IPI_CPU_STOP); > > > > > > > > I realise you've moved this out of crash_smp_send_stop() and it looks > > > > like we inherited the code from x86, but do you know how this serialise > > > > against CPU hotplug operations? I've spent the last 20m looking at the > > > > code and I can't see what prevents other CPUs from coming and going > > > > while we're trying to IPI a non-atomic copy of 'cpu_online_mask'. > > > > > > I don't think there is anything. ...and it's not just this code > > > either. It sure looks like nmi_trigger_cpumask_backtrace() has the > > > same problem. > > > > > > I guess maybe in the case of nmi_trigger_cpumask_backtrace() it's not > > > such a big deal because: > > > 1. If a CPU goes away then we'll just time out > > > 2. If a CPU shows up then we'll skip backtracing it, but we were > > > sending backtraces at an instant in time anyway. > > > > > > In the case of smp_send_stop() it's probably fine if a CPU goes away > > > because, again, we'll just timeout. ...but if a CPU shows up then > > > that's not super ideal. Maybe it doesn't cause problems in practice > > > but it does feel like it should be fixed. > > > > On the other hand, I really don't fancy taking a CPU hotplug mutex on > > the panic() path, so it's hard to know what's best. > > > > I suppose one thing we could do is recompute the mask before sending the > > NMI, which should make it a little more robust in that case. It still > > feels fragile, but it's no worse than the existing code, I suppose. > > Happy to do this and send a new version if you want. Just let me know. > Basically it's just replacing `cpumask_and(&mask, &mask, > cpu_online_mask)` with `cpumask_copy(&mask, cpu_online_mask); > cpumask_clear_cpu(smp_processor_id(), &mask);` in the case where we > fallback to NMI. If you're happy with the patch I'm also happy if you > make this change while applying. Please send a v3 with that along with a little comment summarising this discussion so I don't confuse myself again when I look at next time :) Thanks, Will
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 31c8b3094dd7..254619245f97 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -71,7 +71,7 @@ enum ipi_msg_type { IPI_RESCHEDULE, IPI_CALL_FUNC, IPI_CPU_STOP, - IPI_CPU_CRASH_STOP, + IPI_CPU_STOP_NMI, IPI_TIMER, IPI_IRQ_WORK, NR_IPI, @@ -88,6 +88,8 @@ static int ipi_irq_base __ro_after_init; static int nr_ipi __ro_after_init = NR_IPI; static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init; +static bool crash_stop; + static void ipi_setup(int cpu); #ifdef CONFIG_HOTPLUG_CPU @@ -771,7 +773,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = { [IPI_RESCHEDULE] = "Rescheduling interrupts", [IPI_CALL_FUNC] = "Function call interrupts", [IPI_CPU_STOP] = "CPU stop interrupts", - [IPI_CPU_CRASH_STOP] = "CPU stop (for crash dump) interrupts", + [IPI_CPU_STOP_NMI] = "CPU stop NMIs", [IPI_TIMER] = "Timer broadcast interrupts", [IPI_IRQ_WORK] = "IRQ work interrupts", }; @@ -813,9 +815,9 @@ void arch_irq_work_raise(void) } #endif -static void __noreturn local_cpu_stop(void) +static void __noreturn local_cpu_stop(unsigned int cpu) { - set_cpu_online(smp_processor_id(), false); + set_cpu_online(cpu, false); local_daif_mask(); sdei_mask_local_cpu(); @@ -829,21 +831,26 @@ static void __noreturn local_cpu_stop(void) */ void __noreturn panic_smp_self_stop(void) { - local_cpu_stop(); + local_cpu_stop(smp_processor_id()); } -#ifdef CONFIG_KEXEC_CORE -static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0); -#endif - static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs) { #ifdef CONFIG_KEXEC_CORE + /* + * Use local_daif_mask() instead of local_irq_disable() to make sure + * that pseudo-NMIs are disabled. The "crash stop" code starts with + * an IRQ and falls back to NMI (which might be pseudo). If the IRQ + * finally goes through right as we're timing out then the NMI could + * interrupt us. It's better to prevent the NMI and let the IRQ + * finish since the pt_regs will be better. + */ + local_daif_mask(); + crash_save_cpu(regs, cpu); - atomic_dec(&waiting_for_crash_ipi); + set_cpu_online(cpu, false); - local_irq_disable(); sdei_mask_local_cpu(); if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) @@ -908,14 +915,12 @@ static void do_handle_IPI(int ipinr) break; case IPI_CPU_STOP: - local_cpu_stop(); - break; - - case IPI_CPU_CRASH_STOP: - if (IS_ENABLED(CONFIG_KEXEC_CORE)) { + case IPI_CPU_STOP_NMI: + if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) { ipi_cpu_crash_stop(cpu, get_irq_regs()); - unreachable(); + } else { + local_cpu_stop(cpu); } break; @@ -970,8 +975,7 @@ static bool ipi_should_be_nmi(enum ipi_msg_type ipi) return false; switch (ipi) { - case IPI_CPU_STOP: - case IPI_CPU_CRASH_STOP: + case IPI_CPU_STOP_NMI: case IPI_CPU_BACKTRACE: case IPI_KGDB_ROUNDUP: return true; @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void) void smp_send_stop(void) { + static unsigned long stop_in_progress; + cpumask_t mask; unsigned long timeout; - if (num_other_online_cpus()) { - cpumask_t mask; + /* + * If this cpu is the only one alive at this point in time, online or + * not, there are no stop messages to be sent around, so just back out. + */ + if (num_other_online_cpus() == 0) + goto skip_ipi; - cpumask_copy(&mask, cpu_online_mask); - cpumask_clear_cpu(smp_processor_id(), &mask); + /* Only proceed if this is the first CPU to reach this code */ + if (test_and_set_bit(0, &stop_in_progress)) + return; - if (system_state <= SYSTEM_RUNNING) - pr_crit("SMP: stopping secondary CPUs\n"); - smp_cross_call(&mask, IPI_CPU_STOP); - } + cpumask_copy(&mask, cpu_online_mask); + cpumask_clear_cpu(smp_processor_id(), &mask); - /* Wait up to one second for other CPUs to stop */ + if (system_state <= SYSTEM_RUNNING) + pr_crit("SMP: stopping secondary CPUs\n"); + + /* + * Start with a normal IPI and wait up to one second for other CPUs to + * stop. We do this first because it gives other processors a chance + * to exit critical sections / drop locks and makes the rest of the + * stop process (especially console flush) more robust. + */ + smp_cross_call(&mask, IPI_CPU_STOP); timeout = USEC_PER_SEC; while (num_other_online_cpus() && timeout--) udelay(1); - if (num_other_online_cpus()) + /* + * If CPUs are still online, try an NMI. There's no excuse for this to + * be slow, so we only give them an extra 10 ms to respond. + */ + if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) { + cpumask_and(&mask, &mask, cpu_online_mask); + + pr_info("SMP: retry stop with NMI for CPUs %*pbl\n", + cpumask_pr_args(&mask)); + + smp_cross_call(&mask, IPI_CPU_STOP_NMI); + timeout = USEC_PER_MSEC * 10; + while (num_other_online_cpus() && timeout--) + udelay(1); + } + + if (num_other_online_cpus()) { + cpumask_and(&mask, &mask, cpu_online_mask); + pr_warn("SMP: failed to stop secondary CPUs %*pbl\n", - cpumask_pr_args(cpu_online_mask)); + cpumask_pr_args(&mask)); + } +skip_ipi: sdei_mask_local_cpu(); } #ifdef CONFIG_KEXEC_CORE void crash_smp_send_stop(void) { - static int cpus_stopped; - cpumask_t mask; - unsigned long timeout; - /* * This function can be called twice in panic path, but obviously * we execute this only once. + * + * We use this same boolean to tell whether the IPI we send was a + * stop or a "crash stop". */ - if (cpus_stopped) + if (crash_stop) return; + crash_stop = 1; - cpus_stopped = 1; + smp_send_stop(); - /* - * If this cpu is the only one alive at this point in time, online or - * not, there are no stop messages to be sent around, so just back out. - */ - if (num_other_online_cpus() == 0) - goto skip_ipi; - - cpumask_copy(&mask, cpu_online_mask); - cpumask_clear_cpu(smp_processor_id(), &mask); - - atomic_set(&waiting_for_crash_ipi, num_other_online_cpus()); - - pr_crit("SMP: stopping secondary CPUs\n"); - smp_cross_call(&mask, IPI_CPU_CRASH_STOP); - - /* Wait up to one second for other CPUs to stop */ - timeout = USEC_PER_SEC; - while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--) - udelay(1); - - if (atomic_read(&waiting_for_crash_ipi) > 0) - pr_warn("SMP: failed to stop secondary CPUs %*pbl\n", - cpumask_pr_args(&mask)); - -skip_ipi: - sdei_mask_local_cpu(); sdei_handler_abort(); } bool smp_crash_stop_failed(void) { - return (atomic_read(&waiting_for_crash_ipi) > 0); + return num_other_online_cpus() != 0; } #endif
When testing hard lockup handling on my sc7180-trogdor-lazor device with pseudo-NMI enabled, with serial console enabled and with kgdb disabled, I found that the stack crawls printed to the serial console ended up as a jumbled mess. After rebooting, the pstore-based console looked fine though. Also, enabling kgdb to trap the panic made the console look fine and avoided the mess. After a bit of tracking down, I came to the conclusion that this was what was happening: 1. The panic path was stopping all other CPUs with panic_other_cpus_shutdown(). 2. At least one of those other CPUs was in the middle of printing to the serial console and holding the console port's lock, which is grabbed with "irqsave". ...but since we were stopping with an NMI we didn't care about the "irqsave" and interrupted anyway. 3. Since we stopped the CPU while it was holding the lock it would never release it. 4. All future calls to output to the console would end up failing to get the lock in qcom_geni_serial_console_write(). This isn't _totally_ unexpected at panic time but it's a code path that's not well tested, hard to get right, and apparently doesn't work terribly well on the Qualcomm geni serial driver. The Qualcomm geni serial driver was fixed to be a bit better in commit 9e957a155005 ("serial: qcom-geni: Don't cancel/abort if we can't get the port lock") but it's nice not to get into this situation in the first place. Taking a page from what x86 appears to do in native_stop_other_cpus(), do this: 1. First, try to stop other CPUs with a normal IPI and wait a second. This gives them a chance to leave critical sections. 2. If CPUs fail to stop then retry with an NMI, but give a much lower timeout since there's no good reason for a CPU not to react quickly to a NMI. This works well and avoids the corrupted console and (presumably) could help avoid other similar issues. In order to do this, we need to do a little re-organization of our IPIs since we don't have any more free IDs. Do what was suggested in previous conversations and combine "stop" and "crash stop". That frees up an IPI so now we can have a "stop" and "stop NMI". In order to do this we also need a slight change in the way we keep track of which CPUs still need to be stopped. We need to know specifically which CPUs haven't stopped yet when we fall back to NMI but in the "crash stop" case the "cpu_online_mask" isn't updated as CPUs go down. This is why that code path had an atomic of the number of CPUs left. Solve this by also updating the "cpu_online_mask" for crash stops. All of the above lets us combine the logic for "stop" and "crash stop" code, which appeared to have a bunch of arbitrary implementation differences. Aside from the above change where we try a normal IPI and then an NMI, the combined function has a few subtle differences: * In the normal smp_send_stop(), if we fail to stop one or more CPUs then we won't include the current CPU (the one running smp_send_stop()) in the error message. * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print the CPUs that we failed to stop instead of printing all _but_ the current running CPU. * In crash_smp_send_stop(), we will now only print "SMP: stopping secondary CPUs" if (system_state <= SYSTEM_RUNNING). Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- I'm not setup to test the crash_smp_send_stop(). I made sure it compiled and hacked the panic() method to call it, but I haven't actually run kexec. Hopefully others can confirm that it's working for them. v1: https://lore.kernel.org/r/20231207170251.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid Changes in v2: - Update commit message to point to Qualcomm serial driver fix. - Use a test-and-set to prevent stop code from running twice. - Move mask clearing until after crash_save_cpu(). - Use local_daif_mask() in ipi_cpu_crash_stop(). - Don't use a new mask, just have crash case update online CPUs. arch/arm64/kernel/smp.c | 138 ++++++++++++++++++++++------------------ 1 file changed, 75 insertions(+), 63 deletions(-)