Message ID | 20240618-arm64-fpsimd-sme-cpu-die-v1-1-9a90d1a34918@kernel.org |
---|---|
State | New, archived |
Series | arm64/fpsimd: Ensure that offlined CPUs are not using SME |
On Tue, Jun 18, 2024 at 03:03:50PM +0100, Mark Brown wrote:
> When we use CPU hotplug to offline a CPU we may transition directly from
> running a task which was using SME to the CPU being offlined. This means
> that PSTATE.{SM,ZA} may still be set, indicating to the system that SME is
> still in use. This could create contention with other still running CPUs if
> the system uses shared SMCUs.

Does it actually cause contention if the CPU isn't issuing SME
instructions?

Is this theoretical or something you see in practice?

> For most systems this shouldn't be an issue, we should have PSCI or some
> other power management mechanism which will take care of this as part of
> offlining the CPU. However we do still have support for spin tables,

I don't think spin-table is relevant; there's no support whatsoever for
offlining CPUs with spin-table (and offlining will be rejected long
before cpu_die()).

> and it is possible that system firmware may not be ideally
> implemented, so let's explicitly disable SME during the process of
> offlining the CPU in order to ensure there's no spurious contention.

If this is an issue, surely it's the same with idle, or any other long
period spent in the kernel, or any long period where userspace leaves
the CPU in streaming mode?

It feels very odd that we should need to do something for cpu offlining
in particular.

Mark.

> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/kernel/smp.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 31c8b3094dd7..9e8fc6ac758a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -383,6 +383,10 @@ void __noreturn cpu_die(void)
>  	/* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
>  	cpuhp_ap_report_dead();
>  
> +	/* Ensure we are not spuriously contending any SMCU */
> +	if (system_supports_sme())
> +		sme_smstop();
> +
>  	/*
>  	 * Actually shutdown the CPU. This must never fail. The specific hotplug
>  	 * mechanism must perform all required cache maintenance to ensure that
>
> ---
> base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
> change-id: 20240617-arm64-fpsimd-sme-cpu-die-57205c7f220e
>
> Best regards,
> --
> Mark Brown <broonie@kernel.org>
On Tue, Jun 18, 2024 at 03:51:47PM +0100, Mark Rutland wrote:
> On Tue, Jun 18, 2024 at 03:03:50PM +0100, Mark Brown wrote:

> > When we use CPU hotplug to offline a CPU we may transition directly from
> > running a task which was using SME to the CPU being offlined. This means
> > that PSTATE.{SM,ZA} may still be set, indicating to the system that SME is
> > still in use. This could create contention with other still running CPUs if
> > the system uses shared SMCUs.

> Does it actually cause contention if the CPU isn't issuing SME
> instructions?

It was misbehaving, I didn't dig into the specifics of how. There will
be a power impact too regardless of any instructions being issued.

> Is this theoretical or something you see in practice?

It was inspired by a report, the reporter was able to fix their firmware
to be more sensible and issue the SMSTOP itself but it seemed like
reasonable defensiveness/politeness for us to release the resource
anyway.

> I don't think spin-table is relevant; there's no support whatsoever for
> offlining CPUs with spin-table (and offlining will be rejected long
> before cpu_die()).

Ah, good - I didn't spend enough time to convince myself there were no
situations where we'd try to take down the CPU anyway.

> > and it is possible that system firmware may not be ideally
> > implemented, so let's explicitly disable SME during the process of
> > offlining the CPU in order to ensure there's no spurious contention.

> If this is an issue, surely it's the same with idle, or any other long
> period spent in the kernel, or any long period where userspace leaves
> the CPU in streaming mode?

> It feels very odd that we should need to do something for cpu offlining
> in particular.

Yes, it's an issue for idle too in the case where we're not using
cpuidle - I sent a separate patch for that. cpuidle should already
cover this either itself or when it notifies us that register state
will be lost.
A good chunk of the other users that spend noticeable time in kernel
mode will be using kernel mode floating point so disable anyway due to
that, and for everything else there's a tricky tradeoff with how long
we're spending in kernel vs how much pressure is being applied and the
likelihood of returning to the same userspace process. That feels like
we need some more real world experience to see what if anything is
needed.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 31c8b3094dd7..9e8fc6ac758a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -383,6 +383,10 @@ void __noreturn cpu_die(void)
 	/* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
 	cpuhp_ap_report_dead();
 
+	/* Ensure we are not spuriously contending any SMCU */
+	if (system_supports_sme())
+		sme_smstop();
+
 	/*
 	 * Actually shutdown the CPU. This must never fail. The specific hotplug
 	 * mechanism must perform all required cache maintenance to ensure that
When we use CPU hotplug to offline a CPU we may transition directly from
running a task which was using SME to the CPU being offlined. This means
that PSTATE.{SM,ZA} may still be set, indicating to the system that SME is
still in use. This could create contention with other still running CPUs if
the system uses shared SMCUs.

For most systems this shouldn't be an issue, we should have PSCI or some
other power management mechanism which will take care of this as part of
offlining the CPU. However we do still have support for spin tables, and
it is possible that system firmware may not be ideally implemented, so
let's explicitly disable SME during the process of offlining the CPU in
order to ensure there's no spurious contention.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/smp.c | 4 ++++
 1 file changed, 4 insertions(+)

---
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
change-id: 20240617-arm64-fpsimd-sme-cpu-die-57205c7f220e

Best regards,
-- 
Mark Brown <broonie@kernel.org>