Message ID | 20240618-arm64-sme-no-cpuidle-v1-1-1de872e1691f@kernel.org (mailing list archive)
---|---
State | New, archived
Series | arm64/fpsimd: Ensure we don't contend a SMCU from idling CPUs
+ Mark R, who commented on the other patch.

On Tue, Jun 18, 2024 at 02:57:42PM +0100, Mark Brown wrote:
> When we enter the kernel we currently don't update any of the floating
> point register state until either something else uses floating point or we
> get a CPU_PM_ENTER notification during suspend or cpuidle. This means that
> for a system which has been configured with both suspend and cpuidle
> disabled we will leave whatever floating point state was loaded in the
> registers present while a CPU is idling.
>
> For SME this may cause an idling CPU to interfere with the operation of
> other CPUs, as SME may be implemented as a SMCU shared between multiple
> CPUs. Leaving ZA or especially streaming mode enabled may be taken by
> the hardware as an indication that SME is in active use by the CPU and
> cause resources to be allocated to it at the expense of other CPUs.
>
> Add an arch_cpu_idle_enter() implementation which disables SME if it is
> active when we idle the CPU, ensuring that we don't create any spurious
> contention even if cpuidle is not enabled.

I guess this approach is useful when the kernel does a light WFI rather
than going all the way to firmware for a deep sleep state. In general,
the shallower the sleep state is, the more CPU state is retained. From a
power perspective, I wonder whether we should leave the decision to drop
the SME state to a cpuidle driver. If SME is implemented as a shared
unit, doing an SMSTOP could be a useful indication that other tasks can
use it.

Which situations should we consider for such an idle scenario (we
discussed them offline briefly)? I think:

1. Thread migration: a thread using SME is moved to a different CPU.
   Here SMSTOP makes sense because a new thread scheduled eventually
   will need a different SME state.

2. Thread page fault followed by waiting for I/O etc., where the kernel
   may switch to idle. Here it's probably less beneficial to do an
   SMSTOP.

Any other cases?
Blocking syscalls don't matter since we don't preserve the state between
calls. The trade-off is for case (2) above and it depends on whether it
happens sufficiently often to be noticeable. I wouldn't think so.

> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h |  2 ++
>  arch/arm64/kernel/fpsimd.c      | 14 ++++++++++++++
>  arch/arm64/kernel/process.c     |  5 +++++
>  3 files changed, 21 insertions(+)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index bc69ac368d73..e1453a1d261d 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -82,6 +82,8 @@ extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>  extern void fpsimd_kvm_prepare(void);
>  
> +extern void fpsimd_idle_enter(void);
> +
>  struct cpu_fp_state {
>  	struct user_fpsimd_state *st;
>  	void *sve_state;
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 82e8a6017382..400eaad374a2 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -2126,6 +2126,20 @@ static void __init fpsimd_pm_init(void)
>  static inline void fpsimd_pm_init(void) { }
>  #endif /* CONFIG_CPU_PM */
>  
> +void fpsimd_idle_enter(void)
> +{
> +	/*
> +	 * Leaving SME enabled may leave this core contending with
> +	 * other cores if we have a SMCU, disable whenever we enter
> +	 * idle to avoid this.  Only do this if they're actually
> +	 * enabled to avoid overhead in cases where we don't enter a
> +	 * low enough power state to lose register state.
> +	 */
> +	if (system_supports_sme() &&
> +	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
> +		fpsimd_save_and_flush_cpu_state();
> +}

Do we always enter here via the idle thread? If we already had a thread
switch we probably don't need to save the SME state again, only flush
the state.
> #ifdef CONFIG_HOTPLUG_CPU
> static int fpsimd_cpu_dead(unsigned int cpu)
> {
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 4ae31b7af6c3..fd616882cd49 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -68,6 +68,11 @@ EXPORT_SYMBOL(__stack_chk_guard);
>  void (*pm_power_off)(void);
>  EXPORT_SYMBOL_GPL(pm_power_off);
>  
> +void arch_cpu_idle_enter(void)
> +{
> +	fpsimd_idle_enter();
> +}
> +
>  #ifdef CONFIG_HOTPLUG_CPU
>  void __noreturn arch_cpu_idle_dead(void)
>  {
On Wed, Jul 10, 2024 at 09:10:53PM +0100, Catalin Marinas wrote:
> On Tue, Jun 18, 2024 at 02:57:42PM +0100, Mark Brown wrote:
> > When we enter the kernel we currently don't update any of the floating
> > point register state until either something else uses floating point or we
> > get a CPU_PM_ENTER notification during suspend or cpuidle. This means that
> > for a system which has been configured with both suspend and cpuidle
> > disabled we will leave whatever floating point state was loaded in the
> > registers present while a CPU is idling.

> I guess this approach is useful when the kernel does a light WFI rather
> than going all the way to firmware for a deep sleep state. In general,
> the shallower the sleep state is, the more CPU state is retained. From a
> power perspective, I wonder whether we should leave the decision to drop
> the SME state to a cpuidle driver.

The concern here is if we don't have a cpuidle driver - we could also
make this conditional on !CPUIDLE.

> Which situations should we consider for such an idle scenario (we
> discussed them offline briefly)? I think:

> 1. Thread migration: a thread using SME is moved to a different CPU.
>    Here SMSTOP makes sense because a new thread scheduled eventually
>    will need a different SME state.

> 2. Thread page fault followed by waiting for I/O etc. and the kernel may
>    switch to idle. Here it's probably less beneficial to do an SMSTOP.

> Any other cases? Blocking syscalls don't matter since we don't preserve
> the state between calls.

For syscalls we explicitly disable streaming mode, but we do allow ZA to
be active so you might have a blocking syscall with ZA enabled. Having
state in ZA is less of a concern than streaming mode, it will have a
power impact but it is much less likely that there will be a performance
impact on other cores.

> The trade-off is for case (2) above and it depends on whether it happens
> sufficiently often to be noticeable. I wouldn't think so.
Yes, to me it seems much more likely that we would decide to schedule a
task out while it was using SME rather than getting faults where the
overhead of reloading the state was noticeable.

> > +	/*
> > +	 * Leaving SME enabled may leave this core contending with
> > +	 * other cores if we have a SMCU, disable whenever we enter
> > +	 * idle to avoid this.  Only do this if they're actually
> > +	 * enabled to avoid overhead in cases where we don't enter a
> > +	 * low enough power state to lose register state.
> > +	 */
> > +	if (system_supports_sme() &&
> > +	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
> > +		fpsimd_save_and_flush_cpu_state();
> > +}

> Do we always enter here via the idle thread? If we already had a thread
> switch we probably don't need to save the SME state again, only flush
> the state.

If we've actually switched the thread then TIF_FOREIGN_FPSTATE has been
set and we'll just check the flag and return for the save portion rather
than actually writing any register values out, so the overhead should be
minimal. It feels safer to check in case we get better at doing the save
lazily.

Otherwise arch_cpu_idle_enter() is called from psci_checker as well as
the idle thread; this code should not be relevant either way in that
context since it runs before userspace and AIUI it's trying to do the
same thing as the idle thread there anyway.
(replying to the previous version as it looks like I haven't followed up
on the discussion)

On Thu, Jul 11, 2024 at 12:28:12AM +0100, Mark Brown wrote:
> On Wed, Jul 10, 2024 at 09:10:53PM +0100, Catalin Marinas wrote:
> > On Tue, Jun 18, 2024 at 02:57:42PM +0100, Mark Brown wrote:
> > > +	/*
> > > +	 * Leaving SME enabled may leave this core contending with
> > > +	 * other cores if we have a SMCU, disable whenever we enter
> > > +	 * idle to avoid this.  Only do this if they're actually
> > > +	 * enabled to avoid overhead in cases where we don't enter a
> > > +	 * low enough power state to lose register state.
> > > +	 */
> > > +	if (system_supports_sme() &&
> > > +	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
> > > +		fpsimd_save_and_flush_cpu_state();
> > > +}

> > Do we always enter here via the idle thread? If we already had a thread
> > switch we probably don't need to save the SME state again, only flush
> > the state.

> If we've actually switched the thread then TIF_FOREIGN_FPSTATE has been
> set and we'll just check the flag and return for the save portion rather
> than actually writing any register values out so the overhead should be
> minimal. It feels safer to check in case we get better at doing the
> save lazily.

OK, so likely the state is already saved and all we need to do here is
flush the state and SMSTOP. But why would switching to idle be any
different than switching to a thread that doesn't use SME? It feels like
we are just trying to optimise a special case only. Could we not instead
issue an SMSTOP in the context switch code?

Also this looks hypothetical until we have some hardware to test it on,
to see how it would behave with a shared SME unit.
On Thu, Sep 05, 2024 at 06:51:30PM +0100, Catalin Marinas wrote:
> OK, so likely the state is already saved, all we need to do here is
> flush the state and SMSTOP. But why would switching to idle be any
> different than switching to a thread that doesn't use SME? It feels
> like we are just trying to optimise a special case only. Could we not
> instead issue an SMSTOP in the context switch code?

On context switch the SMSTOP is issued as part of loading the state for
the task, but we only do that when either returning to userspace or it's
a kernel thread with active FPSIMD usage. The idle thread is a kernel
thread with no FPSIMD usage so we don't touch the state. If we did the
SMSTOP unconditionally that'd mean that the optimisation where we don't
reload the FP state if we bounce through a kernel thread would be broken
while using SME, which doesn't seem ideal; idling really does seem like
a meaningfully special case here.

> Also this looks hypothetical until we have some hardware to test it on,
> see how it would behave with a shared SME unit.

The specific performance impacts will depend on hardware (there'll
likely be some power impact even on things with a single FP unit per PE)
but given that keeping SM and ZA disabled when not in use is a fairly
strong recommendation in the programming model my inclination at this
point would be to program to the advertised model until we have
confirmation that the hardware actually behaves otherwise.
On Thu, Sep 05, 2024 at 07:34:41PM +0100, Mark Brown wrote:
> On Thu, Sep 05, 2024 at 06:51:30PM +0100, Catalin Marinas wrote:
> > OK, so likely the state is already saved, all we need to do here is
> > flush the state and SMSTOP. But why would switching to idle be any
> > different than switching to a thread that doesn't use SME? It feels
> > like we are just trying to optimise a special case only. Could we not
> > instead issue an SMSTOP in the context switch code?

> On context switch the SMSTOP is issued as part of loading the state for
> the task but we only do that when either returning to userspace or it's
> a kernel thread with active FPSIMD usage. The idle thread is a kernel
> thread with no FPSIMD usage so we don't touch the state. If we did the
> SMSTOP unconditionally that'd mean that the optimisation where we don't
> reload the FP state if we bounce through a kernel thread would be broken
> while using SME which doesn't seem ideal, idling really does seem like a
> meaningfully special case here.

It depends on why the CPU is idling, and we don't have the whole
information in this function. If it was a wait on a syscall, we already
discarded the state (but we only issue sme_smstop_sm() IIUC). With this
patch we'd disable the ZA storage as well; can it cause any performance
issues by forcing the user to re-fault?

If it's some short-lived wait for I/O on page faults, we may not want to
disable streaming mode. I don't see this last case as much different
from switching to a kernel thread that doesn't use SME.

So I think this leaves us with the case where a thread is migrated to a
different CPU and the current CPU goes into idle for longer. But, again,
we can't tell in the arch callback. The cpuidle driver calling into
firmware is slightly better informed since it knows it's been idle (or
going to be) for longer.

> > Also this looks hypothetical until we have some hardware to test it on,
> > see how it would behave with a shared SME unit.
>
> The specific performance impacts will depend on hardware (there'll
> likely be some power impact even on things with a single FP unit per
> PE) but given that keeping SM and ZA disabled when not in use is a
> fairly strong recommendation in the programming model my inclination at
> this point would be to program to the advertised model until we have
> confirmation that the hardware actually behaves otherwise.

Does the programming model talk about shared units (I haven't read it,
not even sure where it is)? I hope one CPU cannot DoS another by not
issuing SMSTOPs and that the hardware has some provisions for sharing
that guarantee forward progress on all CPUs. They may not be optimal but
it's highly dependent on the software usage and hardware behaviour.

I'm inclined not to do anything at this stage until we see the actual
hardware behaviour in practice.
On Fri, Sep 06, 2024 at 03:56:52PM +0100, Catalin Marinas wrote:
> On Thu, Sep 05, 2024 at 07:34:41PM +0100, Mark Brown wrote:
> > On context switch the SMSTOP is issued as part of loading the state for
> > the task but we only do that when either returning to userspace or it's
> > a kernel thread with active FPSIMD usage. The idle thread is a kernel
> > thread with no FPSIMD usage so we don't touch the state. If we did the
> > SMSTOP unconditionally that'd mean that the optimisation where we don't
> > reload the FP state if we bounce through a kernel thread would be broken
> > while using SME which doesn't seem ideal, idling really does seem like a
> > meaningfully special case here.

> It depends on why the CPU is idling and we don't have the whole
> information in this function. If it was a wait on a syscall, we already
> discarded the state (but we only issue sme_smstop_sm() IIUC). With this
> patch, we'd disable the ZA storage as well, can it cause any performance
> issues by forcing the user to re-fault?

There will be some overhead from reloading the FP state, yes.

> If it's some short-lived wait for I/O on page faults, we may not want to
> disable streaming mode. I don't see this last case much different from
> switching to a kernel thread that doesn't use SME.

Yeah, that one is going to depend a lot on how performant the I/O is.

> So I think this leaves us with the case where a thread is migrated to a
> different CPU and the current CPU goes into idle for longer. But, again,
> we can't tell in the arch callback. The cpuidle driver calling into
> firmware is slightly better informed since it knows it's been idle (or
> going to be) for longer.

Yes, cpuidle is a whole different case - this is mainly targeted at the
case where that's been disabled in the kernel configuration (I was
considering making this conditional on !CPUIDLE, it was an oversight not
to do that in the first place).
> > > Also this looks hypothetical until we have some hardware to test it on,
> > > see how it would behave with a shared SME unit.

> > The specific performance impacts will depend on hardware (there'll
> > likely be some power impact even on things with a single FP unit per
> > PE) but given that keeping SM and ZA disabled when not in use is a
> > fairly strong recommendation in the programming model my inclination at
> > this point would be to program to the advertised model until we have
> > confirmation that the hardware actually behaves otherwise.

> Does the programming model talk about shared units (I haven't read it,
> not even sure where it is)? I hope one CPU cannot DoS another by not
> issuing SMSTOPs and the hardware has some provisions for sharing that
> guarantee forward progress on all CPUs. They may not be optimal but
> it's highly dependent on the software usage and hardware behaviour.

This is all getting totally into IMPDEF behaviour (and QoI issues) but
implementations are supposed to default to something which shares things
equally between all the users and guarantees forward progress. Anything
that doesn't guarantee forward progress would obviously be quite
specialist, and you'd hope that if the PE isn't actually issuing FP
instructions it won't impact anything. Even if things do go great you'll
still have the cost of keeping the unit on though.

> I'm inclined not to do anything at this stage until we see the actual
> hardware behaviour in practice.

Like I say my inclination is the opposite way round, though probably
with a check for !CONFIG_CPUIDLE.
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bc69ac368d73..e1453a1d261d 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -82,6 +82,8 @@ extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 extern void fpsimd_kvm_prepare(void);
 
+extern void fpsimd_idle_enter(void);
+
 struct cpu_fp_state {
 	struct user_fpsimd_state *st;
 	void *sve_state;
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 82e8a6017382..400eaad374a2 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -2126,6 +2126,20 @@ static void __init fpsimd_pm_init(void)
 static inline void fpsimd_pm_init(void) { }
 #endif /* CONFIG_CPU_PM */
 
+void fpsimd_idle_enter(void)
+{
+	/*
+	 * Leaving SME enabled may leave this core contending with
+	 * other cores if we have a SMCU, disable whenever we enter
+	 * idle to avoid this.  Only do this if they're actually
+	 * enabled to avoid overhead in cases where we don't enter a
+	 * low enough power state to lose register state.
+	 */
+	if (system_supports_sme() &&
+	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
+		fpsimd_save_and_flush_cpu_state();
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static int fpsimd_cpu_dead(unsigned int cpu)
 {
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4ae31b7af6c3..fd616882cd49 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -68,6 +68,11 @@ EXPORT_SYMBOL(__stack_chk_guard);
 void (*pm_power_off)(void);
 EXPORT_SYMBOL_GPL(pm_power_off);
 
+void arch_cpu_idle_enter(void)
+{
+	fpsimd_idle_enter();
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 void __noreturn arch_cpu_idle_dead(void)
 {
When we enter the kernel we currently don't update any of the floating
point register state until either something else uses floating point or
we get a CPU_PM_ENTER notification during suspend or cpuidle. This means
that for a system which has been configured with both suspend and
cpuidle disabled we will leave whatever floating point state was loaded
in the registers present while a CPU is idling.

For SME this may cause an idling CPU to interfere with the operation of
other CPUs, as SME may be implemented as a SMCU shared between multiple
CPUs. Leaving ZA or especially streaming mode enabled may be taken by
the hardware as an indication that SME is in active use by the CPU and
cause resources to be allocated to it at the expense of other CPUs.

Add an arch_cpu_idle_enter() implementation which disables SME if it is
active when we idle the CPU, ensuring that we don't create any spurious
contention even if cpuidle is not enabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h |  2 ++
 arch/arm64/kernel/fpsimd.c      | 14 ++++++++++++++
 arch/arm64/kernel/process.c     |  5 +++++
 3 files changed, 21 insertions(+)
---
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
change-id: 20240617-arm64-sme-no-cpuidle-8c68ef86e370

Best regards,