Message ID | 20230306123418.720679-2-dedekind1@gmail.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Series | Sapphire Rapids C0.x idle states support |
On Mon, Mar 06, 2023 at 02:34:16PM +0200, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> On Intel platforms, C-states are requested using the 'monitor/mwait'
> instructions pair, as implemented in 'mwait_idle_with_hints()'. This
> mechanism allows for entering C1 and deeper C-states.
>
> Sapphire Rapids Xeon supports new idle states - C0.1 and C0.2 (later C0.x).
> These idle states have lower latency comparing to C1, and can be requested
> with either 'tpause' and 'umwait' instructions.
>
> Linux already uses the 'tpause' instruction in delay functions like
> 'udelay()'. This patch adds 'umwait' and 'umonitor' instructions support.
>
> 'umwait' and 'tpause' instructions are very similar - both send the CPU to
> C0.x and have the same break out rules. But unlike 'tpause', 'umwait' works
> together with 'umonitor' and exits the C0.x when the monitored memory
> address is modified (similar idea as with 'monitor/mwait').
>
> This patch implements the 'umwait_idle()' function, which works very
> similarly to existing 'mwait_idle_with_hints()', but requests C0.x. The
> intention is to use it from the 'intel_idle' driver.

Still wondering wth regular mwait can't access these new idle states.

On Mon, Mar 6, 2023 at 3:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 06, 2023 at 02:34:16PM +0200, Artem Bityutskiy wrote:
> > From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> >
> > On Intel platforms, C-states are requested using the 'monitor/mwait'
> > instructions pair, as implemented in 'mwait_idle_with_hints()'. This
> > mechanism allows for entering C1 and deeper C-states.
> >
> > Sapphire Rapids Xeon supports new idle states - C0.1 and C0.2 (later C0.x).
> > These idle states have lower latency comparing to C1, and can be requested
> > with either 'tpause' and 'umwait' instructions.
> >
> > Linux already uses the 'tpause' instruction in delay functions like
> > 'udelay()'. This patch adds 'umwait' and 'umonitor' instructions support.
> >
> > 'umwait' and 'tpause' instructions are very similar - both send the CPU to
> > C0.x and have the same break out rules. But unlike 'tpause', 'umwait' works
> > together with 'umonitor' and exits the C0.x when the monitored memory
> > address is modified (similar idea as with 'monitor/mwait').
> >
> > This patch implements the 'umwait_idle()' function, which works very
> > similarly to existing 'mwait_idle_with_hints()', but requests C0.x. The
> > intention is to use it from the 'intel_idle' driver.
>
> Still wondering wth regular mwait can't access these new idle states.

But is this a question for Artem to answer?

On Tue, Mar 07, 2023 at 12:55:45PM +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 6, 2023 at 3:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Mar 06, 2023 at 02:34:16PM +0200, Artem Bityutskiy wrote:
> > > From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> > >
> > > On Intel platforms, C-states are requested using the 'monitor/mwait'
> > > instructions pair, as implemented in 'mwait_idle_with_hints()'. This
> > > mechanism allows for entering C1 and deeper C-states.
> > >
> > > Sapphire Rapids Xeon supports new idle states - C0.1 and C0.2 (later C0.x).
> > > These idle states have lower latency comparing to C1, and can be requested
> > > with either 'tpause' and 'umwait' instructions.
> > >
> > > Linux already uses the 'tpause' instruction in delay functions like
> > > 'udelay()'. This patch adds 'umwait' and 'umonitor' instructions support.
> > >
> > > 'umwait' and 'tpause' instructions are very similar - both send the CPU to
> > > C0.x and have the same break out rules. But unlike 'tpause', 'umwait' works
> > > together with 'umonitor' and exits the C0.x when the monitored memory
> > > address is modified (similar idea as with 'monitor/mwait').
> > >
> > > This patch implements the 'umwait_idle()' function, which works very
> > > similarly to existing 'mwait_idle_with_hints()', but requests C0.x. The
> > > intention is to use it from the 'intel_idle' driver.
> >
> > Still wondering wth regular mwait can't access these new idle states.
>
> But is this a question for Artem to answer?

Maybe, maybe not, but I did want to call out this 'design' in public. It
is really weird IMO.

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..a8612de3212a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -141,4 +141,67 @@ static inline void __tpause(u32 ecx, u32 edx, u32 eax)
 #endif
 }
 
+#ifdef CONFIG_X86_64
+/*
+ * Monitor a memory address at 'rcx' using the 'umonitor' instruction.
+ */
+static inline void __umonitor(const void *rcx)
+{
+	/* "umonitor %rcx" */
+#ifdef CONFIG_AS_TPAUSE
+	asm volatile("umonitor %%rcx\n"
+		     :
+		     : "c"(rcx));
+#else
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(rcx));
+#endif
+}
+
+/*
+ * Same as '__tpause()', but uses the 'umwait' instruction. It is very
+ * similar to 'tpause', but also breaks out if the data at the address
+ * monitored with 'umonitor' is modified.
+ */
+static inline void __umwait(u32 ecx, u32 edx, u32 eax)
+{
+	/* "umwait %ecx, %edx, %eax;" */
+#ifdef CONFIG_AS_TPAUSE
+	asm volatile("umwait %%ecx\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+#else
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+#endif
+}
+
+/*
+ * Enter C0.1 or C0.2 state and stay there until an event happens (an interrupt
+ * or the 'need_resched()'), or the deadline is reached. The deadline is the
+ * absolute TSC value to exit the idle state at. However, if deadline exceeds
+ * the global limit in the IA32_UMWAIT_CONTROL register, the global limit
+ * prevails, and the idle state is exited earlier than the deadline.
+ */
+static inline void umwait_idle(u64 deadline, u32 state)
+{
+	if (!current_set_polling_and_test()) {
+		u32 eax, edx;
+
+		eax = lower_32_bits(deadline);
+		edx = upper_32_bits(deadline);
+
+		__umonitor(&current_thread_info()->flags);
+		if (!need_resched())
+			__umwait(state, edx, eax);
+	}
+	current_clr_polling();
+}
+#else
+#define umwait_idle(deadline, state) \
+		WARN_ONCE(1, "umwait CPU instruction is not supported")
+#endif /* CONFIG_X86_64 */
+
 #endif /* _ASM_X86_MWAIT_H */
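
For readers following the 'umwait_idle()' comment above, here is a rough userspace sketch (not part of the patch) of two things it describes: how the 64-bit TSC deadline is split into the EDX:EAX halves passed to '__umwait()', and how the global limit in IA32_UMWAIT_CONTROL could cap the wait. The helper names 'split_deadline()' and 'effective_wakeup()' are hypothetical, and the MSR layout used here (bit 0 disables C0.2, bits 31:2 hold the maximum wait in TSC ticks, zero meaning no limit) is an assumption that should be checked against the Intel SDM.

```c
#include <stdint.h>

/*
 * Assumed IA32_UMWAIT_CONTROL layout (not taken from the patch):
 * bit 0 set disables C0.2, bits 31:2 hold the maximum wait time in
 * TSC ticks, and a zero time field means "no global limit".
 */
#define UMWAIT_C02_DISABLE	0x1u
#define UMWAIT_TIME_MASK	(~0x3u)

/* Hypothetical container for the two deadline halves given to umwait. */
struct umwait_operands {
	uint32_t eax;	/* low 32 bits of the absolute TSC deadline */
	uint32_t edx;	/* high 32 bits of the absolute TSC deadline */
};

/* Mirrors the lower_32_bits()/upper_32_bits() split done in umwait_idle(). */
struct umwait_operands split_deadline(uint64_t deadline)
{
	struct umwait_operands ops = {
		.eax = (uint32_t)(deadline & 0xffffffffu),
		.edx = (uint32_t)(deadline >> 32),
	};
	return ops;
}

/*
 * Hypothetical model of the TSC value at which the wait ends if no
 * interrupt or monitored write wakes the CPU first: the requested
 * deadline, unless the global limit expires earlier.
 */
uint64_t effective_wakeup(uint64_t now, uint64_t deadline,
			  uint32_t umwait_control)
{
	uint64_t limit = umwait_control & UMWAIT_TIME_MASK;

	if (limit != 0 && deadline > now + limit)
		return now + limit;
	return deadline;
}
```

Under these assumptions, a caller at TSC 1000 asking for a deadline of 500000 with a limit of 100000 ticks would be woken at 101000, which matches the "global limit prevails" wording in the comment.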