Message ID | 20210527045647.3599-1-yu.c.chen@intel.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | intel_idle: Adjust the SKX C6 latency and residency if PC6 is disabled | expand |
On Thu, 2021-05-27 at 12:56 +0800, Chen Yu wrote: ... snip ... > Exit latency: > The C6 exit latency is measured when woken up from CC6/PC6. In the past, > if PC6 is disabled, CPU would be demoted to CC6/PC3, which is close to > the latency from CC6/PC6 and there is no need to update the C6 exit latency. > However on newer platform there is no CC3/PC3 anymore, then the C6 exit > latency with PC6 disabled should be CC6/PC0. > > Target residency: > With PC6 disabled and C3/PC3 supported, the OS requests C3 if idle > duration is within [CC6, PC6) target_residency. On new CPU generations > with C3/PC3 deprecated, the OS would request C1E. This would cause > low energy-efficiency. In summary, the question is, should we lower > the bar to request C6 when PC6 is disabled? The answer is yes. ... snip ... Hi Yu, Thanks for this patch, it is very actual and helpful. Comments about the commit message below. This patch is specifically about SKX. It also covers CLX and CPX, because they have the same ID. Now, this platforms do not have C3 and PC3. So I would avoid talking about these states in the commit message. Why making a simple thing more complex? Here are all the SKX C-states. 1. Linux-level C-states (linux can ask for): C1, C1E, C6. 2. HW-level C-states (HW supports under the hood): C1, C1E, CC6, PC2, PC6. Here is the story of this patch in my understanding. 1. C6 maps to CC6 and PC6. 2. CC6 is "shallower" than PC6. 3. Linux assumes worst case - PC6. 4. Many datacenters and users disable PC6. 5. We can optimize intel_idle in this case: adjust C6 latency and target residency to match (faster) CC6. That's it. Then may be it is worth mentioning that CC6 vs PC2 difference is not really measurable, so the adjustment is only for PC6. Artem.
On Thu, May 27, 2021 at 11:25:18AM +0300, Artem Bityutskiy wrote: > On Thu, 2021-05-27 at 12:56 +0800, Chen Yu wrote: > > ... snip ... > > > Exit latency: > > The C6 exit latency is measured when woken up from CC6/PC6. In the past, > > if PC6 is disabled, CPU would be demoted to CC6/PC3, which is close to > > the latency from CC6/PC6 and there is no need to update the C6 exit latency. > > However on newer platform there is no CC3/PC3 anymore, then the C6 exit > > latency with PC6 disabled should be CC6/PC0. > > > > Target residency: > > With PC6 disabled and C3/PC3 supported, the OS requests C3 if idle > > duration is within [CC6, PC6) target_residency. On new CPU generations > > with C3/PC3 deprecated, the OS would request C1E. This would cause > > low energy-efficiency. In summary, the question is, should we lower > > the bar to request C6 when PC6 is disabled? The answer is yes. > ... snip ... > > Hi Yu, > Hi Artem, > Thanks for this patch, it is very actual and helpful. > > Comments about the commit message below. > > This patch is specifically about SKX. It also covers CLX and CPX, > because they have the same ID. > > Now, this platforms do not have C3 and PC3. So I would avoid talking > about these states > in the commit message. Why making a simple thing more complex? > I agree, in theory this issue is irrelevant of the C3/PC3, however I was thinking the issue would become more significant to user on platform without C3/PC3, and this increase the necessity of this patch. But let me refine the commit log to only mention C3 a little just for future reference. > Here are all the SKX C-states. > > 1. Linux-level C-states (linux can ask for): C1, C1E, C6. > 2. HW-level C-states (HW supports under the hood): C1, C1E, CC6, PC2, > PC6. > And PC0? > Here is the story of this patch in my understanding. > > 1. C6 maps to CC6 and PC6. > 2. CC6 is "shallower" than PC6. > 3. Linux assumes worst case - PC6. > 4. Many datacenters and users disable PC6. > 5. We can optimize intel_idle in this case: adjust C6 latency and > target residency to match (faster) CC6. > > That's it. > Okay. > Then may be it is worth mentioning that CC6 vs PC2 difference is not > really measurable, so > the adjustment is only for PC6. > Okay. thanks, Chenyu > Artem. >
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index ec1b9d306ba6..e6c543b5ee1d 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -1484,6 +1484,36 @@ static void __init sklh_idle_state_table_update(void) skl_cstates[6].flags |= CPUIDLE_FLAG_UNUSABLE; /* C9-SKL */ } +/** + * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake + * idle states table. + */ +static void __init skx_idle_state_table_update(void) +{ + unsigned long long msr; + + rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr); + + /* + * 000b: C0/C1 (no package C-state support) + * 001b: C2 + * 010b: C6 (non-retention) + * 011b: C6 (retention) + * 111b: No Package C state limits. + */ + if ((msr & 0x7) < 2) { + /* + * Uses the CC6 + PC0 latency and 3 times of + * latency for target_residency if the PC6 + * is disabled in BIOS. This is consistent + * with how intel_idle driver uses _CST + * to set the target_residency. + */ + skx_cstates[2].exit_latency = 92; + skx_cstates[2].target_residency = 276; + } +} + static bool __init intel_idle_verify_cstate(unsigned int mwait_hint) { unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1; @@ -1515,6 +1545,9 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv) case INTEL_FAM6_SKYLAKE: sklh_idle_state_table_update(); break; + case INTEL_FAM6_SKYLAKE_X: + skx_idle_state_table_update(); + break; } for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
Currently cpuidle assumes worst-case C-state parameters, and so C6 is described with PC6 parameters, which is worst case for requesting CC6. When PC6 is enabled, this is appropriate. But if PC6 is disabled in BIOS, the exit latency and target_residency should be adjusted accordingly. Exit latency: The C6 exit latency is measured when woken up from CC6/PC6. In the past, if PC6 is disabled, CPU would be demoted to CC6/PC3, which is close to the latency from CC6/PC6 and there is no need to update the C6 exit latency. However on newer platform there is no CC3/PC3 anymore, then the C6 exit latency with PC6 disabled should be CC6/PC0. Target residency: With PC6 disabled and C3/PC3 supported, the OS requests C3 if idle duration is within [CC6, PC6) target_residency. On new CPU generations with C3/PC3 deprecated, the OS would request C1E. This would cause low energy-efficiency. In summary, the question is, should we lower the bar to request C6 when PC6 is disabled? The answer is yes. To fill this gap, check if PC6 is disabled in the BIOS in the MSR_PKG_CST_CONFIG_CONTROL(0xe2). If so, use CC6/PC0 parameters as the new exit latency. Meanwhile, update target_residency to 3 times of the new exit latency. This is consistent with how intel_idle driver uses _CST to calculate the target_residency. The consequence is that, the OS would be more offen to choose C6 over C1E when PC6 is disabled. The new exit latency of CC6/PC0 was from wult[1] result, which was measured via NIC wakeup from 99.99th latency. Before the change: PC6 enabled Y N N has C3/PC3 N N Y idle duration [CC6,PC6) [CC6,PC6) [CC6,PC6) CPU request C1E C1E C3 After the change: PC6 enabled Y N N has C3/PC3 N N Y idle duration [CC6,PC6) [CC6,PC6) [CC6,PC6) CPU request C1E *C6* C3 *C6* rather than C1E is chosen. There is concern that if we should introduce a more generic solution rather than optimizing on each platforms. However consider the code complexity and different PC6 bit interpretation on different platforms, tune the code per platform seems to be an acceptable trade-off. [1] https://intel.github.io/wult/ Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- drivers/idle/intel_idle.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+)