Message ID | 1444168147-17812-1-git-send-email-prarit@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote: > Intel CPUs will not enter higher p-states when after switching from the > performance governor to the powersave governor, until > /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. > This differs from previous behaviour in which a switch to the powersave > governor would result in a low default value for min_perf_pct. > > The behavior of the powersave governor changed after commit a04759924e25 > ("[cpufreq] intel_pstate: honor user space min_perf_pct override on > resume"). The commit introduced tracking of performance percentage > changes via sysfs in order to restore userspace changes during > suspend/resume. The problem occurs because the global values of the newly > introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor > change and this causes the new governor to inherit the previous governor's > settings. > > This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor > change which fixes the problem with governor switching. These changes > also make the initial calculations for max_perf_pct and min_perf_pct > slightly simpler. > > Before patch: > [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > 100 > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > 100 > [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > 100 > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > 100 > > After patch: > [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > 100 > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > 100 > [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > 14 > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > 100 > > Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG): > [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct > 100 > 50 > [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test > [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk > [root@intel-skylake-y-01 power]# echo disk > /sys/power/state > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct > 100 > 50 > > Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume") > Cc: Kristen Carlson Accardi <kristen@linux.intel.com> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > Cc: Viresh Kumar <viresh.kumar@linaro.org> > Cc: linux-pm@vger.kernel.org > Signed-off-by: Prarit Bhargava <prarit@redhat.com> > --- > drivers/cpufreq/intel_pstate.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index 3af9dd7..bb24458 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) > if (!policy->cpuinfo.max_freq) > return -ENODEV; > > + limits.min_sysfs_pct = 0; > + limits.max_sysfs_pct = 100; > + > if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && > policy->max >= policy->cpuinfo.max_freq) { > limits.min_policy_pct = 100; > @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) > limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); > > /* Normalize user input to [min_policy_pct, max_policy_pct] */ > - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); > + limits.min_perf_pct = limits.min_policy_pct; > limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); > - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); > + limits.max_perf_pct = limits.max_sysfs_pct; > limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct); > > /* Make sure min_perf_pct <= max_perf_pct */ > Makes sense to me. Kristen? -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote: > On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote: > > Intel CPUs will not enter higher p-states when after switching from the > > performance governor to the powersave governor, until > > /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. > > This differs from previous behaviour in which a switch to the powersave > > governor would result in a low default value for min_perf_pct. > > > > The behavior of the powersave governor changed after commit a04759924e25 > > ("[cpufreq] intel_pstate: honor user space min_perf_pct override on > > resume"). The commit introduced tracking of performance percentage > > changes via sysfs in order to restore userspace changes during > > suspend/resume. The problem occurs because the global values of the newly > > introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor > > change and this causes the new governor to inherit the previous governor's > > settings. > > > > This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor > > change which fixes the problem with governor switching. These changes > > also make the initial calculations for max_perf_pct and min_perf_pct > > slightly simpler. > > > > Before patch: > > [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > > 100 > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > > 100 > > [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > > 100 > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > > 100 > > > > After patch: > > [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > > 100 > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > > 100 > > [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct > > 14 > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct > > 100 > > > > Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG): > > [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct > > 100 > > 50 > > [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test > > [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk > > [root@intel-skylake-y-01 power]# echo disk > /sys/power/state > > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct > > 100 > > 50 > > > > Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume") > > Cc: Kristen Carlson Accardi <kristen@linux.intel.com> > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > > Cc: Viresh Kumar <viresh.kumar@linaro.org> > > Cc: linux-pm@vger.kernel.org > > Signed-off-by: Prarit Bhargava <prarit@redhat.com> > > --- > > drivers/cpufreq/intel_pstate.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > > index 3af9dd7..bb24458 100644 > > --- a/drivers/cpufreq/intel_pstate.c > > +++ b/drivers/cpufreq/intel_pstate.c > > @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) > > if (!policy->cpuinfo.max_freq) > > return -ENODEV; > > > > + limits.min_sysfs_pct = 0; > > + limits.max_sysfs_pct = 100; > > + > > if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && > > policy->max >= policy->cpuinfo.max_freq) { > > limits.min_policy_pct = 100; > > @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) > > limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); > > > > /* Normalize user input to [min_policy_pct, max_policy_pct] */ > > - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); > > + limits.min_perf_pct = limits.min_policy_pct; > > limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); > > - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); > > + limits.max_perf_pct = limits.max_sysfs_pct; On a second thought, isn't that always 100? If so, doesn't it basically discard limits.max_policy_pct? > > limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct); > > > > /* Make sure min_perf_pct <= max_perf_pct */ > > Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015.09.06 16:48 Rafael J. Wysocki wrote: > On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote: >> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote: >>> Intel CPUs will not enter higher p-states when after switching from the >>> performance governor to the powersave governor, until >>> /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. It works properly for me. Isn't the root issue here an incompatibility between tools/power/cpupower/utils/cpufreq-set.c and drivers/cpufreq/intel_pstate.c? (see experiment results below, where I do not use "cpupower") I am not familiar with tools/power/cpupower/utils/cpufreq-set.c, but will look at it more tomorrow. >>> This differs from previous behaviour in which a switch to the powersave >>> governor would result in a low default value for min_perf_pct. >>> >>> The behavior of the powersave governor changed after commit a04759924e25 >>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on >>> resume"). The commit introduced tracking of performance percentage >>> changes via sysfs in order to restore userspace changes during >>> suspend/resume. The problem occurs because the global values of the newly >>> introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor >>> change and this causes the new governor to inherit the previous governor's >>> settings. >>> >>> This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor >>> change which fixes the problem with governor switching. These changes >>> also make the initial calculations for max_perf_pct and min_perf_pct >>> slightly simpler. >>> >>> Before patch: >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 And before patch I get, using primitives and not cpupower: Executive Summary: Everything works fine (or at least as I thought it was supposed to). root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 >>> >>> After patch: >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 14 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> And after the patch I get, using primitives and not cpupower: Executive Summary: Settings go back to default, and user settings are lost. This is not how I thought things were supposed to behave, but I'm not actually sure. root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 >>> Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG): >>> [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct >>> 100 >>> 50 >>> [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test >>> [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk >>> [root@intel-skylake-y-01 power]# echo disk > /sys/power/state >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct >>> 100 >>> 50 Before Patch, I get: root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 root@s15:/home/doug/temp# pm-suspend ... root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 After Patch, I get: root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 root@s15:/home/doug/temp# pm-suspend ... root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 >>> >>> Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume") >>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> >>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> >>> Cc: Viresh Kumar <viresh.kumar@linaro.org> >>> Cc: linux-pm@vger.kernel.org >>> Signed-off-by: Prarit Bhargava <prarit@redhat.com> >>> --- >>> drivers/cpufreq/intel_pstate.c | 7 +++++-- >>> 1 file changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c >>> index 3af9dd7..bb24458 100644 >>> --- a/drivers/cpufreq/intel_pstate.c >>> +++ b/drivers/cpufreq/intel_pstate.c >>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>> if (!policy->cpuinfo.max_freq) >>> return -ENODEV; >>> >>> + limits.min_sysfs_pct = 0; >>> + limits.max_sysfs_pct = 100; >>> + >>> if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && >>> policy->max >= policy->cpuinfo.max_freq) { >>> limits.min_policy_pct = 100; >>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>> limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); >>> >>> /* Normalize user input to [min_policy_pct, max_policy_pct] */ >>> - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); >>> + limits.min_perf_pct = limits.min_policy_pct; >>> limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); >>> - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); >>> + limits.max_perf_pct = limits.max_sysfs_pct; > > On a second thought, isn't that always 100? If so, doesn't it basically discard > limits.max_policy_pct? > Yes, I think so, see above. >>> limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct); >>> >>> /* Make sure min_perf_pct <= max_perf_pct */ >>> Kernels used: 4.3-rc4 and same plus this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 02:51 AM, Doug Smythies wrote: > > And before patch I get, using primitives and not cpupower: > Executive Summary: Everything works fine (or at least as I thought it was supposed to). > > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave > ... > /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 > root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct > root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 > root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance The switch has to be from performance to powersave. Switch again and you'll see the problem. I can also reproduce this without using 'cpupower'. P. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/06/2015 07:06 PM, Rafael J. Wysocki wrote: > On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote: >> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote: >>> Intel CPUs will not enter higher p-states when after switching from the >>> performance governor to the powersave governor, until >>> /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. >>> This differs from previous behaviour in which a switch to the powersave >>> governor would result in a low default value for min_perf_pct. >>> >>> The behavior of the powersave governor changed after commit a04759924e25 >>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on >>> resume"). The commit introduced tracking of performance percentage >>> changes via sysfs in order to restore userspace changes during >>> suspend/resume. The problem occurs because the global values of the newly >>> introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor >>> change and this causes the new governor to inherit the previous governor's >>> settings. >>> >>> This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor >>> change which fixes the problem with governor switching. These changes >>> also make the initial calculations for max_perf_pct and min_perf_pct >>> slightly simpler. >>> >>> Before patch: >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> >>> After patch: >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> 14 >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> 100 >>> >>> Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG): >>> [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct >>> 100 >>> 50 >>> [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test >>> [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk >>> [root@intel-skylake-y-01 power]# echo disk > /sys/power/state >>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct >>> 100 >>> 50 >>> >>> Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume") >>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> >>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> >>> Cc: Viresh Kumar <viresh.kumar@linaro.org> >>> Cc: linux-pm@vger.kernel.org >>> Signed-off-by: Prarit Bhargava <prarit@redhat.com> >>> --- >>> drivers/cpufreq/intel_pstate.c | 7 +++++-- >>> 1 file changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c >>> index 3af9dd7..bb24458 100644 >>> --- a/drivers/cpufreq/intel_pstate.c >>> +++ b/drivers/cpufreq/intel_pstate.c >>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>> if (!policy->cpuinfo.max_freq) >>> return -ENODEV; >>> >>> + limits.min_sysfs_pct = 0; >>> + limits.max_sysfs_pct = 100; >>> + >>> if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && >>> policy->max >= policy->cpuinfo.max_freq) { >>> limits.min_policy_pct = 100; >>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>> limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); >>> >>> /* Normalize user input to [min_policy_pct, max_policy_pct] */ >>> - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); >>> + limits.min_perf_pct = limits.min_policy_pct; >>> limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); >>> - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); >>> + limits.max_perf_pct = limits.max_sysfs_pct; > > On a second thought, isn't that always 100? If so, doesn't it basically discard > limits.max_policy_pct? Looking at it, yes. And that's definitely an unintended consequence of this patch :). I'll take a closer look. I thought it should be permissible to set a range of (min_perf_pct, max_perf_pct) while changing p-states and I thought the purpose of max_perf_pct was to set the higher percentage limit. P. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/06/2015 07:06 PM, Rafael J. Wysocki wrote: > On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote: >> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote: <snip> >>> >>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c >>> index 3af9dd7..bb24458 100644 >>> --- a/drivers/cpufreq/intel_pstate.c >>> +++ b/drivers/cpufreq/intel_pstate.c >>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>> if (!policy->cpuinfo.max_freq) >>> return -ENODEV; >>> >>> + limits.min_sysfs_pct = 0; >>> + limits.max_sysfs_pct = 100; >>> + >>> if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && >>> policy->max >= policy->cpuinfo.max_freq) { >>> limits.min_policy_pct = 100; >>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>> limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); >>> >>> /* Normalize user input to [min_policy_pct, max_policy_pct] */ >>> - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); >>> + limits.min_perf_pct = limits.min_policy_pct; >>> limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); >>> - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); >>> + limits.max_perf_pct = limits.max_sysfs_pct; > > On a second thought, isn't that always 100? If so, doesn't it basically discard > limits.max_policy_pct? Wow :) I think you're right and that definitely was an unintended consequence of this patch. I also see that I can clean up the intel_pstate_set_policy() code a bit more. I'll submit a 2-part v2. P. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015.10.07 03:00 Prarit Bhargava wrote: > On 10/07/2015 02:51 AM, Doug Smythies wrote: >> >> And before patch I get, using primitives and not cpupower: >> Executive Summary: Everything works fine (or at least as I thought it was supposed to). >> >> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave >> ... >> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave >> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* >> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 >> /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 >> root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >> root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct >> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* >> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 >> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 >> root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done >> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* >> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 >> /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 >> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance > > The switch has to be from performance to powersave. Switch again and you'll see > the problem. Yes, of course, and I did. The first "powersave" listing was just to show the setup. Continue reading my original reply, also added back below: The "..." stuff is just where I deleted 6 of the 8 CPUs saying the same thing. root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave ... /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 > I can also reproduce this without using 'cpupower'. Interesting. Then maybe it is a difference between legacy mode and Hardware P state (HWP) mode. I do not have an HWP capable processor to test. I compiled cpupower, but get the same results as I get using primitives, meaning things work properly before the patch and resort to defaults after the patch. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 10:04 AM, Doug Smythies wrote: > > On 2015.10.07 03:00 Prarit Bhargava wrote: >> On 10/07/2015 02:51 AM, Doug Smythies wrote: >>> >>> And before patch I get, using primitives and not cpupower: >>> Executive Summary: Everything works fine (or at least as I thought it was supposed to). >>> >>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave >>> ... >>> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave >>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* >>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 >>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 >>> root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct >>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* >>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 >>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 >>> root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done >>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* >>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 >>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 >>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance >> >> The switch has to be from performance to powersave. Switch again and you'll see >> the problem. > > Yes, of course, and I did. > The first "powersave" listing was just to show the setup. > Continue reading my original reply, also added back below: > The "..." stuff is just where I deleted 6 of the 8 CPUs saying the same thing. > > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:100 > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance > ... > /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance > root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave > ... > /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 > >> I can also reproduce this without using 'cpupower'. > > Interesting. Then maybe it is a difference between legacy mode > and Hardware P state (HWP) mode. I do not have an HWP capable processor > to test. I have intel_pstate=no_hwp so that doesn't explain things. I will send you a printk debug patch shortly to figure out why your system doesn't see this problem. I have _6_ different variants of intel processors that see this same issue. What is your model #? P. > > I compiled cpupower, but get the same results as I get using primitives, > meaning things work properly before the patch and resort to defaults > after the patch. > > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 08:18 AM, Prarit Bhargava wrote: > > > On 10/06/2015 07:06 PM, Rafael J. Wysocki wrote: >> On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote: >>> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote: > > <snip> > >>>> >>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c >>>> index 3af9dd7..bb24458 100644 >>>> --- a/drivers/cpufreq/intel_pstate.c >>>> +++ b/drivers/cpufreq/intel_pstate.c >>>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>>> if (!policy->cpuinfo.max_freq) >>>> return -ENODEV; >>>> >>>> + limits.min_sysfs_pct = 0; >>>> + limits.max_sysfs_pct = 100; >>>> + >>>> if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && >>>> policy->max >= policy->cpuinfo.max_freq) { >>>> limits.min_policy_pct = 100; >>>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) >>>> limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); >>>> >>>> /* Normalize user input to [min_policy_pct, max_policy_pct] */ >>>> - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); >>>> + limits.min_perf_pct = limits.min_policy_pct; >>>> limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); >>>> - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); >>>> + limits.max_perf_pct = limits.max_sysfs_pct; >> >> On a second thought, isn't that always 100? If so, doesn't it basically discard >> limits.max_policy_pct? > > Wow :) I think you're right and that definitely was an unintended consequence > of this patch. I also see that I can clean up the intel_pstate_set_policy() > code a bit more. I'll submit a 2-part v2. Kristen, There is a question here and I'm going to need your input on the answer. The question is whether or not a user can exceed the max_policy_pct or drop below the min_policy_pct values via sysfs? If "No, a user should not be able to exceed or go below those values", then IMO limits.max_sysfs_pct should default to limits.max_policy_pct, and limits.min_sysfs_pct should default to limits.min_policy_pct. and I will adjust my patch accordingly. P. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015.09.07 07:11 Prarit Bhargava wrote: Hi Prarit, > I have intel_pstate=no_hwp so that doesn't explain things. I will send you a > printk debug patch shortly to figure out why your system doesn't see this > problem. I have _6_ different variants of intel processors that see this same > issue. Interesting. > > What is your model #? Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz While probably not relevant, it is the identical processor that the previous maintainer of the intel_pstate driver had on his main test computer. As mentioned in my original reply, I am using kernel 4.3-rc4 from kernel.org mainline. For kernel configuration I steal the Ubuntu kernel config. O.K. meanwhile your printk debug patch e-mail came: On 2015.10.07 07:34 Prarit Bhargava wrote: > Doug, please apply and test. I never get the "store_min_perf_pct" stuff during startup. I get: [ 2.662092] Intel P-state driver initializing. [ 2.662110] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.681792] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.681825] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.681834] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.681935] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.681959] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.682004] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 2.682024] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 80.044329] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044331] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044331] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044343] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044344] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044345] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044354] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044355] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044355] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044364] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044365] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044366] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044375] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044376] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044376] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044385] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044386] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044387] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044396] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044397] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044397] intel_pstate_set_policy[1028] min_perf_pct = 42 [ 80.044406] intel_pstate_set_policy[1020] min_perf_pct = 42 [ 80.044407] intel_pstate_set_policy[1023] min_perf_pct = 42 [ 80.044408] intel_pstate_set_policy[1028] min_perf_pct = 42 Note that the stuff at 80 seconds is from a startup script that changes the governor to powersave after a 60 second delay. It is an Ubuntu default script. (/etc/init.d/ondemand), but merely does that same thing as this: for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done I do not know why things are done 3 times. Now, continuing: echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct [ 1411.436249] store_min_perf_pct[453] min_sysfs_pct = 50 [ 1411.436253] store_min_perf_pct[456] min_perf_pct = 50 [ 1411.436255] store_min_perf_pct[459] min_perf_pct = 50 [ 1411.436256] store_min_perf_pct[462] min_perf_pct = 50 cpupower frequency-set -g performance [ 1425.879832] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.879981] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.880049] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.880178] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.880324] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.880467] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.880534] intel_pstate_set_policy[1001] min_perf_pct = 100 [ 1425.880594] intel_pstate_set_policy[1001] min_perf_pct = 100 cpupower frequency-set -g powersave [ 1437.677179] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677181] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677182] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677205] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677206] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677206] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677224] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677225] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677226] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677244] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677245] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677245] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677270] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677271] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677279] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677296] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677297] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677297] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677316] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677317] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677318] intel_pstate_set_policy[1028] min_perf_pct = 50 [ 1437.677350] intel_pstate_set_policy[1020] min_perf_pct = 50 [ 1437.677351] intel_pstate_set_policy[1023] min_perf_pct = 50 [ 1437.677352] intel_pstate_set_policy[1028] min_perf_pct = 50 Do we agree or disagree that the root issue seems to be (from your test)?: \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 ... Doug -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 11:40 AM, Doug Smythies wrote: > > Do we agree or disagree that the root issue seems to be (from your test)?: > > \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct > > [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 > [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 > [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 > [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a governor switch. P. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015.10.07 08:46 Prarit Bhargava wrote: > On 10/07/2015 11:40 AM, Doug Smythies wrote: >> >> Do we agree or disagree that the root issue seems to be (from your test)?: >> >> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >> >> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 >> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 >> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 >> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 > > Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is > still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a > governor switch. Clearing them will break some other things. For example, and as shown in my original reply, resume from suspend. Why? Because, at least on my computer, the governor is changed to "performance" during suspend, and the "powersave" governor is restored sometime during resume. The users wants the settings they had before the suspend. Continuing with that printk debug kernel from earlier: pm-suspend: [12599.912028] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.913781] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.915343] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.916877] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.918444] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.919686] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.920932] intel_pstate_set_policy[1001] min_perf_pct = 100 [12599.922191] intel_pstate_set_policy[1001] min_perf_pct = 100 Then push the power button, i.e. resume: [12609.953358] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.953360] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.953361] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.953796] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.953797] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.953798] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.954209] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.954210] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.954211] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.954619] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.954620] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.954621] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.955028] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.955029] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.955030] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.955431] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.955432] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.955433] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.955833] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.955834] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.955835] intel_pstate_set_policy[1028] min_perf_pct = 50 [12609.956234] intel_pstate_set_policy[1020] min_perf_pct = 50 [12609.956235] intel_pstate_set_policy[1023] min_perf_pct = 50 [12609.956235] intel_pstate_set_policy[1028] min_perf_pct = 50 The below is copied from my original reply: Before Patch, I get: root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 root@s15:/home/doug/temp# pm-suspend ... root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 After Patch, I get: root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 root@s15:/home/doug/temp# pm-suspend ... root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 02:52 PM, Doug Smythies wrote: > On 2015.10.07 08:46 Prarit Bhargava wrote: >> On 10/07/2015 11:40 AM, Doug Smythies wrote: >>> >>> Do we agree or disagree that the root issue seems to be (from your test)?: >>> >>> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> >>> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 >>> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 >>> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 >>> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 >> >> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is >> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a >> governor switch. > > Clearing them will break some other things. For example, and as > shown in my original reply, resume from suspend. > > Why? Because, at least on my computer, the governor is changed to > "performance" during suspend, and the "powersave" governor is > restored sometime during resume. The users wants the settings they had > before the suspend. Hmm ... okay. I'm going to try again with a new patch and I'll add an additional test of starting with the powersave governor, changing the min_perf_pct, and then doing a suspend/resume. P. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 02:52 PM, Doug Smythies wrote: > On 2015.10.07 08:46 Prarit Bhargava wrote: >> On 10/07/2015 11:40 AM, Doug Smythies wrote: >>> >>> Do we agree or disagree that the root issue seems to be (from your test)?: >>> >>> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>> >>> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 >>> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 >>> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 >>> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 >> >> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is >> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a >> governor switch. > > Clearing them will break some other things. For example, and as > shown in my original reply, resume from suspend. > > Why? Because, at least on my computer, the governor is changed to > "performance" during suspend, and the "powersave" governor is > restored sometime during resume. The users wants the settings they had > before the suspend. > Looking at this in more detail after having tested on a Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz in Fedora and RHEL. I have a feeling that the switch you're seeing (poweersave->performance, suspend ... resume, performance->powersave) is occurring in userspace, and not as a result of the kernel. IMO if userspace changes the governor, all bets are off on maintaining max_sysfs_pct and min_sysfs_pct. Here's something I cannot figure out (because I do not have an Ubuntu install). *Why* is Ubuntu making the governor switch during suspend/resume? Is it because of archaic brokeness they were trying to paper over? > Continuing with that printk debug kernel from earlier: > > pm-suspend: > > [12599.912028] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.913781] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.915343] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.916877] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.918444] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.919686] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.920932] intel_pstate_set_policy[1001] min_perf_pct = 100 > [12599.922191] intel_pstate_set_policy[1001] min_perf_pct = 100 > > Then push the power button, i.e. resume: > > [12609.953358] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.953360] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.953361] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.953796] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.953797] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.953798] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.954209] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.954210] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.954211] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.954619] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.954620] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.954621] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.955028] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.955029] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.955030] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.955431] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.955432] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.955433] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.955833] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.955834] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.955835] intel_pstate_set_policy[1028] min_perf_pct = 50 > [12609.956234] intel_pstate_set_policy[1020] min_perf_pct = 50 > [12609.956235] intel_pstate_set_policy[1023] min_perf_pct = 50 > [12609.956235] intel_pstate_set_policy[1028] min_perf_pct = 50 > > The below is copied from my original reply: > > Before Patch, I get: > > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 > root@s15:/home/doug/temp# pm-suspend > ... > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 > > After Patch, I get: > > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:80 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:50 > root@s15:/home/doug/temp# pm-suspend > ... > root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_* > /sys/devices/system/cpu/intel_pstate/max_perf_pct:100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct:42 Here's what I get after the patch (again, on Fedora which appears to let the kernel do it's thing during suspend/resume) on the same processor you are using (a Sandybridge, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz), and using the powersave governor, [root@intel-skylake-y-01 ~]# cpupower frequency-info analyzing CPU 0: driver: intel_pstate CPUs which run at the same hardware frequency: 0 CPUs which need to have their frequency coordinated by software: 0 maximum transition latency: 0.97 ms. hardware limits: 400 MHz - 2.70 GHz available cpufreq governors: performance, powersave current policy: frequency should be within 400 MHz and 2.70 GHz. The governor "powersave" may decide which speed to use within this range. current CPU frequency is 800 MHz (asserted by call to hardware). boost state support: Supported: yes Active: yes [root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct 100 14 [root@intel-skylake-y-01 ~]# echo devices > /sys/power/pm_test; echo platform > /sys/power/disk; sleep 1; echo disk > /sys/power/state [root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct 100 14 Even if I manually change the max & min, [root@intel-skylake-y-01 ~]# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct [root@intel-skylake-y-01 ~]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct [root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct 80 50 [root@intel-skylake-y-01 ~]# echo devices > /sys/power/pm_test; echo platform > /sys/power/disk; sleep 1; echo disk > /sys/power/state [root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct 80 50 [root@intel-skylake-y-01 ~]# Everything works. It doesn't work on Ubuntu because userspace is doing something weird. Let's figure out why that is -- anyone know who works on s/r @ Ubuntu? P. > > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote: > > On 10/07/2015 02:52 PM, Doug Smythies wrote: > > On 2015.10.07 08:46 Prarit Bhargava wrote: > >> On 10/07/2015 11:40 AM, Doug Smythies wrote: > >>> > >>> Do we agree or disagree that the root issue seems to be (from your test)?: > >>> > >>> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct > >>> > >>> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 > >>> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 > >>> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 > >>> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 > >> > >> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is > >> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a > >> governor switch. > > > > Clearing them will break some other things. For example, and as > > shown in my original reply, resume from suspend. > > > > Why? Because, at least on my computer, the governor is changed to > > "performance" during suspend, and the "powersave" governor is > > restored sometime during resume. The users wants the settings they had > > before the suspend. > > > > Looking at this in more detail after having tested on a Intel(R) Core(TM) > i7-2600 CPU @ 3.40GHz in Fedora and RHEL. > > I have a feeling that the switch you're seeing (poweersave->performance, suspend > ... resume, performance->powersave) is occurring in userspace, and not as a > result of the kernel. IMO if userspace changes the governor, all bets are off > on maintaining max_sysfs_pct and min_sysfs_pct. > > Here's something I cannot figure out (because I do not have an Ubuntu install). > *Why* is Ubuntu making the governor switch during suspend/resume? Is it > because of archaic brokeness they were trying to paper over? That's not limited to Ubuntu, pm-utils has been doing that forever. I have no idea why has it been doing that, though. I guess the reason was to "speed up" PM transitions (in case it started when you were in a low-frequency P-state and then there was no time to bump it up before things got too far). Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015.10.07 15:06 Rafael J. Wysocki wrote: > On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote: >> On 10/07/2015 02:52 PM, Doug Smythies wrote: >>> On 2015.10.07 08:46 Prarit Bhargava wrote: >>>> On 10/07/2015 11:40 AM, Doug Smythies wrote: >>>>> >>>>> Do we agree or disagree that the root issue seems to be (from your test)?: >>>>> >>>>> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>>>> >>>>> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 >>>>> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 >>>>> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 >>>>> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 >>>> >>>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is >>>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a >>>> governor switch. >>> >>> Clearing them will break some other things. For example, and as >>> shown in my original reply, resume from suspend. >>> >>> Why? Because, at least on my computer, the governor is changed to >>> "performance" during suspend, and the "powersave" governor is >>> restored sometime during resume. The users wants the settings they had >>> before the suspend. >>> >> Looking at this in more detail after having tested on a Intel(R) Core(TM) >> i7-2600 CPU @ 3.40GHz in Fedora and RHEL. >> >> I have a feeling that the switch you're seeing (poweersave->performance, suspend >> ... resume, performance->powersave) is occurring in userspace, and not as a >> result of the kernel. Agreed. It is pm-suspend doing it. >> IMO if userspace changes the governor, all bets are off >> on maintaining max_sysfs_pct and min_sysfs_pct. >> >> Here's something I cannot figure out (because I do not have an Ubuntu install). >> *Why* is Ubuntu making the governor switch during suspend/resume? Is it >> because of archaic brokeness they were trying to paper over? >> That's not limited to Ubuntu, pm-utils has been doing that forever. Agreed. This in pm-utils, and not limited to Ubuntu. We can ignore this issue if everyone wants, but I can envision bug reports. > I have no idea why has it been doing that, though. I guess the reason > was to "speed up" PM transitions (in case it started when you were in a > low-frequency P-state and then there was no time to bump it up before > things got too far). I have no idea either, but the stated theory seems sound. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 06:05 PM, Rafael J. Wysocki wrote: > On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote: >> >> On 10/07/2015 02:52 PM, Doug Smythies wrote: >>> On 2015.10.07 08:46 Prarit Bhargava wrote: >>>> On 10/07/2015 11:40 AM, Doug Smythies wrote: >>>>> >>>>> Do we agree or disagree that the root issue seems to be (from your test)?: >>>>> >>>>> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>>>> >>>>> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 >>>>> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 >>>>> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 >>>>> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 >>>> >>>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is >>>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a >>>> governor switch. >>> >>> Clearing them will break some other things. For example, and as >>> shown in my original reply, resume from suspend. >>> >>> Why? Because, at least on my computer, the governor is changed to >>> "performance" during suspend, and the "powersave" governor is >>> restored sometime during resume. The users wants the settings they had >>> before the suspend. >>> >> >> Looking at this in more detail after having tested on a Intel(R) Core(TM) >> i7-2600 CPU @ 3.40GHz in Fedora and RHEL. >> >> I have a feeling that the switch you're seeing (poweersave->performance, suspend >> ... resume, performance->powersave) is occurring in userspace, and not as a >> result of the kernel. IMO if userspace changes the governor, all bets are off >> on maintaining max_sysfs_pct and min_sysfs_pct. >> >> Here's something I cannot figure out (because I do not have an Ubuntu install). >> *Why* is Ubuntu making the governor switch during suspend/resume? Is it >> because of archaic brokeness they were trying to paper over? > > That's not limited to Ubuntu, pm-utils has been doing that forever. > > I have no idea why has it been doing that, though. I guess the reason > was to "speed up" PM transitions (in case it started when you were in a > low-frequency P-state and then there was no time to bump it up before > things got too far). Nuts :/. Okay, it looks like git://anongit.freedesktop.org/git/pm-utils pm/sleep.d/94cpufreq is the culprit here. suspend/resume is known to work with the intel_pstate driver and IMO should be blacklisted. I'll install Ubuntu and do some testing. P. > > Thanks, > Rafael > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/07/2015 06:26 PM, Doug Smythies wrote: > On 2015.10.07 15:06 Rafael J. Wysocki wrote: >> On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote: >>> On 10/07/2015 02:52 PM, Doug Smythies wrote: >>>> On 2015.10.07 08:46 Prarit Bhargava wrote: >>>>> On 10/07/2015 11:40 AM, Doug Smythies wrote: >>>>>> >>>>>> Do we agree or disagree that the root issue seems to be (from your test)?: >>>>>> >>>>>> \# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct >>>>>> >>>>>> [ 21.483436] store_min_perf_pct[453] min_sysfs_pct = 100 >>>>>> [ 21.489373] store_min_perf_pct[456] min_perf_pct = 100 >>>>>> [ 21.495203] store_min_perf_pct[459] min_perf_pct = 100 >>>>>> [ 21.501050] store_min_perf_pct[462] min_perf_pct = 100 >>>>> >>>>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is >>>>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a >>>>> governor switch. >>>> >>>> Clearing them will break some other things. For example, and as >>>> shown in my original reply, resume from suspend. >>>> >>>> Why? Because, at least on my computer, the governor is changed to >>>> "performance" during suspend, and the "powersave" governor is >>>> restored sometime during resume. The users wants the settings they had >>>> before the suspend. >>>> >>> Looking at this in more detail after having tested on a Intel(R) Core(TM) >>> i7-2600 CPU @ 3.40GHz in Fedora and RHEL. >>> >>> I have a feeling that the switch you're seeing (poweersave->performance, suspend >>> ... resume, performance->powersave) is occurring in userspace, and not as a >>> result of the kernel. > Agreed. It is pm-suspend doing it. > >>> IMO if userspace changes the governor, all bets are off >>> on maintaining max_sysfs_pct and min_sysfs_pct. >>> >>> Here's something I cannot figure out (because I do not have an Ubuntu install). >>> *Why* is Ubuntu making the governor switch during suspend/resume? Is it >>> because of archaic brokeness they were trying to paper over? > >>> That's not limited to Ubuntu, pm-utils has been doing that forever. > > Agreed. This in pm-utils, and not limited to Ubuntu. > We can ignore this issue if everyone wants, but I can envision bug reports. > Yeah -- I'd rather not just ignore it. Sorry to repeat myself but I just sent out a reply that said I'm going to submit a patch to pm-utils that blacklists the intel-pstate driver. Suspend/Resume is known to work with intel-pstate and we shouldn't be doing anything fancy there. I will put it on my TODO list to also later on take a look at acpi-cpufreq. >> I have no idea why has it been doing that, though. I guess the reason >> was to "speed up" PM transitions (in case it started when you were in a >> low-frequency P-state and then there was no time to bump it up before >> things got too far). > > I have no idea either, but the stated theory seems sound. Looking at various comments here and there online, it sounds like they wanted to put the system into a "sane state" before attempting to suspend/resume. The code has been untouched since 2008. We've come a long way and I'm willing to argue that we don't need it anymore. P. > > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 3af9dd7..bb24458 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) if (!policy->cpuinfo.max_freq) return -ENODEV; + limits.min_sysfs_pct = 0; + limits.max_sysfs_pct = 100; + if (policy->policy == CPUFREQ_POLICY_PERFORMANCE && policy->max >= policy->cpuinfo.max_freq) { limits.min_policy_pct = 100; @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); /* Normalize user input to [min_policy_pct, max_policy_pct] */ - limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct); + limits.min_perf_pct = limits.min_policy_pct; limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct); - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); + limits.max_perf_pct = limits.max_sysfs_pct; limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct); /* Make sure min_perf_pct <= max_perf_pct */
Intel CPUs will not enter higher p-states when after switching from the performance governor to the powersave governor, until /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. This differs from previous behaviour in which a switch to the powersave governor would result in a low default value for min_perf_pct. The behavior of the powersave governor changed after commit a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume"). The commit introduced tracking of performance percentage changes via sysfs in order to restore userspace changes during suspend/resume. The problem occurs because the global values of the newly introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor change and this causes the new governor to inherit the previous governor's settings. This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor change which fixes the problem with governor switching. These changes also make the initial calculations for max_perf_pct and min_perf_pct slightly simpler. Before patch: [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct 100 [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 100 [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct 100 [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 100 After patch: [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct 100 [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 100 [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct 14 [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 100 Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG): [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct 100 50 [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk [root@intel-skylake-y-01 power]# echo disk > /sys/power/state [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct 100 50 Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume") Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-pm@vger.kernel.org Signed-off-by: Prarit Bhargava <prarit@redhat.com> --- drivers/cpufreq/intel_pstate.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)