diff mbox

cpufreq, intel_pstate, set max_sysfs_pct and min_sysfs_pct on governor switch

Message ID 1444168147-17812-1-git-send-email-prarit@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Prarit Bhargava Oct. 6, 2015, 9:49 p.m. UTC
Intel CPUs will not enter higher p-states when after switching from the
performance governor to the powersave governor, until
/sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.
This differs from previous behaviour in which a switch to the powersave
governor would result in a low default value for min_perf_pct.

The behavior of the powersave governor changed after commit a04759924e25
("[cpufreq] intel_pstate: honor user space min_perf_pct override on
resume").  The commit introduced tracking of performance percentage
changes via sysfs in order to restore userspace changes during
suspend/resume.  The problem occurs because the global values of the newly
introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor
change and this causes the new governor to inherit the previous governor's
settings.

This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor
change which fixes the problem with governor switching.  These changes
also make the initial calculations for max_perf_pct and min_perf_pct
slightly simpler.

Before patch:
[root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
100
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
[root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
100
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100

After patch:
[root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
100
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
[root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
14
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100

Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG):
[root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
100
50
[root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test
[root@intel-skylake-y-01 power]# echo platform > /sys/power/disk
[root@intel-skylake-y-01 power]# echo disk > /sys/power/state
[root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
100
50

Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume")
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 drivers/cpufreq/intel_pstate.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki Oct. 6, 2015, 10:43 p.m. UTC | #1
On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:
> Intel CPUs will not enter higher p-states when after switching from the
> performance governor to the powersave governor, until
> /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.
> This differs from previous behaviour in which a switch to the powersave
> governor would result in a low default value for min_perf_pct.
> 
> The behavior of the powersave governor changed after commit a04759924e25
> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
> resume").  The commit introduced tracking of performance percentage
> changes via sysfs in order to restore userspace changes during
> suspend/resume.  The problem occurs because the global values of the newly
> introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor
> change and this causes the new governor to inherit the previous governor's
> settings.
> 
> This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor
> change which fixes the problem with governor switching.  These changes
> also make the initial calculations for max_perf_pct and min_perf_pct
> slightly simpler.
> 
> Before patch:
> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> 100
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 100
> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> 100
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 100
> 
> After patch:
> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> 100
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 100
> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> 14
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> 100
> 
> Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG):
> [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
> 100
> 50
> [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test
> [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk
> [root@intel-skylake-y-01 power]# echo disk > /sys/power/state
> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
> 100
> 50
> 
> Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume")
> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-pm@vger.kernel.org
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> ---
>  drivers/cpufreq/intel_pstate.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 3af9dd7..bb24458 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>  	if (!policy->cpuinfo.max_freq)
>  		return -ENODEV;
>  
> +	limits.min_sysfs_pct = 0;
> +	limits.max_sysfs_pct = 100;
> +
>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>  	    policy->max >= policy->cpuinfo.max_freq) {
>  		limits.min_policy_pct = 100;
> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>  
>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
> +	limits.min_perf_pct = limits.min_policy_pct;
>  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
> +	limits.max_perf_pct = limits.max_sysfs_pct;
>  	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
>  
>  	/* Make sure min_perf_pct <= max_perf_pct */
> 

Makes sense to me.

Kristen?

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Oct. 6, 2015, 11:06 p.m. UTC | #2
On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote:
> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:
> > Intel CPUs will not enter higher p-states when after switching from the
> > performance governor to the powersave governor, until
> > /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.
> > This differs from previous behaviour in which a switch to the powersave
> > governor would result in a low default value for min_perf_pct.
> > 
> > The behavior of the powersave governor changed after commit a04759924e25
> > ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
> > resume").  The commit introduced tracking of performance percentage
> > changes via sysfs in order to restore userspace changes during
> > suspend/resume.  The problem occurs because the global values of the newly
> > introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor
> > change and this causes the new governor to inherit the previous governor's
> > settings.
> > 
> > This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor
> > change which fixes the problem with governor switching.  These changes
> > also make the initial calculations for max_perf_pct and min_perf_pct
> > slightly simpler.
> > 
> > Before patch:
> > [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > 100
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> > 100
> > [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > 100
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> > 100
> > 
> > After patch:
> > [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > 100
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> > 100
> > [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > 14
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
> > 100
> > 
> > Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG):
> > [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
> > 100
> > 50
> > [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test
> > [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk
> > [root@intel-skylake-y-01 power]# echo disk > /sys/power/state
> > [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
> > 100
> > 50
> > 
> > Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume")
> > Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: linux-pm@vger.kernel.org
> > Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> > ---
> >  drivers/cpufreq/intel_pstate.c |    7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index 3af9dd7..bb24458 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
> >  	if (!policy->cpuinfo.max_freq)
> >  		return -ENODEV;
> >  
> > +	limits.min_sysfs_pct = 0;
> > +	limits.max_sysfs_pct = 100;
> > +
> >  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
> >  	    policy->max >= policy->cpuinfo.max_freq) {
> >  		limits.min_policy_pct = 100;
> > @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
> >  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
> >  
> >  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
> > -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
> > +	limits.min_perf_pct = limits.min_policy_pct;
> >  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
> > -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
> > +	limits.max_perf_pct = limits.max_sysfs_pct;

On a second thought, isn't that always 100?  If so, doesn't it basically discard
limits.max_policy_pct?

> >  	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
> >  
> >  	/* Make sure min_perf_pct <= max_perf_pct */
> > 

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Oct. 7, 2015, 6:51 a.m. UTC | #3
On 2015.09.06 16:48 Rafael J. Wysocki wrote:
> On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote:
>> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:
>>> Intel CPUs will not enter higher p-states when after switching from the
>>> performance governor to the powersave governor, until
>>> /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.

It works properly for me.

Isn't the root issue here an incompatibility between
tools/power/cpupower/utils/cpufreq-set.c and
drivers/cpufreq/intel_pstate.c?
(see experiment results below, where I do not use "cpupower")

I am not familiar with tools/power/cpupower/utils/cpufreq-set.c, but will
look at it more tomorrow.

>>> This differs from previous behaviour in which a switch to the powersave
>>> governor would result in a low default value for min_perf_pct.
>>> 
>>> The behavior of the powersave governor changed after commit a04759924e25
>>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>>> resume").  The commit introduced tracking of performance percentage
>>> changes via sysfs in order to restore userspace changes during
>>> suspend/resume.  The problem occurs because the global values of the newly
>>> introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor
>>> change and this causes the new governor to inherit the previous governor's
>>> settings.
>>> 
>>> This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor
>>> change which fixes the problem with governor switching.  These changes
>>> also make the initial calculations for max_perf_pct and min_perf_pct
>>> slightly simpler.
>>> 
>>> Before patch:
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100

And before patch I get, using primitives and not cpupower:
Executive Summary: Everything works fine (or at least as I thought it was supposed to).

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42
root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50

>>> 
>>> After patch:
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 14
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> 

And after the patch I get, using primitives and not cpupower:
Executive Summary: Settings go back to default, and user settings are lost.
This is not how I thought things were supposed to behave, but I'm not actually sure.

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42
root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42

>>> Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG):
>>> [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
>>> 100
>>> 50
>>> [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test
>>> [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk
>>> [root@intel-skylake-y-01 power]# echo disk > /sys/power/state
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
>>> 100
>>> 50

Before Patch, I get:

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# pm-suspend
...
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50

After Patch, I get:

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# pm-suspend
...
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42

>>> 
>>> Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume")
>>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>> Cc: linux-pm@vger.kernel.org
>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>>> ---
>>>  drivers/cpufreq/intel_pstate.c |    7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>> index 3af9dd7..bb24458 100644
>>> --- a/drivers/cpufreq/intel_pstate.c
>>> +++ b/drivers/cpufreq/intel_pstate.c
>>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	if (!policy->cpuinfo.max_freq)
>>>  		return -ENODEV;
>>>  
>>> +	limits.min_sysfs_pct = 0;
>>> +	limits.max_sysfs_pct = 100;
>>> +
>>>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>>>  	    policy->max >= policy->cpuinfo.max_freq) {
>>>  		limits.min_policy_pct = 100;
>>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>>>  
>>>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
>>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>>> +	limits.min_perf_pct = limits.min_policy_pct;
>>>  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>>> +	limits.max_perf_pct = limits.max_sysfs_pct;
>
> On a second thought, isn't that always 100?  If so, doesn't it basically discard
> limits.max_policy_pct?
>

Yes, I think so, see above.

>>>  	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
>>>  
>>>  	/* Make sure min_perf_pct <= max_perf_pct */
>>> 

Kernels used: 4.3-rc4 and same plus this patch.


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 9:59 a.m. UTC | #4
On 10/07/2015 02:51 AM, Doug Smythies wrote:

> 
> And before patch I get, using primitives and not cpupower:
> Executive Summary: Everything works fine (or at least as I thought it was supposed to).
> 
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
> ...
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:42
> root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
> root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:100
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance

The switch has to be from performance to powersave.  Switch again and you'll see
the problem.  I can also reproduce this without using 'cpupower'.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 11:38 a.m. UTC | #5
On 10/06/2015 07:06 PM, Rafael J. Wysocki wrote:
> On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote:
>> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:
>>> Intel CPUs will not enter higher p-states when after switching from the
>>> performance governor to the powersave governor, until
>>> /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value.
>>> This differs from previous behaviour in which a switch to the powersave
>>> governor would result in a low default value for min_perf_pct.
>>>
>>> The behavior of the powersave governor changed after commit a04759924e25
>>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>>> resume").  The commit introduced tracking of performance percentage
>>> changes via sysfs in order to restore userspace changes during
>>> suspend/resume.  The problem occurs because the global values of the newly
>>> introduced max_sysfs_pct and min_sysfs_pct are not reset on a governor
>>> change and this causes the new governor to inherit the previous governor's
>>> settings.
>>>
>>> This patch sets max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor
>>> change which fixes the problem with governor switching.  These changes
>>> also make the initial calculations for max_perf_pct and min_perf_pct
>>> slightly simpler.
>>>
>>> Before patch:
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>>
>>> After patch:
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g performance
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>> [root@intel-skylake-y-01 power]# cpupower frequency-set -g powersave
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> 14
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> 100
>>>
>>> Also note that I have tested suspend/resume (using CONFIG_PM_DEBUG):
>>> [root@intel-skylake-y-01 power]# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
>>> 100
>>> 50
>>> [root@intel-skylake-y-01 power]# echo devices > /sys/power/pm_test
>>> [root@intel-skylake-y-01 power]# echo platform > /sys/power/disk
>>> [root@intel-skylake-y-01 power]# echo disk > /sys/power/state
>>> [root@intel-skylake-y-01 power]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
>>> 100
>>> 50
>>>
>>> Fixes: a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume")
>>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>> Cc: linux-pm@vger.kernel.org
>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>>> ---
>>>  drivers/cpufreq/intel_pstate.c |    7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>> index 3af9dd7..bb24458 100644
>>> --- a/drivers/cpufreq/intel_pstate.c
>>> +++ b/drivers/cpufreq/intel_pstate.c
>>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	if (!policy->cpuinfo.max_freq)
>>>  		return -ENODEV;
>>>  
>>> +	limits.min_sysfs_pct = 0;
>>> +	limits.max_sysfs_pct = 100;
>>> +
>>>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>>>  	    policy->max >= policy->cpuinfo.max_freq) {
>>>  		limits.min_policy_pct = 100;
>>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>>>  
>>>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
>>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>>> +	limits.min_perf_pct = limits.min_policy_pct;
>>>  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>>> +	limits.max_perf_pct = limits.max_sysfs_pct;
> 
> On a second thought, isn't that always 100?  If so, doesn't it basically discard
> limits.max_policy_pct?

Looking at it, yes.  And that's definitely an unintended consequence of this
patch :).  I'll take a closer look.  I thought it should be permissible to set a
range of (min_perf_pct, max_perf_pct) while changing p-states and I thought the
purpose of max_perf_pct was to set the higher percentage limit.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 12:18 p.m. UTC | #6
On 10/06/2015 07:06 PM, Rafael J. Wysocki wrote:
> On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote:
>> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:

<snip>

>>>
>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>> index 3af9dd7..bb24458 100644
>>> --- a/drivers/cpufreq/intel_pstate.c
>>> +++ b/drivers/cpufreq/intel_pstate.c
>>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	if (!policy->cpuinfo.max_freq)
>>>  		return -ENODEV;
>>>  
>>> +	limits.min_sysfs_pct = 0;
>>> +	limits.max_sysfs_pct = 100;
>>> +
>>>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>>>  	    policy->max >= policy->cpuinfo.max_freq) {
>>>  		limits.min_policy_pct = 100;
>>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>>>  
>>>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
>>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>>> +	limits.min_perf_pct = limits.min_policy_pct;
>>>  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>>> +	limits.max_perf_pct = limits.max_sysfs_pct;
> 
> On a second thought, isn't that always 100?  If so, doesn't it basically discard
> limits.max_policy_pct?

Wow :)  I think you're right and that definitely was an unintended consequence
of this patch.  I also see that I can clean up the intel_pstate_set_policy()
code a bit more.  I'll submit a 2-part v2.

P.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Oct. 7, 2015, 2:04 p.m. UTC | #7
On 2015.10.07 03:00 Prarit Bhargava wrote:
> On 10/07/2015 02:51 AM, Doug Smythies wrote:
>> 
>> And before patch I get, using primitives and not cpupower:
>> Executive Summary: Everything works fine (or at least as I thought it was supposed to).
>> 
>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
>> ...
>> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:42
>> root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>> root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
>> root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:100
>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
>
> The switch has to be from performance to powersave.  Switch again and you'll see
> the problem.

Yes, of course, and I did.
The first "powersave" listing was just to show the setup.
Continue reading my original reply, also added back below:
The "..." stuff is just where I deleted 6 of the 8 CPUs saying the same thing.

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
...
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50

> I can also reproduce this without using 'cpupower'.

Interesting. Then maybe it is a difference between legacy mode
and Hardware P state (HWP) mode. I do not have an HWP capable processor
to test.

I compiled cpupower, but get the same results as I get using primitives,
meaning things work properly before the patch and resort to defaults
after the patch.


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 2:10 p.m. UTC | #8
On 10/07/2015 10:04 AM, Doug Smythies wrote:
> 
> On 2015.10.07 03:00 Prarit Bhargava wrote:
>> On 10/07/2015 02:51 AM, Doug Smythies wrote:
>>>
>>> And before patch I get, using primitives and not cpupower:
>>> Executive Summary: Everything works fine (or at least as I thought it was supposed to).
>>>
>>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
>>> ...
>>> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
>>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
>>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
>>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:42
>>> root@s15:/home/doug/temp# echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>> root@s15:/home/doug/temp# echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
>>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
>>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
>>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
>>> root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
>>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
>>> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
>>> /sys/devices/system/cpu/intel_pstate/min_perf_pct:100
>>> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
>>
>> The switch has to be from performance to powersave.  Switch again and you'll see
>> the problem.
> 
> Yes, of course, and I did.
> The first "powersave" listing was just to show the setup.
> Continue reading my original reply, also added back below:
> The "..." stuff is just where I deleted 6 of the 8 CPUs saying the same thing.
> 
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:100
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
> ...
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
> root@s15:/home/doug/temp# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
> ...
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
> 
>> I can also reproduce this without using 'cpupower'.
> 
> Interesting. Then maybe it is a difference between legacy mode
> and Hardware P state (HWP) mode. I do not have an HWP capable processor
> to test.

I have intel_pstate=no_hwp so that doesn't explain things.  I will send you a
printk debug patch shortly to figure out why your system doesn't see this
problem.  I have _6_ different variants of intel processors that see this same
issue.

What is your model #?

P.


> 
> I compiled cpupower, but get the same results as I get using primitives,
> meaning things work properly before the patch and resort to defaults
> after the patch.
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 2:50 p.m. UTC | #9
On 10/07/2015 08:18 AM, Prarit Bhargava wrote:
> 
> 
> On 10/06/2015 07:06 PM, Rafael J. Wysocki wrote:
>> On Wednesday, October 07, 2015 12:43:55 AM Rafael J. Wysocki wrote:
>>> On Tuesday, October 06, 2015 05:49:07 PM Prarit Bhargava wrote:
> 
> <snip>
> 
>>>>
>>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>>> index 3af9dd7..bb24458 100644
>>>> --- a/drivers/cpufreq/intel_pstate.c
>>>> +++ b/drivers/cpufreq/intel_pstate.c
>>>> @@ -986,6 +986,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>>  	if (!policy->cpuinfo.max_freq)
>>>>  		return -ENODEV;
>>>>  
>>>> +	limits.min_sysfs_pct = 0;
>>>> +	limits.max_sysfs_pct = 100;
>>>> +
>>>>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>>>>  	    policy->max >= policy->cpuinfo.max_freq) {
>>>>  		limits.min_policy_pct = 100;
>>>> @@ -1004,9 +1007,9 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>>>  	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>>>>  
>>>>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
>>>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>>>> +	limits.min_perf_pct = limits.min_policy_pct;
>>>>  	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>>>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>>>> +	limits.max_perf_pct = limits.max_sysfs_pct;
>>
>> On a second thought, isn't that always 100?  If so, doesn't it basically discard
>> limits.max_policy_pct?
> 
> Wow :)  I think you're right and that definitely was an unintended consequence
> of this patch.  I also see that I can clean up the intel_pstate_set_policy()
> code a bit more.  I'll submit a 2-part v2.

Kristen,

There is a question here and I'm going to need your input on the answer.  The
question is whether or not a user can exceed the max_policy_pct or drop below
the min_policy_pct values via sysfs?

If "No, a user should not be able to exceed or go below those values", then IMO

limits.max_sysfs_pct should default to limits.max_policy_pct, and
limits.min_sysfs_pct should default to limits.min_policy_pct.

and I will adjust my patch accordingly.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Oct. 7, 2015, 3:40 p.m. UTC | #10
On 2015.09.07 07:11 Prarit Bhargava wrote:

Hi Prarit,

> I have intel_pstate=no_hwp so that doesn't explain things.  I will send you a
> printk debug patch shortly to figure out why your system doesn't see this
> problem.  I have _6_ different variants of intel processors that see this same
> issue.

Interesting.

>
> What is your model #?

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz

While probably not relevant, it is the identical processor that the previous
maintainer of the intel_pstate driver had on his main test computer.

As mentioned in my original reply, I am using kernel 4.3-rc4
from kernel.org mainline.
For kernel configuration I steal the Ubuntu kernel config.

O.K. meanwhile your printk debug patch e-mail came:

On 2015.10.07 07:34 Prarit Bhargava wrote:

> Doug, please apply and test.

I never get the "store_min_perf_pct" stuff during startup.

I get:

[    2.662092] Intel P-state driver initializing.
[    2.662110] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.681792] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.681825] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.681834] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.681935] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.681959] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.682004] intel_pstate_set_policy[1001] min_perf_pct = 100
[    2.682024] intel_pstate_set_policy[1001] min_perf_pct = 100

[   80.044329] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044331] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044331] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044343] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044344] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044345] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044354] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044355] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044355] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044364] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044365] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044366] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044375] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044376] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044376] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044385] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044386] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044387] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044396] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044397] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044397] intel_pstate_set_policy[1028] min_perf_pct = 42
[   80.044406] intel_pstate_set_policy[1020] min_perf_pct = 42
[   80.044407] intel_pstate_set_policy[1023] min_perf_pct = 42
[   80.044408] intel_pstate_set_policy[1028] min_perf_pct = 42

Note that the stuff at 80 seconds is from a startup script
that changes the governor to powersave after a 60 second delay.
It is an Ubuntu default script. (/etc/init.d/ondemand),
but merely does that same thing as this:

for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done

I do not know why things are done 3 times.

Now, continuing:

echo 50 > /sys/devices/system/cpu/intel_pstate/min_perf_pct

[ 1411.436249] store_min_perf_pct[453] min_sysfs_pct = 50
[ 1411.436253] store_min_perf_pct[456] min_perf_pct = 50
[ 1411.436255] store_min_perf_pct[459] min_perf_pct = 50
[ 1411.436256] store_min_perf_pct[462] min_perf_pct = 50

cpupower frequency-set -g performance

[ 1425.879832] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.879981] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.880049] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.880178] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.880324] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.880467] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.880534] intel_pstate_set_policy[1001] min_perf_pct = 100
[ 1425.880594] intel_pstate_set_policy[1001] min_perf_pct = 100

cpupower frequency-set -g powersave

[ 1437.677179] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677181] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677182] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677205] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677206] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677206] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677224] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677225] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677226] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677244] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677245] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677245] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677270] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677271] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677279] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677296] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677297] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677297] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677316] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677317] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677318] intel_pstate_set_policy[1028] min_perf_pct = 50
[ 1437.677350] intel_pstate_set_policy[1020] min_perf_pct = 50
[ 1437.677351] intel_pstate_set_policy[1023] min_perf_pct = 50
[ 1437.677352] intel_pstate_set_policy[1028] min_perf_pct = 50

Do we agree or disagree that the root issue seems to be (from your test)?:

\#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct

[   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
[   21.489373] store_min_perf_pct[456] min_perf_pct = 100
[   21.495203] store_min_perf_pct[459] min_perf_pct = 100
[   21.501050] store_min_perf_pct[462] min_perf_pct = 100

... Doug


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 3:46 p.m. UTC | #11
On 10/07/2015 11:40 AM, Doug Smythies wrote:
> 
> Do we agree or disagree that the root issue seems to be (from your test)?:
> 
> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> 
> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100

Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
governor switch.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Oct. 7, 2015, 6:52 p.m. UTC | #12
On 2015.10.07 08:46 Prarit Bhargava wrote:
> On 10/07/2015 11:40 AM, Doug Smythies wrote:
>> 
>> Do we agree or disagree that the root issue seems to be (from your test)?:
>> 
>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>> 
>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
>
> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
> governor switch.

Clearing them will break some other things. For example, and as
shown in my original reply, resume from suspend.

Why? Because, at least on my computer, the governor is changed to
"performance" during suspend, and the "powersave" governor is
restored sometime during resume. The users wants the settings they had
before the suspend.

Continuing with that printk debug kernel from earlier:

pm-suspend:

[12599.912028] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.913781] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.915343] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.916877] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.918444] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.919686] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.920932] intel_pstate_set_policy[1001] min_perf_pct = 100
[12599.922191] intel_pstate_set_policy[1001] min_perf_pct = 100

Then push the power button, i.e. resume:

[12609.953358] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.953360] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.953361] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.953796] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.953797] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.953798] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.954209] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.954210] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.954211] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.954619] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.954620] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.954621] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.955028] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.955029] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.955030] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.955431] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.955432] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.955433] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.955833] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.955834] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.955835] intel_pstate_set_policy[1028] min_perf_pct = 50
[12609.956234] intel_pstate_set_policy[1020] min_perf_pct = 50
[12609.956235] intel_pstate_set_policy[1023] min_perf_pct = 50
[12609.956235] intel_pstate_set_policy[1028] min_perf_pct = 50

The below is copied from my original reply:

Before Patch, I get:

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# pm-suspend
...
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50

After Patch, I get:

root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:50
root@s15:/home/doug/temp# pm-suspend
...
root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 8:40 p.m. UTC | #13
On 10/07/2015 02:52 PM, Doug Smythies wrote:
> On 2015.10.07 08:46 Prarit Bhargava wrote:
>> On 10/07/2015 11:40 AM, Doug Smythies wrote:
>>>
>>> Do we agree or disagree that the root issue seems to be (from your test)?:
>>>
>>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>
>>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
>>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
>>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
>>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
>>
>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
>> governor switch.
> 
> Clearing them will break some other things. For example, and as
> shown in my original reply, resume from suspend.
> 
> Why? Because, at least on my computer, the governor is changed to
> "performance" during suspend, and the "powersave" governor is
> restored sometime during resume. The users wants the settings they had
> before the suspend.

Hmm ... okay.  I'm going to try again with a new patch and I'll add an
additional test of starting with the powersave governor, changing the
min_perf_pct, and then doing a suspend/resume.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 9:31 p.m. UTC | #14
On 10/07/2015 02:52 PM, Doug Smythies wrote:
> On 2015.10.07 08:46 Prarit Bhargava wrote:
>> On 10/07/2015 11:40 AM, Doug Smythies wrote:
>>>
>>> Do we agree or disagree that the root issue seems to be (from your test)?:
>>>
>>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>
>>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
>>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
>>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
>>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
>>
>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
>> governor switch.
> 
> Clearing them will break some other things. For example, and as
> shown in my original reply, resume from suspend.
> 
> Why? Because, at least on my computer, the governor is changed to
> "performance" during suspend, and the "powersave" governor is
> restored sometime during resume. The users wants the settings they had
> before the suspend.
> 

Looking at this in more detail after having tested on a Intel(R) Core(TM)
i7-2600 CPU @ 3.40GHz in Fedora and RHEL.

I have a feeling that the switch you're seeing (poweersave->performance, suspend
... resume, performance->powersave) is occurring in userspace, and not as a
result of the kernel.  IMO if userspace changes the governor, all bets are off
on maintaining max_sysfs_pct and min_sysfs_pct.

Here's something I cannot figure out (because I do not have an Ubuntu install).
 *Why* is Ubuntu making the governor switch during suspend/resume?  Is it
because of archaic brokeness they were trying to paper over?

> Continuing with that printk debug kernel from earlier:
> 
> pm-suspend:
> 
> [12599.912028] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.913781] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.915343] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.916877] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.918444] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.919686] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.920932] intel_pstate_set_policy[1001] min_perf_pct = 100
> [12599.922191] intel_pstate_set_policy[1001] min_perf_pct = 100
> 
> Then push the power button, i.e. resume:
> 
> [12609.953358] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.953360] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.953361] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.953796] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.953797] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.953798] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.954209] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.954210] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.954211] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.954619] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.954620] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.954621] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.955028] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.955029] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.955030] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.955431] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.955432] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.955433] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.955833] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.955834] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.955835] intel_pstate_set_policy[1028] min_perf_pct = 50
> [12609.956234] intel_pstate_set_policy[1020] min_perf_pct = 50
> [12609.956235] intel_pstate_set_policy[1023] min_perf_pct = 50
> [12609.956235] intel_pstate_set_policy[1028] min_perf_pct = 50
> 
> The below is copied from my original reply:
> 
> Before Patch, I get:
> 
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
> root@s15:/home/doug/temp# pm-suspend
> ...
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
> 
> After Patch, I get:
> 
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:80
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:50
> root@s15:/home/doug/temp# pm-suspend
> ...
> root@s15:/home/doug/temp# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
> /sys/devices/system/cpu/intel_pstate/max_perf_pct:100
> /sys/devices/system/cpu/intel_pstate/min_perf_pct:42

Here's what I get after the patch (again, on Fedora which appears to let the
kernel do it's thing during suspend/resume) on the same processor you are using
(a Sandybridge, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz), and using the
powersave governor,

[root@intel-skylake-y-01 ~]# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 400 MHz - 2.70 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 400 MHz and 2.70 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes


[root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
100
14
[root@intel-skylake-y-01 ~]# echo devices > /sys/power/pm_test; echo platform >
/sys/power/disk; sleep 1; echo disk > /sys/power/state
[root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
100
14

Even if I manually change the max & min,

[root@intel-skylake-y-01 ~]# echo 80 >
/sys/devices/system/cpu/intel_pstate/max_perf_pct
[root@intel-skylake-y-01 ~]# echo 50 >
/sys/devices/system/cpu/intel_pstate/min_perf_pct
[root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
80
50
[root@intel-skylake-y-01 ~]# echo devices > /sys/power/pm_test; echo platform >
/sys/power/disk; sleep 1; echo disk > /sys/power/state
[root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/intel_pstate/*_perf_pct
80
50
[root@intel-skylake-y-01 ~]#

Everything works.  It doesn't work on Ubuntu because userspace is doing
something weird.  Let's figure out why that is -- anyone know who works on s/r @
Ubuntu?

P.
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Oct. 7, 2015, 10:05 p.m. UTC | #15
On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote:
> 
> On 10/07/2015 02:52 PM, Doug Smythies wrote:
> > On 2015.10.07 08:46 Prarit Bhargava wrote:
> >> On 10/07/2015 11:40 AM, Doug Smythies wrote:
> >>>
> >>> Do we agree or disagree that the root issue seems to be (from your test)?:
> >>>
> >>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> >>>
> >>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
> >>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
> >>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
> >>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
> >>
> >> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
> >> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
> >> governor switch.
> > 
> > Clearing them will break some other things. For example, and as
> > shown in my original reply, resume from suspend.
> > 
> > Why? Because, at least on my computer, the governor is changed to
> > "performance" during suspend, and the "powersave" governor is
> > restored sometime during resume. The users wants the settings they had
> > before the suspend.
> > 
> 
> Looking at this in more detail after having tested on a Intel(R) Core(TM)
> i7-2600 CPU @ 3.40GHz in Fedora and RHEL.
> 
> I have a feeling that the switch you're seeing (poweersave->performance, suspend
> ... resume, performance->powersave) is occurring in userspace, and not as a
> result of the kernel.  IMO if userspace changes the governor, all bets are off
> on maintaining max_sysfs_pct and min_sysfs_pct.
> 
> Here's something I cannot figure out (because I do not have an Ubuntu install).
>  *Why* is Ubuntu making the governor switch during suspend/resume?  Is it
> because of archaic brokeness they were trying to paper over?

That's not limited to Ubuntu, pm-utils has been doing that forever.

I have no idea why has it been doing that, though.  I guess the reason
was to "speed up" PM transitions (in case it started when you were in a
low-frequency P-state and then there was no time to bump it up before
things got too far).

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Oct. 7, 2015, 10:26 p.m. UTC | #16
On 2015.10.07 15:06 Rafael J. Wysocki wrote:
> On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote:
>> On 10/07/2015 02:52 PM, Doug Smythies wrote:
>>> On 2015.10.07 08:46 Prarit Bhargava wrote:
>>>> On 10/07/2015 11:40 AM, Doug Smythies wrote:
>>>>>
>>>>> Do we agree or disagree that the root issue seems to be (from your test)?:
>>>>>
>>>>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>>>
>>>>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
>>>>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
>>>>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
>>>>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
>>>>
>>>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
>>>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
>>>> governor switch.
>>> 
>>> Clearing them will break some other things. For example, and as
>>> shown in my original reply, resume from suspend.
>>> 
>>> Why? Because, at least on my computer, the governor is changed to
>>> "performance" during suspend, and the "powersave" governor is
>>> restored sometime during resume. The users wants the settings they had
>>> before the suspend.
>>> 
>> Looking at this in more detail after having tested on a Intel(R) Core(TM)
>> i7-2600 CPU @ 3.40GHz in Fedora and RHEL.
>> 
>> I have a feeling that the switch you're seeing (poweersave->performance, suspend
>> ... resume, performance->powersave) is occurring in userspace, and not as a
>> result of the kernel.
Agreed. It is pm-suspend doing it.

>>  IMO if userspace changes the governor, all bets are off
>> on maintaining max_sysfs_pct and min_sysfs_pct.
>> 
>> Here's something I cannot figure out (because I do not have an Ubuntu install).
>>  *Why* is Ubuntu making the governor switch during suspend/resume?  Is it
>> because of archaic brokeness they were trying to paper over?

>> That's not limited to Ubuntu, pm-utils has been doing that forever.

Agreed. This in pm-utils, and not limited to Ubuntu.
We can ignore this issue if everyone wants, but I can envision bug reports.

> I have no idea why has it been doing that, though.  I guess the reason
> was to "speed up" PM transitions (in case it started when you were in a
> low-frequency P-state and then there was no time to bump it up before
> things got too far).

I have no idea either, but the stated theory seems sound.


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 11:08 p.m. UTC | #17
On 10/07/2015 06:05 PM, Rafael J. Wysocki wrote:
> On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote:
>>
>> On 10/07/2015 02:52 PM, Doug Smythies wrote:
>>> On 2015.10.07 08:46 Prarit Bhargava wrote:
>>>> On 10/07/2015 11:40 AM, Doug Smythies wrote:
>>>>>
>>>>> Do we agree or disagree that the root issue seems to be (from your test)?:
>>>>>
>>>>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>>>
>>>>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
>>>>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
>>>>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
>>>>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
>>>>
>>>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
>>>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
>>>> governor switch.
>>>
>>> Clearing them will break some other things. For example, and as
>>> shown in my original reply, resume from suspend.
>>>
>>> Why? Because, at least on my computer, the governor is changed to
>>> "performance" during suspend, and the "powersave" governor is
>>> restored sometime during resume. The users wants the settings they had
>>> before the suspend.
>>>
>>
>> Looking at this in more detail after having tested on a Intel(R) Core(TM)
>> i7-2600 CPU @ 3.40GHz in Fedora and RHEL.
>>
>> I have a feeling that the switch you're seeing (poweersave->performance, suspend
>> ... resume, performance->powersave) is occurring in userspace, and not as a
>> result of the kernel.  IMO if userspace changes the governor, all bets are off
>> on maintaining max_sysfs_pct and min_sysfs_pct.
>>
>> Here's something I cannot figure out (because I do not have an Ubuntu install).
>>  *Why* is Ubuntu making the governor switch during suspend/resume?  Is it
>> because of archaic brokeness they were trying to paper over?
> 
> That's not limited to Ubuntu, pm-utils has been doing that forever.
> 
> I have no idea why has it been doing that, though.  I guess the reason
> was to "speed up" PM transitions (in case it started when you were in a
> low-frequency P-state and then there was no time to bump it up before
> things got too far).

Nuts :/.  Okay, it looks like git://anongit.freedesktop.org/git/pm-utils
pm/sleep.d/94cpufreq is the culprit here.  suspend/resume is known to work with
the intel_pstate driver and IMO should be blacklisted.  I'll install Ubuntu and
do some testing.

P.

> 
> Thanks,
> Rafael
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Oct. 7, 2015, 11:17 p.m. UTC | #18
On 10/07/2015 06:26 PM, Doug Smythies wrote:
> On 2015.10.07 15:06 Rafael J. Wysocki wrote:
>> On Wednesday, October 07, 2015 05:31:25 PM Prarit Bhargava wrote:
>>> On 10/07/2015 02:52 PM, Doug Smythies wrote:
>>>> On 2015.10.07 08:46 Prarit Bhargava wrote:
>>>>> On 10/07/2015 11:40 AM, Doug Smythies wrote:
>>>>>>
>>>>>> Do we agree or disagree that the root issue seems to be (from your test)?:
>>>>>>
>>>>>> \#  echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>>>>
>>>>>> [   21.483436] store_min_perf_pct[453] min_sysfs_pct = 100
>>>>>> [   21.489373] store_min_perf_pct[456] min_perf_pct = 100
>>>>>> [   21.495203] store_min_perf_pct[459] min_perf_pct = 100
>>>>>> [   21.501050] store_min_perf_pct[462] min_perf_pct = 100
>>>>>
>>>>> Yep, and it appears to be done by default in Fedora & RHEL :/ ... the issue is
>>>>> still the same IMO that min_sysfs_pct & max_sysfs_pct are not cleared on a
>>>>> governor switch.
>>>>
>>>> Clearing them will break some other things. For example, and as
>>>> shown in my original reply, resume from suspend.
>>>>
>>>> Why? Because, at least on my computer, the governor is changed to
>>>> "performance" during suspend, and the "powersave" governor is
>>>> restored sometime during resume. The users wants the settings they had
>>>> before the suspend.
>>>>
>>> Looking at this in more detail after having tested on a Intel(R) Core(TM)
>>> i7-2600 CPU @ 3.40GHz in Fedora and RHEL.
>>>
>>> I have a feeling that the switch you're seeing (poweersave->performance, suspend
>>> ... resume, performance->powersave) is occurring in userspace, and not as a
>>> result of the kernel.
> Agreed. It is pm-suspend doing it.
> 
>>>  IMO if userspace changes the governor, all bets are off
>>> on maintaining max_sysfs_pct and min_sysfs_pct.
>>>
>>> Here's something I cannot figure out (because I do not have an Ubuntu install).
>>>  *Why* is Ubuntu making the governor switch during suspend/resume?  Is it
>>> because of archaic brokeness they were trying to paper over?
> 
>>> That's not limited to Ubuntu, pm-utils has been doing that forever.
> 
> Agreed. This in pm-utils, and not limited to Ubuntu.
> We can ignore this issue if everyone wants, but I can envision bug reports.
> 

Yeah -- I'd rather not just ignore it.  Sorry to repeat myself but I just sent
out a reply that said I'm going to submit a patch to pm-utils that blacklists
the intel-pstate driver.  Suspend/Resume is known to work with intel-pstate and
we shouldn't be doing anything fancy there.  I will put it on my TODO list to
also later on take a look at acpi-cpufreq.

>> I have no idea why has it been doing that, though.  I guess the reason
>> was to "speed up" PM transitions (in case it started when you were in a
>> low-frequency P-state and then there was no time to bump it up before
>> things got too far).
> 
> I have no idea either, but the stated theory seems sound.

Looking at various comments here and there online, it sounds like they wanted to
put the system into a "sane state" before attempting to suspend/resume.  The
code has been untouched since 2008.  We've come a long way and I'm willing to
argue that we don't need it anymore.

P.
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 3af9dd7..bb24458 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -986,6 +986,9 @@  static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 	if (!policy->cpuinfo.max_freq)
 		return -ENODEV;
 
+	limits.min_sysfs_pct = 0;
+	limits.max_sysfs_pct = 100;
+
 	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
 	    policy->max >= policy->cpuinfo.max_freq) {
 		limits.min_policy_pct = 100;
@@ -1004,9 +1007,9 @@  static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
 
 	/* Normalize user input to [min_policy_pct, max_policy_pct] */
-	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
+	limits.min_perf_pct = limits.min_policy_pct;
 	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
-	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
+	limits.max_perf_pct = limits.max_sysfs_pct;
 	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
 
 	/* Make sure min_perf_pct <= max_perf_pct */