
cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value

Message ID: 1444822919-14650-1-git-send-email-prarit@redhat.com (mailing list archive)
State: Changes Requested, archived
Delegated to: Rafael Wysocki

Commit Message

Prarit Bhargava Oct. 14, 2015, 11:41 a.m. UTC
Systems that initialize the intel_pstate driver with the performance
governor and then switch to the powersave governor will not transition to
lower CPU frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
is set to a low value.

The behavior of governor switching changed after commit a04759924e25
("[cpufreq] intel_pstate: honor user space min_perf_pct override on
 resume").  The commit introduced tracking of performance percentage
changes via sysfs in order to restore userspace changes during
suspend/resume.  The problem occurs because the global values of the newly
introduced max_sysfs_pct and min_sysfs_pct are not lowered on a governor
change, and this causes the powersave governor to inherit the performance
governor's settings.

A simple change would have been to reset max_sysfs_pct to 100 and
min_sysfs_pct to 0 on a governor change, which fixes the problem with
governor switching.  However, since we cannot break userspace[1], the fix
is instead to give each governor its own limits storage area so that
governor-specific changes are tracked.
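
Condensed from the diff below, the core of the fix is two static
perf_limits instances plus a pointer that intel_pstate_set_policy()
switches between them, so a sysfs write only ever touches the active
governor's copy.  This is a sketch only; the full initializers are in
the patch:

static struct perf_limits performance_limits = {
	.min_perf_pct = 100,	/* performance pins the floor at 100% */
	.max_perf_pct = 100,
	/* remaining fields as in the patch below */
};

static struct perf_limits powersave_limits = {
	.min_perf_pct = 0,	/* powersave keeps the default 0% floor */
	.max_perf_pct = 100,
	/* remaining fields as in the patch below */
};

/* The build-time default (performance vs. powersave) is selected by
 * CONFIG_CPU_FREQ_DEFAULT_GOV_* in the patch; intel_pstate_set_policy()
 * then flips this pointer on every governor change. */
static struct perf_limits *limits = &powersave_limits;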

I successfully tested this by booting with both the performance governor
and the powersave governor by default, and switching between the two
governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
and looking at the output of cpupower frequency-info).  Suspend/Resume
testing was performed by Doug Smythies.

[1] Systems which suspend/resume using the unmaintained pm-utils package
will always transition to the performance governor before the suspend and
after the resume.  This means a system using the powersave governor will
go from powersave to performance, then suspend/resume, performance to
powersave.  The simple reset on governor change would have been overwritten
by the governor switches before and after the suspend/resume.  I have
submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225 against
Fedora to remove the 94cpufreq file that causes the problem.  Note that
pm-utils has been obsoleted by newer versions of systemd.

Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 drivers/cpufreq/intel_pstate.c |  120 +++++++++++++++++++++++++---------------
 1 file changed, 75 insertions(+), 45 deletions(-)

Comments

Kristen Carlson Accardi Oct. 14, 2015, 9:09 p.m. UTC | #1
On Wed, 14 Oct 2015 07:41:59 -0400
Prarit Bhargava <prarit@redhat.com> wrote:

> On systems that initialize the intel_pstate driver with the performance
> governor, and then switch to the powersave governor will not transition to
> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
> is set to a low value.
> 
> The behavior of governor switching changed after commit a04759924e25
> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>  resume").  The commit introduced tracking of performance percentage
> changes via sysfs in order to restore userspace changes during
> suspend/resume.  The problem occurs because the global values of the newly
> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
> change and this causes the powersave governor to inherit the performance
> governor's settings.
> 
> A simple change would have been to reset max_sysfs_pct to 100 and
> min_sysfs_pct to 0 on a governor change, which fixes the problem with
> governor switching.  However, since we cannot break userspace[1] the fix
> is now to give each governor its own limits storage area so that governor
> specific changes are tracked.
> 
> I successfully tested this by booting with both the performance governor
> and the powersave governor by default, and switching between the two
> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
> and looking at the output of cpupower frequency-info).  Suspend/Resume
> testing was performed by Doug Smythies.
> 
> [1] Systems which suspend/resume using the unmaintained pm-utils package
> will always transition to the performance governor before the suspend and
> after the resume.  This means a system using the powersave governor will
> go from powersave to performance, then suspend/resume, performance to
> powersave.  The simple change during governor changes would have been
> overwritten when the governor changed before and after the suspend/resume.
> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
> against Fedora to remove the 94cpufreq file that causes the problem.  It
> should be noted that pm-utils is obsoleted with newer versions of systemd.
> 
> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-pm@vger.kernel.org
> Cc: Doug Smythies <dsmythies@telus.net>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>

Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>

BTW - I think I can see an issue here with HWP enabled systems.  It
looks to me like the hwp settings will not be programmed correctly
during a governor switch.  This probably needs to be addressed in a
separate patch.
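
For reference, a hedged sketch of what such a follow-up might look like
(an assumption, not an actual posted fix): the early-return performance
branch that this patch adds to intel_pstate_set_policy() would also need
to reprogram the HWP request MSRs, for example by reusing the existing
intel_pstate_hwp_set() helper:

	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
	    policy->max >= policy->cpuinfo.max_freq) {
		pr_debug("intel_pstate: set performance\n");
		limits = &performance_limits;
		/* hypothetical follow-up: also push the new limits to the
		 * HWP registers, which the early return otherwise skips */
		if (hwp_active)
			intel_pstate_hwp_set();
		return 0;
	}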

> ---
>  drivers/cpufreq/intel_pstate.c |  120 +++++++++++++++++++++++++---------------
>  1 file changed, 75 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 3af9dd7..78b4be5 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -156,7 +156,20 @@ struct perf_limits {
>  	int min_sysfs_pct;
>  };
>  
> -static struct perf_limits limits = {
> +static struct perf_limits performance_limits = {
> +	.no_turbo = 0,
> +	.turbo_disabled = 0,
> +	.max_perf_pct = 100,
> +	.max_perf = int_tofp(1),
> +	.min_perf_pct = 100,
> +	.min_perf = int_tofp(1),
> +	.max_policy_pct = 100,
> +	.max_sysfs_pct = 100,
> +	.min_policy_pct = 0,
> +	.min_sysfs_pct = 0,
> +};
> +
> +static struct perf_limits powersave_limits = {
>  	.no_turbo = 0,
>  	.turbo_disabled = 0,
>  	.max_perf_pct = 100,
> @@ -169,6 +182,12 @@ static struct perf_limits limits = {
>  	.min_sysfs_pct = 0,
>  };
>  
> +#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE
> +static struct perf_limits *limits = &performance_limits;
> +#else
> +static struct perf_limits *limits = &powersave_limits;
> +#endif
> +
>  static inline void pid_reset(struct _pid *pid, int setpoint, int busy,
>  			     int deadband, int integral) {
>  	pid->setpoint = setpoint;
> @@ -255,7 +274,7 @@ static inline void update_turbo_state(void)
>  
>  	cpu = all_cpu_data[0];
>  	rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
> -	limits.turbo_disabled =
> +	limits->turbo_disabled =
>  		(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
>  		 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
>  }
> @@ -274,14 +293,14 @@ static void intel_pstate_hwp_set(void)
>  
>  	for_each_online_cpu(cpu) {
>  		rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
> -		adj_range = limits.min_perf_pct * range / 100;
> +		adj_range = limits->min_perf_pct * range / 100;
>  		min = hw_min + adj_range;
>  		value &= ~HWP_MIN_PERF(~0L);
>  		value |= HWP_MIN_PERF(min);
>  
> -		adj_range = limits.max_perf_pct * range / 100;
> +		adj_range = limits->max_perf_pct * range / 100;
>  		max = hw_min + adj_range;
> -		if (limits.no_turbo) {
> +		if (limits->no_turbo) {
>  			hw_max = HWP_GUARANTEED_PERF(cap);
>  			if (hw_max < max)
>  				max = hw_max;
> @@ -350,7 +369,7 @@ static void __init intel_pstate_debug_expose_params(void)
>  	static ssize_t show_##file_name					\
>  	(struct kobject *kobj, struct attribute *attr, char *buf)	\
>  	{								\
> -		return sprintf(buf, "%u\n", limits.object);		\
> +		return sprintf(buf, "%u\n", limits->object);		\
>  	}
>  
>  static ssize_t show_turbo_pct(struct kobject *kobj,
> @@ -386,10 +405,10 @@ static ssize_t show_no_turbo(struct kobject *kobj,
>  	ssize_t ret;
>  
>  	update_turbo_state();
> -	if (limits.turbo_disabled)
> -		ret = sprintf(buf, "%u\n", limits.turbo_disabled);
> +	if (limits->turbo_disabled)
> +		ret = sprintf(buf, "%u\n", limits->turbo_disabled);
>  	else
> -		ret = sprintf(buf, "%u\n", limits.no_turbo);
> +		ret = sprintf(buf, "%u\n", limits->no_turbo);
>  
>  	return ret;
>  }
> @@ -405,12 +424,12 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
>  		return -EINVAL;
>  
>  	update_turbo_state();
> -	if (limits.turbo_disabled) {
> +	if (limits->turbo_disabled) {
>  		pr_warn("intel_pstate: Turbo disabled by BIOS or unavailable on processor\n");
>  		return -EPERM;
>  	}
>  
> -	limits.no_turbo = clamp_t(int, input, 0, 1);
> +	limits->no_turbo = clamp_t(int, input, 0, 1);
>  
>  	if (hwp_active)
>  		intel_pstate_hwp_set();
> @@ -428,11 +447,15 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
>  	if (ret != 1)
>  		return -EINVAL;
>  
> -	limits.max_sysfs_pct = clamp_t(int, input, 0 , 100);
> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
> -	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
> -	limits.max_perf_pct = max(limits.min_perf_pct, limits.max_perf_pct);
> -	limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
> +	limits->max_sysfs_pct = clamp_t(int, input, 0 , 100);
> +	limits->max_perf_pct = min(limits->max_policy_pct,
> +				   limits->max_sysfs_pct);
> +	limits->max_perf_pct = max(limits->min_policy_pct,
> +				   limits->max_perf_pct);
> +	limits->max_perf_pct = max(limits->min_perf_pct,
> +				   limits->max_perf_pct);
> +	limits->max_perf = div_fp(int_tofp(limits->max_perf_pct),
> +				  int_tofp(100));
>  
>  	if (hwp_active)
>  		intel_pstate_hwp_set();
> @@ -449,11 +472,15 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
>  	if (ret != 1)
>  		return -EINVAL;
>  
> -	limits.min_sysfs_pct = clamp_t(int, input, 0 , 100);
> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
> -	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
> -	limits.min_perf_pct = min(limits.max_perf_pct, limits.min_perf_pct);
> -	limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100));
> +	limits->min_sysfs_pct = clamp_t(int, input, 0 , 100);
> +	limits->min_perf_pct = max(limits->min_policy_pct,
> +				   limits->min_sysfs_pct);
> +	limits->min_perf_pct = min(limits->max_policy_pct,
> +				   limits->min_perf_pct);
> +	limits->min_perf_pct = min(limits->max_perf_pct,
> +				   limits->min_perf_pct);
> +	limits->min_perf = div_fp(int_tofp(limits->min_perf_pct),
> +				  int_tofp(100));
>  
>  	if (hwp_active)
>  		intel_pstate_hwp_set();
> @@ -533,7 +560,7 @@ static void byt_set_pstate(struct cpudata *cpudata, int pstate)
>  	u32 vid;
>  
>  	val = (u64)pstate << 8;
> -	if (limits.no_turbo && !limits.turbo_disabled)
> +	if (limits->no_turbo && !limits->turbo_disabled)
>  		val |= (u64)1 << 32;
>  
>  	vid_fp = cpudata->vid.min + mul_fp(
> @@ -622,7 +649,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
>  	u64 val;
>  
>  	val = (u64)pstate << 8;
> -	if (limits.no_turbo && !limits.turbo_disabled)
> +	if (limits->no_turbo && !limits->turbo_disabled)
>  		val |= (u64)1 << 32;
>  
>  	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> @@ -702,7 +729,7 @@ static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
>  	int max_perf_adj;
>  	int min_perf;
>  
> -	if (limits.no_turbo || limits.turbo_disabled)
> +	if (limits->no_turbo || limits->turbo_disabled)
>  		max_perf = cpu->pstate.max_pstate;
>  
>  	/*
> @@ -710,11 +737,11 @@ static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
>  	 * policy, or by cpu specific default values determined through
>  	 * experimentation.
>  	 */
> -	max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits.max_perf));
> +	max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits->max_perf));
>  	*max = clamp_t(int, max_perf_adj,
>  			cpu->pstate.min_pstate, cpu->pstate.turbo_pstate);
>  
> -	min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits.min_perf));
> +	min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits->min_perf));
>  	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
>  }
>  
> @@ -988,32 +1015,35 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>  
>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>  	    policy->max >= policy->cpuinfo.max_freq) {
> -		limits.min_policy_pct = 100;
> -		limits.min_perf_pct = 100;
> -		limits.min_perf = int_tofp(1);
> -		limits.max_policy_pct = 100;
> -		limits.max_perf_pct = 100;
> -		limits.max_perf = int_tofp(1);
> -		limits.no_turbo = 0;
> +		pr_debug("intel_pstate: set performance\n");
> +		limits = &performance_limits;
>  		return 0;
>  	}
>  
> -	limits.min_policy_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
> -	limits.min_policy_pct = clamp_t(int, limits.min_policy_pct, 0 , 100);
> -	limits.max_policy_pct = (policy->max * 100) / policy->cpuinfo.max_freq;
> -	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
> +	pr_debug("intel_pstate: set powersave\n");
> +	limits = &powersave_limits;
> +	limits->min_policy_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
> +	limits->min_policy_pct = clamp_t(int, limits->min_policy_pct, 0 , 100);
> +	limits->max_policy_pct = (policy->max * 100) / policy->cpuinfo.max_freq;
> +	limits->max_policy_pct = clamp_t(int, limits->max_policy_pct, 0 , 100);
>  
>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
> -	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
> -	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
> +	limits->min_perf_pct = max(limits->min_policy_pct,
> +				   limits->min_sysfs_pct);
> +	limits->min_perf_pct = min(limits->max_policy_pct,
> +				   limits->min_perf_pct);
> +	limits->max_perf_pct = min(limits->max_policy_pct,
> +				   limits->max_sysfs_pct);
> +	limits->max_perf_pct = max(limits->min_policy_pct,
> +				   limits->max_perf_pct);
>  
>  	/* Make sure min_perf_pct <= max_perf_pct */
> -	limits.min_perf_pct = min(limits.max_perf_pct, limits.min_perf_pct);
> +	limits->min_perf_pct = min(limits->max_perf_pct, limits->min_perf_pct);
>  
> -	limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100));
> -	limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
> +	limits->min_perf = div_fp(int_tofp(limits->min_perf_pct),
> +				  int_tofp(100));
> +	limits->max_perf = div_fp(int_tofp(limits->max_perf_pct),
> +				  int_tofp(100));
>  
>  	if (hwp_active)
>  		intel_pstate_hwp_set();
> @@ -1057,7 +1087,7 @@ static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>  
>  	cpu = all_cpu_data[policy->cpu];
>  
> -	if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100)
> +	if (limits->min_perf_pct == 100 && limits->max_perf_pct == 100)
>  		policy->policy = CPUFREQ_POLICY_PERFORMANCE;
>  	else
>  		policy->policy = CPUFREQ_POLICY_POWERSAVE;

Prarit Bhargava Oct. 14, 2015, 11:59 p.m. UTC | #2
On 10/14/2015 05:09 PM, Kristen Carlson Accardi wrote:
> On Wed, 14 Oct 2015 07:41:59 -0400
> Prarit Bhargava <prarit@redhat.com> wrote:
> 
>> On systems that initialize the intel_pstate driver with the performance
>> governor, and then switch to the powersave governor will not transition to
>> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
>> is set to a low value.
>>
>> The behavior of governor switching changed after commit a04759924e25
>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>>  resume").  The commit introduced tracking of performance percentage
>> changes via sysfs in order to restore userspace changes during
>> suspend/resume.  The problem occurs because the global values of the newly
>> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
>> change and this causes the powersave governor to inherit the performance
>> governor's settings.
>>
>> A simple change would have been to reset max_sysfs_pct to 100 and
>> min_sysfs_pct to 0 on a governor change, which fixes the problem with
>> governor switching.  However, since we cannot break userspace[1] the fix
>> is now to give each governor its own limits storage area so that governor
>> specific changes are tracked.
>>
>> I successfully tested this by booting with both the performance governor
>> and the powersave governor by default, and switching between the two
>> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
>> and looking at the output of cpupower frequency-info).  Suspend/Resume
>> testing was performed by Doug Smythies.
>>
>> [1] Systems which suspend/resume using the unmaintained pm-utils package
>> will always transition to the performance governor before the suspend and
>> after the resume.  This means a system using the powersave governor will
>> go from powersave to performance, then suspend/resume, performance to
>> powersave.  The simple change during governor changes would have been
>> overwritten when the governor changed before and after the suspend/resume.
>> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
>> against Fedora to remove the 94cpufreq file that causes the problem.  It
>> should be noted that pm-utils is obsoleted with newer versions of systemd.
>>
>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Cc: linux-pm@vger.kernel.org
>> Cc: Doug Smythies <dsmythies@telus.net>
>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> 
> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> 
> BTW - I think I can see an issue here with HWP enabled systems.  It
> looks to me like the hwp settings will not be programmed correctly
> during a governor switch.  This probably needs to be addressed in a
> separate patch.
> 

Oh, I see it now too.  I'll get to that in another patch.  Thanks for pointing
that out Kristen.

P.

>> ---
>>  drivers/cpufreq/intel_pstate.c |  120 +++++++++++++++++++++++++---------------
>>  1 file changed, 75 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index 3af9dd7..78b4be5 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -156,7 +156,20 @@ struct perf_limits {
>>  	int min_sysfs_pct;
>>  };
>>  
>> -static struct perf_limits limits = {
>> +static struct perf_limits performance_limits = {
>> +	.no_turbo = 0,
>> +	.turbo_disabled = 0,
>> +	.max_perf_pct = 100,
>> +	.max_perf = int_tofp(1),
>> +	.min_perf_pct = 100,
>> +	.min_perf = int_tofp(1),
>> +	.max_policy_pct = 100,
>> +	.max_sysfs_pct = 100,
>> +	.min_policy_pct = 0,
>> +	.min_sysfs_pct = 0,
>> +};
>> +
>> +static struct perf_limits powersave_limits = {
>>  	.no_turbo = 0,
>>  	.turbo_disabled = 0,
>>  	.max_perf_pct = 100,
>> @@ -169,6 +182,12 @@ static struct perf_limits limits = {
>>  	.min_sysfs_pct = 0,
>>  };
>>  
>> +#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE
>> +static struct perf_limits *limits = &performance_limits;
>> +#else
>> +static struct perf_limits *limits = &powersave_limits;
>> +#endif
>> +
>>  static inline void pid_reset(struct _pid *pid, int setpoint, int busy,
>>  			     int deadband, int integral) {
>>  	pid->setpoint = setpoint;
>> @@ -255,7 +274,7 @@ static inline void update_turbo_state(void)
>>  
>>  	cpu = all_cpu_data[0];
>>  	rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
>> -	limits.turbo_disabled =
>> +	limits->turbo_disabled =
>>  		(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
>>  		 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
>>  }
>> @@ -274,14 +293,14 @@ static void intel_pstate_hwp_set(void)
>>  
>>  	for_each_online_cpu(cpu) {
>>  		rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
>> -		adj_range = limits.min_perf_pct * range / 100;
>> +		adj_range = limits->min_perf_pct * range / 100;
>>  		min = hw_min + adj_range;
>>  		value &= ~HWP_MIN_PERF(~0L);
>>  		value |= HWP_MIN_PERF(min);
>>  
>> -		adj_range = limits.max_perf_pct * range / 100;
>> +		adj_range = limits->max_perf_pct * range / 100;
>>  		max = hw_min + adj_range;
>> -		if (limits.no_turbo) {
>> +		if (limits->no_turbo) {
>>  			hw_max = HWP_GUARANTEED_PERF(cap);
>>  			if (hw_max < max)
>>  				max = hw_max;
>> @@ -350,7 +369,7 @@ static void __init intel_pstate_debug_expose_params(void)
>>  	static ssize_t show_##file_name					\
>>  	(struct kobject *kobj, struct attribute *attr, char *buf)	\
>>  	{								\
>> -		return sprintf(buf, "%u\n", limits.object);		\
>> +		return sprintf(buf, "%u\n", limits->object);		\
>>  	}
>>  
>>  static ssize_t show_turbo_pct(struct kobject *kobj,
>> @@ -386,10 +405,10 @@ static ssize_t show_no_turbo(struct kobject *kobj,
>>  	ssize_t ret;
>>  
>>  	update_turbo_state();
>> -	if (limits.turbo_disabled)
>> -		ret = sprintf(buf, "%u\n", limits.turbo_disabled);
>> +	if (limits->turbo_disabled)
>> +		ret = sprintf(buf, "%u\n", limits->turbo_disabled);
>>  	else
>> -		ret = sprintf(buf, "%u\n", limits.no_turbo);
>> +		ret = sprintf(buf, "%u\n", limits->no_turbo);
>>  
>>  	return ret;
>>  }
>> @@ -405,12 +424,12 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
>>  		return -EINVAL;
>>  
>>  	update_turbo_state();
>> -	if (limits.turbo_disabled) {
>> +	if (limits->turbo_disabled) {
>>  		pr_warn("intel_pstate: Turbo disabled by BIOS or unavailable on processor\n");
>>  		return -EPERM;
>>  	}
>>  
>> -	limits.no_turbo = clamp_t(int, input, 0, 1);
>> +	limits->no_turbo = clamp_t(int, input, 0, 1);
>>  
>>  	if (hwp_active)
>>  		intel_pstate_hwp_set();
>> @@ -428,11 +447,15 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
>>  	if (ret != 1)
>>  		return -EINVAL;
>>  
>> -	limits.max_sysfs_pct = clamp_t(int, input, 0 , 100);
>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>> -	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
>> -	limits.max_perf_pct = max(limits.min_perf_pct, limits.max_perf_pct);
>> -	limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
>> +	limits->max_sysfs_pct = clamp_t(int, input, 0 , 100);
>> +	limits->max_perf_pct = min(limits->max_policy_pct,
>> +				   limits->max_sysfs_pct);
>> +	limits->max_perf_pct = max(limits->min_policy_pct,
>> +				   limits->max_perf_pct);
>> +	limits->max_perf_pct = max(limits->min_perf_pct,
>> +				   limits->max_perf_pct);
>> +	limits->max_perf = div_fp(int_tofp(limits->max_perf_pct),
>> +				  int_tofp(100));
>>  
>>  	if (hwp_active)
>>  		intel_pstate_hwp_set();
>> @@ -449,11 +472,15 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
>>  	if (ret != 1)
>>  		return -EINVAL;
>>  
>> -	limits.min_sysfs_pct = clamp_t(int, input, 0 , 100);
>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>> -	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>> -	limits.min_perf_pct = min(limits.max_perf_pct, limits.min_perf_pct);
>> -	limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100));
>> +	limits->min_sysfs_pct = clamp_t(int, input, 0 , 100);
>> +	limits->min_perf_pct = max(limits->min_policy_pct,
>> +				   limits->min_sysfs_pct);
>> +	limits->min_perf_pct = min(limits->max_policy_pct,
>> +				   limits->min_perf_pct);
>> +	limits->min_perf_pct = min(limits->max_perf_pct,
>> +				   limits->min_perf_pct);
>> +	limits->min_perf = div_fp(int_tofp(limits->min_perf_pct),
>> +				  int_tofp(100));
>>  
>>  	if (hwp_active)
>>  		intel_pstate_hwp_set();
>> @@ -533,7 +560,7 @@ static void byt_set_pstate(struct cpudata *cpudata, int pstate)
>>  	u32 vid;
>>  
>>  	val = (u64)pstate << 8;
>> -	if (limits.no_turbo && !limits.turbo_disabled)
>> +	if (limits->no_turbo && !limits->turbo_disabled)
>>  		val |= (u64)1 << 32;
>>  
>>  	vid_fp = cpudata->vid.min + mul_fp(
>> @@ -622,7 +649,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
>>  	u64 val;
>>  
>>  	val = (u64)pstate << 8;
>> -	if (limits.no_turbo && !limits.turbo_disabled)
>> +	if (limits->no_turbo && !limits->turbo_disabled)
>>  		val |= (u64)1 << 32;
>>  
>>  	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>> @@ -702,7 +729,7 @@ static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
>>  	int max_perf_adj;
>>  	int min_perf;
>>  
>> -	if (limits.no_turbo || limits.turbo_disabled)
>> +	if (limits->no_turbo || limits->turbo_disabled)
>>  		max_perf = cpu->pstate.max_pstate;
>>  
>>  	/*
>> @@ -710,11 +737,11 @@ static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
>>  	 * policy, or by cpu specific default values determined through
>>  	 * experimentation.
>>  	 */
>> -	max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits.max_perf));
>> +	max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits->max_perf));
>>  	*max = clamp_t(int, max_perf_adj,
>>  			cpu->pstate.min_pstate, cpu->pstate.turbo_pstate);
>>  
>> -	min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits.min_perf));
>> +	min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits->min_perf));
>>  	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
>>  }
>>  
>> @@ -988,32 +1015,35 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>  
>>  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
>>  	    policy->max >= policy->cpuinfo.max_freq) {
>> -		limits.min_policy_pct = 100;
>> -		limits.min_perf_pct = 100;
>> -		limits.min_perf = int_tofp(1);
>> -		limits.max_policy_pct = 100;
>> -		limits.max_perf_pct = 100;
>> -		limits.max_perf = int_tofp(1);
>> -		limits.no_turbo = 0;
>> +		pr_debug("intel_pstate: set performance\n");
>> +		limits = &performance_limits;
>>  		return 0;
>>  	}
>>  
>> -	limits.min_policy_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
>> -	limits.min_policy_pct = clamp_t(int, limits.min_policy_pct, 0 , 100);
>> -	limits.max_policy_pct = (policy->max * 100) / policy->cpuinfo.max_freq;
>> -	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
>> +	pr_debug("intel_pstate: set powersave\n");
>> +	limits = &powersave_limits;
>> +	limits->min_policy_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
>> +	limits->min_policy_pct = clamp_t(int, limits->min_policy_pct, 0 , 100);
>> +	limits->max_policy_pct = (policy->max * 100) / policy->cpuinfo.max_freq;
>> +	limits->max_policy_pct = clamp_t(int, limits->max_policy_pct, 0 , 100);
>>  
>>  	/* Normalize user input to [min_policy_pct, max_policy_pct] */
>> -	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
>> -	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
>> -	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
>> -	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
>> +	limits->min_perf_pct = max(limits->min_policy_pct,
>> +				   limits->min_sysfs_pct);
>> +	limits->min_perf_pct = min(limits->max_policy_pct,
>> +				   limits->min_perf_pct);
>> +	limits->max_perf_pct = min(limits->max_policy_pct,
>> +				   limits->max_sysfs_pct);
>> +	limits->max_perf_pct = max(limits->min_policy_pct,
>> +				   limits->max_perf_pct);
>>  
>>  	/* Make sure min_perf_pct <= max_perf_pct */
>> -	limits.min_perf_pct = min(limits.max_perf_pct, limits.min_perf_pct);
>> +	limits->min_perf_pct = min(limits->max_perf_pct, limits->min_perf_pct);
>>  
>> -	limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100));
>> -	limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
>> +	limits->min_perf = div_fp(int_tofp(limits->min_perf_pct),
>> +				  int_tofp(100));
>> +	limits->max_perf = div_fp(int_tofp(limits->max_perf_pct),
>> +				  int_tofp(100));
>>  
>>  	if (hwp_active)
>>  		intel_pstate_hwp_set();
>> @@ -1057,7 +1087,7 @@ static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>  
>>  	cpu = all_cpu_data[policy->cpu];
>>  
>> -	if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100)
>> +	if (limits->min_perf_pct == 100 && limits->max_perf_pct == 100)
>>  		policy->policy = CPUFREQ_POLICY_PERFORMANCE;
>>  	else
>>  		policy->policy = CPUFREQ_POLICY_POWERSAVE;
> 
Prarit Bhargava Oct. 15, 2015, 12:02 a.m. UTC | #3
On 10/14/2015 08:29 PM, Rafael J. Wysocki wrote:
> On Wednesday, October 14, 2015 07:59:38 PM Prarit Bhargava wrote:
>>
>> On 10/14/2015 05:09 PM, Kristen Carlson Accardi wrote:
>>> On Wed, 14 Oct 2015 07:41:59 -0400
>>> Prarit Bhargava <prarit@redhat.com> wrote:
>>>
>>>> On systems that initialize the intel_pstate driver with the performance
>>>> governor, and then switch to the powersave governor will not transition to
>>>> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>>> is set to a low value.
>>>>
>>>> The behavior of governor switching changed after commit a04759924e25
>>>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>>>>  resume").  The commit introduced tracking of performance percentage
>>>> changes via sysfs in order to restore userspace changes during
>>>> suspend/resume.  The problem occurs because the global values of the newly
>>>> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
>>>> change and this causes the powersave governor to inherit the performance
>>>> governor's settings.
>>>>
>>>> A simple change would have been to reset max_sysfs_pct to 100 and
>>>> min_sysfs_pct to 0 on a governor change, which fixes the problem with
>>>> governor switching.  However, since we cannot break userspace[1] the fix
>>>> is now to give each governor its own limits storage area so that governor
>>>> specific changes are tracked.
>>>>
>>>> I successfully tested this by booting with both the performance governor
>>>> and the powersave governor by default, and switching between the two
>>>> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
>>>> and looking at the output of cpupower frequency-info).  Suspend/Resume
>>>> testing was performed by Doug Smythies.
>>>>
>>>> [1] Systems which suspend/resume using the unmaintained pm-utils package
>>>> will always transition to the performance governor before the suspend and
>>>> after the resume.  This means a system using the powersave governor will
>>>> go from powersave to performance, then suspend/resume, performance to
>>>> powersave.  The simple change during governor changes would have been
>>>> overwritten when the governor changed before and after the suspend/resume.
>>>> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
>>>> against Fedora to remove the 94cpufreq file that causes the problem.  It
>>>> should be noted that pm-utils is obsoleted with newer versions of systemd.
>>>>
>>>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
>>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>>> Cc: linux-pm@vger.kernel.org
>>>> Cc: Doug Smythies <dsmythies@telus.net>
>>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>>>
>>> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
>>>
>>> BTW - I think I can see an issue here with HWP enabled systems.  It
>>> looks to me like the hwp settings will not be programmed correctly
>>> during a governor switch.  This probably needs to be addressed in a
>>> separate patch.
>>>
>>
>> Oh, I see it now too.  I'll get to that in another patch.  Thanks for pointing
>> that out Kristen.
> 
> The $subject patch doesn't apply any more after the series from Srinivas that
> I've just applied.
> 
> Can you please rebase it on top of my bleeding-edge branch?
> 

Sure -- can you send me a pointer to the branch?

P.

> Thanks,
> Rafael
> 
Rafael J. Wysocki Oct. 15, 2015, 12:29 a.m. UTC | #4
On Wednesday, October 14, 2015 07:59:38 PM Prarit Bhargava wrote:
> 
> On 10/14/2015 05:09 PM, Kristen Carlson Accardi wrote:
> > On Wed, 14 Oct 2015 07:41:59 -0400
> > Prarit Bhargava <prarit@redhat.com> wrote:
> > 
> >> On systems that initialize the intel_pstate driver with the performance
> >> governor, and then switch to the powersave governor will not transition to
> >> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
> >> is set to a low value.
> >>
> >> The behavior of governor switching changed after commit a04759924e25
> >> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
> >>  resume").  The commit introduced tracking of performance percentage
> >> changes via sysfs in order to restore userspace changes during
> >> suspend/resume.  The problem occurs because the global values of the newly
> >> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
> >> change and this causes the powersave governor to inherit the performance
> >> governor's settings.
> >>
> >> A simple change would have been to reset max_sysfs_pct to 100 and
> >> min_sysfs_pct to 0 on a governor change, which fixes the problem with
> >> governor switching.  However, since we cannot break userspace[1] the fix
> >> is now to give each governor its own limits storage area so that governor
> >> specific changes are tracked.
> >>
> >> I successfully tested this by booting with both the performance governor
> >> and the powersave governor by default, and switching between the two
> >> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
> >> and looking at the output of cpupower frequency-info).  Suspend/Resume
> >> testing was performed by Doug Smythies.
> >>
> >> [1] Systems which suspend/resume using the unmaintained pm-utils package
> >> will always transition to the performance governor before the suspend and
> >> after the resume.  This means a system using the powersave governor will
> >> go from powersave to performance, then suspend/resume, performance to
> >> powersave.  The simple change during governor changes would have been
> >> overwritten when the governor changed before and after the suspend/resume.
> >> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
> >> against Fedora to remove the 94cpufreq file that causes the problem.  It
> >> should be noted that pm-utils is obsoleted with newer versions of systemd.
> >>
> >> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
> >> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> >> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >> Cc: linux-pm@vger.kernel.org
> >> Cc: Doug Smythies <dsmythies@telus.net>
> >> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> > 
> > Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > 
> > BTW - I think I can see an issue here with HWP enabled systems.  It
> > looks to me like the hwp settings will not be programmed correctly
> > during a governor switch.  This probably needs to be addressed in a
> > separate patch.
> > 
> 
> Oh, I see it now too.  I'll get to that in another patch.  Thanks for pointing
> that out Kristen.

The $subject patch doesn't apply any more after the series from Srinivas that
I've just applied.

Can you please rebase it on top of my bleeding-edge branch?

Thanks,
Rafael

Rafael J. Wysocki Oct. 15, 2015, 12:32 a.m. UTC | #5
On Wednesday, October 14, 2015 08:02:02 PM Prarit Bhargava wrote:
> 
> On 10/14/2015 08:29 PM, Rafael J. Wysocki wrote:
> > On Wednesday, October 14, 2015 07:59:38 PM Prarit Bhargava wrote:
> >>
> >> On 10/14/2015 05:09 PM, Kristen Carlson Accardi wrote:
> >>> On Wed, 14 Oct 2015 07:41:59 -0400
> >>> Prarit Bhargava <prarit@redhat.com> wrote:
> >>>
> >>>> On systems that initialize the intel_pstate driver with the performance
> >>>> governor, and then switch to the powersave governor will not transition to
> >>>> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
> >>>> is set to a low value.
> >>>>
> >>>> The behavior of governor switching changed after commit a04759924e25
> >>>> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
> >>>>  resume").  The commit introduced tracking of performance percentage
> >>>> changes via sysfs in order to restore userspace changes during
> >>>> suspend/resume.  The problem occurs because the global values of the newly
> >>>> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
> >>>> change and this causes the powersave governor to inherit the performance
> >>>> governor's settings.
> >>>>
> >>>> A simple change would have been to reset max_sysfs_pct to 100 and
> >>>> min_sysfs_pct to 0 on a governor change, which fixes the problem with
> >>>> governor switching.  However, since we cannot break userspace[1] the fix
> >>>> is now to give each governor its own limits storage area so that governor
> >>>> specific changes are tracked.
> >>>>
> >>>> I successfully tested this by booting with both the performance governor
> >>>> and the powersave governor by default, and switching between the two
> >>>> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
> >>>> and looking at the output of cpupower frequency-info).  Suspend/Resume
> >>>> testing was performed by Doug Smythies.
> >>>>
> >>>> [1] Systems which suspend/resume using the unmaintained pm-utils package
> >>>> will always transition to the performance governor before the suspend and
> >>>> after the resume.  This means a system using the powersave governor will
> >>>> go from powersave to performance, then suspend/resume, performance to
> >>>> powersave.  The simple change during governor changes would have been
> >>>> overwritten when the governor changed before and after the suspend/resume.
> >>>> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
> >>>> against Fedora to remove the 94cpufreq file that causes the problem.  It
> >>>> should be noted that pm-utils is obsoleted with newer versions of systemd.
> >>>>
> >>>> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
> >>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> >>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >>>> Cc: linux-pm@vger.kernel.org
> >>>> Cc: Doug Smythies <dsmythies@telus.net>
> >>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> >>>
> >>> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> >>>
> >>> BTW - I think I can see an issue here with HWP enabled systems.  It
> >>> looks to me like the hwp settings will not be programmed correctly
> >>> during a governor switch.  This probably needs to be addressed in a
> >>> separate patch.
> >>>
> >>
> >> Oh, I see it now too.  I'll get to that in another patch.  Thanks for pointing
> >> that out Kristen.
> > 
> > The $subject patch doesn't apply any more after the series from Srinivas that
> > I've just applied.
> > 
> > Can you please rebase it on top of my bleeding-edge branch?
> > 
> 
> Sure -- can you send me a pointer to the branch?

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge

Thanks,
Rafael


Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 3af9dd7..78b4be5 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -156,7 +156,20 @@  struct perf_limits {
 	int min_sysfs_pct;
 };
 
-static struct perf_limits limits = {
+static struct perf_limits performance_limits = {
+	.no_turbo = 0,
+	.turbo_disabled = 0,
+	.max_perf_pct = 100,
+	.max_perf = int_tofp(1),
+	.min_perf_pct = 100,
+	.min_perf = int_tofp(1),
+	.max_policy_pct = 100,
+	.max_sysfs_pct = 100,
+	.min_policy_pct = 0,
+	.min_sysfs_pct = 0,
+};
+
+static struct perf_limits powersave_limits = {
 	.no_turbo = 0,
 	.turbo_disabled = 0,
 	.max_perf_pct = 100,
@@ -169,6 +182,12 @@  static struct perf_limits limits = {
 	.min_sysfs_pct = 0,
 };
 
+#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE
+static struct perf_limits *limits = &performance_limits;
+#else
+static struct perf_limits *limits = &powersave_limits;
+#endif
+
 static inline void pid_reset(struct _pid *pid, int setpoint, int busy,
 			     int deadband, int integral) {
 	pid->setpoint = setpoint;
@@ -255,7 +274,7 @@  static inline void update_turbo_state(void)
 
 	cpu = all_cpu_data[0];
 	rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
-	limits.turbo_disabled =
+	limits->turbo_disabled =
 		(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
 		 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
 }
@@ -274,14 +293,14 @@  static void intel_pstate_hwp_set(void)
 
 	for_each_online_cpu(cpu) {
 		rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
-		adj_range = limits.min_perf_pct * range / 100;
+		adj_range = limits->min_perf_pct * range / 100;
 		min = hw_min + adj_range;
 		value &= ~HWP_MIN_PERF(~0L);
 		value |= HWP_MIN_PERF(min);
 
-		adj_range = limits.max_perf_pct * range / 100;
+		adj_range = limits->max_perf_pct * range / 100;
 		max = hw_min + adj_range;
-		if (limits.no_turbo) {
+		if (limits->no_turbo) {
 			hw_max = HWP_GUARANTEED_PERF(cap);
 			if (hw_max < max)
 				max = hw_max;
@@ -350,7 +369,7 @@  static void __init intel_pstate_debug_expose_params(void)
 	static ssize_t show_##file_name					\
 	(struct kobject *kobj, struct attribute *attr, char *buf)	\
 	{								\
-		return sprintf(buf, "%u\n", limits.object);		\
+		return sprintf(buf, "%u\n", limits->object);		\
 	}
 
 static ssize_t show_turbo_pct(struct kobject *kobj,
@@ -386,10 +405,10 @@  static ssize_t show_no_turbo(struct kobject *kobj,
 	ssize_t ret;
 
 	update_turbo_state();
-	if (limits.turbo_disabled)
-		ret = sprintf(buf, "%u\n", limits.turbo_disabled);
+	if (limits->turbo_disabled)
+		ret = sprintf(buf, "%u\n", limits->turbo_disabled);
 	else
-		ret = sprintf(buf, "%u\n", limits.no_turbo);
+		ret = sprintf(buf, "%u\n", limits->no_turbo);
 
 	return ret;
 }
@@ -405,12 +424,12 @@  static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
 		return -EINVAL;
 
 	update_turbo_state();
-	if (limits.turbo_disabled) {
+	if (limits->turbo_disabled) {
 		pr_warn("intel_pstate: Turbo disabled by BIOS or unavailable on processor\n");
 		return -EPERM;
 	}
 
-	limits.no_turbo = clamp_t(int, input, 0, 1);
+	limits->no_turbo = clamp_t(int, input, 0, 1);
 
 	if (hwp_active)
 		intel_pstate_hwp_set();
@@ -428,11 +447,15 @@  static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
 	if (ret != 1)
 		return -EINVAL;
 
-	limits.max_sysfs_pct = clamp_t(int, input, 0 , 100);
-	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
-	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
-	limits.max_perf_pct = max(limits.min_perf_pct, limits.max_perf_pct);
-	limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
+	limits->max_sysfs_pct = clamp_t(int, input, 0 , 100);
+	limits->max_perf_pct = min(limits->max_policy_pct,
+				   limits->max_sysfs_pct);
+	limits->max_perf_pct = max(limits->min_policy_pct,
+				   limits->max_perf_pct);
+	limits->max_perf_pct = max(limits->min_perf_pct,
+				   limits->max_perf_pct);
+	limits->max_perf = div_fp(int_tofp(limits->max_perf_pct),
+				  int_tofp(100));
 
 	if (hwp_active)
 		intel_pstate_hwp_set();
@@ -449,11 +472,15 @@  static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
 	if (ret != 1)
 		return -EINVAL;
 
-	limits.min_sysfs_pct = clamp_t(int, input, 0 , 100);
-	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
-	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
-	limits.min_perf_pct = min(limits.max_perf_pct, limits.min_perf_pct);
-	limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100));
+	limits->min_sysfs_pct = clamp_t(int, input, 0 , 100);
+	limits->min_perf_pct = max(limits->min_policy_pct,
+				   limits->min_sysfs_pct);
+	limits->min_perf_pct = min(limits->max_policy_pct,
+				   limits->min_perf_pct);
+	limits->min_perf_pct = min(limits->max_perf_pct,
+				   limits->min_perf_pct);
+	limits->min_perf = div_fp(int_tofp(limits->min_perf_pct),
+				  int_tofp(100));
 
 	if (hwp_active)
 		intel_pstate_hwp_set();
@@ -533,7 +560,7 @@  static void byt_set_pstate(struct cpudata *cpudata, int pstate)
 	u32 vid;
 
 	val = (u64)pstate << 8;
-	if (limits.no_turbo && !limits.turbo_disabled)
+	if (limits->no_turbo && !limits->turbo_disabled)
 		val |= (u64)1 << 32;
 
 	vid_fp = cpudata->vid.min + mul_fp(
@@ -622,7 +649,7 @@  static void core_set_pstate(struct cpudata *cpudata, int pstate)
 	u64 val;
 
 	val = (u64)pstate << 8;
-	if (limits.no_turbo && !limits.turbo_disabled)
+	if (limits->no_turbo && !limits->turbo_disabled)
 		val |= (u64)1 << 32;
 
 	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
@@ -702,7 +729,7 @@  static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
 	int max_perf_adj;
 	int min_perf;
 
-	if (limits.no_turbo || limits.turbo_disabled)
+	if (limits->no_turbo || limits->turbo_disabled)
 		max_perf = cpu->pstate.max_pstate;
 
 	/*
@@ -710,11 +737,11 @@  static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
 	 * policy, or by cpu specific default values determined through
 	 * experimentation.
 	 */
-	max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits.max_perf));
+	max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits->max_perf));
 	*max = clamp_t(int, max_perf_adj,
 			cpu->pstate.min_pstate, cpu->pstate.turbo_pstate);
 
-	min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits.min_perf));
+	min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits->min_perf));
 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
 }
 
@@ -988,32 +1015,35 @@  static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 
 	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
 	    policy->max >= policy->cpuinfo.max_freq) {
-		limits.min_policy_pct = 100;
-		limits.min_perf_pct = 100;
-		limits.min_perf = int_tofp(1);
-		limits.max_policy_pct = 100;
-		limits.max_perf_pct = 100;
-		limits.max_perf = int_tofp(1);
-		limits.no_turbo = 0;
+		pr_debug("intel_pstate: set performance\n");
+		limits = &performance_limits;
 		return 0;
 	}
 
-	limits.min_policy_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
-	limits.min_policy_pct = clamp_t(int, limits.min_policy_pct, 0 , 100);
-	limits.max_policy_pct = (policy->max * 100) / policy->cpuinfo.max_freq;
-	limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100);
+	pr_debug("intel_pstate: set powersave\n");
+	limits = &powersave_limits;
+	limits->min_policy_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
+	limits->min_policy_pct = clamp_t(int, limits->min_policy_pct, 0 , 100);
+	limits->max_policy_pct = (policy->max * 100) / policy->cpuinfo.max_freq;
+	limits->max_policy_pct = clamp_t(int, limits->max_policy_pct, 0 , 100);
 
 	/* Normalize user input to [min_policy_pct, max_policy_pct] */
-	limits.min_perf_pct = max(limits.min_policy_pct, limits.min_sysfs_pct);
-	limits.min_perf_pct = min(limits.max_policy_pct, limits.min_perf_pct);
-	limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
-	limits.max_perf_pct = max(limits.min_policy_pct, limits.max_perf_pct);
+	limits->min_perf_pct = max(limits->min_policy_pct,
+				   limits->min_sysfs_pct);
+	limits->min_perf_pct = min(limits->max_policy_pct,
+				   limits->min_perf_pct);
+	limits->max_perf_pct = min(limits->max_policy_pct,
+				   limits->max_sysfs_pct);
+	limits->max_perf_pct = max(limits->min_policy_pct,
+				   limits->max_perf_pct);
 
 	/* Make sure min_perf_pct <= max_perf_pct */
-	limits.min_perf_pct = min(limits.max_perf_pct, limits.min_perf_pct);
+	limits->min_perf_pct = min(limits->max_perf_pct, limits->min_perf_pct);
 
-	limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100));
-	limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
+	limits->min_perf = div_fp(int_tofp(limits->min_perf_pct),
+				  int_tofp(100));
+	limits->max_perf = div_fp(int_tofp(limits->max_perf_pct),
+				  int_tofp(100));
 
 	if (hwp_active)
 		intel_pstate_hwp_set();
@@ -1057,7 +1087,7 @@  static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
 
 	cpu = all_cpu_data[policy->cpu];
 
-	if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100)
+	if (limits->min_perf_pct == 100 && limits->max_perf_pct == 100)
 		policy->policy = CPUFREQ_POLICY_PERFORMANCE;
 	else
 		policy->policy = CPUFREQ_POLICY_POWERSAVE;