diff mbox series

xen/cpufreq: Reset policy after enabling/disabling turbo status

Message ID 20211110091935.16906-1-jane.malalane@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen/cpufreq: Reset policy after enabling/disabling turbo status | expand

Commit Message

Jane Malalane Nov. 10, 2021, 9:19 a.m. UTC
Before, user would change turbo status but this had no effect: boolean
was set but policy wasn't reevaluated.  Policy must be reevaluated so
that CPU frequency is chosen according to the turbo status and the
current governor.

Therefore, add __cpufreq_governor() in cpufreq_update_turbo().

Reported-by: <edvin.torok@citrix.com>
Signed-off-by: <jane.malalane@citrix.com>
---
CC: Jan Beulich <jbeulich@suse.com>
CC: Ian Jackson <iwj@xenproject.org>
---

Release rationale:
Not taking this patch means that turbo status is misleading.

Taking this patch is low-risk as it only uses a function that already
exists and is already used for setting the chosen scaling governor.
Essentially, this change is equivalent to running 'xenpm
en/disable-turbo-mode' and, subsequently, running 'xenpm
set-scaling-governor <current governor>'.
---
 xen/drivers/cpufreq/utility.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Ian Jackson Nov. 10, 2021, 11:48 a.m. UTC | #1
Jane Malalane writes ("[PATCH] xen/cpufreq: Reset policy after enabling/disabling turbo status"):
> Before, user would change turbo status but this had no effect: boolean
> was set but policy wasn't reevaluated.  Policy must be reevaluated so
> that CPU frequency is chosen according to the turbo status and the
> current governor.
> 
> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
...
> Not taking this patch means that turbo status is misleading.
> 
> Taking this patch is low-risk as it only uses a function that already
> exists and is already used for setting the chosen scaling governor.
> Essentially, this change is equivalent to running 'xenpm
> en/disable-turbo-mode' and, subsequently, running 'xenpm
> set-scaling-governor <current governor>'.

Thanks.  I am positively inclined.  But I have one query about this
rationale.  Adding a new call to an existing function is OK if calling
__cpufreq_governor is permitted here.  Are there locking or reentrancy
hazards ?  Perhaps these issue have been considered but I would like
to see something explicit about that.

Thanks,
Ian.
Roger Pau Monné Nov. 10, 2021, 12:39 p.m. UTC | #2
On Wed, Nov 10, 2021 at 09:19:35AM +0000, Jane Malalane wrote:
> Before, user would change turbo status but this had no effect: boolean
> was set but policy wasn't reevaluated.  Policy must be reevaluated so
> that CPU frequency is chosen according to the turbo status and the
> current governor.
> 
> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
> 
> Reported-by: <edvin.torok@citrix.com>
> Signed-off-by: <jane.malalane@citrix.com>
> ---
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Ian Jackson <iwj@xenproject.org>
> ---
> 
> Release rationale:
> Not taking this patch means that turbo status is misleading.
> 
> Taking this patch is low-risk as it only uses a function that already
> exists and is already used for setting the chosen scaling governor.
> Essentially, this change is equivalent to running 'xenpm
> en/disable-turbo-mode' and, subsequently, running 'xenpm
> set-scaling-governor <current governor>'.
> ---
>  xen/drivers/cpufreq/utility.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
> index b93895d4dd..5f200ff3ee 100644
> --- a/xen/drivers/cpufreq/utility.c
> +++ b/xen/drivers/cpufreq/utility.c
> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>      {
>          ret = cpufreq_driver.update(cpuid, policy);
>          if (ret)
> +        {
>              policy->turbo = curr_state;
> +            return ret;
> +        }
>      }
>  
> -    return ret;
> +    /* Reevaluate current CPU policy. */
> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);

Do you need to revert the policy->turbo value to the previous one if
the call to __cpufreq_governor returns an error? (much like it's done
for the .update call above).

Thanks, Roger.
Jan Beulich Nov. 11, 2021, 11 a.m. UTC | #3
On 10.11.2021 10:19, Jane Malalane wrote:
> Before, user would change turbo status but this had no effect: boolean
> was set but policy wasn't reevaluated.  Policy must be reevaluated so
> that CPU frequency is chosen according to the turbo status and the
> current governor.

Aiui this only (or at least mainly) affects the ACPI driver. Powernow
updates CPB via its update hook. I think this wants clarifying.

> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
> 
> Reported-by: <edvin.torok@citrix.com>
> Signed-off-by: <jane.malalane@citrix.com>

Nit: These would look better with real names added. Without I'm not
even sure enclosing the email addresses in angle brackets yields
something that's valid.

> Release rationale:
> Not taking this patch means that turbo status is misleading.
> 
> Taking this patch is low-risk as it only uses a function that already
> exists and is already used for setting the chosen scaling governor.
> Essentially, this change is equivalent to running 'xenpm
> en/disable-turbo-mode' and, subsequently, running 'xenpm
> set-scaling-governor <current governor>'.

Otoh things have been this way virtually forever.

> --- a/xen/drivers/cpufreq/utility.c
> +++ b/xen/drivers/cpufreq/utility.c
> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>      {
>          ret = cpufreq_driver.update(cpuid, policy);
>          if (ret)
> +        {
>              policy->turbo = curr_state;
> +            return ret;
> +        }
>      }
>  
> -    return ret;
> +    /* Reevaluate current CPU policy. */
> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);

Is this really needed when an .update hook is present? IOW wouldn't
this want to be in an "else" to the preceding if()? Or if not,
would this perhaps be more logically done prior to invoking .update()
(such that the hook would observe the updated policy, in case that's
relevant to what it does)?

Jan
Jan Beulich Nov. 11, 2021, 11:06 a.m. UTC | #4
On 10.11.2021 13:39, Roger Pau Monné wrote:
> On Wed, Nov 10, 2021 at 09:19:35AM +0000, Jane Malalane wrote:
>> --- a/xen/drivers/cpufreq/utility.c
>> +++ b/xen/drivers/cpufreq/utility.c
>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>>      {
>>          ret = cpufreq_driver.update(cpuid, policy);
>>          if (ret)
>> +        {
>>              policy->turbo = curr_state;
>> +            return ret;
>> +        }
>>      }
>>  
>> -    return ret;
>> +    /* Reevaluate current CPU policy. */
>> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
> 
> Do you need to revert the policy->turbo value to the previous one if
> the call to __cpufreq_governor returns an error? (much like it's done
> for the .update call above).

I guess this can be viewed either way: Keeping the value would allow
a later successful invocation of the .target() hook to observe the
intended value. Obviously then it's questionable whether returning an
error in that case isn't going to be misleading - failure of the
policy update would then rather need to be indicated by some
"deferred" indicator (which we don't have). Taking into account the
behavior prior to this patch I wonder whether it's an option to
simply ignore an error from __cpufreq_governor() here.

Jan
Jane Malalane Nov. 12, 2021, 11:31 a.m. UTC | #5
On 10/11/2021 11:48, Ian Jackson wrote:

[CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments unless you have verified the sender and know the content is safe.

Jane Malalane writes ("[PATCH] xen/cpufreq: Reset policy after enabling/disabling turbo status"):


Before, user would change turbo status but this had no effect: boolean
was set but policy wasn't reevaluated.  Policy must be reevaluated so
that CPU frequency is chosen according to the turbo status and the
current governor.

Therefore, add __cpufreq_governor() in cpufreq_update_turbo().


...


Not taking this patch means that turbo status is misleading.

Taking this patch is low-risk as it only uses a function that already
exists and is already used for setting the chosen scaling governor.
Essentially, this change is equivalent to running 'xenpm
en/disable-turbo-mode' and, subsequently, running 'xenpm
set-scaling-governor <current governor>'.



Thanks.  I am positively inclined.  But I have one query about this
rationale.  Adding a new call to an existing function is OK if calling
__cpufreq_governor is permitted here.  Are there locking or reentrancy
hazards ?  Perhaps these issue have been considered but I would like
to see something explicit about that.

Thanks,


Hi Ian,


I think that's not a concern here because the only other
callers of __cpufreq_governor are __cpufreq_set_policy(), which is under the
same sysctl_lock, and cpufreq_del_cpu, which shouldn't be an issue because no
further action can be performed against the cpu when that function is called.


I will have to submit a v2 of this patch, so I can add
these considerations to the release rationale section?



Thanks,

Jane.
Andrew Cooper Nov. 12, 2021, 6:51 p.m. UTC | #6
On 10/11/2021 09:19, Jane Malalane wrote:
> Before, user would change turbo status but this had no effect: boolean
> was set but policy wasn't reevaluated.  Policy must be reevaluated so
> that CPU frequency is chosen according to the turbo status and the
> current governor.
>
> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
>
> Reported-by: <edvin.torok@citrix.com>
> Signed-off-by: <jane.malalane@citrix.com>
> ---
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Ian Jackson <iwj@xenproject.org>
> ---
>
> Release rationale:
> Not taking this patch means that turbo status is misleading.
>
> Taking this patch is low-risk as it only uses a function that already
> exists and is already used for setting the chosen scaling governor.
> Essentially, this change is equivalent to running 'xenpm
> en/disable-turbo-mode' and, subsequently, running 'xenpm
> set-scaling-governor <current governor>'.
> ---
>  xen/drivers/cpufreq/utility.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
> index b93895d4dd..5f200ff3ee 100644
> --- a/xen/drivers/cpufreq/utility.c
> +++ b/xen/drivers/cpufreq/utility.c
> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>      {
>          ret = cpufreq_driver.update(cpuid, policy);
>          if (ret)
> +        {
>              policy->turbo = curr_state;
> +            return ret;
> +        }
>      }
>  
> -    return ret;
> +    /* Reevaluate current CPU policy. */
> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>  }

So, having looked through the manual, what the cpufreq_driver does for
turbo on Intel is bogus according to the SDM.

There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
package) for firmware to configure turbo, but it manifests as another
dynamic CPUID bit (which I think we handle correctly).  It has the same
semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
adding it to the set of bits we clear during boot.

However, the correct way to turn off turbo is to set
IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
such, it *should* behave far more like the AMD CPB path.

Therefore, I propose that the update hook gets renamed to update_turbo()
to more clearly state it's purpose, and that we use the TURBO_DISENGAGE
bit as documented.

If we're going this route, I'd also like to make this hook consistent
with others, where we IPI directly, rather than having an intermediate
function pointer just to send an IPI.

~Andrew
Jason Andryuk Nov. 12, 2021, 7:51 p.m. UTC | #7
On Fri, Nov 12, 2021 at 1:51 PM Andrew Cooper <amc96@srcf.net> wrote:
>
> On 10/11/2021 09:19, Jane Malalane wrote:
> > Before, user would change turbo status but this had no effect: boolean
> > was set but policy wasn't reevaluated.  Policy must be reevaluated so
> > that CPU frequency is chosen according to the turbo status and the
> > current governor.
> >
> > Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
> >
> > Reported-by: <edvin.torok@citrix.com>
> > Signed-off-by: <jane.malalane@citrix.com>
> > ---
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Ian Jackson <iwj@xenproject.org>
> > ---
> >
> > Release rationale:
> > Not taking this patch means that turbo status is misleading.
> >
> > Taking this patch is low-risk as it only uses a function that already
> > exists and is already used for setting the chosen scaling governor.
> > Essentially, this change is equivalent to running 'xenpm
> > en/disable-turbo-mode' and, subsequently, running 'xenpm
> > set-scaling-governor <current governor>'.
> > ---
> >  xen/drivers/cpufreq/utility.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
> > index b93895d4dd..5f200ff3ee 100644
> > --- a/xen/drivers/cpufreq/utility.c
> > +++ b/xen/drivers/cpufreq/utility.c
> > @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
> >      {
> >          ret = cpufreq_driver.update(cpuid, policy);
> >          if (ret)
> > +        {
> >              policy->turbo = curr_state;
> > +            return ret;
> > +        }
> >      }
> >
> > -    return ret;
> > +    /* Reevaluate current CPU policy. */
> > +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
> >  }
>
> So, having looked through the manual, what the cpufreq_driver does for
> turbo on Intel is bogus according to the SDM.
>
> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
> package) for firmware to configure turbo, but it manifests as another
> dynamic CPUID bit (which I think we handle correctly).  It has the same
> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
> adding it to the set of bits we clear during boot.
>
> However, the correct way to turn off turbo is to set
> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
> such, it *should* behave far more like the AMD CPB path.

I wrote this in my HWP work (which I need to get back to...):
+/*
+ * The SDM reads like turbo should be disabled with MSR_IA32_PERF_CTL and
+ * PERF_CTL_TURBO_DISENGAGE, but that does not seem to actually work, at least
+ * with my HWP testing.  MSR_IA32_MISC_ENABLE and MISC_ENABLE_TURBO_DISENGAGE
+ * is what Linux uses and seems to work.
+ */

Regards,
Jason
Jan Beulich Nov. 15, 2021, 10:53 a.m. UTC | #8
On 12.11.2021 19:51, Andrew Cooper wrote:
> On 10/11/2021 09:19, Jane Malalane wrote:
>> Before, user would change turbo status but this had no effect: boolean
>> was set but policy wasn't reevaluated.  Policy must be reevaluated so
>> that CPU frequency is chosen according to the turbo status and the
>> current governor.
>>
>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
>>
>> Reported-by: <edvin.torok@citrix.com>
>> Signed-off-by: <jane.malalane@citrix.com>
>> ---
>> CC: Jan Beulich <jbeulich@suse.com>
>> CC: Ian Jackson <iwj@xenproject.org>
>> ---
>>
>> Release rationale:
>> Not taking this patch means that turbo status is misleading.
>>
>> Taking this patch is low-risk as it only uses a function that already
>> exists and is already used for setting the chosen scaling governor.
>> Essentially, this change is equivalent to running 'xenpm
>> en/disable-turbo-mode' and, subsequently, running 'xenpm
>> set-scaling-governor <current governor>'.
>> ---
>>  xen/drivers/cpufreq/utility.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
>> index b93895d4dd..5f200ff3ee 100644
>> --- a/xen/drivers/cpufreq/utility.c
>> +++ b/xen/drivers/cpufreq/utility.c
>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>>      {
>>          ret = cpufreq_driver.update(cpuid, policy);
>>          if (ret)
>> +        {
>>              policy->turbo = curr_state;
>> +            return ret;
>> +        }
>>      }
>>  
>> -    return ret;
>> +    /* Reevaluate current CPU policy. */
>> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>>  }
> 
> So, having looked through the manual, what the cpufreq_driver does for
> turbo on Intel is bogus according to the SDM.
> 
> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
> package) for firmware to configure turbo, but it manifests as another
> dynamic CPUID bit (which I think we handle correctly).  It has the same
> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
> adding it to the set of bits we clear during boot.

This is quite confusing in the docs - at least one of the tables calls
the bit "IDA Disable", while other entries at least also refer to the
effect of disabling IDA. I'm afraid I have trouble connecting turbo
mode and IDA disabling, unless both are two different names of the
same thing. Maybe they really are, except that SDM vol 2 uses yet
another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost
Technology".

> However, the correct way to turn off turbo is to set
> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
> such, it *should* behave far more like the AMD CPB path.

I'm afraid public documentation has no mention of a bit of this name.
Considering the above I wonder whether you mean "IDA engage" (bit 32),
albeit this doesn't seem very likely when you're taking about a
"disengage" bit. If it is, we'd still need to cope with it being
unavailable: While as per the doc it exists from Merom onwards, i.e.
just far enough back for all 64-bit capable processors to be covered,
at least there it is attributed "Mobile only".

Jan

> Therefore, I propose that the update hook gets renamed to update_turbo()
> to more clearly state it's purpose, and that we use the TURBO_DISENGAGE
> bit as documented.
> 
> If we're going this route, I'd also like to make this hook consistent
> with others, where we IPI directly, rather than having an intermediate
> function pointer just to send an IPI.
> 
> ~Andrew
>
Andrew Cooper Nov. 15, 2021, 2:33 p.m. UTC | #9
On 15/11/2021 10:53, Jan Beulich wrote:
> On 12.11.2021 19:51, Andrew Cooper wrote:
>> On 10/11/2021 09:19, Jane Malalane wrote:
>>> Before, user would change turbo status but this had no effect: boolean
>>> was set but policy wasn't reevaluated.  Policy must be reevaluated so
>>> that CPU frequency is chosen according to the turbo status and the
>>> current governor.
>>>
>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
>>>
>>> Reported-by: <edvin.torok@citrix.com>
>>> Signed-off-by: <jane.malalane@citrix.com>
>>> ---
>>> CC: Jan Beulich <jbeulich@suse.com>
>>> CC: Ian Jackson <iwj@xenproject.org>
>>> ---
>>>
>>> Release rationale:
>>> Not taking this patch means that turbo status is misleading.
>>>
>>> Taking this patch is low-risk as it only uses a function that already
>>> exists and is already used for setting the chosen scaling governor.
>>> Essentially, this change is equivalent to running 'xenpm
>>> en/disable-turbo-mode' and, subsequently, running 'xenpm
>>> set-scaling-governor <current governor>'.
>>> ---
>>>  xen/drivers/cpufreq/utility.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
>>> index b93895d4dd..5f200ff3ee 100644
>>> --- a/xen/drivers/cpufreq/utility.c
>>> +++ b/xen/drivers/cpufreq/utility.c
>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>>>      {
>>>          ret = cpufreq_driver.update(cpuid, policy);
>>>          if (ret)
>>> +        {
>>>              policy->turbo = curr_state;
>>> +            return ret;
>>> +        }
>>>      }
>>>  
>>> -    return ret;
>>> +    /* Reevaluate current CPU policy. */
>>> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>>>  }
>> So, having looked through the manual, what the cpufreq_driver does for
>> turbo on Intel is bogus according to the SDM.
>>
>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
>> package) for firmware to configure turbo, but it manifests as another
>> dynamic CPUID bit (which I think we handle correctly).  It has the same
>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
>> adding it to the set of bits we clear during boot.
> This is quite confusing in the docs - at least one of the tables calls
> the bit "IDA Disable", while other entries at least also refer to the
> effect of disabling IDA. I'm afraid I have trouble connecting turbo
> mode and IDA disabling, unless both are two different names of the
> same thing. Maybe they really are, except that SDM vol 2 uses yet
> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost
> Technology".

SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases.  It
reads as if IDA is the general technology name, and turbo is a sub-mode
for "doing it automatically without software involvement".

On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with
further adds to the confusion of which is which.

>> However, the correct way to turn off turbo is to set
>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
>> such, it *should* behave far more like the AMD CPB path.
> I'm afraid public documentation has no mention of a bit of this name.
> Considering the above I wonder whether you mean "IDA engage" (bit 32),
> albeit this doesn't seem very likely when you're taking about a
> "disengage" bit.

It is that bit.  Despite the name given in Vol4 Table 2-12, it is "set
to disable".

Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and
the surrounding text makes it clear that it is disable bit, not an
enable bit.  Also, that's how the Linux intel_pstate driver uses it.

>  If it is, we'd still need to cope with it being
> unavailable: While as per the doc it exists from Merom onwards, i.e.
> just far enough back for all 64-bit capable processors to be covered,
> at least there it is attributed "Mobile only".

Honestly, the number of errors in those tables are alarming.  The MSR is
has been around longer than turbo, and I'm told that the interface has
never changed since its introduction.

Looking across Linux, there's a mess too.

acpi-cpufreq and energy_perf_policy use the MISC_ENABLE bit (which is
package wide)
intel_ips driver uses PERF_CTL (which is logical processor)
intel_pstate uses MISC_ENABLE || max_pstate == turbo_pstate

I'm certain I've seen "one pstate being special" WRT turbo before, but I
can't locate anything about this in the SDM, which possibly means it is
model specific.

~Andrew
Jan Beulich Nov. 15, 2021, 4:21 p.m. UTC | #10
On 15.11.2021 15:33, Andrew Cooper wrote:
> On 15/11/2021 10:53, Jan Beulich wrote:
>> On 12.11.2021 19:51, Andrew Cooper wrote:
>>> On 10/11/2021 09:19, Jane Malalane wrote:
>>>> Before, user would change turbo status but this had no effect: boolean
>>>> was set but policy wasn't reevaluated.  Policy must be reevaluated so
>>>> that CPU frequency is chosen according to the turbo status and the
>>>> current governor.
>>>>
>>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
>>>>
>>>> Reported-by: <edvin.torok@citrix.com>
>>>> Signed-off-by: <jane.malalane@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <jbeulich@suse.com>
>>>> CC: Ian Jackson <iwj@xenproject.org>
>>>> ---
>>>>
>>>> Release rationale:
>>>> Not taking this patch means that turbo status is misleading.
>>>>
>>>> Taking this patch is low-risk as it only uses a function that already
>>>> exists and is already used for setting the chosen scaling governor.
>>>> Essentially, this change is equivalent to running 'xenpm
>>>> en/disable-turbo-mode' and, subsequently, running 'xenpm
>>>> set-scaling-governor <current governor>'.
>>>> ---
>>>>  xen/drivers/cpufreq/utility.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
>>>> index b93895d4dd..5f200ff3ee 100644
>>>> --- a/xen/drivers/cpufreq/utility.c
>>>> +++ b/xen/drivers/cpufreq/utility.c
>>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>>>>      {
>>>>          ret = cpufreq_driver.update(cpuid, policy);
>>>>          if (ret)
>>>> +        {
>>>>              policy->turbo = curr_state;
>>>> +            return ret;
>>>> +        }
>>>>      }
>>>>  
>>>> -    return ret;
>>>> +    /* Reevaluate current CPU policy. */
>>>> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>>>>  }
>>> So, having looked through the manual, what the cpufreq_driver does for
>>> turbo on Intel is bogus according to the SDM.
>>>
>>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
>>> package) for firmware to configure turbo, but it manifests as another
>>> dynamic CPUID bit (which I think we handle correctly).  It has the same
>>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
>>> adding it to the set of bits we clear during boot.
>> This is quite confusing in the docs - at least one of the tables calls
>> the bit "IDA Disable", while other entries at least also refer to the
>> effect of disabling IDA. I'm afraid I have trouble connecting turbo
>> mode and IDA disabling, unless both are two different names of the
>> same thing. Maybe they really are, except that SDM vol 2 uses yet
>> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost
>> Technology".
> 
> SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases.  It
> reads as if IDA is the general technology name, and turbo is a sub-mode
> for "doing it automatically without software involvement".
> 
> On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with
> further adds to the confusion of which is which.
> 
>>> However, the correct way to turn off turbo is to set
>>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
>>> such, it *should* behave far more like the AMD CPB path.
>> I'm afraid public documentation has no mention of a bit of this name.
>> Considering the above I wonder whether you mean "IDA engage" (bit 32),
>> albeit this doesn't seem very likely when you're taking about a
>> "disengage" bit.
> 
> It is that bit.  Despite the name given in Vol4 Table 2-12, it is "set
> to disable".
> 
> Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and
> the surrounding text makes it clear that it is disable bit, not an
> enable bit.  Also, that's how the Linux intel_pstate driver uses it.

Okay, then I agree with the proposal you've made.

Jan

>>  If it is, we'd still need to cope with it being
>> unavailable: While as per the doc it exists from Merom onwards, i.e.
>> just far enough back for all 64-bit capable processors to be covered,
>> at least there it is attributed "Mobile only".
> 
> Honestly, the number of errors in those tables are alarming.  The MSR is
> has been around longer than turbo, and I'm told that the interface has
> never changed since its introduction.
> 
> Looking across Linux, there's a mess too.
> 
> acpi-cpufreq and energy_perf_policy use the MISC_ENABLE bit (which is
> package wide)
> intel_ips driver uses PERF_CTL (which is logical processor)
> intel_pstate uses MISC_ENABLE || max_pstate == turbo_pstate
> 
> I'm certain I've seen "one pstate being special" WRT turbo before, but I
> can't locate anything about this in the SDM, which possibly means it is
> model specific.
> 
> ~Andrew
>
Andrew Cooper Nov. 15, 2021, 10:05 p.m. UTC | #11
On 15/11/2021 16:21, Jan Beulich wrote:
> On 15.11.2021 15:33, Andrew Cooper wrote:
>> On 15/11/2021 10:53, Jan Beulich wrote:
>>> On 12.11.2021 19:51, Andrew Cooper wrote:
>>>> On 10/11/2021 09:19, Jane Malalane wrote:
>>>>> Before, user would change turbo status but this had no effect: boolean
>>>>> was set but policy wasn't reevaluated.  Policy must be reevaluated so
>>>>> that CPU frequency is chosen according to the turbo status and the
>>>>> current governor.
>>>>>
>>>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo().
>>>>>
>>>>> Reported-by: <edvin.torok@citrix.com>
>>>>> Signed-off-by: <jane.malalane@citrix.com>
>>>>> ---
>>>>> CC: Jan Beulich <jbeulich@suse.com>
>>>>> CC: Ian Jackson <iwj@xenproject.org>
>>>>> ---
>>>>>
>>>>> Release rationale:
>>>>> Not taking this patch means that turbo status is misleading.
>>>>>
>>>>> Taking this patch is low-risk as it only uses a function that already
>>>>> exists and is already used for setting the chosen scaling governor.
>>>>> Essentially, this change is equivalent to running 'xenpm
>>>>> en/disable-turbo-mode' and, subsequently, running 'xenpm
>>>>> set-scaling-governor <current governor>'.
>>>>> ---
>>>>>  xen/drivers/cpufreq/utility.c | 6 +++++-
>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
>>>>> index b93895d4dd..5f200ff3ee 100644
>>>>> --- a/xen/drivers/cpufreq/utility.c
>>>>> +++ b/xen/drivers/cpufreq/utility.c
>>>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state)
>>>>>      {
>>>>>          ret = cpufreq_driver.update(cpuid, policy);
>>>>>          if (ret)
>>>>> +        {
>>>>>              policy->turbo = curr_state;
>>>>> +            return ret;
>>>>> +        }
>>>>>      }
>>>>>  
>>>>> -    return ret;
>>>>> +    /* Reevaluate current CPU policy. */
>>>>> +    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>>>>>  }
>>>> So, having looked through the manual, what the cpufreq_driver does for
>>>> turbo on Intel is bogus according to the SDM.
>>>>
>>>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per
>>>> package) for firmware to configure turbo, but it manifests as another
>>>> dynamic CPUID bit (which I think we handle correctly).  It has the same
>>>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider
>>>> adding it to the set of bits we clear during boot.
>>> This is quite confusing in the docs - at least one of the tables calls
>>> the bit "IDA Disable", while other entries at least also refer to the
>>> effect of disabling IDA. I'm afraid I have trouble connecting turbo
>>> mode and IDA disabling, unless both are two different names of the
>>> same thing. Maybe they really are, except that SDM vol 2 uses yet
>>> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost
>>> Technology".
>> SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases.  It
>> reads as if IDA is the general technology name, and turbo is a sub-mode
>> for "doing it automatically without software involvement".
>>
>> On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with
>> further adds to the confusion of which is which.
>>
>>>> However, the correct way to turn off turbo is to set
>>>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor.  As
>>>> such, it *should* behave far more like the AMD CPB path.
>>> I'm afraid public documentation has no mention of a bit of this name.
>>> Considering the above I wonder whether you mean "IDA engage" (bit 32),
>>> albeit this doesn't seem very likely when you're taking about a
>>> "disengage" bit.
>> It is that bit.  Despite the name given in Vol4 Table 2-12, it is "set
>> to disable".
>>
>> Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and
>> the surrounding text makes it clear that it is disable bit, not an
>> enable bit.  Also, that's how the Linux intel_pstate driver uses it.
> Okay, then I agree with the proposal you've made.

I've just done some experimentation on a Intel(R) Xeon(R) E-2288G (CFL-R
based), and both the MISC_ENABLE and PERF_CTL bits have the same effect,
and cut nearly 1GHz off the APERF/MPERF reported frequency on a busy
single core on an otherwise idle system.

I have not checked the effect that PERF_CTL on thread 0 has on other
threads in the package, but the reality is that ~100% of production use
of these controls are for all CPUs at once.  (So much so, that I really
think the interface ought to gain a -1 sentential for "all cpus", rather
than forcing userspace to loop over each cpu individually, when Xen can
handle the entire loop in O(1) time.)

~Andrew
diff mbox series

Patch

diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index b93895d4dd..5f200ff3ee 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -417,10 +417,14 @@  int cpufreq_update_turbo(int cpuid, int new_state)
     {
         ret = cpufreq_driver.update(cpuid, policy);
         if (ret)
+        {
             policy->turbo = curr_state;
+            return ret;
+        }
     }
 
-    return ret;
+    /* Reevaluate current CPU policy. */
+    return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
 }