Message ID | 20211110091935.16906-1-jane.malalane@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen/cpufreq: Reset policy after enabling/disabling turbo status | expand |
Jane Malalane writes ("[PATCH] xen/cpufreq: Reset policy after enabling/disabling turbo status"): > Before, user would change turbo status but this had no effect: boolean > was set but policy wasn't reevaluated. Policy must be reevaluated so > that CPU frequency is chosen according to the turbo status and the > current governor. > > Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). ... > Not taking this patch means that turbo status is misleading. > > Taking this patch is low-risk as it only uses a function that already > exists and is already used for setting the chosen scaling governor. > Essentially, this change is equivalent to running 'xenpm > en/disable-turbo-mode' and, subsequently, running 'xenpm > set-scaling-governor <current governor>'. Thanks. I am positively inclined. But I have one query about this rationale. Adding a new call to an existing function is OK if calling __cpufreq_governor is permitted here. Are there locking or reentrancy hazards ? Perhaps these issue have been considered but I would like to see something explicit about that. Thanks, Ian.
On Wed, Nov 10, 2021 at 09:19:35AM +0000, Jane Malalane wrote: > Before, user would change turbo status but this had no effect: boolean > was set but policy wasn't reevaluated. Policy must be reevaluated so > that CPU frequency is chosen according to the turbo status and the > current governor. > > Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). > > Reported-by: <edvin.torok@citrix.com> > Signed-off-by: <jane.malalane@citrix.com> > --- > CC: Jan Beulich <jbeulich@suse.com> > CC: Ian Jackson <iwj@xenproject.org> > --- > > Release rationale: > Not taking this patch means that turbo status is misleading. > > Taking this patch is low-risk as it only uses a function that already > exists and is already used for setting the chosen scaling governor. > Essentially, this change is equivalent to running 'xenpm > en/disable-turbo-mode' and, subsequently, running 'xenpm > set-scaling-governor <current governor>'. > --- > xen/drivers/cpufreq/utility.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c > index b93895d4dd..5f200ff3ee 100644 > --- a/xen/drivers/cpufreq/utility.c > +++ b/xen/drivers/cpufreq/utility.c > @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) > { > ret = cpufreq_driver.update(cpuid, policy); > if (ret) > + { > policy->turbo = curr_state; > + return ret; > + } > } > > - return ret; > + /* Reevaluate current CPU policy. */ > + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); Do you need to revert the policy->turbo value to the previous one if the call to __cpufreq_governor returns an error? (much like it's done for the .update call above). Thanks, Roger.
On 10.11.2021 10:19, Jane Malalane wrote: > Before, user would change turbo status but this had no effect: boolean > was set but policy wasn't reevaluated. Policy must be reevaluated so > that CPU frequency is chosen according to the turbo status and the > current governor. Aiui this only (or at least mainly) affects the ACPI driver. Powernow updates CPB via its update hook. I think this wants clarifying. > Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). > > Reported-by: <edvin.torok@citrix.com> > Signed-off-by: <jane.malalane@citrix.com> Nit: These would look better with real names added. Without I'm not even sure enclosing the email addresses in angle brackets yields something that's valid. > Release rationale: > Not taking this patch means that turbo status is misleading. > > Taking this patch is low-risk as it only uses a function that already > exists and is already used for setting the chosen scaling governor. > Essentially, this change is equivalent to running 'xenpm > en/disable-turbo-mode' and, subsequently, running 'xenpm > set-scaling-governor <current governor>'. Otoh things have been this way virtually forever. > --- a/xen/drivers/cpufreq/utility.c > +++ b/xen/drivers/cpufreq/utility.c > @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) > { > ret = cpufreq_driver.update(cpuid, policy); > if (ret) > + { > policy->turbo = curr_state; > + return ret; > + } > } > > - return ret; > + /* Reevaluate current CPU policy. */ > + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); Is this really needed when an .update hook is present? IOW wouldn't this want to be in an "else" to the preceding if()? Or if not, would this perhaps be more logically done prior to invoking .update() (such that the hook would observe the updated policy, in case that's relevant to what it does)? Jan
On 10.11.2021 13:39, Roger Pau Monné wrote: > On Wed, Nov 10, 2021 at 09:19:35AM +0000, Jane Malalane wrote: >> --- a/xen/drivers/cpufreq/utility.c >> +++ b/xen/drivers/cpufreq/utility.c >> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) >> { >> ret = cpufreq_driver.update(cpuid, policy); >> if (ret) >> + { >> policy->turbo = curr_state; >> + return ret; >> + } >> } >> >> - return ret; >> + /* Reevaluate current CPU policy. */ >> + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); > > Do you need to revert the policy->turbo value to the previous one if > the call to __cpufreq_governor returns an error? (much like it's done > for the .update call above). I guess this can be viewed either way: Keeping the value would allow a later successful invocation of the .target() hook to observe the intended value. Obviously then it's questionable whether returning an error in that case isn't going to be misleading - failure of the policy update would then rather need to be indicated by some "deferred" indicator (which we don't have). Taking into account the behavior prior to this patch I wonder whether it's an option to simply ignore an error from __cpufreq_governor() here. Jan
On 10/11/2021 11:48, Ian Jackson wrote: [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments unless you have verified the sender and know the content is safe. Jane Malalane writes ("[PATCH] xen/cpufreq: Reset policy after enabling/disabling turbo status"): Before, user would change turbo status but this had no effect: boolean was set but policy wasn't reevaluated. Policy must be reevaluated so that CPU frequency is chosen according to the turbo status and the current governor. Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). ... Not taking this patch means that turbo status is misleading. Taking this patch is low-risk as it only uses a function that already exists and is already used for setting the chosen scaling governor. Essentially, this change is equivalent to running 'xenpm en/disable-turbo-mode' and, subsequently, running 'xenpm set-scaling-governor <current governor>'. Thanks. I am positively inclined. But I have one query about this rationale. Adding a new call to an existing function is OK if calling __cpufreq_governor is permitted here. Are there locking or reentrancy hazards ? Perhaps these issue have been considered but I would like to see something explicit about that. Thanks, Hi Ian, I think that's not a concern here because the only other callers of __cpufreq_governor are __cpufreq_set_policy(), which is under the same sysctl_lock, and cpufreq_del_cpu, which shouldn't be an issue because no further action can be performed against the cpu when that function is called. I will have to submit a v2 of this patch, so I can add these considerations to the release rationale section? Thanks, Jane.
On 10/11/2021 09:19, Jane Malalane wrote: > Before, user would change turbo status but this had no effect: boolean > was set but policy wasn't reevaluated. Policy must be reevaluated so > that CPU frequency is chosen according to the turbo status and the > current governor. > > Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). > > Reported-by: <edvin.torok@citrix.com> > Signed-off-by: <jane.malalane@citrix.com> > --- > CC: Jan Beulich <jbeulich@suse.com> > CC: Ian Jackson <iwj@xenproject.org> > --- > > Release rationale: > Not taking this patch means that turbo status is misleading. > > Taking this patch is low-risk as it only uses a function that already > exists and is already used for setting the chosen scaling governor. > Essentially, this change is equivalent to running 'xenpm > en/disable-turbo-mode' and, subsequently, running 'xenpm > set-scaling-governor <current governor>'. > --- > xen/drivers/cpufreq/utility.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c > index b93895d4dd..5f200ff3ee 100644 > --- a/xen/drivers/cpufreq/utility.c > +++ b/xen/drivers/cpufreq/utility.c > @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) > { > ret = cpufreq_driver.update(cpuid, policy); > if (ret) > + { > policy->turbo = curr_state; > + return ret; > + } > } > > - return ret; > + /* Reevaluate current CPU policy. */ > + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); > } So, having looked through the manual, what the cpufreq_driver does for turbo on Intel is bogus according to the SDM. There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per package) for firmware to configure turbo, but it manifests as another dynamic CPUID bit (which I think we handle correctly). It has the same semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider adding it to the set of bits we clear during boot. However, the correct way to turn off turbo is to set IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As such, it *should* behave far more like the AMD CPB path. Therefore, I propose that the update hook gets renamed to update_turbo() to more clearly state it's purpose, and that we use the TURBO_DISENGAGE bit as documented. If we're going this route, I'd also like to make this hook consistent with others, where we IPI directly, rather than having an intermediate function pointer just to send an IPI. ~Andrew
On Fri, Nov 12, 2021 at 1:51 PM Andrew Cooper <amc96@srcf.net> wrote: > > On 10/11/2021 09:19, Jane Malalane wrote: > > Before, user would change turbo status but this had no effect: boolean > > was set but policy wasn't reevaluated. Policy must be reevaluated so > > that CPU frequency is chosen according to the turbo status and the > > current governor. > > > > Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). > > > > Reported-by: <edvin.torok@citrix.com> > > Signed-off-by: <jane.malalane@citrix.com> > > --- > > CC: Jan Beulich <jbeulich@suse.com> > > CC: Ian Jackson <iwj@xenproject.org> > > --- > > > > Release rationale: > > Not taking this patch means that turbo status is misleading. > > > > Taking this patch is low-risk as it only uses a function that already > > exists and is already used for setting the chosen scaling governor. > > Essentially, this change is equivalent to running 'xenpm > > en/disable-turbo-mode' and, subsequently, running 'xenpm > > set-scaling-governor <current governor>'. > > --- > > xen/drivers/cpufreq/utility.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c > > index b93895d4dd..5f200ff3ee 100644 > > --- a/xen/drivers/cpufreq/utility.c > > +++ b/xen/drivers/cpufreq/utility.c > > @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) > > { > > ret = cpufreq_driver.update(cpuid, policy); > > if (ret) > > + { > > policy->turbo = curr_state; > > + return ret; > > + } > > } > > > > - return ret; > > + /* Reevaluate current CPU policy. */ > > + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); > > } > > So, having looked through the manual, what the cpufreq_driver does for > turbo on Intel is bogus according to the SDM. > > There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per > package) for firmware to configure turbo, but it manifests as another > dynamic CPUID bit (which I think we handle correctly). It has the same > semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider > adding it to the set of bits we clear during boot. > > However, the correct way to turn off turbo is to set > IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As > such, it *should* behave far more like the AMD CPB path. I wrote this in my HWP work (which I need to get back to...): +/* + * The SDM reads like turbo should be disabled with MSR_IA32_PERF_CTL and + * PERF_CTL_TURBO_DISENGAGE, but that does not seem to actually work, at least + * with my HWP testing. MSR_IA32_MISC_ENABLE and MISC_ENABLE_TURBO_DISENGAGE + * is what Linux uses and seems to work. + */ Regards, Jason
On 12.11.2021 19:51, Andrew Cooper wrote: > On 10/11/2021 09:19, Jane Malalane wrote: >> Before, user would change turbo status but this had no effect: boolean >> was set but policy wasn't reevaluated. Policy must be reevaluated so >> that CPU frequency is chosen according to the turbo status and the >> current governor. >> >> Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). >> >> Reported-by: <edvin.torok@citrix.com> >> Signed-off-by: <jane.malalane@citrix.com> >> --- >> CC: Jan Beulich <jbeulich@suse.com> >> CC: Ian Jackson <iwj@xenproject.org> >> --- >> >> Release rationale: >> Not taking this patch means that turbo status is misleading. >> >> Taking this patch is low-risk as it only uses a function that already >> exists and is already used for setting the chosen scaling governor. >> Essentially, this change is equivalent to running 'xenpm >> en/disable-turbo-mode' and, subsequently, running 'xenpm >> set-scaling-governor <current governor>'. >> --- >> xen/drivers/cpufreq/utility.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c >> index b93895d4dd..5f200ff3ee 100644 >> --- a/xen/drivers/cpufreq/utility.c >> +++ b/xen/drivers/cpufreq/utility.c >> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) >> { >> ret = cpufreq_driver.update(cpuid, policy); >> if (ret) >> + { >> policy->turbo = curr_state; >> + return ret; >> + } >> } >> >> - return ret; >> + /* Reevaluate current CPU policy. */ >> + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); >> } > > So, having looked through the manual, what the cpufreq_driver does for > turbo on Intel is bogus according to the SDM. > > There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per > package) for firmware to configure turbo, but it manifests as another > dynamic CPUID bit (which I think we handle correctly). It has the same > semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider > adding it to the set of bits we clear during boot. This is quite confusing in the docs - at least one of the tables calls the bit "IDA Disable", while other entries at least also refer to the effect of disabling IDA. I'm afraid I have trouble connecting turbo mode and IDA disabling, unless both are two different names of the same thing. Maybe they really are, except that SDM vol 2 uses yet another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost Technology". > However, the correct way to turn off turbo is to set > IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As > such, it *should* behave far more like the AMD CPB path. I'm afraid public documentation has no mention of a bit of this name. Considering the above I wonder whether you mean "IDA engage" (bit 32), albeit this doesn't seem very likely when you're taking about a "disengage" bit. If it is, we'd still need to cope with it being unavailable: While as per the doc it exists from Merom onwards, i.e. just far enough back for all 64-bit capable processors to be covered, at least there it is attributed "Mobile only". Jan > Therefore, I propose that the update hook gets renamed to update_turbo() > to more clearly state it's purpose, and that we use the TURBO_DISENGAGE > bit as documented. > > If we're going this route, I'd also like to make this hook consistent > with others, where we IPI directly, rather than having an intermediate > function pointer just to send an IPI. > > ~Andrew >
On 15/11/2021 10:53, Jan Beulich wrote: > On 12.11.2021 19:51, Andrew Cooper wrote: >> On 10/11/2021 09:19, Jane Malalane wrote: >>> Before, user would change turbo status but this had no effect: boolean >>> was set but policy wasn't reevaluated. Policy must be reevaluated so >>> that CPU frequency is chosen according to the turbo status and the >>> current governor. >>> >>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). >>> >>> Reported-by: <edvin.torok@citrix.com> >>> Signed-off-by: <jane.malalane@citrix.com> >>> --- >>> CC: Jan Beulich <jbeulich@suse.com> >>> CC: Ian Jackson <iwj@xenproject.org> >>> --- >>> >>> Release rationale: >>> Not taking this patch means that turbo status is misleading. >>> >>> Taking this patch is low-risk as it only uses a function that already >>> exists and is already used for setting the chosen scaling governor. >>> Essentially, this change is equivalent to running 'xenpm >>> en/disable-turbo-mode' and, subsequently, running 'xenpm >>> set-scaling-governor <current governor>'. >>> --- >>> xen/drivers/cpufreq/utility.c | 6 +++++- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c >>> index b93895d4dd..5f200ff3ee 100644 >>> --- a/xen/drivers/cpufreq/utility.c >>> +++ b/xen/drivers/cpufreq/utility.c >>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) >>> { >>> ret = cpufreq_driver.update(cpuid, policy); >>> if (ret) >>> + { >>> policy->turbo = curr_state; >>> + return ret; >>> + } >>> } >>> >>> - return ret; >>> + /* Reevaluate current CPU policy. */ >>> + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); >>> } >> So, having looked through the manual, what the cpufreq_driver does for >> turbo on Intel is bogus according to the SDM. >> >> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per >> package) for firmware to configure turbo, but it manifests as another >> dynamic CPUID bit (which I think we handle correctly). It has the same >> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider >> adding it to the set of bits we clear during boot. > This is quite confusing in the docs - at least one of the tables calls > the bit "IDA Disable", while other entries at least also refer to the > effect of disabling IDA. I'm afraid I have trouble connecting turbo > mode and IDA disabling, unless both are two different names of the > same thing. Maybe they really are, except that SDM vol 2 uses yet > another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost > Technology". SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases. It reads as if IDA is the general technology name, and turbo is a sub-mode for "doing it automatically without software involvement". On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with further adds to the confusion of which is which. >> However, the correct way to turn off turbo is to set >> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As >> such, it *should* behave far more like the AMD CPB path. > I'm afraid public documentation has no mention of a bit of this name. > Considering the above I wonder whether you mean "IDA engage" (bit 32), > albeit this doesn't seem very likely when you're taking about a > "disengage" bit. It is that bit. Despite the name given in Vol4 Table 2-12, it is "set to disable". Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and the surrounding text makes it clear that it is disable bit, not an enable bit. Also, that's how the Linux intel_pstate driver uses it. > If it is, we'd still need to cope with it being > unavailable: While as per the doc it exists from Merom onwards, i.e. > just far enough back for all 64-bit capable processors to be covered, > at least there it is attributed "Mobile only". Honestly, the number of errors in those tables are alarming. The MSR is has been around longer than turbo, and I'm told that the interface has never changed since its introduction. Looking across Linux, there's a mess too. acpi-cpufreq and energy_perf_policy use the MISC_ENABLE bit (which is package wide) intel_ips driver uses PERF_CTL (which is logical processor) intel_pstate uses MISC_ENABLE || max_pstate == turbo_pstate I'm certain I've seen "one pstate being special" WRT turbo before, but I can't locate anything about this in the SDM, which possibly means it is model specific. ~Andrew
On 15.11.2021 15:33, Andrew Cooper wrote: > On 15/11/2021 10:53, Jan Beulich wrote: >> On 12.11.2021 19:51, Andrew Cooper wrote: >>> On 10/11/2021 09:19, Jane Malalane wrote: >>>> Before, user would change turbo status but this had no effect: boolean >>>> was set but policy wasn't reevaluated. Policy must be reevaluated so >>>> that CPU frequency is chosen according to the turbo status and the >>>> current governor. >>>> >>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). >>>> >>>> Reported-by: <edvin.torok@citrix.com> >>>> Signed-off-by: <jane.malalane@citrix.com> >>>> --- >>>> CC: Jan Beulich <jbeulich@suse.com> >>>> CC: Ian Jackson <iwj@xenproject.org> >>>> --- >>>> >>>> Release rationale: >>>> Not taking this patch means that turbo status is misleading. >>>> >>>> Taking this patch is low-risk as it only uses a function that already >>>> exists and is already used for setting the chosen scaling governor. >>>> Essentially, this change is equivalent to running 'xenpm >>>> en/disable-turbo-mode' and, subsequently, running 'xenpm >>>> set-scaling-governor <current governor>'. >>>> --- >>>> xen/drivers/cpufreq/utility.c | 6 +++++- >>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c >>>> index b93895d4dd..5f200ff3ee 100644 >>>> --- a/xen/drivers/cpufreq/utility.c >>>> +++ b/xen/drivers/cpufreq/utility.c >>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) >>>> { >>>> ret = cpufreq_driver.update(cpuid, policy); >>>> if (ret) >>>> + { >>>> policy->turbo = curr_state; >>>> + return ret; >>>> + } >>>> } >>>> >>>> - return ret; >>>> + /* Reevaluate current CPU policy. */ >>>> + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); >>>> } >>> So, having looked through the manual, what the cpufreq_driver does for >>> turbo on Intel is bogus according to the SDM. >>> >>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per >>> package) for firmware to configure turbo, but it manifests as another >>> dynamic CPUID bit (which I think we handle correctly). It has the same >>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider >>> adding it to the set of bits we clear during boot. >> This is quite confusing in the docs - at least one of the tables calls >> the bit "IDA Disable", while other entries at least also refer to the >> effect of disabling IDA. I'm afraid I have trouble connecting turbo >> mode and IDA disabling, unless both are two different names of the >> same thing. Maybe they really are, except that SDM vol 2 uses yet >> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost >> Technology". > > SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases. It > reads as if IDA is the general technology name, and turbo is a sub-mode > for "doing it automatically without software involvement". > > On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with > further adds to the confusion of which is which. > >>> However, the correct way to turn off turbo is to set >>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As >>> such, it *should* behave far more like the AMD CPB path. >> I'm afraid public documentation has no mention of a bit of this name. >> Considering the above I wonder whether you mean "IDA engage" (bit 32), >> albeit this doesn't seem very likely when you're taking about a >> "disengage" bit. > > It is that bit. Despite the name given in Vol4 Table 2-12, it is "set > to disable". > > Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and > the surrounding text makes it clear that it is disable bit, not an > enable bit. Also, that's how the Linux intel_pstate driver uses it. Okay, then I agree with the proposal you've made. Jan >> If it is, we'd still need to cope with it being >> unavailable: While as per the doc it exists from Merom onwards, i.e. >> just far enough back for all 64-bit capable processors to be covered, >> at least there it is attributed "Mobile only". > > Honestly, the number of errors in those tables are alarming. The MSR is > has been around longer than turbo, and I'm told that the interface has > never changed since its introduction. > > Looking across Linux, there's a mess too. > > acpi-cpufreq and energy_perf_policy use the MISC_ENABLE bit (which is > package wide) > intel_ips driver uses PERF_CTL (which is logical processor) > intel_pstate uses MISC_ENABLE || max_pstate == turbo_pstate > > I'm certain I've seen "one pstate being special" WRT turbo before, but I > can't locate anything about this in the SDM, which possibly means it is > model specific. > > ~Andrew >
On 15/11/2021 16:21, Jan Beulich wrote: > On 15.11.2021 15:33, Andrew Cooper wrote: >> On 15/11/2021 10:53, Jan Beulich wrote: >>> On 12.11.2021 19:51, Andrew Cooper wrote: >>>> On 10/11/2021 09:19, Jane Malalane wrote: >>>>> Before, user would change turbo status but this had no effect: boolean >>>>> was set but policy wasn't reevaluated. Policy must be reevaluated so >>>>> that CPU frequency is chosen according to the turbo status and the >>>>> current governor. >>>>> >>>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). >>>>> >>>>> Reported-by: <edvin.torok@citrix.com> >>>>> Signed-off-by: <jane.malalane@citrix.com> >>>>> --- >>>>> CC: Jan Beulich <jbeulich@suse.com> >>>>> CC: Ian Jackson <iwj@xenproject.org> >>>>> --- >>>>> >>>>> Release rationale: >>>>> Not taking this patch means that turbo status is misleading. >>>>> >>>>> Taking this patch is low-risk as it only uses a function that already >>>>> exists and is already used for setting the chosen scaling governor. >>>>> Essentially, this change is equivalent to running 'xenpm >>>>> en/disable-turbo-mode' and, subsequently, running 'xenpm >>>>> set-scaling-governor <current governor>'. >>>>> --- >>>>> xen/drivers/cpufreq/utility.c | 6 +++++- >>>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c >>>>> index b93895d4dd..5f200ff3ee 100644 >>>>> --- a/xen/drivers/cpufreq/utility.c >>>>> +++ b/xen/drivers/cpufreq/utility.c >>>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) >>>>> { >>>>> ret = cpufreq_driver.update(cpuid, policy); >>>>> if (ret) >>>>> + { >>>>> policy->turbo = curr_state; >>>>> + return ret; >>>>> + } >>>>> } >>>>> >>>>> - return ret; >>>>> + /* Reevaluate current CPU policy. */ >>>>> + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); >>>>> } >>>> So, having looked through the manual, what the cpufreq_driver does for >>>> turbo on Intel is bogus according to the SDM. >>>> >>>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per >>>> package) for firmware to configure turbo, but it manifests as another >>>> dynamic CPUID bit (which I think we handle correctly). It has the same >>>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider >>>> adding it to the set of bits we clear during boot. >>> This is quite confusing in the docs - at least one of the tables calls >>> the bit "IDA Disable", while other entries at least also refer to the >>> effect of disabling IDA. I'm afraid I have trouble connecting turbo >>> mode and IDA disabling, unless both are two different names of the >>> same thing. Maybe they really are, except that SDM vol 2 uses yet >>> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost >>> Technology". >> SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases. It >> reads as if IDA is the general technology name, and turbo is a sub-mode >> for "doing it automatically without software involvement". >> >> On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with >> further adds to the confusion of which is which. >> >>>> However, the correct way to turn off turbo is to set >>>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As >>>> such, it *should* behave far more like the AMD CPB path. >>> I'm afraid public documentation has no mention of a bit of this name. >>> Considering the above I wonder whether you mean "IDA engage" (bit 32), >>> albeit this doesn't seem very likely when you're taking about a >>> "disengage" bit. >> It is that bit. Despite the name given in Vol4 Table 2-12, it is "set >> to disable". >> >> Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and >> the surrounding text makes it clear that it is disable bit, not an >> enable bit. Also, that's how the Linux intel_pstate driver uses it. > Okay, then I agree with the proposal you've made. I've just done some experimentation on a Intel(R) Xeon(R) E-2288G (CFL-R based), and both the MISC_ENABLE and PERF_CTL bits have the same effect, and cut nearly 1GHz off the APERF/MPERF reported frequency on a busy single core on an otherwise idle system. I have not checked the effect that PERF_CTL on thread 0 has on other threads in the package, but the reality is that ~100% of production use of these controls are for all CPUs at once. (So much so, that I really think the interface ought to gain a -1 sentential for "all cpus", rather than forcing userspace to loop over each cpu individually, when Xen can handle the entire loop in O(1) time.) ~Andrew
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c index b93895d4dd..5f200ff3ee 100644 --- a/xen/drivers/cpufreq/utility.c +++ b/xen/drivers/cpufreq/utility.c @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) { ret = cpufreq_driver.update(cpuid, policy); if (ret) + { policy->turbo = curr_state; + return ret; + } } - return ret; + /* Reevaluate current CPU policy. */ + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); }
Before, user would change turbo status but this had no effect: boolean was set but policy wasn't reevaluated. Policy must be reevaluated so that CPU frequency is chosen according to the turbo status and the current governor. Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). Reported-by: <edvin.torok@citrix.com> Signed-off-by: <jane.malalane@citrix.com> --- CC: Jan Beulich <jbeulich@suse.com> CC: Ian Jackson <iwj@xenproject.org> --- Release rationale: Not taking this patch means that turbo status is misleading. Taking this patch is low-risk as it only uses a function that already exists and is already used for setting the chosen scaling governor. Essentially, this change is equivalent to running 'xenpm en/disable-turbo-mode' and, subsequently, running 'xenpm set-scaling-governor <current governor>'. --- xen/drivers/cpufreq/utility.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)