Message ID | 20230501193034.88575-8-jandryuk@gmail.com (mailing list archive)
State      | Superseded
Series     | Intel Hardware P-States (HWP) support
On 01.05.2023 21:30, Jason Andryuk wrote:
> Extend xen_get_cpufreq_para to return hwp parameters. These match the
> hardware rather closely.
>
> We need the features bitmask to indicate fields supported by the actual
> hardware.
>
> The use of uint8_t parameters matches the hardware size. uint32_t
> entries grow the sysctl_t past the build assertion in setup.c. The
> uint8_t ranges are supported across multiple generations, so hopefully
> they won't change.

Still it feels a little odd for values to be this narrow. Aiui the
scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
used by HWP. So you could widen the union in struct
xen_get_cpufreq_para (in a binary but not necessarily source compatible
manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
placed scaling_cur_freq could be included as well ...

> --- a/xen/arch/x86/acpi/cpufreq/hwp.c
> +++ b/xen/arch/x86/acpi/cpufreq/hwp.c
> @@ -506,6 +506,31 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
>      .update = hwp_cpufreq_update,
>  };
>
> +int get_hwp_para(const struct cpufreq_policy *policy,

While I don't really mind a policy being passed into here, ...

> +                 struct xen_hwp_para *hwp_para)
> +{
> +    unsigned int cpu = policy->cpu;

... this is its only use afaics, and hence the caller could as well pass
in just a CPU number?

> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -292,6 +292,31 @@ struct xen_ondemand {
>      uint32_t up_threshold;
>  };
>
> +struct xen_hwp_para {
> +    /*
> +     * bits 6:0   - 7bit mantissa
> +     * bits 9:7   - 3bit base-10 exponent
> +     * bits 15:10 - Unused - must be 0
> +     */
> +#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
> +#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
> +#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
> +    uint16_t activity_window;
> +    /* energy_perf range 0-255 if 1. Otherwise 0-15 */
> +#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
> +    /* activity_window supported if 1 */
> +#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
> +    uint8_t features; /* bit flags for features */
> +    uint8_t lowest;
> +    uint8_t most_efficient;
> +    uint8_t guaranteed;
> +    uint8_t highest;
> +    uint8_t minimum;
> +    uint8_t maximum;
> +    uint8_t desired;
> +    uint8_t energy_perf;

These fields could do with some more commentary. To be honest I had
trouble figuring (from the SDM) what exact meaning specific numeric
values have. Readers of this header should at the very least be told
where they can turn to in order to understand what these fields
communicate. (FTAOD this could be section names, but please not
section numbers. The latter are fine to use in a discussion, but
they're changing too frequently to make them useful in code comments.)

> +};

Also, if you decide to stick to uint8_t, then the trailing padding
field (another uint8_t) wants making explicit. I'm on the edge whether
to ask to also check the field: Right here the struct is "get only",
and peeking ahead you look to be introducing a separate sub-op for
"set". Perhaps if you added /* OUT */ at the top of the new struct?
(But if you don't check the field for being zero, then you'll want to
set it to zero for forward compatibility.)

Jan
On 08.05.2023 12:25, Jan Beulich wrote:
> On 01.05.2023 21:30, Jason Andryuk wrote:
>> Extend xen_get_cpufreq_para to return hwp parameters. These match the
>> hardware rather closely.
>>
>> We need the features bitmask to indicate fields supported by the actual
>> hardware.
>>
>> The use of uint8_t parameters matches the hardware size. uint32_t
>> entries grow the sysctl_t past the build assertion in setup.c. The
>> uint8_t ranges are supported across multiple generations, so hopefully
>> they won't change.
>
> Still it feels a little odd for values to be this narrow. Aiui the
> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> used by HWP. So you could widen the union in struct
> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> placed scaling_cur_freq could be included as well ...

Having seen patch 9 now as well, I wonder whether here (or in a separate
patch) you don't want to limit providing inapplicable data (for example
not filling *scaling_available_governors would even avoid an allocation,
thus removing a possible reason for failure), while there (or again in a
separate patch) you'd also limit what the tool reports (inapplicable
output causes confusion / questions at best).

Jan
On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 01.05.2023 21:30, Jason Andryuk wrote:
> > Extend xen_get_cpufreq_para to return hwp parameters. These match the
> > hardware rather closely.
> >
> > We need the features bitmask to indicate fields supported by the actual
> > hardware.
> >
> > The use of uint8_t parameters matches the hardware size. uint32_t
> > entries grow the sysctl_t past the build assertion in setup.c. The
> > uint8_t ranges are supported across multiple generations, so hopefully
> > they won't change.
>
> Still it feels a little odd for values to be this narrow. Aiui the
> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> used by HWP. So you could widen the union in struct
> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> placed scaling_cur_freq could be included as well ...

The values are narrow, but they match the hardware. It works for HWP,
so there is no need to change at this time AFAICT.

Do you want me to make this change?

> > --- a/xen/arch/x86/acpi/cpufreq/hwp.c
> > +++ b/xen/arch/x86/acpi/cpufreq/hwp.c
> > @@ -506,6 +506,31 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
> >      .update = hwp_cpufreq_update,
> >  };
> >
> > +int get_hwp_para(const struct cpufreq_policy *policy,
>
> While I don't really mind a policy being passed into here, ...
>
> > +                 struct xen_hwp_para *hwp_para)
> > +{
> > +    unsigned int cpu = policy->cpu;
>
> ... this is its only use afaics, and hence the caller could as well pass
> in just a CPU number?

Sounds good.

> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -292,6 +292,31 @@ struct xen_ondemand {
> >      uint32_t up_threshold;
> >  };
> >
> > +struct xen_hwp_para {
> > +    /*
> > +     * bits 6:0   - 7bit mantissa
> > +     * bits 9:7   - 3bit base-10 exponent
> > +     * bits 15:10 - Unused - must be 0
> > +     */
> > +#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
> > +#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
> > +#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
> > +    uint16_t activity_window;
> > +    /* energy_perf range 0-255 if 1. Otherwise 0-15 */
> > +#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
> > +    /* activity_window supported if 1 */
> > +#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
> > +    uint8_t features; /* bit flags for features */
> > +    uint8_t lowest;
> > +    uint8_t most_efficient;
> > +    uint8_t guaranteed;
> > +    uint8_t highest;
> > +    uint8_t minimum;
> > +    uint8_t maximum;
> > +    uint8_t desired;
> > +    uint8_t energy_perf;
>
> These fields could do with some more commentary. To be honest I had
> trouble figuring (from the SDM) what exact meaning specific numeric
> values have. Readers of this header should at the very least be told
> where they can turn to in order to understand what these fields
> communicate. (FTAOD this could be section names, but please not
> section numbers. The latter are fine to use in a discussion, but
> they're changing too frequently to make them useful in code comments.)

Sounds good. I'll add some description.

> > +};
>
> Also, if you decide to stick to uint8_t, then the trailing padding
> field (another uint8_t) wants making explicit. I'm on the edge whether
> to ask to also check the field: Right here the struct is "get only",
> and peeking ahead you look to be introducing a separate sub-op for
> "set". Perhaps if you added /* OUT */ at the top of the new struct?
> (But if you don't check the field for being zero, then you'll want to
> set it to zero for forward compatibility.)

Thanks for catching.
I'll add the padding field and zero it.

On Mon, May 8, 2023 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.05.2023 12:25, Jan Beulich wrote:
> > On 01.05.2023 21:30, Jason Andryuk wrote:
> >> Extend xen_get_cpufreq_para to return hwp parameters. These match the
> >> hardware rather closely.
> >>
> >> We need the features bitmask to indicate fields supported by the actual
> >> hardware.
> >>
> >> The use of uint8_t parameters matches the hardware size. uint32_t
> >> entries grow the sysctl_t past the build assertion in setup.c. The
> >> uint8_t ranges are supported across multiple generations, so hopefully
> >> they won't change.
> >
> > Still it feels a little odd for values to be this narrow. Aiui the
> > scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> > used by HWP. So you could widen the union in struct
> > xen_get_cpufreq_para (in a binary but not necessarily source compatible
> > manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> > placed scaling_cur_freq could be included as well ...
>
> Having seen patch 9 now as well, I wonder whether here (or in a separate
> patch) you don't want to limit providing inapplicable data (for example
> not filling *scaling_available_governors would even avoid an allocation,
> thus removing a possible reason for failure), while there (or again in a
> separate patch) you'd also limit what the tool reports (inapplicable
> output causes confusion / questions at best).

The xenpm output only shows relevant information:

# xenpm get-cpufreq-para 11
cpu id               : 11
affected_cpus        : 11
cpuinfo frequency    : base [1600000] max [4900000]
scaling_driver       : hwp-cpufreq
scaling_avail_gov    : hwp
current_governor     : hwp
hwp variables        :
  hardware limits    : lowest [1] most_efficient [11]
                     : guaranteed [11] highest [49]
  configured limits  : min [1] max [255] energy_perf [128]
                     : activity_window [0 hardware selected]
                     : desired [0 hw autonomous]
turbo mode           : enabled

The scaling_*_freq values, policy->{min,max,cur} are filled in with
base, max and 0 in hwp_get_cpu_speeds(), so it's not totally invalid
values being returned. The governor registration restricting to only
internal governors when HWP is active means it only has the single
governor. I think it's okay as-is, but let me know if you want
something changed.

Regards,
Jason
On 10.05.2023 19:49, Jason Andryuk wrote:
> On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>> [...]
>>
>> Still it feels a little odd for values to be this narrow. Aiui the
>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>> used by HWP. So you could widen the union in struct
>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>> placed scaling_cur_freq could be included as well ...
>
> The values are narrow, but they match the hardware. It works for HWP,
> so there is no need to change at this time AFAICT.
>
> Do you want me to make this change?

Well, much depends on what these 8-bit values actually express (I did
raise this question in one of the replies to your patches, as I wasn't
able to find anything in the SDM). That'll then hopefully allow to make
some educated prediction on how likely it is that a future variant of
hwp would want to widen them. (Was it energy_perf that went from 4 to 8
bits at some point, which you even comment upon in the public header?)

> On Mon, May 8, 2023 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 08.05.2023 12:25, Jan Beulich wrote:
>>> [...]
>>
>> Having seen patch 9 now as well, I wonder whether here (or in a separate
>> patch) you don't want to limit providing inapplicable data (for example
>> not filling *scaling_available_governors would even avoid an allocation,
>> thus removing a possible reason for failure), while there (or again in a
>> separate patch) you'd also limit what the tool reports (inapplicable
>> output causes confusion / questions at best).
>
> The xenpm output only shows relevant information:
>
> # xenpm get-cpufreq-para 11
> cpu id               : 11
> affected_cpus        : 11
> cpuinfo frequency    : base [1600000] max [4900000]
> scaling_driver       : hwp-cpufreq
> scaling_avail_gov    : hwp
> current_governor     : hwp
> hwp variables        :
>   hardware limits    : lowest [1] most_efficient [11]
>                      : guaranteed [11] highest [49]
>   configured limits  : min [1] max [255] energy_perf [128]
>                      : activity_window [0 hardware selected]
>                      : desired [0 hw autonomous]
> turbo mode           : enabled
>
> The scaling_*_freq values, policy->{min,max,cur} are filled in with
> base, max and 0 in hwp_get_cpu_speeds(), so it's not totally invalid
> values being returned. The governor registration restricting to only
> internal governors when HWP is active means it only has the single
> governor. I think it's okay as-is, but let me know if you want
> something changed.

Well, the main connection here was to the possible overloading of space
in the sysctl struct, by widening what the union covers. That can of
course only be done for fields which don't convey useful data. If we go
without the further overloading, I guess we can for now leave the tool
as you have it, and deal with possible tidying later on.

Jan
On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.05.2023 19:49, Jason Andryuk wrote:
> > On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 01.05.2023 21:30, Jason Andryuk wrote:
> >>> [...]
> >>
> >> Still it feels a little odd for values to be this narrow. Aiui the
> >> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> >> used by HWP. So you could widen the union in struct
> >> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> >> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> >> placed scaling_cur_freq could be included as well ...
> >
> > The values are narrow, but they match the hardware. It works for HWP,
> > so there is no need to change at this time AFAICT.
> >
> > Do you want me to make this change?
>
> Well, much depends on what these 8-bit values actually express (I did
> raise this question in one of the replies to your patches, as I wasn't
> able to find anything in the SDM). That'll then hopefully allow to make
> some educated prediction on how likely it is that a future variant of
> hwp would want to widen them.

Sorry for not providing a reference earlier. In the SDM,
HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
second paragraph:

"""
In contrast, HWP is an implementation of the ACPI-defined Collaborative
Processor Performance Control (CPPC), which specifies that the platform
enumerates a continuous, abstract unit-less, performance value scale
that is not tied to a specific performance state / frequency by
definition. While the enumerated scale is roughly linear in terms of a
delivered integer workload performance result, the OS is required to
characterize the performance value range to comprehend the delivered
performance for an applied workload.
"""

The numbers are "continuous, abstract unit-less, performance value[s]".
So there isn't much to go on there, but generally, smaller numbers mean
slower and bigger numbers mean faster.

Cross referencing the ACPI spec here:
https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control

Scrolling down you can find the register entries such as

Highest Performance
Register or DWORD Attribute: Read
Size: 8-32 bits

AMD has its own pstate implementation that is similar to HWP. Looking
at the Linux support, the AMD hardware also uses 8 bit values for the
comparable fields:
https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612

So Intel and AMD are 8 bit for now at least. Something could do 32 bits
according to the ACPI spec.

8 bits of granularity for slow to fast seems like plenty to me. I'm not
sure what one would gain from 16 or 32 bits, but I'm not designing the
hardware. From the earlier xenpm output, "highest" was 49, so still a
decent amount of room in an 8 bit range.

> (Was it energy_perf that went from 4 to 8 bits at some point, which
> you even comment upon in the public header?)

energy_perf (Energy_Performance_Preference) had a fallback: "If
CPUID.06H:EAX[bit 10] indicates that this field is not supported, HWP
uses the value of the IA32_ENERGY_PERF_BIAS MSR to determine the energy
efficiency / performance preference."

So it had a different range, but that was because it was being put into
an older register. However, I've removed that fallback code in v4.

Regards,
Jason
On 11.05.2023 15:49, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 10.05.2023 19:49, Jason Andryuk wrote:
>>> [...]
>>>
>>> Do you want me to make this change?
>>
>> Well, much depends on what these 8-bit values actually express (I did
>> raise this question in one of the replies to your patches, as I wasn't
>> able to find anything in the SDM). That'll then hopefully allow to make
>> some educated prediction on how likely it is that a future variant of
>> hwp would want to widen them.
>
> Sorry for not providing a reference earlier. In the SDM,
> HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
> second paragraph:
> """
> In contrast, HWP is an implementation of the ACPI-defined Collaborative
> Processor Performance Control (CPPC), which specifies that the platform
> enumerates a continuous, abstract unit-less, performance value scale
> that is not tied to a specific performance state / frequency by
> definition. While the enumerated scale is roughly linear in terms of a
> delivered integer workload performance result, the OS is required to
> characterize the performance value range to comprehend the delivered
> performance for an applied workload.
> """
>
> The numbers are "continuous, abstract unit-less, performance value[s]".
> So there isn't much to go on there, but generally, smaller numbers mean
> slower and bigger numbers mean faster.
>
> Cross referencing the ACPI spec here:
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control
>
> Scrolling down you can find the register entries such as
>
> Highest Performance
> Register or DWORD Attribute: Read
> Size: 8-32 bits
>
> AMD has its own pstate implementation that is similar to HWP. Looking
> at the Linux support, the AMD hardware also uses 8 bit values for the
> comparable fields:
> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612
>
> So Intel and AMD are 8 bit for now at least. Something could do 32 bits
> according to the ACPI spec.
>
> 8 bits of granularity for slow to fast seems like plenty to me. I'm not
> sure what one would gain from 16 or 32 bits, but I'm not designing the
> hardware. From the earlier xenpm output, "highest" was 49, so still a
> decent amount of room in an 8 bit range.

Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
okay with you keeping things as you have them. If and when needed we can
still rework the structure - it is possible to change it as it's (for
the time being at least) still an unstable interface.

Jan
On Thu, May 11, 2023 at 10:10 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 11.05.2023 15:49, Jason Andryuk wrote:
> > [...]
> >
> > 8 bits of granularity for slow to fast seems like plenty to me. I'm not
> > sure what one would gain from 16 or 32 bits, but I'm not designing the
> > hardware. From the earlier xenpm output, "highest" was 49, so still a
> > decent amount of room in an 8 bit range.
>
> Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
> okay with you keeping things as you have them. If and when needed we can
> still rework the structure - it is possible to change it as it's (for
> the time being at least) still an unstable interface.

With an anonymous union and anonymous struct, struct
xen_get_cpufreq_para can be re-arranged and compile without any changes
to other cpufreq code. struct xen_hwp_para becomes 10 uint32_t's. The
old scaling is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 * uint32_t
for xen_ondemand = 11 uint32_t. So int32_t turbo_enabled doesn't move
and it's binary compatible.

Anonymous unions and structs aren't allowed in the public header
though, right? So that would need to change, though it doesn't seem too
bad. There isn't too much churn.

I have no plans to tackle AMD pstate. But having glanced at it this
morning, maybe these hwp sysctls should be renamed cppc? AMD pstate and
HWP are both implementations of CPPC, so that could be more future
proof? But, again, I only glanced at the AMD stuff, so there may be
other changes needed.

Regards,
Jason
On 11.05.2023 22:22, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 10:10 AM Jan Beulich <jbeulich@suse.com> wrote:
>> [...]
>>
>> Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
>> okay with you keeping things as you have them. If and when needed we can
>> still rework the structure - it is possible to change it as it's (for
>> the time being at least) still an unstable interface.
>
> With an anonymous union and anonymous struct, struct
> xen_get_cpufreq_para can be re-arranged and compile without any changes
> to other cpufreq code. struct xen_hwp_para becomes 10 uint32_t's. The
> old scaling is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 * uint32_t
> for xen_ondemand = 11 uint32_t. So int32_t turbo_enabled doesn't move
> and it's binary compatible.
>
> Anonymous unions and structs aren't allowed in the public header
> though, right?

Correct.

> So that would need to change, though it doesn't seem too bad. There
> isn't too much churn.
>
> I have no plans to tackle AMD pstate. But having glanced at it this
> morning, maybe these hwp sysctls should be renamed cppc? AMD pstate and
> HWP are both implementations of CPPC, so that could be more future
> proof? But, again, I only glanced at the AMD stuff, so there may be
> other changes needed.

I consider this naming change plan plausible. If further adjustments end
up necessary for AMD, that'll be no worse (but maybe better) than if we
have to go from HWP to a more general name altogether.

Jan
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index f84abe1386..cb52918799 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -506,6 +506,31 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
     .update = hwp_cpufreq_update,
 };
 
+int get_hwp_para(const struct cpufreq_policy *policy,
+                 struct xen_hwp_para *hwp_para)
+{
+    unsigned int cpu = policy->cpu;
+    const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    hwp_para->features =
+        (feature_hwp_activity_window ? XEN_SYSCTL_HWP_FEAT_ACT_WINDOW : 0) |
+        (feature_hwp_energy_perf ? XEN_SYSCTL_HWP_FEAT_ENERGY_PERF : 0);
+    hwp_para->lowest = data->hw.lowest;
+    hwp_para->most_efficient = data->hw.most_efficient;
+    hwp_para->guaranteed = data->hw.guaranteed;
+    hwp_para->highest = data->hw.highest;
+    hwp_para->minimum = data->minimum;
+    hwp_para->maximum = data->maximum;
+    hwp_para->energy_perf = data->energy_perf;
+    hwp_para->activity_window = data->activity_window;
+    hwp_para->desired = data->desired;
+
+    return 0;
+}
+
 int __init hwp_register_driver(void)
 {
     return cpufreq_register_driver(&hwp_cpufreq_driver);
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 1bae635101..67fd9dabd4 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -290,6 +290,11 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
                 &op->u.get_para.u.ondemand.sampling_rate,
                 &op->u.get_para.u.ondemand.up_threshold);
     }
+
+    if ( !strncasecmp(op->u.get_para.scaling_governor, XEN_HWP_GOVERNOR,
+                      CPUFREQ_NAME_LEN) )
+        ret = get_hwp_para(policy, &op->u.get_para.u.hwp_para);
+
     op->u.get_para.turbo_enabled = cpufreq_get_turbo_status(op->cpuid);
 
     return ret;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 29a712a4f1..92b4c7e79c 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -247,5 +247,7 @@
 void intel_feature_detect(struct cpufreq_policy *policy);
 
 extern bool opt_cpufreq_hwp;
 extern bool opt_cpufreq_hdc;
+int get_hwp_para(const struct cpufreq_policy *policy,
+                 struct xen_hwp_para *hwp_para);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b448f13b75..bf7e6594a7 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -292,6 +292,31 @@ struct xen_ondemand {
     uint32_t up_threshold;
 };
 
+struct xen_hwp_para {
+    /*
+     * bits 6:0   - 7bit mantissa
+     * bits 9:7   - 3bit base-10 exponent
+     * bits 15:10 - Unused - must be 0
+     */
+#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
+#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
+#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
+    uint16_t activity_window;
+    /* energy_perf range 0-255 if 1. Otherwise 0-15 */
+#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
+    /* activity_window supported if 1 */
+#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
+    uint8_t features; /* bit flags for features */
+    uint8_t lowest;
+    uint8_t most_efficient;
+    uint8_t guaranteed;
+    uint8_t highest;
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf;
+};
+
 #define XEN_HWP_GOVERNOR "hwp-internal"
 
 /*
  * cpufreq para name of this structure named
@@ -324,6 +349,7 @@ struct xen_get_cpufreq_para {
     union {
         struct xen_userspace userspace;
         struct xen_ondemand ondemand;
+        struct xen_hwp_para hwp_para;
     } u;
 
     int32_t turbo_enabled;
Extend xen_get_cpufreq_para to return hwp parameters. These match the
hardware rather closely.

We need the features bitmask to indicate fields supported by the actual
hardware.

The use of uint8_t parameters matches the hardware size. uint32_t
entries grow the sysctl_t past the build assertion in setup.c. The
uint8_t ranges are supported across multiple generations, so hopefully
they won't change.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Style fixes
Don't bump XEN_SYSCTL_INTERFACE_VERSION
Drop cpufreq.h comment divider
Expand xen_hwp_para comment
Add HWP activity window mantissa/exponent defines
Handle union rename
Add const to get_hwp_para
Remove hw_ prefix from xen_hwp_para members
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 25 +++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          |  5 +++++
 xen/include/acpi/cpufreq/cpufreq.h |  2 ++
 xen/include/public/sysctl.h        | 26 ++++++++++++++++++++++++++
 4 files changed, 58 insertions(+)