
[v3,07/14,RESEND] cpufreq: Export HWP parameters to userspace

Message ID 20230501193034.88575-8-jandryuk@gmail.com (mailing list archive)
State Superseded
Series Intel Hardware P-States (HWP) support

Commit Message

Jason Andryuk May 1, 2023, 7:30 p.m. UTC
Extend xen_get_cpufreq_para to return hwp parameters.  These match the
hardware rather closely.

We need the features bitmask to indicate fields supported by the actual
hardware.

The use of uint8_t parameters matches the hardware size.  uint32_t
entries grow the sysctl_t past the build assertion in setup.c.  The
uint8_t ranges are supported across multiple generations, so hopefully
they won't change.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
v2:
Style fixes
Don't bump XEN_SYSCTL_INTERFACE_VERSION
Drop cpufreq.h comment divider
Expand xen_hwp_para comment
Add HWP activity window mantissa/exponent defines
Handle union rename
Add const to get_hwp_para
Remove hw_ prefix from xen_hwp_para members
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 25 +++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          |  5 +++++
 xen/include/acpi/cpufreq/cpufreq.h |  2 ++
 xen/include/public/sysctl.h        | 26 ++++++++++++++++++++++++++
 4 files changed, 58 insertions(+)

Comments

Jan Beulich May 8, 2023, 10:25 a.m. UTC | #1
On 01.05.2023 21:30, Jason Andryuk wrote:
> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
> hardware rather closely.
> 
> We need the features bitmask to indicated fields supported by the actual
> hardware.
> 
> The use of uint8_t parameters matches the hardware size.  uint32_t
> entries grows the sysctl_t past the build assertion in setup.c.  The
> uint8_t ranges are supported across multiple generations, so hopefully
> they won't change.

Still it feels a little odd for values to be this narrow. Aiui the
scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
used by HWP. So you could widen the union in struct
xen_get_cpufreq_para (in a binary but not necessarily source compatible
manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
placed scaling_cur_freq could be included as well ...

> --- a/xen/arch/x86/acpi/cpufreq/hwp.c
> +++ b/xen/arch/x86/acpi/cpufreq/hwp.c
> @@ -506,6 +506,31 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
>      .update = hwp_cpufreq_update,
>  };
>  
> +int get_hwp_para(const struct cpufreq_policy *policy,

While I don't really mind a policy being passed into here, ...

> +                 struct xen_hwp_para *hwp_para)
> +{
> +    unsigned int cpu = policy->cpu;

... this is its only use afaics, and hence the caller could as well pass
in just a CPU number?

> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -292,6 +292,31 @@ struct xen_ondemand {
>      uint32_t up_threshold;
>  };
>  
> +struct xen_hwp_para {
> +    /*
> +     * bits 6:0   - 7bit mantissa
> +     * bits 9:7   - 3bit base-10 exponent
> +     * bits 15:10 - Unused - must be 0
> +     */
> +#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
> +#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
> +#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
> +    uint16_t activity_window;
> +    /* energy_perf range 0-255 if 1. Otherwise 0-15 */
> +#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
> +    /* activity_window supported if 1 */
> +#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
> +    uint8_t features; /* bit flags for features */
> +    uint8_t lowest;
> +    uint8_t most_efficient;
> +    uint8_t guaranteed;
> +    uint8_t highest;
> +    uint8_t minimum;
> +    uint8_t maximum;
> +    uint8_t desired;
> +    uint8_t energy_perf;

These fields could do with some more commentary. To be honest I had
trouble figuring (from the SDM) what exact meaning specific numeric
values have. Readers of this header should at the very least be told
where they can turn to in order to understand what these fields
communicate. (FTAOD this could be section names, but please not
section numbers. The latter are fine to use in a discussion, but
they're changing too frequently to make them useful in code
comments.)

> +};

Also, if you decide to stick to uint8_t, then the trailing padding
field (another uint8_t) wants making explicit. I'm on the edge
whether to ask to also check the field: Right here the struct is
"get only", and peeking ahead you look to be introducing a separate
sub-op for "set". Perhaps if you added /* OUT */ at the top of the
new struct? (But if you don't check the field for being zero, then
you'll want to set it to zero for forward compatibility.)

Jan
Jan Beulich May 8, 2023, 10:46 a.m. UTC | #2
On 08.05.2023 12:25, Jan Beulich wrote:
> On 01.05.2023 21:30, Jason Andryuk wrote:
>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
>> hardware rather closely.
>>
>> We need the features bitmask to indicated fields supported by the actual
>> hardware.
>>
>> The use of uint8_t parameters matches the hardware size.  uint32_t
>> entries grows the sysctl_t past the build assertion in setup.c.  The
>> uint8_t ranges are supported across multiple generations, so hopefully
>> they won't change.
> 
> Still it feels a little odd for values to be this narrow. Aiui the
> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> used by HWP. So you could widen the union in struct
> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> placed scaling_cur_freq could be included as well ...

Having seen patch 9 now as well, I wonder whether here (or in a separate
patch) you don't want to limit providing inapplicable data (for example
not filling *scaling_available_governors would even avoid an allocation,
thus removing a possible reason for failure), while there (or again in a
separate patch) you'd also limit what the tool reports (inapplicable
output causes confusion / questions at best).

Jan
Jason Andryuk May 10, 2023, 5:49 p.m. UTC | #3
On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 01.05.2023 21:30, Jason Andryuk wrote:
> > Extend xen_get_cpufreq_para to return hwp parameters.  These match the
> > hardware rather closely.
> >
> > We need the features bitmask to indicated fields supported by the actual
> > hardware.
> >
> > The use of uint8_t parameters matches the hardware size.  uint32_t
> > entries grows the sysctl_t past the build assertion in setup.c.  The
> > uint8_t ranges are supported across multiple generations, so hopefully
> > they won't change.
>
> Still it feels a little odd for values to be this narrow. Aiui the
> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> used by HWP. So you could widen the union in struct
> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> placed scaling_cur_freq could be included as well ...

The values are narrow, but they match the hardware.  It works for HWP,
so there is no need to change at this time AFAICT.

Do you want me to make this change?

> > --- a/xen/arch/x86/acpi/cpufreq/hwp.c
> > +++ b/xen/arch/x86/acpi/cpufreq/hwp.c
> > @@ -506,6 +506,31 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
> >      .update = hwp_cpufreq_update,
> >  };
> >
> > +int get_hwp_para(const struct cpufreq_policy *policy,
>
> While I don't really mind a policy being passed into here, ...
>
> > +                 struct xen_hwp_para *hwp_para)
> > +{
> > +    unsigned int cpu = policy->cpu;
>
> ... this is its only use afaics, and hence the caller could as well pass
> in just a CPU number?

Sounds good.

> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -292,6 +292,31 @@ struct xen_ondemand {
> >      uint32_t up_threshold;
> >  };
> >
> > +struct xen_hwp_para {
> > +    /*
> > +     * bits 6:0   - 7bit mantissa
> > +     * bits 9:7   - 3bit base-10 exponent
> > +     * bits 15:10 - Unused - must be 0
> > +     */
> > +#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
> > +#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
> > +#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
> > +    uint16_t activity_window;
> > +    /* energy_perf range 0-255 if 1. Otherwise 0-15 */
> > +#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
> > +    /* activity_window supported if 1 */
> > +#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
> > +    uint8_t features; /* bit flags for features */
> > +    uint8_t lowest;
> > +    uint8_t most_efficient;
> > +    uint8_t guaranteed;
> > +    uint8_t highest;
> > +    uint8_t minimum;
> > +    uint8_t maximum;
> > +    uint8_t desired;
> > +    uint8_t energy_perf;
>
> These fields could do with some more commentary. To be honest I had
> trouble figuring (from the SDM) what exact meaning specific numeric
> values have. Readers of this header should at the very least be told
> where they can turn to in order to understand what these fields
> communicate. (FTAOD this could be section names, but please not
> section numbers. The latter are fine to use in a discussion, but
> they're changing too frequently to make them useful in code
> comments.)

Sounds good.  I'll add some description.

> > +};
>
> Also, if you decide to stick to uint8_t, then the trailing padding
> field (another uint8_t) wants making explicit. I'm on the edge
> whether to ask to also check the field: Right here the struct is
> "get only", and peeking ahead you look to be introducing a separate
> sub-op for "set". Perhaps if you added /* OUT */ at the top of the
> new struct? (But if you don't check the field for being zero, then
> you'll want to set it to zero for forward compatibility.)

Thanks for catching.  I'll add the padding field and zero it.

On Mon, May 8, 2023 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.05.2023 12:25, Jan Beulich wrote:
> > On 01.05.2023 21:30, Jason Andryuk wrote:
> >> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
> >> hardware rather closely.
> >>
> >> We need the features bitmask to indicated fields supported by the actual
> >> hardware.
> >>
> >> The use of uint8_t parameters matches the hardware size.  uint32_t
> >> entries grows the sysctl_t past the build assertion in setup.c.  The
> >> uint8_t ranges are supported across multiple generations, so hopefully
> >> they won't change.
> >
> > Still it feels a little odd for values to be this narrow. Aiui the
> > scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> > used by HWP. So you could widen the union in struct
> > xen_get_cpufreq_para (in a binary but not necessarily source compatible
> > manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> > placed scaling_cur_freq could be included as well ...
>
> Having seen patch 9 now as well, I wonder whether here (or in a separate
> patch) you don't want to limit providing inapplicable data (for example
> not filling *scaling_available_governors would even avoid an allocation,
> thus removing a possible reason for failure), while there (or again in a
> separate patch) you'd also limit what the tool reports (inapplicable
> output causes confusion / questions at best).

The xenpm output only shows relevant information:

# xenpm get-cpufreq-para 11
cpu id               : 11
affected_cpus        : 11
cpuinfo frequency    : base [1600000] max [4900000]
scaling_driver       : hwp-cpufreq
scaling_avail_gov    : hwp
current_governor     : hwp
hwp variables        :
  hardware limits    : lowest [1] most_efficient [11]
                     : guaranteed [11] highest [49]
  configured limits  : min [1] max [255] energy_perf [128]
                     : activity_window [0 hardware selected]
                     : desired [0 hw autonomous]
turbo mode           : enabled

The scaling_*_freq values, policy->{min,max,cur} are filled in with
base, max and 0 in hwp_get_cpu_speeds(), so it's not totally invalid
values being returned.  The governor registration restricting to only
internal governors when HWP is active means it only has the single
governor.  I think it's okay as-is, but let me know if you want
something changed.

Regards,
Jason
Jan Beulich May 11, 2023, 6:21 a.m. UTC | #4
On 10.05.2023 19:49, Jason Andryuk wrote:
> On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
>>> hardware rather closely.
>>>
>>> We need the features bitmask to indicated fields supported by the actual
>>> hardware.
>>>
>>> The use of uint8_t parameters matches the hardware size.  uint32_t
>>> entries grows the sysctl_t past the build assertion in setup.c.  The
>>> uint8_t ranges are supported across multiple generations, so hopefully
>>> they won't change.
>>
>> Still it feels a little odd for values to be this narrow. Aiui the
>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>> used by HWP. So you could widen the union in struct
>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>> placed scaling_cur_freq could be included as well ...
> 
> The values are narrow, but they match the hardware.  It works for HWP,
> so there is no need to change at this time AFAICT.
> 
> Do you want me to make this change?

Well, much depends on what these 8-bit values actually express (I did
raise this question in one of the replies to your patches, as I wasn't
able to find anything in the SDM). That'll then hopefully allow to
make some educated prediction on how likely it is that a future
variant of hwp would want to widen them. (Was it energy_perf that went
from 4 to 8 bits at some point, which you even comment upon in the
public header?)

> On Mon, May 8, 2023 at 6:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 08.05.2023 12:25, Jan Beulich wrote:
>>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
>>>> hardware rather closely.
>>>>
>>>> We need the features bitmask to indicated fields supported by the actual
>>>> hardware.
>>>>
>>>> The use of uint8_t parameters matches the hardware size.  uint32_t
>>>> entries grows the sysctl_t past the build assertion in setup.c.  The
>>>> uint8_t ranges are supported across multiple generations, so hopefully
>>>> they won't change.
>>>
>>> Still it feels a little odd for values to be this narrow. Aiui the
>>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>>> used by HWP. So you could widen the union in struct
>>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>>> placed scaling_cur_freq could be included as well ...
>>
>> Having seen patch 9 now as well, I wonder whether here (or in a separate
>> patch) you don't want to limit providing inapplicable data (for example
>> not filling *scaling_available_governors would even avoid an allocation,
>> thus removing a possible reason for failure), while there (or again in a
>> separate patch) you'd also limit what the tool reports (inapplicable
>> output causes confusion / questions at best).
> 
> The xenpm output only shows relevant information:
> 
> # xenpm get-cpufreq-para 11
> cpu id               : 11
> affected_cpus        : 11
> cpuinfo frequency    : base [1600000] max [4900000]
> scaling_driver       : hwp-cpufreq
> scaling_avail_gov    : hwp
> current_governor     : hwp
> hwp variables        :
>   hardware limits    : lowest [1] most_efficient [11]
>                      : guaranteed [11] highest [49]
>   configured limits  : min [1] max [255] energy_perf [128]
>                      : activity_window [0 hardware selected]
>                      : desired [0 hw autonomous]
> turbo mode           : enabled
> 
> The scaling_*_freq values, policy->{min,max,cur} are filled in with
> base, max and 0 in hwp_get_cpu_speeds(), so it's not totally invalid
> values being returned.  The governor registration restricting to only
> internal governors when HWP is active means it only has the single
> governor.  I think it's okay as-is, but let me know if you want
> something changed.

Well, the main connection here was to the possible overloading of
space in the sysctl struct, by widening what the union covers. That
can of course only be done for fields which don't convey useful data.
If we go without the further overloading, I guess we can for now leave
the tool as you have it, and deal with possible tidying later on.

Jan
Jason Andryuk May 11, 2023, 1:49 p.m. UTC | #5
On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.05.2023 19:49, Jason Andryuk wrote:
> > On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 01.05.2023 21:30, Jason Andryuk wrote:
> >>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
> >>> hardware rather closely.
> >>>
> >>> We need the features bitmask to indicated fields supported by the actual
> >>> hardware.
> >>>
> >>> The use of uint8_t parameters matches the hardware size.  uint32_t
> >>> entries grows the sysctl_t past the build assertion in setup.c.  The
> >>> uint8_t ranges are supported across multiple generations, so hopefully
> >>> they won't change.
> >>
> >> Still it feels a little odd for values to be this narrow. Aiui the
> >> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> >> used by HWP. So you could widen the union in struct
> >> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> >> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> >> placed scaling_cur_freq could be included as well ...
> >
> > The values are narrow, but they match the hardware.  It works for HWP,
> > so there is no need to change at this time AFAICT.
> >
> > Do you want me to make this change?
>
> Well, much depends on what these 8-bit values actually express (I did
> raise this question in one of the replies to your patches, as I wasn't
> able to find anything in the SDM). That'll then hopefully allow to
> make some educated prediction on on how likely it is that a future
> variant of hwp would want to widen them.

Sorry for not providing a reference earlier.  In the SDM,
HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
second paragraph:
"""
In contrast, HWP is an implementation of the ACPI-defined
Collaborative Processor Performance Control (CPPC), which specifies
that the platform enumerates a continuous, abstract unit-less,
performance value scale that is not tied to a specific performance
state / frequency by definition. While the enumerated scale is roughly
linear in terms of a delivered integer workload performance result,
the OS is required to characterize the performance value range to
comprehend the delivered performance for an applied workload.
"""

The numbers are "continuous, abstract unit-less, performance value."
So there isn't much to go on there, but generally, smaller numbers
mean slower and bigger numbers mean faster.

Cross referencing the ACPI spec here:
https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control

Scrolling down you can find the register entries such as

Highest Performance
Register or DWORD Attribute:  Read
Size:                         8-32 bits

AMD has its own pstate implementation that is similar to HWP.  Looking
at the Linux support, the AMD hardware also uses 8-bit values for the
comparable fields:
https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612

So Intel and AMD are 8-bit for now at least.  Something could go to 32 bits
according to the ACPI spec.

8 bits of granularity for slow to fast seems like plenty to me.  I'm
not sure what one would gain from 16 or 32 bits, but I'm not designing
the hardware.  From the earlier xenpm output, "highest" was 49, so
still a decent amount of room in an 8 bit range.

> (Was it energy_perf that went
> from 4 to 8 bits at some point, which you even comment upon in the
> public header?)

energy_perf (Energy_Performance_Preference) had a fallback: "If
CPUID.06H:EAX[bit 10] indicates that this field is not supported, HWP
uses the value of the IA32_ENERGY_PERF_BIAS MSR to determine the
energy efficiency / performance preference."  So it had a different
range, but that was because it was being put into an older register.

However, I've removed that fallback code in v4.

Regards,
Jason
Jan Beulich May 11, 2023, 2:10 p.m. UTC | #6
On 11.05.2023 15:49, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 10.05.2023 19:49, Jason Andryuk wrote:
>>> On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>>>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
>>>>> hardware rather closely.
>>>>>
>>>>> We need the features bitmask to indicated fields supported by the actual
>>>>> hardware.
>>>>>
>>>>> The use of uint8_t parameters matches the hardware size.  uint32_t
>>>>> entries grows the sysctl_t past the build assertion in setup.c.  The
>>>>> uint8_t ranges are supported across multiple generations, so hopefully
>>>>> they won't change.
>>>>
>>>> Still it feels a little odd for values to be this narrow. Aiui the
>>>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>>>> used by HWP. So you could widen the union in struct
>>>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>>>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>>>> placed scaling_cur_freq could be included as well ...
>>>
>>> The values are narrow, but they match the hardware.  It works for HWP,
>>> so there is no need to change at this time AFAICT.
>>>
>>> Do you want me to make this change?
>>
>> Well, much depends on what these 8-bit values actually express (I did
>> raise this question in one of the replies to your patches, as I wasn't
>> able to find anything in the SDM). That'll then hopefully allow to
>> make some educated prediction on on how likely it is that a future
>> variant of hwp would want to widen them.
> 
> Sorry for not providing a reference earlier.  In the SDM,
> HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
> second paragraph:
> """
> In contrast, HWP is an implementation of the ACPI-defined
> Collaborative Processor Performance Control (CPPC), which specifies
> that the platform enumerates a continuous, abstract unit-less,
> performance value scale that is not tied to a specific performance
> state / frequency by definition. While the enumerated scale is roughly
> linear in terms of a delivered integer workload performance result,
> the OS is required to characterize the performance value range to
> comprehend the delivered performance for an applied workload.
> """
> 
> The numbers are "continuous, abstract unit-less, performance value."
> So there isn't much to go on there, but generally, smaller numbers
> mean slower and bigger numbers mean faster.
> 
> Cross referencing the ACPI spec here:
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control
> 
> Scrolling down you can find the register entries such as
> 
> Highest Performance
> Register or DWORD Attribute:  Read
> Size:                         8-32 bits
> 
> AMD has its own pstate implementation that is similar to HWP.  Looking
> at the Linux support, the AMD hardware also use 8 bit values for the
> comparable fields:
> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612
> 
> So Intel and AMD are 8bit for now at least.  Something could do 32bits
> according to the ACPI spec.
> 
> 8 bits of granularity for slow to fast seems like plenty to me.  I'm
> not sure what one would gain from 16 or 32 bits, but I'm not designing
> the hardware.  From the earlier xenpm output, "highest" was 49, so
> still a decent amount of room in an 8 bit range.

Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
okay with you keeping things as you have them. If and when needed we can
still rework the structure - it is possible to change it as it's (for
the time being at least) still an unstable interface.

Jan
Jason Andryuk May 11, 2023, 8:22 p.m. UTC | #7
On Thu, May 11, 2023 at 10:10 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 11.05.2023 15:49, Jason Andryuk wrote:
> > On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 10.05.2023 19:49, Jason Andryuk wrote:
> >>> On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 01.05.2023 21:30, Jason Andryuk wrote:
> >>>>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
> >>>>> hardware rather closely.
> >>>>>
> >>>>> We need the features bitmask to indicated fields supported by the actual
> >>>>> hardware.
> >>>>>
> >>>>> The use of uint8_t parameters matches the hardware size.  uint32_t
> >>>>> entries grows the sysctl_t past the build assertion in setup.c.  The
> >>>>> uint8_t ranges are supported across multiple generations, so hopefully
> >>>>> they won't change.
> >>>>
> >>>> Still it feels a little odd for values to be this narrow. Aiui the
> >>>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> >>>> used by HWP. So you could widen the union in struct
> >>>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> >>>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> >>>> placed scaling_cur_freq could be included as well ...
> >>>
> >>> The values are narrow, but they match the hardware.  It works for HWP,
> >>> so there is no need to change at this time AFAICT.
> >>>
> >>> Do you want me to make this change?
> >>
> >> Well, much depends on what these 8-bit values actually express (I did
> >> raise this question in one of the replies to your patches, as I wasn't
> >> able to find anything in the SDM). That'll then hopefully allow to
> >> make some educated prediction on on how likely it is that a future
> >> variant of hwp would want to widen them.
> >
> > Sorry for not providing a reference earlier.  In the SDM,
> > HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
> > second paragraph:
> > """
> > In contrast, HWP is an implementation of the ACPI-defined
> > Collaborative Processor Performance Control (CPPC), which specifies
> > that the platform enumerates a continuous, abstract unit-less,
> > performance value scale that is not tied to a specific performance
> > state / frequency by definition. While the enumerated scale is roughly
> > linear in terms of a delivered integer workload performance result,
> > the OS is required to characterize the performance value range to
> > comprehend the delivered performance for an applied workload.
> > """
> >
> > The numbers are "continuous, abstract unit-less, performance value."
> > So there isn't much to go on there, but generally, smaller numbers
> > mean slower and bigger numbers mean faster.
> >
> > Cross referencing the ACPI spec here:
> > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control
> >
> > Scrolling down you can find the register entries such as
> >
> > Highest Performance
> > Register or DWORD Attribute:  Read
> > Size:                         8-32 bits
> >
> > AMD has its own pstate implementation that is similar to HWP.  Looking
> > at the Linux support, the AMD hardware also use 8 bit values for the
> > comparable fields:
> > https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612
> >
> > So Intel and AMD are 8bit for now at least.  Something could do 32bits
> > according to the ACPI spec.
> >
> > 8 bits of granularity for slow to fast seems like plenty to me.  I'm
> > not sure what one would gain from 16 or 32 bits, but I'm not designing
> > the hardware.  From the earlier xenpm output, "highest" was 49, so
> > still a decent amount of room in an 8 bit range.
>
> Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
> okay with you keeping things as you have them. If and when needed we can
> still rework the structure - it is possible to change it as it's (for
> the time being at least) still an unstable interface.

With an anonymous union and anonymous struct, struct
xen_get_cpufreq_para can be re-arranged and compile without any
changes to other cpufreq code.  struct xen_hwp_para becomes 10
uint32_t's.  The old scaling is 3 * uint32_t + 16 bytes
CPUFREQ_NAME_LEN + 4 * uint32_t for xen_ondemand = 11 uint32_t.  So
int32_t turbo_enabled doesn't move and it's binary compatible.

Anonymous unions and structs aren't allowed in the public header
though, right?  So that would need to change, though it doesn't seem
too bad.  There isn't too much churn.

I have no plans to tackle AMD pstate.  But having glanced at it this
morning, maybe these hwp sysctls should be renamed cppc?  AMD pstate
and HWP are both implementations of CPPC, so that could be more future
proof?  But, again, I only glanced at the AMD stuff, so there may be
other changes needed.

Regards,
Jason
Jan Beulich May 12, 2023, 6:32 a.m. UTC | #8
On 11.05.2023 22:22, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 10:10 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 11.05.2023 15:49, Jason Andryuk wrote:
>>> On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 10.05.2023 19:49, Jason Andryuk wrote:
>>>>> On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>>>>>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
>>>>>>> hardware rather closely.
>>>>>>>
>>>>>>> We need the features bitmask to indicated fields supported by the actual
>>>>>>> hardware.
>>>>>>>
>>>>>>> The use of uint8_t parameters matches the hardware size.  uint32_t
>>>>>>> entries grows the sysctl_t past the build assertion in setup.c.  The
>>>>>>> uint8_t ranges are supported across multiple generations, so hopefully
>>>>>>> they won't change.
>>>>>>
>>>>>> Still it feels a little odd for values to be this narrow. Aiui the
>>>>>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>>>>>> used by HWP. So you could widen the union in struct
>>>>>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>>>>>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>>>>>> placed scaling_cur_freq could be included as well ...
>>>>>
>>>>> The values are narrow, but they match the hardware.  It works for HWP,
>>>>> so there is no need to change at this time AFAICT.
>>>>>
>>>>> Do you want me to make this change?
>>>>
>>>> Well, much depends on what these 8-bit values actually express (I did
>>>> raise this question in one of the replies to your patches, as I wasn't
>>>> able to find anything in the SDM). That'll then hopefully allow to
>>>> make some educated prediction on on how likely it is that a future
>>>> variant of hwp would want to widen them.
>>>
>>> Sorry for not providing a reference earlier.  In the SDM,
>>> HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
>>> second paragraph:
>>> """
>>> In contrast, HWP is an implementation of the ACPI-defined
>>> Collaborative Processor Performance Control (CPPC), which specifies
>>> that the platform enumerates a continuous, abstract unit-less,
>>> performance value scale that is not tied to a specific performance
>>> state / frequency by definition. While the enumerated scale is roughly
>>> linear in terms of a delivered integer workload performance result,
>>> the OS is required to characterize the performance value range to
>>> comprehend the delivered performance for an applied workload.
>>> """
>>>
>>> The numbers are "continuous, abstract unit-less, performance value."
>>> So there isn't much to go on there, but generally, smaller numbers
>>> mean slower and bigger numbers mean faster.
>>>
>>> Cross referencing the ACPI spec here:
>>> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control
>>>
>>> Scrolling down you can find the register entries such as
>>>
>>> Highest Performance
>>> Register or DWORD Attribute:  Read
>>> Size:                         8-32 bits
>>>
>>> AMD has its own pstate implementation that is similar to HWP.  Looking
>>> at the Linux support, the AMD hardware also uses 8-bit values for the
>>> comparable fields:
>>> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612
>>>
>>> So Intel and AMD are 8-bit for now at least.  An implementation could
>>> use 32 bits according to the ACPI spec.
>>>
>>> 8 bits of granularity for slow to fast seems like plenty to me.  I'm
>>> not sure what one would gain from 16 or 32 bits, but I'm not designing
>>> the hardware.  From the earlier xenpm output, "highest" was 49, so
>>> still a decent amount of room in an 8 bit range.
>>
>> Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
>> okay with you keeping things as you have them. If and when needed we can
>> still rework the structure - it is possible to change it as it's (for
>> the time being at least) still an unstable interface.
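
[A minimal sketch of the byte packing discussed above, assuming the
IA32_HWP_CAPABILITIES (MSR 0x771) layout from the SDM; these helpers are
illustrative only and not part of the patch:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: IA32_HWP_CAPABILITIES packs each performance level into a
 * single byte, per the SDM -- which is why uint8_t suffices for the
 * corresponding xen_hwp_para fields.
 */
static inline uint8_t hwp_cap_highest(uint64_t caps)        { return caps & 0xff; }
static inline uint8_t hwp_cap_guaranteed(uint64_t caps)     { return (caps >> 8) & 0xff; }
static inline uint8_t hwp_cap_most_efficient(uint64_t caps) { return (caps >> 16) & 0xff; }
static inline uint8_t hwp_cap_lowest(uint64_t caps)         { return (caps >> 24) & 0xff; }
```

[With a raw value of 0x0c101831, hwp_cap_highest() yields 49 -- matching the
"highest was 49" xenpm output mentioned earlier in the thread.]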
> 
> With an anonymous union and anonymous struct, struct
> xen_get_cpufreq_para can be re-arranged and compiled without any
> changes to other cpufreq code.  struct xen_hwp_para becomes 10
> uint32_t's.  The old scaling block is 3 * uint32_t + 16 bytes of
> CPUFREQ_NAME_LEN + 4 * uint32_t for xen_ondemand = 11 uint32_t, so
> int32_t turbo_enabled doesn't move and the layout stays binary
> compatible.
> 
> Anonymous unions and structs aren't allowed in the public header
> though, right?

Correct.

>  So that would need to change, though it doesn't seem
> too bad.  There isn't too much churn.
> 
> I have no plans to tackle AMD pstate.  But having glanced at it this
> morning, maybe these hwp sysctls should be renamed cppc?  AMD pstate
> and HWP are both implementations of CPPC, so that could be more future
> proof?  But, again, I only glanced at the AMD stuff, so there may be
> other changes needed.

I consider this naming change plan plausible. If further adjustments
end up necessary for AMD, that'll be no worse (but maybe better) than
if we have to go from HWP to a more general name altogether.

Jan
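
[Sketching what that rearrangement could look like -- a hedged, heavily
trimmed illustration with hypothetical type and member names; the real
struct xen_get_cpufreq_para has additional fields, and only the overlay
plus the size arithmetic from the discussion are shown:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUFREQ_NAME_LEN 16

/* Minimal stand-ins for the existing public structs. */
struct xen_userspace { uint32_t scaling_setspeed; };
struct xen_ondemand {
    uint32_t sampling_rate_max;
    uint32_t sampling_rate_min;
    uint32_t sampling_rate;
    uint32_t up_threshold;
};

/* The widened variant discussed above: 10 x uint32_t = 40 bytes. */
struct xen_hwp_para_wide {
    uint32_t activity_window;
    uint32_t features;
    uint32_t lowest;
    uint32_t most_efficient;
    uint32_t guaranteed;
    uint32_t highest;
    uint32_t minimum;
    uint32_t maximum;
    uint32_t desired;
    uint32_t energy_perf;
};

/*
 * Named union/struct (anonymous ones aren't allowed in the public
 * headers).  The legacy scaling block is 11 uint32_t (3 scaling freqs +
 * 4 for the governor name + 4 for xen_ondemand), so overlaying the
 * 10-uint32_t HWP block leaves turbo_enabled at the same offset.
 */
struct xen_get_cpufreq_para_sketch {
    uint32_t cpuinfo_cur_freq;
    uint32_t cpuinfo_max_freq;
    uint32_t cpuinfo_min_freq;
    union {
        struct {
            char scaling_governor[CPUFREQ_NAME_LEN];
            uint32_t scaling_max_freq;
            uint32_t scaling_min_freq;
            uint32_t scaling_cur_freq;
            union {
                struct xen_userspace userspace;
                struct xen_ondemand ondemand;
            } u;
        } s;
        struct xen_hwp_para_wide hwp_para;
    } u;
    int32_t turbo_enabled;
};
```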

Patch

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index f84abe1386..cb52918799 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -506,6 +506,31 @@  static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
     .update = hwp_cpufreq_update,
 };
 
+int get_hwp_para(const struct cpufreq_policy *policy,
+                 struct xen_hwp_para *hwp_para)
+{
+    unsigned int cpu = policy->cpu;
+    const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    hwp_para->features        =
+        (feature_hwp_activity_window ? XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  : 0) |
+        (feature_hwp_energy_perf     ? XEN_SYSCTL_HWP_FEAT_ENERGY_PERF : 0);
+    hwp_para->lowest          = data->hw.lowest;
+    hwp_para->most_efficient  = data->hw.most_efficient;
+    hwp_para->guaranteed      = data->hw.guaranteed;
+    hwp_para->highest         = data->hw.highest;
+    hwp_para->minimum         = data->minimum;
+    hwp_para->maximum         = data->maximum;
+    hwp_para->energy_perf     = data->energy_perf;
+    hwp_para->activity_window = data->activity_window;
+    hwp_para->desired         = data->desired;
+
+    return 0;
+}
+
 int __init hwp_register_driver(void)
 {
     return cpufreq_register_driver(&hwp_cpufreq_driver);
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 1bae635101..67fd9dabd4 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -290,6 +290,11 @@  static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
             &op->u.get_para.u.ondemand.sampling_rate,
             &op->u.get_para.u.ondemand.up_threshold);
     }
+
+    if ( !strncasecmp(op->u.get_para.scaling_governor, XEN_HWP_GOVERNOR,
+                      CPUFREQ_NAME_LEN) )
+        ret = get_hwp_para(policy, &op->u.get_para.u.hwp_para);
+
     op->u.get_para.turbo_enabled = cpufreq_get_turbo_status(op->cpuid);
 
     return ret;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 29a712a4f1..92b4c7e79c 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -247,5 +247,7 @@  void intel_feature_detect(struct cpufreq_policy *policy);
 
 extern bool opt_cpufreq_hwp;
 extern bool opt_cpufreq_hdc;
+int get_hwp_para(const struct cpufreq_policy *policy,
+                 struct xen_hwp_para *hwp_para);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b448f13b75..bf7e6594a7 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -292,6 +292,31 @@  struct xen_ondemand {
     uint32_t up_threshold;
 };
 
+struct xen_hwp_para {
+    /*
+     * bits 6:0   - 7bit mantissa
+     * bits 9:7   - 3bit base-10 exponent
+     * bits 15:10 - Unused - must be 0
+     */
+#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
+#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
+#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7
+    uint16_t activity_window;
+    /* energy_perf: 0-255 if XEN_SYSCTL_HWP_FEAT_ENERGY_PERF set, else 0-15 */
+#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0)
+    /* activity_window supported if 1 */
+#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1)
+    uint8_t features; /* bit flags for features */
+    uint8_t lowest;
+    uint8_t most_efficient;
+    uint8_t guaranteed;
+    uint8_t highest;
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf;
+};
+
 #define XEN_HWP_GOVERNOR "hwp-internal"
 /*
  * cpufreq para name of this structure named
@@ -324,6 +349,7 @@  struct xen_get_cpufreq_para {
     union {
         struct  xen_userspace userspace;
         struct  xen_ondemand ondemand;
+        struct  xen_hwp_para hwp_para;
     } u;
 
     int32_t turbo_enabled;
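
[To illustrate the activity_window encoding defined by the masks above, a
hedged sketch of a decoder -- not part of the patch; the microsecond unit
follows the SDM's IA32_HWP_REQUEST activity-window definition:]

```c
#include <assert.h>
#include <stdint.h>

#define HWP_ACT_WINDOW_MANTISSA_MASK  0x7f
#define HWP_ACT_WINDOW_EXPONENT_MASK  0x7
#define HWP_ACT_WINDOW_EXPONENT_SHIFT 7

/*
 * Decode the packed activity_window field into microseconds:
 * value = mantissa * 10^exponent.
 */
static uint64_t hwp_act_window_us(uint16_t activity_window)
{
    uint64_t mantissa = activity_window & HWP_ACT_WINDOW_MANTISSA_MASK;
    unsigned int exponent =
        (activity_window >> HWP_ACT_WINDOW_EXPONENT_SHIFT) &
        HWP_ACT_WINDOW_EXPONENT_MASK;
    uint64_t scale = 1;

    while ( exponent-- )
        scale *= 10;

    return mantissa * scale;
}
```

[For example, a raw value of (1 << 7) | 50 decodes to a mantissa of 50 with
a base-10 exponent of 1, i.e. 500 microseconds.]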