diff mbox series

[v7,1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry

Message ID 20240913132944.1880703-2-beata.michalska@arm.com (mailing list archive)
State Changes Requested, archived
Headers show
Series Add support for AArch64 AMUv1-based average freq | expand

Commit Message

Beata Michalska Sept. 13, 2024, 1:29 p.m. UTC
Currently the CPUFreq core exposes two sysfs attributes that can be used
to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
and scaling_cur_freq. Both provide slightly different view on the
subject and they do come with their own drawbacks.

cpuinfo_cur_freq provides higher precision though at a cost of being
rather expensive. Moreover, the information retrieved via this attribute
is somewhat short lived as frequency can change at any point of time
making it difficult to reason from.

scaling_cur_freq, on the other hand, tends to be less accurate but then
the actual level of precision (and source of information) varies between
architectures making it a bit ambiguous.

The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
distinct interface, exposing an average frequency of a given CPU(s), as
reported by the hardware, over a time frame spanning no more than a few
milliseconds. As it requires appropriate hardware support, this
interface is optional.

Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
 Documentation/admin-guide/pm/cpufreq.rst | 10 ++++++++
 drivers/cpufreq/cpufreq.c                | 31 ++++++++++++++++++++++++
 include/linux/cpufreq.h                  |  1 +
 3 files changed, 42 insertions(+)

Comments

Jie Zhan Sept. 25, 2024, 8:58 a.m. UTC | #1
Hi Beata,

Great thanks for the update.

On 13/09/2024 21:29, Beata Michalska wrote:
> Currently the CPUFreq core exposes two sysfs attributes that can be used
> to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> and scaling_cur_freq. Both provide slightly different view on the
> subject and they do come with their own drawbacks.
> 
> cpuinfo_cur_freq provides higher precision though at a cost of being
> rather expensive. Moreover, the information retrieved via this attribute
> is somewhat short lived as frequency can change at any point of time
> making it difficult to reason from.
> 
> scaling_cur_freq, on the other hand, tends to be less accurate but then
> the actual level of precision (and source of information) varies between
> architectures making it a bit ambiguous.
> 
> The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> distinct interface, exposing an average frequency of a given CPU(s), as
> reported by the hardware, over a time frame spanning no more than a few
> milliseconds. As it requires appropriate hardware support, this
> interface is optional.
> 
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
>  Documentation/admin-guide/pm/cpufreq.rst | 10 ++++++++
>  drivers/cpufreq/cpufreq.c                | 31 ++++++++++++++++++++++++
>  include/linux/cpufreq.h                  |  1 +
>  3 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index fe1be4ad88cb..2204d6132c05 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -248,6 +248,16 @@ are the following:
>  	If that frequency cannot be determined, this attribute should not
>  	be present.
>  
> +``cpuinfo_avg_freq``
> +        An average frequency (in KHz) of all CPUs belonging to a given policy,
> +        derived from a hardware provided feedback and reported on a time frame
> +        spanning at most few milliseconds.

I don't think it's necessary to put the 'at most few milliseconds'
limitation on.

It's supposed to be fine for other platforms to implement the interface
with a longer time period, e.g. a few seconds, in the future.  Otherwise,
this would probably force the implementation of 'cpuinfo_avg_freq' to be
binded with the 'scale freq tick' stuff.

> +
> +        This is expected to be based on the frequency the hardware actually runs
> +        at and, as such, might require specialised hardware support (such as AMU
> +        extension on ARM). If one cannot be determined, this attribute should
> +        not be present.
> +
>  ``cpuinfo_max_freq``
>  	Maximum possible operating frequency the CPUs belonging to this policy
>  	can run at (in kHz).

...

> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index d4d2f4d1d7cb..48262073707e 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1195,6 +1195,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
>  #endif
>  
>  extern unsigned int arch_freq_get_on_cpu(int cpu);
> +extern int arch_freq_avg_get_on_cpu(int cpu);

It's werid to have two different functions with mostly the same behaviour,
i.e. arch_freq_get_on_cpu() and arch_freq_avg_get_on_cpu().

Appreciated that there would be some capatibility work with x86 at the
moment if merging them, e.g. return type, default implementation, impact on
some userspace tools, etc.

Anyhow, are they supposed to be merged in the near future?


Thanks,
Jie
>  
>  #ifndef arch_set_freq_scale
>  static __always_inline
Beata Michalska Sept. 26, 2024, 10:42 a.m. UTC | #2
On Wed, Sep 25, 2024 at 04:58:36PM +0800, Jie Zhan wrote:
> Hi Beata,
Hi Jie
> 
> Great thanks for the update.
> 
> On 13/09/2024 21:29, Beata Michalska wrote:
> > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > and scaling_cur_freq. Both provide slightly different view on the
> > subject and they do come with their own drawbacks.
> > 
> > cpuinfo_cur_freq provides higher precision though at a cost of being
> > rather expensive. Moreover, the information retrieved via this attribute
> > is somewhat short lived as frequency can change at any point of time
> > making it difficult to reason from.
> > 
> > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > the actual level of precision (and source of information) varies between
> > architectures making it a bit ambiguous.
> > 
> > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > distinct interface, exposing an average frequency of a given CPU(s), as
> > reported by the hardware, over a time frame spanning no more than a few
> > milliseconds. As it requires appropriate hardware support, this
> > interface is optional.
> > 
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> >  Documentation/admin-guide/pm/cpufreq.rst | 10 ++++++++
> >  drivers/cpufreq/cpufreq.c                | 31 ++++++++++++++++++++++++
> >  include/linux/cpufreq.h                  |  1 +
> >  3 files changed, 42 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> > index fe1be4ad88cb..2204d6132c05 100644
> > --- a/Documentation/admin-guide/pm/cpufreq.rst
> > +++ b/Documentation/admin-guide/pm/cpufreq.rst
> > @@ -248,6 +248,16 @@ are the following:
> >  	If that frequency cannot be determined, this attribute should not
> >  	be present.
> >  
> > +``cpuinfo_avg_freq``
> > +        An average frequency (in KHz) of all CPUs belonging to a given policy,
> > +        derived from a hardware provided feedback and reported on a time frame
> > +        spanning at most few milliseconds.
> 
> I don't think it's necessary to put the 'at most few milliseconds'
> limitation on.
> 
> It's supposed to be fine for other platforms to implement the interface
> with a longer time period, e.g. a few seconds, in the future.  Otherwise,
> this would probably force the implementation of 'cpuinfo_avg_freq' to be
> binded with the 'scale freq tick' stuff.
Actually the sched_tick was intentionally omitted from the description
to avoid associating one with another.
Not really sure how useful it would be to have a longer time-frames for the
average frequency though.
It is still intended to be rather accurate - thus the 'at most few
milliseconds' statement. Extending that period reduces the accuracy.
If we allow that - meaning getting average frequency over different time-frame
spans , we introduce yet again platform specific behaviour for common interface,
which might not be that desired.

> 
> > +
> > +        This is expected to be based on the frequency the hardware actually runs
> > +        at and, as such, might require specialised hardware support (such as AMU
> > +        extension on ARM). If one cannot be determined, this attribute should
> > +        not be present.
> > +
> >  ``cpuinfo_max_freq``
> >  	Maximum possible operating frequency the CPUs belonging to this policy
> >  	can run at (in kHz).
> 
> ...
> 
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index d4d2f4d1d7cb..48262073707e 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -1195,6 +1195,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> >  #endif
> >  
> >  extern unsigned int arch_freq_get_on_cpu(int cpu);
> > +extern int arch_freq_avg_get_on_cpu(int cpu);
> 
> It's werid to have two different functions with mostly the same behaviour,
> i.e. arch_freq_get_on_cpu() and arch_freq_avg_get_on_cpu().
> 
> Appreciated that there would be some capatibility work with x86 at the
> moment if merging them, e.g. return type, default implementation, impact on
> some userspace tools, etc.
The intention here was indeed to have a clean distinction between the two.
> 
> Anyhow, are they supposed to be merged in the near future?
That depends on any further comments on that new sysfs attribute I guess.

---
Thanks
Beata
> 
> 
> Thanks,
> Jie
> >  
> >  #ifndef arch_set_freq_scale
> >  static __always_inline
Viresh Kumar Oct. 29, 2024, 7:04 a.m. UTC | #3
Apologies for the delay from my side. September was mostly holidays
for me and then I was stuck with other stuff plus email backlog and
this series was always a painful point to return to :(

On 13-09-24, 14:29, Beata Michalska wrote:
> Currently the CPUFreq core exposes two sysfs attributes that can be used
> to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> and scaling_cur_freq. Both provide slightly different view on the
> subject and they do come with their own drawbacks.
> 
> cpuinfo_cur_freq provides higher precision though at a cost of being
> rather expensive. Moreover, the information retrieved via this attribute
> is somewhat short lived as frequency can change at any point of time
> making it difficult to reason from.
> 
> scaling_cur_freq, on the other hand, tends to be less accurate but then
> the actual level of precision (and source of information) varies between
> architectures making it a bit ambiguous.
> 
> The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> distinct interface, exposing an average frequency of a given CPU(s), as
> reported by the hardware, over a time frame spanning no more than a few
> milliseconds. As it requires appropriate hardware support, this
> interface is optional.

From what I recall, the plan is to:
- keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
  for ARM.

- introduce cpuinfo_avg_freq() and make it return frequency from hw
  counters for both ARM and Intel and others who provide the API.

- update scaling_cur_freq() to only return the requested frequency or
  error in case of X86 and update documentation to reflect the same.
  Right now or after some time ? How much time ?

  Rafael ?

> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 04fc786dd2c0..3493e5a9500d 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
>  	return 0;
>  }
>  
> +__weak int arch_freq_avg_get_on_cpu(int cpu)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> +{
> +	return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> +}

And why aren't we simply reusing arch_freq_get_on_cpu() here ?
Rafael J. Wysocki Oct. 29, 2024, 11:31 a.m. UTC | #4
On Tue, Oct 29, 2024 at 8:04 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> Apologies for the delay from my side. September was mostly holidays
> for me and then I was stuck with other stuff plus email backlog and
> this series was always a painful point to return to :(
>
> On 13-09-24, 14:29, Beata Michalska wrote:
> > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > and scaling_cur_freq. Both provide slightly different view on the
> > subject and they do come with their own drawbacks.
> >
> > cpuinfo_cur_freq provides higher precision though at a cost of being
> > rather expensive. Moreover, the information retrieved via this attribute
> > is somewhat short lived as frequency can change at any point of time
> > making it difficult to reason from.
> >
> > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > the actual level of precision (and source of information) varies between
> > architectures making it a bit ambiguous.
> >
> > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > distinct interface, exposing an average frequency of a given CPU(s), as
> > reported by the hardware, over a time frame spanning no more than a few
> > milliseconds. As it requires appropriate hardware support, this
> > interface is optional.
>
> From what I recall, the plan is to:
> - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
>   for ARM.

Yes.

> - introduce cpuinfo_avg_freq() and make it return frequency from hw
>   counters for both ARM and Intel and others who provide the API.

Yes.

> - update scaling_cur_freq() to only return the requested frequency or
>   error in case of X86

Yes.

Preferably, -ENOTSUPP for "setpolicy" drivers without the .get() callback.

>   and update documentation to reflect the same.
>   Right now or after some time ? How much time ?

After some time, I think at least two cycles, so people have the time
to switch over, but much more may be necessary if someone is stuck
with RHEL or similar user space.

Anyway, x86 will be the only one affected and there may be a Kconfig
option even to allow it to be changed at the kernel build time.

The documentation for cpuinfo_avg_freq() needs to be added along with it.

> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 04fc786dd2c0..3493e5a9500d 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> >       return 0;
> >  }
> >
> > +__weak int arch_freq_avg_get_on_cpu(int cpu)
> > +{
> > +     return -EOPNOTSUPP;
> > +}
> > +
> > +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> > +{
> > +     return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> > +}
>
> And why aren't we simply reusing arch_freq_get_on_cpu() here ?
>
> --
> viresh
Beata Michalska Nov. 4, 2024, 8 a.m. UTC | #5
On Tue, Oct 29, 2024 at 12:34:29PM +0530, Viresh Kumar wrote:
> Apologies for the delay from my side. September was mostly holidays
> for me and then I was stuck with other stuff plus email backlog and
> this series was always a painful point to return to :(
Thanks for getting back to me on this one!
> 
> On 13-09-24, 14:29, Beata Michalska wrote:
> > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > and scaling_cur_freq. Both provide slightly different view on the
> > subject and they do come with their own drawbacks.
> > 
> > cpuinfo_cur_freq provides higher precision though at a cost of being
> > rather expensive. Moreover, the information retrieved via this attribute
> > is somewhat short lived as frequency can change at any point of time
> > making it difficult to reason from.
> > 
> > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > the actual level of precision (and source of information) varies between
> > architectures making it a bit ambiguous.
> > 
> > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > distinct interface, exposing an average frequency of a given CPU(s), as
> > reported by the hardware, over a time frame spanning no more than a few
> > milliseconds. As it requires appropriate hardware support, this
> > interface is optional.
> 
> From what I recall, the plan is to:
> - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
>   for ARM.
> 
> - introduce cpuinfo_avg_freq() and make it return frequency from hw
>   counters for both ARM and Intel and others who provide the API.
> 
> - update scaling_cur_freq() to only return the requested frequency or
>   error in case of X86 and update documentation to reflect the same.
>   Right now or after some time ? How much time ?
> 
>   Rafael ?
>
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 04fc786dd2c0..3493e5a9500d 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> >  	return 0;
> >  }
> >  
> > +__weak int arch_freq_avg_get_on_cpu(int cpu)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> > +{
> > +	return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> > +}
> 
> And why aren't we simply reusing arch_freq_get_on_cpu() here ?
We need a way to discover whether the new sysfs attrib is to be enabled or not.
I guess I could change the signature for arch_freq_get_on_cpu to return an error
if that functionality is not supported for given policy ?

---
Best Regards
Beata
> 
> -- 
> viresh
Beata Michalska Nov. 4, 2024, 8:01 a.m. UTC | #6
On Tue, Oct 29, 2024 at 12:31:11PM +0100, Rafael J. Wysocki wrote:
> On Tue, Oct 29, 2024 at 8:04 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > Apologies for the delay from my side. September was mostly holidays
> > for me and then I was stuck with other stuff plus email backlog and
> > this series was always a painful point to return to :(
> >
> > On 13-09-24, 14:29, Beata Michalska wrote:
> > > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > > and scaling_cur_freq. Both provide slightly different view on the
> > > subject and they do come with their own drawbacks.
> > >
> > > cpuinfo_cur_freq provides higher precision though at a cost of being
> > > rather expensive. Moreover, the information retrieved via this attribute
> > > is somewhat short lived as frequency can change at any point of time
> > > making it difficult to reason from.
> > >
> > > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > > the actual level of precision (and source of information) varies between
> > > architectures making it a bit ambiguous.
> > >
> > > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > > distinct interface, exposing an average frequency of a given CPU(s), as
> > > reported by the hardware, over a time frame spanning no more than a few
> > > milliseconds. As it requires appropriate hardware support, this
> > > interface is optional.
> >
> > From what I recall, the plan is to:
> > - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
> >   for ARM.
> 
> Yes.
That one indeed remains unchanged.
> 
> > - introduce cpuinfo_avg_freq() and make it return frequency from hw
> >   counters for both ARM and Intel and others who provide the API.
> 
> Yes.
Will add changes for Intel as well.
> 
> > - update scaling_cur_freq() to only return the requested frequency or
> >   error in case of X86
> 
> Yes.
> 
> Preferably, -ENOTSUPP for "setpolicy" drivers without the .get() callback.
Right, my impression was that we want to leave that one as is.
Will add appropriate changes.
> 
> >   and update documentation to reflect the same.
> >   Right now or after some time ? How much time ?
> 
> After some time, I think at least two cycles, so people have the time
> to switch over, but much more may be necessary if someone is stuck
> with RHEL or similar user space.
> 
> Anyway, x86 will be the only one affected and there may be a Kconfig
> option even to allow it to be changed at the kernel build time.
> 
So just for my clarification we want a config switch to control what
scaling_cur_freq is to actually provide. It will keep the current behaviour as
default until we are ready to flip it and ultimately drop that temporary config
option ?
> The documentation for cpuinfo_avg_freq() needs to be added along with it.
That one is already provided unless you have smth else on mind ?
Like updating scaling_cur_freq to reference the new sysfs attribute ?

---
Best Regards
Beata
> 
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 04fc786dd2c0..3493e5a9500d 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> > >       return 0;
> > >  }
> > >
> > > +__weak int arch_freq_avg_get_on_cpu(int cpu)
> > > +{
> > > +     return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> > > +{
> > > +     return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> > > +}
> >
> > And why aren't we simply reusing arch_freq_get_on_cpu() here ?
> >
> > --
> > viresh
Rafael J. Wysocki Nov. 4, 2024, 1:26 p.m. UTC | #7
On Mon, Nov 4, 2024 at 9:01 AM Beata Michalska <beata.michalska@arm.com> wrote:
>
> On Tue, Oct 29, 2024 at 12:31:11PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Oct 29, 2024 at 8:04 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > Apologies for the delay from my side. September was mostly holidays
> > > for me and then I was stuck with other stuff plus email backlog and
> > > this series was always a painful point to return to :(
> > >
> > > On 13-09-24, 14:29, Beata Michalska wrote:
> > > > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > > > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > > > and scaling_cur_freq. Both provide slightly different view on the
> > > > subject and they do come with their own drawbacks.
> > > >
> > > > cpuinfo_cur_freq provides higher precision though at a cost of being
> > > > rather expensive. Moreover, the information retrieved via this attribute
> > > > is somewhat short lived as frequency can change at any point of time
> > > > making it difficult to reason from.
> > > >
> > > > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > > > the actual level of precision (and source of information) varies between
> > > > architectures making it a bit ambiguous.
> > > >
> > > > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > > > distinct interface, exposing an average frequency of a given CPU(s), as
> > > > reported by the hardware, over a time frame spanning no more than a few
> > > > milliseconds. As it requires appropriate hardware support, this
> > > > interface is optional.
> > >
> > > From what I recall, the plan is to:
> > > - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
> > >   for ARM.
> >
> > Yes.
> That one indeed remains unchanged.
> >
> > > - introduce cpuinfo_avg_freq() and make it return frequency from hw
> > >   counters for both ARM and Intel and others who provide the API.
> >
> > Yes.
> Will add changes for Intel as well.
> >
> > > - update scaling_cur_freq() to only return the requested frequency or
> > >   error in case of X86
> >
> > Yes.
> >
> > Preferably, -ENOTSUPP for "setpolicy" drivers without the .get() callback.
> Right, my impression was that we want to leave that one as is.

For now, yes please.

> Will add appropriate changes.

In the future.

> > >   and update documentation to reflect the same.
> > >   Right now or after some time ? How much time ?
> >
> > After some time, I think at least two cycles, so people have the time
> > to switch over, but much more may be necessary if someone is stuck
> > with RHEL or similar user space.
> >
> > Anyway, x86 will be the only one affected and there may be a Kconfig
> > option even to allow it to be changed at the kernel build time.
> >
> So just for my clarification we want a config switch to control what
> scaling_cur_freq is to actually provide.

Something like:

    if (IS_ENABLED(CONFIG_CPUFREQ_CUR_FREQ_FROM_ARCH))) {
        freq = arch_freq_get_on_cpu(policy->cpu);
        if (freq)
            return sysfs_emit(buf, "%u\n", freq);
    }

    if (cpufreq_driver->setpolicy) {
        if (cpufreq_driver->get))
            return sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));

        return -ENOTSUPP;
    }

    return sysfs_emit(buf, "%u\n", policy->cur);

And select CONFIG_CPUFREQ_CUR_FREQ_FROM_ARCH on x86 for now.

> It will keep the current behaviour as
> default until we are ready to flip it and ultimately drop that temporary config
> option ?

Yes.

> > The documentation for cpuinfo_avg_freq() needs to be added along with it.
> That one is already provided unless you have smth else on mind ?
> Like updating scaling_cur_freq to reference the new sysfs attribute ?

I haven't checked super-thoroughly, but generally all documentation
needs to be consistent.
Viresh Kumar Nov. 8, 2024, 6:28 a.m. UTC | #8
On Mon, 4 Nov 2024 at 13:30, Beata Michalska <beata.michalska@arm.com> wrote:

> We need a way to discover whether the new sysfs attrib is to be enabled or not.

Ahh, this is too much trouble to make it optional then :)

> I guess I could change the signature for arch_freq_get_on_cpu to return an error
> if that functionality is not supported for given policy ?

That would be more controversial I guess as we may want an unsigned int return
type for frequency.

I think we can do two things here:
- Just enable the sysfs attribute all the time ?
- Make it optional based on some static cpufreq driver flag ?

I would just enable it for everyone I guess.

--
Viresh
diff mbox series

Patch

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index fe1be4ad88cb..2204d6132c05 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -248,6 +248,16 @@  are the following:
 	If that frequency cannot be determined, this attribute should not
 	be present.
 
+``cpuinfo_avg_freq``
+        An average frequency (in KHz) of all CPUs belonging to a given policy,
+        derived from a hardware provided feedback and reported on a time frame
+        spanning at most few milliseconds.
+
+        This is expected to be based on the frequency the hardware actually runs
+        at and, as such, might require specialised hardware support (such as AMU
+        extension on ARM). If one cannot be determined, this attribute should
+        not be present.
+
 ``cpuinfo_max_freq``
 	Maximum possible operating frequency the CPUs belonging to this policy
 	can run at (in kHz).
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 04fc786dd2c0..3493e5a9500d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -752,6 +752,16 @@  __weak unsigned int arch_freq_get_on_cpu(int cpu)
 	return 0;
 }
 
+__weak int arch_freq_avg_get_on_cpu(int cpu)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
+{
+	return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
+}
+
 static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 {
 	ssize_t ret;
@@ -802,6 +812,20 @@  static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
 	return sysfs_emit(buf, "<unknown>\n");
 }
 
+/*
+ * show_cpuinfo_avg_freq - average CPU frequency as detected by hardware
+ */
+static ssize_t show_cpuinfo_avg_freq(struct cpufreq_policy *policy,
+				     char *buf)
+{
+	int avg_freq = arch_freq_avg_get_on_cpu(policy->cpu);
+
+	if (avg_freq > 0)
+		return sysfs_emit(buf, "%u\n", avg_freq);
+
+	return sysfs_emit(buf, "<unknown>\n");
+}
+
 /*
  * show_scaling_governor - show the current policy for the specified CPU
  */
@@ -964,6 +988,7 @@  static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf)
 }
 
 cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
+cpufreq_freq_attr_ro(cpuinfo_avg_freq);
 cpufreq_freq_attr_ro(cpuinfo_min_freq);
 cpufreq_freq_attr_ro(cpuinfo_max_freq);
 cpufreq_freq_attr_ro(cpuinfo_transition_latency);
@@ -1091,6 +1116,12 @@  static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
 			return ret;
 	}
 
+	if (cpufreq_avg_freq_supported(policy)) {
+		ret = sysfs_create_file(&policy->kobj, &cpuinfo_avg_freq.attr);
+		if (ret)
+			return ret;
+	}
+
 	ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
 	if (ret)
 		return ret;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d4d2f4d1d7cb..48262073707e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1195,6 +1195,7 @@  static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
 #endif
 
 extern unsigned int arch_freq_get_on_cpu(int cpu);
+extern int arch_freq_avg_get_on_cpu(int cpu);
 
 #ifndef arch_set_freq_scale
 static __always_inline