Message ID | 20240405133319.859813-5-beata.michalska@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add support for AArch64 AMUv1-based arch_freq_get_on_cpu | expand |
On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote: >Some architectures provide a way to determine an average frequency over >a certain period of time based on available performance monitors (AMU on >ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu >into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to >represent the current frequency of a given CPU, as obtained by the hardware. >This is the type of feedback that counters do provide. > --- snip --- While testing this patch series on AmpereOne system, I simulated CPU frequency throttling when the system is under power or thermal constraints. In this scenario, based on the user guilde, I expect scaling_cur_freq is the frequency the kernel requests from the hardware; cpuinfo_cur_freq is the actual frequency that the hardware is able to run at during the power or thermal constraints. The AmpereOne system I'm testing on has the following configuration: - Max frequency is 3000000 - Support for AMU registers - ACPI CPPC feedback counters use PCC register space - Fedora 39 with 6.7.5 kernel - Fedora 39 with 6.9.0-rc3 + this patch series With 6.7.5 kernel: Core scaling_cur_freq cpuinfo_cur_freq ---- ---------------- ---------------- 0 3000000 2593000 1 3000000 2613000 2 3000000 2625000 3 3000000 2632000 With 6.9.0-rc3 + this patch series: Core scaling_cur_freq cpuinfo_cur_freq ---- ---------------- ---------------- 0 2671875 2671875 1 2589632 2589632 2 2648437 2648437 3 2698242 2698242 In the second case we can't identify that the CPU frequency is being throttled by the hardware. I noticed this behavior with or without this patch. Thanks, Vanshi
On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote: > On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote: > > Some architectures provide a way to determine an average frequency over > > a certain period of time based on available performance monitors (AMU on > > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu > > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to > > represent the current frequency of a given CPU, as obtained by the hardware. > > This is the type of feedback that counters do provide. > > > > --- snip --- > > While testing this patch series on AmpereOne system, I simulated CPU > frequency throttling when the system is under power or thermal > constraints. > > In this scenario, based on the user guilde, I expect scaling_cur_freq > is the frequency the kernel requests from the hardware; cpuinfo_cur_freq > is the actual frequency that the hardware is able to run at during the > power or thermal constraints. There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1]. The guidelines you are referring here (assuming you mean [2]) are kinda out-of-sync already as scaling_cur_freq has been wired earlier to use arch specific feedback. As there was no Arm dedicated implementation of arch_freq_get_on_cpu, this went kinda unnoticed. The conclusion of the above mentioned discussion (though rather unstated explicitly) was to keep the current behaviour of scaling_cur_freq and align both across different archs: so with the patches, both attributes will provide hw feedback on current frequency, when available. Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use the counters really. That change was also requested through [3] Adding @Viresh in case there was any shift in the tides .... > > The AmpereOne system I'm testing on has the following configuration: > - Max frequency is 3000000 > - Support for AMU registers > - ACPI CPPC feedback counters use PCC register space > - Fedora 39 with 6.7.5 kernel > - Fedora 39 with 6.9.0-rc3 + this patch series > > With 6.7.5 kernel: > Core scaling_cur_freq cpuinfo_cur_freq > ---- ---------------- ---------------- > 0 3000000 2593000 > 1 3000000 2613000 > 2 3000000 2625000 > 3 3000000 2632000 > So if I got it right from the info you have provided the numbers above are obtained without applying the patches. In that case, scaling_cur_freq will use policy->cur (in your case) showing last frequency set, not necessarily the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters. > With 6.9.0-rc3 + this patch series: > Core scaling_cur_freq cpuinfo_cur_freq > ---- ---------------- ---------------- > 0 2671875 2671875 > 1 2589632 2589632 > 2 2648437 2648437 > 3 2698242 2698242 > With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU counters, or fie scale factor obtained based on AMU counters to be more precise: both should now show similar/same frequency (as discussed in [1]) I'd say due to existing implementation for scaling_cur_freq (which we cannot change at this point) this is unavoidable. > In the second case we can't identify that the CPU frequency is > being throttled by the hardware. I noticed this behavior with > or without this patch. > I am not entirely sure comparing the two should be a way to go about throttling (whether w/ or w/o the changes). It would probably be best to refer to thermal sysfs and get a hold of cur_state which should indicate current throttle state: /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state with values above '0' implying ongoing throttling. The appropriate thermal_zone can be identified through 'type' attribute. Thank you for giving those patches a spin. --- BR Beata --- [1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/ [2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197 [3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/ --- > Thanks, > Vanshi
On Tue, Apr 16, 2024 at 05:46:18PM +0200, Beata Michalska wrote: >On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote: >> On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote: >> > Some architectures provide a way to determine an average frequency over >> > a certain period of time based on available performance monitors (AMU on >> > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu >> > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to >> > represent the current frequency of a given CPU, as obtained by the hardware. >> > This is the type of feedback that counters do provide. >> > >> >> --- snip --- >> >> While testing this patch series on AmpereOne system, I simulated CPU >> frequency throttling when the system is under power or thermal >> constraints. >> >> In this scenario, based on the user guilde, I expect scaling_cur_freq >> is the frequency the kernel requests from the hardware; cpuinfo_cur_freq >> is the actual frequency that the hardware is able to run at during the >> power or thermal constraints. >There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1]. >The guidelines you are referring here (assuming you mean [2]) are kinda >out-of-sync already as scaling_cur_freq has been wired earlier to use arch >specific feedback. As there was no Arm dedicated implementation of >arch_freq_get_on_cpu, this went kinda unnoticed. >The conclusion of the above mentioned discussion (though rather unstated >explicitly) was to keep the current behaviour of scaling_cur_freq and align >both across different archs: so with the patches, both attributes will provide >hw feedback on current frequency, when available. >Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use >the counters really. > >That change was also requested through [3] > >Adding @Viresh in case there was any shift in the tides .... > Thank you for the pointer to the discussion in [1]. It makes sense to bring arm64 behavior in line with x86. The question about whether modifying the behavior of scaling_cur_freq was a good idea did not get any response. >> >> The AmpereOne system I'm testing on has the following configuration: >> - Max frequency is 3000000 >> - Support for AMU registers >> - ACPI CPPC feedback counters use PCC register space >> - Fedora 39 with 6.7.5 kernel >> - Fedora 39 with 6.9.0-rc3 + this patch series >> >> With 6.7.5 kernel: >> Core scaling_cur_freq cpuinfo_cur_freq >> ---- ---------------- ---------------- >> 0 3000000 2593000 >> 1 3000000 2613000 >> 2 3000000 2625000 >> 3 3000000 2632000 >> >So if I got it right from the info you have provided the numbers above are >obtained without applying the patches. In that case, scaling_cur_freq will >use policy->cur (in your case) showing last frequency set, not necessarily >the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters. > > >> With 6.9.0-rc3 + this patch series: >> Core scaling_cur_freq cpuinfo_cur_freq >> ---- ---------------- ---------------- >> 0 2671875 2671875 >> 1 2589632 2589632 >> 2 2648437 2648437 >> 3 2698242 2698242 >> >With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU >counters, or fie scale factor obtained based on AMU counters to be more precise: >both should now show similar/same frequency (as discussed in [1]) >I'd say due to existing implementation for scaling_cur_freq (which we cannot >change at this point) this is unavoidable. > >> In the second case we can't identify that the CPU frequency is >> being throttled by the hardware. I noticed this behavior with >> or without this patch. >> >I am not entirely sure comparing the two should be a way to go about throttling >(whether w/ or w/o the changes). >It would probably be best to refer to thermal sysfs and get a hold of cur_state Throttling could happen due to non-thermal reasons. Or a system may not even support thermal zones. So on those systems we wouldn't be able to identify/debug the behavior of the hardware providing less than maximum performance. The discussion around scaling_cur_freq should probably be re-visited in a targeted manner I think. I'll test v5 of the series on AmpereOne and report back on that thread. Thanks, Vanshi >which should indicate current throttle state: > > /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state > >with values above '0' implying ongoing throttling. > >The appropriate thermal_zone can be identified through 'type' attribute. > > >Thank you for giving those patches a spin. > >--- >BR >Beata >--- >[1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/ >[2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197 >[3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/ >--- > > >> Thanks, >> Vanshi
On Wed, Apr 17, 2024 at 02:38:58PM -0700, Vanshidhar Konda wrote: > On Tue, Apr 16, 2024 at 05:46:18PM +0200, Beata Michalska wrote: > > On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote: > > > On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote: > > > > Some architectures provide a way to determine an average frequency over > > > > a certain period of time based on available performance monitors (AMU on > > > > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu > > > > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to > > > > represent the current frequency of a given CPU, as obtained by the hardware. > > > > This is the type of feedback that counters do provide. > > > > > > > > > > --- snip --- > > > > > > While testing this patch series on AmpereOne system, I simulated CPU > > > frequency throttling when the system is under power or thermal > > > constraints. > > > > > > In this scenario, based on the user guilde, I expect scaling_cur_freq > > > is the frequency the kernel requests from the hardware; cpuinfo_cur_freq > > > is the actual frequency that the hardware is able to run at during the > > > power or thermal constraints. > > There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1]. > > The guidelines you are referring here (assuming you mean [2]) are kinda > > out-of-sync already as scaling_cur_freq has been wired earlier to use arch > > specific feedback. As there was no Arm dedicated implementation of > > arch_freq_get_on_cpu, this went kinda unnoticed. > > The conclusion of the above mentioned discussion (though rather unstated > > explicitly) was to keep the current behaviour of scaling_cur_freq and align > > both across different archs: so with the patches, both attributes will provide > > hw feedback on current frequency, when available. > > Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use > > the counters really. > > > > That change was also requested through [3] > > > > Adding @Viresh in case there was any shift in the tides .... > > > > Thank you for the pointer to the discussion in [1]. It makes sense to > bring arm64 behavior in line with x86. The question about whether > modifying the behavior of scaling_cur_freq was a good idea did not get > any response. > > > > > > > The AmpereOne system I'm testing on has the following configuration: > > > - Max frequency is 3000000 > > > - Support for AMU registers > > > - ACPI CPPC feedback counters use PCC register space > > > - Fedora 39 with 6.7.5 kernel > > > - Fedora 39 with 6.9.0-rc3 + this patch series > > > > > > With 6.7.5 kernel: > > > Core scaling_cur_freq cpuinfo_cur_freq > > > ---- ---------------- ---------------- > > > 0 3000000 2593000 > > > 1 3000000 2613000 > > > 2 3000000 2625000 > > > 3 3000000 2632000 > > > > > So if I got it right from the info you have provided the numbers above are > > obtained without applying the patches. In that case, scaling_cur_freq will > > use policy->cur (in your case) showing last frequency set, not necessarily > > the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters. > > > > > > > With 6.9.0-rc3 + this patch series: > > > Core scaling_cur_freq cpuinfo_cur_freq > > > ---- ---------------- ---------------- > > > 0 2671875 2671875 > > > 1 2589632 2589632 > > > 2 2648437 2648437 > > > 3 2698242 2698242 > > > > > With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU > > counters, or fie scale factor obtained based on AMU counters to be more precise: > > both should now show similar/same frequency (as discussed in [1]) > > I'd say due to existing implementation for scaling_cur_freq (which we cannot > > change at this point) this is unavoidable. > > > > > In the second case we can't identify that the CPU frequency is > > > being throttled by the hardware. I noticed this behavior with > > > or without this patch. > > > > > I am not entirely sure comparing the two should be a way to go about throttling > > (whether w/ or w/o the changes). > > It would probably be best to refer to thermal sysfs and get a hold of cur_state > > Throttling could happen due to non-thermal reasons. Or a system may not > even support thermal zones. So on those systems we wouldn't be able to > identify/debug the behavior of the hardware providing less than maximum > performance. The discussion around scaling_cur_freq should probably be > re-visited in a targeted manner I think. > @Viresh: It seems that we might need to revisit the discussion we've had around scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu. As Vanshi has raised, having both utilizing arch specific feedback for getting current frequency is bit problematic and might be confusing at best. As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not many options we are left with, if we want to kee all archs aligned: we can either try to rework show_scaling_cur_freq and it's use of arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with relevant docs, though that will not work for x86, or we keep it only there and skip updating cpuinfo_cur_freq, going against the guidelines. Other options, purely theoretical, would involve making arch_freq_get_on_cpu aware of type of the info requested (hw vs sw) or adding yet another arch-specific implementation, and those are not really appealing alternatives to say at least. What's your opinion on this one ? --- BR Beata > I'll test v5 of the series on AmpereOne and report back on that thread. > > Thanks, > Vanshi > > > which should indicate current throttle state: > > > > /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state > > > > with values above '0' implying ongoing throttling. > > > > The appropriate thermal_zone can be identified through 'type' attribute. > > > > > > Thank you for giving those patches a spin. > > > > --- > > BR > > Beata > > --- > > [1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/ > > [2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197 > > [3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/ > > --- > > > > > > > Thanks, > > > Vanshi
On 26-04-24, 12:45, Beata Michalska wrote: > It seems that we might need to revisit the discussion we've had around > scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu. > As Vanshi has raised, having both utilizing arch specific feedback for > getting current frequency is bit problematic and might be confusing at best. > As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not > many options we are left with, if we want to kee all archs aligned: > we can either try to rework show_scaling_cur_freq and it's use of > arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with > relevant docs, though that will not work for x86, or we keep it only there and > skip updating cpuinfo_cur_freq, going against the guidelines. Other options, > purely theoretical, would involve making arch_freq_get_on_cpu aware of type of > the info requested (hw vs sw) or adding yet another arch-specific implementation, > and those are not really appealing alternatives to say at least. > What's your opinion on this one ? Hi Beata / Vanshidhar, Lets forget for once what X86 and ARM may have done and think about it once again. I also had a chat with Vincent today about this. The documentation says it clearly, cpuinfo_cur_freq is the one received from hardware and scaling_cur_freq is the one requested from software. Now, I know that X86 has made both of them quite similar and I suggested to make them all aligned (and never received a reply on my previous message). There are few reasons why it may be worth keeping the definition (and behavior) of the sysfs files as is, at least for ARM: - First is that the documentation says so. - There is no point providing the same information via both the interfaces, there are two interfaces here for a reason. - There maybe tools around which depend on the documented behavior. - From userspace, currently there is only one way to know the exact frequency that the cpufreq governors have requested from a platform, i.e. the value from scaling_cur_freq. If we make it similar to cpuinfo_cur_freq, then userspace will never know about the requested frequency and the eventual one and if they are same or different. Lets keep the behavior as is and update only cpuinfo_cur_freq with arch_freq_get_on_cpu(). Makes sense ?
On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote: >On 26-04-24, 12:45, Beata Michalska wrote: >> It seems that we might need to revisit the discussion we've had around >> scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu. >> As Vanshi has raised, having both utilizing arch specific feedback for >> getting current frequency is bit problematic and might be confusing at best. >> As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not >> many options we are left with, if we want to kee all archs aligned: >> we can either try to rework show_scaling_cur_freq and it's use of >> arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with >> relevant docs, though that will not work for x86, or we keep it only there and >> skip updating cpuinfo_cur_freq, going against the guidelines. Other options, >> purely theoretical, would involve making arch_freq_get_on_cpu aware of type of >> the info requested (hw vs sw) or adding yet another arch-specific implementation, >> and those are not really appealing alternatives to say at least. >> What's your opinion on this one ? > >Hi Beata / Vanshidhar, > >Lets forget for once what X86 and ARM may have done and think about it >once again. I also had a chat with Vincent today about this. > >The documentation says it clearly, cpuinfo_cur_freq is the one >received from hardware and scaling_cur_freq is the one requested from >software. > >Now, I know that X86 has made both of them quite similar and I >suggested to make them all aligned (and never received a reply on my >previous message). > >There are few reasons why it may be worth keeping the definition (and >behavior) of the sysfs files as is, at least for ARM: >- First is that the documentation says so. >- There is no point providing the same information via both the > interfaces, there are two interfaces here for a reason. >- There maybe tools around which depend on the documented behavior. >- From userspace, currently there is only one way to know the exact > frequency that the cpufreq governors have requested from a platform, > i.e. the value from scaling_cur_freq. If we make it similar to > cpuinfo_cur_freq, then userspace will never know about the requested > frequency and the eventual one and if they are same or different. > >Lets keep the behavior as is and update only cpuinfo_cur_freq with >arch_freq_get_on_cpu(). > >Makes sense ? I had the same concerns. It probably makes sense explicity note this in the next version of the patch series; in the future readers may be confused by x86 and ARM behave differntly on scaling_cur_freq. Thanks, Vanshi > >-- >viresh
On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote: > On 26-04-24, 12:45, Beata Michalska wrote: > > It seems that we might need to revisit the discussion we've had around > > scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu. > > As Vanshi has raised, having both utilizing arch specific feedback for > > getting current frequency is bit problematic and might be confusing at best. > > As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not > > many options we are left with, if we want to kee all archs aligned: > > we can either try to rework show_scaling_cur_freq and it's use of > > arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with > > relevant docs, though that will not work for x86, or we keep it only there and > > skip updating cpuinfo_cur_freq, going against the guidelines. Other options, > > purely theoretical, would involve making arch_freq_get_on_cpu aware of type of > > the info requested (hw vs sw) or adding yet another arch-specific implementation, > > and those are not really appealing alternatives to say at least. > > What's your opinion on this one ? > > Hi Beata / Vanshidhar, > > Lets forget for once what X86 and ARM may have done and think about it > once again. I also had a chat with Vincent today about this. > > The documentation says it clearly, cpuinfo_cur_freq is the one > received from hardware and scaling_cur_freq is the one requested from > software. > > Now, I know that X86 has made both of them quite similar and I > suggested to make them all aligned (and never received a reply on my > previous message). > > There are few reasons why it may be worth keeping the definition (and > behavior) of the sysfs files as is, at least for ARM: > - First is that the documentation says so. > - There is no point providing the same information via both the > interfaces, there are two interfaces here for a reason. > - There maybe tools around which depend on the documented behavior. > - From userspace, currently there is only one way to know the exact > frequency that the cpufreq governors have requested from a platform, > i.e. the value from scaling_cur_freq. If we make it similar to > cpuinfo_cur_freq, then userspace will never know about the requested > frequency and the eventual one and if they are same or different. > > Lets keep the behavior as is and update only cpuinfo_cur_freq with > arch_freq_get_on_cpu(). > > Makes sense ? > First of all - apologies for late reply. It all makes sense, though to clarify things up, for my own benefit, and to avoid any potential confusion .... Adding arch_freq_get_on_cpu to cpuinfo_cur_freq does seem to be the right approach - no argue on this one. Doing that though means we need a way to skip calling arch_freq_get_on_cpu() from show_scaling_cur_freq(), to avoid having both providing the same information when that should not be the case. In the initial approach [1], that was handled by checking whether the cpufreq driver supports 'get' callback (and thus cpuinfo_cur_freq). In case it didn't, things remained unchanged for scaling_cur_freq. That does not seem to be a viable option though, as there are at least few drivers, that will support both: cpuinfo_cur_freq alongside scaling_cur_freq (+ APERF/MPERF) and for those, we would hit the initial problem of both relying on arch_freq_get_on_cpu. So I guess we need another way of avoiding calling arch_freq_get_on_cpu for show_scaling_cur_freq (and most probably avoid calling that one for cpuinfo_cur_freq). Quick idea on how to not bring arch specificity into cpufreq generic code would be to introduce a new flag for cpufreq drivers though that seems a bit stretched. Will ponder a bit about that but in the meantime suggestions are more than welcomed. Aside: I will most probably send the changes separately from this series to not mix things any further. --- [1] https://lore.kernel.org/all/20230606155754.245998-1-beata.michalska@arm.com/ --- BR Beata > -- > viresh
On Tue, May 07, 2024 at 10:31:52AM +0200, Beata Michalska wrote: > On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote: > > On 26-04-24, 12:45, Beata Michalska wrote: > > > It seems that we might need to revisit the discussion we've had around > > > scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu. > > > As Vanshi has raised, having both utilizing arch specific feedback for > > > getting current frequency is bit problematic and might be confusing at best. > > > As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not > > > many options we are left with, if we want to kee all archs aligned: > > > we can either try to rework show_scaling_cur_freq and it's use of > > > arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with > > > relevant docs, though that will not work for x86, or we keep it only there and > > > skip updating cpuinfo_cur_freq, going against the guidelines. Other options, > > > purely theoretical, would involve making arch_freq_get_on_cpu aware of type of > > > the info requested (hw vs sw) or adding yet another arch-specific implementation, > > > and those are not really appealing alternatives to say at least. > > > What's your opinion on this one ? > > > > Hi Beata / Vanshidhar, > > > > Lets forget for once what X86 and ARM may have done and think about it > > once again. I also had a chat with Vincent today about this. > > > > The documentation says it clearly, cpuinfo_cur_freq is the one > > received from hardware and scaling_cur_freq is the one requested from > > software. > > > > Now, I know that X86 has made both of them quite similar and I > > suggested to make them all aligned (and never received a reply on my > > previous message). > > > > There are few reasons why it may be worth keeping the definition (and > > behavior) of the sysfs files as is, at least for ARM: > > - First is that the documentation says so. > > - There is no point providing the same information via both the > > interfaces, there are two interfaces here for a reason. > > - There maybe tools around which depend on the documented behavior. > > - From userspace, currently there is only one way to know the exact > > frequency that the cpufreq governors have requested from a platform, > > i.e. the value from scaling_cur_freq. If we make it similar to > > cpuinfo_cur_freq, then userspace will never know about the requested > > frequency and the eventual one and if they are same or different. > > > > Lets keep the behavior as is and update only cpuinfo_cur_freq with > > arch_freq_get_on_cpu(). > > > > Makes sense ? > > > First of all - apologies for late reply. > > It all makes sense, though to clarify things up, for my own benefit, and to > avoid any potential confusion .... > > Adding arch_freq_get_on_cpu to cpuinfo_cur_freq does seem to be the right > approach - no argue on this one. Doing that though means we need a way to > skip calling arch_freq_get_on_cpu() from show_scaling_cur_freq(), to avoid > having both providing the same information when that should not be the case. > In the initial approach [1], that was handled by checking whether the cpufreq > driver supports 'get' callback (and thus cpuinfo_cur_freq). In case it didn't, > things remained unchanged for scaling_cur_freq. That does not seem to be a viable > option though, as there are at least few drivers, that will support both: > cpuinfo_cur_freq alongside scaling_cur_freq (+ APERF/MPERF) and for those, > we would hit the initial problem of both relying on arch_freq_get_on_cpu. > So I guess we need another way of avoiding calling arch_freq_get_on_cpu > for show_scaling_cur_freq (and most probably avoid calling that one for > cpuinfo_cur_freq). Quick idea on how to not bring arch specificity into > cpufreq generic code would be to introduce a new flag for cpufreq drivers though > that seems a bit stretched. Will ponder a bit about that but in the meantime > suggestions are more than welcomed. Alternatively we could add a parameter to arch_freq_get_on_cpu specifying type of feedback required: hw vs sw. Then the arch specific implementation could decide which to provide when. It will get slightly counter-intuitive, especially for cases when sw feedback provides hw one, as it is the case for current arch_freq_get_on_cpu() and scaling_cur_freq but at least the changes would be minimal and it will contain handling the tricky bits inside arch specific implementation - hiding those messy bits. --- BR Beata > > Aside: I will most probably send the changes separately from this series to not > mix things any further. > > --- > [1] https://lore.kernel.org/all/20230606155754.245998-1-beata.michalska@arm.com/ > --- > BR > Beata > > > > -- > > viresh
On 07-05-24, 12:02, Beata Michalska wrote: > On Tue, May 07, 2024 at 10:31:52AM +0200, Beata Michalska wrote: > > On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote: > > > Lets forget for once what X86 and ARM may have done and think about it > > > once again. I also had a chat with Vincent today about this. > > > > > > The documentation says it clearly, cpuinfo_cur_freq is the one > > > received from hardware and scaling_cur_freq is the one requested from > > > software. > > > > > > Now, I know that X86 has made both of them quite similar and I > > > suggested to make them all aligned (and never received a reply on my > > > previous message). > > > > > > There are few reasons why it may be worth keeping the definition (and > > > behavior) of the sysfs files as is, at least for ARM: > > > - First is that the documentation says so. > > > - There is no point providing the same information via both the > > > interfaces, there are two interfaces here for a reason. > > > - There maybe tools around which depend on the documented behavior. > > > - From userspace, currently there is only one way to know the exact > > > frequency that the cpufreq governors have requested from a platform, > > > i.e. the value from scaling_cur_freq. If we make it similar to > > > cpuinfo_cur_freq, then userspace will never know about the requested > > > frequency and the eventual one and if they are same or different. > > > > > > Lets keep the behavior as is and update only cpuinfo_cur_freq with > > > arch_freq_get_on_cpu(). > > > > > > Makes sense ? > > > > > First of all - apologies for late reply. > > > > It all makes sense, though to clarify things up, for my own benefit, and to > > avoid any potential confusion .... > > > > Adding arch_freq_get_on_cpu to cpuinfo_cur_freq does seem to be the right > > approach - no argue on this one. Doing that though means we need a way to > > skip calling arch_freq_get_on_cpu() from show_scaling_cur_freq(), to avoid > > having both providing the same information when that should not be the case. > > In the initial approach [1], that was handled by checking whether the cpufreq > > driver supports 'get' callback (and thus cpuinfo_cur_freq). In case it didn't, > > things remained unchanged for scaling_cur_freq. That does not seem to be a viable > > option though, as there are at least few drivers, that will support both: > > cpuinfo_cur_freq alongside scaling_cur_freq (+ APERF/MPERF) and for those, > > we would hit the initial problem of both relying on arch_freq_get_on_cpu. > > So I guess we need another way of avoiding calling arch_freq_get_on_cpu > > for show_scaling_cur_freq (and most probably avoid calling that one for > > cpuinfo_cur_freq). Quick idea on how to not bring arch specificity into > > cpufreq generic code would be to introduce a new flag for cpufreq drivers though > > that seems a bit stretched. Will ponder a bit about that but in the meantime > > suggestions are more than welcomed. > Alternatively we could add a parameter to arch_freq_get_on_cpu specifying type > of feedback required: hw vs sw. Then the arch specific implementation could > decide which to provide when. It will get slightly counter-intuitive, especially > for cases when sw feedback provides hw one, as it is the case for current > arch_freq_get_on_cpu() and scaling_cur_freq but at least the changes would be > minimal and it will contain handling the tricky bits inside arch specific > implementation - hiding those messy bits. I think we should just move the call to arch_freq_get_on_cpu() from show_scaling_cur_freq() to show_cpuinfo_cur_freq() and post it. Lets see what X86 guys say to that. You can provide all the reasoning we discussed here, which makes perfect sense.
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 66e10a19d76a..603533b2608f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -795,8 +795,10 @@ store_one(scaling_max_freq, max); static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy, char *buf) { - unsigned int cur_freq = __cpufreq_get(policy); + unsigned int cur_freq = arch_freq_get_on_cpu(policy->cpu); + if (!cur_freq) + cur_freq = __cpufreq_get(policy); if (cur_freq) return sprintf(buf, "%u\n", cur_freq);
Some architectures provide a way to determine an average frequency over a certain period of time based on available performance monitors (AMU on ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to represent the current frequency of a given CPU, as obtained by the hardware. This is the type of feedback that counters do provide. Signed-off-by: Beata Michalska <beata.michalska@arm.com> --- drivers/cpufreq/cpufreq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)