Message ID | 20200701090751.7543-2-ionela.voinescu@arm.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Series | cpufreq: improve frequency invariance support |
On 01-07-20, 10:07, Ionela Voinescu wrote:
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 3494f6763597..42668588f9f8 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
>
>  struct cpufreq_driver {
>  	char name[CPUFREQ_NAME_LEN];
> -	u8 flags;
> +	u16 flags;

Let's make it u32.

>  	void *driver_data;
>
>  	/* needed by all drivers */
> @@ -417,6 +417,14 @@ struct cpufreq_driver {
>   */
>  #define CPUFREQ_IS_COOLING_DEV BIT(7)
>
> +/*
> + * Set by drivers which implement the necessary calls to the scheduler's
> + * frequency invariance engine. The use of this flag will result in the
> + * default arch_set_freq_scale calls being skipped in favour of custom
> + * driver calls.
> + */
> +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)

I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
functionality. We need to give drivers a choice if they do not want
the core to do it on their behalf, because they are doing it on their
own or they don't want to do it.
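The width question in this exchange can be checked in userspace. The sketch below uses a stand-in for the kernel's BIT() macro and the two flag values from the quoted header hunk to show that bit 8 is lost in an 8-bit field but survives in a 16-bit or 32-bit one. The fits_in_*() helpers are hypothetical names that exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/* Values taken from the quoted include/linux/cpufreq.h hunk. */
#define CPUFREQ_IS_COOLING_DEV        BIT(7)
#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)

/* Hypothetical helpers: does 'flag' survive storage in a field this wide? */
bool fits_in_u8(unsigned long flag)
{
	return (uint8_t)flag == flag;
}

bool fits_in_u16(unsigned long flag)
{
	return (uint16_t)flag == flag;
}

bool fits_in_u32(unsigned long flag)
{
	return (uint32_t)flag == flag;
}
```

With a u16 field there is room for bits 8 to 15; the u32 suggested in the reply leaves room up to bit 31.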
Hi, Thank you for taking a look over these so quickly. On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote: > On 01-07-20, 10:07, Ionela Voinescu wrote: > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > > index 3494f6763597..42668588f9f8 100644 > > --- a/include/linux/cpufreq.h > > +++ b/include/linux/cpufreq.h > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name) > > > > struct cpufreq_driver { > > char name[CPUFREQ_NAME_LEN]; > > - u8 flags; > > + u16 flags; > > Lets make it u32. > > > void *driver_data; > > > > /* needed by all drivers */ > > @@ -417,6 +417,14 @@ struct cpufreq_driver { > > */ > > #define CPUFREQ_IS_COOLING_DEV BIT(7) > > > > +/* > > + * Set by drivers which implement the necessary calls to the scheduler's > > + * frequency invariance engine. The use of this flag will result in the > > + * default arch_set_freq_scale calls being skipped in favour of custom > > + * driver calls. > > + */ > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8) > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and > functionality. We need to give drivers a choice if they do not want > the core to do it on their behalf, because they are doing it on their > own or they don't want to do it. > In this case we would not be able to tell if cpufreq (driver or core) can provide the frequency scale factor, so we would not be able to tell if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE would be set if either: - the driver calls arch_set_freq_scale() on its own - the driver does not want arch_set_freq_scale() to be called. So at the core level we would not be able to distinguish between the two, and return whether cpufreq-based invariance is supported. I don't really see a reason why a driver would not want to set the frequency scale factor, if it has the proper mechanisms to do so (therefore excluding the exceptions mentioned in 2/8). 
I think the cpufreq core or drivers should produce the information (set
the scale factor) and it should be up to the users to decide whether to
use it or not. But being invariant should always be the default.

Therefore, there are a few reasons I went for
CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
 - It tells us if the driver has custom mechanisms to set the scale
   factor to filter the setting in cpufreq core and to inform the
   core on whether the system is frequency invariant.
 - It does have a user in the vexpress-spc driver.
 - Currently there aren't drivers that could but choose not to set
   the frequency scale factor, and in my opinion this should not be
   the case.

Thanks,
Ionela.

> --
> viresh
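The ambiguity argument made in this reply can be written down as a small model. This is only a sketch: the enum and the three functions are hypothetical and do not correspond to any kernel API. It shows that an opt-out flag takes the same value for a driver that sets the scale factor itself and for one that does not want it set, while the ground truth the core wants to report differs between the two.

```c
#include <stdbool.h>

/* Hypothetical classification of what happens to the scale factor. */
enum scale_policy {
	CORE_SETS_SCALE,   /* core calls arch_set_freq_scale() for the driver */
	DRIVER_SETS_SCALE, /* driver calls arch_set_freq_scale() itself */
	NOBODY_SETS_SCALE, /* driver does not want the scale factor set */
};

/* Opt-out semantics (CPUFREQ_SKIP_SET_FREQ_SCALE): set in two distinct cases. */
bool skip_flag(enum scale_policy p)
{
	return p == DRIVER_SETS_SCALE || p == NOBODY_SETS_SCALE;
}

/* CPUFREQ_CUSTOM_SET_FREQ_SCALE semantics: set only when the driver does the work. */
bool custom_flag(enum scale_policy p)
{
	return p == DRIVER_SETS_SCALE;
}

/* What the core would like to report: is cpufreq feeding the scale factor? */
bool cpufreq_provides_scale(enum scale_policy p)
{
	return p != NOBODY_SETS_SCALE;
}
```

Note that under the CUSTOM semantics a clear flag always means that the core sets the factor, so NOBODY_SETS_SCALE cannot be expressed at all. That is the gap the other participants in the thread point at.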
On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > Hi, > > Thank you for taking a look over these so quickly. > > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote: > > On 01-07-20, 10:07, Ionela Voinescu wrote: > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > > > index 3494f6763597..42668588f9f8 100644 > > > --- a/include/linux/cpufreq.h > > > +++ b/include/linux/cpufreq.h > > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name) > > > > > > struct cpufreq_driver { > > > char name[CPUFREQ_NAME_LEN]; > > > - u8 flags; > > > + u16 flags; > > > > Lets make it u32. > > > > > void *driver_data; > > > > > > /* needed by all drivers */ > > > @@ -417,6 +417,14 @@ struct cpufreq_driver { > > > */ > > > #define CPUFREQ_IS_COOLING_DEV BIT(7) > > > > > > +/* > > > + * Set by drivers which implement the necessary calls to the scheduler's > > > + * frequency invariance engine. The use of this flag will result in the > > > + * default arch_set_freq_scale calls being skipped in favour of custom > > > + * driver calls. > > > + */ > > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8) > > > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and > > functionality. We need to give drivers a choice if they do not want > > the core to do it on their behalf, because they are doing it on their > > own or they don't want to do it. Well, this would go backwards to me, as we seem to be designing an opt-out flag for something that's not even implemented already. I would go for an opt-in instead. That would be much cleaner and less prone to regressions IMO. 
> > In this case we would not be able to tell if cpufreq (driver or core) > can provide the frequency scale factor, so we would not be able to tell > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE > would be set if either: > - the driver calls arch_set_freq_scale() on its own > - the driver does not want arch_set_freq_scale() to be called. > > So at the core level we would not be able to distinguish between the > two, and return whether cpufreq-based invariance is supported. > > I don't really see a reason why a driver would not want to set the > frequency scale factor, if it has the proper mechanisms to do so > (therefore excluding the exceptions mentioned in 2/8). I think the > cpufreq core or drivers should produce the information (set the scale > factor) and it should be up to the users to decide whether to use it or > not. But being invariant should always be the default. So instead of what is being introduced by this patch, there should be an opt-in mechanism for drivers to tell the core to do the freq-scale factor setting on behalf of the driver. Then, the driver would be responsible to only opt-in for that if it knows it for a fact that the sched tick doesn't set the freq-scale factor. > Therefore, there are a few reasons I went for > CPUFREQ_CUSTOM_SET_FREQ_SCALE instead: > - It tells us if the driver has custom mechanisms to set the scale > factor to filter the setting in cpufreq core and to inform the > core on whether the system is frequency invariant. > - It does have a user in the vexpress-spc driver. > - Currently there aren't drivers that could but choose not to set > the frequency scale factor, and it my opinion this should not be > the case. Well, that depends on what you mean by "could". 
For example, it doesn't really make sense to set the freq-scale factor
in either the ACPI cpufreq driver or intel_pstate, because the
frequency (or P-state to be precise) requested by them may not be the
one the CPU ends up running at and even so it may change at any time
for various reasons (e.g. in the turbo range). However, the ACPI
cpufreq driver as well as intel_pstate in the passive mode both set
policy->cur, so that might be used for setting the freq-scale factor
in principle, but that freq-scale factor may not be very useful in
practice.

Thanks!
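To put a number on the concern above, here is a minimal userspace sketch of a cpufreq-derived scale factor, assuming the usual fixed-point form cur/max in units of 1/1024 (SCHED_CAPACITY_SHIFT of 10). The function names and the frequencies used in the note below are illustrative assumptions, not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10

/* scale = cur / max in 1/1024 units; 1024 means "running at max frequency". */
uint64_t freq_scale(uint64_t cur_khz, uint64_t max_khz)
{
	assert(max_khz != 0);
	return (cur_khz << SCHED_CAPACITY_SHIFT) / max_khz;
}

/*
 * Error, in 1/1024 units, when the factor is derived from the requested
 * frequency but the CPU actually runs at 'actual_khz'.
 */
int64_t freq_scale_error(uint64_t requested_khz, uint64_t actual_khz,
			 uint64_t max_khz)
{
	return (int64_t)freq_scale(actual_khz, max_khz) -
	       (int64_t)freq_scale(requested_khz, max_khz);
}
```

For example, if 2.0 GHz is requested on a CPU whose maximum is 3.0 GHz, the factor derived from the request is 682, whereas a CPU that opportunistically runs at 3.0 GHz would warrant 1024. The requested frequency understates the real one by about a third of the scale.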
Hi Rafael, Thank you for the review! On Wednesday 01 Jul 2020 at 18:05:33 (+0200), Rafael J. Wysocki wrote: > On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > Hi, > > > > Thank you for taking a look over these so quickly. > > > > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote: > > > On 01-07-20, 10:07, Ionela Voinescu wrote: > > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > > > > index 3494f6763597..42668588f9f8 100644 > > > > --- a/include/linux/cpufreq.h > > > > +++ b/include/linux/cpufreq.h > > > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name) > > > > > > > > struct cpufreq_driver { > > > > char name[CPUFREQ_NAME_LEN]; > > > > - u8 flags; > > > > + u16 flags; > > > > > > Lets make it u32. > > > > > > > void *driver_data; > > > > > > > > /* needed by all drivers */ > > > > @@ -417,6 +417,14 @@ struct cpufreq_driver { > > > > */ > > > > #define CPUFREQ_IS_COOLING_DEV BIT(7) > > > > > > > > +/* > > > > + * Set by drivers which implement the necessary calls to the scheduler's > > > > + * frequency invariance engine. The use of this flag will result in the > > > > + * default arch_set_freq_scale calls being skipped in favour of custom > > > > + * driver calls. > > > > + */ > > > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8) > > > > > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and > > > functionality. We need to give drivers a choice if they do not want > > > the core to do it on their behalf, because they are doing it on their > > > own or they don't want to do it. > > Well, this would go backwards to me, as we seem to be designing an > opt-out flag for something that's not even implemented already. > > I would go for an opt-in instead. That would be much cleaner and less > prone to regressions IMO. 
>
> >
> > In this case we would not be able to tell if cpufreq (driver or core)
> > can provide the frequency scale factor, so we would not be able to tell
> > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
> > would be set if either:
> >  - the driver calls arch_set_freq_scale() on its own
> >  - the driver does not want arch_set_freq_scale() to be called.
> >
> > So at the core level we would not be able to distinguish between the
> > two, and return whether cpufreq-based invariance is supported.
> >
> > I don't really see a reason why a driver would not want to set the
> > frequency scale factor, if it has the proper mechanisms to do so
> > (therefore excluding the exceptions mentioned in 2/8). I think the
> > cpufreq core or drivers should produce the information (set the scale
> > factor) and it should be up to the users to decide whether to use it or
> > not. But being invariant should always be the default.
>
> So instead of what is being introduced by this patch, there should be
> an opt-in mechanism for drivers to tell the core to do the freq-scale
> factor setting on behalf of the driver.
>

This could work better as it covers the following scenarios:
 - All the drivers in patch 3/8 would just use the flag to inform the
   core that it can call arch_set_freq_scale() on their behalf.
 - It being omitted truly conveys the message that cpufreq information
   should not be used for frequency invariance, no matter the
   implementation of arch_set_freq_scale() (more details below)

The only case that it does not cover is the scenario in patch 4/8:
one in which the driver is atypical and it needs its own calls to
arch_set_freq_scale(), while it still wants to be able to report support
for frequency invariance through cpufreq_sets_freq_scale() and later
arch_scale_freq_invariant(). But the jury is still out on whether that
part of the vexpress-spc driver should be given that much consideration.
My choice of flag was considering this case and potentially other future ones like it, but this alternative also sounds good to me. > Then, the driver would be responsible to only opt-in for that if it > knows it for a fact that the sched tick doesn't set the freq-scale > factor. > I think that would create a tight coupling between the driver and the architecture, when arch_set_freq_scale() is already meant to have the same purpose, but it also provides some flexibility. Let me expand on this below. > > Therefore, there are a few reasons I went for > > CPUFREQ_CUSTOM_SET_FREQ_SCALE instead: > > - It tells us if the driver has custom mechanisms to set the scale > > factor to filter the setting in cpufreq core and to inform the > > core on whether the system is frequency invariant. > > - It does have a user in the vexpress-spc driver. > > - Currently there aren't drivers that could but choose not to set > > the frequency scale factor, and it my opinion this should not be > > the case. > > Well, that depends on what you mean by "could". > > For example, it doesn't really make sense to set the freq-scale factor > in either the ACPI cpufreq driver or intel_pstate, because the > frequency (or P-state to be precise) requested by them may not be the > one the CPU ends up running at and even so it may change at any time > for various reasons (eg. in the turbo range). However, the ACPI > cpufreq driver as well as intel_pstate in the passive mode both set > policy->cur, so that might be used for setting the freq-scale factor > in principle, but that freq-scale factor may not be very useful in > practice. > Yes, this completely makes sense, and if there are more accurate methods of obtaining information about the current performance level, by using counters for example, they should definitely be used. But in my opinion it should not be up to the driver to choose between the methods. 
The driver and core would only have some information on the current
performance level (more or less accurate) and arch_set_freq_scale() is
called to *potentially* use it to set the scale factor. So the use of
policy->cur would be entirely dependent on the implementation of
arch_set_freq_scale(). There could be a few scenarios here:
 - arch_set_freq_scale() is left to its weak default that does nothing
   (which would be the case for when the ACPI cpufreq driver or
   intel_pstate are used)
 - arch_set_freq_scale() is implemented in such a way that takes into
   account the presence of a counter-based method of setting the scale
   factor and makes that take precedence (currently done for the users
   of the arch_topology driver). This also provides support for
   platforms that have partial support for counters, where the use of
   cpufreq information is still useful for the CPUs that don't support
   counters. For those cases, some information, although not entirely
   accurate, is still better than no information at all.

So I believe cpufreq should just provide the information, if it can,
and let the user decide whether to use it, or what source of information
takes precedence. Therefore, arch_set_freq_scale() would decide
whether to filter it out.

In any case, your suggestion regarding the choice of flag would make
bypassing the use of cpufreq information in setting the scale factor
explicit, no matter the definition of arch_set_freq_scale(). But it would
also require writers of cpufreq driver code to remember to consider the
setting of that flag.

I'll consider this more while gauging interest in 4/8.

Many thanks,
Ionela.

> Thanks!
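The two scenarios described in this reply can be modelled in a few lines of userspace C. Everything here is a hypothetical simplification: the model_* names, the per-CPU boolean and the fixed CPU count are assumptions made for illustration, and the real arch_topology code is organised differently. The point is only that the arch hook, not the driver, decides whether cpufreq information is used, and that it can do so per CPU when counters are only partially available.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU state: 1024 means "assume max frequency". */
static bool model_cpu_has_counters[MODEL_NR_CPUS];
static uint64_t model_freq_scale[MODEL_NR_CPUS] = { 1024, 1024, 1024, 1024 };

void model_mark_counters(int cpu, bool present)
{
	model_cpu_has_counters[cpu] = present;
}

uint64_t model_get_freq_scale(int cpu)
{
	return model_freq_scale[cpu];
}

/* Scenario 1: the weak default ignores whatever cpufreq reports. */
void model_weak_set_freq_scale(int cpu, uint64_t cur_khz, uint64_t max_khz)
{
	(void)cpu;
	(void)cur_khz;
	(void)max_khz;
}

/* Scenario 2: counters take precedence; cpufreq data is used only without them. */
void model_arch_set_freq_scale(int cpu, uint64_t cur_khz, uint64_t max_khz)
{
	if (model_cpu_has_counters[cpu] || max_khz == 0)
		return;
	model_freq_scale[cpu] = (cur_khz << 10) / max_khz;
}
```

On a system where only CPU 0 has counters, a call for CPU 0 leaves its factor untouched for the counter-based path to manage, while the same call for CPU 1 updates the factor from the cpufreq frequency.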
On 01-07-20, 18:05, Rafael J. Wysocki wrote: > On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote: > > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and > > > functionality. We need to give drivers a choice if they do not want > > > the core to do it on their behalf, because they are doing it on their > > > own or they don't want to do it. > > Well, this would go backwards to me, as we seem to be designing an > opt-out flag for something that's not even implemented already. > > I would go for an opt-in instead. That would be much cleaner and less > prone to regressions IMO. That's fine, I just wanted an option for drivers to opt-out of this thing. I felt okay with the opt-out flag as this should be enabled for most of the drivers and so enabling by default looked okay as well. > > In this case we would not be able to tell if cpufreq (driver or core) > > can provide the frequency scale factor, so we would not be able to tell > > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE That is easy to fix. Let the drivers call enable_cpufreq_freq_invariance() and set the flag. > > would be set if either: > > - the driver calls arch_set_freq_scale() on its own > > - the driver does not want arch_set_freq_scale() to be called. > > > > So at the core level we would not be able to distinguish between the > > two, and return whether cpufreq-based invariance is supported. > > > > I don't really see a reason why a driver would not want to set the > > frequency scale factor A simple case where the driver doesn't have any idea what the real freq of the CPU is and it doesn't have counters to guess it as well. There can be other reasons which we aren't able to imagine at this point of time.
Hi, On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote: > On 01-07-20, 18:05, Rafael J. Wysocki wrote: > > On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote: > > > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and > > > > functionality. We need to give drivers a choice if they do not want > > > > the core to do it on their behalf, because they are doing it on their > > > > own or they don't want to do it. > > > > Well, this would go backwards to me, as we seem to be designing an > > opt-out flag for something that's not even implemented already. > > > > I would go for an opt-in instead. That would be much cleaner and less > > prone to regressions IMO. > > That's fine, I just wanted an option for drivers to opt-out of this > thing. I felt okay with the opt-out flag as this should be enabled for > most of the drivers and so enabling by default looked okay as well. > > > > In this case we would not be able to tell if cpufreq (driver or core) > > > can provide the frequency scale factor, so we would not be able to tell > > > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE > > That is easy to fix. Let the drivers call > enable_cpufreq_freq_invariance() and set the flag. > Right! I suppose part of "the dream" :) was for drivers to be ignorant of frequency invariance, and for the core to figure out if it has proper information to potentially* pass to the scheduler. *potentially = depending on the arch_set_freq_scale() definition. > > > would be set if either: > > > - the driver calls arch_set_freq_scale() on its own > > > - the driver does not want arch_set_freq_scale() to be called. > > > > > > So at the core level we would not be able to distinguish between the > > > two, and return whether cpufreq-based invariance is supported. 
> > >
> > > I don't really see a reason why a driver would not want to set the
> > > frequency scale factor
>
> A simple case where the driver doesn't have any idea what the real
> freq

For me, this would have been filtered by either the type of callback
they use (target_index(), fast_switch() and even target() would offer
some close to accurate indication of the current frequency, while
setpolicy() obviously targets a range of frequencies) or by the
definition of arch_set_freq_scale().

> ..of the CPU is and it doesn't have counters to guess it as well.
>
> There can be other reasons which we aren't able to imagine at this
> point of time.
>

But I understand both the points you and Rafael raised so it's obvious
that an 'opt in' flag would be the better option.

Thank you both,
Ionela.

> --
> viresh
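The callback-based filtering mentioned in this reply could look roughly like the sketch below. The struct is a deliberately simplified, hypothetical stand-in for struct cpufreq_driver (the real callbacks take a policy pointer and other arguments), and the helper name is invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the callbacks a driver may implement. */
struct model_cpufreq_driver {
	int (*target_index)(unsigned int index);
	unsigned int (*fast_switch)(unsigned int target_khz);
	int (*target)(unsigned int target_khz);
	int (*setpolicy)(void);
};

/*
 * A driver that picks one frequency per request gives the core a usable
 * "current frequency"; a setpolicy() driver only hands over a range.
 */
bool model_driver_selects_single_freq(const struct model_cpufreq_driver *drv)
{
	if (drv->setpolicy)
		return false;
	return drv->target_index || drv->fast_switch || drv->target;
}

/* Two illustrative drivers. */
static int example_target_index(unsigned int index)
{
	(void)index;
	return 0;
}

static int example_setpolicy(void)
{
	return 0;
}

const struct model_cpufreq_driver example_table_driver = {
	.target_index = example_target_index,
};

const struct model_cpufreq_driver example_range_driver = {
	.setpolicy = example_setpolicy,
};
```

As the replies note, such a check says nothing about whether the selected frequency is the one the hardware really runs at, which is why it was not considered sufficient on its own.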
On 02/07/2020 13:44, Ionela Voinescu wrote:
> Hi,
>
> On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
>> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
>>> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
>>>> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:

[...]

>> There can be other reasons which we aren't able to imagine at this
>> point of time.
>>
>
> But I understand both the points you and Rafael raised so it's obvious
> that a 'opt in' flag would be the better option.

Why can't we just move the arch_set_freq_scale() call from cpufreq
driver to cpufreq core w/o introducing a FIE related driver flag?

Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.

+------------------------------+       +------------------------------+
|                              |       |                              |
| cpufreq core:                |       | arch: (arm, arm64)           |
|                              |       |                              |
| weak arch_set_freq_scale() {}|       |                              |
|                              |       |                              |
+------------------------------+       |                              |
                                       |                              |
+------------------------------+       |                              |
|                              |       |                              |
| cpufreq driver:              |       |                              |
|                          +-----------> arch_set_freq_scale()        |
|                              |       | {                            |
+------------------------------+       |    if (use counters)         |
                                       |      return;                 |
+------------------------------+       |    ...                       |
|                              |       | }                            |
| task scheduler:              |       |                              |
|                          +-----------> arch_scale_freq_tick()*      |
|                              |       | {                            |
|                              |       |    if (!use counters)        |
|                              |       |      return;                 |
|                              |       |    ...                       |
|                              |       | }                            |
+------------------------------+       +------------------------------+

* defined as topology_scale_freq_tick() in arm64

Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq
based FIE. This would still be the case when we move
arch_set_freq_scale() from individual cpufreq drivers to cpufreq core.

Arm64 is the only arch which has to runtime-choose between two different
FIEs. This is currently done by bailing out early in one of the FIE
functions based on 'use counters'.

X86 (and others) will continue to not define arch_set_freq_scale().
The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be solved arm/arm64 internally (arch_topology.c) by putting arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard. I doubt that there are any arm bL systems out there running it. At least I'm not aware of any complaints due to missing FIE support in bl switcher setups so far.
Hi guys, On Monday 06 Jul 2020 at 14:14:47 (+0200), Dietmar Eggemann wrote: > On 02/07/2020 13:44, Ionela Voinescu wrote: > > Hi, > > > > On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote: > >> On 01-07-20, 18:05, Rafael J. Wysocki wrote: > >>> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote: > >>>> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote: > > [...] > > >> There can be other reasons which we aren't able to imagine at this > >> point of time. > >> > > > > But I understand both the points you and Rafael raised so it's obvious > > that a 'opt in' flag would be the better option. > > Why can't we just move the arch_set_freq_scale() call from cpufreq > driver to cpufreq core w/o introducing a FIE related driver flag? > > Current scenario for Frequency Invariance Engine (FIE) on arm/arm64. > > +------------------------------+ +------------------------------+ > | | | | > | cpufreq core: | | arch: (arm, arm64) | > > | | | | > | weak arch_set_freq_scale() {}| | | > | | | | > +------------------------------+ | | > | | > +------------------------------+ | | > | | | | > | cpufreq driver: | | | > | +-----------> arch_set_freq_scale() | > | | | { | > +------------------------------+ | if (use counters) | > | return; | > +------------------------------+ | ... | > | | | } | > | task scheduler: | | | > | +-----------> arch_scale_freq_tick()* | > | | | { | > > | | | if (!use counters) | > | | | return; | > | | | ... | > | | | } | > +------------------------------+ +------------------------------+ > > * defined as topology_scale_freq_tick() in arm64 > > Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq > based FIE. This would still be the case when we move > arch_set_freq_scale() from individual cpufreq drivers to cpufreq core. > > Arm64 is the only arch which has to runtime-choose between two different > FIEs. 
This is currently done by bailing out early in one of the FIE > functions based on 'use counters'. > > X86 (and others) will continue to not define arch_set_freq_scale(). > > The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be > solved arm/arm64 internally (arch_topology.c) by putting > arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard. > I doubt that there are any arm bL systems out there running it. At least > I'm not aware of any complaints due to missing FIE support in bl > switcher setups so far. Thank you Dietmar, for your review. I was trying to suggest the same in my other replies. Given that BL_SWITCHER can be removed as an argument for introducing a flag, I would also find it cleaner to just skip on introducing a flag altogether, at least until we have a driver/scenario in the kernel that will functionally benefit from it. This would also give us the chance to reconsider the best meaning of the flag we later introduce. The introduction of the 'opt in' flag would be the next best thing as suggested in the other replies, but currently it would not result in anything functionally different. Rafael, Viresh, would you mind confirming whether you still consider having an 'opt in' flag is preferable here? Many thanks, Ionela.
On 09-07-20, 09:53, Ionela Voinescu wrote: > On Monday 06 Jul 2020 at 14:14:47 (+0200), Dietmar Eggemann wrote: > > Why can't we just move the arch_set_freq_scale() call from cpufreq > > driver to cpufreq core w/o introducing a FIE related driver flag? > > > > Current scenario for Frequency Invariance Engine (FIE) on arm/arm64. > > > > +------------------------------+ +------------------------------+ > > | | | | > > | cpufreq core: | | arch: (arm, arm64) | > > > > | | | | > > | weak arch_set_freq_scale() {}| | | > > | | | | > > +------------------------------+ | | > > | | > > +------------------------------+ | | > > | | | | > > | cpufreq driver: | | | > > | +-----------> arch_set_freq_scale() | > > | | | { | > > +------------------------------+ | if (use counters) | > > | return; | > > +------------------------------+ | ... | > > | | | } | > > | task scheduler: | | | > > | +-----------> arch_scale_freq_tick()* | > > | | | { | > > > > | | | if (!use counters) | > > | | | return; | > > | | | ... | > > | | | } | > > +------------------------------+ +------------------------------+ > > > > * defined as topology_scale_freq_tick() in arm64 > > > > Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq > > based FIE. This would still be the case when we move > > arch_set_freq_scale() from individual cpufreq drivers to cpufreq core. > > > > Arm64 is the only arch which has to runtime-choose between two different > > FIEs. This is currently done by bailing out early in one of the FIE > > functions based on 'use counters'. > > > > X86 (and others) will continue to not define arch_set_freq_scale(). > > > > The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be > > solved arm/arm64 internally (arch_topology.c) by putting > > arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard. > > I doubt that there are any arm bL systems out there running it. 
> > At least I'm not aware of any complaints due to missing FIE support
> > in bl switcher setups so far.

I agree to that.

> Thank you Dietmar, for your review.
>
> I was trying to suggest the same in my other replies.

I am sorry, I must have overlooked that part in your replies, otherwise
I may have agreed to it :)

> Rafael, Viresh, would you mind confirming whether you still consider
> having an 'opt in' flag is preferable here?

Well, we wanted an opt-in flag instead of an opt-out one. And no flag is
certainly better.
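The outcome the thread converges on, no driver flag and the arch_set_freq_scale() call made from the core after a successful frequency switch, can be sketched as follows. This is a userspace model under stated assumptions: struct model_policy, model_core_target_index() and the example driver are hypothetical and much simpler than the real cpufreq core.

```c
#include <stdint.h>

/* Hypothetical stand-in for the arch hook; stores one global factor. */
static uint64_t model_scale = 1024;

void model_arch_set_freq_scale(uint64_t cur_khz, uint64_t max_khz)
{
	if (max_khz)
		model_scale = (cur_khz << 10) / max_khz;
}

uint64_t model_current_scale(void)
{
	return model_scale;
}

/* Hypothetical, simplified policy with a table-based driver callback. */
struct model_policy {
	const uint64_t *freq_table_khz;
	unsigned int nr_freqs;
	uint64_t cur_khz;
	uint64_t max_khz;
	int (*target_index)(struct model_policy *policy, unsigned int index);
};

/* Core-side wrapper: the driver switches, the core updates the scale factor. */
int model_core_target_index(struct model_policy *policy, unsigned int index)
{
	int ret;

	if (index >= policy->nr_freqs)
		return -1;

	ret = policy->target_index(policy, index);
	if (ret)
		return ret;

	policy->cur_khz = policy->freq_table_khz[index];
	model_arch_set_freq_scale(policy->cur_khz, policy->max_khz);
	return 0;
}

/* Example driver that knows nothing about frequency invariance. */
static int example_hw_switch(struct model_policy *policy, unsigned int index)
{
	(void)policy;
	(void)index;
	return 0;
}

static const uint64_t example_table[] = { 500000, 1000000, 2000000 };

struct model_policy example_policy = {
	.freq_table_khz = example_table,
	.nr_freqs = 3,
	.cur_khz = 2000000,
	.max_khz = 2000000,
	.target_index = example_hw_switch,
};
```

The design choice this illustrates is that the driver callback stays unaware of invariance; whether the reported frequency is actually used is left to the arch hook, for instance when counters take precedence.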
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 944d7b45afe9..8e0571a49d1e 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -331,7 +331,8 @@ static int cpufreq_exit(struct cpufreq_policy *policy)
 
 static struct cpufreq_driver dt_cpufreq_driver = {
 	.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		 CPUFREQ_IS_COOLING_DEV,
+		 CPUFREQ_IS_COOLING_DEV |
+		 CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify = cpufreq_generic_frequency_table_verify,
 	.target_index = set_target,
 	.get = cpufreq_generic_get,
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 573630c23aca..e13780beb373 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -337,7 +337,8 @@ static struct freq_attr *qcom_cpufreq_hw_attr[] = {
 static struct cpufreq_driver cpufreq_qcom_hw_driver = {
 	.flags		= CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
 			  CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
-			  CPUFREQ_IS_COOLING_DEV,
+			  CPUFREQ_IS_COOLING_DEV |
+			  CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= qcom_cpufreq_hw_target_index,
 	.get		= qcom_cpufreq_hw_get,
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index fb42e3390377..16ab4ecc75e4 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -223,7 +223,8 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
 	.name	= "scmi",
 	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
 		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		  CPUFREQ_IS_COOLING_DEV,
+		  CPUFREQ_IS_COOLING_DEV |
+		  CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify	= cpufreq_generic_frequency_table_verify,
 	.attr	= cpufreq_generic_attr,
 	.target_index	= scmi_cpufreq_set_target,
diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
index b0f5388b8854..6b5f56dc3ca3 100644
--- a/drivers/cpufreq/scpi-cpufreq.c
+++ b/drivers/cpufreq/scpi-cpufreq.c
@@ -197,7 +197,8 @@ static struct cpufreq_driver scpi_cpufreq_driver = {
 	.name	= "scpi-cpufreq",
 	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
 		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		  CPUFREQ_IS_COOLING_DEV,
+		  CPUFREQ_IS_COOLING_DEV |
+		  CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify	= cpufreq_generic_frequency_table_verify,
 	.attr	= cpufreq_generic_attr,
 	.get	= scpi_cpufreq_get_rate,
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index 4e8b1dee7c9a..e0a1a3367ec5 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -496,7 +496,8 @@ static struct cpufreq_driver ve_spc_cpufreq_driver = {
 	.name			= "vexpress-spc",
 	.flags			= CPUFREQ_STICKY |
 					CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
-					CPUFREQ_NEED_INITIAL_FREQ_CHECK,
+					CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+					CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify			= cpufreq_generic_frequency_table_verify,
 	.target_index		= ve_spc_cpufreq_set_target,
 	.get			= ve_spc_cpufreq_get_rate,
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3494f6763597..42668588f9f8 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
 
 struct cpufreq_driver {
 	char name[CPUFREQ_NAME_LEN];
-	u8 flags;
+	u16 flags;
 	void *driver_data;
 
 	/* needed by all drivers */
@@ -417,6 +417,14 @@ struct cpufreq_driver {
  */
 #define CPUFREQ_IS_COOLING_DEV BIT(7)
 
+/*
+ * Set by drivers which implement the necessary calls to the scheduler's
+ * frequency invariance engine. The use of this flag will result in the
+ * default arch_set_freq_scale calls being skipped in favour of custom
+ * driver calls.
+ */
+#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)
+
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
The scheduler's Frequency Invariance Engine (FIE) provides a
frequency scale correction factor that helps achieve more accurate
load-tracking by conveying information about the currently selected
frequency relative to the maximum supported frequency of a CPU.

In some cases this is achieved by passing information from cpufreq
drivers about the frequency selection done by cpufreq.

Given that most drivers follow a similar process of selecting and
setting of frequency, there is a strong case for moving the setting
of the frequency scale factor from the cpufreq drivers' frequency
switch callbacks (target_index() and fast_switch()), to the cpufreq
core functions that call them.

In preparation for this, acknowledge that there are still drivers
whose frequency setting process is custom and therefore these drivers
will want to provide and flag custom support for the setting of the
scheduler's frequency invariance (FI) scale factor as well. Prepare for
this by introducing a new flag: CPUFREQ_CUSTOM_SET_FREQ_SCALE.

Examples of users of this flag are:
 - drivers that do not implement the callbacks that lend themselves to
   triggering the setting of the FI scale factor,
 - drivers that implement the appropriate callbacks but which have an
   atypical implementation.

Currently, given that all drivers call arch_set_freq_scale() directly,
flag all users with CPUFREQ_CUSTOM_SET_FREQ_SCALE. These driver changes
are also useful to maintain bisection between the FI switch from the
drivers to the core.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq-dt.c           |  3 ++-
 drivers/cpufreq/qcom-cpufreq-hw.c      |  3 ++-
 drivers/cpufreq/scmi-cpufreq.c         |  3 ++-
 drivers/cpufreq/scpi-cpufreq.c         |  3 ++-
 drivers/cpufreq/vexpress-spc-cpufreq.c |  3 ++-
 include/linux/cpufreq.h                | 10 +++++++++-
 6 files changed, 19 insertions(+), 6 deletions(-)