Message ID | 20180703234705.227473-6-mka@chromium.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Hi Matthias, Firstly, I'm not sure why devfreq needs the devfreq_verify_within_limits() function. devfreq already used the OPP interface as default. It means that the outside of 'drivers/devfreq' can disable/enable the frequency such as drivers/thermal/devfreq_cooling.c. Also, when some device drivers disable/enable the specific frequency, the devfreq core consider them. So, devfreq doesn't need to devfreq_verify_within_limits() because already support some interface to change the minimum/maximum frequency of devfreq device. In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' to change the minimum/maximum frequency of cpu. some device driver cannot change the minimum/maximum frequency through OPP interface. But, in case of devfreq subsystem, as I explained already, devfreq support the OPP interface as default way. devfreq subsystem doesn't need to add other way to change the minimum/maximum frequency. Secondly, This patch send the 'struct devfreq_policy' instance as the data when sending the notification as following: srcu_notifier_call_chain(&devfreq->policy_notifier_list, DEVFREQ_ADJUST, policy); But, I think that if devfreq core sends the 'struct devfreq_freq_limits' instance instead of 'struct devfreq_policy', it is enough. Because receiver of DEVFREQ_ADJUST just will use the min_freq/max_freq variables. So, I tried to find the cpufreq's case. The some device drivers using CPUFREQ_POLICY_NOTIFIER uses following variables of 'struct cpufreq_policy'. It means that receiver of CPUFREQ_POLICY_NOTIFIER don't need to other information/variables except for min/max frequency. - policy->min - policy->max - policy->cpuinfo.max_freq - policy->cpuinfo.min_freq - policy->cpu : not related to devfreq) - policy->related_cpus : not related to devfreq) - list of device drivers using CPUFREQ_POLICY_NOTIFIER (linux kernel is v4.18-rc1) $ grep -rn "CPUFREQ_POLICY_NOTIFIER" . ./drivers/macintosh/windfarm_cpufreq_clamp.c ./drivers/thermal/cpu_cooling.c ./drivers/thermal/cpu_cooling.c ./drivers/acpi/processor_thermal.c ./drivers/acpi/processor_thermal.c ./drivers/acpi/processor_perflib.c ./drivers/acpi/processor_perflib.c ./drivers/base/arch_topology.c ./drivers/base/arch_topology.c ./drivers/video/fbdev/sa1100fb.c ./drivers/video/fbdev/pxafb.c ./drivers/cpufreq/ppc_cbe_cpufreq_pmi.c ./drivers/cpufreq/cpufreq.c ./drivers/cpufreq/cpufreq.c ./drivers/cpufreq/cpufreq.c ./drivers/cpufreq/cpufreq.c On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote: > Policy notifiers are called before a frequency change and may narrow > the min/max frequency range in devfreq_policy, which is used to adjust > the target frequency if it is beyond this range. > > Also add a few helpers: > - devfreq_verify_within_[dev_]limits() > - should be used by the notifiers for policy adjustments. > - dev_to_devfreq() > - lookup a devfreq strict from a device pointer > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > Reviewed-by: Brian Norris <briannorris@chromium.org> > --- > Changes in v5: > - none > > Changes in v4: > - Fixed typo in commit message: devfreg => devfreq > - added 'Reviewed-by: Brian Norris <briannorris@chromium.org>' tag > > Changes in v3: > - devfreq.h: fixed misspelling of struct devfreq_policy > > Changes in v2: > - performance, powersave and simpleondemand governors don't need changes > with "PM / devfreq: Don't adjust to user limits in governors" > - formatting fixes > --- > drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++------- > include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++ > 2 files changed, 102 insertions(+), 11 deletions(-) > > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > index 21604d6ae2b8..4cbaa7ad1972 100644 > --- a/drivers/devfreq/devfreq.c > +++ b/drivers/devfreq/devfreq.c > @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev) > return ERR_PTR(-ENODEV); > } > > +/** > + * dev_to_devfreq() - find devfreq struct using device pointer > + * @dev: device pointer used to lookup device devfreq. > + */ > +struct devfreq *dev_to_devfreq(struct device *dev) > +{ > + struct devfreq *devfreq; > + > + mutex_lock(&devfreq_list_lock); > + devfreq = find_device_devfreq(dev); > + mutex_unlock(&devfreq_list_lock); > + > + return devfreq; > +} > + > static unsigned long find_available_min_freq(struct devfreq *devfreq) > { > struct dev_pm_opp *opp; > @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq) > if (!policy->governor) > return -EINVAL; > > + policy->min = policy->devinfo.min_freq; > + policy->max = policy->devinfo.max_freq; Why don't you consider 'policy->user.max/min_freq' as following? As I already commented, I think that 'struct devfreq_freq_limits' is enough instead of 'struct devfreq_policy'. ->max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq); ->min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq); > + > + srcu_notifier_call_chain(&devfreq->policy_notifier_list, > + DEVFREQ_ADJUST, policy); > + > /* Reevaluate the proper frequency */ > err = policy->governor->get_target_freq(devfreq, &freq); > if (err) > return err; > > - /* > - * Adjust the frequency with user freq, QoS and available freq. > - * > - * List from the highest priority > - * max_freq > - * min_freq > - */ > - max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq); > - min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq); > + /* Adjust the frequency */ > + > + max_freq = MIN(policy->max, policy->user.max_freq); > + min_freq = MAX(policy->min, policy->user.min_freq); > > if (freq < min_freq) { > freq = min_freq; > @@ -645,6 +661,7 @@ struct devfreq *devfreq_add_device(struct device *dev, > devfreq->last_stat_updated = jiffies; > > srcu_init_notifier_head(&devfreq->transition_notifier_list); > + srcu_init_notifier_head(&devfreq->policy_notifier_list); > > mutex_unlock(&devfreq->lock); > > @@ -1445,7 +1462,7 @@ EXPORT_SYMBOL(devm_devfreq_unregister_opp_notifier); > * devfreq_register_notifier() - Register a driver with devfreq > * @devfreq: The devfreq object. > * @nb: The notifier block to register. > - * @list: DEVFREQ_TRANSITION_NOTIFIER. > + * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER. > */ > int devfreq_register_notifier(struct devfreq *devfreq, > struct notifier_block *nb, > @@ -1461,6 +1478,10 @@ int devfreq_register_notifier(struct devfreq *devfreq, > ret = srcu_notifier_chain_register( > &devfreq->transition_notifier_list, nb); > break; > + case DEVFREQ_POLICY_NOTIFIER: > + ret = srcu_notifier_chain_register( > + &devfreq->policy_notifier_list, nb); > + break; > default: > ret = -EINVAL; > } > @@ -1473,7 +1494,7 @@ EXPORT_SYMBOL(devfreq_register_notifier); > * devfreq_unregister_notifier() - Unregister a driver with devfreq > * @devfreq: The devfreq object. > * @nb: The notifier block to be unregistered. > - * @list: DEVFREQ_TRANSITION_NOTIFIER. > + * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER. > */ > int devfreq_unregister_notifier(struct devfreq *devfreq, > struct notifier_block *nb, > @@ -1489,6 +1510,11 @@ int devfreq_unregister_notifier(struct devfreq *devfreq, > ret = srcu_notifier_chain_unregister( > &devfreq->transition_notifier_list, nb); > break; > + case DEVFREQ_POLICY_NOTIFIER: > + ret = srcu_notifier_chain_unregister( > + &devfreq->policy_notifier_list, nb); > + break; > + > default: > ret = -EINVAL; > } > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h > index 9bf23b976f4d..7c8dce96db73 100644 > --- a/include/linux/devfreq.h > +++ b/include/linux/devfreq.h > @@ -33,6 +33,10 @@ > #define DEVFREQ_PRECHANGE (0) > #define DEVFREQ_POSTCHANGE (1) > > +#define DEVFREQ_POLICY_NOTIFIER 1 > + > +#define DEVFREQ_ADJUST 0 > + > struct devfreq; > struct devfreq_governor; > > @@ -121,12 +125,16 @@ struct devfreq_freq_limits { > > /** > * struct devfreq_policy - Devfreq policy > + * @min: minimum frequency (adjustable by policy notifiers) > + * @min: maximum frequency (adjustable by policy notifiers) > * @user: frequency limits requested by the user > * @devinfo: frequency limits of the device (available OPPs) > * @governor: method how to choose frequency based on the usage. > * @governor_name: devfreq governor name for use with this devfreq > */ > struct devfreq_policy { > + unsigned long min; > + unsigned long max; > struct devfreq_freq_limits user; > struct devfreq_freq_limits devinfo; > const struct devfreq_governor *governor; > @@ -155,6 +163,7 @@ struct devfreq_policy { > * @time_in_state: Statistics of devfreq states > * @last_stat_updated: The last time stat updated > * @transition_notifier_list: list head of DEVFREQ_TRANSITION_NOTIFIER notifier > + * @policy_notifier_list: list head of DEVFREQ_POLICY_NOTIFIER notifier > * > * This structure stores the devfreq information for a give device. > * > @@ -188,6 +197,7 @@ struct devfreq { > unsigned long last_stat_updated; > > struct srcu_notifier_head transition_notifier_list; > + struct srcu_notifier_head policy_notifier_list; > }; > > struct devfreq_freqs { > @@ -240,6 +250,45 @@ extern void devm_devfreq_unregister_notifier(struct device *dev, > extern struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev, > int index); > > +/** > + * devfreq_verify_within_limits() - Adjust a devfreq policy if needed to make > + * sure its min/max values are within a > + * specified range. > + * @policy: the policy > + * @min: the minimum frequency > + * @max: the maximum frequency > + */ > +static inline void devfreq_verify_within_limits(struct devfreq_policy *policy, > + unsigned int min, unsigned int max) > +{ > + if (policy->min < min) > + policy->min = min; > + if (policy->max < min) > + policy->max = min; > + if (policy->min > max) > + policy->min = max; > + if (policy->max > max) > + policy->max = max; > + if (policy->min > policy->max) > + policy->min = policy->max; > +} > + > +/** > + * devfreq_verify_within_dev_limits() - Adjust a devfreq policy if needed to > + * make sure its min/max values are within > + * the frequency range supported by the > + * device. > + * @policy: the policy > + */ > +static inline void > +devfreq_verify_within_dev_limits(struct devfreq_policy *policy) > +{ > + devfreq_verify_within_limits(policy, policy->devinfo.min_freq, > + policy->devinfo.max_freq); > +} > + > +struct devfreq *dev_to_devfreq(struct device *dev); > + > #if IS_ENABLED(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND) > /** > * struct devfreq_simple_ondemand_data - void *data fed to struct devfreq > @@ -394,10 +443,26 @@ static inline struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev, > return ERR_PTR(-ENODEV); > } > > +static inline void devfreq_verify_within_limits(struct devfreq_policy *policy, > + unsigned int min, unsigned int max) > +{ > +} > + > +static inline void > +devfreq_verify_within_dev_limits(struct devfreq_policy *policy) > +{ > +} > + > static inline int devfreq_update_stats(struct devfreq *df) > { > return -EINVAL; > } > + > +static inline struct devfreq *dev_to_devfreq(struct device *dev) > +{ > + return NULL; > +} > + > #endif /* CONFIG_PM_DEVFREQ */ > > #endif /* __LINUX_DEVFREQ_H__ */ >
Hi Chanwoo, On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > Firstly, > I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > > devfreq already used the OPP interface as default. It means that > the outside of 'drivers/devfreq' can disable/enable the frequency > such as drivers/thermal/devfreq_cooling.c. Also, when some device > drivers disable/enable the specific frequency, the devfreq core > consider them. > > So, devfreq doesn't need to devfreq_verify_within_limits() because > already support some interface to change the minimum/maximum frequency > of devfreq device. > > In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > to change the minimum/maximum frequency of cpu. some device driver cannot > change the minimum/maximum frequency through OPP interface. > > But, in case of devfreq subsystem, as I explained already, devfreq support > the OPP interface as default way. devfreq subsystem doesn't need to add > other way to change the minimum/maximum frequency. Using the OPP interface exclusively works as long as a enabling/disabling of OPPs is limited to a single driver (drivers/thermal/devfreq_cooling.c). When multiple drivers are involved you need a way to resolve conflicts, that's the purpose of devfreq_verify_within_limits(). Please let me know if there are existing mechanisms for conflict resolution that I overlooked. Possibly drivers/thermal/devfreq_cooling.c could be migrated to use devfreq_verify_within_limits() instead of the OPP interface if desired, however this seems beyond the scope of this series. > Secondly, > This patch send the 'struct devfreq_policy' instance as the data > when sending the notification as following: > > srcu_notifier_call_chain(&devfreq->policy_notifier_list, > DEVFREQ_ADJUST, policy); > > But, I think that if devfreq core sends the 'struct devfreq_freq_limits' > instance instead of 'struct devfreq_policy', it is enough. > Because receiver of DEVFREQ_ADJUST just will use the min_freq/max_freq variables. > > So, I tried to find the cpufreq's case. The some device drivers using > CPUFREQ_POLICY_NOTIFIER uses following variables of 'struct cpufreq_policy'. > It means that receiver of CPUFREQ_POLICY_NOTIFIER don't need to other > information/variables except for min/max frequency. > > - policy->min > - policy->max > - policy->cpuinfo.max_freq > - policy->cpuinfo.min_freq > - policy->cpu : not related to devfreq) > - policy->related_cpus : not related to devfreq) > > - list of device drivers using CPUFREQ_POLICY_NOTIFIER (linux kernel is v4.18-rc1) > $ grep -rn "CPUFREQ_POLICY_NOTIFIER" . > ./drivers/macintosh/windfarm_cpufreq_clamp.c > ./drivers/thermal/cpu_cooling.c > ./drivers/thermal/cpu_cooling.c > ./drivers/acpi/processor_thermal.c > ./drivers/acpi/processor_thermal.c > ./drivers/acpi/processor_perflib.c > ./drivers/acpi/processor_perflib.c > ./drivers/base/arch_topology.c > ./drivers/base/arch_topology.c > ./drivers/video/fbdev/sa1100fb.c > ./drivers/video/fbdev/pxafb.c > ./drivers/cpufreq/ppc_cbe_cpufreq_pmi.c > ./drivers/cpufreq/cpufreq.c > ./drivers/cpufreq/cpufreq.c > ./drivers/cpufreq/cpufreq.c > ./drivers/cpufreq/cpufreq.c Thanks for your investigation. I decided to mirror the cpufreq interface for consistency, but I agree that 'struct devfreq_freq_limits' could be passed instead of the policy object. I'm fine with changing that. > On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote: > > Policy notifiers are called before a frequency change and may narrow > > the min/max frequency range in devfreq_policy, which is used to adjust > > the target frequency if it is beyond this range. > > > > Also add a few helpers: > > - devfreq_verify_within_[dev_]limits() > > - should be used by the notifiers for policy adjustments. > > - dev_to_devfreq() > > - lookup a devfreq strict from a device pointer > > > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > > Reviewed-by: Brian Norris <briannorris@chromium.org> > > --- > > Changes in v5: > > - none > > > > Changes in v4: > > - Fixed typo in commit message: devfreg => devfreq > > - added 'Reviewed-by: Brian Norris <briannorris@chromium.org>' tag > > > > Changes in v3: > > - devfreq.h: fixed misspelling of struct devfreq_policy > > > > Changes in v2: > > - performance, powersave and simpleondemand governors don't need changes > > with "PM / devfreq: Don't adjust to user limits in governors" > > - formatting fixes > > --- > > drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++------- > > include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 102 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > > index 21604d6ae2b8..4cbaa7ad1972 100644 > > --- a/drivers/devfreq/devfreq.c > > +++ b/drivers/devfreq/devfreq.c > > @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev) > > return ERR_PTR(-ENODEV); > > } > > > > +/** > > + * dev_to_devfreq() - find devfreq struct using device pointer > > + * @dev: device pointer used to lookup device devfreq. > > + */ > > +struct devfreq *dev_to_devfreq(struct device *dev) > > +{ > > + struct devfreq *devfreq; > > + > > + mutex_lock(&devfreq_list_lock); > > + devfreq = find_device_devfreq(dev); > > + mutex_unlock(&devfreq_list_lock); > > + > > + return devfreq; > > +} > > + > > static unsigned long find_available_min_freq(struct devfreq *devfreq) > > { > > struct dev_pm_opp *opp; > > @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq) > > if (!policy->governor) > > return -EINVAL; > > > > + policy->min = policy->devinfo.min_freq; > > + policy->max = policy->devinfo.max_freq; > > Why don't you consider 'policy->user.max/min_freq' as following? > As I already commented, I think that 'struct devfreq_freq_limits' is enough > instead of 'struct devfreq_policy'. > > ->max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq); > ->min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq); You mean limiting the frequency range with user.min/max before DEVFREQ_ADJUST instead of adjusting it afterwards? That's fine with me. Thanks Matthias
Hi Matthias, On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >> Firstly, >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >> >> devfreq already used the OPP interface as default. It means that >> the outside of 'drivers/devfreq' can disable/enable the frequency >> such as drivers/thermal/devfreq_cooling.c. Also, when some device >> drivers disable/enable the specific frequency, the devfreq core >> consider them. >> >> So, devfreq doesn't need to devfreq_verify_within_limits() because >> already support some interface to change the minimum/maximum frequency >> of devfreq device. >> >> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >> to change the minimum/maximum frequency of cpu. some device driver cannot >> change the minimum/maximum frequency through OPP interface. >> >> But, in case of devfreq subsystem, as I explained already, devfreq support >> the OPP interface as default way. devfreq subsystem doesn't need to add >> other way to change the minimum/maximum frequency. > > Using the OPP interface exclusively works as long as a > enabling/disabling of OPPs is limited to a single driver > (drivers/thermal/devfreq_cooling.c). When multiple drivers are > involved you need a way to resolve conflicts, that's the purpose of > devfreq_verify_within_limits(). Please let me know if there are > existing mechanisms for conflict resolution that I overlooked. > > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > devfreq_verify_within_limits() instead of the OPP interface if > desired, however this seems beyond the scope of this series. Actually, if we uses this approach, it doesn't support the multiple drivers too. If non throttler drivers uses devfreq_verify_within_limits(), the conflict happen. To resolve the conflict for multiple device driver, maybe OPP interface have to support 'usage_count' such as clk_enable/disable(). > >> Secondly, >> This patch send the 'struct devfreq_policy' instance as the data >> when sending the notification as following: >> >> srcu_notifier_call_chain(&devfreq->policy_notifier_list, >> DEVFREQ_ADJUST, policy); >> >> But, I think that if devfreq core sends the 'struct devfreq_freq_limits' >> instance instead of 'struct devfreq_policy', it is enough. >> Because receiver of DEVFREQ_ADJUST just will use the min_freq/max_freq variables. >> >> So, I tried to find the cpufreq's case. The some device drivers using >> CPUFREQ_POLICY_NOTIFIER uses following variables of 'struct cpufreq_policy'. >> It means that receiver of CPUFREQ_POLICY_NOTIFIER don't need to other >> information/variables except for min/max frequency. >> >> - policy->min >> - policy->max >> - policy->cpuinfo.max_freq >> - policy->cpuinfo.min_freq >> - policy->cpu : not related to devfreq) >> - policy->related_cpus : not related to devfreq) >> >> - list of device drivers using CPUFREQ_POLICY_NOTIFIER (linux kernel is v4.18-rc1) >> $ grep -rn "CPUFREQ_POLICY_NOTIFIER" . >> ./drivers/macintosh/windfarm_cpufreq_clamp.c >> ./drivers/thermal/cpu_cooling.c >> ./drivers/thermal/cpu_cooling.c >> ./drivers/acpi/processor_thermal.c >> ./drivers/acpi/processor_thermal.c >> ./drivers/acpi/processor_perflib.c >> ./drivers/acpi/processor_perflib.c >> ./drivers/base/arch_topology.c >> ./drivers/base/arch_topology.c >> ./drivers/video/fbdev/sa1100fb.c >> ./drivers/video/fbdev/pxafb.c >> ./drivers/cpufreq/ppc_cbe_cpufreq_pmi.c >> ./drivers/cpufreq/cpufreq.c >> ./drivers/cpufreq/cpufreq.c >> ./drivers/cpufreq/cpufreq.c >> ./drivers/cpufreq/cpufreq.c > > Thanks for your investigation. > > I decided to mirror the cpufreq interface for consistency, but I agree > that 'struct devfreq_freq_limits' could be passed instead of the > policy object. I'm fine with changing that. > >> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote: >>> Policy notifiers are called before a frequency change and may narrow >>> the min/max frequency range in devfreq_policy, which is used to adjust >>> the target frequency if it is beyond this range. >>> >>> Also add a few helpers: >>> - devfreq_verify_within_[dev_]limits() >>> - should be used by the notifiers for policy adjustments. >>> - dev_to_devfreq() >>> - lookup a devfreq strict from a device pointer >>> >>> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> >>> Reviewed-by: Brian Norris <briannorris@chromium.org> >>> --- >>> Changes in v5: >>> - none >>> >>> Changes in v4: >>> - Fixed typo in commit message: devfreg => devfreq >>> - added 'Reviewed-by: Brian Norris <briannorris@chromium.org>' tag >>> >>> Changes in v3: >>> - devfreq.h: fixed misspelling of struct devfreq_policy >>> >>> Changes in v2: >>> - performance, powersave and simpleondemand governors don't need changes >>> with "PM / devfreq: Don't adjust to user limits in governors" >>> - formatting fixes >>> --- >>> drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++------- >>> include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++ >>> 2 files changed, 102 insertions(+), 11 deletions(-) >>> >>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c >>> index 21604d6ae2b8..4cbaa7ad1972 100644 >>> --- a/drivers/devfreq/devfreq.c >>> +++ b/drivers/devfreq/devfreq.c >>> @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev) >>> return ERR_PTR(-ENODEV); >>> } >>> >>> +/** >>> + * dev_to_devfreq() - find devfreq struct using device pointer >>> + * @dev: device pointer used to lookup device devfreq. >>> + */ >>> +struct devfreq *dev_to_devfreq(struct device *dev) >>> +{ >>> + struct devfreq *devfreq; >>> + >>> + mutex_lock(&devfreq_list_lock); >>> + devfreq = find_device_devfreq(dev); >>> + mutex_unlock(&devfreq_list_lock); >>> + >>> + return devfreq; >>> +} >>> + >>> static unsigned long find_available_min_freq(struct devfreq *devfreq) >>> { >>> struct dev_pm_opp *opp; >>> @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq) >>> if (!policy->governor) >>> return -EINVAL; >>> >>> + policy->min = policy->devinfo.min_freq; >>> + policy->max = policy->devinfo.max_freq; >> >> Why don't you consider 'policy->user.max/min_freq' as following? >> As I already commented, I think that 'struct devfreq_freq_limits' is enough >> instead of 'struct devfreq_policy'. >> >> ->max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq); >> ->min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq); > > You mean limiting the frequency range with user.min/max before > DEVFREQ_ADJUST instead of adjusting it afterwards? That's fine with > me. > > Thanks > > Matthias > > >
On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > > Hi Chanwoo, > > > > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > > > >> Firstly, > >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >> > >> devfreq already used the OPP interface as default. It means that > >> the outside of 'drivers/devfreq' can disable/enable the frequency > >> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >> drivers disable/enable the specific frequency, the devfreq core > >> consider them. > >> > >> So, devfreq doesn't need to devfreq_verify_within_limits() because > >> already support some interface to change the minimum/maximum frequency > >> of devfreq device. > >> > >> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >> to change the minimum/maximum frequency of cpu. some device driver cannot > >> change the minimum/maximum frequency through OPP interface. > >> > >> But, in case of devfreq subsystem, as I explained already, devfreq support > >> the OPP interface as default way. devfreq subsystem doesn't need to add > >> other way to change the minimum/maximum frequency. > > > > Using the OPP interface exclusively works as long as a > > enabling/disabling of OPPs is limited to a single driver > > (drivers/thermal/devfreq_cooling.c). When multiple drivers are > > involved you need a way to resolve conflicts, that's the purpose of > > devfreq_verify_within_limits(). Please let me know if there are > > existing mechanisms for conflict resolution that I overlooked. > > > > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > > devfreq_verify_within_limits() instead of the OPP interface if > > desired, however this seems beyond the scope of this series. > > Actually, if we uses this approach, it doesn't support the multiple drivers too. > If non throttler drivers uses devfreq_verify_within_limits(), the conflict > happen. As long as drivers limit the max freq there is no conflict, the lowest max freq wins. I expect this to be the usual case, apparently it worked for cpufreq for 10+ years. However it is correct that there would be a conflict if a driver requests a min freq that is higher than the max freq requested by another. In this case devfreq_verify_within_limits() resolves the conflict by raising p->max to the min freq. Not sure if this is something that would ever occur in practice though. If we are really concerned about this case it would also be an option to limit the adjustment to the max frequency. > To resolve the conflict for multiple device driver, maybe OPP interface > have to support 'usage_count' such as clk_enable/disable(). This would require supporting negative usage count values, since a OPP should not be enabled if e.g. thermal enables it but the throttler disabled it or viceversa. Theoretically there could also be conflicts, like one driver disabling the higher OPPs and another the lower ones, with the outcome of all OPPs being disabled, which would be a more drastic conflict resolution than that of devfreq_verify_within_limits(). Viresh, what do you think about an OPP usage count? Thanks Matthias
On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > > Hi Matthias, > > > > On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > > > Hi Chanwoo, > > > > > > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > > > > > >> Firstly, > > >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > > >> > > >> devfreq already used the OPP interface as default. It means that > > >> the outside of 'drivers/devfreq' can disable/enable the frequency > > >> such as drivers/thermal/devfreq_cooling.c. Also, when some device > > >> drivers disable/enable the specific frequency, the devfreq core > > >> consider them. > > >> > > >> So, devfreq doesn't need to devfreq_verify_within_limits() because > > >> already support some interface to change the minimum/maximum frequency > > >> of devfreq device. > > >> > > >> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > > >> to change the minimum/maximum frequency of cpu. some device driver cannot > > >> change the minimum/maximum frequency through OPP interface. > > >> > > >> But, in case of devfreq subsystem, as I explained already, devfreq support > > >> the OPP interface as default way. devfreq subsystem doesn't need to add > > >> other way to change the minimum/maximum frequency. > > > > > > Using the OPP interface exclusively works as long as a > > > enabling/disabling of OPPs is limited to a single driver > > > (drivers/thermal/devfreq_cooling.c). When multiple drivers are > > > involved you need a way to resolve conflicts, that's the purpose of > > > devfreq_verify_within_limits(). Please let me know if there are > > > existing mechanisms for conflict resolution that I overlooked. > > > > > > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > > > devfreq_verify_within_limits() instead of the OPP interface if > > > desired, however this seems beyond the scope of this series. > > > > Actually, if we uses this approach, it doesn't support the multiple drivers too. > > If non throttler drivers uses devfreq_verify_within_limits(), the conflict > > happen. > > As long as drivers limit the max freq there is no conflict, the lowest > max freq wins. I expect this to be the usual case, apparently it > worked for cpufreq for 10+ years. > > However it is correct that there would be a conflict if a driver > requests a min freq that is higher than the max freq requested by > another. In this case devfreq_verify_within_limits() resolves the > conflict by raising p->max to the min freq. Not sure if this is > something that would ever occur in practice though. > > If we are really concerned about this case it would also be an option > to limit the adjustment to the max frequency. > > > To resolve the conflict for multiple device driver, maybe OPP interface > > have to support 'usage_count' such as clk_enable/disable(). > > This would require supporting negative usage count values, since a OPP > should not be enabled if e.g. thermal enables it but the throttler > disabled it or viceversa. > > Theoretically there could also be conflicts, like one driver disabling > the higher OPPs and another the lower ones, with the outcome of all > OPPs being disabled, which would be a more drastic conflict resolution > than that of devfreq_verify_within_limits(). > > Viresh, what do you think about an OPP usage count? Ping, can we try to reach a conclusion on this or at least keep the discussion going? Not that it matters, but my preferred solution continues to be devfreq_verify_within_limits(). It solves conflicts in some way (which could be adjusted if needed) and has proven to work in practice for 10+ years in a very similar sub-system.
On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>> Hi Matthias, >>> >>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>> Hi Chanwoo, >>>> >>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>> >>>>> Firstly, >>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>> >>>>> devfreq already used the OPP interface as default. It means that >>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>> drivers disable/enable the specific frequency, the devfreq core >>>>> consider them. >>>>> >>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>> already support some interface to change the minimum/maximum frequency >>>>> of devfreq device. >>>>> >>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>> change the minimum/maximum frequency through OPP interface. >>>>> >>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>> other way to change the minimum/maximum frequency. >>>> >>>> Using the OPP interface exclusively works as long as a >>>> enabling/disabling of OPPs is limited to a single driver >>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>> involved you need a way to resolve conflicts, that's the purpose of >>>> devfreq_verify_within_limits(). Please let me know if there are >>>> existing mechanisms for conflict resolution that I overlooked. >>>> >>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>> devfreq_verify_within_limits() instead of the OPP interface if >>>> desired, however this seems beyond the scope of this series. >>> >>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>> happen. >> >> As long as drivers limit the max freq there is no conflict, the lowest >> max freq wins. I expect this to be the usual case, apparently it >> worked for cpufreq for 10+ years. >> >> However it is correct that there would be a conflict if a driver >> requests a min freq that is higher than the max freq requested by >> another. In this case devfreq_verify_within_limits() resolves the >> conflict by raising p->max to the min freq. Not sure if this is >> something that would ever occur in practice though. >> >> If we are really concerned about this case it would also be an option >> to limit the adjustment to the max frequency. >> >>> To resolve the conflict for multiple device driver, maybe OPP interface >>> have to support 'usage_count' such as clk_enable/disable(). >> >> This would require supporting negative usage count values, since a OPP >> should not be enabled if e.g. thermal enables it but the throttler >> disabled it or viceversa. >> >> Theoretically there could also be conflicts, like one driver disabling >> the higher OPPs and another the lower ones, with the outcome of all >> OPPs being disabled, which would be a more drastic conflict resolution >> than that of devfreq_verify_within_limits(). >> >> Viresh, what do you think about an OPP usage count? > > Ping, can we try to reach a conclusion on this or at least keep the > discussion going? > > Not that it matters, but my preferred solution continues to be > devfreq_verify_within_limits(). It solves conflicts in some way (which > could be adjusted if needed) and has proven to work in practice for > 10+ years in a very similar sub-system. It is not true. Current cpufreq subsystem doesn't support external OPP control to enable/disable the OPP entry. If some device driver controls the OPP entry of cpufreq driver with opp_disable/enable(), the operation is not working. Because cpufreq considers the limit through 'cpufreq_verify_with_limits()' only. As I already commented[1], there is different between cpufreq and devfreq. [1] https://lkml.org/lkml/2018/7/4/80 Already, subsystem already used OPP interface in order to control specific OPP entry. I don't want to provide two outside method to control the frequency of devfreq driver. It might make the confusion. I want to use only OPP interface to enable/disable frequency even if we have to modify the OPP interface.
Hi Chanwoo, On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > > On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > >> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > >>> Hi Matthias, > >>> > >>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > >>>> Hi Chanwoo, > >>>> > >>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >>>> > >>>>> Firstly, > >>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >>>>> > >>>>> devfreq already used the OPP interface as default. It means that > >>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > >>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >>>>> drivers disable/enable the specific frequency, the devfreq core > >>>>> consider them. > >>>>> > >>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > >>>>> already support some interface to change the minimum/maximum frequency > >>>>> of devfreq device. > >>>>> > >>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > >>>>> change the minimum/maximum frequency through OPP interface. > >>>>> > >>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > >>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > >>>>> other way to change the minimum/maximum frequency. > >>>> > >>>> Using the OPP interface exclusively works as long as a > >>>> enabling/disabling of OPPs is limited to a single driver > >>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > >>>> involved you need a way to resolve conflicts, that's the purpose of > >>>> devfreq_verify_within_limits(). Please let me know if there are > >>>> existing mechanisms for conflict resolution that I overlooked. > >>>> > >>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > >>>> devfreq_verify_within_limits() instead of the OPP interface if > >>>> desired, however this seems beyond the scope of this series. > >>> > >>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > >>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > >>> happen. > >> > >> As long as drivers limit the max freq there is no conflict, the lowest > >> max freq wins. I expect this to be the usual case, apparently it > >> worked for cpufreq for 10+ years. > >> > >> However it is correct that there would be a conflict if a driver > >> requests a min freq that is higher than the max freq requested by > >> another. In this case devfreq_verify_within_limits() resolves the > >> conflict by raising p->max to the min freq. Not sure if this is > >> something that would ever occur in practice though. > >> > >> If we are really concerned about this case it would also be an option > >> to limit the adjustment to the max frequency. > >> > >>> To resolve the conflict for multiple device driver, maybe OPP interface > >>> have to support 'usage_count' such as clk_enable/disable(). > >> > >> This would require supporting negative usage count values, since a OPP > >> should not be enabled if e.g. thermal enables it but the throttler > >> disabled it or viceversa. > >> > >> Theoretically there could also be conflicts, like one driver disabling > >> the higher OPPs and another the lower ones, with the outcome of all > >> OPPs being disabled, which would be a more drastic conflict resolution > >> than that of devfreq_verify_within_limits(). > >> > >> Viresh, what do you think about an OPP usage count? > > > > Ping, can we try to reach a conclusion on this or at least keep the > > discussion going? > > > > Not that it matters, but my preferred solution continues to be > > devfreq_verify_within_limits(). It solves conflicts in some way (which > > could be adjusted if needed) and has proven to work in practice for > > 10+ years in a very similar sub-system. > > It is not true. Current cpufreq subsystem doesn't support external OPP > control to enable/disable the OPP entry. If some device driver > controls the OPP entry of cpufreq driver with opp_disable/enable(), > the operation is not working. Because cpufreq considers the limit > through 'cpufreq_verify_with_limits()' only. Ok, we can probably agree that using cpufreq_verify_with_limits() exclusively seems to have worked well for cpufreq, and that in their overall purpose cpufreq and devfreq are similar subsystems. The current throttler series with devfreq_verify_within_limits() takes the enabled OPPs into account, the lowest and highest OPP are used as a starting point for the frequency adjustment and (in theory) the frequency range should only be narrowed by devfreq_verify_within_limits(). > As I already commented[1], there is different between cpufreq and devfreq. > [1] https://lkml.org/lkml/2018/7/4/80 > > Already, subsystem already used OPP interface in order to control > specific OPP entry. I don't want to provide two outside method > to control the frequency of devfreq driver. It might make the confusion. I understand your point, it would indeed be preferable to have a single method. However I'm not convinced that the OPP interface is a suitable solution, as I exposed earlier in this thread (quoted below). I would like you to at least consider the possibility of changing drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). Besides that it's not what is currently used, do you see any technical concerns that would make devfreq_verify_within_limits() an unsuitable or inferior solution? > I want to use only OPP interface to enable/disable frequency > even if we have to modify the OPP interface. These are the concerns I raised earlier about a solution with OPP usage counts: "This would require supporting negative usage count values, since a OPP should not be enabled if e.g. thermal enables it but the throttler disabled it or viceversa. Theoretically there could also be conflicts, like one driver disabling the higher OPPs and another the lower ones, with the outcome of all OPPs being disabled, which would be a more drastic conflict resolution than that of devfreq_verify_within_limits()." What do you think about these points? The negative usage counts aren't necessarily a dealbreaker in a technical sense, though I'm not a friend of quirky interfaces that don't behave like a typical user would expect (e.g. an OPP isn't necessarily enabled after dev_pm_opp_enable()). I can sent an RFC with OPP usage counts, though due to the above concerns I have doubts it will be well received. Thanks Matthias
Hi Matthias, On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>> Hi Matthias, >>>>> >>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>> Hi Chanwoo, >>>>>> >>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>> >>>>>>> Firstly, >>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>> >>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>> consider them. >>>>>>> >>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>> of devfreq device. >>>>>>> >>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>> >>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>> other way to change the minimum/maximum frequency. >>>>>> >>>>>> Using the OPP interface exclusively works as long as a >>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>> >>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>> desired, however this seems beyond the scope of this series. >>>>> >>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>> happen. >>>> >>>> As long as drivers limit the max freq there is no conflict, the lowest >>>> max freq wins. I expect this to be the usual case, apparently it >>>> worked for cpufreq for 10+ years. >>>> >>>> However it is correct that there would be a conflict if a driver >>>> requests a min freq that is higher than the max freq requested by >>>> another. In this case devfreq_verify_within_limits() resolves the >>>> conflict by raising p->max to the min freq. Not sure if this is >>>> something that would ever occur in practice though. >>>> >>>> If we are really concerned about this case it would also be an option >>>> to limit the adjustment to the max frequency. >>>> >>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>> have to support 'usage_count' such as clk_enable/disable(). >>>> >>>> This would require supporting negative usage count values, since a OPP >>>> should not be enabled if e.g. thermal enables it but the throttler >>>> disabled it or viceversa. >>>> >>>> Theoretically there could also be conflicts, like one driver disabling >>>> the higher OPPs and another the lower ones, with the outcome of all >>>> OPPs being disabled, which would be a more drastic conflict resolution >>>> than that of devfreq_verify_within_limits(). >>>> >>>> Viresh, what do you think about an OPP usage count? >>> >>> Ping, can we try to reach a conclusion on this or at least keep the >>> discussion going? >>> >>> Not that it matters, but my preferred solution continues to be >>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>> could be adjusted if needed) and has proven to work in practice for >>> 10+ years in a very similar sub-system. >> >> It is not true. Current cpufreq subsystem doesn't support external OPP >> control to enable/disable the OPP entry. If some device driver >> controls the OPP entry of cpufreq driver with opp_disable/enable(), >> the operation is not working. Because cpufreq considers the limit >> through 'cpufreq_verify_with_limits()' only. > > Ok, we can probably agree that using cpufreq_verify_with_limits() > exclusively seems to have worked well for cpufreq, and that in their > overall purpose cpufreq and devfreq are similar subsystems. > > The current throttler series with devfreq_verify_within_limits() takes > the enabled OPPs into account, the lowest and highest OPP are used as > a starting point for the frequency adjustment and (in theory) the > frequency range should only be narrowed by > devfreq_verify_within_limits(). > >> As I already commented[1], there is different between cpufreq and devfreq. >> [1] https://lkml.org/lkml/2018/7/4/80 >> >> Already, subsystem already used OPP interface in order to control >> specific OPP entry. I don't want to provide two outside method >> to control the frequency of devfreq driver. It might make the confusion. > > I understand your point, it would indeed be preferable to have a > single method. However I'm not convinced that the OPP interface is > a suitable solution, as I exposed earlier in this thread (quoted > below). > > I would like you to at least consider the possibility of changing > drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > Besides that it's not what is currently used, do you see any technical > concerns that would make devfreq_verify_within_limits() an unsuitable > or inferior solution? As we already discussed, devfreq_verify_within_limits() doesn't support the multiple outside controllers (e.g., devfreq-cooling.c). After you are suggesting the throttler core, there are at least two outside controllers (e.g., devfreq-cooling.c and throttler driver). As I knew the problem about conflict, I cannot agree the temporary method. OPP interface is mandatory for devfreq in order to control the OPP (frequency/voltage). In this situation, we have to try to find the method through OPP interface. We can refer to regulator/clock. Multiple device driver can use the regulator/clock without any problem. I think that usage of OPP is similiar with regulator/clock. As you mentioned, maybe OPP would handle the negative count. Although opp_enable/opp_disable() have to handle the negative count and opp_enable/opp_disable() can support the multiple usage from device drivers, I think that this approach is right. > >> I want to use only OPP interface to enable/disable frequency >> even if we have to modify the OPP interface. > > These are the concerns I raised earlier about a solution with OPP > usage counts: > > "This would require supporting negative usage count values, since a OPP > should not be enabled if e.g. thermal enables it but the throttler > disabled it or viceversa. Already replied about negative usage count. I think that negative usage count is not problem if this approach could resolve the issue. > > Theoretically there could also be conflicts, like one driver disabling > the higher OPPs and another the lower ones, with the outcome of all > OPPs being disabled, which would be a more drastic conflict resolution > than that of devfreq_verify_within_limits()." > > What do you think about these points? It depends on how to use OPP interface on multiple device driver. Even if devfreq/opp provides the control method, outside device driver are misusing them. It is problem of user. Instead, if we use the OPP interface, we can check why OPP entry is disabled or enabled through usage count. > > The negative usage counts aren't necessarily a dealbreaker in a > technical sense, though I'm not a friend of quirky interfaces that > don't behave like a typical user would expect (e.g. an OPP isn't > necessarily enabled after dev_pm_opp_enable()). > > I can sent an RFC with OPP usage counts, though due to the above > concerns I have doubts it will be well received. Please add me to Cc list. > > Thanks > > Matthias > >
Hi Chanwoo, On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > > Hi Chanwoo, > > > > On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > >> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > >>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > >>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > >>>>> Hi Matthias, > >>>>> > >>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > >>>>>> Hi Chanwoo, > >>>>>> > >>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >>>>>> > >>>>>>> Firstly, > >>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >>>>>>> > >>>>>>> devfreq already used the OPP interface as default. It means that > >>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > >>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >>>>>>> drivers disable/enable the specific frequency, the devfreq core > >>>>>>> consider them. > >>>>>>> > >>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > >>>>>>> already support some interface to change the minimum/maximum frequency > >>>>>>> of devfreq device. > >>>>>>> > >>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > >>>>>>> change the minimum/maximum frequency through OPP interface. > >>>>>>> > >>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > >>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > >>>>>>> other way to change the minimum/maximum frequency. > >>>>>> > >>>>>> Using the OPP interface exclusively works as long as a > >>>>>> enabling/disabling of OPPs is limited to a single driver > >>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > >>>>>> involved you need a way to resolve conflicts, that's the purpose of > >>>>>> devfreq_verify_within_limits(). Please let me know if there are > >>>>>> existing mechanisms for conflict resolution that I overlooked. > >>>>>> > >>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > >>>>>> devfreq_verify_within_limits() instead of the OPP interface if > >>>>>> desired, however this seems beyond the scope of this series. > >>>>> > >>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > >>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > >>>>> happen. > >>>> > >>>> As long as drivers limit the max freq there is no conflict, the lowest > >>>> max freq wins. I expect this to be the usual case, apparently it > >>>> worked for cpufreq for 10+ years. > >>>> > >>>> However it is correct that there would be a conflict if a driver > >>>> requests a min freq that is higher than the max freq requested by > >>>> another. In this case devfreq_verify_within_limits() resolves the > >>>> conflict by raising p->max to the min freq. Not sure if this is > >>>> something that would ever occur in practice though. > >>>> > >>>> If we are really concerned about this case it would also be an option > >>>> to limit the adjustment to the max frequency. > >>>> > >>>>> To resolve the conflict for multiple device driver, maybe OPP interface > >>>>> have to support 'usage_count' such as clk_enable/disable(). > >>>> > >>>> This would require supporting negative usage count values, since a OPP > >>>> should not be enabled if e.g. thermal enables it but the throttler > >>>> disabled it or viceversa. > >>>> > >>>> Theoretically there could also be conflicts, like one driver disabling > >>>> the higher OPPs and another the lower ones, with the outcome of all > >>>> OPPs being disabled, which would be a more drastic conflict resolution > >>>> than that of devfreq_verify_within_limits(). > >>>> > >>>> Viresh, what do you think about an OPP usage count? > >>> > >>> Ping, can we try to reach a conclusion on this or at least keep the > >>> discussion going? > >>> > >>> Not that it matters, but my preferred solution continues to be > >>> devfreq_verify_within_limits(). It solves conflicts in some way (which > >>> could be adjusted if needed) and has proven to work in practice for > >>> 10+ years in a very similar sub-system. > >> > >> It is not true. Current cpufreq subsystem doesn't support external OPP > >> control to enable/disable the OPP entry. If some device driver > >> controls the OPP entry of cpufreq driver with opp_disable/enable(), > >> the operation is not working. Because cpufreq considers the limit > >> through 'cpufreq_verify_with_limits()' only. > > > > Ok, we can probably agree that using cpufreq_verify_with_limits() > > exclusively seems to have worked well for cpufreq, and that in their > > overall purpose cpufreq and devfreq are similar subsystems. > > > > The current throttler series with devfreq_verify_within_limits() takes > > the enabled OPPs into account, the lowest and highest OPP are used as > > a starting point for the frequency adjustment and (in theory) the > > frequency range should only be narrowed by > > devfreq_verify_within_limits(). > > > >> As I already commented[1], there is different between cpufreq and devfreq. > >> [1] https://lkml.org/lkml/2018/7/4/80 > >> > >> Already, subsystem already used OPP interface in order to control > >> specific OPP entry. I don't want to provide two outside method > >> to control the frequency of devfreq driver. It might make the confusion. > > > > I understand your point, it would indeed be preferable to have a > > single method. However I'm not convinced that the OPP interface is > > a suitable solution, as I exposed earlier in this thread (quoted > > below). > > > > I would like you to at least consider the possibility of changing > > drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > > Besides that it's not what is currently used, do you see any technical > > concerns that would make devfreq_verify_within_limits() an unsuitable > > or inferior solution? > > As we already discussed, devfreq_verify_within_limits() doesn't support > the multiple outside controllers (e.g., devfreq-cooling.c). That's incorrect, its purpose is precisely that. Are you suggesting that cpufreq with its use of cpufreq_verify_within_limits() (the inspiration for devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c and other drivers when receiving a CPUFREQ_ADJUST event, essentially what I am proposing with DEVFREQ_ADJUST. Could you elaborate why this model wouldn't work for devfreq? "OPP interface is mandatory for devfreq" isn't really a technical argument, is it mandatory for any other reason than that it is the interface that is currently used? > After you are suggesting the throttler core, there are at least two > outside controllers (e.g., devfreq-cooling.c and throttler driver). > As I knew the problem about conflict, I cannot agree the temporary > method. OPP interface is mandatory for devfreq in order to control > the OPP (frequency/voltage). In this situation, we have to try to > find the method through OPP interface. What do you mean with "temporary method"? We can try to find a method through the OPP interface, but at this point I'm not convinced that it is technically necessary or even preferable. Another inconvenient of the OPP approach for both devfreq-cooling.c and the throttler is that they have to bother with disabling all OPPs above/below the max/min (they don't/shouldn't have to care), instead of just telling devfreq the max/min. > We can refer to regulator/clock. Multiple device driver can use > the regulator/clock without any problem. I think that usage of OPP > is similiar with regulator/clock. As you mentioned, maybe OPP > would handle the negative count. Although opp_enable/opp_disable() > have to handle the negative count and opp_enable/opp_disable() > can support the multiple usage from device drivers, I think that > this approach is right. The regulator/clock approach with the typical usage counts seems more intuitive to me, personally I wouldn't write an interface with negative usage count if I could reasonably avoid it. > >> I want to use only OPP interface to enable/disable frequency > >> even if we have to modify the OPP interface. > > > > These are the concerns I raised earlier about a solution with OPP > > usage counts: > > > > "This would require supporting negative usage count values, since a OPP > > should not be enabled if e.g. thermal enables it but the throttler > > disabled it or viceversa. > > Already replied about negative usage count. I think that negative usage count > is not problem if this approach could resolve the issue. > > > > > Theoretically there could also be conflicts, like one driver disabling > > the higher OPPs and another the lower ones, with the outcome of all > > OPPs being disabled, which would be a more drastic conflict resolution > > than that of devfreq_verify_within_limits()." > > > > What do you think about these points? > > It depends on how to use OPP interface on multiple device driver. > Even if devfreq/opp provides the control method, outside device driver > are misusing them. It is problem of user. I wouldn't call it misusing if two independent drivers take contradictory actions on an interface that doesn't provide arbitration. How can driver A know that it shouldn't disable OPPs a, b and c because driver B disabled d, e and f? Who is misusing the interface, driver A or driver B? > Instead, if we use the OPP interface, we can check why OPP entry > is disabled or enabled through usage count. > > > > > The negative usage counts aren't necessarily a dealbreaker in a > > technical sense, though I'm not a friend of quirky interfaces that > > don't behave like a typical user would expect (e.g. an OPP isn't > > necessarily enabled after dev_pm_opp_enable()). > > > > I can sent an RFC with OPP usage counts, though due to the above > > concerns I have doubts it will be well received. > > Please add me to Cc list. Will do Thanks Matthias
On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: > > Hi Matthias, > > > > On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > > > Hi Chanwoo, > > > > > > On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > > >> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > > >>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > > >>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > > >>>>> Hi Matthias, > > >>>>> > > >>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > > >>>>>> Hi Chanwoo, > > >>>>>> > > >>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > > >>>>>> > > >>>>>>> Firstly, > > >>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > > >>>>>>> > > >>>>>>> devfreq already used the OPP interface as default. It means that > > >>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > > >>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > > >>>>>>> drivers disable/enable the specific frequency, the devfreq core > > >>>>>>> consider them. > > >>>>>>> > > >>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > > >>>>>>> already support some interface to change the minimum/maximum frequency > > >>>>>>> of devfreq device. > > >>>>>>> > > >>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > > >>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > > >>>>>>> change the minimum/maximum frequency through OPP interface. > > >>>>>>> > > >>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > > >>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > > >>>>>>> other way to change the minimum/maximum frequency. > > >>>>>> > > >>>>>> Using the OPP interface exclusively works as long as a > > >>>>>> enabling/disabling of OPPs is limited to a single driver > > >>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > > >>>>>> involved you need a way to resolve conflicts, that's the purpose of > > >>>>>> devfreq_verify_within_limits(). Please let me know if there are > > >>>>>> existing mechanisms for conflict resolution that I overlooked. > > >>>>>> > > >>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > > >>>>>> devfreq_verify_within_limits() instead of the OPP interface if > > >>>>>> desired, however this seems beyond the scope of this series. > > >>>>> > > >>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > > >>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > > >>>>> happen. > > >>>> > > >>>> As long as drivers limit the max freq there is no conflict, the lowest > > >>>> max freq wins. I expect this to be the usual case, apparently it > > >>>> worked for cpufreq for 10+ years. > > >>>> > > >>>> However it is correct that there would be a conflict if a driver > > >>>> requests a min freq that is higher than the max freq requested by > > >>>> another. In this case devfreq_verify_within_limits() resolves the > > >>>> conflict by raising p->max to the min freq. Not sure if this is > > >>>> something that would ever occur in practice though. > > >>>> > > >>>> If we are really concerned about this case it would also be an option > > >>>> to limit the adjustment to the max frequency. > > >>>> > > >>>>> To resolve the conflict for multiple device driver, maybe OPP interface > > >>>>> have to support 'usage_count' such as clk_enable/disable(). > > >>>> > > >>>> This would require supporting negative usage count values, since a OPP > > >>>> should not be enabled if e.g. thermal enables it but the throttler > > >>>> disabled it or viceversa. > > >>>> > > >>>> Theoretically there could also be conflicts, like one driver disabling > > >>>> the higher OPPs and another the lower ones, with the outcome of all > > >>>> OPPs being disabled, which would be a more drastic conflict resolution > > >>>> than that of devfreq_verify_within_limits(). > > >>>> > > >>>> Viresh, what do you think about an OPP usage count? > > >>> > > >>> Ping, can we try to reach a conclusion on this or at least keep the > > >>> discussion going? > > >>> > > >>> Not that it matters, but my preferred solution continues to be > > >>> devfreq_verify_within_limits(). It solves conflicts in some way (which > > >>> could be adjusted if needed) and has proven to work in practice for > > >>> 10+ years in a very similar sub-system. > > >> > > >> It is not true. Current cpufreq subsystem doesn't support external OPP > > >> control to enable/disable the OPP entry. If some device driver > > >> controls the OPP entry of cpufreq driver with opp_disable/enable(), > > >> the operation is not working. Because cpufreq considers the limit > > >> through 'cpufreq_verify_with_limits()' only. > > > > > > Ok, we can probably agree that using cpufreq_verify_with_limits() > > > exclusively seems to have worked well for cpufreq, and that in their > > > overall purpose cpufreq and devfreq are similar subsystems. > > > > > > The current throttler series with devfreq_verify_within_limits() takes > > > the enabled OPPs into account, the lowest and highest OPP are used as > > > a starting point for the frequency adjustment and (in theory) the > > > frequency range should only be narrowed by > > > devfreq_verify_within_limits(). > > > > > >> As I already commented[1], there is different between cpufreq and devfreq. > > >> [1] https://lkml.org/lkml/2018/7/4/80 > > >> > > >> Already, subsystem already used OPP interface in order to control > > >> specific OPP entry. I don't want to provide two outside method > > >> to control the frequency of devfreq driver. It might make the confusion. > > > > > > I understand your point, it would indeed be preferable to have a > > > single method. However I'm not convinced that the OPP interface is > > > a suitable solution, as I exposed earlier in this thread (quoted > > > below). > > > > > > I would like you to at least consider the possibility of changing > > > drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > > > Besides that it's not what is currently used, do you see any technical > > > concerns that would make devfreq_verify_within_limits() an unsuitable > > > or inferior solution? > > > > As we already discussed, devfreq_verify_within_limits() doesn't support > > the multiple outside controllers (e.g., devfreq-cooling.c). > > That's incorrect, its purpose is precisely that. > > Are you suggesting that cpufreq with its use of > cpufreq_verify_within_limits() (the inspiration for > devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c > and other drivers when receiving a CPUFREQ_ADJUST event, essentially > what I am proposing with DEVFREQ_ADJUST. > > Could you elaborate why this model wouldn't work for devfreq? "OPP > interface is mandatory for devfreq" isn't really a technical argument, > is it mandatory for any other reason than that it is the interface > that is currently used? > > > After you are suggesting the throttler core, there are at least two > > outside controllers (e.g., devfreq-cooling.c and throttler driver). > > As I knew the problem about conflict, I cannot agree the temporary > > method. OPP interface is mandatory for devfreq in order to control > > the OPP (frequency/voltage). In this situation, we have to try to > > find the method through OPP interface. > > What do you mean with "temporary method"? > > We can try to find a method through the OPP interface, but at this > point I'm not convinced that it is technically necessary or even > preferable. > > Another inconvenient of the OPP approach for both devfreq-cooling.c > and the throttler is that they have to bother with disabling all OPPs > above/below the max/min (they don't/shouldn't have to care), instead > of just telling devfreq the max/min. And a more important one: both drivers now have to keep track which OPPs they enabled/disabled previously, done are the days of a simple dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is possible and not very complex to implement, but is it really the best/a good solution?
Hi Matthias, On 2018년 08월 03일 08:13, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: >> Hi Matthias, >> >> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: >>> Hi Chanwoo, >>> >>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>>>> Hi Matthias, >>>>>>> >>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>>>> Hi Chanwoo, >>>>>>>> >>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>>>> >>>>>>>>> Firstly, >>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>>>> >>>>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>>>> consider them. >>>>>>>>> >>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>>>> of devfreq device. >>>>>>>>> >>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>>>> >>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>>>> other way to change the minimum/maximum frequency. >>>>>>>> >>>>>>>> Using the OPP interface exclusively works as long as a >>>>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>>>> >>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>>>> desired, however this seems beyond the scope of this series. >>>>>>> >>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>>>> happen. >>>>>> >>>>>> As long as drivers limit the max freq there is no conflict, the lowest >>>>>> max freq wins. I expect this to be the usual case, apparently it >>>>>> worked for cpufreq for 10+ years. >>>>>> >>>>>> However it is correct that there would be a conflict if a driver >>>>>> requests a min freq that is higher than the max freq requested by >>>>>> another. In this case devfreq_verify_within_limits() resolves the >>>>>> conflict by raising p->max to the min freq. Not sure if this is >>>>>> something that would ever occur in practice though. >>>>>> >>>>>> If we are really concerned about this case it would also be an option >>>>>> to limit the adjustment to the max frequency. >>>>>> >>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>>>> have to support 'usage_count' such as clk_enable/disable(). >>>>>> >>>>>> This would require supporting negative usage count values, since a OPP >>>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>>> disabled it or viceversa. >>>>>> >>>>>> Theoretically there could also be conflicts, like one driver disabling >>>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>>> than that of devfreq_verify_within_limits(). >>>>>> >>>>>> Viresh, what do you think about an OPP usage count? >>>>> >>>>> Ping, can we try to reach a conclusion on this or at least keep the >>>>> discussion going? >>>>> >>>>> Not that it matters, but my preferred solution continues to be >>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>>>> could be adjusted if needed) and has proven to work in practice for >>>>> 10+ years in a very similar sub-system. >>>> >>>> It is not true. Current cpufreq subsystem doesn't support external OPP >>>> control to enable/disable the OPP entry. If some device driver >>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), >>>> the operation is not working. Because cpufreq considers the limit >>>> through 'cpufreq_verify_with_limits()' only. >>> >>> Ok, we can probably agree that using cpufreq_verify_with_limits() >>> exclusively seems to have worked well for cpufreq, and that in their >>> overall purpose cpufreq and devfreq are similar subsystems. >>> >>> The current throttler series with devfreq_verify_within_limits() takes >>> the enabled OPPs into account, the lowest and highest OPP are used as >>> a starting point for the frequency adjustment and (in theory) the >>> frequency range should only be narrowed by >>> devfreq_verify_within_limits(). >>> >>>> As I already commented[1], there is different between cpufreq and devfreq. >>>> [1] https://lkml.org/lkml/2018/7/4/80 >>>> >>>> Already, subsystem already used OPP interface in order to control >>>> specific OPP entry. I don't want to provide two outside method >>>> to control the frequency of devfreq driver. It might make the confusion. >>> >>> I understand your point, it would indeed be preferable to have a >>> single method. However I'm not convinced that the OPP interface is >>> a suitable solution, as I exposed earlier in this thread (quoted >>> below). >>> >>> I would like you to at least consider the possibility of changing >>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). >>> Besides that it's not what is currently used, do you see any technical >>> concerns that would make devfreq_verify_within_limits() an unsuitable >>> or inferior solution? >> >> As we already discussed, devfreq_verify_within_limits() doesn't support >> the multiple outside controllers (e.g., devfreq-cooling.c). > > That's incorrect, its purpose is precisely that. > > Are you suggesting that cpufreq with its use of > cpufreq_verify_within_limits() (the inspiration for > devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c > and other drivers when receiving a CPUFREQ_ADJUST event, essentially > what I am proposing with DEVFREQ_ADJUST. > > Could you elaborate why this model wouldn't work for devfreq? "OPP I don't mention that this model is not working. As I already commented[1], devfreq used OPP interface to control OPP entry on outside of devfreq driver. Because devfreq used OPP interface, I hope to provide only OPP method to control the frequency on outside of devfreq. [1] https://lkml.org/lkml/2018/7/4/80 > interface is mandatory for devfreq" isn't really a technical argument, > is it mandatory for any other reason than that it is the interface > that is currently used? In case of controlling the frequency, OPP interface is mandatory for devfreq. cpufreq used cpufreq_verify_within_limit(). If outside driver disable specific OPP entry, cpufreq don't consider them because after getting the frequency from devicetree, cpufreq don't use the OPP interface for disabling/enabling. Only if outside driver used cpufreq_verify_within_limit(), cpufreq consider the range of minimum/maximum frequency. cpufreq core doesn't use 'dev_pm_opp_find_*' function. It means that cpufreq code doesn't consider the statue of opp_diable/enable. devfreq used OPP interface. If outside driver disable specific OPP entry, devfreq consider them. When find available minimum frequency, devfreq used OPP interface. (find_available_min_freq) When find available maximum frequency, devfreq used OPP interface. (find_available_max_freq) When make freq_table of devfreq device, devfreq used OPP interface. (set_freq_table) When outside driver disable or enable OPP entry, devfreq receives the notification from OPP interface and then update the scaling_min_freq/scaling_max_freq by using OPP interface. (devfreq_notifier_call) At this point of using scaling_min_freq/scaling_max_freq on devfreq, it indicates that devfreq used OPP interface because devfref tried to find scaling_min_freq/scaling_max_freq through OPP interface. If outside driver use OPP interface in order to control frequency, devfreq core is well working without any modification of devfreq core. > >> After you are suggesting the throttler core, there are at least two >> outside controllers (e.g., devfreq-cooling.c and throttler driver). >> As I knew the problem about conflict, I cannot agree the temporary >> method. OPP interface is mandatory for devfreq in order to control >> the OPP (frequency/voltage). In this situation, we have to try to >> find the method through OPP interface. > > What do you mean with "temporary method"? this expression might be not proper. Please ignore this expression. > > We can try to find a method through the OPP interface, but at this > point I'm not convinced that it is technically necessary or even > preferable. I replied it about this as following. > > Another inconvenient of the OPP approach for both devfreq-cooling.c > and the throttler is that they have to bother with disabling all OPPs > above/below the max/min (they don't/shouldn't have to care), instead > of just telling devfreq the max/min. I think it doesn't matter. We can enable/disable the OPP entry by traversing. partition_enable_opps() in drivers/thermal/devfreq-cools.c have already done so. > >> We can refer to regulator/clock. Multiple device driver can use >> the regulator/clock without any problem. I think that usage of OPP >> is similiar with regulator/clock. As you mentioned, maybe OPP >> would handle the negative count. Although opp_enable/opp_disable() >> have to handle the negative count and opp_enable/opp_disable() >> can support the multiple usage from device drivers, I think that >> this approach is right. > > The regulator/clock approach with the typical usage counts seems more > intuitive to me, personally I wouldn't write an interface with > negative usage count if I could reasonably avoid it. I think the use of negative usage count is not problem if it's required. > >>>> I want to use only OPP interface to enable/disable frequency >>>> even if we have to modify the OPP interface. >>> >>> These are the concerns I raised earlier about a solution with OPP >>> usage counts: >>> >>> "This would require supporting negative usage count values, since a OPP >>> should not be enabled if e.g. thermal enables it but the throttler >>> disabled it or viceversa. >> >> Already replied about negative usage count. I think that negative usage count >> is not problem if this approach could resolve the issue. >> >>> >>> Theoretically there could also be conflicts, like one driver disabling >>> the higher OPPs and another the lower ones, with the outcome of all >>> OPPs being disabled, which would be a more drastic conflict resolution >>> than that of devfreq_verify_within_limits()." >>> >>> What do you think about these points? >> >> It depends on how to use OPP interface on multiple device driver. >> Even if devfreq/opp provides the control method, outside device driver >> are misusing them. It is problem of user. > > I wouldn't call it misusing if two independent drivers take > contradictory actions on an interface that doesn't provide > arbitration. How can driver A know that it shouldn't disable OPPs a, b > and c because driver B disabled d, e and f? Who is misusing the > interface, driver A or driver B? Each outside driver has their own throttling policy to control OPP entries. They don't care the requirement of other driver and cannot know the requirement of other driver. devfreq core can only recognize them. > >> Instead, if we use the OPP interface, we can check why OPP entry >> is disabled or enabled through usage count. >> >>> >>> The negative usage counts aren't necessarily a dealbreaker in a >>> technical sense, though I'm not a friend of quirky interfaces that >>> don't behave like a typical user would expect (e.g. an OPP isn't >>> necessarily enabled after dev_pm_opp_enable()). >>> >>> I can sent an RFC with OPP usage counts, though due to the above >>> concerns I have doubts it will be well received. >> >> Please add me to Cc list. > > Will do OK. Thanks.
Hi Matthias, On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: > On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: >> Hi Chanwoo, >> >> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: >>> Hi Matthias, >>> >>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: >>>> Hi Chanwoo, >>>> >>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>>>>> Hi Matthias, >>>>>>>> >>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>>>>> Hi Chanwoo, >>>>>>>>> >>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>>>>> >>>>>>>>>> Firstly, >>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>>>>> >>>>>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>>>>> consider them. >>>>>>>>>> >>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>>>>> of devfreq device. >>>>>>>>>> >>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>>>>> >>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>>>>> other way to change the minimum/maximum frequency. >>>>>>>>> >>>>>>>>> Using the OPP interface exclusively works as long as a >>>>>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>>>>> >>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>>>>> desired, however this seems beyond the scope of this series. >>>>>>>> >>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>>>>> happen. >>>>>>> >>>>>>> As long as drivers limit the max freq there is no conflict, the lowest >>>>>>> max freq wins. I expect this to be the usual case, apparently it >>>>>>> worked for cpufreq for 10+ years. >>>>>>> >>>>>>> However it is correct that there would be a conflict if a driver >>>>>>> requests a min freq that is higher than the max freq requested by >>>>>>> another. In this case devfreq_verify_within_limits() resolves the >>>>>>> conflict by raising p->max to the min freq. Not sure if this is >>>>>>> something that would ever occur in practice though. >>>>>>> >>>>>>> If we are really concerned about this case it would also be an option >>>>>>> to limit the adjustment to the max frequency. >>>>>>> >>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>>>>> have to support 'usage_count' such as clk_enable/disable(). >>>>>>> >>>>>>> This would require supporting negative usage count values, since a OPP >>>>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>>>> disabled it or viceversa. >>>>>>> >>>>>>> Theoretically there could also be conflicts, like one driver disabling >>>>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>>>> than that of devfreq_verify_within_limits(). >>>>>>> >>>>>>> Viresh, what do you think about an OPP usage count? >>>>>> >>>>>> Ping, can we try to reach a conclusion on this or at least keep the >>>>>> discussion going? >>>>>> >>>>>> Not that it matters, but my preferred solution continues to be >>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>>>>> could be adjusted if needed) and has proven to work in practice for >>>>>> 10+ years in a very similar sub-system. >>>>> >>>>> It is not true. Current cpufreq subsystem doesn't support external OPP >>>>> control to enable/disable the OPP entry. If some device driver >>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), >>>>> the operation is not working. Because cpufreq considers the limit >>>>> through 'cpufreq_verify_with_limits()' only. >>>> >>>> Ok, we can probably agree that using cpufreq_verify_with_limits() >>>> exclusively seems to have worked well for cpufreq, and that in their >>>> overall purpose cpufreq and devfreq are similar subsystems. >>>> >>>> The current throttler series with devfreq_verify_within_limits() takes >>>> the enabled OPPs into account, the lowest and highest OPP are used as >>>> a starting point for the frequency adjustment and (in theory) the >>>> frequency range should only be narrowed by >>>> devfreq_verify_within_limits(). >>>> >>>>> As I already commented[1], there is different between cpufreq and devfreq. >>>>> [1] https://lkml.org/lkml/2018/7/4/80 >>>>> >>>>> Already, subsystem already used OPP interface in order to control >>>>> specific OPP entry. I don't want to provide two outside method >>>>> to control the frequency of devfreq driver. It might make the confusion. >>>> >>>> I understand your point, it would indeed be preferable to have a >>>> single method. However I'm not convinced that the OPP interface is >>>> a suitable solution, as I exposed earlier in this thread (quoted >>>> below). >>>> >>>> I would like you to at least consider the possibility of changing >>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). >>>> Besides that it's not what is currently used, do you see any technical >>>> concerns that would make devfreq_verify_within_limits() an unsuitable >>>> or inferior solution? >>> >>> As we already discussed, devfreq_verify_within_limits() doesn't support >>> the multiple outside controllers (e.g., devfreq-cooling.c). >> >> That's incorrect, its purpose is precisely that. >> >> Are you suggesting that cpufreq with its use of >> cpufreq_verify_within_limits() (the inspiration for >> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c >> and other drivers when receiving a CPUFREQ_ADJUST event, essentially >> what I am proposing with DEVFREQ_ADJUST. >> >> Could you elaborate why this model wouldn't work for devfreq? "OPP >> interface is mandatory for devfreq" isn't really a technical argument, >> is it mandatory for any other reason than that it is the interface >> that is currently used? >> >>> After you are suggesting the throttler core, there are at least two >>> outside controllers (e.g., devfreq-cooling.c and throttler driver). >>> As I knew the problem about conflict, I cannot agree the temporary >>> method. OPP interface is mandatory for devfreq in order to control >>> the OPP (frequency/voltage). In this situation, we have to try to >>> find the method through OPP interface. >> >> What do you mean with "temporary method"? >> >> We can try to find a method through the OPP interface, but at this >> point I'm not convinced that it is technically necessary or even >> preferable. >> >> Another inconvenient of the OPP approach for both devfreq-cooling.c >> and the throttler is that they have to bother with disabling all OPPs >> above/below the max/min (they don't/shouldn't have to care), instead >> of just telling devfreq the max/min. > > And a more important one: both drivers now have to keep track which > OPPs they enabled/disabled previously, done are the days of a simple > dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is > possible and not very complex to implement, but is it really the > best/a good solution? As I replied them right before, Each outside driver has their own throttling policy to control OPP entries. They don't care the requirement of other driver and cannot know the requirement of other driver. devfreq core can only recognize them and then only consider enabled OPP entris without disabled OPP entries. For example1, | devfreq-cooling| throttler --------------------------------------- 500Mhz | disabled | disabled 400Mhz | disabled | disabled 300Mhz | | disabled 200Mhz | | 100Mhz | | => devfreq driver can use only 100/200Mhz For example2, | devfreq-cooling| throttler --------------------------------------- 500Mhz | disabled | disabled 400Mhz | disabled | 300Mhz | disabled | 200Mhz | | 100Mhz | | => devfreq driver can use only 100/200Mhz For example3, | devfreq-cooling| throttler --------------------------------------- 500Mhz | disabled | disabled 400Mhz | | 300Mhz | | 200Mhz | | disabled 100Mhz | | disabled => devfreq driver can use only 300/400Mhz
Hi Chanwoo, On Fri, Aug 03, 2018 at 08:56:57AM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 08월 03일 08:13, Matthias Kaehlcke wrote: > > Hi Chanwoo, > > > > On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: > >> Hi Matthias, > >> > >> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > >>> Hi Chanwoo, > >>> > >>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > >>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > >>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > >>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > >>>>>>> Hi Matthias, > >>>>>>> > >>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > >>>>>>>> Hi Chanwoo, > >>>>>>>> > >>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >>>>>>>> > >>>>>>>>> Firstly, > >>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >>>>>>>>> > >>>>>>>>> devfreq already used the OPP interface as default. It means that > >>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > >>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >>>>>>>>> drivers disable/enable the specific frequency, the devfreq core > >>>>>>>>> consider them. > >>>>>>>>> > >>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > >>>>>>>>> already support some interface to change the minimum/maximum frequency > >>>>>>>>> of devfreq device. > >>>>>>>>> > >>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > >>>>>>>>> change the minimum/maximum frequency through OPP interface. > >>>>>>>>> > >>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > >>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > >>>>>>>>> other way to change the minimum/maximum frequency. > >>>>>>>> > >>>>>>>> Using the OPP interface exclusively works as long as a > >>>>>>>> enabling/disabling of OPPs is limited to a single driver > >>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > >>>>>>>> involved you need a way to resolve conflicts, that's the purpose of > >>>>>>>> devfreq_verify_within_limits(). Please let me know if there are > >>>>>>>> existing mechanisms for conflict resolution that I overlooked. > >>>>>>>> > >>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > >>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if > >>>>>>>> desired, however this seems beyond the scope of this series. > >>>>>>> > >>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > >>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > >>>>>>> happen. > >>>>>> > >>>>>> As long as drivers limit the max freq there is no conflict, the lowest > >>>>>> max freq wins. I expect this to be the usual case, apparently it > >>>>>> worked for cpufreq for 10+ years. > >>>>>> > >>>>>> However it is correct that there would be a conflict if a driver > >>>>>> requests a min freq that is higher than the max freq requested by > >>>>>> another. In this case devfreq_verify_within_limits() resolves the > >>>>>> conflict by raising p->max to the min freq. Not sure if this is > >>>>>> something that would ever occur in practice though. > >>>>>> > >>>>>> If we are really concerned about this case it would also be an option > >>>>>> to limit the adjustment to the max frequency. > >>>>>> > >>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface > >>>>>>> have to support 'usage_count' such as clk_enable/disable(). > >>>>>> > >>>>>> This would require supporting negative usage count values, since a OPP > >>>>>> should not be enabled if e.g. thermal enables it but the throttler > >>>>>> disabled it or viceversa. > >>>>>> > >>>>>> Theoretically there could also be conflicts, like one driver disabling > >>>>>> the higher OPPs and another the lower ones, with the outcome of all > >>>>>> OPPs being disabled, which would be a more drastic conflict resolution > >>>>>> than that of devfreq_verify_within_limits(). > >>>>>> > >>>>>> Viresh, what do you think about an OPP usage count? > >>>>> > >>>>> Ping, can we try to reach a conclusion on this or at least keep the > >>>>> discussion going? > >>>>> > >>>>> Not that it matters, but my preferred solution continues to be > >>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which > >>>>> could be adjusted if needed) and has proven to work in practice for > >>>>> 10+ years in a very similar sub-system. > >>>> > >>>> It is not true. Current cpufreq subsystem doesn't support external OPP > >>>> control to enable/disable the OPP entry. If some device driver > >>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), > >>>> the operation is not working. Because cpufreq considers the limit > >>>> through 'cpufreq_verify_with_limits()' only. > >>> > >>> Ok, we can probably agree that using cpufreq_verify_with_limits() > >>> exclusively seems to have worked well for cpufreq, and that in their > >>> overall purpose cpufreq and devfreq are similar subsystems. > >>> > >>> The current throttler series with devfreq_verify_within_limits() takes > >>> the enabled OPPs into account, the lowest and highest OPP are used as > >>> a starting point for the frequency adjustment and (in theory) the > >>> frequency range should only be narrowed by > >>> devfreq_verify_within_limits(). > >>> > >>>> As I already commented[1], there is different between cpufreq and devfreq. > >>>> [1] https://lkml.org/lkml/2018/7/4/80 > >>>> > >>>> Already, subsystem already used OPP interface in order to control > >>>> specific OPP entry. I don't want to provide two outside method > >>>> to control the frequency of devfreq driver. It might make the confusion. > >>> > >>> I understand your point, it would indeed be preferable to have a > >>> single method. However I'm not convinced that the OPP interface is > >>> a suitable solution, as I exposed earlier in this thread (quoted > >>> below). > >>> > >>> I would like you to at least consider the possibility of changing > >>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > >>> Besides that it's not what is currently used, do you see any technical > >>> concerns that would make devfreq_verify_within_limits() an unsuitable > >>> or inferior solution? > >> > >> As we already discussed, devfreq_verify_within_limits() doesn't support > >> the multiple outside controllers (e.g., devfreq-cooling.c). > > > > That's incorrect, its purpose is precisely that. > > > > Are you suggesting that cpufreq with its use of > > cpufreq_verify_within_limits() (the inspiration for > > devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c > > and other drivers when receiving a CPUFREQ_ADJUST event, essentially > > what I am proposing with DEVFREQ_ADJUST. > > > > Could you elaborate why this model wouldn't work for devfreq? "OPP > > I don't mention that this model is not working. As I already commented[1], > devfreq used OPP interface to control OPP entry on outside of devfreq driver. > Because devfreq used OPP interface, I hope to provide only OPP method > to control the frequency on outside of devfreq. > [1] https://lkml.org/lkml/2018/7/4/80 > > > interface is mandatory for devfreq" isn't really a technical argument, > > is it mandatory for any other reason than that it is the interface > > that is currently used? > > In case of controlling the frequency, OPP interface is mandatory for devfreq. > > cpufreq used cpufreq_verify_within_limit(). If outside driver disable > specific OPP entry, cpufreq don't consider them because after getting the frequency > from devicetree, cpufreq don't use the OPP interface for disabling/enabling. > Only if outside driver used cpufreq_verify_within_limit(), cpufreq consider > the range of minimum/maximum frequency. cpufreq core doesn't use 'dev_pm_opp_find_*' > function. It means that cpufreq code doesn't consider the statue of opp_diable/enable. > > devfreq used OPP interface. If outside driver disable specific OPP entry, devfreq consider them. What exactly is this 'outside driver' you are referring? The driver that 'owns' the devfreq device, e.g. a GPU driver? Or just any non-devfreq driver, like devfreq-cooling.c? If it's the first case then this isn't currently working as intended when the devfreq device is used as a cooling device, since the cooling device would overwrite the state set by the 'owner' in partition_enable_opps(). > When find available minimum frequency, devfreq used OPP interface. (find_available_min_freq) > When find available maximum frequency, devfreq used OPP interface. (find_available_max_freq) > When make freq_table of devfreq device, devfreq used OPP interface. (set_freq_table) > When outside driver disable or enable OPP entry, devfreq receives the notification > from OPP interface and then update the scaling_min_freq/scaling_max_freq by using > OPP interface. (devfreq_notifier_call) > At this point of using scaling_min_freq/scaling_max_freq on devfreq, it indicates > that devfreq used OPP interface because devfref tried to find scaling_min_freq/scaling_max_freq > through OPP interface. > > If outside driver use OPP interface in order to control frequency, > devfreq core is well working without any modification of devfreq > core. Thanks for elaborating! I understand that this is how it currently works, but unless I'm missing something about the outside driver disabling an OPP I still essentially read this as 'the OPP interface is mandatory because it's what is currently used by the devfreq core to limit the frequency range', rather than that using the OPP interface allows to provide a particular feature or is inherently better in some other way. I don't propose to completely strip the OPP interface out of devfreq, but mainly to switch devfreq-cooling.c to devfreq_verify_within_limits() to avoid having two mechanisms for limiting the frequency range. Besides being simpler this would allow to support the case where the 'owner' disables a certain OPP and devfreq respects that. The code required in the devfreq core to support this would be minimal (this patch). > >> After you are suggesting the throttler core, there are at least two > >> outside controllers (e.g., devfreq-cooling.c and throttler driver). > >> As I knew the problem about conflict, I cannot agree the temporary > >> method. OPP interface is mandatory for devfreq in order to control > >> the OPP (frequency/voltage). In this situation, we have to try to > >> find the method through OPP interface. > > > > What do you mean with "temporary method"? > > this expression might be not proper. Please ignore this expression. > > > > > We can try to find a method through the OPP interface, but at this > > point I'm not convinced that it is technically necessary or even > > preferable. > > I replied it about this as following. > > > > > Another inconvenient of the OPP approach for both devfreq-cooling.c > > and the throttler is that they have to bother with disabling all OPPs > > above/below the max/min (they don't/shouldn't have to care), instead > > of just telling devfreq the max/min. > > I think it doesn't matter. We can enable/disable the OPP entry by traversing. > partition_enable_opps() in drivers/thermal/devfreq-cools.c have already done so. > > > > >> We can refer to regulator/clock. Multiple device driver can use > >> the regulator/clock without any problem. I think that usage of OPP > >> is similiar with regulator/clock. As you mentioned, maybe OPP > >> would handle the negative count. Although opp_enable/opp_disable() > >> have to handle the negative count and opp_enable/opp_disable() > >> can support the multiple usage from device drivers, I think that > >> this approach is right. > > > > The regulator/clock approach with the typical usage counts seems more > > intuitive to me, personally I wouldn't write an interface with > > negative usage count if I could reasonably avoid it. > > I think the use of negative usage count is not problem if it's required. > > > > >>>> I want to use only OPP interface to enable/disable frequency > >>>> even if we have to modify the OPP interface. > >>> > >>> These are the concerns I raised earlier about a solution with OPP > >>> usage counts: > >>> > >>> "This would require supporting negative usage count values, since a OPP > >>> should not be enabled if e.g. thermal enables it but the throttler > >>> disabled it or viceversa. > >> > >> Already replied about negative usage count. I think that negative usage count > >> is not problem if this approach could resolve the issue. > >> > >>> > >>> Theoretically there could also be conflicts, like one driver disabling > >>> the higher OPPs and another the lower ones, with the outcome of all > >>> OPPs being disabled, which would be a more drastic conflict resolution > >>> than that of devfreq_verify_within_limits()." > >>> > >>> What do you think about these points? > >> > >> It depends on how to use OPP interface on multiple device driver. > >> Even if devfreq/opp provides the control method, outside device driver > >> are misusing them. It is problem of user. > > > > I wouldn't call it misusing if two independent drivers take > > contradictory actions on an interface that doesn't provide > > arbitration. How can driver A know that it shouldn't disable OPPs a, b > > and c because driver B disabled d, e and f? Who is misusing the > > interface, driver A or driver B? > > Each outside driver has their own throttling policy to control OPP entries. > They don't care the requirement of other driver and cannot know the requirement > of other driver. devfreq core can only recognize them. > > > > >> Instead, if we use the OPP interface, we can check why OPP entry > >> is disabled or enabled through usage count. > >> > >>> > >>> The negative usage counts aren't necessarily a dealbreaker in a > >>> technical sense, though I'm not a friend of quirky interfaces that > >>> don't behave like a typical user would expect (e.g. an OPP isn't > >>> necessarily enabled after dev_pm_opp_enable()). > >>> > >>> I can sent an RFC with OPP usage counts, though due to the above > >>> concerns I have doubts it will be well received. > >> > >> Please add me to Cc list. > > > > Will do > > OK. Thanks. This might take a bit for a few reasons. Before posting anything I would like to experiment a bit with it and find time to do so between other tasks (admittedly I'm also procrastinating a bit, because I'm unconvinced). And I will be out of office for two weeks starting nextweek, it's probably not the best to post and then disapear from the discussion. I might post the RFC if I can advance it in the next 48 hours, otherwise I think it is better to delay until I'm back from vacation. Cheers Matthias
Hi Chanwoo, On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: > > On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: > >> Hi Chanwoo, > >> > >> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: > >>> Hi Matthias, > >>> > >>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > >>>> Hi Chanwoo, > >>>> > >>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > >>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > >>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > >>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > >>>>>>>> Hi Matthias, > >>>>>>>> > >>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > >>>>>>>>> Hi Chanwoo, > >>>>>>>>> > >>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >>>>>>>>> > >>>>>>>>>> Firstly, > >>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >>>>>>>>>> > >>>>>>>>>> devfreq already used the OPP interface as default. It means that > >>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > >>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core > >>>>>>>>>> consider them. > >>>>>>>>>> > >>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > >>>>>>>>>> already support some interface to change the minimum/maximum frequency > >>>>>>>>>> of devfreq device. > >>>>>>>>>> > >>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > >>>>>>>>>> change the minimum/maximum frequency through OPP interface. > >>>>>>>>>> > >>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > >>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > >>>>>>>>>> other way to change the minimum/maximum frequency. > >>>>>>>>> > >>>>>>>>> Using the OPP interface exclusively works as long as a > >>>>>>>>> enabling/disabling of OPPs is limited to a single driver > >>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > >>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of > >>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are > >>>>>>>>> existing mechanisms for conflict resolution that I overlooked. > >>>>>>>>> > >>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > >>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if > >>>>>>>>> desired, however this seems beyond the scope of this series. > >>>>>>>> > >>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > >>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > >>>>>>>> happen. > >>>>>>> > >>>>>>> As long as drivers limit the max freq there is no conflict, the lowest > >>>>>>> max freq wins. I expect this to be the usual case, apparently it > >>>>>>> worked for cpufreq for 10+ years. > >>>>>>> > >>>>>>> However it is correct that there would be a conflict if a driver > >>>>>>> requests a min freq that is higher than the max freq requested by > >>>>>>> another. In this case devfreq_verify_within_limits() resolves the > >>>>>>> conflict by raising p->max to the min freq. Not sure if this is > >>>>>>> something that would ever occur in practice though. > >>>>>>> > >>>>>>> If we are really concerned about this case it would also be an option > >>>>>>> to limit the adjustment to the max frequency. > >>>>>>> > >>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface > >>>>>>>> have to support 'usage_count' such as clk_enable/disable(). > >>>>>>> > >>>>>>> This would require supporting negative usage count values, since a OPP > >>>>>>> should not be enabled if e.g. thermal enables it but the throttler > >>>>>>> disabled it or viceversa. > >>>>>>> > >>>>>>> Theoretically there could also be conflicts, like one driver disabling > >>>>>>> the higher OPPs and another the lower ones, with the outcome of all > >>>>>>> OPPs being disabled, which would be a more drastic conflict resolution > >>>>>>> than that of devfreq_verify_within_limits(). > >>>>>>> > >>>>>>> Viresh, what do you think about an OPP usage count? > >>>>>> > >>>>>> Ping, can we try to reach a conclusion on this or at least keep the > >>>>>> discussion going? > >>>>>> > >>>>>> Not that it matters, but my preferred solution continues to be > >>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which > >>>>>> could be adjusted if needed) and has proven to work in practice for > >>>>>> 10+ years in a very similar sub-system. > >>>>> > >>>>> It is not true. Current cpufreq subsystem doesn't support external OPP > >>>>> control to enable/disable the OPP entry. If some device driver > >>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), > >>>>> the operation is not working. Because cpufreq considers the limit > >>>>> through 'cpufreq_verify_with_limits()' only. > >>>> > >>>> Ok, we can probably agree that using cpufreq_verify_with_limits() > >>>> exclusively seems to have worked well for cpufreq, and that in their > >>>> overall purpose cpufreq and devfreq are similar subsystems. > >>>> > >>>> The current throttler series with devfreq_verify_within_limits() takes > >>>> the enabled OPPs into account, the lowest and highest OPP are used as > >>>> a starting point for the frequency adjustment and (in theory) the > >>>> frequency range should only be narrowed by > >>>> devfreq_verify_within_limits(). > >>>> > >>>>> As I already commented[1], there is different between cpufreq and devfreq. > >>>>> [1] https://lkml.org/lkml/2018/7/4/80 > >>>>> > >>>>> Already, subsystem already used OPP interface in order to control > >>>>> specific OPP entry. I don't want to provide two outside method > >>>>> to control the frequency of devfreq driver. It might make the confusion. > >>>> > >>>> I understand your point, it would indeed be preferable to have a > >>>> single method. However I'm not convinced that the OPP interface is > >>>> a suitable solution, as I exposed earlier in this thread (quoted > >>>> below). > >>>> > >>>> I would like you to at least consider the possibility of changing > >>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > >>>> Besides that it's not what is currently used, do you see any technical > >>>> concerns that would make devfreq_verify_within_limits() an unsuitable > >>>> or inferior solution? > >>> > >>> As we already discussed, devfreq_verify_within_limits() doesn't support > >>> the multiple outside controllers (e.g., devfreq-cooling.c). > >> > >> That's incorrect, its purpose is precisely that. > >> > >> Are you suggesting that cpufreq with its use of > >> cpufreq_verify_within_limits() (the inspiration for > >> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c > >> and other drivers when receiving a CPUFREQ_ADJUST event, essentially > >> what I am proposing with DEVFREQ_ADJUST. > >> > >> Could you elaborate why this model wouldn't work for devfreq? "OPP > >> interface is mandatory for devfreq" isn't really a technical argument, > >> is it mandatory for any other reason than that it is the interface > >> that is currently used? > >> > >>> After you are suggesting the throttler core, there are at least two > >>> outside controllers (e.g., devfreq-cooling.c and throttler driver). > >>> As I knew the problem about conflict, I cannot agree the temporary > >>> method. OPP interface is mandatory for devfreq in order to control > >>> the OPP (frequency/voltage). In this situation, we have to try to > >>> find the method through OPP interface. > >> > >> What do you mean with "temporary method"? > >> > >> We can try to find a method through the OPP interface, but at this > >> point I'm not convinced that it is technically necessary or even > >> preferable. > >> > >> Another inconvenient of the OPP approach for both devfreq-cooling.c > >> and the throttler is that they have to bother with disabling all OPPs > >> above/below the max/min (they don't/shouldn't have to care), instead > >> of just telling devfreq the max/min. > > > > And a more important one: both drivers now have to keep track which > > OPPs they enabled/disabled previously, done are the days of a simple > > dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is > > possible and not very complex to implement, but is it really the > > best/a good solution? > > > As I replied them right before, Each outside driver has their own throttling > policy to control OPP entries. They don't care the requirement of other > driver and cannot know the requirement of other driver. devfreq core can only > recognize them and then only consider enabled OPP entris without disabled OPP entries. > > For example1, > | devfreq-cooling| throttler > --------------------------------------- > 500Mhz | disabled | disabled > 400Mhz | disabled | disabled > 300Mhz | | disabled > 200Mhz | | > 100Mhz | | > => devfreq driver can use only 100/200Mhz > > > For example2, > | devfreq-cooling| throttler > --------------------------------------- > 500Mhz | disabled | disabled > 400Mhz | disabled | > 300Mhz | disabled | > 200Mhz | | > 100Mhz | | > => devfreq driver can use only 100/200Mhz > > > For example3, > | devfreq-cooling| throttler > --------------------------------------- > 500Mhz | disabled | disabled > 400Mhz | | > 300Mhz | | > 200Mhz | | disabled > 100Mhz | | disabled > => devfreq driver can use only 300/400Mhz These are all cases without conflicts, my concern is about this: > | devfreq-cooling| throttler > --------------------------------------- > 500Mhz | disabled | > 400Mhz | disabled | > 300Mhz | | disabled > 200Mhz | | disabled > 100Mhz | | disabled > => devfreq driver can't use any frequency? Actually my above comment wasn't about this case, but about the added complexity in devfreq-cooling.c and the throttler: A bit simplified partition_enable_opps() currently does this: for_each_opp(opp) { if (opp->freq <= max) opp_enable(opp) else opp_disable(opp) } With the OPP usage/disable count this doesn't work any longer. Now we need to keep track of the enabled/disabled state of the OPP, something like: dev_pm_opp_enable(opp) { if (opp->freq <= max) { if (opp->freq > prev_max) opp_enable(opp) } else { if (opp->freq < prev_max) opp_disable(opp) } } And duplicate the same in the throttler (and other possible drivers). Obviously it can be done, but is there really any gain from it? Instead they just could do: devfreq_verify_within_limits(policy/freq_pair, 0, max_freq) without being concerned about implementation details of devfreq. Thanks Matthias
Hi Matthias, On 2018년 08월 07일 03:46, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Fri, Aug 03, 2018 at 08:56:57AM +0900, Chanwoo Choi wrote: >> Hi Matthias, >> >> On 2018년 08월 03일 08:13, Matthias Kaehlcke wrote: >>> Hi Chanwoo, >>> >>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: >>>> Hi Matthias, >>>> >>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: >>>>> Hi Chanwoo, >>>>> >>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>>>>>> Hi Matthias, >>>>>>>>> >>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>>>>>> Hi Chanwoo, >>>>>>>>>> >>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>>>>>> >>>>>>>>>>> Firstly, >>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>>>>>> >>>>>>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>>>>>> consider them. >>>>>>>>>>> >>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>>>>>> of devfreq device. >>>>>>>>>>> >>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>>>>>> >>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>>>>>> other way to change the minimum/maximum frequency. >>>>>>>>>> >>>>>>>>>> Using the OPP interface exclusively works as long as a >>>>>>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>>>>>> >>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>>>>>> desired, however this seems beyond the scope of this series. >>>>>>>>> >>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>>>>>> happen. >>>>>>>> >>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest >>>>>>>> max freq wins. I expect this to be the usual case, apparently it >>>>>>>> worked for cpufreq for 10+ years. >>>>>>>> >>>>>>>> However it is correct that there would be a conflict if a driver >>>>>>>> requests a min freq that is higher than the max freq requested by >>>>>>>> another. In this case devfreq_verify_within_limits() resolves the >>>>>>>> conflict by raising p->max to the min freq. Not sure if this is >>>>>>>> something that would ever occur in practice though. >>>>>>>> >>>>>>>> If we are really concerned about this case it would also be an option >>>>>>>> to limit the adjustment to the max frequency. >>>>>>>> >>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>>>>>> have to support 'usage_count' such as clk_enable/disable(). >>>>>>>> >>>>>>>> This would require supporting negative usage count values, since a OPP >>>>>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>>>>> disabled it or viceversa. >>>>>>>> >>>>>>>> Theoretically there could also be conflicts, like one driver disabling >>>>>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>>>>> than that of devfreq_verify_within_limits(). >>>>>>>> >>>>>>>> Viresh, what do you think about an OPP usage count? >>>>>>> >>>>>>> Ping, can we try to reach a conclusion on this or at least keep the >>>>>>> discussion going? >>>>>>> >>>>>>> Not that it matters, but my preferred solution continues to be >>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>>>>>> could be adjusted if needed) and has proven to work in practice for >>>>>>> 10+ years in a very similar sub-system. >>>>>> >>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP >>>>>> control to enable/disable the OPP entry. If some device driver >>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), >>>>>> the operation is not working. Because cpufreq considers the limit >>>>>> through 'cpufreq_verify_with_limits()' only. >>>>> >>>>> Ok, we can probably agree that using cpufreq_verify_with_limits() >>>>> exclusively seems to have worked well for cpufreq, and that in their >>>>> overall purpose cpufreq and devfreq are similar subsystems. >>>>> >>>>> The current throttler series with devfreq_verify_within_limits() takes >>>>> the enabled OPPs into account, the lowest and highest OPP are used as >>>>> a starting point for the frequency adjustment and (in theory) the >>>>> frequency range should only be narrowed by >>>>> devfreq_verify_within_limits(). >>>>> >>>>>> As I already commented[1], there is different between cpufreq and devfreq. >>>>>> [1] https://lkml.org/lkml/2018/7/4/80 >>>>>> >>>>>> Already, subsystem already used OPP interface in order to control >>>>>> specific OPP entry. I don't want to provide two outside method >>>>>> to control the frequency of devfreq driver. It might make the confusion. >>>>> >>>>> I understand your point, it would indeed be preferable to have a >>>>> single method. However I'm not convinced that the OPP interface is >>>>> a suitable solution, as I exposed earlier in this thread (quoted >>>>> below). >>>>> >>>>> I would like you to at least consider the possibility of changing >>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). >>>>> Besides that it's not what is currently used, do you see any technical >>>>> concerns that would make devfreq_verify_within_limits() an unsuitable >>>>> or inferior solution? >>>> >>>> As we already discussed, devfreq_verify_within_limits() doesn't support >>>> the multiple outside controllers (e.g., devfreq-cooling.c). >>> >>> That's incorrect, its purpose is precisely that. >>> >>> Are you suggesting that cpufreq with its use of >>> cpufreq_verify_within_limits() (the inspiration for >>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c >>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially >>> what I am proposing with DEVFREQ_ADJUST. >>> >>> Could you elaborate why this model wouldn't work for devfreq? "OPP >> >> I don't mention that this model is not working. As I already commented[1], >> devfreq used OPP interface to control OPP entry on outside of devfreq driver. >> Because devfreq used OPP interface, I hope to provide only OPP method >> to control the frequency on outside of devfreq. >> [1] https://lkml.org/lkml/2018/7/4/80 >> >>> interface is mandatory for devfreq" isn't really a technical argument, >>> is it mandatory for any other reason than that it is the interface >>> that is currently used? >> >> In case of controlling the frequency, OPP interface is mandatory for devfreq. >> >> cpufreq used cpufreq_verify_within_limit(). If outside driver disable >> specific OPP entry, cpufreq don't consider them because after getting the frequency >> from devicetree, cpufreq don't use the OPP interface for disabling/enabling. >> Only if outside driver used cpufreq_verify_within_limit(), cpufreq consider >> the range of minimum/maximum frequency. cpufreq core doesn't use 'dev_pm_opp_find_*' >> function. It means that cpufreq code doesn't consider the statue of opp_diable/enable. >> >> devfreq used OPP interface. If outside driver disable specific OPP entry, devfreq consider them. > > What exactly is this 'outside driver' you are referring? The driver > that 'owns' the devfreq device, e.g. a GPU driver? Or just any > non-devfreq driver, like devfreq-cooling.c? > > If it's the first case then this isn't currently working as intended > when the devfreq device is used as a cooling device, since the cooling > device would overwrite the state set by the 'owner' in > partition_enable_opps(). > >> When find available minimum frequency, devfreq used OPP interface. (find_available_min_freq) >> When find available maximum frequency, devfreq used OPP interface. (find_available_max_freq) >> When make freq_table of devfreq device, devfreq used OPP interface. (set_freq_table) >> When outside driver disable or enable OPP entry, devfreq receives the notification >> from OPP interface and then update the scaling_min_freq/scaling_max_freq by using >> OPP interface. (devfreq_notifier_call) >> At this point of using scaling_min_freq/scaling_max_freq on devfreq, it indicates >> that devfreq used OPP interface because devfref tried to find scaling_min_freq/scaling_max_freq >> through OPP interface. >> >> If outside driver use OPP interface in order to control frequency, >> devfreq core is well working without any modification of devfreq >> core. > > Thanks for elaborating! > > I understand that this is how it currently works, but unless I'm > missing something about the outside driver disabling an OPP I still > essentially read this as 'the OPP interface is mandatory because it's > what is currently used by the devfreq core to limit the frequency > range', rather than that using the OPP interface allows to provide a > particular feature or is inherently better in some other way. > > I don't propose to completely strip the OPP interface out of devfreq, > but mainly to switch devfreq-cooling.c to > devfreq_verify_within_limits() to avoid having two mechanisms for > limiting the frequency range. Besides being simpler this would allow > to support the case where the 'owner' disables a certain OPP and > devfreq respects that. The code required in the devfreq core to > support this would be minimal (this patch). > >>>> After you are suggesting the throttler core, there are at least two >>>> outside controllers (e.g., devfreq-cooling.c and throttler driver). >>>> As I knew the problem about conflict, I cannot agree the temporary >>>> method. OPP interface is mandatory for devfreq in order to control >>>> the OPP (frequency/voltage). In this situation, we have to try to >>>> find the method through OPP interface. >>> >>> What do you mean with "temporary method"? >> >> this expression might be not proper. Please ignore this expression. >> >>> >>> We can try to find a method through the OPP interface, but at this >>> point I'm not convinced that it is technically necessary or even >>> preferable. >> >> I replied it about this as following. >> >>> >>> Another inconvenient of the OPP approach for both devfreq-cooling.c >>> and the throttler is that they have to bother with disabling all OPPs >>> above/below the max/min (they don't/shouldn't have to care), instead >>> of just telling devfreq the max/min. >> >> I think it doesn't matter. We can enable/disable the OPP entry by traversing. >> partition_enable_opps() in drivers/thermal/devfreq-cools.c have already done so. >> >>> >>>> We can refer to regulator/clock. Multiple device driver can use >>>> the regulator/clock without any problem. I think that usage of OPP >>>> is similiar with regulator/clock. As you mentioned, maybe OPP >>>> would handle the negative count. Although opp_enable/opp_disable() >>>> have to handle the negative count and opp_enable/opp_disable() >>>> can support the multiple usage from device drivers, I think that >>>> this approach is right. >>> >>> The regulator/clock approach with the typical usage counts seems more >>> intuitive to me, personally I wouldn't write an interface with >>> negative usage count if I could reasonably avoid it. >> >> I think the use of negative usage count is not problem if it's required. >> >>> >>>>>> I want to use only OPP interface to enable/disable frequency >>>>>> even if we have to modify the OPP interface. >>>>> >>>>> These are the concerns I raised earlier about a solution with OPP >>>>> usage counts: >>>>> >>>>> "This would require supporting negative usage count values, since a OPP >>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>> disabled it or viceversa. >>>> >>>> Already replied about negative usage count. I think that negative usage count >>>> is not problem if this approach could resolve the issue. >>>> >>>>> >>>>> Theoretically there could also be conflicts, like one driver disabling >>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>> than that of devfreq_verify_within_limits()." >>>>> >>>>> What do you think about these points? >>>> >>>> It depends on how to use OPP interface on multiple device driver. >>>> Even if devfreq/opp provides the control method, outside device driver >>>> are misusing them. It is problem of user. >>> >>> I wouldn't call it misusing if two independent drivers take >>> contradictory actions on an interface that doesn't provide >>> arbitration. How can driver A know that it shouldn't disable OPPs a, b >>> and c because driver B disabled d, e and f? Who is misusing the >>> interface, driver A or driver B? >> >> Each outside driver has their own throttling policy to control OPP entries. >> They don't care the requirement of other driver and cannot know the requirement >> of other driver. devfreq core can only recognize them. >> >>> >>>> Instead, if we use the OPP interface, we can check why OPP entry >>>> is disabled or enabled through usage count. >>>> >>>>> >>>>> The negative usage counts aren't necessarily a dealbreaker in a >>>>> technical sense, though I'm not a friend of quirky interfaces that >>>>> don't behave like a typical user would expect (e.g. an OPP isn't >>>>> necessarily enabled after dev_pm_opp_enable()). >>>>> >>>>> I can sent an RFC with OPP usage counts, though due to the above >>>>> concerns I have doubts it will be well received. >>>> >>>> Please add me to Cc list. >>> >>> Will do >> >> OK. Thanks. > > This might take a bit for a few reasons. Before posting anything I > would like to experiment a bit with it and find time to do so between > other tasks (admittedly I'm also procrastinating a bit, because I'm > unconvinced). And I will be out of office for two weeks starting > nextweek, it's probably not the best to post and then disapear from > the discussion. I might post the RFC if I can advance it in the next > 48 hours, otherwise I think it is better to delay until I'm back from > vacation. I agree you better to do this after your vacation.
Hi Matthias, On 2018년 08월 07일 04:21, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote: >> Hi Matthias, >> >> On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: >>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: >>>> Hi Chanwoo, >>>> >>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: >>>>> Hi Matthias, >>>>> >>>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: >>>>>> Hi Chanwoo, >>>>>> >>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >>>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>>>>>>> Hi Matthias, >>>>>>>>>> >>>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>>>>>>> Hi Chanwoo, >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>>>>>>> >>>>>>>>>>>> Firstly, >>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>>>>>>> >>>>>>>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>>>>>>> consider them. >>>>>>>>>>>> >>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>>>>>>> of devfreq device. >>>>>>>>>>>> >>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>>>>>>> >>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>>>>>>> other way to change the minimum/maximum frequency. >>>>>>>>>>> >>>>>>>>>>> Using the OPP interface exclusively works as long as a >>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>>>>>>> >>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>>>>>>> desired, however this seems beyond the scope of this series. >>>>>>>>>> >>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>>>>>>> happen. >>>>>>>>> >>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest >>>>>>>>> max freq wins. I expect this to be the usual case, apparently it >>>>>>>>> worked for cpufreq for 10+ years. >>>>>>>>> >>>>>>>>> However it is correct that there would be a conflict if a driver >>>>>>>>> requests a min freq that is higher than the max freq requested by >>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the >>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is >>>>>>>>> something that would ever occur in practice though. >>>>>>>>> >>>>>>>>> If we are really concerned about this case it would also be an option >>>>>>>>> to limit the adjustment to the max frequency. >>>>>>>>> >>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>>>>>>> have to support 'usage_count' such as clk_enable/disable(). >>>>>>>>> >>>>>>>>> This would require supporting negative usage count values, since a OPP >>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>>>>>> disabled it or viceversa. >>>>>>>>> >>>>>>>>> Theoretically there could also be conflicts, like one driver disabling >>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>>>>>> than that of devfreq_verify_within_limits(). >>>>>>>>> >>>>>>>>> Viresh, what do you think about an OPP usage count? >>>>>>>> >>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the >>>>>>>> discussion going? >>>>>>>> >>>>>>>> Not that it matters, but my preferred solution continues to be >>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>>>>>>> could be adjusted if needed) and has proven to work in practice for >>>>>>>> 10+ years in a very similar sub-system. >>>>>>> >>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP >>>>>>> control to enable/disable the OPP entry. If some device driver >>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), >>>>>>> the operation is not working. Because cpufreq considers the limit >>>>>>> through 'cpufreq_verify_with_limits()' only. >>>>>> >>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits() >>>>>> exclusively seems to have worked well for cpufreq, and that in their >>>>>> overall purpose cpufreq and devfreq are similar subsystems. >>>>>> >>>>>> The current throttler series with devfreq_verify_within_limits() takes >>>>>> the enabled OPPs into account, the lowest and highest OPP are used as >>>>>> a starting point for the frequency adjustment and (in theory) the >>>>>> frequency range should only be narrowed by >>>>>> devfreq_verify_within_limits(). >>>>>> >>>>>>> As I already commented[1], there is different between cpufreq and devfreq. >>>>>>> [1] https://lkml.org/lkml/2018/7/4/80 >>>>>>> >>>>>>> Already, subsystem already used OPP interface in order to control >>>>>>> specific OPP entry. I don't want to provide two outside method >>>>>>> to control the frequency of devfreq driver. It might make the confusion. >>>>>> >>>>>> I understand your point, it would indeed be preferable to have a >>>>>> single method. However I'm not convinced that the OPP interface is >>>>>> a suitable solution, as I exposed earlier in this thread (quoted >>>>>> below). >>>>>> >>>>>> I would like you to at least consider the possibility of changing >>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). >>>>>> Besides that it's not what is currently used, do you see any technical >>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable >>>>>> or inferior solution? >>>>> >>>>> As we already discussed, devfreq_verify_within_limits() doesn't support >>>>> the multiple outside controllers (e.g., devfreq-cooling.c). >>>> >>>> That's incorrect, its purpose is precisely that. >>>> >>>> Are you suggesting that cpufreq with its use of >>>> cpufreq_verify_within_limits() (the inspiration for >>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c >>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially >>>> what I am proposing with DEVFREQ_ADJUST. >>>> >>>> Could you elaborate why this model wouldn't work for devfreq? "OPP >>>> interface is mandatory for devfreq" isn't really a technical argument, >>>> is it mandatory for any other reason than that it is the interface >>>> that is currently used? >>>> >>>>> After you are suggesting the throttler core, there are at least two >>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver). >>>>> As I knew the problem about conflict, I cannot agree the temporary >>>>> method. OPP interface is mandatory for devfreq in order to control >>>>> the OPP (frequency/voltage). In this situation, we have to try to >>>>> find the method through OPP interface. >>>> >>>> What do you mean with "temporary method"? >>>> >>>> We can try to find a method through the OPP interface, but at this >>>> point I'm not convinced that it is technically necessary or even >>>> preferable. >>>> >>>> Another inconvenient of the OPP approach for both devfreq-cooling.c >>>> and the throttler is that they have to bother with disabling all OPPs >>>> above/below the max/min (they don't/shouldn't have to care), instead >>>> of just telling devfreq the max/min. >>> >>> And a more important one: both drivers now have to keep track which >>> OPPs they enabled/disabled previously, done are the days of a simple >>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is >>> possible and not very complex to implement, but is it really the >>> best/a good solution? >> >> >> As I replied them right before, Each outside driver has their own throttling >> policy to control OPP entries. They don't care the requirement of other >> driver and cannot know the requirement of other driver. devfreq core can only >> recognize them and then only consider enabled OPP entris without disabled OPP entries. >> >> For example1, >> | devfreq-cooling| throttler >> --------------------------------------- >> 500Mhz | disabled | disabled >> 400Mhz | disabled | disabled >> 300Mhz | | disabled >> 200Mhz | | >> 100Mhz | | >> => devfreq driver can use only 100/200Mhz >> >> >> For example2, >> | devfreq-cooling| throttler >> --------------------------------------- >> 500Mhz | disabled | disabled >> 400Mhz | disabled | >> 300Mhz | disabled | >> 200Mhz | | >> 100Mhz | | >> => devfreq driver can use only 100/200Mhz >> >> >> For example3, >> | devfreq-cooling| throttler >> --------------------------------------- >> 500Mhz | disabled | disabled >> 400Mhz | | >> 300Mhz | | >> 200Mhz | | disabled >> 100Mhz | | disabled >> => devfreq driver can use only 300/400Mhz > > These are all cases without conflicts, my concern is about this: > >> | devfreq-cooling| throttler >> --------------------------------------- >> 500Mhz | disabled | >> 400Mhz | disabled | >> 300Mhz | | disabled >> 200Mhz | | disabled >> 100Mhz | | disabled >> => devfreq driver can't use any frequency? There are no any enabled frequency. Because device driver (devfreq-cooling, throttler) disable all frequencies. Outside drivers(devfreq-cooling, throttler) can enable/disable specific OPP entries. As I already commented, each outside driver doesn't consider the policy of other device driver about OPP entries. OPP interface is independent on devfreq and just control OPP entries. After that, devfreq just consider the only enabled OPP entries. > > Actually my above comment wasn't about this case, but about the > added complexity in devfreq-cooling.c and the throttler: > > A bit simplified partition_enable_opps() currently does this: > > for_each_opp(opp) { > if (opp->freq <= max) > opp_enable(opp) > else > opp_disable(opp) > } > > With the OPP usage/disable count this doesn't work any longer. Now we > need to keep track of the enabled/disabled state of the OPP, something > like: > > dev_pm_opp_enable(opp) { > if (opp->freq <= max) { > if (opp->freq > prev_max) > opp_enable(opp) > } else { > if (opp->freq < prev_max) > opp_disable(opp) > } > } > > And duplicate the same in the throttler (and other possible > drivers). Obviously it can be done, but is there really any gain > from it? > > Instead they just could do: > > devfreq_verify_within_limits(policy/freq_pair, 0, max_freq) > > without being concerned about implementation details of devfreq. > I don't think so. dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one OPP entry without any other OPP entry. dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other OPP entries. After some driver(devfreq-cooling.c and throttler) enable or disable specific OPP entries, the remaining OPP entry with enabled state will be considered on devfreq driver.
Hi Viresh Kumar, I have a question about dev_pm_opp_enable() and dev_pm_opp_disable(). Two functions have 'available' field to indicate the status of specific OPP. If different device drivers try to control the same OPP, dev_pm_opp_enable() and dev_pm_opp_disable() will consider only last operation. It means that OPP should be enabled/disabled by only one device driver. For example, opp_table of driver a(dev_a) - 500Mhz - 400Mhz - 300Mhz - 200Mhz - 100Mhz Driver B, opp_disable(dev_a, 500) Driver C, opp_enable(dev_a, 500) -> 500Mhz is enabled. But, driver B might want to enable 500Mhz at this time such as cooling. I think that if OPP support the use of multiple device drivers, dev_pm_opp_enable() and dev_pm_opp_disable() should support the usage count such as regulator/clock. I would like your opinion. Regards, Chanwoo Choi On 2018년 08월 07일 07:31, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 08월 07일 04:21, Matthias Kaehlcke wrote: >> Hi Chanwoo, >> >> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote: >>> Hi Matthias, >>> >>> On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: >>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: >>>>> Hi Chanwoo, >>>>> >>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: >>>>>> Hi Matthias, >>>>>> >>>>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: >>>>>>> Hi Chanwoo, >>>>>>> >>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >>>>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>>>>>>>> Hi Matthias, >>>>>>>>>>> >>>>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>>>>>>>> Hi Chanwoo, >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Firstly, >>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>>>>>>>> >>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>>>>>>>> consider them. >>>>>>>>>>>>> >>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>>>>>>>> of devfreq device. >>>>>>>>>>>>> >>>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>>>>>>>> >>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>>>>>>>> other way to change the minimum/maximum frequency. >>>>>>>>>>>> >>>>>>>>>>>> Using the OPP interface exclusively works as long as a >>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>>>>>>>> >>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>>>>>>>> desired, however this seems beyond the scope of this series. >>>>>>>>>>> >>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>>>>>>>> happen. >>>>>>>>>> >>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest >>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it >>>>>>>>>> worked for cpufreq for 10+ years. >>>>>>>>>> >>>>>>>>>> However it is correct that there would be a conflict if a driver >>>>>>>>>> requests a min freq that is higher than the max freq requested by >>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the >>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is >>>>>>>>>> something that would ever occur in practice though. >>>>>>>>>> >>>>>>>>>> If we are really concerned about this case it would also be an option >>>>>>>>>> to limit the adjustment to the max frequency. >>>>>>>>>> >>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable(). >>>>>>>>>> >>>>>>>>>> This would require supporting negative usage count values, since a OPP >>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>>>>>>> disabled it or viceversa. >>>>>>>>>> >>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling >>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>>>>>>> than that of devfreq_verify_within_limits(). >>>>>>>>>> >>>>>>>>>> Viresh, what do you think about an OPP usage count? >>>>>>>>> >>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the >>>>>>>>> discussion going? >>>>>>>>> >>>>>>>>> Not that it matters, but my preferred solution continues to be >>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>>>>>>>> could be adjusted if needed) and has proven to work in practice for >>>>>>>>> 10+ years in a very similar sub-system. >>>>>>>> >>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP >>>>>>>> control to enable/disable the OPP entry. If some device driver >>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), >>>>>>>> the operation is not working. Because cpufreq considers the limit >>>>>>>> through 'cpufreq_verify_with_limits()' only. >>>>>>> >>>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits() >>>>>>> exclusively seems to have worked well for cpufreq, and that in their >>>>>>> overall purpose cpufreq and devfreq are similar subsystems. >>>>>>> >>>>>>> The current throttler series with devfreq_verify_within_limits() takes >>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as >>>>>>> a starting point for the frequency adjustment and (in theory) the >>>>>>> frequency range should only be narrowed by >>>>>>> devfreq_verify_within_limits(). >>>>>>> >>>>>>>> As I already commented[1], there is different between cpufreq and devfreq. >>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80 >>>>>>>> >>>>>>>> Already, subsystem already used OPP interface in order to control >>>>>>>> specific OPP entry. I don't want to provide two outside method >>>>>>>> to control the frequency of devfreq driver. It might make the confusion. >>>>>>> >>>>>>> I understand your point, it would indeed be preferable to have a >>>>>>> single method. However I'm not convinced that the OPP interface is >>>>>>> a suitable solution, as I exposed earlier in this thread (quoted >>>>>>> below). >>>>>>> >>>>>>> I would like you to at least consider the possibility of changing >>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). >>>>>>> Besides that it's not what is currently used, do you see any technical >>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable >>>>>>> or inferior solution? >>>>>> >>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support >>>>>> the multiple outside controllers (e.g., devfreq-cooling.c). >>>>> >>>>> That's incorrect, its purpose is precisely that. >>>>> >>>>> Are you suggesting that cpufreq with its use of >>>>> cpufreq_verify_within_limits() (the inspiration for >>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c >>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially >>>>> what I am proposing with DEVFREQ_ADJUST. >>>>> >>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP >>>>> interface is mandatory for devfreq" isn't really a technical argument, >>>>> is it mandatory for any other reason than that it is the interface >>>>> that is currently used? >>>>> >>>>>> After you are suggesting the throttler core, there are at least two >>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver). >>>>>> As I knew the problem about conflict, I cannot agree the temporary >>>>>> method. OPP interface is mandatory for devfreq in order to control >>>>>> the OPP (frequency/voltage). In this situation, we have to try to >>>>>> find the method through OPP interface. >>>>> >>>>> What do you mean with "temporary method"? >>>>> >>>>> We can try to find a method through the OPP interface, but at this >>>>> point I'm not convinced that it is technically necessary or even >>>>> preferable. >>>>> >>>>> Another inconvenient of the OPP approach for both devfreq-cooling.c >>>>> and the throttler is that they have to bother with disabling all OPPs >>>>> above/below the max/min (they don't/shouldn't have to care), instead >>>>> of just telling devfreq the max/min. >>>> >>>> And a more important one: both drivers now have to keep track which >>>> OPPs they enabled/disabled previously, done are the days of a simple >>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is >>>> possible and not very complex to implement, but is it really the >>>> best/a good solution? >>> >>> >>> As I replied them right before, Each outside driver has their own throttling >>> policy to control OPP entries. They don't care the requirement of other >>> driver and cannot know the requirement of other driver. devfreq core can only >>> recognize them and then only consider enabled OPP entris without disabled OPP entries. >>> >>> For example1, >>> | devfreq-cooling| throttler >>> --------------------------------------- >>> 500Mhz | disabled | disabled >>> 400Mhz | disabled | disabled >>> 300Mhz | | disabled >>> 200Mhz | | >>> 100Mhz | | >>> => devfreq driver can use only 100/200Mhz >>> >>> >>> For example2, >>> | devfreq-cooling| throttler >>> --------------------------------------- >>> 500Mhz | disabled | disabled >>> 400Mhz | disabled | >>> 300Mhz | disabled | >>> 200Mhz | | >>> 100Mhz | | >>> => devfreq driver can use only 100/200Mhz >>> >>> >>> For example3, >>> | devfreq-cooling| throttler >>> --------------------------------------- >>> 500Mhz | disabled | disabled >>> 400Mhz | | >>> 300Mhz | | >>> 200Mhz | | disabled >>> 100Mhz | | disabled >>> => devfreq driver can use only 300/400Mhz >> >> These are all cases without conflicts, my concern is about this: >> >>> | devfreq-cooling| throttler >>> --------------------------------------- >>> 500Mhz | disabled | >>> 400Mhz | disabled | >>> 300Mhz | | disabled >>> 200Mhz | | disabled >>> 100Mhz | | disabled >>> => devfreq driver can't use any frequency? > > There are no any enabled frequency. Because device driver > (devfreq-cooling, throttler) disable all frequencies. > > Outside drivers(devfreq-cooling, throttler) can enable/disable > specific OPP entries. As I already commented, each outside driver > doesn't consider the policy of other device driver about OPP entries. > > OPP interface is independent on devfreq and just control OPP entries. > After that, devfreq just consider the only enabled OPP entries. In this case, at least one OPP should be remained on enabled state. Maybe, OPP interface should provide the function which cannot disable specific OPP entry. > >> >> Actually my above comment wasn't about this case, but about the >> added complexity in devfreq-cooling.c and the throttler: >> >> A bit simplified partition_enable_opps() currently does this: >> >> for_each_opp(opp) { >> if (opp->freq <= max) >> opp_enable(opp) >> else >> opp_disable(opp) >> } >> >> With the OPP usage/disable count this doesn't work any longer. Now we >> need to keep track of the enabled/disabled state of the OPP, something >> like: >> >> dev_pm_opp_enable(opp) { >> if (opp->freq <= max) { >> if (opp->freq > prev_max) >> opp_enable(opp) >> } else { >> if (opp->freq < prev_max) >> opp_disable(opp) >> } >> } >> >> And duplicate the same in the throttler (and other possible >> drivers). Obviously it can be done, but is there really any gain >> from it? >> >> Instead they just could do: >> >> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq) >> >> without being concerned about implementation details of devfreq. >> > > I don't think so. dev_pm_opp_enable()/dev_pm_opp_disable() > have to consider only one OPP entry without any other OPP entry. > > dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other > OPP entries. After some driver(devfreq-cooling.c and throttler) > enable or disable specific OPP entries, the remaining OPP entry > with enabled state will be considered on devfreq driver. >
Hi Chanwoo, On Tue, Aug 07, 2018 at 07:31:16AM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 08월 07일 04:21, Matthias Kaehlcke wrote: > > Hi Chanwoo, > > > > On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote: > >> Hi Matthias, > >> > >> On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: > >>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: > >>>> Hi Chanwoo, > >>>> > >>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: > >>>>> Hi Matthias, > >>>>> > >>>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > >>>>>> Hi Chanwoo, > >>>>>> > >>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > >>>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > >>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > >>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > >>>>>>>>>> Hi Matthias, > >>>>>>>>>> > >>>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > >>>>>>>>>>> Hi Chanwoo, > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Firstly, > >>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >>>>>>>>>>>> > >>>>>>>>>>>> devfreq already used the OPP interface as default. It means that > >>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > >>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core > >>>>>>>>>>>> consider them. > >>>>>>>>>>>> > >>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > >>>>>>>>>>>> already support some interface to change the minimum/maximum frequency > >>>>>>>>>>>> of devfreq device. > >>>>>>>>>>>> > >>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > >>>>>>>>>>>> change the minimum/maximum frequency through OPP interface. > >>>>>>>>>>>> > >>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > >>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > >>>>>>>>>>>> other way to change the minimum/maximum frequency. > >>>>>>>>>>> > >>>>>>>>>>> Using the OPP interface exclusively works as long as a > >>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver > >>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > >>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of > >>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are > >>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked. > >>>>>>>>>>> > >>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > >>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if > >>>>>>>>>>> desired, however this seems beyond the scope of this series. > >>>>>>>>>> > >>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > >>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > >>>>>>>>>> happen. > >>>>>>>>> > >>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest > >>>>>>>>> max freq wins. I expect this to be the usual case, apparently it > >>>>>>>>> worked for cpufreq for 10+ years. > >>>>>>>>> > >>>>>>>>> However it is correct that there would be a conflict if a driver > >>>>>>>>> requests a min freq that is higher than the max freq requested by > >>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the > >>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is > >>>>>>>>> something that would ever occur in practice though. > >>>>>>>>> > >>>>>>>>> If we are really concerned about this case it would also be an option > >>>>>>>>> to limit the adjustment to the max frequency. > >>>>>>>>> > >>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface > >>>>>>>>>> have to support 'usage_count' such as clk_enable/disable(). > >>>>>>>>> > >>>>>>>>> This would require supporting negative usage count values, since a OPP > >>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler > >>>>>>>>> disabled it or viceversa. > >>>>>>>>> > >>>>>>>>> Theoretically there could also be conflicts, like one driver disabling > >>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all > >>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution > >>>>>>>>> than that of devfreq_verify_within_limits(). > >>>>>>>>> > >>>>>>>>> Viresh, what do you think about an OPP usage count? > >>>>>>>> > >>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the > >>>>>>>> discussion going? > >>>>>>>> > >>>>>>>> Not that it matters, but my preferred solution continues to be > >>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which > >>>>>>>> could be adjusted if needed) and has proven to work in practice for > >>>>>>>> 10+ years in a very similar sub-system. > >>>>>>> > >>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP > >>>>>>> control to enable/disable the OPP entry. If some device driver > >>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), > >>>>>>> the operation is not working. Because cpufreq considers the limit > >>>>>>> through 'cpufreq_verify_with_limits()' only. > >>>>>> > >>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits() > >>>>>> exclusively seems to have worked well for cpufreq, and that in their > >>>>>> overall purpose cpufreq and devfreq are similar subsystems. > >>>>>> > >>>>>> The current throttler series with devfreq_verify_within_limits() takes > >>>>>> the enabled OPPs into account, the lowest and highest OPP are used as > >>>>>> a starting point for the frequency adjustment and (in theory) the > >>>>>> frequency range should only be narrowed by > >>>>>> devfreq_verify_within_limits(). > >>>>>> > >>>>>>> As I already commented[1], there is different between cpufreq and devfreq. > >>>>>>> [1] https://lkml.org/lkml/2018/7/4/80 > >>>>>>> > >>>>>>> Already, subsystem already used OPP interface in order to control > >>>>>>> specific OPP entry. I don't want to provide two outside method > >>>>>>> to control the frequency of devfreq driver. It might make the confusion. > >>>>>> > >>>>>> I understand your point, it would indeed be preferable to have a > >>>>>> single method. However I'm not convinced that the OPP interface is > >>>>>> a suitable solution, as I exposed earlier in this thread (quoted > >>>>>> below). > >>>>>> > >>>>>> I would like you to at least consider the possibility of changing > >>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > >>>>>> Besides that it's not what is currently used, do you see any technical > >>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable > >>>>>> or inferior solution? > >>>>> > >>>>> As we already discussed, devfreq_verify_within_limits() doesn't support > >>>>> the multiple outside controllers (e.g., devfreq-cooling.c). > >>>> > >>>> That's incorrect, its purpose is precisely that. > >>>> > >>>> Are you suggesting that cpufreq with its use of > >>>> cpufreq_verify_within_limits() (the inspiration for > >>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c > >>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially > >>>> what I am proposing with DEVFREQ_ADJUST. > >>>> > >>>> Could you elaborate why this model wouldn't work for devfreq? "OPP > >>>> interface is mandatory for devfreq" isn't really a technical argument, > >>>> is it mandatory for any other reason than that it is the interface > >>>> that is currently used? > >>>> > >>>>> After you are suggesting the throttler core, there are at least two > >>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver). > >>>>> As I knew the problem about conflict, I cannot agree the temporary > >>>>> method. OPP interface is mandatory for devfreq in order to control > >>>>> the OPP (frequency/voltage). In this situation, we have to try to > >>>>> find the method through OPP interface. > >>>> > >>>> What do you mean with "temporary method"? > >>>> > >>>> We can try to find a method through the OPP interface, but at this > >>>> point I'm not convinced that it is technically necessary or even > >>>> preferable. > >>>> > >>>> Another inconvenient of the OPP approach for both devfreq-cooling.c > >>>> and the throttler is that they have to bother with disabling all OPPs > >>>> above/below the max/min (they don't/shouldn't have to care), instead > >>>> of just telling devfreq the max/min. > >>> > >>> And a more important one: both drivers now have to keep track which > >>> OPPs they enabled/disabled previously, done are the days of a simple > >>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is > >>> possible and not very complex to implement, but is it really the > >>> best/a good solution? > >> > >> > >> As I replied them right before, Each outside driver has their own throttling > >> policy to control OPP entries. They don't care the requirement of other > >> driver and cannot know the requirement of other driver. devfreq core can only > >> recognize them and then only consider enabled OPP entris without disabled OPP entries. > >> > >> For example1, > >> | devfreq-cooling| throttler > >> --------------------------------------- > >> 500Mhz | disabled | disabled > >> 400Mhz | disabled | disabled > >> 300Mhz | | disabled > >> 200Mhz | | > >> 100Mhz | | > >> => devfreq driver can use only 100/200Mhz > >> > >> > >> For example2, > >> | devfreq-cooling| throttler > >> --------------------------------------- > >> 500Mhz | disabled | disabled > >> 400Mhz | disabled | > >> 300Mhz | disabled | > >> 200Mhz | | > >> 100Mhz | | > >> => devfreq driver can use only 100/200Mhz > >> > >> > >> For example3, > >> | devfreq-cooling| throttler > >> --------------------------------------- > >> 500Mhz | disabled | disabled > >> 400Mhz | | > >> 300Mhz | | > >> 200Mhz | | disabled > >> 100Mhz | | disabled > >> => devfreq driver can use only 300/400Mhz > > > > These are all cases without conflicts, my concern is about this: > > > >> | devfreq-cooling| throttler > >> --------------------------------------- > >> 500Mhz | disabled | > >> 400Mhz | disabled | > >> 300Mhz | | disabled > >> 200Mhz | | disabled > >> 100Mhz | | disabled > >> => devfreq driver can't use any frequency? > > There are no any enabled frequency. Because device driver > (devfreq-cooling, throttler) disable all frequencies. > > Outside drivers(devfreq-cooling, throttler) can enable/disable > specific OPP entries. As I already commented, each outside driver > doesn't consider the policy of other device driver about OPP entries. And wouldn't it be preferable to have an interface that tries to avoid this situation in the first place and has a clear policy for conflict resolution? > OPP interface is independent on devfreq and just control OPP entries. > After that, devfreq just consider the only enabled OPP entries. > > > > > Actually my above comment wasn't about this case, but about the > > added complexity in devfreq-cooling.c and the throttler: > > > > A bit simplified partition_enable_opps() currently does this: > > > > for_each_opp(opp) { > > if (opp->freq <= max) > > opp_enable(opp) > > else > > opp_disable(opp) > > } > > > > With the OPP usage/disable count this doesn't work any longer. Now we > > need to keep track of the enabled/disabled state of the OPP, something > > like: > > > > dev_pm_opp_enable(opp) { > > if (opp->freq <= max) { > > if (opp->freq > prev_max) > > opp_enable(opp) > > } else { > > if (opp->freq < prev_max) > > opp_disable(opp) > > } > > } > > > > And duplicate the same in the throttler (and other possible > > drivers). Obviously it can be done, but is there really any gain > > from it? > > > > Instead they just could do: > > > > devfreq_verify_within_limits(policy/freq_pair, 0, max_freq) > > > > without being concerned about implementation details of devfreq. > > > > I don't think so. What are you referring to, the change that I claim that will be needed in partition_enable_opps() when OPPs have usage/disable counts? If so, how do you avoid that the function doesn't enable/disable an OPP that was already enabled/disabled in the previous iteration? > dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one > OPP entry without any other OPP entry. I agree with this :) > dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other > OPP entries. After some driver(devfreq-cooling.c and throttler) > enable or disable specific OPP entries, the remaining OPP entry > with enabled state will be considered on devfreq driver. Having multiple drivers (or even a single one) enable and disable OPPs independently and at the time of their choosing sounds like a recipe for race conditions. What happens if e.g. the devfreq core calls dev_pm_opp_find_freq_ceil/floor() and right after returning another driver disables the OPP? devfreq uses the disabled OPP. Probably not a big deal if disabling the OPP is only a semantic question, but I imagine there can be worse scenarios. Currently the only user of dev_pm_opp_disable() besides devfreq_cooling.c is imx6q-cpufreq.c, and it is well behaved and only disables OPPs during probe(). I keep missing a clear answer to the question in which sense manipulating the OPPs in devfreq_cooling.c is superior over narrowing down the frequency during DEVFREQ_ADJUST, which would avoid potential races and allow to resolve conflicts. Does it allow for some functionality that couldn't be achieved otherwise, does it make the code significantly less complex, is some integration with the OPP subsystem needed that I'm overlooking, is it more efficient, ...? I'm not just insisting because I'm stubborn. I'd be happy to use any interface that fits the bill, or to adjust one to fit the bill, but as of now I mainly see drawbacks on the OPPs side and haven't seen convincing arguments that it is really needed in devreq_cooling.c or a better solution. Cheers Matthias
Hi Matthias, On 2018년 08월 07일 09:23, Matthias Kaehlcke wrote: > Hi Chanwoo, > > On Tue, Aug 07, 2018 at 07:31:16AM +0900, Chanwoo Choi wrote: >> Hi Matthias, >> >> On 2018년 08월 07일 04:21, Matthias Kaehlcke wrote: >>> Hi Chanwoo, >>> >>> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote: >>>> Hi Matthias, >>>> >>>> On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: >>>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: >>>>>> Hi Chanwoo, >>>>>> >>>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: >>>>>>> Hi Matthias, >>>>>>> >>>>>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: >>>>>>>> Hi Chanwoo, >>>>>>>> >>>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: >>>>>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: >>>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: >>>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: >>>>>>>>>>>> Hi Matthias, >>>>>>>>>>>> >>>>>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: >>>>>>>>>>>>> Hi Chanwoo, >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Firstly, >>>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. >>>>>>>>>>>>>> >>>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that >>>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency >>>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device >>>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core >>>>>>>>>>>>>> consider them. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because >>>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency >>>>>>>>>>>>>> of devfreq device. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' >>>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot >>>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface. >>>>>>>>>>>>>> >>>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support >>>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add >>>>>>>>>>>>>> other way to change the minimum/maximum frequency. >>>>>>>>>>>>> >>>>>>>>>>>>> Using the OPP interface exclusively works as long as a >>>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver >>>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are >>>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of >>>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are >>>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked. >>>>>>>>>>>>> >>>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use >>>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if >>>>>>>>>>>>> desired, however this seems beyond the scope of this series. >>>>>>>>>>>> >>>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. >>>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict >>>>>>>>>>>> happen. >>>>>>>>>>> >>>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest >>>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it >>>>>>>>>>> worked for cpufreq for 10+ years. >>>>>>>>>>> >>>>>>>>>>> However it is correct that there would be a conflict if a driver >>>>>>>>>>> requests a min freq that is higher than the max freq requested by >>>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the >>>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is >>>>>>>>>>> something that would ever occur in practice though. >>>>>>>>>>> >>>>>>>>>>> If we are really concerned about this case it would also be an option >>>>>>>>>>> to limit the adjustment to the max frequency. >>>>>>>>>>> >>>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface >>>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable(). >>>>>>>>>>> >>>>>>>>>>> This would require supporting negative usage count values, since a OPP >>>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler >>>>>>>>>>> disabled it or viceversa. >>>>>>>>>>> >>>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling >>>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all >>>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution >>>>>>>>>>> than that of devfreq_verify_within_limits(). >>>>>>>>>>> >>>>>>>>>>> Viresh, what do you think about an OPP usage count? >>>>>>>>>> >>>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the >>>>>>>>>> discussion going? >>>>>>>>>> >>>>>>>>>> Not that it matters, but my preferred solution continues to be >>>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which >>>>>>>>>> could be adjusted if needed) and has proven to work in practice for >>>>>>>>>> 10+ years in a very similar sub-system. >>>>>>>>> >>>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP >>>>>>>>> control to enable/disable the OPP entry. If some device driver >>>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), >>>>>>>>> the operation is not working. Because cpufreq considers the limit >>>>>>>>> through 'cpufreq_verify_with_limits()' only. >>>>>>>> >>>>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits() >>>>>>>> exclusively seems to have worked well for cpufreq, and that in their >>>>>>>> overall purpose cpufreq and devfreq are similar subsystems. >>>>>>>> >>>>>>>> The current throttler series with devfreq_verify_within_limits() takes >>>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as >>>>>>>> a starting point for the frequency adjustment and (in theory) the >>>>>>>> frequency range should only be narrowed by >>>>>>>> devfreq_verify_within_limits(). >>>>>>>> >>>>>>>>> As I already commented[1], there is different between cpufreq and devfreq. >>>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80 >>>>>>>>> >>>>>>>>> Already, subsystem already used OPP interface in order to control >>>>>>>>> specific OPP entry. I don't want to provide two outside method >>>>>>>>> to control the frequency of devfreq driver. It might make the confusion. >>>>>>>> >>>>>>>> I understand your point, it would indeed be preferable to have a >>>>>>>> single method. However I'm not convinced that the OPP interface is >>>>>>>> a suitable solution, as I exposed earlier in this thread (quoted >>>>>>>> below). >>>>>>>> >>>>>>>> I would like you to at least consider the possibility of changing >>>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). >>>>>>>> Besides that it's not what is currently used, do you see any technical >>>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable >>>>>>>> or inferior solution? >>>>>>> >>>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support >>>>>>> the multiple outside controllers (e.g., devfreq-cooling.c). >>>>>> >>>>>> That's incorrect, its purpose is precisely that. >>>>>> >>>>>> Are you suggesting that cpufreq with its use of >>>>>> cpufreq_verify_within_limits() (the inspiration for >>>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c >>>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially >>>>>> what I am proposing with DEVFREQ_ADJUST. >>>>>> >>>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP >>>>>> interface is mandatory for devfreq" isn't really a technical argument, >>>>>> is it mandatory for any other reason than that it is the interface >>>>>> that is currently used? >>>>>> >>>>>>> After you are suggesting the throttler core, there are at least two >>>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver). >>>>>>> As I knew the problem about conflict, I cannot agree the temporary >>>>>>> method. OPP interface is mandatory for devfreq in order to control >>>>>>> the OPP (frequency/voltage). In this situation, we have to try to >>>>>>> find the method through OPP interface. >>>>>> >>>>>> What do you mean with "temporary method"? >>>>>> >>>>>> We can try to find a method through the OPP interface, but at this >>>>>> point I'm not convinced that it is technically necessary or even >>>>>> preferable. >>>>>> >>>>>> Another inconvenient of the OPP approach for both devfreq-cooling.c >>>>>> and the throttler is that they have to bother with disabling all OPPs >>>>>> above/below the max/min (they don't/shouldn't have to care), instead >>>>>> of just telling devfreq the max/min. >>>>> >>>>> And a more important one: both drivers now have to keep track which >>>>> OPPs they enabled/disabled previously, done are the days of a simple >>>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is >>>>> possible and not very complex to implement, but is it really the >>>>> best/a good solution? >>>> >>>> >>>> As I replied them right before, Each outside driver has their own throttling >>>> policy to control OPP entries. They don't care the requirement of other >>>> driver and cannot know the requirement of other driver. devfreq core can only >>>> recognize them and then only consider enabled OPP entris without disabled OPP entries. >>>> >>>> For example1, >>>> | devfreq-cooling| throttler >>>> --------------------------------------- >>>> 500Mhz | disabled | disabled >>>> 400Mhz | disabled | disabled >>>> 300Mhz | | disabled >>>> 200Mhz | | >>>> 100Mhz | | >>>> => devfreq driver can use only 100/200Mhz >>>> >>>> >>>> For example2, >>>> | devfreq-cooling| throttler >>>> --------------------------------------- >>>> 500Mhz | disabled | disabled >>>> 400Mhz | disabled | >>>> 300Mhz | disabled | >>>> 200Mhz | | >>>> 100Mhz | | >>>> => devfreq driver can use only 100/200Mhz >>>> >>>> >>>> For example3, >>>> | devfreq-cooling| throttler >>>> --------------------------------------- >>>> 500Mhz | disabled | disabled >>>> 400Mhz | | >>>> 300Mhz | | >>>> 200Mhz | | disabled >>>> 100Mhz | | disabled >>>> => devfreq driver can use only 300/400Mhz >>> >>> These are all cases without conflicts, my concern is about this: >>> >>>> | devfreq-cooling| throttler >>>> --------------------------------------- >>>> 500Mhz | disabled | >>>> 400Mhz | disabled | >>>> 300Mhz | | disabled >>>> 200Mhz | | disabled >>>> 100Mhz | | disabled >>>> => devfreq driver can't use any frequency? >> >> There are no any enabled frequency. Because device driver >> (devfreq-cooling, throttler) disable all frequencies. >> >> Outside drivers(devfreq-cooling, throttler) can enable/disable >> specific OPP entries. As I already commented, each outside driver >> doesn't consider the policy of other device driver about OPP entries. > > And wouldn't it be preferable to have an interface that tries to avoid > this situation in the first place and has a clear policy for conflict > resolution? > >> OPP interface is independent on devfreq and just control OPP entries. >> After that, devfreq just consider the only enabled OPP entries. >> >>> >>> Actually my above comment wasn't about this case, but about the >>> added complexity in devfreq-cooling.c and the throttler: >>> >>> A bit simplified partition_enable_opps() currently does this: >>> >>> for_each_opp(opp) { >>> if (opp->freq <= max) >>> opp_enable(opp) >>> else >>> opp_disable(opp) >>> } >>> >>> With the OPP usage/disable count this doesn't work any longer. Now we >>> need to keep track of the enabled/disabled state of the OPP, something >>> like: >>> >>> dev_pm_opp_enable(opp) { >>> if (opp->freq <= max) { >>> if (opp->freq > prev_max) >>> opp_enable(opp) >>> } else { >>> if (opp->freq < prev_max) >>> opp_disable(opp) >>> } >>> } >>> >>> And duplicate the same in the throttler (and other possible >>> drivers). Obviously it can be done, but is there really any gain >>> from it? >>> >>> Instead they just could do: >>> >>> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq) I have a new question about using devfreq_verify_within_limits(). For example, Driver A has following opp-table - 500Mhz - 400Mhz - 300Mhz - 200Mhz - 100Mhz Basically, driver A has following init value: - policy->min is 100 - policy->max is 500 Driver B, devfreq_verify_within_limits(200, 300) policy->min is 200 policy->max is 300 Driver C, devfreq_verify_within_limits(300, 400) policy->min is 300 policy->max is 300 Driver D, devfreq_verify_within_limits(400, 500) policy->min is 400 policy->max is 400 In result, it looks like the requirement of Driver B are disappeared. is it the intention of devfreq_verify_within_limits()? >>> >>> without being concerned about implementation details of devfreq. >>> >> >> I don't think so. > > What are you referring to, the change that I claim that will be needed > in partition_enable_opps() when OPPs have usage/disable counts? If so, > how do you avoid that the function doesn't enable/disable an OPP that > was already enabled/disabled in the previous iteration? Just about changes of "dev_pm_opp_enable(opp)". > >> dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one >> OPP entry without any other OPP entry. > > I agree with this :) > >> dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other >> OPP entries. After some driver(devfreq-cooling.c and throttler) >> enable or disable specific OPP entries, the remaining OPP entry >> with enabled state will be considered on devfreq driver. > > Having multiple drivers (or even a single one) enable and disable > OPPs independently and at the time of their choosing sounds like a > recipe for race conditions. > > What happens if e.g. the devfreq core calls > dev_pm_opp_find_freq_ceil/floor() and right after returning another > driver disables the OPP? devfreq uses the disabled OPP. Probably not a\ devfreq doesn't use the disabled OPP. For example, 1. devfreq-cooling.c disable/enable some OPP and OPP send notification about OPP changes. 2. devfreq receives the notification (devfreq_notifier_call() is executed) 3. devfreq_notifier_call() try to find scaling_min_freq/scaling_max_freq 4. devfreq_notifier_call() executes update_devfreq() in order to apply the OPP changes. 5. devfreq can consider only enabled frequencies right after dev_pm_opp_disable/enable() > big deal if disabling the OPP is only a semantic question, but I > imagine there can be worse scenarios. Currently the only user of > dev_pm_opp_disable() besides devfreq_cooling.c is imx6q-cpufreq.c, and > it is well behaved and only disables OPPs during probe(). imx6q-cpufreq.c used the dev_pm_opp_disable() before calling dev_pm_opp_init_cpufreq_table(). After registered cpufreq_register_driver(), imx6q-cpufreq.c doesn't use the dev_pm_opp_disable/enable(). It means that dev_pm_opp_disable() of imx6q-cpufreq.c doesn't affect the frequency choice of cpufreq on the runtime after registered cpufreq driver. On the other hand, devfreq_cooling.c use dev_pm_opp_disable/enable() on the runtime after registering devfreq driver. It affect the frequency choice of devfreq on the runtime. > > I keep missing a clear answer to the question in which sense > manipulating the OPPs in devfreq_cooling.c is superior over narrowing > down the frequency during DEVFREQ_ADJUST, which would avoid potential > races and allow to resolve conflicts. Does it allow for some You mentioned the race conditions eariler. Actually, I don't know the potential races. > functionality that couldn't be achieved otherwise, does it make the > code significantly less complex, is some integration with the OPP > subsystem needed that I'm overlooking, is it more efficient, ...? > > I'm not just insisting because I'm stubborn. I'd be happy to use any > interface that fits the bill, or to adjust one to fit the bill, but as > of now I mainly see drawbacks on the OPPs side and haven't seen > convincing arguments that it is really needed in devreq_cooling.c or a > better solution. During we discussed, we knew that OPP doesn't provide all operation perfectly. As of now, OPP is standard framework to control the pair of frequency/voltage. devfreq need to use OPP to the pair of frequency/voltage. Even if OPP doesn't provide all of our requirements, I think that devfreq should use OPP interface after updating OPP framework, instead of adding other functionality to control frequency in outside driver(devfreq-cooling.c, throttler). Finally, I asked to Viresh (OPP maintainer). If dev_pm_opp_enable() and dev_pm_opp_disable() don't support the usecase on multiple device drivers, opp interface could not be used in order to support both devfreq-cooling.c and throttler. So, We better to wait the opinion from OPP maintainer.
Hi Chanwoo, On Tue, Aug 07, 2018 at 10:35:37AM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 08월 07일 09:23, Matthias Kaehlcke wrote: > > Hi Chanwoo, > > > > On Tue, Aug 07, 2018 at 07:31:16AM +0900, Chanwoo Choi wrote: > >> Hi Matthias, > >> > >> On 2018년 08월 07일 04:21, Matthias Kaehlcke wrote: > >>> Hi Chanwoo, > >>> > >>> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote: > >>>> Hi Matthias, > >>>> > >>>> On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote: > >>>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote: > >>>>>> Hi Chanwoo, > >>>>>> > >>>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote: > >>>>>>> Hi Matthias, > >>>>>>> > >>>>>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote: > >>>>>>>> Hi Chanwoo, > >>>>>>>> > >>>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote: > >>>>>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote: > >>>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote: > >>>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > >>>>>>>>>>>> Hi Matthias, > >>>>>>>>>>>> > >>>>>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > >>>>>>>>>>>>> Hi Chanwoo, > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Firstly, > >>>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that > >>>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency > >>>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >>>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core > >>>>>>>>>>>>>> consider them. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because > >>>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency > >>>>>>>>>>>>>> of devfreq device. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()' > >>>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot > >>>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support > >>>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add > >>>>>>>>>>>>>> other way to change the minimum/maximum frequency. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Using the OPP interface exclusively works as long as a > >>>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver > >>>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are > >>>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of > >>>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are > >>>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > >>>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if > >>>>>>>>>>>>> desired, however this seems beyond the scope of this series. > >>>>>>>>>>>> > >>>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too. > >>>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict > >>>>>>>>>>>> happen. > >>>>>>>>>>> > >>>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest > >>>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it > >>>>>>>>>>> worked for cpufreq for 10+ years. > >>>>>>>>>>> > >>>>>>>>>>> However it is correct that there would be a conflict if a driver > >>>>>>>>>>> requests a min freq that is higher than the max freq requested by > >>>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the > >>>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is > >>>>>>>>>>> something that would ever occur in practice though. > >>>>>>>>>>> > >>>>>>>>>>> If we are really concerned about this case it would also be an option > >>>>>>>>>>> to limit the adjustment to the max frequency. > >>>>>>>>>>> > >>>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface > >>>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable(). > >>>>>>>>>>> > >>>>>>>>>>> This would require supporting negative usage count values, since a OPP > >>>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler > >>>>>>>>>>> disabled it or viceversa. > >>>>>>>>>>> > >>>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling > >>>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all > >>>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution > >>>>>>>>>>> than that of devfreq_verify_within_limits(). > >>>>>>>>>>> > >>>>>>>>>>> Viresh, what do you think about an OPP usage count? > >>>>>>>>>> > >>>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the > >>>>>>>>>> discussion going? > >>>>>>>>>> > >>>>>>>>>> Not that it matters, but my preferred solution continues to be > >>>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which > >>>>>>>>>> could be adjusted if needed) and has proven to work in practice for > >>>>>>>>>> 10+ years in a very similar sub-system. > >>>>>>>>> > >>>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP > >>>>>>>>> control to enable/disable the OPP entry. If some device driver > >>>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(), > >>>>>>>>> the operation is not working. Because cpufreq considers the limit > >>>>>>>>> through 'cpufreq_verify_with_limits()' only. > >>>>>>>> > >>>>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits() > >>>>>>>> exclusively seems to have worked well for cpufreq, and that in their > >>>>>>>> overall purpose cpufreq and devfreq are similar subsystems. > >>>>>>>> > >>>>>>>> The current throttler series with devfreq_verify_within_limits() takes > >>>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as > >>>>>>>> a starting point for the frequency adjustment and (in theory) the > >>>>>>>> frequency range should only be narrowed by > >>>>>>>> devfreq_verify_within_limits(). > >>>>>>>> > >>>>>>>>> As I already commented[1], there is different between cpufreq and devfreq. > >>>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80 > >>>>>>>>> > >>>>>>>>> Already, subsystem already used OPP interface in order to control > >>>>>>>>> specific OPP entry. I don't want to provide two outside method > >>>>>>>>> to control the frequency of devfreq driver. It might make the confusion. > >>>>>>>> > >>>>>>>> I understand your point, it would indeed be preferable to have a > >>>>>>>> single method. However I'm not convinced that the OPP interface is > >>>>>>>> a suitable solution, as I exposed earlier in this thread (quoted > >>>>>>>> below). > >>>>>>>> > >>>>>>>> I would like you to at least consider the possibility of changing > >>>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits(). > >>>>>>>> Besides that it's not what is currently used, do you see any technical > >>>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable > >>>>>>>> or inferior solution? > >>>>>>> > >>>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support > >>>>>>> the multiple outside controllers (e.g., devfreq-cooling.c). > >>>>>> > >>>>>> That's incorrect, its purpose is precisely that. > >>>>>> > >>>>>> Are you suggesting that cpufreq with its use of > >>>>>> cpufreq_verify_within_limits() (the inspiration for > >>>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c > >>>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially > >>>>>> what I am proposing with DEVFREQ_ADJUST. > >>>>>> > >>>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP > >>>>>> interface is mandatory for devfreq" isn't really a technical argument, > >>>>>> is it mandatory for any other reason than that it is the interface > >>>>>> that is currently used? > >>>>>> > >>>>>>> After you are suggesting the throttler core, there are at least two > >>>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver). > >>>>>>> As I knew the problem about conflict, I cannot agree the temporary > >>>>>>> method. OPP interface is mandatory for devfreq in order to control > >>>>>>> the OPP (frequency/voltage). In this situation, we have to try to > >>>>>>> find the method through OPP interface. > >>>>>> > >>>>>> What do you mean with "temporary method"? > >>>>>> > >>>>>> We can try to find a method through the OPP interface, but at this > >>>>>> point I'm not convinced that it is technically necessary or even > >>>>>> preferable. > >>>>>> > >>>>>> Another inconvenient of the OPP approach for both devfreq-cooling.c > >>>>>> and the throttler is that they have to bother with disabling all OPPs > >>>>>> above/below the max/min (they don't/shouldn't have to care), instead > >>>>>> of just telling devfreq the max/min. > >>>>> > >>>>> And a more important one: both drivers now have to keep track which > >>>>> OPPs they enabled/disabled previously, done are the days of a simple > >>>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is > >>>>> possible and not very complex to implement, but is it really the > >>>>> best/a good solution? > >>>> > >>>> > >>>> As I replied them right before, Each outside driver has their own throttling > >>>> policy to control OPP entries. They don't care the requirement of other > >>>> driver and cannot know the requirement of other driver. devfreq core can only > >>>> recognize them and then only consider enabled OPP entris without disabled OPP entries. > >>>> > >>>> For example1, > >>>> | devfreq-cooling| throttler > >>>> --------------------------------------- > >>>> 500Mhz | disabled | disabled > >>>> 400Mhz | disabled | disabled > >>>> 300Mhz | | disabled > >>>> 200Mhz | | > >>>> 100Mhz | | > >>>> => devfreq driver can use only 100/200Mhz > >>>> > >>>> > >>>> For example2, > >>>> | devfreq-cooling| throttler > >>>> --------------------------------------- > >>>> 500Mhz | disabled | disabled > >>>> 400Mhz | disabled | > >>>> 300Mhz | disabled | > >>>> 200Mhz | | > >>>> 100Mhz | | > >>>> => devfreq driver can use only 100/200Mhz > >>>> > >>>> > >>>> For example3, > >>>> | devfreq-cooling| throttler > >>>> --------------------------------------- > >>>> 500Mhz | disabled | disabled > >>>> 400Mhz | | > >>>> 300Mhz | | > >>>> 200Mhz | | disabled > >>>> 100Mhz | | disabled > >>>> => devfreq driver can use only 300/400Mhz > >>> > >>> These are all cases without conflicts, my concern is about this: > >>> > >>>> | devfreq-cooling| throttler > >>>> --------------------------------------- > >>>> 500Mhz | disabled | > >>>> 400Mhz | disabled | > >>>> 300Mhz | | disabled > >>>> 200Mhz | | disabled > >>>> 100Mhz | | disabled > >>>> => devfreq driver can't use any frequency? > >> > >> There are no any enabled frequency. Because device driver > >> (devfreq-cooling, throttler) disable all frequencies. > >> > >> Outside drivers(devfreq-cooling, throttler) can enable/disable > >> specific OPP entries. As I already commented, each outside driver > >> doesn't consider the policy of other device driver about OPP entries. > > > > And wouldn't it be preferable to have an interface that tries to avoid > > this situation in the first place and has a clear policy for conflict > > resolution? > > > >> OPP interface is independent on devfreq and just control OPP entries. > >> After that, devfreq just consider the only enabled OPP entries. > >> > >>> > >>> Actually my above comment wasn't about this case, but about the > >>> added complexity in devfreq-cooling.c and the throttler: > >>> > >>> A bit simplified partition_enable_opps() currently does this: > >>> > >>> for_each_opp(opp) { > >>> if (opp->freq <= max) > >>> opp_enable(opp) > >>> else > >>> opp_disable(opp) > >>> } > >>> > >>> With the OPP usage/disable count this doesn't work any longer. Now we > >>> need to keep track of the enabled/disabled state of the OPP, something > >>> like: > >>> > >>> dev_pm_opp_enable(opp) { > >>> if (opp->freq <= max) { > >>> if (opp->freq > prev_max) > >>> opp_enable(opp) > >>> } else { > >>> if (opp->freq < prev_max) > >>> opp_disable(opp) > >>> } > >>> } > >>> > >>> And duplicate the same in the throttler (and other possible > >>> drivers). Obviously it can be done, but is there really any gain > >>> from it? > >>> > >>> Instead they just could do: > >>> > >>> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq) > > I have a new question about using devfreq_verify_within_limits(). > > For example, > Driver A has following opp-table > - 500Mhz > - 400Mhz > - 300Mhz > - 200Mhz > - 100Mhz > > Basically, driver A has following init value: > - policy->min is 100 > - policy->max is 500 > > Driver B, devfreq_verify_within_limits(200, 300) > policy->min is 200 > policy->max is 300 > Driver C, devfreq_verify_within_limits(300, 400) > policy->min is 300 > policy->max is 300 > Driver D, devfreq_verify_within_limits(400, 500) > policy->min is 400 > policy->max is 400 > > In result, it looks like the requirement of Driver B are disappeared. > is it the intention of devfreq_verify_within_limits()? The requirements are conflicting, neither devfreq_verify_within_limits() nor any other mechanism can completely resolve that. Any conflict resolution method will violate the requirements of at least one client. I don't claim the current method is necessarily the best, it's just one that takes a predictable action, I'm open to other suggestions. > >>> without being concerned about implementation details of devfreq. > >>> > >> > >> I don't think so. > > > > What are you referring to, the change that I claim that will be needed > > in partition_enable_opps() when OPPs have usage/disable counts? If so, > > how do you avoid that the function doesn't enable/disable an OPP that > > was already enabled/disabled in the previous iteration? > > Just about changes of "dev_pm_opp_enable(opp)". > > > > >> dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one > >> OPP entry without any other OPP entry. > > > > I agree with this :) > > > >> dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other > >> OPP entries. After some driver(devfreq-cooling.c and throttler) > >> enable or disable specific OPP entries, the remaining OPP entry > >> with enabled state will be considered on devfreq driver. > > > > Having multiple drivers (or even a single one) enable and disable > > OPPs independently and at the time of their choosing sounds like a > > recipe for race conditions. > > > > What happens if e.g. the devfreq core calls > > dev_pm_opp_find_freq_ceil/floor() and right after returning another > > driver disables the OPP? devfreq uses the disabled OPP. Probably not a\ > > devfreq doesn't use the disabled OPP. > > For example, > 1. devfreq-cooling.c disable/enable some OPP and OPP send notification about OPP changes. > 2. devfreq receives the notification (devfreq_notifier_call() is executed) > 3. devfreq_notifier_call() try to find scaling_min_freq/scaling_max_freq 3.5 The throttler runs and disables scaling_max_freq > 4. devfreq_notifier_call() executes update_devfreq() in order to apply the OPP changes. > 5. devfreq can consider only enabled frequencies right after dev_pm_opp_disable/enable() Depending on the locking requirements within the throttler it *might* be possible to avoid this situation if the throttler held devfreq->lock while manipulating the OPPs. Not a great option though, since we'd be dealing with devfreq implementation details in the throttler. If I am not mistaken the same can happen nowadays with devfreq_cooling.c: <thermal governor> mutex_lock(tz->lock) thermal_cdev_update mutex_lock(cdev->lock) devfreq_cooling_set_cur_state partition_enable_opps The governor function can run at the same time as devfreq_notifier_call() / update_devfreq() and unless I overlooked something there is nothing preventing partition_enable_opps() from disabling OPPs while the devfreq functions are executing. > > big deal if disabling the OPP is only a semantic question, but I > > imagine there can be worse scenarios. Currently the only user of > > dev_pm_opp_disable() besides devfreq_cooling.c is imx6q-cpufreq.c, and > > it is well behaved and only disables OPPs during probe(). > > imx6q-cpufreq.c used the dev_pm_opp_disable() before calling > dev_pm_opp_init_cpufreq_table(). After registered > cpufreq_register_driver(), imx6q-cpufreq.c doesn't use the > dev_pm_opp_disable/enable(). It means that dev_pm_opp_disable() of > imx6q-cpufreq.c doesn't affect the frequency choice of cpufreq on > the runtime after registered cpufreq driver. > > On the other hand, devfreq_cooling.c use dev_pm_opp_disable/enable() > on the runtime after registering devfreq driver. It affect the > frequency choice of devfreq on the runtime. Exactly, that was my point. devfreq_cooling.c is currently the only driver that uses the interface at runtime. And I have doubts that proper synchronization is in place. > > I keep missing a clear answer to the question in which sense > > manipulating the OPPs in devfreq_cooling.c is superior over narrowing > > down the frequency during DEVFREQ_ADJUST, which would avoid potential > > races and allow to resolve conflicts. Does it allow for some > > You mentioned the race conditions eariler. Actually, I don't know > the potential races. Does my above analysis look correct to you, or did I miss some mechanism that provides synchronization? > > functionality that couldn't be achieved otherwise, does it make the > > code significantly less complex, is some integration with the OPP > > subsystem needed that I'm overlooking, is it more efficient, ...? > > > > I'm not just insisting because I'm stubborn. I'd be happy to use any > > interface that fits the bill, or to adjust one to fit the bill, but as > > of now I mainly see drawbacks on the OPPs side and haven't seen > > convincing arguments that it is really needed in devreq_cooling.c or a > > better solution. > > During we discussed, we knew that OPP doesn't provide all operation > perfectly. As of now, OPP is standard framework to control the pair > of frequency/voltage. devfreq need to use OPP to the pair of > frequency/voltage. Even if OPP doesn't provide all of our > requirements, I think that devfreq should use OPP interface after > updating OPP framework, instead of adding other functionality to > control frequency in outside driver(devfreq-cooling.c, throttler). I think it's great if devfreq uses the OPP interface internally, and disables OPPs *after* having received the requirements from all outside drivers and resolved potential conflicts. For narrowing down the frequency range from outside drivers - even with the extended OPP interface we are discussing - there are multiple caveats and so far not a single *technical* argument in it's favor (and I'm really in favor of using standard interfaces when suitable instead of implementing custom solutions, though in this case the 'custom solution' is a simple function call). > I asked to Viresh (OPP maintainer). If dev_pm_opp_enable() and > dev_pm_opp_disable() don't support the usecase on multiple device > drivers, opp interface could not be used in order to support both > devfreq-cooling.c and throttler. So, We better to wait the opinion > from OPP maintainer. Yes, it would be interesting to know Viresh's opinion, maybe I'm completely wrong and there is an elegant solution using the OPPs. Hopefully this thread didn't get buried in his inbox, it might be worth to ping him separately. Thanks Matthias
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index 21604d6ae2b8..4cbaa7ad1972 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev) return ERR_PTR(-ENODEV); } +/** + * dev_to_devfreq() - find devfreq struct using device pointer + * @dev: device pointer used to lookup device devfreq. + */ +struct devfreq *dev_to_devfreq(struct device *dev) +{ + struct devfreq *devfreq; + + mutex_lock(&devfreq_list_lock); + devfreq = find_device_devfreq(dev); + mutex_unlock(&devfreq_list_lock); + + return devfreq; +} + static unsigned long find_available_min_freq(struct devfreq *devfreq) { struct dev_pm_opp *opp; @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq) if (!policy->governor) return -EINVAL; + policy->min = policy->devinfo.min_freq; + policy->max = policy->devinfo.max_freq; + + srcu_notifier_call_chain(&devfreq->policy_notifier_list, + DEVFREQ_ADJUST, policy); + /* Reevaluate the proper frequency */ err = policy->governor->get_target_freq(devfreq, &freq); if (err) return err; - /* - * Adjust the frequency with user freq, QoS and available freq. - * - * List from the highest priority - * max_freq - * min_freq - */ - max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq); - min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq); + /* Adjust the frequency */ + + max_freq = MIN(policy->max, policy->user.max_freq); + min_freq = MAX(policy->min, policy->user.min_freq); if (freq < min_freq) { freq = min_freq; @@ -645,6 +661,7 @@ struct devfreq *devfreq_add_device(struct device *dev, devfreq->last_stat_updated = jiffies; srcu_init_notifier_head(&devfreq->transition_notifier_list); + srcu_init_notifier_head(&devfreq->policy_notifier_list); mutex_unlock(&devfreq->lock); @@ -1445,7 +1462,7 @@ EXPORT_SYMBOL(devm_devfreq_unregister_opp_notifier); * devfreq_register_notifier() - Register a driver with devfreq * @devfreq: The devfreq object. * @nb: The notifier block to register. - * @list: DEVFREQ_TRANSITION_NOTIFIER. + * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER. */ int devfreq_register_notifier(struct devfreq *devfreq, struct notifier_block *nb, @@ -1461,6 +1478,10 @@ int devfreq_register_notifier(struct devfreq *devfreq, ret = srcu_notifier_chain_register( &devfreq->transition_notifier_list, nb); break; + case DEVFREQ_POLICY_NOTIFIER: + ret = srcu_notifier_chain_register( + &devfreq->policy_notifier_list, nb); + break; default: ret = -EINVAL; } @@ -1473,7 +1494,7 @@ EXPORT_SYMBOL(devfreq_register_notifier); * devfreq_unregister_notifier() - Unregister a driver with devfreq * @devfreq: The devfreq object. * @nb: The notifier block to be unregistered. - * @list: DEVFREQ_TRANSITION_NOTIFIER. + * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER. */ int devfreq_unregister_notifier(struct devfreq *devfreq, struct notifier_block *nb, @@ -1489,6 +1510,11 @@ int devfreq_unregister_notifier(struct devfreq *devfreq, ret = srcu_notifier_chain_unregister( &devfreq->transition_notifier_list, nb); break; + case DEVFREQ_POLICY_NOTIFIER: + ret = srcu_notifier_chain_unregister( + &devfreq->policy_notifier_list, nb); + break; + default: ret = -EINVAL; } diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h index 9bf23b976f4d..7c8dce96db73 100644 --- a/include/linux/devfreq.h +++ b/include/linux/devfreq.h @@ -33,6 +33,10 @@ #define DEVFREQ_PRECHANGE (0) #define DEVFREQ_POSTCHANGE (1) +#define DEVFREQ_POLICY_NOTIFIER 1 + +#define DEVFREQ_ADJUST 0 + struct devfreq; struct devfreq_governor; @@ -121,12 +125,16 @@ struct devfreq_freq_limits { /** * struct devfreq_policy - Devfreq policy + * @min: minimum frequency (adjustable by policy notifiers) + * @min: maximum frequency (adjustable by policy notifiers) * @user: frequency limits requested by the user * @devinfo: frequency limits of the device (available OPPs) * @governor: method how to choose frequency based on the usage. * @governor_name: devfreq governor name for use with this devfreq */ struct devfreq_policy { + unsigned long min; + unsigned long max; struct devfreq_freq_limits user; struct devfreq_freq_limits devinfo; const struct devfreq_governor *governor; @@ -155,6 +163,7 @@ struct devfreq_policy { * @time_in_state: Statistics of devfreq states * @last_stat_updated: The last time stat updated * @transition_notifier_list: list head of DEVFREQ_TRANSITION_NOTIFIER notifier + * @policy_notifier_list: list head of DEVFREQ_POLICY_NOTIFIER notifier * * This structure stores the devfreq information for a give device. * @@ -188,6 +197,7 @@ struct devfreq { unsigned long last_stat_updated; struct srcu_notifier_head transition_notifier_list; + struct srcu_notifier_head policy_notifier_list; }; struct devfreq_freqs { @@ -240,6 +250,45 @@ extern void devm_devfreq_unregister_notifier(struct device *dev, extern struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev, int index); +/** + * devfreq_verify_within_limits() - Adjust a devfreq policy if needed to make + * sure its min/max values are within a + * specified range. + * @policy: the policy + * @min: the minimum frequency + * @max: the maximum frequency + */ +static inline void devfreq_verify_within_limits(struct devfreq_policy *policy, + unsigned int min, unsigned int max) +{ + if (policy->min < min) + policy->min = min; + if (policy->max < min) + policy->max = min; + if (policy->min > max) + policy->min = max; + if (policy->max > max) + policy->max = max; + if (policy->min > policy->max) + policy->min = policy->max; +} + +/** + * devfreq_verify_within_dev_limits() - Adjust a devfreq policy if needed to + * make sure its min/max values are within + * the frequency range supported by the + * device. + * @policy: the policy + */ +static inline void +devfreq_verify_within_dev_limits(struct devfreq_policy *policy) +{ + devfreq_verify_within_limits(policy, policy->devinfo.min_freq, + policy->devinfo.max_freq); +} + +struct devfreq *dev_to_devfreq(struct device *dev); + #if IS_ENABLED(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND) /** * struct devfreq_simple_ondemand_data - void *data fed to struct devfreq @@ -394,10 +443,26 @@ static inline struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev, return ERR_PTR(-ENODEV); } +static inline void devfreq_verify_within_limits(struct devfreq_policy *policy, + unsigned int min, unsigned int max) +{ +} + +static inline void +devfreq_verify_within_dev_limits(struct devfreq_policy *policy) +{ +} + static inline int devfreq_update_stats(struct devfreq *df) { return -EINVAL; } + +static inline struct devfreq *dev_to_devfreq(struct device *dev) +{ + return NULL; +} + #endif /* CONFIG_PM_DEVFREQ */ #endif /* __LINUX_DEVFREQ_H__ */