Message ID | 1314775779-21399-3-git-send-email-myungjoo.ham@samsung.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote: [snip] > +/** > + * get_devfreq() - find devfreq struct. a wrapped find_device_devfreq() > + * with mutex protection. exported for governors > + * @dev: device pointer used to lookup device devfreq. > + */ > +struct devfreq *get_devfreq(struct device *dev) > +{ > + struct devfreq *ret; > + > + mutex_lock(&devfreq_list_lock); > + ret = find_device_devfreq(dev); > + mutex_unlock(&devfreq_list_lock); You prevent changes to the devfreq list while searching (good) but after returning the pointer there is no protection from that item being removed from the list. Generally "get" and "put" functions do more than just return a pointer: get functions often increment a refcount, or hold a lock. The put function will decrement the refcount or release the lock. Maybe you want something like the following: mutex_lock(&devfreq_list_lock); ret = find_device_devfreq(dev); mutex_lock(&devfreq->lock); mutex_unlock(&devfreq_list_lock); Then you need a corresponding put which does the mutex_unlock(&devfreq->lock). It looks like the only consumers of get_devfreq are the sysfs show/store interfaces, which immediately hold devfreq->lock, so the above proposal certainly makes fits the existing use case. Also CPUfreq's "cpufreq_get" function does a nice job of protecting the object from getting freed with a rw_semaphore. It is a bit more "complicated" but makes for good reading. > + > + return ret; > +} > + > +/** > + * devfreq_do() - Check the usage profile of a given device and configure > + * frequency and voltage accordingly > + * @devfreq: devfreq info of the given device > + */ > +static int devfreq_do(struct devfreq *devfreq) > +{ > + struct opp *opp; > + unsigned long freq; > + int err; > + Put the mutex_is_locked check here? See more below. > + err = devfreq->governor->get_target_freq(devfreq, &freq); > + if (err) > + return err; > + > + opp = opp_find_freq_ceil(devfreq->dev, &freq); > + if (opp == ERR_PTR(-ENODEV)) > + opp = opp_find_freq_floor(devfreq->dev, &freq); > + > + if (IS_ERR(opp)) > + return PTR_ERR(opp); > + > + if (devfreq->previous_freq == freq) > + return 0; > + > + err = devfreq->profile->target(devfreq->dev, opp); > + if (err) > + return err; > + > + devfreq->previous_freq = freq; > + return 0; > +} > + > +/** > + * update_devfreq() - Notify that the device OPP or frequency requirement > + * has been changed. This function is exported for governors. > + * @devfreq: the devfreq instance. > + * > + * Note: lock devfreq->lock before calling update_devfreq > + */ > +int update_devfreq(struct devfreq *devfreq) > +{ > + int err = 0; > + > + if (!mutex_is_locked(&devfreq->lock)) { > + WARN(true, "devfreq->lock must be locked by the caller.\n"); > + return -EINVAL; > + } > + > + /* Reevaluate the proper frequency */ > + err = devfreq_do(devfreq); > + return err; > +} > + > +/** > + * devfreq_update() - Notify that the device OPP has been changed. > + * @dev: the device whose OPP has been changed. > + * > + * Called by OPP notifier. > + */ > +static int devfreq_update(struct notifier_block *nb, unsigned long type, > + void *devp) > +{ > + struct devfreq *devfreq = container_of(nb, struct devfreq, nb); > + int ret; > + > + mutex_lock(&devfreq->lock); > + ret = update_devfreq(devfreq); > + mutex_unlock(&devfreq->lock); The whole devfreq_update/update_devfreq pairing is redundant. update_devfreq's purpose is to make sure the lock is held before going further, and the only caller of update_devfreq is devfreq_update which always holds the lock. This still doesn't stop a bad driver writer from just calling devfreq_do with an extern. Perhaps the lock detection should be moved into devfreq_do and update_devfreq should go away? > + > + return ret; > +} > + > +/** > + * devfreq_monitor() - Periodically run devfreq_do() > + * @work: the work struct used to run devfreq_monitor periodically. > + * > + */ > +static void devfreq_monitor(struct work_struct *work) > +{ > + static unsigned long last_polled_at; > + struct devfreq *devfreq, *tmp; > + int error; > + unsigned long jiffies_passed; > + unsigned long next_jiffies = ULONG_MAX, now = jiffies; > + > + /* Initially last_polled_at = 0, polling every device at bootup */ > + jiffies_passed = now - last_polled_at; > + last_polled_at = now; > + if (jiffies_passed == 0) > + jiffies_passed = 1; > + > + mutex_lock(&devfreq_list_lock); Should not lock the list here. If we lock the list for all major operations, it nullifies the performance benefit of having a mutex in struct devfreq. > + > + list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) { > + mutex_lock(&devfreq->lock); > + > + if (devfreq->next_polling == 0) { > + mutex_unlock(&devfreq->lock); > + continue; > + } > + > + /* > + * Reduce more next_polling if devfreq_wq took an extra > + * delay. (i.e., CPU has been idled.) > + */ > + if (devfreq->next_polling <= jiffies_passed) { > + error = devfreq_do(devfreq); > + > + /* Remove a devfreq with an error. */ > + if (error && error != -EAGAIN) { > + dev_err(devfreq->dev, "Due to devfreq_do error(%d), devfreq(%s) is removed from the device\n", > + error, devfreq->governor->name); > + > + list_del(&devfreq->node); > + mutex_unlock(&devfreq->lock); > + kfree(devfreq); > + continue; Should this error handling also unregister the OPP notifier? This code duplicates portions of devfreq_remove_device. I propose instead here we do the following: mutex_unlock(&devfreq->lock); _devfreq_remove_lock(devfreq); /* this locks the list first, * then locks devfreq->lock, * then does the house cleaning */ continue; > + } > + devfreq->next_polling = devfreq->polling_jiffies; > + > + /* No more polling required (polling_ms changed) */ > + if (devfreq->next_polling == 0) { > + mutex_unlock(&devfreq->lock); > + continue; > + } > + } else { > + devfreq->next_polling -= jiffies_passed; > + } > + > + next_jiffies = (next_jiffies > devfreq->next_polling) ? > + devfreq->next_polling : next_jiffies; > + > + mutex_unlock(&devfreq->lock); > + } > + > + if (next_jiffies > 0 && next_jiffies < ULONG_MAX) { > + polling = true; > + queue_delayed_work(devfreq_wq, &devfreq_work, next_jiffies); > + } else { > + polling = false; > + } > + > + mutex_unlock(&devfreq_list_lock); Again, list should not be locked over this whole function. It blocks other unrelated devfreq devices from scaling. > +} [snip] > +/** > + * devfreq_remove_device() - Remove devfreq feature from a device. > + * @device: the device to remove devfreq feature. > + */ > +int devfreq_remove_device(struct device *dev) Why does this take a struct device*? Shouldn't it be a struct devfreq*? If there is a case for removing a devfreq device with only struct device* as input, how about: int devfreq_remove_device(struct device *dev) { struct devfreq *devfreq; mutex_lock(&devfreq_list_lock); devfreq = find_device_devfreq(dev); mutex_unlock(&devfreq_list_lock); return _devfreq_remove_device(struct devfreq *df); /* _devfreq_remove_device does the real work and can also be called from devfreq_monitor */ } Regards, Mike > +{ > + struct devfreq *devfreq; > + struct srcu_notifier_head *nh; > + int err = 0; > + > + if (!dev) > + return -EINVAL; > + > + mutex_lock(&devfreq_list_lock); > + devfreq = find_device_devfreq(dev); > + if (IS_ERR(devfreq)) { > + err = PTR_ERR(devfreq); > + goto out; > + } > + > + mutex_lock(&devfreq->lock); > + nh = opp_get_notifier(dev); > + if (IS_ERR(nh)) { > + err = PTR_ERR(nh); > + mutex_unlock(&devfreq->lock); > + goto out; > + } > + > + list_del(&devfreq->node); > + > + if (devfreq->governor->exit) > + devfreq->governor->exit(devfreq); > + > + srcu_notifier_chain_unregister(nh, &devfreq->nb); > + mutex_unlock(&devfreq->lock); > + kfree(devfreq); > +out: > + mutex_unlock(&devfreq_list_lock); > + return 0; > +} [snip]
On Thu, Sep 1, 2011 at 5:05 AM, Turquette, Mike <mturquette@ti.com> wrote: > On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote: > [snip] >> +/** >> + * get_devfreq() - find devfreq struct. a wrapped find_device_devfreq() >> + * with mutex protection. exported for governors >> + * @dev: device pointer used to lookup device devfreq. >> + */ >> +struct devfreq *get_devfreq(struct device *dev) >> +{ >> + struct devfreq *ret; >> + >> + mutex_lock(&devfreq_list_lock); >> + ret = find_device_devfreq(dev); >> + mutex_unlock(&devfreq_list_lock); > > You prevent changes to the devfreq list while searching (good) but > after returning the pointer there is no protection from that item > being removed from the list. Generally "get" and "put" functions do > more than just return a pointer: get functions often increment a > refcount, or hold a lock. The put function will decrement the > refcount or release the lock. Maybe you want something like the > following: > > mutex_lock(&devfreq_list_lock); > ret = find_device_devfreq(dev); > mutex_lock(&devfreq->lock); > mutex_unlock(&devfreq_list_lock); > > Then you need a corresponding put which does the mutex_unlock(&devfreq->lock). > > It looks like the only consumers of get_devfreq are the sysfs > show/store interfaces, which immediately hold devfreq->lock, so the > above proposal certainly makes fits the existing use case. > > Also CPUfreq's "cpufreq_get" function does a nice job of protecting > the object from getting freed with a rw_semaphore. It is a bit more > "complicated" but makes for good reading. > Thank you. The possibility that someone may do something on struct devfreq after mutex_unlock(&devfreq_list_lock) and before mutex_lock(&devfreq->lock) bothered me and it appears that I need to add devfreq_put() anyway. I hesitated it because users might forget using devfreq_put() after calling devfreq_get(); however, it is just same as forgetting mutex_unlock after mutex_lock. So I wouldn't mind that much. The next devfreq_get() will do > mutex_lock(&devfreq_list_lock); > ret = find_device_devfreq(dev); > mutex_lock(&devfreq->lock); > mutex_unlock(&devfreq_list_lock); and devfreq_put() will do > mutex_unlock(&devfreq->lock); as you've suggested. >> +static int devfreq_do(struct devfreq *devfreq) >> +{ >> + struct opp *opp; >> + unsigned long freq; >> + int err; >> + > > Put the mutex_is_locked check here? See more below. > [] >> +static int devfreq_update(struct notifier_block *nb, unsigned long type, >> + void *devp) >> +{ >> + struct devfreq *devfreq = container_of(nb, struct devfreq, nb); >> + int ret; >> + >> + mutex_lock(&devfreq->lock); >> + ret = update_devfreq(devfreq); >> + mutex_unlock(&devfreq->lock); > > The whole devfreq_update/update_devfreq pairing is redundant. > update_devfreq's purpose is to make sure the lock is held before going > further, and the only caller of update_devfreq is devfreq_update which > always holds the lock. > > This still doesn't stop a bad driver writer from just calling > devfreq_do with an extern. Perhaps the lock detection should be moved > into devfreq_do and update_devfreq should go away? > - update_devfreq: extern for governors - devfreq_update: notifier callback for OPP - devfreq_do: internal function of devfreq. Anyway, as you've mentioned, it seems I'd better rename devfreq_do as update_devfreq and make it exported with mutex check. [] >> +static void devfreq_monitor(struct work_struct *work) >> +{ >> + static unsigned long last_polled_at; >> + struct devfreq *devfreq, *tmp; >> + int error; >> + unsigned long jiffies_passed; >> + unsigned long next_jiffies = ULONG_MAX, now = jiffies; >> + >> + /* Initially last_polled_at = 0, polling every device at bootup */ >> + jiffies_passed = now - last_polled_at; >> + last_polled_at = now; >> + if (jiffies_passed == 0) >> + jiffies_passed = 1; >> + >> + mutex_lock(&devfreq_list_lock); > > Should not lock the list here. If we lock the list for all major > operations, it nullifies the performance benefit of having a mutex in > struct devfreq. > Ok... then.. how about locking like this? : mutex_lock(&devfreq_list_lock); list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) { mutex_lock(&devfreq->lock); mutex_unlock(&devfreq_list_lock); blahblah mutex_unlock(&devfreq->lock); mutex_lock(&devfreq_list_lock); } mutex_unlock(&devfreq_list_lock); Anyway, there is one more problem with allowing unlocked-devfreq_list_lock in the loop. list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) is safe for the removal of devfreq from the list in the loop. However, it is not safe against the removal of devfreq's next member in the loop and while devfreq_list_lock is unlocked, devfreq_remove_device may remove that one; thus, breaking the list_for_each_entry_safe loop. Such break is prevent by adding one more mutex_lock/mutex_unlock to the loop for tmp->lock. However, if we do mutex_unlock(&tmp->lock) before mutex_lock(&devfreq_list_lock), we still have the same breaking-the-loop issue and if we do it after mutex_lock(&devfreq_list_lock), we have a deadlock issue (someone might have locked devfreq_list_lock and waiting to lock tmp->lock). Thus, we will need to block devfreq_remove_device at devfreq_monitor whlie unlocking devfreq_list in the loop. Other operations (add / list) on the list are fine for it. So, the loop will be: mutex_lock(&devfreq_list_lock); prohibit_devfreq_remove = true; list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) { mutex_lock(&devfreq->lock); mutex_unlock(&devfreq_list_lock); blahblah mutex_unlock(&devfreq->lock); mutex_lock(&devfreq_list_lock); } prohibit_devfreq_remove = false; mutex_unlock(&devfreq_list_lock); [] >> + /* Remove a devfreq with an error. */ >> + if (error && error != -EAGAIN) { >> + dev_err(devfreq->dev, "Due to devfreq_do error(%d), devfreq(%s) is removed from the device\n", >> + error, devfreq->governor->name); >> + >> + list_del(&devfreq->node); >> + mutex_unlock(&devfreq->lock); >> + kfree(devfreq); >> + continue; > > Should this error handling also unregister the OPP notifier? This > code duplicates portions of devfreq_remove_device. I propose instead > here we do the following: > > mutex_unlock(&devfreq->lock); > _devfreq_remove_lock(devfreq); > /* this locks the list first, > * then locks devfreq->lock, > * then does the house cleaning > */ > continue; Ah.. that was missing. Thanks! I'll make a _devfreq_remove_device(struct device *dev, bool call_by_monitor) and let devfreq_remove_device use it. > >> + } >> + devfreq->next_polling = devfreq->polling_jiffies; >> + >> + /* No more polling required (polling_ms changed) */ >> + if (devfreq->next_polling == 0) { >> + mutex_unlock(&devfreq->lock); >> + continue; >> + } >> + } else { >> + devfreq->next_polling -= jiffies_passed; >> + } >> + >> + next_jiffies = (next_jiffies > devfreq->next_polling) ? >> + devfreq->next_polling : next_jiffies; >> + >> + mutex_unlock(&devfreq->lock); >> + } >> + >> + if (next_jiffies > 0 && next_jiffies < ULONG_MAX) { >> + polling = true; >> + queue_delayed_work(devfreq_wq, &devfreq_work, next_jiffies); >> + } else { >> + polling = false; >> + } >> + >> + mutex_unlock(&devfreq_list_lock); > > Again, list should not be locked over this whole function. It blocks > other unrelated devfreq devices from scaling. > >> +} > > [snip] > >> +/** >> + * devfreq_remove_device() - Remove devfreq feature from a device. >> + * @device: the device to remove devfreq feature. >> + */ >> +int devfreq_remove_device(struct device *dev) > > Why does this take a struct device*? Shouldn't it be a struct devfreq*? It is because devfreq_add_device() does not return struct devfreq. struct devfreq is not visible to device drivers. It is visible to governors. > > If there is a case for removing a devfreq device with only struct > device* as input, how about: > > int devfreq_remove_device(struct device *dev) > { > struct devfreq *devfreq; > mutex_lock(&devfreq_list_lock); > devfreq = find_device_devfreq(dev); > mutex_unlock(&devfreq_list_lock); > > return _devfreq_remove_device(struct devfreq *df); > /* _devfreq_remove_device does the real work and can also be > called from devfreq_monitor */ > } Sure, I'll do so and reduce the redundancy. > > Regards, > Mike > [] Thank you so much! Cheers, MyungJoo
On Wed, Aug 31, 2011 at 9:51 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote: > On Thu, Sep 1, 2011 at 5:05 AM, Turquette, Mike <mturquette@ti.com> wrote: >> On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote: [snip] >>> +static void devfreq_monitor(struct work_struct *work) >>> +{ >>> + static unsigned long last_polled_at; >>> + struct devfreq *devfreq, *tmp; >>> + int error; >>> + unsigned long jiffies_passed; >>> + unsigned long next_jiffies = ULONG_MAX, now = jiffies; >>> + >>> + /* Initially last_polled_at = 0, polling every device at bootup */ >>> + jiffies_passed = now - last_polled_at; >>> + last_polled_at = now; >>> + if (jiffies_passed == 0) >>> + jiffies_passed = 1; >>> + >>> + mutex_lock(&devfreq_list_lock); >> >> Should not lock the list here. If we lock the list for all major >> operations, it nullifies the performance benefit of having a mutex in >> struct devfreq. >> > > Ok... then.. how about locking like this? : > > mutex_lock(&devfreq_list_lock); > list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) { > mutex_lock(&devfreq->lock); > mutex_unlock(&devfreq_list_lock); > > blahblah > > mutex_unlock(&devfreq->lock); > mutex_lock(&devfreq_list_lock); > } > mutex_unlock(&devfreq_list_lock); I took a step back from the code to rethink the big picture, and I've come back to the same conclusion conclusion that I had in the V5 patchset. (https://patchwork.kernel.org/patch/1043442/) Firstly, there is no reason to walk the list here. All of the locking discussion here could just go away if we didn't walk the list of devfreq devices every time the delay_work gets fired. If each device programmed it's own delayed work then this problem simply goes away. If you don't mind I'd like to post an RFC of devfreq with these changes implemented so we can review them and discuss the ideas around some concrete code. Best regards, Mike
On Fri, Sep 2, 2011 at 1:57 AM, Turquette, Mike <mturquette@ti.com> wrote: > On Wed, Aug 31, 2011 at 9:51 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote: >> On Thu, Sep 1, 2011 at 5:05 AM, Turquette, Mike <mturquette@ti.com> wrote: >>> On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote: > [snip] >>>> +static void devfreq_monitor(struct work_struct *work) >>>> +{ >>>> + static unsigned long last_polled_at; >>>> + struct devfreq *devfreq, *tmp; >>>> + int error; >>>> + unsigned long jiffies_passed; >>>> + unsigned long next_jiffies = ULONG_MAX, now = jiffies; >>>> + >>>> + /* Initially last_polled_at = 0, polling every device at bootup */ >>>> + jiffies_passed = now - last_polled_at; >>>> + last_polled_at = now; >>>> + if (jiffies_passed == 0) >>>> + jiffies_passed = 1; >>>> + >>>> + mutex_lock(&devfreq_list_lock); >>> >>> Should not lock the list here. If we lock the list for all major >>> operations, it nullifies the performance benefit of having a mutex in >>> struct devfreq. >>> >> >> Ok... then.. how about locking like this? : >> >> mutex_lock(&devfreq_list_lock); >> list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) { >> mutex_lock(&devfreq->lock); >> mutex_unlock(&devfreq_list_lock); >> >> blahblah >> >> mutex_unlock(&devfreq->lock); >> mutex_lock(&devfreq_list_lock); >> } >> mutex_unlock(&devfreq_list_lock); > > I took a step back from the code to rethink the big picture, and I've > come back to the same conclusion conclusion that I had in the V5 > patchset. (https://patchwork.kernel.org/patch/1043442/) > > Firstly, there is no reason to walk the list here. All of the locking > discussion here could just go away if we didn't walk the list of > devfreq devices every time the delay_work gets fired. If each device > programmed it's own delayed work then this problem simply goes away. > > If you don't mind I'd like to post an RFC of devfreq with these > changes implemented so we can review them and discuss the ideas around > some concrete code. > > Best regards, > Mike > I'm considering to allow a governor to declare that it will run its own loop . Because we have .init/.exit callbacks and an update in OPP triggers to call devfreq->profile->target anyway, we only need to add one bit in struct devfreq_governor, "bool own_loop". Then, we just need to modify devfreq_add_device to omit from the list or mark them not to be looped if the "own_loop" bit is true for the given governor. That way, we can keep most governors simple as cpufreq's drivers and as not complex as cpufreq's governors. For those governors that really need to run their own loop, we can give the option. I want most devfreq governors to be general enough for devfreq devices so that the device drivers may use any of them without altering the parameters much and easy and short enough for subsystems to have their own governors. Thus, common things such as looping and getting usage statistics are moved from governors to devfreq framework. You are welcome to post an RFC patch to allow governors to have their own loop implemented in governors; however, I think that should be optional, not mandatory for governors. And, that degree of synchronization issue in devfreq ain't that badly complex, isn't it? Thank you. Cheers! It's Friday. MyungJoo ps. Ah.. and for the kobject thing that you've mentioned in the other thread, I'm experimenting it (a kobject class "devfreq"). However, it will relocate devfreq sysfs entries from /sys/devices/.../power/* to /sys/devices/.../devfreq/*. To Rafael: Would it be fine for the devfreq sys entries to move to such a location by creating "devfreq" class? I remember you've objected to an independent per-device sysfs directory for devfreq entries. However, it is sort of "sideeffect" in reducing overhead of searching struct devfreq again and again for sysfs callbacks.
On Thu, Sep 1, 2011 at 9:38 PM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote: > I'm considering to allow a governor to declare that it will run its own loop > . > Because we have .init/.exit callbacks and an update in OPP triggers to > call devfreq->profile->target anyway, we only need to add one bit in > struct devfreq_governor, "bool own_loop". Then, we just need to modify > devfreq_add_device to omit from the list or mark them not to be looped > if the "own_loop" bit is true for the given governor. > > That way, we can keep most governors simple as cpufreq's drivers and > as not complex as cpufreq's governors. For those governors that really > need to run their own loop, we can give the option. I want most > devfreq governors to be general enough for devfreq devices so that the > device drivers may use any of them without altering the parameters > much and easy and short enough for subsystems to have their own > governors. Thus, common things such as looping and getting usage > statistics are moved from governors to devfreq framework. I can understand wanting to keep the governors simpler than their CPUfreq counterparts. However I also disagree with polling boolean and the use of devfreq for non-polling purposes (with the exception of powersave & performance, which are mostly for testing, and for userspace which is a special case). Also my version of the patch removes the dependency on the opp library and abstracts those details away in a frequency table, which might use OPPs as a backing store, or might accept arbitrary rates given min/max limitations (such as for a device connected to a clock with a very wide divider). There are probably devices out there than can simply change rates with clk_set_rate that would benefit from devfreq, but do not have OPPs. > You are welcome to post an RFC patch to allow governors to have their > own loop implemented in governors; however, I think that should be > optional, not mandatory for governors. And, that degree of > synchronization issue in devfreq ain't that badly complex, isn't it? Thanks for being open to the idea. I don't think that the list walk is wildly complex but some of the differences run deeper than that. I'm at LPC next week so I'm not sure when my RFC will hit the list. Hopefully the week after LPC. Until then I'll continue to review your devfreq patches as usual. > > Thank you. > > > Cheers! It's Friday. > MyungJoo > > ps. Ah.. and for the kobject thing that you've mentioned in the other > thread, I'm experimenting it (a kobject class "devfreq"). However, it > will relocate devfreq sysfs entries from /sys/devices/.../power/* to > /sys/devices/.../devfreq/*. Just my $0.02, but my patch does the same thing and I think it's a good thing. The code lives drivers/devfreq/ so placing it under /sys/device/.../power/ wasn't a perfect fit anyways. Regards, Mike > To Rafael: > Would it be fine for the devfreq sys entries to move to such a > location by creating "devfreq" class? I remember you've objected to an > independent per-device sysfs directory for devfreq entries. However, > it is sort of "sideeffect" in reducing overhead of searching struct > devfreq again and again for sysfs callbacks. > > -- > MyungJoo Ham, Ph.D. > Mobile Software Platform Lab, DMC Business, Samsung Electronics >
diff --git a/drivers/Kconfig b/drivers/Kconfig index 95b9e7e..a1efd75 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -130,4 +130,6 @@ source "drivers/iommu/Kconfig" source "drivers/virt/Kconfig" +source "drivers/devfreq/Kconfig" + endmenu diff --git a/drivers/Makefile b/drivers/Makefile index 7fa433a..97c957b 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -127,3 +127,5 @@ obj-$(CONFIG_IOMMU_SUPPORT) += iommu/ # Virtualization drivers obj-$(CONFIG_VIRT_DRIVERS) += virt/ + +obj-$(CONFIG_PM_DEVFREQ) += devfreq/ diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig new file mode 100644 index 0000000..1fb42de --- /dev/null +++ b/drivers/devfreq/Kconfig @@ -0,0 +1,39 @@ +config ARCH_HAS_DEVFREQ + bool + depends on ARCH_HAS_OPP + help + Denotes that the architecture supports DEVFREQ. If the architecture + supports multiple OPP entries per device and the frequency of the + devices with OPPs may be altered dynamically, the architecture + supports DEVFREQ. + +menuconfig PM_DEVFREQ + bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) support" + depends on PM_OPP && ARCH_HAS_DEVFREQ + help + With OPP support, a device may have a list of frequencies and + voltages available. DEVFREQ, a generic DVFS framework can be + registered for a device with OPP support in order to let the + governor provided to DEVFREQ choose an operating frequency + based on the OPP's list and the policy given with DEVFREQ. + + Each device may have its own governor and policy. DEVFREQ can + reevaluate the device state periodically and/or based on the + OPP list changes (each frequency/voltage pair in OPP may be + disabled or enabled). + + Like some CPUs with CPUFREQ, a device may have multiple clocks. + However, because the clock frequencies of a single device are + determined by the single device's state, an instance of DEVFREQ + is attached to a single device and returns a "representative" + clock frequency from the OPP of the device, which is also attached + to a device by 1-to-1. The device registering DEVFREQ takes the + responsiblity to "interpret" the frequency listed in OPP and + to set its every clock accordingly with the "target" callback + given to DEVFREQ. + +if PM_DEVFREQ + +comment "DEVFREQ Drivers" + +endif # PM_DEVFREQ diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile new file mode 100644 index 0000000..168934a --- /dev/null +++ b/drivers/devfreq/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_PM_DEVFREQ) += devfreq.o diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c new file mode 100644 index 0000000..621b863 --- /dev/null +++ b/drivers/devfreq/devfreq.c @@ -0,0 +1,364 @@ +/* + * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework + * for Non-CPU Devices Based on OPP. + * + * Copyright (C) 2011 Samsung Electronics + * MyungJoo Ham <myungjoo.ham@samsung.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/kernel.h> +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/init.h> +#include <linux/slab.h> +#include <linux/opp.h> +#include <linux/devfreq.h> +#include <linux/workqueue.h> +#include <linux/platform_device.h> +#include <linux/list.h> +#include <linux/printk.h> +#include <linux/hrtimer.h> + +/* + * devfreq_work periodically monitors every registered device. + * The minimum polling interval is one jiffy. The polling interval is + * determined by the minimum polling period among all polling devfreq + * devices. The resolution of polling interval is one jiffy. + */ +static bool polling; +static struct workqueue_struct *devfreq_wq; +static struct delayed_work devfreq_work; + +/* The list of all device-devfreq */ +static LIST_HEAD(devfreq_list); +static DEFINE_MUTEX(devfreq_list_lock); + +/** + * find_device_devfreq() - find devfreq struct using device pointer + * @dev: device pointer used to lookup device devfreq. + * + * Search the list of device devfreqs and return the matched device's + * devfreq info. devfreq_list_lock should be held by the caller. + */ +static struct devfreq *find_device_devfreq(struct device *dev) +{ + struct devfreq *tmp_devfreq; + + if (unlikely(IS_ERR_OR_NULL(dev))) { + pr_err("DEVFREQ: %s: Invalid parameters\n", __func__); + return ERR_PTR(-EINVAL); + } + + list_for_each_entry(tmp_devfreq, &devfreq_list, node) { + if (tmp_devfreq->dev == dev) + return tmp_devfreq; + } + + return ERR_PTR(-ENODEV); +} + +/** + * get_devfreq() - find devfreq struct. a wrapped find_device_devfreq() + * with mutex protection. exported for governors + * @dev: device pointer used to lookup device devfreq. + */ +struct devfreq *get_devfreq(struct device *dev) +{ + struct devfreq *ret; + + mutex_lock(&devfreq_list_lock); + ret = find_device_devfreq(dev); + mutex_unlock(&devfreq_list_lock); + + return ret; +} + +/** + * devfreq_do() - Check the usage profile of a given device and configure + * frequency and voltage accordingly + * @devfreq: devfreq info of the given device + */ +static int devfreq_do(struct devfreq *devfreq) +{ + struct opp *opp; + unsigned long freq; + int err; + + err = devfreq->governor->get_target_freq(devfreq, &freq); + if (err) + return err; + + opp = opp_find_freq_ceil(devfreq->dev, &freq); + if (opp == ERR_PTR(-ENODEV)) + opp = opp_find_freq_floor(devfreq->dev, &freq); + + if (IS_ERR(opp)) + return PTR_ERR(opp); + + if (devfreq->previous_freq == freq) + return 0; + + err = devfreq->profile->target(devfreq->dev, opp); + if (err) + return err; + + devfreq->previous_freq = freq; + return 0; +} + +/** + * update_devfreq() - Notify that the device OPP or frequency requirement + * has been changed. This function is exported for governors. + * @devfreq: the devfreq instance. + * + * Note: lock devfreq->lock before calling update_devfreq + */ +int update_devfreq(struct devfreq *devfreq) +{ + int err = 0; + + if (!mutex_is_locked(&devfreq->lock)) { + WARN(true, "devfreq->lock must be locked by the caller.\n"); + return -EINVAL; + } + + /* Reevaluate the proper frequency */ + err = devfreq_do(devfreq); + return err; +} + +/** + * devfreq_update() - Notify that the device OPP has been changed. + * @dev: the device whose OPP has been changed. + * + * Called by OPP notifier. + */ +static int devfreq_update(struct notifier_block *nb, unsigned long type, + void *devp) +{ + struct devfreq *devfreq = container_of(nb, struct devfreq, nb); + int ret; + + mutex_lock(&devfreq->lock); + ret = update_devfreq(devfreq); + mutex_unlock(&devfreq->lock); + + return ret; +} + +/** + * devfreq_monitor() - Periodically run devfreq_do() + * @work: the work struct used to run devfreq_monitor periodically. + * + */ +static void devfreq_monitor(struct work_struct *work) +{ + static unsigned long last_polled_at; + struct devfreq *devfreq, *tmp; + int error; + unsigned long jiffies_passed; + unsigned long next_jiffies = ULONG_MAX, now = jiffies; + + /* Initially last_polled_at = 0, polling every device at bootup */ + jiffies_passed = now - last_polled_at; + last_polled_at = now; + if (jiffies_passed == 0) + jiffies_passed = 1; + + mutex_lock(&devfreq_list_lock); + + list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) { + mutex_lock(&devfreq->lock); + + if (devfreq->next_polling == 0) { + mutex_unlock(&devfreq->lock); + continue; + } + + /* + * Reduce more next_polling if devfreq_wq took an extra + * delay. (i.e., CPU has been idled.) + */ + if (devfreq->next_polling <= jiffies_passed) { + error = devfreq_do(devfreq); + + /* Remove a devfreq with an error. */ + if (error && error != -EAGAIN) { + dev_err(devfreq->dev, "Due to devfreq_do error(%d), devfreq(%s) is removed from the device\n", + error, devfreq->governor->name); + + list_del(&devfreq->node); + mutex_unlock(&devfreq->lock); + kfree(devfreq); + continue; + } + devfreq->next_polling = devfreq->polling_jiffies; + + /* No more polling required (polling_ms changed) */ + if (devfreq->next_polling == 0) { + mutex_unlock(&devfreq->lock); + continue; + } + } else { + devfreq->next_polling -= jiffies_passed; + } + + next_jiffies = (next_jiffies > devfreq->next_polling) ? + devfreq->next_polling : next_jiffies; + + mutex_unlock(&devfreq->lock); + } + + if (next_jiffies > 0 && next_jiffies < ULONG_MAX) { + polling = true; + queue_delayed_work(devfreq_wq, &devfreq_work, next_jiffies); + } else { + polling = false; + } + + mutex_unlock(&devfreq_list_lock); +} + +/** + * devfreq_add_device() - Add devfreq feature to the device + * @dev: the device to add devfreq feature. + * @profile: device-specific profile to run devfreq. + * @governor: the policy to choose frequency. + * @data: private data for the governor. The devfreq framework does not + * touch this value. + */ +int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile, + struct devfreq_governor *governor, void *data) +{ + struct devfreq *devfreq; + struct srcu_notifier_head *nh; + int err = 0; + + if (!dev || !profile || !governor) { + dev_err(dev, "%s: Invalid parameters.\n", __func__); + return -EINVAL; + } + + mutex_lock(&devfreq_list_lock); + + devfreq = find_device_devfreq(dev); + if (!IS_ERR(devfreq)) { + dev_err(dev, "%s: Unable to create devfreq for the device. It already has one.\n", __func__); + err = -EINVAL; + goto out; + } + + devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL); + if (!devfreq) { + dev_err(dev, "%s: Unable to create devfreq for the device\n", + __func__); + err = -ENOMEM; + goto out; + } + + mutex_init(&devfreq->lock); + mutex_lock(&devfreq->lock); + devfreq->dev = dev; + devfreq->profile = profile; + devfreq->governor = governor; + devfreq->next_polling = devfreq->polling_jiffies + = msecs_to_jiffies(devfreq->profile->polling_ms); + devfreq->previous_freq = profile->initial_freq; + devfreq->data = data; + + devfreq->nb.notifier_call = devfreq_update; + + nh = opp_get_notifier(dev); + if (IS_ERR(nh)) { + err = PTR_ERR(nh); + goto err_opp; + } + err = srcu_notifier_chain_register(nh, &devfreq->nb); + if (err) + goto err_opp; + + if (governor->init) + err = governor->init(devfreq); + if (err) + goto err_init; + + list_add(&devfreq->node, &devfreq_list); + + if (devfreq_wq && devfreq->next_polling && !polling) { + polling = true; + queue_delayed_work(devfreq_wq, &devfreq_work, + devfreq->next_polling); + } + mutex_unlock(&devfreq->lock); + goto out; +err_init: + srcu_notifier_chain_unregister(nh, &devfreq->nb); +err_opp: + mutex_unlock(&devfreq->lock); + kfree(devfreq); +out: + mutex_unlock(&devfreq_list_lock); + return err; +} + +/** + * devfreq_remove_device() - Remove devfreq feature from a device. + * @device: the device to remove devfreq feature. + */ +int devfreq_remove_device(struct device *dev) +{ + struct devfreq *devfreq; + struct srcu_notifier_head *nh; + int err = 0; + + if (!dev) + return -EINVAL; + + mutex_lock(&devfreq_list_lock); + devfreq = find_device_devfreq(dev); + if (IS_ERR(devfreq)) { + err = PTR_ERR(devfreq); + goto out; + } + + mutex_lock(&devfreq->lock); + nh = opp_get_notifier(dev); + if (IS_ERR(nh)) { + err = PTR_ERR(nh); + mutex_unlock(&devfreq->lock); + goto out; + } + + list_del(&devfreq->node); + + if (devfreq->governor->exit) + devfreq->governor->exit(devfreq); + + srcu_notifier_chain_unregister(nh, &devfreq->nb); + mutex_unlock(&devfreq->lock); + kfree(devfreq); +out: + mutex_unlock(&devfreq_list_lock); + return 0; +} + +/** + * devfreq_init() - Initialize data structure for devfreq framework and + * start polling registered devfreq devices. + */ +static int __init devfreq_init(void) +{ + mutex_lock(&devfreq_list_lock); + polling = false; + devfreq_wq = create_freezable_workqueue("devfreq_wq"); + INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor); + mutex_unlock(&devfreq_list_lock); + + devfreq_monitor(&devfreq_work.work); + return 0; +} +late_initcall(devfreq_init); diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h new file mode 100644 index 0000000..9122090 --- /dev/null +++ b/drivers/devfreq/governor.h @@ -0,0 +1,22 @@ +/* + * governor.h - internal header for governors. + * + * Copyright (C) 2011 Samsung Electronics + * MyungJoo Ham <myungjoo.ham@samsung.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This header is for devfreq governors in drivers/devfreq/ + */ + +#ifndef _GOVERNOR_H +#define _GOVERNOR_H + +extern struct devfreq *get_devfreq(struct device *dev); + +/* (Mandatory) Lock devfreq->lock before calling update_devfreq */ +extern int update_devfreq(struct devfreq *devfreq); + +#endif /* _GOVERNOR_H */ diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h new file mode 100644 index 0000000..f14b57d --- /dev/null +++ b/include/linux/devfreq.h @@ -0,0 +1,123 @@ +/* + * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework + * for Non-CPU Devices Based on OPP. + * + * Copyright (C) 2011 Samsung Electronics + * MyungJoo Ham <myungjoo.ham@samsung.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef __LINUX_DEVFREQ_H__ +#define __LINUX_DEVFREQ_H__ + +#include <linux/device.h> +#include <linux/notifier.h> +#include <linux/opp.h> + +#define DEVFREQ_NAME_LEN 16 + +struct devfreq; +struct devfreq_dev_status { + /* both since the last measure */ + unsigned long total_time; + unsigned long busy_time; + unsigned long current_frequency; +}; + +struct devfreq_dev_profile { + unsigned long max_freq; /* may be larger than the actual value */ + unsigned long initial_freq; + unsigned int polling_ms; /* 0 for at opp change only */ + + int (*target)(struct device *dev, struct opp *opp); + int (*get_dev_status)(struct device *dev, + struct devfreq_dev_status *stat); +}; + +/** + * struct devfreq_governor - Devfreq policy governor + * @name Governor's name + * @get_target_freq Returns desired operating frequency for the device. + * Basically, get_target_freq will run + * devfreq_dev_profile.get_dev_status() to get the + * status of the device (load = busy_time / total_time). + * @init Called when the devfreq is being attached to a device + * @exit Called when the devfreq is being removed from a device + * + * Note that the callbacks are called with devfreq->lock locked by devfreq. + */ +struct devfreq_governor { + char name[DEVFREQ_NAME_LEN]; + int (*get_target_freq)(struct devfreq *this, unsigned long *freq); + int (*init)(struct devfreq *this); + void (*exit)(struct devfreq *this); +}; + +/** + * struct devfreq - Device devfreq structure + * @node list node - contains the devices with devfreq that have been + * registered. + * @lock a mutex to protect accessing devfreq. + * @dev device pointer + * @profile device-specific devfreq profile + * @governor method how to choose frequency based on the usage. + * @nb notifier block registered to the corresponding OPP to get + * notified for frequency availability updates. + * @polling_jiffies interval in jiffies. + * @previous_freq previously configured frequency value. + * @next_polling the number of remaining jiffies to poll with + * "devfreq_monitor" executions to reevaluate + * frequency/voltage of the device. Set by + * profile's polling_ms interval. + * @data Private data of the governor. The devfreq framework does not + * touch this. + * + * This structure stores the devfreq information for a give device. + * + * Note that when a governor accesses entries in struct devfreq in its + * functions except for the context of callbacks defined in struct + * devfreq_governor, the governor should protect its access with the + * struct mutex lock in struct devfreq. A governor may use this mutex + * to protect its own private data in void *data as well. + */ +struct devfreq { + struct list_head node; + + struct mutex lock; + struct device *dev; + struct devfreq_dev_profile *profile; + struct devfreq_governor *governor; + struct notifier_block nb; + + unsigned long polling_jiffies; + unsigned long previous_freq; + unsigned int next_polling; + + void *data; /* private data for governors */ +}; + +#if defined(CONFIG_PM_DEVFREQ) +extern int devfreq_add_device(struct device *dev, + struct devfreq_dev_profile *profile, + struct devfreq_governor *governor, + void *data); +extern int devfreq_remove_device(struct device *dev); +#else /* !CONFIG_PM_DEVFREQ */ +static int devfreq_add_device(struct device *dev, + struct devfreq_dev_profile *profile, + struct devfreq_governor *governor, + void *data) +{ + return 0; +} + +static int devfreq_remove_device(struct device *dev) +{ + return 0; +} +#endif /* CONFIG_PM_DEVFREQ */ + +#endif /* __LINUX_DEVFREQ_H__ */