Message ID | 200906020121.38946.rjw@sisk.pl (mailing list archive) |
---|---|
State | RFC |
Headers | show |
On Tue, 2 Jun 2009, Rafael J. Wysocki wrote: > > It's partly a question of avoiding unnecessary activity. Is there any > > reason to resume a device if it's just going to be autosuspended again > > a few seconds later? > > Well, I'm not sure if we know in advance what's going to happen to the device > right after a system-wide resume. It may be a totally different situation from > the device activity point of view. We _never_ know in advance what's going to happen. An autosuspended device may have to wake up at any time in order to carry out some work -- that's true whether or not a system sleep has taken place. IMO it doesn't make sense to wake up an autosuspended device for no reason other than that it _may_ need to do something; we should let it stay in a low-power mode until we _know_ it has to wake up. > > It's also a matter of principle. A system sleep should be as > > transparent as possible (except for the passage of time, of course). > > When the sleep is over, devices should be in the same state as before > > it began. > > As a matter of fact, this need not be true at all. For example, consider a > wireless network adapter that was assosicated with specific access point > before a system-wide suspend, but this access point is not present any more > after the subsequent system-wide resume. This example doesn't prove your point. Sure, conditions may have changed while the system was asleep. They might also change while the system is awake. In general, whether or not the system was asleep shouldn't make any difference -- to as great an extent as possible we should strive to pretend that the entire sleep took no time at all (or simply didn't happen). So: Suppose the system had been awake when the access point disappeared. What would the wireless adapter driver do then? It should try to do exactly the same thing if the access point disappears while the system is asleep. You see? The sleep should be transparent. > I really don't think we can predict what's going to happen to a device after > a system-wide resume, so IMO we should assume that the device will be necessary > immediately after the resume and we should resume it. It still can autosuspend > normally afterwards if it's not needed after all. This is highly debatable. I'd like to hear some thoughts from other people. > > > Also, what happens if the platform firmware > > > resumes the autosuspended devices before passing control to the kernel during > > > system-wide resume? How do we handle that? > > > > In the case of USB, it's not possible to resume a suspended device > > without first resuming and using a host controller. The host > > controller driver will realize that the firmware has taken over the > > controller and will reset the controller. ... > I'm not sure if this is doable so easily for PCI devices, for one example. > Most probably, they will come up in D0 after the system-wide suspend and > the PCI resume code will have to suspend them directly so that they remain > "autosuspended" after the resume. Clearly the behavior will need to vary among buses. What's appropriate for USB won't be appropriate for PCI. Furthermore, the overhead of waking a PCI device unnecessarily might be a lot lower than the overhead of waking a USB device, so it might make sense always to resume autosuspended PCI devices during a system resume. > > So the work routine would somehow be aware when a system sleep is in > > progress, at which point it would return immediately without doing > > anything? That seems racey. Isn't it easier just to freeze the > > workqueue? > > Well, yes and no. Namely, if we freeze the workqueue, all of the bus types > using it will have to check the vaildity of operations carried out by it, > because of the possibility that system-wide power transition took place in the > meantime and changed the state of the device. Also, please take the PCI > example into consideration. Freezability doesn't matter; you can prevent items from being added to the workqueue during a power transition just as easily regardless of whether it is freezable or not. In fact, it's easier if the workqueue is freezable. You don't have to prevent anything from being added; you just have to cancel everything on the workqueue before unfreezing it. One solution would be to have the bus types cancel their own pending work items explicitly. Another is to have two freezable PM workqueues, one with automatic cancellation and the other without. > > For autoresume requests coming from an interrupt handler, I'm not so > > sure. I don't know of any good examples -- in fact there don't seem to > > be any examples at the moment. (The USB core supports autoresume > > requests in interrupt context, AFAICS no driver uses them.) So maybe > > it's premature to worry about this case. > > Hmm. Can you please describe a theoretical situation in which this kind > of autoresume request will appear? I'm not really sure what you mean here. :-) Well, here's something people discussed but which hasn't been implemented (and may never be implemented). A suspended USB keyboard doesn't have enough power to turn on any LEDs. So suppose a system has two keyboards, and kbd-A is autosuspended when the user engages CapsLock on kbd-B. The input layer wants to turn on the CapsLock LED on all the keyboards, which means it has to autoresume kbd-A. But all this action takes place in interrupt context, so an autoresume request would have to be queued. > > Lastly, we have resume requests coming from a device -- i.e., remote > > wakeup requests. This is a little tricky. Suppose you have remote > > wakeup enabled, and you press a key on your USB keyboard just as the > > computer is going to sleep. Should this cause the sleep to be aborted? > > Should it cause the computer to wake back up again right away? Should > > the keystroke be ignored? > > This is a difficult problem in general. I _think_ we should ignore such > autoresume requests for devices that are not marked as wake-up ones > and we should abort the suspend if there's a request from a wake-up device. It _is_ difficult. Right now USB ignores such requests, because the threads that handle them are all freezable. But the information isn't lost; it is stored in the hardware and when the system wakes up the requests will be seen. Except for wakeup requests from root hubs. If they arrive at just the right time then they are not stored in the hardware -- instead they are stored as work items on the ksuspend_usbd workqueue! In essence, the workqueue serves as a software mechanism for saving IRQs which were delivered but couldn't be acted on. That's why I don't want to erase the work items; it would be logically equivalent to dropping interrupts. > Anyway, below is a prototype implementation of a generic runtime PM workqueue, > just for discussion. > > pm_schedule_suspend() and pm_schedule_resume() are to be called by bus types > that are requesting the autosuspend or autoresume, respectively, of a device. > The supposed control flow is like this: > * bus type calls pm_schedule_suspend() to schedule autosuspend of a device > * at the right time the PM workqueue executes the bus type's ->autosuspend() > callback which is responsible for calling the device driver to suspend the > device (the way in which this is done is totally bus type-specific) > * if there's a resume event for the device, it's delivered to the bus type > driver (in a bus type-specific way) and the bus type schedules autoresume of > the device by calling pm_schedule_resume() > * the PM workqueue executes the bus type's ->autoresume() as soon as it can > and this callback is responsible for calling the device driver to resume the > device (the way in which this is done is totally bus type-specific). It looks like the only benefits provided by this patch are the workqueue itself and the embedded delayed_work structure. The ability to cancel the work items (or prevent them from being added) during a system sleep is questionable, as discussed above. There isn't any value added by the new functions or methods. In short, it would be just as useful simply to add the delayed_work structure and export two PM workqueues, both freezable, with automatic cancellation for one of them. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 02 June 2009, Alan Stern wrote: > On Tue, 2 Jun 2009, Rafael J. Wysocki wrote: > > > > It's partly a question of avoiding unnecessary activity. Is there any > > > reason to resume a device if it's just going to be autosuspended again > > > a few seconds later? > > > > Well, I'm not sure if we know in advance what's going to happen to the device > > right after a system-wide resume. It may be a totally different situation from > > the device activity point of view. > > We _never_ know in advance what's going to happen. An autosuspended > device may have to wake up at any time in order to carry out some work > -- that's true whether or not a system sleep has taken place. IMO it > doesn't make sense to wake up an autosuspended device for no reason > other than that it _may_ need to do something; we should let it stay in > a low-power mode until we _know_ it has to wake up. OK, that's a valid point. I also think that if the BIOS has already woke up the device during system-wide resume, we shouldn't try to put it to a low power state again unless we know that this is desirable. > > > It's also a matter of principle. A system sleep should be as > > > transparent as possible (except for the passage of time, of course). > > > When the sleep is over, devices should be in the same state as before > > > it began. > > > > As a matter of fact, this need not be true at all. For example, consider a > > wireless network adapter that was assosicated with specific access point > > before a system-wide suspend, but this access point is not present any more > > after the subsequent system-wide resume. > > This example doesn't prove your point. No, it doesn't. Still, it illustrates the problem I had in mind, that it may not be desirable to restore the pre-suspend settings in all cases. > Sure, conditions may have changed while the system was asleep. They might > also change while the system is awake. In general, whether or not the system > was asleep shouldn't make any difference -- to as great an extent as possible > we should strive to pretend that the entire sleep took no time at all (or > simply didn't happen). > > So: Suppose the system had been awake when the access point > disappeared. What would the wireless adapter driver do then? It > should try to do exactly the same thing if the access point disappears > while the system is asleep. Not really. Namely, when the access point _vanishes_ at run time, this is an error condition from which we have to recover, while if it's _not_ _present_ after a resume, it's a normal situation and we shouldn't be recovering from that (the user might have changed the physical location while suspended). On my system the wireless always remembers the network it was associated with before suspend and it tries to reassociate with it after the subsequent resume, even if the network is not there any more, so it fails and I have to choose another one manually. This isn't a desirable behavior and I'd even say it's not correct, because after a fresh boot it always scans for networks and associates with one that's actually present. I'd like it to do the same after the resume. > You see? The sleep should be transparent. In some cases it won't be, because the hardware state changes while sleeping. For example, a laptop battery may be depleted while suspended, so that after resume it's in the 'low' condition, while it was in the 'good' condition before the preceding suspend. Arguably, you can't drain a battery from 90% to 10%, for example, momentarily. Also, a CPU fan might be 100% on before suspend, because the CPU was hot at that time, but that doesn't mean the fan should be 100% on after the subsequent resume, because it's likely that the CPU will be cold (on some systems trying to make the fan spin too fast may lead to general problems with thermal management afterwards). > > I really don't think we can predict what's going to happen to a device after > > a system-wide resume, so IMO we should assume that the device will be necessary > > immediately after the resume and we should resume it. It still can autosuspend > > normally afterwards if it's not needed after all. > > This is highly debatable. I'd like to hear some thoughts from other > people. Agreed, but no one has spoken so far. :-) > > > > Also, what happens if the platform firmware > > > > resumes the autosuspended devices before passing control to the kernel during > > > > system-wide resume? How do we handle that? > > > > > > In the case of USB, it's not possible to resume a suspended device > > > without first resuming and using a host controller. The host > > > controller driver will realize that the firmware has taken over the > > > controller and will reset the controller. > ... > > > I'm not sure if this is doable so easily for PCI devices, for one example. > > Most probably, they will come up in D0 after the system-wide suspend and > > the PCI resume code will have to suspend them directly so that they remain > > "autosuspended" after the resume. > > Clearly the behavior will need to vary among buses. Agreed. > What's appropriate for USB won't be appropriate for PCI. Furthermore, the > overhead of waking a PCI device unnecessarily might be a lot lower than the > overhead of waking a USB device, so it might make sense always to > resume autosuspended PCI devices during a system resume. Sure. Also, as I said earlier, the BIOS usually wakes up PCI devices (in fact a bus reset causes them to wake up and the buses are ususally reset during resume by the BIOS) and it doesn't really make sense to put them into low power states just because they were in low power states before the suspend. > > > So the work routine would somehow be aware when a system sleep is in > > > progress, at which point it would return immediately without doing > > > anything? That seems racey. Isn't it easier just to freeze the > > > workqueue? > > > > Well, yes and no. Namely, if we freeze the workqueue, all of the bus types > > using it will have to check the vaildity of operations carried out by it, > > because of the possibility that system-wide power transition took place in the > > meantime and changed the state of the device. Also, please take the PCI > > example into consideration. > > Freezability doesn't matter; you can prevent items from being added to > the workqueue during a power transition just as easily regardless of > whether it is freezable or not. In fact, it's easier if the workqueue > is freezable. You don't have to prevent anything from being added; you > just have to cancel everything on the workqueue before unfreezing it. Good idea. > One solution would be to have the bus types cancel their own pending > work items explicitly. Another is to have two freezable PM workqueues, > one with automatic cancellation and the other without. What about having a flag in struct dev_pm_info (along with the delayed work structure) indicating whether or not the work should be automatically cancelled before unfreezing the queue? [It seems, in that case we'd need another flag to indicate what is to cancel, the delayed work or just the work.] > > > For autoresume requests coming from an interrupt handler, I'm not so > > > sure. I don't know of any good examples -- in fact there don't seem to > > > be any examples at the moment. (The USB core supports autoresume > > > requests in interrupt context, AFAICS no driver uses them.) So maybe > > > it's premature to worry about this case. > > > > Hmm. Can you please describe a theoretical situation in which this kind > > of autoresume request will appear? I'm not really sure what you mean here. :-) > > Well, here's something people discussed but which hasn't been > implemented (and may never be implemented). A suspended USB keyboard > doesn't have enough power to turn on any LEDs. So suppose a system has > two keyboards, and kbd-A is autosuspended when the user engages > CapsLock on kbd-B. The input layer wants to turn on the CapsLock LED > on all the keyboards, which means it has to autoresume kbd-A. But all > this action takes place in interrupt context, so an autoresume request > would have to be queued. I see, but this is similar to the remote wake-up in that we have to queue the autoresume request. > > > Lastly, we have resume requests coming from a device -- i.e., remote > > > wakeup requests. This is a little tricky. Suppose you have remote > > > wakeup enabled, and you press a key on your USB keyboard just as the > > > computer is going to sleep. Should this cause the sleep to be aborted? > > > Should it cause the computer to wake back up again right away? Should > > > the keystroke be ignored? > > > > This is a difficult problem in general. I _think_ we should ignore such > > autoresume requests for devices that are not marked as wake-up ones > > and we should abort the suspend if there's a request from a wake-up device. > > It _is_ difficult. Right now USB ignores such requests, because the > threads that handle them are all freezable. But the information isn't > lost; it is stored in the hardware and when the system wakes up the > requests will be seen. > > Except for wakeup requests from root hubs. If they arrive at just the > right time then they are not stored in the hardware -- instead they are > stored as work items on the ksuspend_usbd workqueue! In essence, the > workqueue serves as a software mechanism for saving IRQs which were > delivered but couldn't be acted on. > > That's why I don't want to erase the work items; it would be logically > equivalent to dropping interrupts. I see. Still, we'll need to think about a mechanism allowing us to abort suspend if there's a wake-up request from a wake-up device. > > Anyway, below is a prototype implementation of a generic runtime PM workqueue, > > just for discussion. > > > > pm_schedule_suspend() and pm_schedule_resume() are to be called by bus types > > that are requesting the autosuspend or autoresume, respectively, of a device. > > The supposed control flow is like this: > > * bus type calls pm_schedule_suspend() to schedule autosuspend of a device > > * at the right time the PM workqueue executes the bus type's ->autosuspend() > > callback which is responsible for calling the device driver to suspend the > > device (the way in which this is done is totally bus type-specific) > > * if there's a resume event for the device, it's delivered to the bus type > > driver (in a bus type-specific way) and the bus type schedules autoresume of > > the device by calling pm_schedule_resume() > > * the PM workqueue executes the bus type's ->autoresume() as soon as it can > > and this callback is responsible for calling the device driver to resume the > > device (the way in which this is done is totally bus type-specific). > > It looks like the only benefits provided by this patch are the > workqueue itself and the embedded delayed_work structure. The ability > to cancel the work items (or prevent them from being added) during a > system sleep is questionable, as discussed above. There isn't any > value added by the new functions or methods. The value added is that since the delayed work member is to be used for queueing work, each bus type will have to do the same casting from 'struct work_struct' to 'struct delayed_work' and next to 'struct device' to get to the device object, which means quite a bit of code duplication. Also, IMO, if there's a set of functions for doing that, the probability that someone gets things wrong is smaller (slightly). Moreover, we can add variants for the works that will be automatically cancelled during system-wide resume, so that bus types don't have to worry about the details related to that. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi! > > Sure, conditions may have changed while the system was asleep. They might > > also change while the system is awake. In general, whether or not the system > > was asleep shouldn't make any difference -- to as great an extent as possible > > we should strive to pretend that the entire sleep took no time at all (or > > simply didn't happen). > > > > So: Suppose the system had been awake when the access point > > disappeared. What would the wireless adapter driver do then? It > > should try to do exactly the same thing if the access point disappears > > while the system is asleep. > > Not really. Namely, when the access point _vanishes_ at run time, this is an > error condition from which we have to recover, while if it's _not_ _present_ > after a resume, it's a normal situation and we shouldn't be recovering from > that (the user might have changed the physical location while > suspended). I disagree here. User can just walk away with powered notebook. > > You see? The sleep should be transparent. > > In some cases it won't be, because the hardware state changes while sleeping. > For example, a laptop battery may be depleted while suspended, so that after > resume it's in the 'low' condition, while it was in the 'good' condition before > the preceding suspend. Arguably, you can't drain a battery from 90% to 10%, > for example, momentarily. Also, a CPU fan might be 100% on before > suspend, Actually... going 90%->10% immediately is common behaviour of old li-ion batteries. > because the CPU was hot at that time, but that doesn't mean the fan should be > 100% on after the subsequent resume, because it's likely that the CPU will be > cold (on some systems trying to make the fan spin too fast may lead to > general problems with thermal management afterwards). 100% is indeed sane default for fan. Yes you should slow the fan down when you read the real temperature. And yes some systems play it safe and do 100% fan during boot. (Otherwise you may have heat problems if you suspend and immediately resume). Pavel
Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -22,6 +22,7 @@ #define _LINUX_PM_H #include <linux/list.h> +#include <linux/workqueue.h> /* * Callbacks for platform drivers to implement. @@ -182,6 +183,8 @@ struct dev_pm_ops { int (*thaw_noirq)(struct device *dev); int (*poweroff_noirq)(struct device *dev); int (*restore_noirq)(struct device *dev); + int (*autosuspend)(struct device *dev); + int (*autoresume)(struct device *dev); }; /** @@ -323,8 +326,21 @@ struct dev_pm_info { #ifdef CONFIG_PM_SLEEP struct list_head entry; #endif + struct delayed_work dwork; }; +extern void pm_schedule_suspend(struct device *dev, unsigned long delay); +extern void pm_schedule_resume(struct device *dev); + +extern bool transition_started; + +static inline bool pm_transition_in_progress(void) +{ + return transition_started; +} + +extern void pm_flush_workqueue(void); + /* * The PM_EVENT_ messages are also used by drivers implementing the legacy * suspend framework, based on the ->suspend() and ->resume() callbacks common Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -22,6 +22,8 @@ #include <linux/freezer.h> #include <linux/vmstat.h> #include <linux/syscalls.h> +#include <linux/workqueue.h> +#include <linux/device.h> #include "power.h" @@ -578,12 +580,20 @@ static struct attribute_group attr_group .attrs = g, }; +static int __init pm_start_workqueue(void); static int __init pm_init(void) { + int error; + + error = pm_start_workqueue(); + if (error) + return error; + power_kobj = kobject_create_and_add("power", NULL); if (!power_kobj) return -ENOMEM; + return sysfs_create_group(power_kobj, &attr_group); } @@ -729,3 +739,68 @@ done: late_initcall(test_suspend); #endif /* CONFIG_PM_TEST_SUSPEND */ + + +static struct workqueue_struct *pm_wq; + +static int __init pm_start_workqueue(void) +{ + pm_wq = create_singlethread_workqueue("pm"); + + return pm_wq ? 0 : -ENOMEM; +} + +void pm_flush_workqueue(void) +{ + flush_workqueue(pm_wq); +} + +static inline struct device *pm_dwork_to_device(struct delayed_work *dwork) +{ + struct dev_pm_info *d = container_of(dwork, struct dev_pm_info, dwork); + return container_of(d, struct device, power); +} + +static void pm_autosuspend(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct device *dev = pm_dwork_to_device(dwork); + + device_pm_lock(); + if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autosuspend + && !pm_transition_in_progress()) + dev->bus->pm->autosuspend(dev); + device_pm_unlock(); +} + +void pm_schedule_suspend(struct device *dev, unsigned long delay) +{ + device_pm_lock(); + if (!pm_transition_in_progress()) { + INIT_DELAYED_WORK(&dev->power.dwork, pm_autosuspend); + queue_delayed_work(pm_wq, &dev->power.dwork, delay); + } + device_pm_unlock(); +} + +static void pm_autoresume(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct device *dev = pm_dwork_to_device(dwork); + + device_pm_lock(); + if (dev && dev->bus && dev->bus->pm && dev->bus->pm->autoresume + && !pm_transition_in_progress()) + dev->bus->pm->autoresume(dev); + device_pm_unlock(); +} + +void pm_schedule_resume(struct device *dev) +{ + device_pm_lock(); + if (!pm_transition_in_progress()) { + INIT_WORK(&dev->power.dwork.work, pm_autoresume); + queue_work(pm_wq, &dev->power.dwork.work); + } + device_pm_unlock(); +} Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -46,7 +46,7 @@ static DEFINE_MUTEX(dpm_list_mtx); * Set once the preparation of devices for a PM transition has started, reset * before starting to resume devices. Protected by dpm_list_mtx. */ -static bool transition_started; +bool transition_started; /** * device_pm_lock - lock the list of active devices used by the PM core @@ -436,7 +436,6 @@ static void dpm_resume(pm_message_t stat INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); - transition_started = false; while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next); @@ -461,6 +460,7 @@ static void dpm_resume(pm_message_t stat put_device(dev); } list_splice(&list, &dpm_list); + transition_started = false; mutex_unlock(&dpm_list_mtx); } @@ -752,9 +752,14 @@ static int dpm_prepare(pm_message_t stat struct list_head list; int error = 0; - INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); transition_started = true; + mutex_unlock(&dpm_list_mtx); + + pm_flush_workqueue(); + + INIT_LIST_HEAD(&list); + mutex_lock(&dpm_list_mtx); while (!list_empty(&dpm_list)) { struct device *dev = to_device(dpm_list.next);