Message ID | 20220418210046.2060937-1-evgreen@chromium.org (mailing list archive) |
---|---|
Headers | show |
Series | USB: Quiesce interrupts across pm freeze | expand |
On 18.04.22 23:00, Evan Green wrote: > The documentation for the freeze() method says that it "should quiesce > the device so that it doesn't generate IRQs or DMA". The unspoken > consequence of not doing this is that MSIs aimed at non-boot CPUs may > get fully lost if they're sent during the period where the target CPU is > offline. > > The current behavior of the USB subsystem still allows interrupts to > come in after freeze, both in terms of remote wakeups and HC events > related to things like root plug port activity. This can get controllers > like XHCI, which is very sensitive to lost interrupts, in a wedged > state. This series attempts to fully quiesce interrupts coming from USB > across in a freeze or quiescent state. > > These patches are grouped together because they serve a united purpose, > but are actually independent. They could be merged or reverted > individually. Hi, sorry for being a bit late in this discussion. There was something that I didn't remember immediately. We have a set of quirky devices that need HID_QUIRK_ALWAYS_POLL. They have the nasty firmware bug that, if you suspend them without remote wakeup, they will crash or reset themselves. I am afraid that has an obvious relevance to your cool patches. I am not completely sure how to deal with this. It seems to me that the quirk will need to be shifted from HID to core USB and thaw() needs to be translated into usb_device_reset() + reset_resume() for them, but I am not really sure about the optimal mechanism. Regards Oliver
On Tue, Apr 19, 2022 at 09:05:53AM +0200, Oliver Neukum wrote: > > > On 18.04.22 23:00, Evan Green wrote: > > The documentation for the freeze() method says that it "should quiesce > > the device so that it doesn't generate IRQs or DMA". The unspoken > > consequence of not doing this is that MSIs aimed at non-boot CPUs may > > get fully lost if they're sent during the period where the target CPU is > > offline. > > > > The current behavior of the USB subsystem still allows interrupts to > > come in after freeze, both in terms of remote wakeups and HC events > > related to things like root plug port activity. This can get controllers > > like XHCI, which is very sensitive to lost interrupts, in a wedged > > state. This series attempts to fully quiesce interrupts coming from USB > > across in a freeze or quiescent state. > > > > These patches are grouped together because they serve a united purpose, > > but are actually independent. They could be merged or reverted > > individually. > Hi, > > sorry for being a bit late in this discussion. There was something that > I didn't remember immediately. > > We have a set of quirky devices that need HID_QUIRK_ALWAYS_POLL. > They have the nasty firmware bug that, if you suspend them without > remote wakeup, they will crash or reset themselves. > I am afraid that has an obvious relevance to your cool patches. > I am not completely sure how to deal with this. It seems to me that the > quirk will need to be shifted from HID to core USB and thaw() needs to > be translated into usb_device_reset() + reset_resume() for them, > but I am not really sure about the optimal mechanism. We may not need to do anything. This patch specifically addresses hibernation, not system suspend or runtime suspend. A device crashing or resetting during hibernation is not at all unusual; we should be able to handle such cases properly. The THAW part of suspend-to-hibernation is used only for writing the memory image to permanent storage. I doubt that a malfunctioning HID device would interfere with this process. Alan Stern
On 19.04.22 16:35, Alan Stern wrote: > On Tue, Apr 19, 2022 at 09:05:53AM +0200, Oliver Neukum wrote: > > > We have a set of quirky devices that need HID_QUIRK_ALWAYS_POLL. > They have the nasty firmware bug that, if you suspend them without > remote wakeup, they will crash or reset themselves. > I am afraid that has an obvious relevance to your cool patches. > I am not completely sure how to deal with this. It seems to me that the > quirk will need to be shifted from HID to core USB and thaw() needs to > be translated into usb_device_reset() + reset_resume() for them, > but I am not really sure about the optimal mechanism. > We may not need to do anything. This patch specifically addresses > hibernation, not system suspend or runtime suspend. A device crashing > or resetting during hibernation is not at all unusual; we should be able > to handle such cases properly. > > The THAW part of suspend-to-hibernation is used only for writing the > memory image to permanent storage. I doubt that a malfunctioning HID > device would interfere with this process. > True, if and only if all goes well. At the time thaw() has run writing the image to disk can still fail. In that case the devices will still be needed. Regards Oliver
On Tue, Apr 19, 2022 at 05:51:38PM +0200, Oliver Neukum wrote: > > > On 19.04.22 16:35, Alan Stern wrote: > > On Tue, Apr 19, 2022 at 09:05:53AM +0200, Oliver Neukum wrote: > > > > > > We have a set of quirky devices that need HID_QUIRK_ALWAYS_POLL. > > They have the nasty firmware bug that, if you suspend them without > > remote wakeup, they will crash or reset themselves. > > I am afraid that has an obvious relevance to your cool patches. > > I am not completely sure how to deal with this. It seems to me that the > > quirk will need to be shifted from HID to core USB and thaw() needs to > > be translated into usb_device_reset() + reset_resume() for them, > > but I am not really sure about the optimal mechanism. > > We may not need to do anything. This patch specifically addresses > > hibernation, not system suspend or runtime suspend. A device crashing > > or resetting during hibernation is not at all unusual; we should be able > > to handle such cases properly. > > > > The THAW part of suspend-to-hibernation is used only for writing the > > memory image to permanent storage. I doubt that a malfunctioning HID > > device would interfere with this process. > > > True, if and only if all goes well. At the time thaw() has run writing > the image to disk can still fail. In that case the devices will still > be needed. Consider adding a mechanism to usbcore which would allow an interface driver to request that the next time its device is resumed, the core should perform a reset-resume. Would that help? Alan Stern
On 19.04.22 19:49, Alan Stern wrote: > On Tue, Apr 19, 2022 at 05:51:38PM +0200, Oliver Neukum wrote: >> >> On 19.04.22 16:35, Alan Stern wrote: >>> On Tue, Apr 19, 2022 at 09:05:53AM +0200, Oliver Neukum wrote: >>> >>> >>> The THAW part of suspend-to-hibernation is used only for writing the >>> memory image to permanent storage. I doubt that a malfunctioning HID >>> device would interfere with this process. >>> >> True, if and only if all goes well. At the time thaw() has run writing >> the image to disk can still fail. In that case the devices will still >> be needed. > Consider adding a mechanism to usbcore which would allow an interface > driver to request that the next time its device is resumed, the core > should perform a reset-resume. Would that help? > > Strictly speaking no. We already have that in form of the RESET_RESUME quirk. The broken devices we are talking about here can do runtime PM perfectly fine, if and only if remote wakeup is requested. So we need that flag to translate only in freeze()/thaw() resulting in that behavior, as opposed to every pair of suspend()/resume() Regards Oliver
On Wed, Apr 20, 2022 at 10:47:27AM +0200, Oliver Neukum wrote: > > > On 19.04.22 19:49, Alan Stern wrote: > > On Tue, Apr 19, 2022 at 05:51:38PM +0200, Oliver Neukum wrote: > >> > >> On 19.04.22 16:35, Alan Stern wrote: > >>> On Tue, Apr 19, 2022 at 09:05:53AM +0200, Oliver Neukum wrote: > >>> > >>> > >>> The THAW part of suspend-to-hibernation is used only for writing the > >>> memory image to permanent storage. I doubt that a malfunctioning HID > >>> device would interfere with this process. > >>> > >> True, if and only if all goes well. At the time thaw() has run writing > >> the image to disk can still fail. In that case the devices will still > >> be needed. > > Consider adding a mechanism to usbcore which would allow an interface > > driver to request that the next time its device is resumed, the core > > should perform a reset-resume. Would that help? > > > > > > Strictly speaking no. We already have that in form of the RESET_RESUME > quirk. The broken devices we are talking about here can do runtime PM > perfectly fine, if and only if remote wakeup is requested. > So we need that flag to translate only in freeze()/thaw() resulting in that > behavior, as opposed to every pair of suspend()/resume() That was my point. The HID driver can check at suspend time whether or not remote wakeup will be enabled. If yes, well and good, no changes are needed. If not, the driver can then request that the following resume be a runtime-resume. Another possibility is, as you mentioned before, adding a USB quirk for devices which require reset-resume whenever they are resumed with wakeup disabled. Alan Stern