Message ID | 20200204175238.3817-1-lja@iki.fi (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | [v5] USB: HID: random timeout failures tackle try. | expand |
On Tue, 4 Feb 2020, Lauri Jakku wrote: > -- v1 ------------------------------------------------------------ > send, 20ms apart, control messages, if error is timeout. > > There is multiple reports of random behaviour of USB HID devices. > > I have mouse that acts sometimes quite randomly, I debugged with > logs others have published that there is HW timeouts that leave > device in state that it is errorneus. > > To fix this I introduced retry mechanism in root of USB HID drivers. > > Fix does not slow down operations at all if there is no -ETIMEDOUT > got from control message sending. If there is one, then sleep 20ms > and try again. Retry count is 20 witch translates maximium of 400ms > before giving up. > > NOTE: This does not sleep anymore then before, if all is golden. > > -- v2 ------------------------------------------------------------ > > If there is timeout, then sleep 20ms and try again. Retry count is 20 > witch translates maximium of 400ms before giving up. If the 400ms > boundary is reached the HW is really bad. > > JUST to be clear: > This does not make USB HID devices to sleep anymore than > before, if all is golden. > > Why modify usb-hid-core: No need to modify driver by driver. Please note: Your patch does _not_ modify usb-hid-core (that is, drivers/hid/usbhid). It modifies usb-core (that is, drivers/usb/core). Alan Stern
On Tue, Feb 04, 2020 at 07:52:39PM +0200, Lauri Jakku wrote: > -- v1 ------------------------------------------------------------ > send, 20ms apart, control messages, if error is timeout. > > There is multiple reports of random behaviour of USB HID devices. > > I have mouse that acts sometimes quite randomly, I debugged with > logs others have published that there is HW timeouts that leave > device in state that it is errorneus. > > To fix this I introduced retry mechanism in root of USB HID drivers. > > Fix does not slow down operations at all if there is no -ETIMEDOUT > got from control message sending. If there is one, then sleep 20ms > and try again. Retry count is 20 witch translates maximium of 400ms > before giving up. > > NOTE: This does not sleep anymore then before, if all is golden. > > -- v2 ------------------------------------------------------------ > > If there is timeout, then sleep 20ms and try again. Retry count is 20 > witch translates maximium of 400ms before giving up. If the 400ms > boundary is reached the HW is really bad. > > JUST to be clear: > This does not make USB HID devices to sleep anymore than > before, if all is golden. > > Why modify usb-hid-core: No need to modify driver by driver. > > -- v3 ------------------------------------------------------------ > > Timeout given is divided by 100, but taken care that it is always > at least 10ms. > > so total time in common worst-case-scenario is: > > sleep of 20ms + common timeout divided by 100 (50ms) makes > 70ms per loop, 20 loops => 1.4sec . > > -- v4 ------------------------------------------------------------ > No changes in code, just elaborating what is done in v[1,2,3]. > > -- v5 ------------------------------------------------------------ > changes in code: what the build robot found: > drivers/usb/core/message.c: In function 'usb_control_msg': > >> drivers/usb/core/message.c:173:11: error: type defaults to 'int' \ > in declaration of 'timeout_happened' [-Werror=implicit-int] > static timeout_happened = 0; > ^~~~~~~~~~~~~~~~ > cc1: some warnings being treated as errors > > Fix done: added int to declaration. Um, please read the documentation for how to do versioning of patches. Or look at examples on the mailing list for this as well. What you have here will unfortunately not work at all. thanks, greg k-h
On Tue, Feb 04, 2020 at 07:52:39PM +0200, Lauri Jakku wrote: > diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c > index 5adf489428aa..2e0bfe70f7c5 100644 > --- a/drivers/usb/core/message.c > +++ b/drivers/usb/core/message.c Ok, changelog issues aside: > @@ -20,6 +20,7 @@ > #include <linux/usb/hcd.h> /* for usbcore internals */ > #include <linux/usb/of.h> > #include <asm/byteorder.h> > +#include <linux/errno.h> > > #include "usb.h" > > @@ -137,7 +138,10 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, > __u16 size, int timeout) > { > struct usb_ctrlrequest *dr; > - int ret; > + int ret = -ETIMEDOUT; No need to initialize this. > + > + /* retry_cnt * 20ms, max retry time set to 400ms */ > + int retry_cnt = 20; Why? You just now changed how all drivers will deal with timeouts. And you didn't change any drivers :( > > dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO); > if (!dr) > @@ -149,11 +153,52 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, > dr->wIndex = cpu_to_le16(index); > dr->wLength = cpu_to_le16(size); > > - ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout); > + do { > + ret = usb_internal_control_msg(dev, > + pipe, > + dr, > + data, > + size, > + timeout); > + > + /* > + * Linger a bit, prior to the next control message > + * or if return value is timeout, but do try few > + * times (max 400ms) before quitting. Adapt timeout > + * to be smaller when we have timeout'd first time. > + */ > + if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG) > + msleep(200); > + else if (ret == -ETIMEDOUT) { > + static int timeout_happened = 0; Woah, what about this function being called multiple times from different devices all at the same time? Are you _SURE_ you want this to be static? Hint, no way, not at all, this doesn't do at all what you think it does. We support more than one USB device in the system at a time :) > + > + if ( ! timeout_happened ) { > + timeout_happened = 1; > + > + /* > + * If timeout is given, divide it > + * by 100, if not, put 10ms timeout. > + * Always run scripts/checkpatch.pl on your patches so you do not get grumpy maintainers telling you to run scripts/checkpatch.pl on your patches. > + * Then safeguard: if timeout is under > + * 10ms, make timeout to be 10ms. > + */ > + > + if (timeout > 0) > + timeout /= 100; > + else > + timeout = 10; > + > + if (timeout < 10) > + timeout = 10; What is with this "backing off"? Why? Again, you just broke how all USB drivers treat the timeout value here, what happens for devices that do NOT want this retried? We do not want to retry in the USB core, _UNLESS_ the caller asks us to do so. Otherwise we will break things. I understand this works for your one device, but realize we need to support all devices in existance, at the same time :) thanks, greg k-h
Hi, Inline comments. On 4.2.2020 20.15, Greg KH wrote: > On Tue, Feb 04, 2020 at 07:52:39PM +0200, Lauri Jakku wrote: >> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c >> index 5adf489428aa..2e0bfe70f7c5 100644 >> --- a/drivers/usb/core/message.c >> +++ b/drivers/usb/core/message.c > Ok, changelog issues aside: > > >> @@ -20,6 +20,7 @@ >> #include <linux/usb/hcd.h> /* for usbcore internals */ >> #include <linux/usb/of.h> >> #include <asm/byteorder.h> >> +#include <linux/errno.h> >> >> #include "usb.h" >> >> @@ -137,7 +138,10 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, >> __u16 size, int timeout) >> { >> struct usb_ctrlrequest *dr; >> - int ret; >> + int ret = -ETIMEDOUT; > No need to initialize this. > >> + >> + /* retry_cnt * 20ms, max retry time set to 400ms */ >> + int retry_cnt = 20; > Why? You just now changed how all drivers will deal with timeouts. And > you didn't change any drivers :( > > >> >> dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO); >> if (!dr) >> @@ -149,11 +153,52 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, >> dr->wIndex = cpu_to_le16(index); >> dr->wLength = cpu_to_le16(size); >> >> - ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout); >> + do { >> + ret = usb_internal_control_msg(dev, >> + pipe, >> + dr, >> + data, >> + size, >> + timeout); >> + >> + /* >> + * Linger a bit, prior to the next control message >> + * or if return value is timeout, but do try few >> + * times (max 400ms) before quitting. Adapt timeout >> + * to be smaller when we have timeout'd first time. >> + */ >> + if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG) >> + msleep(200); >> + else if (ret == -ETIMEDOUT) { >> + static int timeout_happened = 0; > Woah, what about this function being called multiple times from > different devices all at the same time? I did not tought of that, yes, it will break alot of things like that, it can't be like that. Could the flag be in some platform-data? hmm platformdata would be the right place, or do you have suggestions. I have tought about the move from core -> hid ..would that be better, then in hid module put the flag so that it is per device. > Are you _SURE_ you want this to be static? > > Hint, no way, not at all, this doesn't do at all what you think it does. > We support more than one USB device in the system at a time :) > >> + >> + if ( ! timeout_happened ) { >> + timeout_happened = 1; >> + >> + /* >> + * If timeout is given, divide it >> + * by 100, if not, put 10ms timeout. >> + * > Always run scripts/checkpatch.pl on your patches so you do not get > grumpy maintainers telling you to run scripts/checkpatch.pl on your > patches. > Uh, forgot.. just starting to learn from general developer, to kernel developer. I'll try to remember that. >> + * Then safeguard: if timeout is under >> + * 10ms, make timeout to be 10ms. >> + */ >> + >> + if (timeout > 0) >> + timeout /= 100; >> + else >> + timeout = 10; >> + >> + if (timeout < 10) >> + timeout = 10; > What is with this "backing off"? Why? > > Again, you just broke how all USB drivers treat the timeout value here, > what happens for devices that do NOT want this retried? Hmm, maybe the quirk method could be used to ena/disa the retry. > > We do not want to retry in the USB core, _UNLESS_ the caller asks us to > do so. Otherwise we will break things. Yeah, i'm turning to like the idea to have the retry moved to hid module and have quirk define to disable/enable it. Starting really like the idea, I'm yet no kernel guru, learning while doing now. I really should buy a book about linux kernel. > > I understand this works for your one device, but realize we need to > support all devices in existance, at the same time :) Good comments, thanks for review. This is what i do (soonish, tomorrow perhaps): 1. move patchcode from core -> hid module 2. add variable & quirck to _disable_ retry code done to HID module (variable per platformdata, would used by hid module, no need to mess with core) Now is already so late that, if i start coding, i got in the zone and 6h is like breeze. > thanks, > > greg k-h
Hi, Inline reply. On 4.2.2020 20.07, Alan Stern wrote: > On Tue, 4 Feb 2020, Lauri Jakku wrote: > >> -- v1 ------------------------------------------------------------ >> send, 20ms apart, control messages, if error is timeout. >> >> There is multiple reports of random behaviour of USB HID devices. >> >> I have mouse that acts sometimes quite randomly, I debugged with >> logs others have published that there is HW timeouts that leave >> device in state that it is errorneus. >> >> To fix this I introduced retry mechanism in root of USB HID drivers. >> >> Fix does not slow down operations at all if there is no -ETIMEDOUT >> got from control message sending. If there is one, then sleep 20ms >> and try again. Retry count is 20 witch translates maximium of 400ms >> before giving up. >> >> NOTE: This does not sleep anymore then before, if all is golden. >> >> -- v2 ------------------------------------------------------------ >> >> If there is timeout, then sleep 20ms and try again. Retry count is 20 >> witch translates maximium of 400ms before giving up. If the 400ms >> boundary is reached the HW is really bad. >> >> JUST to be clear: >> This does not make USB HID devices to sleep anymore than >> before, if all is golden. >> >> Why modify usb-hid-core: No need to modify driver by driver. > Please note: Your patch does _not_ modify usb-hid-core (that is, > drivers/hid/usbhid). It modifies usb-core (that is, drivers/usb/core). > > Alan Stern Uh, sorry about that misleading commit message, i'll fix it in next version. Thank you for pointing that out :) Br, Lauri J.
Am Dienstag, den 04.02.2020, 19:52 +0200 schrieb Lauri Jakku: > -- v1 ------------------------------------------------------------ > send, 20ms apart, control messages, if error is timeout. > > There is multiple reports of random behaviour of USB HID devices. > > I have mouse that acts sometimes quite randomly, I debugged with > logs others have published that there is HW timeouts that leave > device in state that it is errorneus. > > To fix this I introduced retry mechanism in root of USB HID drivers. > > Fix does not slow down operations at all if there is no -ETIMEDOUT > got from control message sending. If there is one, then sleep 20ms > and try again. Retry count is 20 witch translates maximium of 400ms > before giving up. > > NOTE: This does not sleep anymore then before, if all is golden. This is well if retries help. In case any other form of error handling, such as a reset, are needed, these retries will hamper recovery. It seems to me that you really should use a separate function for this. Regards Oliver
diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c index 5adf489428aa..2e0bfe70f7c5 100644 --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -20,6 +20,7 @@ #include <linux/usb/hcd.h> /* for usbcore internals */ #include <linux/usb/of.h> #include <asm/byteorder.h> +#include <linux/errno.h> #include "usb.h" @@ -137,7 +138,10 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, __u16 size, int timeout) { struct usb_ctrlrequest *dr; - int ret; + int ret = -ETIMEDOUT; + + /* retry_cnt * 20ms, max retry time set to 400ms */ + int retry_cnt = 20; dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO); if (!dr) @@ -149,11 +153,52 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, dr->wIndex = cpu_to_le16(index); dr->wLength = cpu_to_le16(size); - ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout); + do { + ret = usb_internal_control_msg(dev, + pipe, + dr, + data, + size, + timeout); + + /* + * Linger a bit, prior to the next control message + * or if return value is timeout, but do try few + * times (max 400ms) before quitting. Adapt timeout + * to be smaller when we have timeout'd first time. + */ + if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG) + msleep(200); + else if (ret == -ETIMEDOUT) { + static int timeout_happened = 0; + + if ( ! timeout_happened ) { + timeout_happened = 1; + + /* + * If timeout is given, divide it + * by 100, if not, put 10ms timeout. + * + * Then safeguard: if timeout is under + * 10ms, make timeout to be 10ms. + */ + + if (timeout > 0) + timeout /= 100; + else + timeout = 10; + + if (timeout < 10) + timeout = 10; + + } + + msleep(20); + } + + /* Loop while timeout, max loops: retry_cnt times. */ + } while ((retry_cnt-- > 0) && (ret == -ETIMEDOUT)); - /* Linger a bit, prior to the next control message. */ - if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG) - msleep(200); kfree(dr);