diff mbox series

[v5] USB: HID: random timeout failures tackle try.

Message ID 20200204175238.3817-1-lja@iki.fi (mailing list archive)
State Rejected
Headers show
Series [v5] USB: HID: random timeout failures tackle try. | expand

Commit Message

Lauri Jakku Feb. 4, 2020, 5:52 p.m. UTC
-- v1 ------------------------------------------------------------
send, 20ms apart, control messages, if error is timeout.

There is multiple reports of random behaviour of USB HID devices.

I have mouse that acts sometimes quite randomly, I debugged with
logs others have published that there is HW timeouts that leave
device in state that it is errorneus.

To fix this I introduced retry mechanism in root of USB HID drivers.

Fix does not slow down operations at all if there is no -ETIMEDOUT
got from control message sending. If there is one, then sleep 20ms
and try again. Retry count is 20 witch translates maximium of 400ms
before giving up.

NOTE: This does not sleep anymore then before, if all is golden.

-- v2 ------------------------------------------------------------

If there is timeout, then sleep 20ms and try again. Retry count is 20
witch translates maximium of 400ms before giving up. If the 400ms
boundary is reached the HW is really bad.

JUST to be clear:
    This does not make USB HID devices to sleep anymore than
    before, if all is golden.

Why modify usb-hid-core: No need to modify driver by driver.

-- v3 ------------------------------------------------------------

Timeout given is divided by 100, but taken care that it is always
at least 10ms.

so total time in common worst-case-scenario is:

 sleep of 20ms + common timeout divided by 100 (50ms) makes
 70ms per loop, 20 loops => 1.4sec .

-- v4 ------------------------------------------------------------
No changes in code, just elaborating what is done in v[1,2,3].

-- v5 ------------------------------------------------------------
changes in code: what the build robot found:
   drivers/usb/core/message.c: In function 'usb_control_msg':
>> drivers/usb/core/message.c:173:11: error: type defaults to 'int' \
     in declaration of 'timeout_happened' [-Werror=implicit-int]
       static timeout_happened = 0;
              ^~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

Fix done: added int to declaration.


Signed-off-by: Lauri Jakku <lja@iki.fi>
---
 drivers/usb/core/message.c | 55 ++++++++++++++++++++++++++++++++++----
 1 file changed, 50 insertions(+), 5 deletions(-)

Comments

Alan Stern Feb. 4, 2020, 6:07 p.m. UTC | #1
On Tue, 4 Feb 2020, Lauri Jakku wrote:

> -- v1 ------------------------------------------------------------
> send, 20ms apart, control messages, if error is timeout.
> 
> There is multiple reports of random behaviour of USB HID devices.
> 
> I have mouse that acts sometimes quite randomly, I debugged with
> logs others have published that there is HW timeouts that leave
> device in state that it is errorneus.
> 
> To fix this I introduced retry mechanism in root of USB HID drivers.
> 
> Fix does not slow down operations at all if there is no -ETIMEDOUT
> got from control message sending. If there is one, then sleep 20ms
> and try again. Retry count is 20 witch translates maximium of 400ms
> before giving up.
> 
> NOTE: This does not sleep anymore then before, if all is golden.
> 
> -- v2 ------------------------------------------------------------
> 
> If there is timeout, then sleep 20ms and try again. Retry count is 20
> witch translates maximium of 400ms before giving up. If the 400ms
> boundary is reached the HW is really bad.
> 
> JUST to be clear:
>     This does not make USB HID devices to sleep anymore than
>     before, if all is golden.
> 
> Why modify usb-hid-core: No need to modify driver by driver.

Please note: Your patch does _not_ modify usb-hid-core (that is, 
drivers/hid/usbhid).  It modifies usb-core (that is, drivers/usb/core).

Alan Stern
Greg KH Feb. 4, 2020, 6:11 p.m. UTC | #2
On Tue, Feb 04, 2020 at 07:52:39PM +0200, Lauri Jakku wrote:
> -- v1 ------------------------------------------------------------
> send, 20ms apart, control messages, if error is timeout.
> 
> There is multiple reports of random behaviour of USB HID devices.
> 
> I have mouse that acts sometimes quite randomly, I debugged with
> logs others have published that there is HW timeouts that leave
> device in state that it is errorneus.
> 
> To fix this I introduced retry mechanism in root of USB HID drivers.
> 
> Fix does not slow down operations at all if there is no -ETIMEDOUT
> got from control message sending. If there is one, then sleep 20ms
> and try again. Retry count is 20 witch translates maximium of 400ms
> before giving up.
> 
> NOTE: This does not sleep anymore then before, if all is golden.
> 
> -- v2 ------------------------------------------------------------
> 
> If there is timeout, then sleep 20ms and try again. Retry count is 20
> witch translates maximium of 400ms before giving up. If the 400ms
> boundary is reached the HW is really bad.
> 
> JUST to be clear:
>     This does not make USB HID devices to sleep anymore than
>     before, if all is golden.
> 
> Why modify usb-hid-core: No need to modify driver by driver.
> 
> -- v3 ------------------------------------------------------------
> 
> Timeout given is divided by 100, but taken care that it is always
> at least 10ms.
> 
> so total time in common worst-case-scenario is:
> 
>  sleep of 20ms + common timeout divided by 100 (50ms) makes
>  70ms per loop, 20 loops => 1.4sec .
> 
> -- v4 ------------------------------------------------------------
> No changes in code, just elaborating what is done in v[1,2,3].
> 
> -- v5 ------------------------------------------------------------
> changes in code: what the build robot found:
>    drivers/usb/core/message.c: In function 'usb_control_msg':
> >> drivers/usb/core/message.c:173:11: error: type defaults to 'int' \
>      in declaration of 'timeout_happened' [-Werror=implicit-int]
>        static timeout_happened = 0;
>               ^~~~~~~~~~~~~~~~
>    cc1: some warnings being treated as errors
> 
> Fix done: added int to declaration.

Um, please read the documentation for how to do versioning of patches.
Or look at examples on the mailing list for this as well.  What you have
here will unfortunately not work at all.

thanks,

greg k-h
Greg KH Feb. 4, 2020, 6:15 p.m. UTC | #3
On Tue, Feb 04, 2020 at 07:52:39PM +0200, Lauri Jakku wrote:
> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> index 5adf489428aa..2e0bfe70f7c5 100644
> --- a/drivers/usb/core/message.c
> +++ b/drivers/usb/core/message.c

Ok, changelog issues aside:


> @@ -20,6 +20,7 @@
>  #include <linux/usb/hcd.h>	/* for usbcore internals */
>  #include <linux/usb/of.h>
>  #include <asm/byteorder.h>
> +#include <linux/errno.h>
>  
>  #include "usb.h"
>  
> @@ -137,7 +138,10 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
>  		    __u16 size, int timeout)
>  {
>  	struct usb_ctrlrequest *dr;
> -	int ret;
> +	int ret = -ETIMEDOUT;

No need to initialize this.

> +
> +	/* retry_cnt * 20ms, max retry time set to 400ms */
> +	int retry_cnt = 20;

Why?  You just now changed how all drivers will deal with timeouts.  And
you didn't change any drivers :(


>  
>  	dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
>  	if (!dr)
> @@ -149,11 +153,52 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
>  	dr->wIndex = cpu_to_le16(index);
>  	dr->wLength = cpu_to_le16(size);
>  
> -	ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout);
> +	do {
> +		ret = usb_internal_control_msg(dev,
> +					pipe,
> +					dr,
> +					data,
> +					size,
> +					timeout);
> +
> +		/*
> +		 * Linger a bit, prior to the next control message
> +		 * or if return value is timeout, but do try few
> +		 * times (max 400ms) before quitting. Adapt timeout
> +		 * to be smaller when we have timeout'd first time.
> +		 */
> +		if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
> +			msleep(200);
> +		else if (ret == -ETIMEDOUT) {
> +			static int timeout_happened = 0;

Woah, what about this function being called multiple times from
different devices all at the same time?

Are you _SURE_ you want this to be static?

Hint, no way, not at all, this doesn't do at all what you think it does.
We support more than one USB device in the system at a time :)

> +
> +			if ( ! timeout_happened ) {
> +				timeout_happened = 1;
> +
> +				/* 
> +				 * If timeout is given, divide it
> +				 * by 100, if not, put 10ms timeout.
> +				 * 

Always run scripts/checkpatch.pl on your patches so you do not get
grumpy maintainers telling you to run scripts/checkpatch.pl on your
patches.


> +				 * Then safeguard: if timeout is under
> +				 * 10ms, make timeout to be 10ms.
> +				 */
> +
> +				if (timeout > 0)
> +					timeout /= 100;
> +				else
> +					timeout = 10;
> +
> +				if (timeout < 10)
> +					timeout = 10;

What is with this "backing off"?  Why?

Again, you just broke how all USB drivers treat the timeout value here,
what happens for devices that do NOT want this retried?

We do not want to retry in the USB core, _UNLESS_ the caller asks us to
do so.  Otherwise we will break things.

I understand this works for your one device, but realize we need to
support all devices in existance, at the same time :)

thanks,

greg k-h
Lauri Jakku Feb. 4, 2020, 8:05 p.m. UTC | #4
Hi,

Inline comments.

On 4.2.2020 20.15, Greg KH wrote:
> On Tue, Feb 04, 2020 at 07:52:39PM +0200, Lauri Jakku wrote:
>> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
>> index 5adf489428aa..2e0bfe70f7c5 100644
>> --- a/drivers/usb/core/message.c
>> +++ b/drivers/usb/core/message.c
> Ok, changelog issues aside:
>
>
>> @@ -20,6 +20,7 @@
>>  #include <linux/usb/hcd.h>	/* for usbcore internals */
>>  #include <linux/usb/of.h>
>>  #include <asm/byteorder.h>
>> +#include <linux/errno.h>
>>  
>>  #include "usb.h"
>>  
>> @@ -137,7 +138,10 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
>>  		    __u16 size, int timeout)
>>  {
>>  	struct usb_ctrlrequest *dr;
>> -	int ret;
>> +	int ret = -ETIMEDOUT;
> No need to initialize this.
>
>> +
>> +	/* retry_cnt * 20ms, max retry time set to 400ms */
>> +	int retry_cnt = 20;
> Why?  You just now changed how all drivers will deal with timeouts.  And
> you didn't change any drivers :(
>
>
>>  
>>  	dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
>>  	if (!dr)
>> @@ -149,11 +153,52 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
>>  	dr->wIndex = cpu_to_le16(index);
>>  	dr->wLength = cpu_to_le16(size);
>>  
>> -	ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout);
>> +	do {
>> +		ret = usb_internal_control_msg(dev,
>> +					pipe,
>> +					dr,
>> +					data,
>> +					size,
>> +					timeout);
>> +
>> +		/*
>> +		 * Linger a bit, prior to the next control message
>> +		 * or if return value is timeout, but do try few
>> +		 * times (max 400ms) before quitting. Adapt timeout
>> +		 * to be smaller when we have timeout'd first time.
>> +		 */
>> +		if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
>> +			msleep(200);
>> +		else if (ret == -ETIMEDOUT) {
>> +			static int timeout_happened = 0;
> Woah, what about this function being called multiple times from
> different devices all at the same time?


I did not tought of that, yes, it will break alot of things like that, it can't be

like that.


Could the flag be in some platform-data? hmm platformdata

would be the right place, or do you have suggestions. I have tought

about the move from core -> hid ..would that be better, then in hid

module put the flag so that it is per device.


> Are you _SURE_ you want this to be static?
>
> Hint, no way, not at all, this doesn't do at all what you think it does.
> We support more than one USB device in the system at a time :)
>
>> +
>> +			if ( ! timeout_happened ) {
>> +				timeout_happened = 1;
>> +
>> +				/* 
>> +				 * If timeout is given, divide it
>> +				 * by 100, if not, put 10ms timeout.
>> +				 * 
> Always run scripts/checkpatch.pl on your patches so you do not get
> grumpy maintainers telling you to run scripts/checkpatch.pl on your
> patches.
>
Uh, forgot.. just starting to learn from general developer, to

kernel developer. I'll try to remember that.

>> +				 * Then safeguard: if timeout is under
>> +				 * 10ms, make timeout to be 10ms.
>> +				 */
>> +
>> +				if (timeout > 0)
>> +					timeout /= 100;
>> +				else
>> +					timeout = 10;
>> +
>> +				if (timeout < 10)
>> +					timeout = 10;
> What is with this "backing off"?  Why?
>
> Again, you just broke how all USB drivers treat the timeout value here,
> what happens for devices that do NOT want this retried?

Hmm, maybe the quirk method could be used to ena/disa the retry.

>
> We do not want to retry in the USB core, _UNLESS_ the caller asks us to
> do so.  Otherwise we will break things.

Yeah, i'm turning to like the idea to have the retry moved to hid module and

have quirk define to disable/enable it.


Starting really like the idea, I'm yet no kernel guru, learning while doing now.


I really should buy a book about linux kernel.

>
> I understand this works for your one device, but realize we need to
> support all devices in existance, at the same time :)


Good comments, thanks for review.


This is what i do (soonish, tomorrow perhaps):


1. move patchcode from core -> hid module

2. add variable & quirck to _disable_ retry code done to HID module

  (variable per platformdata, would used by hid module, no need to

   mess with core)


Now is already so late that, if i start coding, i got in the zone and 6h is

like breeze.


> thanks,
>
> greg k-h
Lauri Jakku Feb. 4, 2020, 8:14 p.m. UTC | #5
Hi,


Inline reply.

On 4.2.2020 20.07, Alan Stern wrote:
> On Tue, 4 Feb 2020, Lauri Jakku wrote:
>
>> -- v1 ------------------------------------------------------------
>> send, 20ms apart, control messages, if error is timeout.
>>
>> There is multiple reports of random behaviour of USB HID devices.
>>
>> I have mouse that acts sometimes quite randomly, I debugged with
>> logs others have published that there is HW timeouts that leave
>> device in state that it is errorneus.
>>
>> To fix this I introduced retry mechanism in root of USB HID drivers.
>>
>> Fix does not slow down operations at all if there is no -ETIMEDOUT
>> got from control message sending. If there is one, then sleep 20ms
>> and try again. Retry count is 20 witch translates maximium of 400ms
>> before giving up.
>>
>> NOTE: This does not sleep anymore then before, if all is golden.
>>
>> -- v2 ------------------------------------------------------------
>>
>> If there is timeout, then sleep 20ms and try again. Retry count is 20
>> witch translates maximium of 400ms before giving up. If the 400ms
>> boundary is reached the HW is really bad.
>>
>> JUST to be clear:
>>     This does not make USB HID devices to sleep anymore than
>>     before, if all is golden.
>>
>> Why modify usb-hid-core: No need to modify driver by driver.
> Please note: Your patch does _not_ modify usb-hid-core (that is, 
> drivers/hid/usbhid).  It modifies usb-core (that is, drivers/usb/core).
>
> Alan Stern

Uh, sorry about that misleading commit message, i'll fix it in next version.


Thank you for pointing that out :)


Br,

Lauri J.
Oliver Neukum Feb. 5, 2020, 10:03 a.m. UTC | #6
Am Dienstag, den 04.02.2020, 19:52 +0200 schrieb Lauri Jakku:
> -- v1 ------------------------------------------------------------
> send, 20ms apart, control messages, if error is timeout.
> 
> There is multiple reports of random behaviour of USB HID devices.
> 
> I have mouse that acts sometimes quite randomly, I debugged with
> logs others have published that there is HW timeouts that leave
> device in state that it is errorneus.
> 
> To fix this I introduced retry mechanism in root of USB HID drivers.
> 
> Fix does not slow down operations at all if there is no -ETIMEDOUT
> got from control message sending. If there is one, then sleep 20ms
> and try again. Retry count is 20 witch translates maximium of 400ms
> before giving up.
> 
> NOTE: This does not sleep anymore then before, if all is golden.

This is well if retries help. In case any other form of error handling,
such as a reset, are needed, these retries will hamper recovery.
It seems to me that you really should use a separate function
for this.

	Regards
		Oliver
diff mbox series

Patch

diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index 5adf489428aa..2e0bfe70f7c5 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -20,6 +20,7 @@ 
 #include <linux/usb/hcd.h>	/* for usbcore internals */
 #include <linux/usb/of.h>
 #include <asm/byteorder.h>
+#include <linux/errno.h>
 
 #include "usb.h"
 
@@ -137,7 +138,10 @@  int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
 		    __u16 size, int timeout)
 {
 	struct usb_ctrlrequest *dr;
-	int ret;
+	int ret = -ETIMEDOUT;
+
+	/* retry_cnt * 20ms, max retry time set to 400ms */
+	int retry_cnt = 20;
 
 	dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
 	if (!dr)
@@ -149,11 +153,52 @@  int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
 	dr->wIndex = cpu_to_le16(index);
 	dr->wLength = cpu_to_le16(size);
 
-	ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout);
+	do {
+		ret = usb_internal_control_msg(dev,
+					pipe,
+					dr,
+					data,
+					size,
+					timeout);
+
+		/*
+		 * Linger a bit, prior to the next control message
+		 * or if return value is timeout, but do try few
+		 * times (max 400ms) before quitting. Adapt timeout
+		 * to be smaller when we have timeout'd first time.
+		 */
+		if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
+			msleep(200);
+		else if (ret == -ETIMEDOUT) {
+			static int timeout_happened = 0;
+
+			if ( ! timeout_happened ) {
+				timeout_happened = 1;
+
+				/* 
+				 * If timeout is given, divide it
+				 * by 100, if not, put 10ms timeout.
+				 * 
+				 * Then safeguard: if timeout is under
+				 * 10ms, make timeout to be 10ms.
+				 */
+
+				if (timeout > 0)
+					timeout /= 100;
+				else
+					timeout = 10;
+
+				if (timeout < 10)
+					timeout = 10;
+
+			}
+
+			msleep(20);
+		}
+
+		/* Loop while timeout, max loops: retry_cnt times. */
+	} while ((retry_cnt-- > 0) && (ret == -ETIMEDOUT));
 
-	/* Linger a bit, prior to the next control message. */
-	if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
-		msleep(200);
 
 	kfree(dr);