diff mbox series

usb: dwc3: gadget: Clear DWC3_EP_PENDING_REQUEST from non-0 endpoints

Message ID 20230531085544.253363-1-dan.scally@ideasonboard.com (mailing list archive)
State New, archived
Headers show
Series usb: dwc3: gadget: Clear DWC3_EP_PENDING_REQUEST from non-0 endpoints | expand

Commit Message

Dan Scally May 31, 2023, 8:55 a.m. UTC
The DWC3_EP_PENDING_REQUEST flag is set against an endpoint when
there are no pending or started requests available. This flag is
cleared on queuing to the endpoint for endpoint 0, but not for any
other endpoints. This can exacerbate timing problems by allowing a
queue to go ahead for an isochronous endpoint that should not be
started, so clear the flag upon a successful dwc3_gadget_ep_queue().

Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
---
 drivers/usb/dwc3/gadget.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Dan Scally May 31, 2023, 9:03 a.m. UTC | #1
Hi Thinh

On 31/05/2023 09:55, Daniel Scally wrote:
> The DWC3_EP_PENDING_REQUEST flag is set against an endpoint when
> there are no pending or started requests available. This flag is
> cleared on queuing to the endpoint for endpoint 0, but not for any
> other endpoints. This can exacerbate timing problems by allowing a
> queue to go ahead for an isochronous endpoint that should not be
> started, so clear the flag upon a successful dwc3_gadget_ep_queue().
>
> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
> ---


Just wanted to give some background on the timing issues that this is helping to relieve; we spotted 
this issue as a result of a "No resource for ep1in" error being thrown occasionally during repeated 
stream on/off tests for the UVC gadget on a platform using the DWC3; following the error stream 
won't restart unless you reboot. That error occurs when the gadget's workqueue function runs 
usb_ep_queue() whilst usb_ep_disable() is running during stream off. The DWC3 gadget code's locking 
plus the nulling of the endpoint descriptor during __dwc3_gadget_ep_disable() [1] and the check for 
that situation in __dwc3_gadget_ep_queue() [2] should make that harmless, but what occasionally 
happens is the dwc3_gadget_ep_queue() call sometimes manages to grab the lock when it's briefly 
unlocked during dwc3_gadget_giveback() [3]. That happens after the Stop Transfer command has been 
sent, so __dwc3_gadget_ep_queue() running through triggers a Start Transfer command, the 
dwc3_gadget_ep_disable() then finishes and stream shuts down, but when it's started back up again 
another Start Transfer command is sent and triggers the error. This patch ameliorates the impact of 
that race in my case, because clearing the flag prevents __dwc3_gadget_ep_queue() from running 
either __dwc3_gadget_start_isoc() or __dwc3_gadget_kick_transfer() for a non started isoc endpoint - 
but the race is still there. I think the potential for races is probably unavoidable given the 
unlock, but I thought it was worth explaining what lead to the patch in case it raises some issue 
that I'm missing.


Thanks

Dan


[1] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/gadget.c#L1044
[2] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/gadget.c#L1923
[3] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/gadget.c#L216

>   drivers/usb/dwc3/gadget.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 578804dc29ca..bc1d93c56d82 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1988,13 +1988,17 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req)
>   	 */
>   	if (usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
>   		if (!(dep->flags & DWC3_EP_TRANSFER_STARTED)) {
> +			int ret = 0;
> +
>   			if ((dep->flags & DWC3_EP_PENDING_REQUEST))
> -				return __dwc3_gadget_start_isoc(dep);
> +				ret = __dwc3_gadget_start_isoc(dep);
>   
> -			return 0;
> +			dep->flags &= ~DWC3_EP_PENDING_REQUEST;
> +			return ret;
>   		}
>   	}
>   
> +	dep->flags &= ~DWC3_EP_PENDING_REQUEST;
>   	__dwc3_gadget_kick_transfer(dep);
>   
>   	return 0;
Dan Scally June 1, 2023, 9:14 a.m. UTC | #2
Hi Thinh

On 31/05/2023 19:25, Thinh Nguyen wrote:
> Hi Dan,
>
> On Wed, May 31, 2023, Dan Scally wrote:
>> Hi Thinh
>>
>> On 31/05/2023 09:55, Daniel Scally wrote:
>>> The DWC3_EP_PENDING_REQUEST flag is set against an endpoint when
>>> there are no pending or started requests available. This flag is
>>> cleared on queuing to the endpoint for endpoint 0, but not for any
>>> other endpoints. This can exacerbate timing problems by allowing a
>>> queue to go ahead for an isochronous endpoint that should not be
>>> started, so clear the flag upon a successful dwc3_gadget_ep_queue().
>>>
>>> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
>>> ---
>>
>> Just wanted to give some background on the timing issues that this is
>> helping to relieve; we spotted this issue as a result of a "No resource for
>> ep1in" error being thrown occasionally during repeated stream on/off tests
>> for the UVC gadget on a platform using the DWC3; following the error stream
>> won't restart unless you reboot. That error occurs when the gadget's
>> workqueue function runs usb_ep_queue() whilst usb_ep_disable() is running
>> during stream off. The DWC3 gadget code's locking plus the nulling of the
>> endpoint descriptor during __dwc3_gadget_ep_disable() [1] and the check for
>> that situation in __dwc3_gadget_ep_queue() [2] should make that harmless,
>> but what occasionally happens is the dwc3_gadget_ep_queue() call sometimes
>> manages to grab the lock when it's briefly unlocked during
>> dwc3_gadget_giveback() [3]. That happens after the Stop Transfer command has
>> been sent, so __dwc3_gadget_ep_queue() running through triggers a Start
>> Transfer command, the dwc3_gadget_ep_disable() then finishes and stream
>> shuts down, but when it's started back up again another Start Transfer
>> command is sent and triggers the error. This patch ameliorates the impact of
>> that race in my case, because clearing the flag prevents
>> __dwc3_gadget_ep_queue() from running either __dwc3_gadget_start_isoc() or
>> __dwc3_gadget_kick_transfer() for a non started isoc endpoint - but the race
>> is still there. I think the potential for races is probably unavoidable
>> given the unlock, but I thought it was worth explaining what lead to the
>> patch in case it raises some issue that I'm missing.
>>
>>
>> Thanks
>>
>> Dan
>>
>>
>> [1] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/gadget.c*L1044__;Iw!!A4F2R9G_pg!aqEKDZwHxzb0OYos1htFTiXTnPWcQbnNg8qezhCNJ7bVQP-Ewp4NDw2N_mijL02MRmoyOFj4dlO7vEg1NowmS4A3kjU6$
>> [2] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/gadget.c*L1923__;Iw!!A4F2R9G_pg!aqEKDZwHxzb0OYos1htFTiXTnPWcQbnNg8qezhCNJ7bVQP-Ewp4NDw2N_mijL02MRmoyOFj4dlO7vEg1NowmS7iHqZdg$
>> [3] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/gadget.c*L216__;Iw!!A4F2R9G_pg!aqEKDZwHxzb0OYos1htFTiXTnPWcQbnNg8qezhCNJ7bVQP-Ewp4NDw2N_mijL02MRmoyOFj4dlO7vEg1NowmS4WOrh98$
>>
>
> Thanks for the detail background description. This helps a lot. The
> DWC3_EP_PENDING_REQUEST flag is actually only meant for ep0 and isoc
> endpoints. Regarding isoc endpoint, the gadget driver should prepare and
> queue isoc requests prior to the host requesting for it. Should that's
> not the case, this flag is set so that dwc3 can schedule transfer later
> when the are isoc requests are queued.
>
> The "No resource for ep1in" is expected in your case. The
> usb_ep_disable() API is documented that no other task may be using the
> endpoint when it's called.


Ah - I had missed that in the kerneldoc comment for usb_ep_disable(). OK.

> The UVC gadget driver is breaking this usage
> model when it tries to queue more request after calling
> usb_ep_disable().
>
> Your patch will not resolve the issue you're trying to solve. The fix
> should be in the UVC gadget driver.


I think in that case I'll patch the gadget to wind down the workqueue before usb_ep_disable() is 
called and that should also fix it - thanks for the help.


Dan

>
> Thanks,
> Thinh
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 578804dc29ca..bc1d93c56d82 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1988,13 +1988,17 @@  static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req)
 	 */
 	if (usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
 		if (!(dep->flags & DWC3_EP_TRANSFER_STARTED)) {
+			int ret = 0;
+
 			if ((dep->flags & DWC3_EP_PENDING_REQUEST))
-				return __dwc3_gadget_start_isoc(dep);
+				ret = __dwc3_gadget_start_isoc(dep);
 
-			return 0;
+			dep->flags &= ~DWC3_EP_PENDING_REQUEST;
+			return ret;
 		}
 	}
 
+	dep->flags &= ~DWC3_EP_PENDING_REQUEST;
 	__dwc3_gadget_kick_transfer(dep);
 
 	return 0;