diff mbox series

[RFC] usb: dwc3: usb: dwc3: Force stop EP0 transfers during pullup disable

Message ID 1628648608-15239-1-git-send-email-wcheng@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series [RFC] usb: dwc3: usb: dwc3: Force stop EP0 transfers during pullup disable | expand

Commit Message

Wesley Cheng Aug. 11, 2021, 2:23 a.m. UTC
During a USB cable disconnect, or soft disconnect scenario, a pending
SETUP transaction may not be completed, leading to the following
error:

    dwc3 a600000.dwc3: timed out waiting for SETUP phase

If this occurs, then the entire pullup disable routine is skipped and
proper cleanup and halting of the controller does not complete.
Instead of returning an error (which is ignored from the UDC
perspective), do what is mentioned in the comments and force the
transaction to complete and put the ep0state back to the SETUP phase.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
---
 drivers/usb/dwc3/ep0.c    | 4 ++--
 drivers/usb/dwc3/gadget.c | 6 +++++-
 drivers/usb/dwc3/gadget.h | 3 +++
 3 files changed, 10 insertions(+), 3 deletions(-)

Comments

Thinh Nguyen Aug. 12, 2021, 12:47 a.m. UTC | #1
Wesley Cheng wrote:
> During a USB cable disconnect, or soft disconnect scenario, a pending
> SETUP transaction may not be completed, leading to the following
> error:
> 
>     dwc3 a600000.dwc3: timed out waiting for SETUP phase

How could it be a case of cable disconnect? The pullup(0) only applies
for soft-disconnect scenario. If the device initiated a disconnect, then
the driver should wait for the control request to complete. If it times
out, something is already wrong here. The programming guide only
mentions that we should wait for completion, but it doesn't say about
recovery in a case of hung transfer. I need to check internally but it
should be safe to issue End Transfer.

> 
> If this occurs, then the entire pullup disable routine is skipped and
> proper cleanup and halting of the controller does not complete.
> Instead of returning an error (which is ignored from the UDC
> perspective), do what is mentioned in the comments and force the
> transaction to complete and put the ep0state back to the SETUP phase.
> 
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> ---
>  drivers/usb/dwc3/ep0.c    | 4 ++--
>  drivers/usb/dwc3/gadget.c | 6 +++++-
>  drivers/usb/dwc3/gadget.h | 3 +++
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index 6587394..abfc42b 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>  	return ret;
>  }
>  
> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>  {
>  	struct dwc3_ep		*dep;
>  
> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>  }
>  
> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>  {
>  	struct dwc3_gadget_ep_cmd_params params;
>  	u32			cmd;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 54c5a08..a0e2e4d 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>  		if (ret == 0) {
>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
> -			return -ETIMEDOUT;
> +			spin_lock_irqsave(&dwc->lock, flags);
> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);

End transfer command takes time, need to wait for it to complete before
issuing Start transfer again. Also, why restart again when it's about to
be disconnected.

We'd also need to watch out for soft-connect in quick succession before
the End Transfer command completes.

> +			dwc3_ep0_stall_and_restart(dwc);
> +			spin_unlock_irqrestore(&dwc->lock, flags);
>  		}
>  	}
>  
> diff --git a/drivers/usb/dwc3/gadget.h b/drivers/usb/dwc3/gadget.h
> index 77df4b6..632f7b7 100644
> --- a/drivers/usb/dwc3/gadget.h
> +++ b/drivers/usb/dwc3/gadget.h
> @@ -114,6 +114,9 @@ int __dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
>  int dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
>  int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>  		gfp_t gfp_flags);
> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc);
> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep);
> +
>  int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol);
>  void dwc3_ep0_send_delayed_status(struct dwc3 *dwc);
>  
> 

BR,
Thinh
Wesley Cheng Aug. 12, 2021, 5:12 p.m. UTC | #2
Hi Thinh,

On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> During a USB cable disconnect, or soft disconnect scenario, a pending
>> SETUP transaction may not be completed, leading to the following
>> error:
>>
>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
> 
> How could it be a case of cable disconnect? The pullup(0) only applies
> for soft-disconnect scenario. If the device initiated a disconnect, then

Thanks for the response.  I guess this is specific for some use cases,
but some applications such as ADB will close the FFS EP files after it
gets the disconnection event, leading to this pullup disable as well.
So its specific to that particular use case.

> the driver should wait for the control request to complete. If it times
> out, something is already wrong here. The programming guide only
> mentions that we should wait for completion, but it doesn't say about
> recovery in a case of hung transfer. I need to check internally but it
> should be safe to issue End Transfer.
> 

Yes, what I did was modify a device running the Linux XHCI stack w/o
reading/sending out the SETUP DATA phase, so that on the device end we'd
always run into that situation where there's still a pending EP0 TRB queued.

We're running multiple devices with this fix as well, and doing device
initiated disconnect.

>>
>> If this occurs, then the entire pullup disable routine is skipped and
>> proper cleanup and halting of the controller does not complete.
>> Instead of returning an error (which is ignored from the UDC
>> perspective), do what is mentioned in the comments and force the
>> transaction to complete and put the ep0state back to the SETUP phase.
>>
>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>> ---
>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>  drivers/usb/dwc3/gadget.h | 3 +++
>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>> index 6587394..abfc42b 100644
>> --- a/drivers/usb/dwc3/ep0.c
>> +++ b/drivers/usb/dwc3/ep0.c
>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>  	return ret;
>>  }
>>  
>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>  {
>>  	struct dwc3_ep		*dep;
>>  
>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>  }
>>  
>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>  {
>>  	struct dwc3_gadget_ep_cmd_params params;
>>  	u32			cmd;
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 54c5a08..a0e2e4d 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>  		if (ret == 0) {
>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>> -			return -ETIMEDOUT;
>> +			spin_lock_irqsave(&dwc->lock, flags);
>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
> 
> End transfer command takes time, need to wait for it to complete before
> issuing Start transfer again. Also, why restart again when it's about to
> be disconnected.

I can try without restarting it again, and see if that works.  Instead
of waiting for the command complete event, can we set the ForceRM bit,
similar to what we do for dwc3_remove_requests()?

> 
> We'd also need to watch out for soft-connect in quick succession before
> the End Transfer command completes.
> 
>> +			dwc3_ep0_stall_and_restart(dwc);
>> +			spin_unlock_irqrestore(&dwc->lock, flags);
>>  		}
>>  	}
>>  
>> diff --git a/drivers/usb/dwc3/gadget.h b/drivers/usb/dwc3/gadget.h
>> index 77df4b6..632f7b7 100644
>> --- a/drivers/usb/dwc3/gadget.h
>> +++ b/drivers/usb/dwc3/gadget.h
>> @@ -114,6 +114,9 @@ int __dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
>>  int dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
>>  int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>  		gfp_t gfp_flags);
>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc);
>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep);
>> +
>>  int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol);
>>  void dwc3_ep0_send_delayed_status(struct dwc3 *dwc);
>>  
>>
> 
> BR,
> Thinh
> 

Thanks
Wesley Cheng
Thinh Nguyen Aug. 12, 2021, 9:31 p.m. UTC | #3
Wesley Cheng wrote:
> Hi Thinh,
> 
> On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
>> Wesley Cheng wrote:
>>> During a USB cable disconnect, or soft disconnect scenario, a pending
>>> SETUP transaction may not be completed, leading to the following
>>> error:
>>>
>>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
>>
>> How could it be a case of cable disconnect? The pullup(0) only applies
>> for soft-disconnect scenario. If the device initiated a disconnect, then
> 
> Thanks for the response.  I guess this is specific for some use cases,
> but some applications such as ADB will close the FFS EP files after it
> gets the disconnection event, leading to this pullup disable as well.
> So its specific to that particular use case.

Does that mean that the ADB application won't expect a connection until
user intervention or some other notification to do pullup(1)?

> 
>> the driver should wait for the control request to complete. If it times
>> out, something is already wrong here. The programming guide only
>> mentions that we should wait for completion, but it doesn't say about
>> recovery in a case of hung transfer. I need to check internally but it
>> should be safe to issue End Transfer.
>>
> 
> Yes, what I did was modify a device running the Linux XHCI stack w/o
> reading/sending out the SETUP DATA phase, so that on the device end we'd
> always run into that situation where there's still a pending EP0 TRB queued.

Is this only for validation purpose?

> 
> We're running multiple devices with this fix as well, and doing device
> initiated disconnect.
> 
>>>
>>> If this occurs, then the entire pullup disable routine is skipped and
>>> proper cleanup and halting of the controller does not complete.
>>> Instead of returning an error (which is ignored from the UDC
>>> perspective), do what is mentioned in the comments and force the
>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>
>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>> ---
>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>> index 6587394..abfc42b 100644
>>> --- a/drivers/usb/dwc3/ep0.c
>>> +++ b/drivers/usb/dwc3/ep0.c
>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>  	return ret;
>>>  }
>>>  
>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>  {
>>>  	struct dwc3_ep		*dep;
>>>  
>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>  }
>>>  
>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>  {
>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>  	u32			cmd;
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index 54c5a08..a0e2e4d 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>  		if (ret == 0) {
>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>> -			return -ETIMEDOUT;
>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>
>> End transfer command takes time, need to wait for it to complete before
>> issuing Start transfer again. Also, why restart again when it's about to
>> be disconnected.
> 
> I can try without restarting it again, and see if that works.  Instead
> of waiting for the command complete event, can we set the ForceRM bit,
> similar to what we do for dwc3_remove_requests()?
> 

ForceRM=1 means that the controller will ignore updating the TRBs
(including not clearing the HWO and remain transfer size). The driver
still needs to wait for the command to complete before issuing Start
Transfer command. Otherwise Start Transfer won't go through. If we know
that we're not going to issue Start Transfer any time soon, then we may
be able to get away with ignoring End Transfer command completion.

BR,
Thinh
Wesley Cheng Aug. 12, 2021, 11:19 p.m. UTC | #4
Hi Thinh,

On 8/12/2021 2:31 PM, Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
>>> Wesley Cheng wrote:
>>>> During a USB cable disconnect, or soft disconnect scenario, a pending
>>>> SETUP transaction may not be completed, leading to the following
>>>> error:
>>>>
>>>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
>>>
>>> How could it be a case of cable disconnect? The pullup(0) only applies
>>> for soft-disconnect scenario. If the device initiated a disconnect, then
>>
>> Thanks for the response.  I guess this is specific for some use cases,
>> but some applications such as ADB will close the FFS EP files after it
>> gets the disconnection event, leading to this pullup disable as well.
>> So its specific to that particular use case.
> 
> Does that mean that the ADB application won't expect a connection until
> user intervention or some other notification to do pullup(1)?
> 

Yes, correct.  The Android USB framework will trigger the pullup(1) again.

>>
>>> the driver should wait for the control request to complete. If it times
>>> out, something is already wrong here. The programming guide only
>>> mentions that we should wait for completion, but it doesn't say about
>>> recovery in a case of hung transfer. I need to check internally but it
>>> should be safe to issue End Transfer.
>>>
>>
>> Yes, what I did was modify a device running the Linux XHCI stack w/o
>> reading/sending out the SETUP DATA phase, so that on the device end we'd
>> always run into that situation where there's still a pending EP0 TRB queued.
> 
> Is this only for validation purpose?
> 

Yes, just to help verify the fix by injecting the error condition.

>>
>> We're running multiple devices with this fix as well, and doing device
>> initiated disconnect.
>>
>>>>
>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>> proper cleanup and halting of the controller does not complete.
>>>> Instead of returning an error (which is ignored from the UDC
>>>> perspective), do what is mentioned in the comments and force the
>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>
>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>> ---
>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>> index 6587394..abfc42b 100644
>>>> --- a/drivers/usb/dwc3/ep0.c
>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>  {
>>>>  	struct dwc3_ep		*dep;
>>>>  
>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>  }
>>>>  
>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>  {
>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>  	u32			cmd;
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index 54c5a08..a0e2e4d 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>  		if (ret == 0) {
>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>> -			return -ETIMEDOUT;
>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>
>>> End transfer command takes time, need to wait for it to complete before
>>> issuing Start transfer again. Also, why restart again when it's about to
>>> be disconnected.
>>
>> I can try without restarting it again, and see if that works.  Instead
>> of waiting for the command complete event, can we set the ForceRM bit,
>> similar to what we do for dwc3_remove_requests()?
>>
> 
> ForceRM=1 means that the controller will ignore updating the TRBs
> (including not clearing the HWO and remain transfer size). The driver
> still needs to wait for the command to complete before issuing Start
> Transfer command. Otherwise Start Transfer won't go through. If we know
> that we're not going to issue Start Transfer any time soon, then we may
> be able to get away with ignoring End Transfer command completion.
> 

I see.  Currently, in the place that we do use
dwc3_ep0_end_control_data(), its followed by
dwc3_ep0_stall_and_restart() which would execute start transfer.  For
the most part, we were trying to follow the flow diagram in:

	4.4 Control Transfer Programming Model

We'd technically be in the "wait for host" stage at this point, so hence
why we issued the end transfer, then followed with the stall and restart.

Thanks
Wesley Cheng
Thinh Nguyen Aug. 13, 2021, 11:21 p.m. UTC | #5
Wesley Cheng wrote:
> Hi Thinh,
> 
> On 8/12/2021 2:31 PM, Thinh Nguyen wrote:
>> Wesley Cheng wrote:
>>> Hi Thinh,
>>>
>>> On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
>>>> Wesley Cheng wrote:
>>>>> During a USB cable disconnect, or soft disconnect scenario, a pending
>>>>> SETUP transaction may not be completed, leading to the following
>>>>> error:
>>>>>
>>>>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
>>>>
>>>> How could it be a case of cable disconnect? The pullup(0) only applies
>>>> for soft-disconnect scenario. If the device initiated a disconnect, then
>>>
>>> Thanks for the response.  I guess this is specific for some use cases,
>>> but some applications such as ADB will close the FFS EP files after it
>>> gets the disconnection event, leading to this pullup disable as well.
>>> So its specific to that particular use case.
>>
>> Does that mean that the ADB application won't expect a connection until
>> user intervention or some other notification to do pullup(1)?
>>
> 
> Yes, correct.  The Android USB framework will trigger the pullup(1) again.
> 
>>>
>>>> the driver should wait for the control request to complete. If it times
>>>> out, something is already wrong here. The programming guide only
>>>> mentions that we should wait for completion, but it doesn't say about
>>>> recovery in a case of hung transfer. I need to check internally but it
>>>> should be safe to issue End Transfer.
>>>>
>>>
>>> Yes, what I did was modify a device running the Linux XHCI stack w/o
>>> reading/sending out the SETUP DATA phase, so that on the device end we'd
>>> always run into that situation where there's still a pending EP0 TRB queued.
>>
>> Is this only for validation purpose?
>>
> 
> Yes, just to help verify the fix by injecting the error condition.

Just want to clarify, so you didn't really run into this problem but
want to fix the recovery path right?

> 
>>>
>>> We're running multiple devices with this fix as well, and doing device
>>> initiated disconnect.
>>>
>>>>>
>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>> proper cleanup and halting of the controller does not complete.
>>>>> Instead of returning an error (which is ignored from the UDC
>>>>> perspective), do what is mentioned in the comments and force the
>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>
>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>> ---
>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>> index 6587394..abfc42b 100644
>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>  	return ret;
>>>>>  }
>>>>>  
>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>  {
>>>>>  	struct dwc3_ep		*dep;
>>>>>  
>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>  }
>>>>>  
>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>  {
>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>  	u32			cmd;
>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>> index 54c5a08..a0e2e4d 100644
>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>  		if (ret == 0) {
>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>> -			return -ETIMEDOUT;
>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>
>>>> End transfer command takes time, need to wait for it to complete before
>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>> be disconnected.
>>>
>>> I can try without restarting it again, and see if that works.  Instead
>>> of waiting for the command complete event, can we set the ForceRM bit,
>>> similar to what we do for dwc3_remove_requests()?
>>>
>>
>> ForceRM=1 means that the controller will ignore updating the TRBs
>> (including not clearing the HWO and remain transfer size). The driver
>> still needs to wait for the command to complete before issuing Start
>> Transfer command. Otherwise Start Transfer won't go through. If we know
>> that we're not going to issue Start Transfer any time soon, then we may
>> be able to get away with ignoring End Transfer command completion.
>>
> 
> I see.  Currently, in the place that we do use
> dwc3_ep0_end_control_data(), its followed by
> dwc3_ep0_stall_and_restart() which would execute start transfer.  For

That doesn't look right. You can try to see if it can recover from a
control write request. Often time we do control read and not write.
(i.e. try to End Transfer and immediately Start Transfer on the same
direction control endpoint).

> the most part, we were trying to follow the flow diagram in:
> 
> 	4.4 Control Transfer Programming Model
> 
> We'd technically be in the "wait for host" stage at this point, so hence
> why we issued the end transfer, then followed with the stall and restart.
> 

You're not setting up the data stage on host side right? So, we wouldn't
receive a XferNotReady(data) to proceed to the next state to interpret
whether it's unexpected direction to end and restart. Also, you should
check to see which direction/control stage was hung to issue End
Transfer to ep0 or ep1. On reconnect after device initiated disconnect,
the gadget should start ep0 for setup stage again.

Note:
I don't think we clearly defined a recovery for this timeout for device
initiated disconnect. Most of the time we don't want to issue End
Transfer for endpoint 0 (OUT direction) because we may be expecting a
setup packet. However, in this case of error recovery to proceed to
soft-disconnect, it should be fine.

BR,
Thinh
Wesley Cheng Aug. 13, 2021, 11:49 p.m. UTC | #6
Hi Thinh,

On 8/13/2021 4:21 PM, Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 8/12/2021 2:31 PM, Thinh Nguyen wrote:
>>> Wesley Cheng wrote:
>>>> Hi Thinh,
>>>>
>>>> On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
>>>>> Wesley Cheng wrote:
>>>>>> During a USB cable disconnect, or soft disconnect scenario, a pending
>>>>>> SETUP transaction may not be completed, leading to the following
>>>>>> error:
>>>>>>
>>>>>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
>>>>>
>>>>> How could it be a case of cable disconnect? The pullup(0) only applies
>>>>> for soft-disconnect scenario. If the device initiated a disconnect, then
>>>>
>>>> Thanks for the response.  I guess this is specific for some use cases,
>>>> but some applications such as ADB will close the FFS EP files after it
>>>> gets the disconnection event, leading to this pullup disable as well.
>>>> So its specific to that particular use case.
>>>
>>> Does that mean that the ADB application won't expect a connection until
>>> user intervention or some other notification to do pullup(1)?
>>>
>>
>> Yes, correct.  The Android USB framework will trigger the pullup(1) again.
>>
>>>>
>>>>> the driver should wait for the control request to complete. If it times
>>>>> out, something is already wrong here. The programming guide only
>>>>> mentions that we should wait for completion, but it doesn't say about
>>>>> recovery in a case of hung transfer. I need to check internally but it
>>>>> should be safe to issue End Transfer.
>>>>>
>>>>
>>>> Yes, what I did was modify a device running the Linux XHCI stack w/o
>>>> reading/sending out the SETUP DATA phase, so that on the device end we'd
>>>> always run into that situation where there's still a pending EP0 TRB queued.
>>>
>>> Is this only for validation purpose?
>>>
>>
>> Yes, just to help verify the fix by injecting the error condition.
> 
> Just want to clarify, so you didn't really run into this problem but
> want to fix the recovery path right?
> 
No, we ran into this issue, but it was only seen in our mass device
testing.  I only made the changes to simulate the issue, so then I
wouldn't need to have to wait for the test results.

However, we did get a mass device test run in w/ the changes, and they
were unable to reproduce the same issues we've been seeing, so this does
fix an actual issue.
>>
>>>>
>>>> We're running multiple devices with this fix as well, and doing device
>>>> initiated disconnect.
>>>>
>>>>>>
>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>
>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>> ---
>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>> index 6587394..abfc42b 100644
>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>  	return ret;
>>>>>>  }
>>>>>>  
>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>  {
>>>>>>  	struct dwc3_ep		*dep;
>>>>>>  
>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>  }
>>>>>>  
>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>  {
>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>  	u32			cmd;
>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>  		if (ret == 0) {
>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>> -			return -ETIMEDOUT;
>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>
>>>>> End transfer command takes time, need to wait for it to complete before
>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>> be disconnected.
>>>>
>>>> I can try without restarting it again, and see if that works.  Instead
>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>> similar to what we do for dwc3_remove_requests()?
>>>>
>>>
>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>> (including not clearing the HWO and remain transfer size). The driver
>>> still needs to wait for the command to complete before issuing Start
>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>> that we're not going to issue Start Transfer any time soon, then we may
>>> be able to get away with ignoring End Transfer command completion.
>>>
>>
>> I see.  Currently, in the place that we do use
>> dwc3_ep0_end_control_data(), its followed by
>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
> 
> That doesn't look right. You can try to see if it can recover from a
> control write request. Often time we do control read and not write.
> (i.e. try to End Transfer and immediately Start Transfer on the same
> direction control endpoint).
> 
OK, I can try, but just to clarify, I was referring to how it was being
done in:

static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
		const struct dwc3_event_depevt *event)
{
...
		if (dwc->ep0_expect_in != event->endpoint_number) {
			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];

			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
			dwc3_ep0_end_control_data(dwc, dep);
			dwc3_ep0_stall_and_restart(dwc);
			return;
		}

>> the most part, we were trying to follow the flow diagram in:
>>
>> 	4.4 Control Transfer Programming Model
>>
>> We'd technically be in the "wait for host" stage at this point, so hence
>> why we issued the end transfer, then followed with the stall and restart.
>>
> 
> You're not setting up the data stage on host side right? So, we wouldn't
> receive a XferNotReady(data) to proceed to the next state to interpret
Correct, so the situation we're running into isn't technically something
covered in the flow chart, but it is a similar situation as the one it
currently highlights.
> whether it's unexpected direction to end and restart. Also, you should
> check to see which direction/control stage was hung to issue End
> Transfer to ep0 or ep1. On reconnect after device initiated disconnect,
dwc3_ep0_end_control_data() does check if the dep->resource_index == 0
before issuing the end transfer.  The resource index should be zero if
there was no start transfer path executed.  However, I can add a check
here if that is preferred.
> the gadget should start ep0 for setup stage again.
> 
Yes, so maybe we don't need to explicitly stall and restart here, as it
will be restarted during gadget start.
> Note:
> I don't think we clearly defined a recovery for this timeout for device
> initiated disconnect. Most of the time we don't want to issue End
> Transfer for endpoint 0 (OUT direction) because we may be expecting a
> setup packet. However, in this case of error recovery to proceed to
> soft-disconnect, it should be fine.
> 
Another solution we were considering is if we just avoid returning
-ETIMEDOUT even if there were pending transfers on EP0.  In the end, the
gadget stop routine would disable EP0/1, which would also stop any
active transfers as well w/ EP events disabled.

Thanks
Wesley Cheng
Thinh Nguyen Aug. 14, 2021, 12:30 a.m. UTC | #7
Wesley Cheng wrote:
> Hi Thinh,
> 
> On 8/13/2021 4:21 PM, Thinh Nguyen wrote:
>> Wesley Cheng wrote:
>>> Hi Thinh,
>>>
>>> On 8/12/2021 2:31 PM, Thinh Nguyen wrote:
>>>> Wesley Cheng wrote:
>>>>> Hi Thinh,
>>>>>
>>>>> On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
>>>>>> Wesley Cheng wrote:
>>>>>>> During a USB cable disconnect, or soft disconnect scenario, a pending
>>>>>>> SETUP transaction may not be completed, leading to the following
>>>>>>> error:
>>>>>>>
>>>>>>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
>>>>>>
>>>>>> How could it be a case of cable disconnect? The pullup(0) only applies
>>>>>> for soft-disconnect scenario. If the device initiated a disconnect, then
>>>>>
>>>>> Thanks for the response.  I guess this is specific for some use cases,
>>>>> but some applications such as ADB will close the FFS EP files after it
>>>>> gets the disconnection event, leading to this pullup disable as well.
>>>>> So its specific to that particular use case.
>>>>
>>>> Does that mean that the ADB application won't expect a connection until
>>>> user intervention or some other notification to do pullup(1)?
>>>>
>>>
>>> Yes, correct.  The Android USB framework will trigger the pullup(1) again.
>>>
>>>>>
>>>>>> the driver should wait for the control request to complete. If it times
>>>>>> out, something is already wrong here. The programming guide only
>>>>>> mentions that we should wait for completion, but it doesn't say about
>>>>>> recovery in a case of hung transfer. I need to check internally but it
>>>>>> should be safe to issue End Transfer.
>>>>>>
>>>>>
>>>>> Yes, what I did was modify a device running the Linux XHCI stack w/o
>>>>> reading/sending out the SETUP DATA phase, so that on the device end we'd
>>>>> always run into that situation where there's still a pending EP0 TRB queued.
>>>>
>>>> Is this only for validation purpose?
>>>>
>>>
>>> Yes, just to help verify the fix by injecting the error condition.
>>
>> Just want to clarify, so you didn't really run into this problem but
>> want to fix the recovery path right?
>>
> No, we ran into this issue, but it was only seen in our mass device
> testing.  I only made the changes to simulate the issue, so then I
> wouldn't need to have to wait for the test results.
> 
> However, we did get a mass device test run in w/ the changes, and they
> were unable to reproduce the same issues we've been seeing, so this does
> fix an actual issue.

Ok

>>>
>>>>>
>>>>> We're running multiple devices with this fix as well, and doing device
>>>>> initiated disconnect.
>>>>>
>>>>>>>
>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>
>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>> ---
>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>> index 6587394..abfc42b 100644
>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>>  
>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>  {
>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>  
>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>  }
>>>>>>>  
>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>  {
>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>  	u32			cmd;
>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>  		if (ret == 0) {
>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>> -			return -ETIMEDOUT;
>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>
>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>> be disconnected.
>>>>>
>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>
>>>>
>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>> (including not clearing the HWO and remain transfer size). The driver
>>>> still needs to wait for the command to complete before issuing Start
>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>> be able to get away with ignoring End Transfer command completion.
>>>>
>>>
>>> I see.  Currently, in the place that we do use
>>> dwc3_ep0_end_control_data(), its followed by
>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>
>> That doesn't look right. You can try to see if it can recover from a
>> control write request. Often time we do control read and not write.
>> (i.e. try to End Transfer and immediately Start Transfer on the same
>> direction control endpoint).
>>
> OK, I can try, but just to clarify, I was referring to how it was being
> done in:
> 
> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
> 		const struct dwc3_event_depevt *event)
> {
> ...
> 		if (dwc->ep0_expect_in != event->endpoint_number) {
> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
> 
> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
> 			dwc3_ep0_end_control_data(dwc, dep);
> 			dwc3_ep0_stall_and_restart(dwc);
> 			return;
> 		}
> 
>>> the most part, we were trying to follow the flow diagram in:
>>>
>>> 	4.4 Control Transfer Programming Model
>>>
>>> We'd technically be in the "wait for host" stage at this point, so hence
>>> why we issued the end transfer, then followed with the stall and restart.
>>>
>>
>> You're not setting up the data stage on host side right? So, we wouldn't
>> receive a XferNotReady(data) to proceed to the next state to interpret
> Correct, so the situation we're running into isn't technically something
> covered in the flow chart, but it is a similar situation as the one it
> currently highlights.
>> whether it's unexpected direction to end and restart. Also, you should
>> check to see which direction/control stage was hung to issue End
>> Transfer to ep0 or ep1. On reconnect after device initiated disconnect,
> dwc3_ep0_end_control_data() does check if the dep->resource_index == 0
> before issuing the end transfer.  The resource index should be zero if
> there was no start transfer path executed.  However, I can add a check
> here if that is preferred.

The resource index for ep0 is 0. That means that the driver will not
issue End Transfer to ep0. In the above code snippet, it should not be a
problem because if the driver received XferNotReady for ep0, that means
that the endpoint has not started yet, so it's ok to issue Start
Transfer again. If it's for ep1, then it's ok to issue End Transfer on
that direction (IN) while Start Transfer for ep0 (OUT).

>> the gadget should start ep0 for setup stage again.
>>
> Yes, so maybe we don't need to explicitly stall and restart here, as it
> will be restarted during gadget start.
>> Note:
>> I don't think we clearly defined a recovery for this timeout for device
>> initiated disconnect. Most of the time we don't want to issue End
>> Transfer for endpoint 0 (OUT direction) because we may be expecting a
>> setup packet. However, in this case of error recovery to proceed to
>> soft-disconnect, it should be fine.
>>
> Another solution we were considering is if we just avoid returning
> -ETIMEDOUT even if there were pending transfers on EP0.  In the end, the
> gadget stop routine would disable EP0/1, which would also stop any
> active transfers as well w/ EP events disabled.
> 

Yes that's true. Probably better this way.

BR,
Thinh
Thinh Nguyen Aug. 15, 2021, 12:26 a.m. UTC | #8
Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 8/13/2021 4:21 PM, Thinh Nguyen wrote:
>>> Wesley Cheng wrote:
>>>> Hi Thinh,
>>>>
>>>> On 8/12/2021 2:31 PM, Thinh Nguyen wrote:
>>>>> Wesley Cheng wrote:
>>>>>> Hi Thinh,
>>>>>>
>>>>>> On 8/11/2021 5:47 PM, Thinh Nguyen wrote:
>>>>>>> Wesley Cheng wrote:
>>>>>>>> During a USB cable disconnect, or soft disconnect scenario, a pending
>>>>>>>> SETUP transaction may not be completed, leading to the following
>>>>>>>> error:
>>>>>>>>
>>>>>>>>     dwc3 a600000.dwc3: timed out waiting for SETUP phase
>>>>>>>
>>>>>>> How could it be a case of cable disconnect? The pullup(0) only applies
>>>>>>> for soft-disconnect scenario. If the device initiated a disconnect, then
>>>>>>
>>>>>> Thanks for the response.  I guess this is specific for some use cases,
>>>>>> but some applications such as ADB will close the FFS EP files after it
>>>>>> gets the disconnection event, leading to this pullup disable as well.
>>>>>> So its specific to that particular use case.
>>>>>
>>>>> Does that mean that the ADB application won't expect a connection until
>>>>> user intervention or some other notification to do pullup(1)?
>>>>>
>>>>
>>>> Yes, correct.  The Android USB framework will trigger the pullup(1) again.
>>>>
>>>>>>
>>>>>>> the driver should wait for the control request to complete. If it times
>>>>>>> out, something is already wrong here. The programming guide only
>>>>>>> mentions that we should wait for completion, but it doesn't say about
>>>>>>> recovery in a case of hung transfer. I need to check internally but it
>>>>>>> should be safe to issue End Transfer.
>>>>>>>
>>>>>>
>>>>>> Yes, what I did was modify a device running the Linux XHCI stack w/o
>>>>>> reading/sending out the SETUP DATA phase, so that on the device end we'd
>>>>>> always run into that situation where there's still a pending EP0 TRB queued.
>>>>>
>>>>> Is this only for validation purpose?
>>>>>
>>>>
>>>> Yes, just to help verify the fix by injecting the error condition.
>>>
>>> Just want to clarify, so you didn't really run into this problem but
>>> want to fix the recovery path right?
>>>
>> No, we ran into this issue, but it was only seen in our mass device
>> testing.  I only made the changes to simulate the issue, so then I
>> wouldn't need to have to wait for the test results.
>>
>> However, we did get a mass device test run in w/ the changes, and they
>> were unable to reproduce the same issues we've been seeing, so this does
>> fix an actual issue.
> 
> Ok
> 
>>>>
>>>>>>
>>>>>> We're running multiple devices with this fix as well, and doing device
>>>>>> initiated disconnect.
>>>>>>
>>>>>>>>
>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>
>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>> ---
>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>  	return ret;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>  {
>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>  
>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>  {
>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>  	u32			cmd;
>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>  		if (ret == 0) {
>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>
>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>> be disconnected.
>>>>>>
>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>
>>>>>
>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>> still needs to wait for the command to complete before issuing Start
>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>
>>>>
>>>> I see.  Currently, in the place that we do use
>>>> dwc3_ep0_end_control_data(), its followed by
>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>
>>> That doesn't look right. You can try to see if it can recover from a
>>> control write request. Often time we do control read and not write.
>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>> direction control endpoint).
>>>
>> OK, I can try, but just to clarify, I was referring to how it was being
>> done in:
>>
>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>> 		const struct dwc3_event_depevt *event)
>> {
>> ...
>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>
>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>> 			dwc3_ep0_end_control_data(dwc, dep);
>> 			dwc3_ep0_stall_and_restart(dwc);
>> 			return;
>> 		}
>>

Looking at this snippet again, it looks wrong. For control write
unexpected direction, if the driver hasn't setup and started the DATA
phase yet, then it's fine, but there is a problem if it did.

Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
ep0 due to the resource_index check, it doesn't follow the control
transfer flow model in the programming guide. This may cause
dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
with SETUP stage. Also, if the ep0 is already started, the driver won't
issue Start Transfer command again.

This issue is unlikely to occur unless we see a misbehave host for
control write request. Regardless, we need to fix this. I may need some
time before I can create a patch and test it. If you or anyone is up to
take this on, it'd be highly appreciated.

If I missed anything, do let me know.

Thanks,
Thinh
Felipe Balbi Aug. 15, 2021, 6:06 a.m. UTC | #9
Hi,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>> ---
>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>  	return ret;
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>  {
>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>  
>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>  {
>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>  	u32			cmd;
>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>
>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>> be disconnected.
>>>>>>>
>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>
>>>>>>
>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>
>>>>>
>>>>> I see.  Currently, in the place that we do use
>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>
>>>> That doesn't look right. You can try to see if it can recover from a
>>>> control write request. Often time we do control read and not write.
>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>> direction control endpoint).
>>>>
>>> OK, I can try, but just to clarify, I was referring to how it was being
>>> done in:
>>>
>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>> 		const struct dwc3_event_depevt *event)
>>> {
>>> ...
>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>
>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>> 			dwc3_ep0_stall_and_restart(dwc);
>>> 			return;
>>> 		}
>>>
>
> Looking at this snippet again, it looks wrong. For control write
> unexpected direction, if the driver hasn't setup and started the DATA
> phase yet, then it's fine, but there is a problem if it did.
>
> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
> ep0 due to the resource_index check, it doesn't follow the control

IIRC resource_index is always non-zero, so the command should be
triggered. If you have access to a Lecroy USB Trainer, could you script
this very scenario for verification?

> transfer flow model in the programming guide. This may cause
> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
> with SETUP stage. Also, if the ep0 is already started, the driver won't
> issue Start Transfer command again.

> This issue is unlikely to occur unless we see a misbehave host for
> control write request. Regardless, we need to fix this. I may need some

right, it would be a misbehaving host, however databook called it out as
something that _can_ happen. Moreover, I have vague memories of this
being one of the test cases in Lecroy's USB Certification Suite.

> time before I can create a patch and test it. If you or anyone is up to
> take this on, it'd be highly appreciated.

Before we go ahead writing a patch for this, I'd really like to see
traces showing this failure and a minimal reproducer. The reproducer
would probably have to be a script for Lecroy's USB Trainer.

Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
major release (before I lost access to all my USB gear heh).
Thinh Nguyen Aug. 16, 2021, 12:33 a.m. UTC | #10
Felipe Balbi wrote:
> 
> Hi,
> 
> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>  	return ret;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>  {
>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>  
>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>  {
>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>  	u32			cmd;
>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>
>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>> be disconnected.
>>>>>>>>
>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>
>>>>>>>
>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>
>>>>>>
>>>>>> I see.  Currently, in the place that we do use
>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>
>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>> control write request. Often time we do control read and not write.
>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>> direction control endpoint).
>>>>>
>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>> done in:
>>>>
>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>> 		const struct dwc3_event_depevt *event)
>>>> {
>>>> ...
>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>
>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>> 			return;
>>>> 		}
>>>>
>>
>> Looking at this snippet again, it looks wrong. For control write
>> unexpected direction, if the driver hasn't setup and started the DATA
>> phase yet, then it's fine, but there is a problem if it did.
>>
>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>> ep0 due to the resource_index check, it doesn't follow the control
> 
> IIRC resource_index is always non-zero, so the command should be

No, resource_index for ep0out is 0, ep0in is 1. You can check from any
of the driver tracepoint log for the return value of Start Transfer
command for the resource index of ep0. There could be a mixed up with
the undocumented return value of Set Endpoint Transfer Resource command
before when this code was written, don't mix up with that.

> triggered. If you have access to a Lecroy USB Trainer, could you script
> this very scenario for verification?

For anyone who wants to work on this, we don't need a LeCroy USB
trainer. If you use xhci host, just modify the xhci-ring.c to queue a
wrong direction DATA phase TRB of a particular control write request
test, and continue with the next control requests.

> 
>> transfer flow model in the programming guide. This may cause
>> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
>> with SETUP stage. Also, if the ep0 is already started, the driver won't
>> issue Start Transfer command again.
> 
>> This issue is unlikely to occur unless we see a misbehave host for
>> control write request. Regardless, we need to fix this. I may need some
> 
> right, it would be a misbehaving host, however databook called it out as
> something that _can_ happen. Moreover, I have vague memories of this
> being one of the test cases in Lecroy's USB Certification Suite.
> 

Yes, it's something that can happen, and dwc3 should be able to handle
it. If you remember which test in particular that tests this, let me
know. I want to check how it was passed.

>> time before I can create a patch and test it. If you or anyone is up to
>> take this on, it'd be highly appreciated.
> 
> Before we go ahead writing a patch for this, I'd really like to see
> traces showing this failure and a minimal reproducer. The reproducer
> would probably have to be a script for Lecroy's USB Trainer.
> 
> Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
> major release (before I lost access to all my USB gear heh).
> 

Are you referring to Ch9 USBCV? I don't recall there's a particular test
for this.

There should be a red flag whenever we see End Transfer command
immediately follows by a Start Transfer command without any waiting for
End Transfer completion. Though, in this case, we don't go through with
the End Transfer for ep0 due to the resource_index check in
dwc3_ep0_end_control_data().

BR,
Thinh
Felipe Balbi Aug. 16, 2021, 5:15 a.m. UTC | #11
Hi,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>> ---
>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>  	return ret;
>>>>>>>>>>>  }
>>>>>>>>>>>  
>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>  {
>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>  
>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>  }
>>>>>>>>>>>  
>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>  {
>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>
>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>> be disconnected.
>>>>>>>>>
>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>
>>>>>>>>
>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>
>>>>>>>
>>>>>>> I see.  Currently, in the place that we do use
>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>
>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>> control write request. Often time we do control read and not write.
>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>> direction control endpoint).
>>>>>>
>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>> done in:
>>>>>
>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>> 		const struct dwc3_event_depevt *event)
>>>>> {
>>>>> ...
>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>
>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>> 			return;
>>>>> 		}
>>>>>
>>>
>>> Looking at this snippet again, it looks wrong. For control write
>>> unexpected direction, if the driver hasn't setup and started the DATA
>>> phase yet, then it's fine, but there is a problem if it did.
>>>
>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>> ep0 due to the resource_index check, it doesn't follow the control
>> 
>> IIRC resource_index is always non-zero, so the command should be
>
> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
> of the driver tracepoint log for the return value of Start Transfer
> command for the resource index of ep0. There could be a mixed up with
> the undocumented return value of Set Endpoint Transfer Resource command
> before when this code was written, don't mix up with that.

fair enough :-)

>> triggered. If you have access to a Lecroy USB Trainer, could you script
>> this very scenario for verification?
>
> For anyone who wants to work on this, we don't need a LeCroy USB
> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
> wrong direction DATA phase TRB of a particular control write request
> test, and continue with the next control requests.

That's true.

>>> transfer flow model in the programming guide. This may cause
>>> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
>>> with SETUP stage. Also, if the ep0 is already started, the driver won't
>>> issue Start Transfer command again.
>> 
>>> This issue is unlikely to occur unless we see a misbehave host for
>>> control write request. Regardless, we need to fix this. I may need some
>> 
>> right, it would be a misbehaving host, however databook called it out as
>> something that _can_ happen. Moreover, I have vague memories of this
>> being one of the test cases in Lecroy's USB Certification Suite.
>
> Yes, it's something that can happen, and dwc3 should be able to handle
> it. If you remember which test in particular that tests this, let me
> know. I want to check how it was passed.

Unfortunately, not off the top of my head.

>>> time before I can create a patch and test it. If you or anyone is up to
>>> take this on, it'd be highly appreciated.
>> 
>> Before we go ahead writing a patch for this, I'd really like to see
>> traces showing this failure and a minimal reproducer. The reproducer
>> would probably have to be a script for Lecroy's USB Trainer.
>> 
>> Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
>> major release (before I lost access to all my USB gear heh).
>> 
>
> Are you referring to Ch9 USBCV? I don't recall there's a particular test
> for this.

No, no. USBCV doesn't have tests for this. Lecroy's Test Suite was more
extensive than USBCV. I was simply saying that changes to ep0 (and any
of the low level control pipe handling) should be done with care to make
sure we don't break certification for folks who'd like to pursue that
route.

> There should be a red flag whenever we see End Transfer command
> immediately follows by a Start Transfer command without any waiting for
> End Transfer completion. Though, in this case, we don't go through with
> the End Transfer for ep0 due to the resource_index check in
> dwc3_ep0_end_control_data().

Understood. Do you have sample traces showing this case?
Wesley Cheng Aug. 16, 2021, 7:13 p.m. UTC | #12
Hi Thinh,

On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
> Felipe Balbi wrote:
>>
>> Hi,
>>
>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>> ---
>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>  	return ret;
>>>>>>>>>>>  }
>>>>>>>>>>>  
>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>  {
>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>  
>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>  }
>>>>>>>>>>>  
>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>  {
>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>
>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>> be disconnected.
>>>>>>>>>
>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>
>>>>>>>>
>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>
>>>>>>>
>>>>>>> I see.  Currently, in the place that we do use
>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>
>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>> control write request. Often time we do control read and not write.
>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>> direction control endpoint).
>>>>>>
>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>> done in:
>>>>>
>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>> 		const struct dwc3_event_depevt *event)
>>>>> {
>>>>> ...
>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>
>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>> 			return;
>>>>> 		}
>>>>>
>>>
>>> Looking at this snippet again, it looks wrong. For control write
>>> unexpected direction, if the driver hasn't setup and started the DATA
>>> phase yet, then it's fine, but there is a problem if it did.
>>>
>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>> ep0 due to the resource_index check, it doesn't follow the control
>>
>> IIRC resource_index is always non-zero, so the command should be
> 
> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
> of the driver tracepoint log for the return value of Start Transfer
> command for the resource index of ep0. There could be a mixed up with
> the undocumented return value of Set Endpoint Transfer Resource command
> before when this code was written, don't mix up with that.
> 
>> triggered. If you have access to a Lecroy USB Trainer, could you script
>> this very scenario for verification?
> 
> For anyone who wants to work on this, we don't need a LeCroy USB
> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
> wrong direction DATA phase TRB of a particular control write request
> test, and continue with the next control requests.
> 
Let me give this a try since I already have a modified (broken :)) XHCI
stack.

Thanks
Wesley Cheng
>>
>>> transfer flow model in the programming guide. This may cause
>>> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
>>> with SETUP stage. Also, if the ep0 is already started, the driver won't
>>> issue Start Transfer command again.
>>
>>> This issue is unlikely to occur unless we see a misbehave host for
>>> control write request. Regardless, we need to fix this. I may need some
>>
>> right, it would be a misbehaving host, however databook called it out as
>> something that _can_ happen. Moreover, I have vague memories of this
>> being one of the test cases in Lecroy's USB Certification Suite.
>>
> 
> Yes, it's something that can happen, and dwc3 should be able to handle
> it. If you remember which test in particular that tests this, let me
> know. I want to check how it was passed.
> 
>>> time before I can create a patch and test it. If you or anyone is up to
>>> take this on, it'd be highly appreciated.
>>
>> Before we go ahead writing a patch for this, I'd really like to see
>> traces showing this failure and a minimal reproducer. The reproducer
>> would probably have to be a script for Lecroy's USB Trainer.
>>
>> Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
>> major release (before I lost access to all my USB gear heh).
>>
> 
> Are you referring to Ch9 USBCV? I don't recall there's a particular test
> for this.
> 
> There should be a red flag whenever we see End Transfer command
> immediately follows by a Start Transfer command without any waiting for
> End Transfer completion. Though, in this case, we don't go through with
> the End Transfer for ep0 due to the resource_index check in
> dwc3_ep0_end_control_data().
> 
> BR,
> Thinh
>
Thinh Nguyen Aug. 17, 2021, 1:25 a.m. UTC | #13
Felipe Balbi wrote:
> 
> Hi,
> 
> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>  }
>>>>>>>>>>>>  
>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>  
>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>  }
>>>>>>>>>>>>  
>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>
>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>> be disconnected.
>>>>>>>>>>
>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>
>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>> control write request. Often time we do control read and not write.
>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>> direction control endpoint).
>>>>>>>
>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>> done in:
>>>>>>
>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>> {
>>>>>> ...
>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>
>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>> 			return;
>>>>>> 		}
>>>>>>
>>>>
>>>> Looking at this snippet again, it looks wrong. For control write
>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>
>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>
>>> IIRC resource_index is always non-zero, so the command should be
>>
>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>> of the driver tracepoint log for the return value of Start Transfer
>> command for the resource index of ep0. There could be a mixed up with
>> the undocumented return value of Set Endpoint Transfer Resource command
>> before when this code was written, don't mix up with that.
> 
> fair enough :-)
> 
>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>> this very scenario for verification?
>>
>> For anyone who wants to work on this, we don't need a LeCroy USB
>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>> wrong direction DATA phase TRB of a particular control write request
>> test, and continue with the next control requests.
> 
> That's true.
> 
>>>> transfer flow model in the programming guide. This may cause
>>>> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
>>>> with SETUP stage. Also, if the ep0 is already started, the driver won't
>>>> issue Start Transfer command again.
>>>
>>>> This issue is unlikely to occur unless we see a misbehave host for
>>>> control write request. Regardless, we need to fix this. I may need some
>>>
>>> right, it would be a misbehaving host, however databook called it out as
>>> something that _can_ happen. Moreover, I have vague memories of this
>>> being one of the test cases in Lecroy's USB Certification Suite.
>>
>> Yes, it's something that can happen, and dwc3 should be able to handle
>> it. If you remember which test in particular that tests this, let me
>> know. I want to check how it was passed.
> 
> Unfortunately, not off the top of my head.
> 
>>>> time before I can create a patch and test it. If you or anyone is up to
>>>> take this on, it'd be highly appreciated.
>>>
>>> Before we go ahead writing a patch for this, I'd really like to see
>>> traces showing this failure and a minimal reproducer. The reproducer
>>> would probably have to be a script for Lecroy's USB Trainer.
>>>
>>> Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
>>> major release (before I lost access to all my USB gear heh).
>>>
>>
>> Are you referring to Ch9 USBCV? I don't recall there's a particular test
>> for this.
> 
> No, no. USBCV doesn't have tests for this. Lecroy's Test Suite was more
> extensive than USBCV. I was simply saying that changes to ep0 (and any
> of the low level control pipe handling) should be done with care to make
> sure we don't break certification for folks who'd like to pursue that
> route.
> 

Yes, I agree.

>> There should be a red flag whenever we see End Transfer command
>> immediately follows by a Start Transfer command without any waiting for
>> End Transfer completion. Though, in this case, we don't go through with
>> the End Transfer for ep0 due to the resource_index check in
>> dwc3_ep0_end_control_data().
> 
> Understood. Do you have sample traces showing this case?
> 

No, I'd need to create a test case for it.

BR,
Thinh
Wesley Cheng Aug. 18, 2021, 9:24 p.m. UTC | #14
Hi Thinh/Felipe,

On 8/16/2021 12:13 PM, Wesley Cheng wrote:
> Hi Thinh,
> 
> On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
>> Felipe Balbi wrote:
>>>
>>> Hi,
>>>
>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>  }
>>>>>>>>>>>>  
>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>  
>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>  }
>>>>>>>>>>>>  
>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>
>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>> be disconnected.
>>>>>>>>>>
>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>
>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>> control write request. Often time we do control read and not write.
>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>> direction control endpoint).
>>>>>>>
>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>> done in:
>>>>>>
>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>> {
>>>>>> ...
>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>
>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>> 			return;
>>>>>> 		}
>>>>>>
>>>>
>>>> Looking at this snippet again, it looks wrong. For control write
>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>
>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>
>>> IIRC resource_index is always non-zero, so the command should be
>>
>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>> of the driver tracepoint log for the return value of Start Transfer
>> command for the resource index of ep0. There could be a mixed up with
>> the undocumented return value of Set Endpoint Transfer Resource command
>> before when this code was written, don't mix up with that.
>>
>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>> this very scenario for verification?
>>
>> For anyone who wants to work on this, we don't need a LeCroy USB
>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>> wrong direction DATA phase TRB of a particular control write request
>> test, and continue with the next control requests.
>>
> Let me give this a try since I already have a modified (broken :)) XHCI
> stack.
> 
> Thanks
> Wesley Cheng

Sorry for the late response.  I was trying to get a reliable change to
get the issue to reproduce.  I think I was able to find a set up which
will generate the unexpected direction issue.  I'll try my best to
summarize the traces here:

Set up:
- Modified XHCI stack to queue an IN TRB for a three stage CONTROL OUT
transaction (OUT data stage)
- Device is using RNDIS, as that has interface specific commands (ie
SEND_ENCAPSUALTED messages)

Kernel Log:
[ 1255.312870] msm-usb-hsphy 88e3000.hsphy: Avail curr from USB = 900
[ 1255.315300] dwc3_ep0_xfernotready event status = 1
[ 1255.315429] dwc3_ep0_xfernotready event status = 2
[ 1255.316390] android_work: sent uevent USB_STATE=CONFIGURED
[ 1255.317467] dwc3_ep0_xfernotready event status = 1
[ 1255.317588] dwc3_ep0_xfernotready event status = 2
[ 1255.334196] dwc3_ep0_xfernotready event status = 1
[ 1255.334217] dwc3 a600000.dwc3: unexpected direction for Data Phase
[ 1255.334311] rndis_msg_parser: unknown RNDIS message 0x0052033A len
4456526
[ 1255.334328] RNDIS command error -524, 0/24
[ 1255.334369] ------------[ cut here ]------------
[ 1255.334377] dwc3 a600000.dwc3: No resource for ep0out
[ 1255.334440] WARNING: CPU: 0 PID: 8364 at
drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x3c4/0x96c
...
 1255.336163] WARNING: CPU: 0 PID: 8364 at drivers/usb/dwc3/ep0.c:281
dwc3_ep0_out_start+0x108/0x144

So the kernel log does indeed show the concern mentioned by Thinh, where
after ending the transfer, we do see the dwc3_ep0_out_start() fail due
to no xfer resource.

ftrace:
<...>-8364    [000] d..1  1255.333988: dwc3_ctrl_req: Get Interface
Status(Intf = 0, Length = 24)
<...>-8364    [000] d..1  1255.334115: dwc3_prepare_trb: ep0out: trb
ffffffc01bffd000 (E1:D0) buf 00000000efb76000 size 24 ctrl 00000455
(HlCs:Sc:data)
<...>-8364    [000] d..1  1255.334128: dwc3_prepare_trb: ep0out: trb
ffffffc01bffd010 (E1:D0) buf 00000000efff9000 size 488 ctrl 00000c53
(HLcs:SC:data)
<...>-8364    [000] d..1  1255.334166: dwc3_gadget_ep_cmd: ep0out: cmd
'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
Successful
<...>-8364    [000] d..1  1255.334239: dwc3_gadget_ep_cmd: ep0out: cmd
'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
<...>-8364    [000] d..1  1255.334291: dwc3_gadget_giveback: ep0out: req
ffffff8891724e00 length 0/24 zsI ==> -104
<...>-8364    [000] d..1  1255.334339: dwc3_prepare_trb: ep0out: trb
ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
(HLcs:SC:setup)
<...>-8364    [000] d..1  1255.336099: dwc3_gadget_ep_cmd: ep0out: cmd
'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: No
Resource
<...>-8364    [000] d..1  1255.357594: dwc3_ctrl_req: 00 60 b7 ef 00 00
00 00
<...>-8364    [000] d..1  1255.357680: dwc3_gadget_ep_cmd: ep0out: cmd
'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
<...>-8364    [000] d..1  1255.357696: dwc3_prepare_trb: ep0out: trb
ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
(HLcs:SC:setup)
<...>-8364    [000] d..1  1255.357722: dwc3_gadget_ep_cmd: ep0out: cmd
'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
Successful
<...>-224     [005] d..1  1266.313014: dwc3_gadget_ep_cmd: ep2out: cmd
'End Transfer' [40c08] params 00000000 00000000 00000000 --> status:
Timed Out

Thanks
Wesley Cheng
Thinh Nguyen Aug. 19, 2021, 12:40 a.m. UTC | #15
Wesley Cheng wrote:
> Hi Thinh/Felipe,
> 
> On 8/16/2021 12:13 PM, Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
>>> Felipe Balbi wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  
>>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>>  
>>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  
>>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>>
>>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>>> be disconnected.
>>>>>>>>>>>
>>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>>
>>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>>> control write request. Often time we do control read and not write.
>>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>>> direction control endpoint).
>>>>>>>>
>>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>>> done in:
>>>>>>>
>>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>>> {
>>>>>>> ...
>>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>>
>>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>>> 			return;
>>>>>>> 		}
>>>>>>>
>>>>>
>>>>> Looking at this snippet again, it looks wrong. For control write
>>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>>
>>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>>
>>>> IIRC resource_index is always non-zero, so the command should be
>>>
>>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>>> of the driver tracepoint log for the return value of Start Transfer
>>> command for the resource index of ep0. There could be a mixed up with
>>> the undocumented return value of Set Endpoint Transfer Resource command
>>> before when this code was written, don't mix up with that.
>>>
>>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>>> this very scenario for verification?
>>>
>>> For anyone who wants to work on this, we don't need a LeCroy USB
>>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>>> wrong direction DATA phase TRB of a particular control write request
>>> test, and continue with the next control requests.
>>>
>> Let me give this a try since I already have a modified (broken :)) XHCI
>> stack.
>>
>> Thanks
>> Wesley Cheng
> 
> Sorry for the late response.  I was trying to get a reliable change to
> get the issue to reproduce.  I think I was able to find a set up which
> will generate the unexpected direction issue.  I'll try my best to
> summarize the traces here:
> 
> Set up:
> - Modified XHCI stack to queue an IN TRB for a three stage CONTROL OUT
> transaction (OUT data stage)
> - Device is using RNDIS, as that has interface specific commands (ie
> SEND_ENCAPSUALTED messages)
> 
> Kernel Log:
> [ 1255.312870] msm-usb-hsphy 88e3000.hsphy: Avail curr from USB = 900
> [ 1255.315300] dwc3_ep0_xfernotready event status = 1
> [ 1255.315429] dwc3_ep0_xfernotready event status = 2
> [ 1255.316390] android_work: sent uevent USB_STATE=CONFIGURED
> [ 1255.317467] dwc3_ep0_xfernotready event status = 1
> [ 1255.317588] dwc3_ep0_xfernotready event status = 2
> [ 1255.334196] dwc3_ep0_xfernotready event status = 1
> [ 1255.334217] dwc3 a600000.dwc3: unexpected direction for Data Phase
> [ 1255.334311] rndis_msg_parser: unknown RNDIS message 0x0052033A len
> 4456526
> [ 1255.334328] RNDIS command error -524, 0/24
> [ 1255.334369] ------------[ cut here ]------------
> [ 1255.334377] dwc3 a600000.dwc3: No resource for ep0out
> [ 1255.334440] WARNING: CPU: 0 PID: 8364 at
> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x3c4/0x96c
> ...
>  1255.336163] WARNING: CPU: 0 PID: 8364 at drivers/usb/dwc3/ep0.c:281
> dwc3_ep0_out_start+0x108/0x144
> 
> So the kernel log does indeed show the concern mentioned by Thinh, where
> after ending the transfer, we do see the dwc3_ep0_out_start() fail due
> to no xfer resource.
> 

No, there's no end transfer command seen, and it's expected from code
review.

> ftrace:
> <...>-8364    [000] d..1  1255.333988: dwc3_ctrl_req: Get Interface
> Status(Intf = 0, Length = 24)
> <...>-8364    [000] d..1  1255.334115: dwc3_prepare_trb: ep0out: trb
> ffffffc01bffd000 (E1:D0) buf 00000000efb76000 size 24 ctrl 00000455
> (HlCs:Sc:data)
> <...>-8364    [000] d..1  1255.334128: dwc3_prepare_trb: ep0out: trb
> ffffffc01bffd010 (E1:D0) buf 00000000efff9000 size 488 ctrl 00000c53
> (HLcs:SC:data)
> <...>-8364    [000] d..1  1255.334166: dwc3_gadget_ep_cmd: ep0out: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
> Successful
> <...>-8364    [000] d..1  1255.334239: dwc3_gadget_ep_cmd: ep0out: cmd
> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
> <...>-8364    [000] d..1  1255.334291: dwc3_gadget_giveback: ep0out: req
> ffffff8891724e00 length 0/24 zsI ==> -104

It detected wrong direction and sets STALL here, but no End Transfer
command.

> <...>-8364    [000] d..1  1255.334339: dwc3_prepare_trb: ep0out: trb
> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
> (HLcs:SC:setup)

The driver overwrote the active TRB with a new SETUP TRB.

> <...>-8364    [000] d..1  1255.336099: dwc3_gadget_ep_cmd: ep0out: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: No
> Resource

The dep->flags DWC3_EP_TRANSFER_STARTED got cleared. When it tries to
issue a Start Transfer a SETUP transfer, the command will fail with no
resource because the endpoint never ended properly.

> <...>-8364    [000] d..1  1255.357594: dwc3_ctrl_req: 00 60 b7 ef 00 00
> 00 00

Here is iffy because the behavior is undefined. The driver overwrote the
previous TRB. The Start Transfer command didn't go through, so the
controller still has the old TRB setup in its cache. It gets the next
SETUP request completion anyway because the SETUP stage is a short
packet. The driver updated its state that it's expecting the SETUP
stage, and it doesn't check the TRB write back buffer size for more or
less than 8 bytes or whether this is valid data.

Wesley, is this SETUP packet the correct RNDIS control request?

> <...>-8364    [000] d..1  1255.357680: dwc3_gadget_ep_cmd: ep0out: cmd
> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful

The device Set Stall on an unrecognized request, probably from the
application, which can be normal.

> <...>-8364    [000] d..1  1255.357696: dwc3_prepare_trb: ep0out: trb
> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
> (HLcs:SC:setup)
> <...>-8364    [000] d..1  1255.357722: dwc3_gadget_ep_cmd: ep0out: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
> Successful
> <...>-224     [005] d..1  1266.313014: dwc3_gadget_ep_cmd: ep2out: cmd
> 'End Transfer' [40c08] params 00000000 00000000 00000000 --> status:
> Timed Out
> 
> Thanks
> Wesley Cheng
> 

Thanks for the test Wesley

BR,
Thinh
Felipe Balbi Aug. 19, 2021, 5:36 a.m. UTC | #16
Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:
>>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  
>>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>>  
>>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  
>>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>>
>>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>>> be disconnected.
>>>>>>>>>>>
>>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>>
>>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>>> control write request. Often time we do control read and not write.
>>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>>> direction control endpoint).
>>>>>>>>
>>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>>> done in:
>>>>>>>
>>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>>> {
>>>>>>> ...
>>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>>
>>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>>> 			return;
>>>>>>> 		}
>>>>>>>
>>>>>
>>>>> Looking at this snippet again, it looks wrong. For control write
>>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>>
>>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>>
>>>> IIRC resource_index is always non-zero, so the command should be
>>>
>>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>>> of the driver tracepoint log for the return value of Start Transfer
>>> command for the resource index of ep0. There could be a mixed up with
>>> the undocumented return value of Set Endpoint Transfer Resource command
>>> before when this code was written, don't mix up with that.
>>>
>>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>>> this very scenario for verification?
>>>
>>> For anyone who wants to work on this, we don't need a LeCroy USB
>>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>>> wrong direction DATA phase TRB of a particular control write request
>>> test, and continue with the next control requests.
>>>
>> Let me give this a try since I already have a modified (broken :)) XHCI
>> stack.
>> 
>> Thanks
>> Wesley Cheng
>
> Sorry for the late response.  I was trying to get a reliable change to
> get the issue to reproduce.  I think I was able to find a set up which
> will generate the unexpected direction issue.  I'll try my best to
> summarize the traces here:
>
> Set up:
> - Modified XHCI stack to queue an IN TRB for a three stage CONTROL OUT
> transaction (OUT data stage)
> - Device is using RNDIS, as that has interface specific commands (ie
> SEND_ENCAPSUALTED messages)
>
> Kernel Log:
> [ 1255.312870] msm-usb-hsphy 88e3000.hsphy: Avail curr from USB = 900
> [ 1255.315300] dwc3_ep0_xfernotready event status = 1
> [ 1255.315429] dwc3_ep0_xfernotready event status = 2
> [ 1255.316390] android_work: sent uevent USB_STATE=CONFIGURED
> [ 1255.317467] dwc3_ep0_xfernotready event status = 1
> [ 1255.317588] dwc3_ep0_xfernotready event status = 2
> [ 1255.334196] dwc3_ep0_xfernotready event status = 1
> [ 1255.334217] dwc3 a600000.dwc3: unexpected direction for Data Phase
> [ 1255.334311] rndis_msg_parser: unknown RNDIS message 0x0052033A len
> 4456526
> [ 1255.334328] RNDIS command error -524, 0/24
> [ 1255.334369] ------------[ cut here ]------------
> [ 1255.334377] dwc3 a600000.dwc3: No resource for ep0out
> [ 1255.334440] WARNING: CPU: 0 PID: 8364 at
> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x3c4/0x96c
> ...
>  1255.336163] WARNING: CPU: 0 PID: 8364 at drivers/usb/dwc3/ep0.c:281
> dwc3_ep0_out_start+0x108/0x144
>
> So the kernel log does indeed show the concern mentioned by Thinh, where
> after ending the transfer, we do see the dwc3_ep0_out_start() fail due
> to no xfer resource.

Right, we also don't see an end transfer in the traces that
follow. Thanks for confirming.
Wesley Cheng Aug. 19, 2021, 8:51 p.m. UTC | #17
Hi Thinh,

On 8/18/2021 5:40 PM, Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh/Felipe,
>>
>> On 8/16/2021 12:13 PM, Wesley Cheng wrote:
>>> Hi Thinh,
>>>
>>> On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
>>>> Felipe Balbi wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>>>
>>>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>>>> be disconnected.
>>>>>>>>>>>>
>>>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>>>
>>>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>>>> control write request. Often time we do control read and not write.
>>>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>>>> direction control endpoint).
>>>>>>>>>
>>>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>>>> done in:
>>>>>>>>
>>>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>>>> {
>>>>>>>> ...
>>>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>>>
>>>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>>>> 			return;
>>>>>>>> 		}
>>>>>>>>
>>>>>>
>>>>>> Looking at this snippet again, it looks wrong. For control write
>>>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>>>
>>>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>>>
>>>>> IIRC resource_index is always non-zero, so the command should be
>>>>
>>>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>>>> of the driver tracepoint log for the return value of Start Transfer
>>>> command for the resource index of ep0. There could be a mixed up with
>>>> the undocumented return value of Set Endpoint Transfer Resource command
>>>> before when this code was written, don't mix up with that.
>>>>
>>>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>>>> this very scenario for verification?
>>>>
>>>> For anyone who wants to work on this, we don't need a LeCroy USB
>>>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>>>> wrong direction DATA phase TRB of a particular control write request
>>>> test, and continue with the next control requests.
>>>>
>>> Let me give this a try since I already have a modified (broken :)) XHCI
>>> stack.
>>>
>>> Thanks
>>> Wesley Cheng
>>
>> Sorry for the late response.  I was trying to get a reliable change to
>> get the issue to reproduce.  I think I was able to find a set up which
>> will generate the unexpected direction issue.  I'll try my best to
>> summarize the traces here:
>>
>> Set up:
>> - Modified XHCI stack to queue an IN TRB for a three stage CONTROL OUT
>> transaction (OUT data stage)
>> - Device is using RNDIS, as that has interface specific commands (ie
>> SEND_ENCAPSUALTED messages)
>>
>> Kernel Log:
>> [ 1255.312870] msm-usb-hsphy 88e3000.hsphy: Avail curr from USB = 900
>> [ 1255.315300] dwc3_ep0_xfernotready event status = 1
>> [ 1255.315429] dwc3_ep0_xfernotready event status = 2
>> [ 1255.316390] android_work: sent uevent USB_STATE=CONFIGURED
>> [ 1255.317467] dwc3_ep0_xfernotready event status = 1
>> [ 1255.317588] dwc3_ep0_xfernotready event status = 2
>> [ 1255.334196] dwc3_ep0_xfernotready event status = 1
>> [ 1255.334217] dwc3 a600000.dwc3: unexpected direction for Data Phase
>> [ 1255.334311] rndis_msg_parser: unknown RNDIS message 0x0052033A len
>> 4456526
>> [ 1255.334328] RNDIS command error -524, 0/24
>> [ 1255.334369] ------------[ cut here ]------------
>> [ 1255.334377] dwc3 a600000.dwc3: No resource for ep0out
>> [ 1255.334440] WARNING: CPU: 0 PID: 8364 at
>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x3c4/0x96c
>> ...
>>  1255.336163] WARNING: CPU: 0 PID: 8364 at drivers/usb/dwc3/ep0.c:281
>> dwc3_ep0_out_start+0x108/0x144
>>
>> So the kernel log does indeed show the concern mentioned by Thinh, where
>> after ending the transfer, we do see the dwc3_ep0_out_start() fail due
>> to no xfer resource.
>>
> 
> No, there's no end transfer command seen, and it's expected from code
> review.
> 
Sorry, yes that's correct.
>> ftrace:
>> <...>-8364    [000] d..1  1255.333988: dwc3_ctrl_req: Get Interface
>> Status(Intf = 0, Length = 24)
>> <...>-8364    [000] d..1  1255.334115: dwc3_prepare_trb: ep0out: trb
>> ffffffc01bffd000 (E1:D0) buf 00000000efb76000 size 24 ctrl 00000455
>> (HlCs:Sc:data)
>> <...>-8364    [000] d..1  1255.334128: dwc3_prepare_trb: ep0out: trb
>> ffffffc01bffd010 (E1:D0) buf 00000000efff9000 size 488 ctrl 00000c53
>> (HLcs:SC:data)
>> <...>-8364    [000] d..1  1255.334166: dwc3_gadget_ep_cmd: ep0out: cmd
>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>> Successful
>> <...>-8364    [000] d..1  1255.334239: dwc3_gadget_ep_cmd: ep0out: cmd
>> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
>> <...>-8364    [000] d..1  1255.334291: dwc3_gadget_giveback: ep0out: req
>> ffffff8891724e00 length 0/24 zsI ==> -104
> 
> It detected wrong direction and sets STALL here, but no End Transfer
> command.
> 
>> <...>-8364    [000] d..1  1255.334339: dwc3_prepare_trb: ep0out: trb
>> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
>> (HLcs:SC:setup)
> 
> The driver overwrote the active TRB with a new SETUP TRB.
> 
>> <...>-8364    [000] d..1  1255.336099: dwc3_gadget_ep_cmd: ep0out: cmd
>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: No
>> Resource
> 
> The dep->flags DWC3_EP_TRANSFER_STARTED got cleared. When it tries to
> issue a Start Transfer a SETUP transfer, the command will fail with no
> resource because the endpoint never ended properly.
> 
>> <...>-8364    [000] d..1  1255.357594: dwc3_ctrl_req: 00 60 b7 ef 00 00
>> 00 00
> 
> Here is iffy because the behavior is undefined. The driver overwrote the
> previous TRB. The Start Transfer command didn't go through, so the
> controller still has the old TRB setup in its cache. It gets the next
> SETUP request completion anyway because the SETUP stage is a short
> packet. The driver updated its state that it's expecting the SETUP
> stage, and it doesn't check the TRB write back buffer size for more or
> less than 8 bytes or whether this is valid data.
> 
> Wesley, is this SETUP packet the correct RNDIS control request?
> 
That doesn't look to be a RNDIS control packet.  I collected a bus
analyzer log as well w/ this snippet, and nothing was transmitted from
the host side during the data stage.  Subsequent SETUP transactions were
standard USB descriptors (GET string descriptors).
>> <...>-8364    [000] d..1  1255.357680: dwc3_gadget_ep_cmd: ep0out: cmd
>> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
> 
> The device Set Stall on an unrecognized request, probably from the
> application, which can be normal.
> 
>> <...>-8364    [000] d..1  1255.357696: dwc3_prepare_trb: ep0out: trb
>> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
>> (HLcs:SC:setup)
>> <...>-8364    [000] d..1  1255.357722: dwc3_gadget_ep_cmd: ep0out: cmd
>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>> Successful
>> <...>-224     [005] d..1  1266.313014: dwc3_gadget_ep_cmd: ep2out: cmd
>> 'End Transfer' [40c08] params 00000000 00000000 00000000 --> status:
>> Timed Out
>>
>> Thanks
>> Wesley Cheng
>>
> 
> Thanks for the test Wesley
> 
> BR,
> Thinh
> 

So back to the original issue, which was the SETUP timeout during pullup
disable, I went ahead and collected the ftrace w/ a change to just
remove the return statement:

ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
if (ret == 0) {
	dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
	return -ETIMEDOUT;
}

This would allow the __dwc3_gadget_stop() to disable EP0/1, which would
call dwc3_gadget_stop_active_transfer() to send the end xfer command:

//Packet which will not queue a data stage (injected failure)
<...>-7098    [003] d..1   346.560711: dwc3_ctrl_req: Get String
Descriptor(Index = 3, Length = 2)

//Prepare IN data stage TRB
<...>-7098    [003] d..1   346.560865: dwc3_prepare_trb: ep0in: trb
ffffffc011edd000 (E0:D0) buf 00000000effcc000 size 2 ctrl 00000c53
(HLcs:SC:data)
<...>-7098    [003] d..1   346.560915: dwc3_gadget_ep_cmd: ep0in: cmd
'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
Successful

//Pullup disable here kicks in - __dwc3_gadget_stop()
<...>-224     [006] d..1   348.607367: dwc3_gadget_ep_disable: ep0out:
mps 64/512 streams 0 burst 1 ring 0/0 flags E:swbp:>
<...>-224     [006] d..1   348.607430: dwc3_gadget_giveback: ep0out: req
ffffff884e5f5500 length 0/2 zsI ==> -108
<...>-224     [006] d..1   348.607444: dwc3_gadget_ep_disable: ep0in:
mps 64/512 streams 0 burst 1 ring 0/0 flags E:swBp:<

//End transfer on the pending EP0 in TRB queued previously
<...>-224     [006] d..1   348.607484: dwc3_gadget_ep_cmd: ep0in: cmd
'End Transfer' [10c08] params 00000000 00000000 00000000 --> status:
Successful
<...>-224     [006] dN.1   348.607788: usb_gadget_vbus_draw: speed 3/6
state 6 0mA [sg:self-powered:activated:disconnected] --> 0
<...>-224     [006] ....   348.616888: usb_gadget_disconnect: speed 3/6
state 6 0mA [sg:self-powered:activated:disconnected] --> 0

Does this look ok, Thinh?

Thanks
Wesley Cheng
Thinh Nguyen Aug. 20, 2021, 1:52 a.m. UTC | #18
Wesley Cheng wrote:
> Hi Thinh,
> 
> On 8/18/2021 5:40 PM, Thinh Nguyen wrote:
>> Wesley Cheng wrote:
>>> Hi Thinh/Felipe,
>>>
>>> On 8/16/2021 12:13 PM, Wesley Cheng wrote:
>>>> Hi Thinh,
>>>>
>>>> On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
>>>>> Felipe Balbi wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>>>>> be disconnected.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>>>>
>>>>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>>>>> control write request. Often time we do control read and not write.
>>>>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>>>>> direction control endpoint).
>>>>>>>>>>
>>>>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>>>>> done in:
>>>>>>>>>
>>>>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>>>>> {
>>>>>>>>> ...
>>>>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>>>>
>>>>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>>>>> 			return;
>>>>>>>>> 		}
>>>>>>>>>
>>>>>>>
>>>>>>> Looking at this snippet again, it looks wrong. For control write
>>>>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>>>>
>>>>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>>>>
>>>>>> IIRC resource_index is always non-zero, so the command should be
>>>>>
>>>>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>>>>> of the driver tracepoint log for the return value of Start Transfer
>>>>> command for the resource index of ep0. There could be a mixed up with
>>>>> the undocumented return value of Set Endpoint Transfer Resource command
>>>>> before when this code was written, don't mix up with that.
>>>>>
>>>>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>>>>> this very scenario for verification?
>>>>>
>>>>> For anyone who wants to work on this, we don't need a LeCroy USB
>>>>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>>>>> wrong direction DATA phase TRB of a particular control write request
>>>>> test, and continue with the next control requests.
>>>>>
>>>> Let me give this a try since I already have a modified (broken :)) XHCI
>>>> stack.
>>>>
>>>> Thanks
>>>> Wesley Cheng
>>>
>>> Sorry for the late response.  I was trying to get a reliable change to
>>> get the issue to reproduce.  I think I was able to find a set up which
>>> will generate the unexpected direction issue.  I'll try my best to
>>> summarize the traces here:
>>>
>>> Set up:
>>> - Modified XHCI stack to queue an IN TRB for a three stage CONTROL OUT
>>> transaction (OUT data stage)
>>> - Device is using RNDIS, as that has interface specific commands (ie
>>> SEND_ENCAPSUALTED messages)
>>>
>>> Kernel Log:
>>> [ 1255.312870] msm-usb-hsphy 88e3000.hsphy: Avail curr from USB = 900
>>> [ 1255.315300] dwc3_ep0_xfernotready event status = 1
>>> [ 1255.315429] dwc3_ep0_xfernotready event status = 2
>>> [ 1255.316390] android_work: sent uevent USB_STATE=CONFIGURED
>>> [ 1255.317467] dwc3_ep0_xfernotready event status = 1
>>> [ 1255.317588] dwc3_ep0_xfernotready event status = 2
>>> [ 1255.334196] dwc3_ep0_xfernotready event status = 1
>>> [ 1255.334217] dwc3 a600000.dwc3: unexpected direction for Data Phase
>>> [ 1255.334311] rndis_msg_parser: unknown RNDIS message 0x0052033A len
>>> 4456526
>>> [ 1255.334328] RNDIS command error -524, 0/24
>>> [ 1255.334369] ------------[ cut here ]------------
>>> [ 1255.334377] dwc3 a600000.dwc3: No resource for ep0out
>>> [ 1255.334440] WARNING: CPU: 0 PID: 8364 at
>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x3c4/0x96c
>>> ...
>>>  1255.336163] WARNING: CPU: 0 PID: 8364 at drivers/usb/dwc3/ep0.c:281
>>> dwc3_ep0_out_start+0x108/0x144
>>>
>>> So the kernel log does indeed show the concern mentioned by Thinh, where
>>> after ending the transfer, we do see the dwc3_ep0_out_start() fail due
>>> to no xfer resource.
>>>
>>
>> No, there's no end transfer command seen, and it's expected from code
>> review.
>>
> Sorry, yes that's correct.
>>> ftrace:
>>> <...>-8364    [000] d..1  1255.333988: dwc3_ctrl_req: Get Interface
>>> Status(Intf = 0, Length = 24)
>>> <...>-8364    [000] d..1  1255.334115: dwc3_prepare_trb: ep0out: trb
>>> ffffffc01bffd000 (E1:D0) buf 00000000efb76000 size 24 ctrl 00000455
>>> (HlCs:Sc:data)
>>> <...>-8364    [000] d..1  1255.334128: dwc3_prepare_trb: ep0out: trb
>>> ffffffc01bffd010 (E1:D0) buf 00000000efff9000 size 488 ctrl 00000c53
>>> (HLcs:SC:data)
>>> <...>-8364    [000] d..1  1255.334166: dwc3_gadget_ep_cmd: ep0out: cmd
>>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>>> Successful
>>> <...>-8364    [000] d..1  1255.334239: dwc3_gadget_ep_cmd: ep0out: cmd
>>> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
>>> <...>-8364    [000] d..1  1255.334291: dwc3_gadget_giveback: ep0out: req
>>> ffffff8891724e00 length 0/24 zsI ==> -104
>>
>> It detected wrong direction and sets STALL here, but no End Transfer
>> command.
>>
>>> <...>-8364    [000] d..1  1255.334339: dwc3_prepare_trb: ep0out: trb
>>> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
>>> (HLcs:SC:setup)
>>
>> The driver overwrote the active TRB with a new SETUP TRB.
>>
>>> <...>-8364    [000] d..1  1255.336099: dwc3_gadget_ep_cmd: ep0out: cmd
>>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: No
>>> Resource
>>
>> The dep->flags DWC3_EP_TRANSFER_STARTED got cleared. When it tries to
>> issue a Start Transfer a SETUP transfer, the command will fail with no
>> resource because the endpoint never ended properly.
>>
>>> <...>-8364    [000] d..1  1255.357594: dwc3_ctrl_req: 00 60 b7 ef 00 00
>>> 00 00
>>
>> Here is iffy because the behavior is undefined. The driver overwrote the
>> previous TRB. The Start Transfer command didn't go through, so the
>> controller still has the old TRB setup in its cache. It gets the next
>> SETUP request completion anyway because the SETUP stage is a short
>> packet. The driver updated its state that it's expecting the SETUP
>> stage, and it doesn't check the TRB write back buffer size for more or
>> less than 8 bytes or whether this is valid data.
>>
>> Wesley, is this SETUP packet the correct RNDIS control request?
>>
> That doesn't look to be a RNDIS control packet.  I collected a bus
> analyzer log as well w/ this snippet, and nothing was transmitted from
> the host side during the data stage.  Subsequent SETUP transactions were
> standard USB descriptors (GET string descriptors).

Ok.

>>> <...>-8364    [000] d..1  1255.357680: dwc3_gadget_ep_cmd: ep0out: cmd
>>> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
>>
>> The device Set Stall on an unrecognized request, probably from the
>> application, which can be normal.
>>
>>> <...>-8364    [000] d..1  1255.357696: dwc3_prepare_trb: ep0out: trb
>>> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
>>> (HLcs:SC:setup)
>>> <...>-8364    [000] d..1  1255.357722: dwc3_gadget_ep_cmd: ep0out: cmd
>>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>>> Successful
>>> <...>-224     [005] d..1  1266.313014: dwc3_gadget_ep_cmd: ep2out: cmd
>>> 'End Transfer' [40c08] params 00000000 00000000 00000000 --> status:
>>> Timed Out
>>>
>>> Thanks
>>> Wesley Cheng
>>>
>>
>> Thanks for the test Wesley
>>
>> BR,
>> Thinh
>>
> 
> So back to the original issue, which was the SETUP timeout during pullup
> disable, I went ahead and collected the ftrace w/ a change to just
> remove the return statement:
> 
> ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
> 				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
> if (ret == 0) {
> 	dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
> 	return -ETIMEDOUT;
> }
> 
> This would allow the __dwc3_gadget_stop() to disable EP0/1, which would
> call dwc3_gadget_stop_active_transfer() to send the end xfer command:
> 
> //Packet which will not queue a data stage (injected failure)
> <...>-7098    [003] d..1   346.560711: dwc3_ctrl_req: Get String
> Descriptor(Index = 3, Length = 2)
> 
> //Prepare IN data stage TRB
> <...>-7098    [003] d..1   346.560865: dwc3_prepare_trb: ep0in: trb
> ffffffc011edd000 (E0:D0) buf 00000000effcc000 size 2 ctrl 00000c53
> (HLcs:SC:data)
> <...>-7098    [003] d..1   346.560915: dwc3_gadget_ep_cmd: ep0in: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
> Successful
> 
> //Pullup disable here kicks in - __dwc3_gadget_stop()
> <...>-224     [006] d..1   348.607367: dwc3_gadget_ep_disable: ep0out:
> mps 64/512 streams 0 burst 1 ring 0/0 flags E:swbp:>
> <...>-224     [006] d..1   348.607430: dwc3_gadget_giveback: ep0out: req
> ffffff884e5f5500 length 0/2 zsI ==> -108
> <...>-224     [006] d..1   348.607444: dwc3_gadget_ep_disable: ep0in:
> mps 64/512 streams 0 burst 1 ring 0/0 flags E:swBp:<
> 
> //End transfer on the pending EP0 in TRB queued previously
> <...>-224     [006] d..1   348.607484: dwc3_gadget_ep_cmd: ep0in: cmd
> 'End Transfer' [10c08] params 00000000 00000000 00000000 --> status:
> Successful
> <...>-224     [006] dN.1   348.607788: usb_gadget_vbus_draw: speed 3/6
> state 6 0mA [sg:self-powered:activated:disconnected] --> 0
> <...>-224     [006] ....   348.616888: usb_gadget_disconnect: speed 3/6
> state 6 0mA [sg:self-powered:activated:disconnected] --> 0
> 
> Does this look ok, Thinh?
> 

It should be fine.

The way we're handling soft-connect and soft-disconnect in dwc3 is
different than what the programming guide suggested. We're not doing
soft-reset on soft-connect. Our HW testing didn't show a problem in
quick soft-connect/disconnect succession before, but can you test doing
soft-connect immediately after this scenario?

Also, maybe we can change dev_err() print to dev_warn() on timeout instead?

Thanks,
Thinh
Wesley Cheng Aug. 20, 2021, 3:05 a.m. UTC | #19
Hi Thinh,

On 8/19/2021 6:52 PM, Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 8/18/2021 5:40 PM, Thinh Nguyen wrote:
>>> Wesley Cheng wrote:
>>>> Hi Thinh/Felipe,
>>>>
>>>> On 8/16/2021 12:13 PM, Wesley Cheng wrote:
>>>>> Hi Thinh,
>>>>>
>>>>> On 8/15/2021 5:33 PM, Thinh Nguyen wrote:
>>>>>> Felipe Balbi wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>>>>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>  drivers/usb/dwc3/ep0.c    | 4 ++--
>>>>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>>>>>>>>  drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>>>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>>>>>>>>  	return ret;
>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>  	struct dwc3_ep		*dep;
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>>>>>>>>  	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>  	struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>>>>>>>>  	u32			cmd;
>>>>>>>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>>>>>>>>  				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>>>>>>>>  		if (ret == 0) {
>>>>>>>>>>>>>>>>  			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>>>>>>>> -			return -ETIMEDOUT;
>>>>>>>>>>>>>>>> +			spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>>>>>>>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>>>>>>>> be disconnected.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can try without restarting it again, and see if that works.  Instead
>>>>>>>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I see.  Currently, in the place that we do use
>>>>>>>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>>>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer.  For
>>>>>>>>>>>
>>>>>>>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>>>>>>>> control write request. Often time we do control read and not write.
>>>>>>>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>>>>>>>> direction control endpoint).
>>>>>>>>>>>
>>>>>>>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>>>>>>>> done in:
>>>>>>>>>>
>>>>>>>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>>>>>>>> 		const struct dwc3_event_depevt *event)
>>>>>>>>>> {
>>>>>>>>>> ...
>>>>>>>>>> 		if (dwc->ep0_expect_in != event->endpoint_number) {
>>>>>>>>>> 			struct dwc3_ep	*dep = dwc->eps[dwc->ep0_expect_in];
>>>>>>>>>>
>>>>>>>>>> 			dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>>>>>>>> 			dwc3_ep0_end_control_data(dwc, dep);
>>>>>>>>>> 			dwc3_ep0_stall_and_restart(dwc);
>>>>>>>>>> 			return;
>>>>>>>>>> 		}
>>>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at this snippet again, it looks wrong. For control write
>>>>>>>> unexpected direction, if the driver hasn't setup and started the DATA
>>>>>>>> phase yet, then it's fine, but there is a problem if it did.
>>>>>>>>
>>>>>>>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>>>>>>>> ep0 due to the resource_index check, it doesn't follow the control
>>>>>>>
>>>>>>> IIRC resource_index is always non-zero, so the command should be
>>>>>>
>>>>>> No, resource_index for ep0out is 0, ep0in is 1. You can check from any
>>>>>> of the driver tracepoint log for the return value of Start Transfer
>>>>>> command for the resource index of ep0. There could be a mixed up with
>>>>>> the undocumented return value of Set Endpoint Transfer Resource command
>>>>>> before when this code was written, don't mix up with that.
>>>>>>
>>>>>>> triggered. If you have access to a Lecroy USB Trainer, could you script
>>>>>>> this very scenario for verification?
>>>>>>
>>>>>> For anyone who wants to work on this, we don't need a LeCroy USB
>>>>>> trainer. If you use xhci host, just modify the xhci-ring.c to queue a
>>>>>> wrong direction DATA phase TRB of a particular control write request
>>>>>> test, and continue with the next control requests.
>>>>>>
>>>>> Let me give this a try since I already have a modified (broken :)) XHCI
>>>>> stack.
>>>>>
>>>>> Thanks
>>>>> Wesley Cheng
>>>>
>>>> Sorry for the late response.  I was trying to get a reliable change to
>>>> get the issue to reproduce.  I think I was able to find a set up which
>>>> will generate the unexpected direction issue.  I'll try my best to
>>>> summarize the traces here:
>>>>
>>>> Set up:
>>>> - Modified XHCI stack to queue an IN TRB for a three stage CONTROL OUT
>>>> transaction (OUT data stage)
>>>> - Device is using RNDIS, as that has interface specific commands (ie
>>>> SEND_ENCAPSUALTED messages)
>>>>
>>>> Kernel Log:
>>>> [ 1255.312870] msm-usb-hsphy 88e3000.hsphy: Avail curr from USB = 900
>>>> [ 1255.315300] dwc3_ep0_xfernotready event status = 1
>>>> [ 1255.315429] dwc3_ep0_xfernotready event status = 2
>>>> [ 1255.316390] android_work: sent uevent USB_STATE=CONFIGURED
>>>> [ 1255.317467] dwc3_ep0_xfernotready event status = 1
>>>> [ 1255.317588] dwc3_ep0_xfernotready event status = 2
>>>> [ 1255.334196] dwc3_ep0_xfernotready event status = 1
>>>> [ 1255.334217] dwc3 a600000.dwc3: unexpected direction for Data Phase
>>>> [ 1255.334311] rndis_msg_parser: unknown RNDIS message 0x0052033A len
>>>> 4456526
>>>> [ 1255.334328] RNDIS command error -524, 0/24
>>>> [ 1255.334369] ------------[ cut here ]------------
>>>> [ 1255.334377] dwc3 a600000.dwc3: No resource for ep0out
>>>> [ 1255.334440] WARNING: CPU: 0 PID: 8364 at
>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x3c4/0x96c
>>>> ...
>>>>  1255.336163] WARNING: CPU: 0 PID: 8364 at drivers/usb/dwc3/ep0.c:281
>>>> dwc3_ep0_out_start+0x108/0x144
>>>>
>>>> So the kernel log does indeed show the concern mentioned by Thinh, where
>>>> after ending the transfer, we do see the dwc3_ep0_out_start() fail due
>>>> to no xfer resource.
>>>>
>>>
>>> No, there's no end transfer command seen, and it's expected from code
>>> review.
>>>
>> Sorry, yes that's correct.
>>>> ftrace:
>>>> <...>-8364    [000] d..1  1255.333988: dwc3_ctrl_req: Get Interface
>>>> Status(Intf = 0, Length = 24)
>>>> <...>-8364    [000] d..1  1255.334115: dwc3_prepare_trb: ep0out: trb
>>>> ffffffc01bffd000 (E1:D0) buf 00000000efb76000 size 24 ctrl 00000455
>>>> (HlCs:Sc:data)
>>>> <...>-8364    [000] d..1  1255.334128: dwc3_prepare_trb: ep0out: trb
>>>> ffffffc01bffd010 (E1:D0) buf 00000000efff9000 size 488 ctrl 00000c53
>>>> (HLcs:SC:data)
>>>> <...>-8364    [000] d..1  1255.334166: dwc3_gadget_ep_cmd: ep0out: cmd
>>>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>>>> Successful
>>>> <...>-8364    [000] d..1  1255.334239: dwc3_gadget_ep_cmd: ep0out: cmd
>>>> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
>>>> <...>-8364    [000] d..1  1255.334291: dwc3_gadget_giveback: ep0out: req
>>>> ffffff8891724e00 length 0/24 zsI ==> -104
>>>
>>> It detected wrong direction and sets STALL here, but no End Transfer
>>> command.
>>>
>>>> <...>-8364    [000] d..1  1255.334339: dwc3_prepare_trb: ep0out: trb
>>>> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
>>>> (HLcs:SC:setup)
>>>
>>> The driver overwrote the active TRB with a new SETUP TRB.
>>>
>>>> <...>-8364    [000] d..1  1255.336099: dwc3_gadget_ep_cmd: ep0out: cmd
>>>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: No
>>>> Resource
>>>
>>> The dep->flags DWC3_EP_TRANSFER_STARTED got cleared. When it tries to
>>> issue a Start Transfer a SETUP transfer, the command will fail with no
>>> resource because the endpoint never ended properly.
>>>
>>>> <...>-8364    [000] d..1  1255.357594: dwc3_ctrl_req: 00 60 b7 ef 00 00
>>>> 00 00
>>>
>>> Here is iffy because the behavior is undefined. The driver overwrote the
>>> previous TRB. The Start Transfer command didn't go through, so the
>>> controller still has the old TRB setup in its cache. It gets the next
>>> SETUP request completion anyway because the SETUP stage is a short
>>> packet. The driver updated its state that it's expecting the SETUP
>>> stage, and it doesn't check the TRB write back buffer size for more or
>>> less than 8 bytes or whether this is valid data.
>>>
>>> Wesley, is this SETUP packet the correct RNDIS control request?
>>>
>> That doesn't look to be a RNDIS control packet.  I collected a bus
>> analyzer log as well w/ this snippet, and nothing was transmitted from
>> the host side during the data stage.  Subsequent SETUP transactions were
>> standard USB descriptors (GET string descriptors).
> 
> Ok.
> 
>>>> <...>-8364    [000] d..1  1255.357680: dwc3_gadget_ep_cmd: ep0out: cmd
>>>> 'Set Stall' [404] params 00000000 00000000 00000000 --> status: Successful
>>>
>>> The device Set Stall on an unrecognized request, probably from the
>>> application, which can be normal.
>>>
>>>> <...>-8364    [000] d..1  1255.357696: dwc3_prepare_trb: ep0out: trb
>>>> ffffffc01bffd010 (E1:D0) buf 00000000efffa000 size 8 ctrl 00000c23
>>>> (HLcs:SC:setup)
>>>> <...>-8364    [000] d..1  1255.357722: dwc3_gadget_ep_cmd: ep0out: cmd
>>>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>>>> Successful
>>>> <...>-224     [005] d..1  1266.313014: dwc3_gadget_ep_cmd: ep2out: cmd
>>>> 'End Transfer' [40c08] params 00000000 00000000 00000000 --> status:
>>>> Timed Out
>>>>
>>>> Thanks
>>>> Wesley Cheng
>>>>
>>>
>>> Thanks for the test Wesley
>>>
>>> BR,
>>> Thinh
>>>
>>
>> So back to the original issue, which was the SETUP timeout during pullup
>> disable, I went ahead and collected the ftrace w/ a change to just
>> remove the return statement:
>>
>> ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
>> 				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>> if (ret == 0) {
>> 	dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>> 	return -ETIMEDOUT;
>> }
>>
>> This would allow the __dwc3_gadget_stop() to disable EP0/1, which would
>> call dwc3_gadget_stop_active_transfer() to send the end xfer command:
>>
>> //Packet which will not queue a data stage (injected failure)
>> <...>-7098    [003] d..1   346.560711: dwc3_ctrl_req: Get String
>> Descriptor(Index = 3, Length = 2)
>>
>> //Prepare IN data stage TRB
>> <...>-7098    [003] d..1   346.560865: dwc3_prepare_trb: ep0in: trb
>> ffffffc011edd000 (E0:D0) buf 00000000effcc000 size 2 ctrl 00000c53
>> (HLcs:SC:data)
>> <...>-7098    [003] d..1   346.560915: dwc3_gadget_ep_cmd: ep0in: cmd
>> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
>> Successful
>>
>> //Pullup disable here kicks in - __dwc3_gadget_stop()
>> <...>-224     [006] d..1   348.607367: dwc3_gadget_ep_disable: ep0out:
>> mps 64/512 streams 0 burst 1 ring 0/0 flags E:swbp:>
>> <...>-224     [006] d..1   348.607430: dwc3_gadget_giveback: ep0out: req
>> ffffff884e5f5500 length 0/2 zsI ==> -108
>> <...>-224     [006] d..1   348.607444: dwc3_gadget_ep_disable: ep0in:
>> mps 64/512 streams 0 burst 1 ring 0/0 flags E:swBp:<
>>
>> //End transfer on the pending EP0 in TRB queued previously
>> <...>-224     [006] d..1   348.607484: dwc3_gadget_ep_cmd: ep0in: cmd
>> 'End Transfer' [10c08] params 00000000 00000000 00000000 --> status:
>> Successful
>> <...>-224     [006] dN.1   348.607788: usb_gadget_vbus_draw: speed 3/6
>> state 6 0mA [sg:self-powered:activated:disconnected] --> 0
>> <...>-224     [006] ....   348.616888: usb_gadget_disconnect: speed 3/6
>> state 6 0mA [sg:self-powered:activated:disconnected] --> 0
>>
>> Does this look ok, Thinh?
>>
> 
> It should be fine.
> 
> The way we're handling soft-connect and soft-disconnect in dwc3 is
> different than what the programming guide suggested. We're not doing
> soft-reset on soft-connect. Our HW testing didn't show a problem in
> quick soft-connect/disconnect succession before, but can you test doing
> soft-connect immediately after this scenario?
> 

Thanks Thinh.

I can try.  Currently, the Android USB framework will immediately rebind
to the UDC if I run the below command from the console:
echo "" > /config/usb_gadget/g1/UDC

So that will be a quick pullup disable (due to the command) and then a
subsequent pullup enable from the Android USB framework.  I'll run a
script overnight to keep doing the echo command above.
> Also, maybe we can change dev_err() print to dev_warn() on timeout instead?
> 
Sure, will do.

Thanks
Wesley Cheng
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 6587394..abfc42b 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -218,7 +218,7 @@  int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
 	return ret;
 }
 
-static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
+void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
 {
 	struct dwc3_ep		*dep;
 
@@ -1073,7 +1073,7 @@  void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
 	__dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
 }
 
-static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
+void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
 {
 	struct dwc3_gadget_ep_cmd_params params;
 	u32			cmd;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 54c5a08..a0e2e4d 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2437,7 +2437,11 @@  static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
 		if (ret == 0) {
 			dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
-			return -ETIMEDOUT;
+			spin_lock_irqsave(&dwc->lock, flags);
+			dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
+			dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
+			dwc3_ep0_stall_and_restart(dwc);
+			spin_unlock_irqrestore(&dwc->lock, flags);
 		}
 	}
 
diff --git a/drivers/usb/dwc3/gadget.h b/drivers/usb/dwc3/gadget.h
index 77df4b6..632f7b7 100644
--- a/drivers/usb/dwc3/gadget.h
+++ b/drivers/usb/dwc3/gadget.h
@@ -114,6 +114,9 @@  int __dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
 int dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
 int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
 		gfp_t gfp_flags);
+void dwc3_ep0_stall_and_restart(struct dwc3 *dwc);
+void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep);
+
 int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol);
 void dwc3_ep0_send_delayed_status(struct dwc3 *dwc);