
[RFC] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()

Message ID 20210809223159.2342385-1-john.stultz@linaro.org (mailing list archive)
State New, archived

Commit Message

John Stultz Aug. 9, 2021, 10:31 p.m. UTC
In commit d25d85061bd8 ("usb: dwc3: gadget: Use
list_replace_init() before traversing lists"), a local list_head
was introduced to process the started_list items to avoid races.

However, in dwc3_gadget_ep_cleanup_completed_requests() if
dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
causing the items on the local list_head to be lost.

This issue showed up as problems on the db845c/RB3 board, where
adb connections would fail, showing the device as "offline".

This patch tries to fix the issue by splicing the local list_head
back into the started_list when we exit the loop early, and then
returning (avoiding an infinite loop, as the started_list is now
non-empty).

Not sure if this is fully correct, but seems to work for me so I
wanted to share for feedback.

Cc: Wesley Cheng <wcheng@codeaurora.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Jack Pham <jackp@codeaurora.org>
Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: YongQin Liu <yongqin.liu@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Petri Gynther <pgynther@google.com>
Cc: linux-usb@vger.kernel.org
Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/usb/dwc3/gadget.c | 6 ++++++
 1 file changed, 6 insertions(+)
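
Illustration (not part of the patch): a minimal userspace sketch of the
failure mode described in the commit message, assuming hand-rolled
stand-ins for the kernel's list helpers. The names list_move_all(),
list_append_all(), cleanup_completed() and demo_remaining() are
hypothetical, and the budget parameter merely stands in for
dwc3_gadget_ep_cleanup_completed_request() returning nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_push_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_remove(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	init_list_head(item);
}

/* Hypothetical helper: same effect as the kernel's list_replace_init(). */
static void list_move_all(struct list_head *old, struct list_head *dst)
{
	dst->next = old->next;
	dst->next->prev = dst;
	dst->prev = old->prev;
	dst->prev->next = dst;
	init_list_head(old);
}

/* Hypothetical helper: same effect as the kernel's list_splice_tail(). */
static void list_append_all(struct list_head *src, struct list_head *head)
{
	if (list_is_empty(src))
		return;
	src->next->prev = head->prev;
	head->prev->next = src->next;
	src->prev->next = head;
	head->prev = src->prev;
}

static size_t list_count(const struct list_head *h)
{
	size_t n = 0;
	const struct list_head *p;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Returns how many entries are still reachable from 'started' afterwards. */
static size_t cleanup_completed(struct list_head *started, size_t budget,
				int splice_back)
{
	struct list_head local;
	size_t done = 0;

	assert(started != NULL);
	list_move_all(started, &local);
	while (!list_is_empty(&local)) {
		if (done == budget)
			break;	/* simulated failure: leave the loop early */
		list_remove(local.next);
		done++;
	}

	/* The RFC's idea: put unprocessed entries back before returning. */
	if (splice_back)
		list_append_all(&local, started);

	return list_count(started);
}

/* Three queued entries, simulated failure after the first one. */
size_t demo_remaining(int splice_back)
{
	struct list_head started, reqs[3];
	size_t i;

	init_list_head(&started);
	for (i = 0; i < 3; i++)
		list_push_tail(&reqs[i], &started);
	return cleanup_completed(&started, 1, splice_back);
}
```

Without the splice, the two unprocessed entries are reachable only from
the on-stack list head and are lost once the function returns; with it
they end up back on the started list.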

Comments

Thinh Nguyen Aug. 9, 2021, 10:44 p.m. UTC | #1
John Stultz wrote:
> In commit d25d85061bd8 ("usb: dwc3: gadget: Use
> list_replace_init() before traversing lists"), a local list_head
> was introduced to process the started_list items to avoid races.
> 
> However, in dwc3_gadget_ep_cleanup_completed_requests() if
> dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
> causing the items on the local list_head to be lost.
> 
> This issue showed up as problems on the db845c/RB3 board, where
> adb connetions would fail, showing the device as "offline".
> 
> This patch tries to fix the issue by if we are returning early
> we splice in the local list head back into the started_list
> and return (avoiding an infinite loop, as the started_list is
> now non-null).
> 
> Not sure if this is fully correct, but seems to work for me so I
> wanted to share for feedback.
> 
> Cc: Wesley Cheng <wcheng@codeaurora.org>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Jack Pham <jackp@codeaurora.org>
> Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Amit Pundir <amit.pundir@linaro.org>
> Cc: YongQin Liu <yongqin.liu@linaro.org>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Petri Gynther <pgynther@google.com>
> Cc: linux-usb@vger.kernel.org
> Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/usb/dwc3/gadget.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index b8d4b2d327b23..a73ebe8e75024 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  			break;
>  	}
>  
> +	if (!list_empty(&local)) {
> +		list_splice_tail(&local, &dep->started_list);
> +		/* Return so we don't hit the restart case and loop forever */
> +		return;
> +	}
> +
>  	if (!list_empty(&dep->started_list))
>  		goto restart;
>  }
> 

No, we should revert the change for
dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
we don't clean up the entire started_list. If the original problem is due
to a disconnection in the middle of request completion, then we can just
check for pullup_connected, exit the loop, and let
dwc3_remove_requests() do the cleanup.

BR,
Thinh
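
Illustration (not from any patch in this thread): a rough userspace
sketch of the alternative described above, assuming a boolean connected
flag as a stand-in for pullup_connected. All names here (struct ep,
complete_requests(), remove_requests(), SKETCH_ESHUTDOWN,
demo_disconnect()) are hypothetical, and a fixed array replaces the
driver's started_list for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ESHUTDOWN 108	/* placeholder error code, for this sketch only */
#define NREQ 3

struct request {
	bool pending;	/* still queued on the endpoint */
	int status;	/* 0 = completed normally, negative = cancelled */
};

struct ep {
	bool connected;			/* stand-in for pullup_connected */
	struct request reqs[NREQ];	/* stand-in for started_list */
};

/* Completion path: stop as soon as the gadget is seen disconnected. */
static size_t complete_requests(struct ep *ep, size_t disconnect_after)
{
	size_t i, done = 0;

	for (i = 0; i < NREQ; i++) {
		if (!ep->connected)
			break;	/* leave the rest queued for the teardown path */
		if (!ep->reqs[i].pending)
			continue;
		ep->reqs[i].pending = false;
		ep->reqs[i].status = 0;
		done++;
		/* Simulate a disconnect racing with the completion handler. */
		if (done == disconnect_after)
			ep->connected = false;
	}
	return done;
}

/* Teardown path: gives back everything still queued, with an error status. */
static size_t remove_requests(struct ep *ep)
{
	size_t i, cancelled = 0;

	for (i = 0; i < NREQ; i++) {
		if (!ep->reqs[i].pending)
			continue;
		ep->reqs[i].pending = false;
		ep->reqs[i].status = -SKETCH_ESHUTDOWN;
		cancelled++;
	}
	return cancelled;
}

/* Returns completed * 10 + cancelled for a disconnect after one completion. */
int demo_disconnect(void)
{
	struct ep ep = { .connected = true };
	size_t i, completed, cancelled;

	for (i = 0; i < NREQ; i++)
		ep.reqs[i].pending = true;

	completed = complete_requests(&ep, 1);
	cancelled = remove_requests(&ep);
	assert(completed + cancelled == NREQ);
	return (int)(completed * 10 + cancelled);
}
```

In this sketch each request is given back by exactly one path: the
completion loop handles what it can while connected, and the teardown
path owns the remainder.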
John Stultz Aug. 9, 2021, 10:53 p.m. UTC | #2
On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>
> No, we should revert the change for
> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
> we don't cleanup the entire started_list. If the original problem is due
> to disconnection in the middle of request completion, then we can just
> check for pullup_connected and exit the loop and let the
> dwc3_remove_requests() do the cleanup.

Ok, sorry, I didn't read your mail in depth until I had this patch
sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
that too.

thanks
-john
Thinh Nguyen Aug. 9, 2021, 10:57 p.m. UTC | #3
John Stultz wrote:
> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>
>> No, we should revert the change for
>> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
>> we don't cleanup the entire started_list. If the original problem is due
>> to disconnection in the middle of request completion, then we can just
>> check for pullup_connected and exit the loop and let the
>> dwc3_remove_requests() do the cleanup.
> 
> Ok, sorry, I didn't read your mail in depth until I had this patch
> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
> that too.
> 
> thanks
> -john
> 

IMO, we should revert this patch for now since it will cause a regression.
We can review and test a proper fix at a later time.

Thanks,
Thinh
Greg KH Aug. 10, 2021, 6:05 a.m. UTC | #4
On Mon, Aug 09, 2021 at 10:57:27PM +0000, Thinh Nguyen wrote:
> John Stultz wrote:
> > On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> >>
> >> No, we should revert the change for
> >> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
> >> we don't cleanup the entire started_list. If the original problem is due
> >> to disconnection in the middle of request completion, then we can just
> >> check for pullup_connected and exit the loop and let the
> >> dwc3_remove_requests() do the cleanup.
> > 
> > Ok, sorry, I didn't read your mail in depth until I had this patch
> > sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
> > that too.
> > 
> > thanks
> > -john
> > 
> 
> IMO, we should revert this patch for now since it will cause regression.
> We can review and test a proper fix at a later time.

Ok, can someone send me a revert please?  That will go faster than me
having to create it myself...

thanks,

greg k-h
Greg KH Aug. 10, 2021, 7:11 a.m. UTC | #5
On Tue, Aug 10, 2021 at 08:05:49AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Aug 09, 2021 at 10:57:27PM +0000, Thinh Nguyen wrote:
> > John Stultz wrote:
> > > On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> > >>
> > >> No, we should revert the change for
> > >> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
> > >> we don't cleanup the entire started_list. If the original problem is due
> > >> to disconnection in the middle of request completion, then we can just
> > >> check for pullup_connected and exit the loop and let the
> > >> dwc3_remove_requests() do the cleanup.
> > > 
> > > Ok, sorry, I didn't read your mail in depth until I had this patch
> > > sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
> > > that too.
> > > 
> > > thanks
> > > -john
> > > 
> > 
> > IMO, we should revert this patch for now since it will cause regression.
> > We can review and test a proper fix at a later time.
> 
> Ok, can someone send me a revert please?  That will go faster than me
> having to create it myself...

I'll go do this now...
Wesley Cheng Aug. 10, 2021, 5:11 p.m. UTC | #6
Hi Thinh,

On 8/9/2021 3:57 PM, Thinh Nguyen wrote:
> John Stultz wrote:
>> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>
>>> No, we should revert the change for
>>> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
>>> we don't cleanup the entire started_list. If the original problem is due
>>> to disconnection in the middle of request completion, then we can just
>>> check for pullup_connected and exit the loop and let the
>>> dwc3_remove_requests() do the cleanup.
>>
>> Ok, sorry, I didn't read your mail in depth until I had this patch
>> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
>> that too.
>>
>> thanks
>> -john
>>
> 
> IMO, we should revert this patch for now since it will cause regression.
> We can review and test a proper fix at a later time.
> 
> Thanks,
> Thinh
> 

Another suggestion would be to replace the loop with a while() loop
that uses list_entry() instead.  That approach was discussed in the
earlier patch series and also addresses the problem.  The issue here
is that the tmp variable still carries a stale request after the dwc3
giveback is called.  We can avoid that by always fetching the entry
with list_entry() instead of relying on list_for_each_safe().

https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/

Thanks
Wesley Cheng
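
Illustration (not the code from the linked series): a minimal userspace
sketch of the while() loop idea, assuming hand-rolled stand-ins for the
kernel's list helpers. giveback(), drain_started() and demo_drain() are
hypothetical names; container_of() is defined locally, as list_entry()
is a thin wrapper around it in the kernel. The "racing" pointer mimics
another context unlinking a request while the giveback runs.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct request {
	struct list_head list;
	int given_back;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_push_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_remove(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	init_list_head(item);
}

/*
 * Stand-in for the giveback: it unlinks req and, to mimic another context
 * running at the same time, may also unlink one more request.
 */
static void giveback(struct request *req, struct request **racing)
{
	list_remove(&req->list);
	req->given_back = 1;
	if (*racing && *racing != req) {
		list_remove(&(*racing)->list);
		*racing = NULL;
	}
}

/* Re-read the first entry on every pass; no cached "next" pointer is kept. */
static int drain_started(struct list_head *started, struct request *racing)
{
	int completed = 0;

	while (!list_is_empty(started)) {
		struct request *req =
			container_of(started->next, struct request, list);

		giveback(req, &racing);
		completed++;
	}
	return completed;
}

/* Three queued requests; the middle one is unlinked behind the loop's back. */
int demo_drain(void)
{
	struct list_head started;
	struct request reqs[3];
	int i, completed;

	init_list_head(&started);
	for (i = 0; i < 3; i++) {
		reqs[i].given_back = 0;
		list_push_tail(&reqs[i].list, &started);
	}

	completed = drain_started(&started, &reqs[1]);
	assert(list_is_empty(&started));
	assert(!reqs[1].given_back);
	return completed;
}
```

Because the loop only ever looks at the current head of the list, a
request that disappears during the giveback is simply never visited,
which is the property a saved tmp pointer cannot provide.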
Thinh Nguyen Aug. 10, 2021, 8:14 p.m. UTC | #7
Wesley Cheng wrote:
> Hi Thinh,
> 
> On 8/9/2021 3:57 PM, Thinh Nguyen wrote:
>> John Stultz wrote:
>>> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>>
>>>> No, we should revert the change for
>>>> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
>>>> we don't cleanup the entire started_list. If the original problem is due
>>>> to disconnection in the middle of request completion, then we can just
>>>> check for pullup_connected and exit the loop and let the
>>>> dwc3_remove_requests() do the cleanup.
>>>
>>> Ok, sorry, I didn't read your mail in depth until I had this patch
>>> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
>>> that too.
>>>
>>> thanks
>>> -john
>>>
>>
>> IMO, we should revert this patch for now since it will cause regression.
>> We can review and test a proper fix at a later time.
>>
>> Thanks,
>> Thinh
>>
> 
> Another suggestion would just be to replace the loop with a while() loop
> and using list_entry() instead.  That was what was discussed in the
> earlier patch series which also addresses the problem as well.  Issue
> here is the tmp variable still carries a stale request after the dwc3
> giveback is called.  We can avoid that by always fetching the
> list_entry() instead of relying on list_for_each_safe()
> 
> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
> 

This should work, but the awkward thing is that two loops from two
separate threads will compete to remove/give back the requests and may
report mixed statuses.

BR,
Thinh
Thinh Nguyen Aug. 10, 2021, 8:17 p.m. UTC | #8
Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 8/9/2021 3:57 PM, Thinh Nguyen wrote:
>>> John Stultz wrote:
>>>> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>>>
>>>>> No, we should revert the change for
>>>>> dwc3_gadget_ep_cleaup_completed_requests(). As I mentioned previously,
>>>>> we don't cleanup the entire started_list. If the original problem is due
>>>>> to disconnection in the middle of request completion, then we can just
>>>>> check for pullup_connected and exit the loop and let the
>>>>> dwc3_remove_requests() do the cleanup.
>>>>
>>>> Ok, sorry, I didn't read your mail in depth until I had this patch
>>>> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
>>>> that too.
>>>>
>>>> thanks
>>>> -john
>>>>
>>>
>>> IMO, we should revert this patch for now since it will cause regression.
>>> We can review and test a proper fix at a later time.
>>>
>>> Thanks,
>>> Thinh
>>>
>>
>> Another suggestion would just be to replace the loop with a while() loop
>> and using list_entry() instead.  That was what was discussed in the
>> earlier patch series which also addresses the problem as well.  Issue
>> here is the tmp variable still carries a stale request after the dwc3
>> giveback is called.  We can avoid that by always fetching the
>> list_entry() instead of relying on list_for_each_safe()
>>
>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
>>
> 
> This should work, but the awkward thing is 2 loops from 2 separate
> threads competing to remove/giveback the requests and may report mix status.
> 

It's fine with me.

BR,
Thinh
Thinh Nguyen Aug. 10, 2021, 11:40 p.m. UTC | #9
Hi Wesley,

Thinh Nguyen wrote:
> Thinh Nguyen wrote:
>> Wesley Cheng wrote:
>>> Hi Thinh,
>>>
>>> On 8/9/2021 3:57 PM, Thinh Nguyen wrote:
>>>> John Stultz wrote:
>>>>> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>>>>
>>>>>> No, we should revert the change for
>>>>>> dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
>>>>>> we don't clean up the entire started_list. If the original problem is due
>>>>>> to disconnection in the middle of request completion, then we can just
>>>>>> check for pullup_connected, exit the loop, and let
>>>>>> dwc3_remove_requests() do the cleanup.
>>>>>
>>>>> Ok, sorry, I didn't read your mail in depth until I had this patch
>>>>> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
>>>>> that too.
>>>>>
>>>>> thanks
>>>>> -john
>>>>>
>>>>
>>>> IMO, we should revert this patch for now since it will cause a regression.
>>>> We can review and test a proper fix at a later time.
>>>>
>>>> Thanks,
>>>> Thinh
>>>>
>>>
>>> Another suggestion would be to replace the loop with a while() loop
>>> that uses list_entry() instead.  That was discussed in the earlier
>>> patch series, which also addresses this problem.  The issue here is
>>> that the tmp variable still carries a stale request after the dwc3
>>> giveback is called.  We can avoid that by always fetching the
>>> list_entry() instead of relying on list_for_each_safe().
>>>
>>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
>>>
>>
>> This should work, but the awkward thing is that 2 loops from 2 separate
>> threads would compete to remove/give back the requests and may report
>> mixed status.
>>


Can you try this?

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 706246d93a00..17b2d8d4efb4 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2029,6 +2029,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
                        dwc3_gadget_giveback(dep, req, -ECONNRESET);
                        break;
                }
+
+               /*
+                * The endpoint is disabled, let the dwc3_remove_requests()
+                * handle the cleanup.
+                */
+               if (!dep->endpoint.desc)
+                       break;
        }
 }
 
@@ -3402,6 +3409,13 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
                                req, status);
                if (ret)
                        break;
+
+               /*
+                * The endpoint is disabled, let the dwc3_remove_requests()
+                * handle the cleanup.
+                */
+               if (!dep->endpoint.desc)
+                       break;
        }
 }

If needed, you can also use your while (!list_empty(started_list)) change along with this, for future-proofing.

BR,
Thinh
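
For readers following the thread, here is a rough userspace sketch of the
while (!list_empty()) idea discussed above: the loop re-reads the first
entry on every pass instead of trusting a cached "next" pointer, so it
keeps working even if the giveback removes more than one request. This is
not the dwc3 code. The list helpers are minimal stand-ins for the kernel's
list.h, and demo_req, demo_giveback() and demo_drain() are hypothetical
names made up for the illustration.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical request type; not the real struct dwc3_request. */
struct demo_req {
	int id;
	struct list_head list;
};

/*
 * Hypothetical giveback: completes one request and, to mimic another
 * cleanup path running at the same time, also removes the request that
 * follows it. Returns how many requests were removed.
 */
static int demo_giveback(struct list_head *started, struct demo_req *req)
{
	struct list_head *next = req->list.next;
	int removed = 1;

	list_del_init(&req->list);
	if (next != started) {
		list_del_init(next);
		removed++;
	}
	return removed;
}

/* Drain the list, fetching the head entry afresh on every iteration. */
static int demo_drain(struct list_head *started)
{
	int total = 0;

	while (!list_empty(started)) {
		struct demo_req *req =
			container_of(started->next, struct demo_req, list);

		total += demo_giveback(started, req);
	}
	return total;
}
```

A list_for_each_entry_safe() style loop over the same list would have
cached the second request as "tmp" before the first giveback and then
dereferenced it after it had already been removed. The sketch ignores
locking entirely, which the real driver cannot do.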

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index b8d4b2d327b23..a73ebe8e75024 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2990,6 +2990,12 @@  static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 			break;
 	}
 
+	if (!list_empty(&local)) {
+		list_splice_tail(&local, &dep->started_list);
+		/* Return so we don't hit the restart case and loop forever */
+		return;
+	}
+
 	if (!list_empty(&dep->started_list))
 		goto restart;
 }
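
To make the failure mode easier to see outside the driver, the following
is a small userspace sketch of the pattern the patch touches: entries are
moved to a local list head with list_replace_init(), processing stops
early, and whatever is left on the local head is spliced back so it is
not lost. The list helpers are simplified stand-ins for the kernel's
list.h, and demo_req, demo_cleanup(), demo_count() and fail_id are
hypothetical names. The sketch also assumes the failing request stays on
the list, which may not match what the real cleanup helper does.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Move every entry from "from" to "to" and leave "from" empty. */
static void list_replace_init(struct list_head *from, struct list_head *to)
{
	if (list_empty(from)) {
		init_list_head(to);
		return;
	}
	to->next = from->next;
	to->next->prev = to;
	to->prev = from->prev;
	to->prev->next = to;
	init_list_head(from);
}

/* Append every entry of "list" to the tail of "head". */
static void list_splice_tail(struct list_head *list, struct list_head *head)
{
	struct list_head *first = list->next, *last = list->prev;

	if (list_empty(list))
		return;
	first->prev = head->prev;
	head->prev->next = first;
	last->next = head;
	head->prev = last;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical request type; not the real struct dwc3_request. */
struct demo_req {
	int id;
	struct list_head list;
};

/* Process requests until one with id == fail_id is reached. */
static int demo_cleanup(struct list_head *started, int fail_id)
{
	struct list_head local;
	int done = 0;

	list_replace_init(started, &local);

	while (!list_empty(&local)) {
		struct demo_req *req =
			container_of(local.next, struct demo_req, list);

		if (req->id == fail_id)
			break;	/* early exit: the rest is still on local */

		list_del_init(&req->list);
		done++;
	}

	/* Without this splice, everything left on local would be lost. */
	if (!list_empty(&local))
		list_splice_tail(&local, started);

	return done;
}

/* Count the entries currently on a list. */
static int demo_count(const struct list_head *h)
{
	const struct list_head *p;
	int n = 0;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

With four queued requests and fail_id set to the third one, two requests
are processed and the remaining two end up back on the started list.
Dropping the splice would leave them reachable only through the local
head, which goes out of scope when the function returns.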