Message ID | 1627543994-20327-1-git-send-email-wcheng@codeaurora.org |
---|---|
State | Accepted |
Commit | d25d85061bd856d6be221626605319154f9b5043 |
Series | usb: dwc3: gadget: Use list_replace_init() before traversing lists |
Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:
> The list_for_each_entry_safe() macro saves the current item (n) and
> the item after (n+1), so that n can be safely removed without
> corrupting the list. [...]
[...]
> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  {
>  	struct dwc3_request *req;
>  	struct dwc3_request *tmp;
> +	struct list_head local;
>  	struct dwc3 *dwc = dep->dwc;
>
> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> +restart:
> +	list_replace_init(&dep->cancelled_list, &local);

hmm, if the lock is held and IRQs disabled when this runs, then no other
threads will be able to append requests to the list which makes the
"restart" label unnecessary, no?

I wonder if we should release the lock and reenable interrupts after
replacing the head. The problem is that
dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
handler.

Alan, could you provide your insight here? Do you think we should defer
this to a low priority tasklet or something along those lines?

> +	list_for_each_entry_safe(req, tmp, &local, list) {
>  		dwc3_gadget_ep_skip_trbs(dep, req);
>  		switch (req->status) {
>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
Hi Felipe,

On 7/29/2021 1:09 AM, Felipe Balbi wrote:
>
> Hi,
>
> Wesley Cheng <wcheng@codeaurora.org> writes:
>
>> The list_for_each_entry_safe() macro saves the current item (n) and
>> the item after (n+1), so that n can be safely removed without
>> corrupting the list. [...]
>>
>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>> +restart:
>> +	list_replace_init(&dep->cancelled_list, &local);
>
> hmm, if the lock is held and IRQs disabled when this runs, then no other
> threads will be able to append requests to the list which makes the
> "restart" label unnecessary, no?

We do still call dwc3_gadget_giveback(), which would release the lock
briefly, so if there was another thread waiting on dwc->lock, it would
be able to add additional items to that list.

>
> I wonder if we should release the lock and reenable interrupts after
> replacing the head. The problem is that
> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
> handler.
>

We would also need to consider that some of the APIs being called in
these situations would also have the assumption that the dwc->lock is
held, i.e. dwc3_gadget_giveback().

Thanks
Wesley Cheng

> Alan, could you provide your insight here? Do you think we should defer
> this to a low priority tasklet or something along those lines?
>
>> +	list_for_each_entry_safe(req, tmp, &local, list) {
>>  		dwc3_gadget_ep_skip_trbs(dep, req);
>>  		switch (req->status) {
>>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
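To make Wesley's point concrete, here is a minimal userspace sketch, not the driver's code. The list helpers are small stand-ins modeled on the kernel's `<linux/list.h>`, and `giveback()`, `struct req` and the `late` request are hypothetical. The simulated giveback appends one more request to the shared list at the point where the real one would drop the lock; with the `restart` label that request is handled in the same call, without it the request is left on `cancelled_list`. Whether a later cleanup pass would pick it up anyway depends on the driver and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins modeled on <linux/list.h>; not the kernel code. */
struct list_head { struct list_head *next, *prev; };
#define container_of(p, type, member) \
	((type *)((char *)(p) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Move every entry from 'old' onto 'dst' and leave 'old' empty. */
static void list_replace_init(struct list_head *old, struct list_head *dst)
{
	if (list_empty(old)) {
		INIT_LIST_HEAD(dst);
		return;
	}
	*dst = *old;
	dst->next->prev = dst;
	dst->prev->next = dst;
	INIT_LIST_HEAD(old);
}

struct req { bool done; struct list_head list; };	/* hypothetical */

static struct list_head cancelled_list;	/* shared, "protected by the lock" */
static struct req late;		/* hypothetical request cancelled while unlocked */
static bool late_added;

/* Hypothetical giveback. On its first call it models another thread that
 * takes the dropped lock and moves one more request onto cancelled_list. */
static void giveback(struct req *r)
{
	assert(!list_empty(&r->list));	/* must still be linked */
	list_del_init(&r->list);
	r->done = true;
	if (!late_added) {		/* spin_unlock(); ... spin_lock(); */
		late_added = true;
		list_add_tail(&late.list, &cancelled_list);
	}
}

static void cleanup_cancelled(bool use_restart)
{
	struct list_head local, *pos, *n;

restart:
	list_replace_init(&cancelled_list, &local);
	/* Same iteration as list_for_each_entry_safe(): 'n' is saved first. */
	for (pos = local.next, n = pos->next; pos != &local;
	     pos = n, n = pos->next)
		giveback(container_of(pos, struct req, list));

	/* Requests added while the lock was dropped trigger another pass. */
	if (use_restart && !list_empty(&cancelled_list))
		goto restart;
}

/* Returns how many requests are still on cancelled_list afterwards. */
static int demo(bool use_restart)
{
	struct req reqs[2];
	struct list_head *pos;
	int left = 0;

	INIT_LIST_HEAD(&cancelled_list);
	late_added = false;
	late.done = false;
	for (int i = 0; i < 2; i++) {
		reqs[i].done = false;
		list_add_tail(&reqs[i].list, &cancelled_list);
	}
	cleanup_cancelled(use_restart);
	for (pos = cancelled_list.next; pos != &cancelled_list; pos = pos->next)
		left++;
	return left;
}
```

With `use_restart` set, `demo()` leaves nothing behind; without it, the late request stays queued for whoever cleans up next.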
Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:
>>> The list_for_each_entry_safe() macro saves the current item (n) and
>>> the item after (n+1), so that n can be safely removed without
>>> corrupting the list. [...]
>>>
>>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>> +restart:
>>> +	list_replace_init(&dep->cancelled_list, &local);
>>
>> hmm, if the lock is held and IRQs disabled when this runs, then no other
>> threads will be able to append requests to the list which makes the
>> "restart" label unnecessary, no?
>
> We do still call dwc3_gadget_giveback() which would release the lock
> briefly, so if there was another thread waiting on dwc->lock, it would
> be able to add additional items to that list.
>
>>
>> I wonder if we should release the lock and reenable interrupts after
>> replacing the head. The problem is that
>> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
>> handler.
>>
>
> We would also need to consider that some of the APIs being called in
> these situations would also have the assumption that the dwc->lock is
> held, i.e. dwc3_gadget_giveback()

yeah, good point. I think we're good to integrate this, unless Alan can
shed some light on some particular possible race scenario we may have
missed.

In any case:

Acked-by: Felipe Balbi <balbi@kernel.org>
On Thu, Jul 29, 2021 at 11:09:57AM +0300, Felipe Balbi wrote:
>
> Hi,
>
> Wesley Cheng <wcheng@codeaurora.org> writes:
>
> > The list_for_each_entry_safe() macro saves the current item (n) and
> > the item after (n+1), so that n can be safely removed without
> > corrupting the list. [...]
> >
> > -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> > +restart:
> > +	list_replace_init(&dep->cancelled_list, &local);
>
> hmm, if the lock is held and IRQs disabled when this runs, then no other
> threads will be able to append requests to the list which makes the
> "restart" label unnecessary, no?

As Wesley pointed out, the lock can be released during giveback and
requests can be added to the cancelled_list at that time.

On the other hand, if that happens, do you need to process those
requests in this function call? Will another cleanup iteration take
care of them later? (I don't know the driver well enough to answer
this.) If it will, you may not need to restart anything.

> I wonder if we should release the lock and reenable interrupts after
> replacing the head. The problem is that
> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
> handler.
>
> Alan, could you provide your insight here? Do you think we should defer
> this to a low priority tasklet or something along those lines?

I don't see why anything like that would be necessary. Giving back
cancelled requests isn't important enough to warrant special treatment.

An alternative approach, used by some other drivers, is to stick with
list_for_each_entry_safe as in the existing code, but go back to the
restart label immediately each time the lock is released and
reacquired.

Also, if this loop always removes the entry it is processing from the
list (I don't know whether it does this), you don't have to use
list_for_each_entry_safe. You can simply use list_first_entry.

Alan Stern

> > +	list_for_each_entry_safe(req, tmp, &local, list) {
> >  		dwc3_gadget_ep_skip_trbs(dep, req);
> >  		switch (req->status) {
> >  		case DWC3_REQUEST_STATUS_DISCONNECTED:
>
> --
> balbi
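Alan's list_first_entry suggestion avoids caching a next pointer altogether. A minimal userspace sketch of that shape, assuming (as Alan notes he does not know) that giveback always unlinks the entry it is handed: the loop re-reads the list head each time the simulated lock is reacquired, so a request removed by another thread in the meantime is never revisited. The list helpers are stand-ins modeled on `<linux/list.h>`; `giveback()`, `struct req` and the interfering "other thread" are hypothetical. The `assert()` in `giveback()` plays roughly the role of list debugging: with a cached `tmp`, as in the original loop, the same interference would hand the already-removed request to giveback a second time and trip it. Going back to a `restart` label after every giveback, Alan's other alternative, likewise never trusts a pointer saved before the lock was dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins modeled on <linux/list.h>; not the kernel code. */
struct list_head { struct list_head *next, *prev; };
#define container_of(p, type, member) \
	((type *)((char *)(p) - offsetof(type, member)))
#define list_first_entry(h, type, member) container_of((h)->next, type, member)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct req { int given_back; struct list_head list; };	/* hypothetical */

static struct list_head cancelled_list;
static bool interfered;

/* Hypothetical giveback; always unlinks 'r'. On its first call, a simulated
 * other thread takes the dropped lock and gives back the next request. */
static void giveback(struct req *r)
{
	assert(!list_empty(&r->list));	/* fires on a double giveback */
	list_del_init(&r->list);
	r->given_back++;
	/* spin_unlock(); */
	if (!interfered && !list_empty(&cancelled_list)) {
		struct req *next = list_first_entry(&cancelled_list, struct req, list);

		interfered = true;
		list_del_init(&next->list);
		next->given_back++;
	}
	/* spin_lock(); */
}

/* No saved 'tmp': the head is re-read after every (simulated) lock drop. */
static void cleanup_cancelled(void)
{
	while (!list_empty(&cancelled_list))
		giveback(list_first_entry(&cancelled_list, struct req, list));
}

/* Returns true if each of the 'count' requests was given back exactly once. */
static bool demo(struct req *reqs, int count)
{
	INIT_LIST_HEAD(&cancelled_list);
	interfered = false;
	for (int i = 0; i < count; i++) {
		reqs[i].given_back = 0;
		list_add_tail(&reqs[i].list, &cancelled_list);
	}
	cleanup_cancelled();
	for (int i = 0; i < count; i++)
		if (reqs[i].given_back != 1)
			return false;
	return true;
}
```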
On Thu, Jul 29, 2021 at 12:34 AM Wesley Cheng <wcheng@codeaurora.org> wrote:
>
> The list_for_each_entry_safe() macro saves the current item (n) and
> the item after (n+1), so that n can be safely removed without
> corrupting the list. [...]
>
> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>

Hey Wesley,

Just as a heads up, since this patch just landed upstream, I've
bisected it down as causing a regression on the db845c/RB3 board.

After booting with mainline, I'm seeing attempts to connect via adb
fail with:
  error: device offline

Running "adb devices" provides:
  List of devices attached
  c4e1189c	offline

After reverting this patch, I can properly connect via adb again, and
"adb devices" shows the expected output:
  List of devices attached
  c4e1189c	device

I've not been able to isolate what might be going on, as there are no
obvious errors in dmesg. Any suggestions to further debug this?

thanks
-john
On Thu, Jul 29, 2021 at 12:34 AM Wesley Cheng <wcheng@codeaurora.org> wrote:
>
> The list_for_each_entry_safe() macro saves the current item (n) and
> the item after (n+1), so that n can be safely removed without
> corrupting the list. [...]
[...]
> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  {
>  	struct dwc3_request *req;
>  	struct dwc3_request *tmp;
> +	struct list_head local;
>  	struct dwc3 *dwc = dep->dwc;
>
> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> +restart:
> +	list_replace_init(&dep->cancelled_list, &local);
> +
> +	list_for_each_entry_safe(req, tmp, &local, list) {
>  		dwc3_gadget_ep_skip_trbs(dep, req);
>  		switch (req->status) {
>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
> @@ -1946,6 +1950,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  			break;
>  		}
>  	}
> +
> +	if (!list_empty(&dep->cancelled_list))
> +		goto restart;
>  }

So, I'm not sure yet, but the "break" cases in the
list_for_each_entry_safe seem suspicious to me.

It seems we've moved the list to the local listhead, then as we process
the local listhead, we may hit the "break" case, which will stop
processing the list, and then we end up returning, losing the
unprocessed items on the local listhead.

I suspect we need to move them back to the started/cancelled_list, or
rework things so we don't hit the "break" cases and fully process the
local list before returning.

thanks
-john
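One detail that may help when reading the quoted hunks: in C, a `break` inside a `switch` leaves only the `switch`, while a `break` directly in the loop body leaves the loop. In the cancelled-requests hunk the `break` statements appear to sit inside `switch (req->status)`, whereas the completed-requests loop in the same patch exits early through `if (ret) break;`. The two functions below are hypothetical and only reproduce those two control-flow shapes.

```c
#include <assert.h>

/* Shape of the cancelled-requests loop: each 'break' sits inside the
 * switch, so it ends the switch and the loop moves on to the next entry. */
static int walk_with_switch(const int *status, int n)
{
	int visited = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		switch (status[i]) {
		case 0:
			visited++;
			break;
		default:
			visited++;
			break;
		}
	}
	return visited;
}

/* Shape of the completed-requests loop: 'if (ret) break;' ends the loop,
 * so entries after the first nonzero 'ret' are never visited. */
static int walk_until_ret(const int *ret, int n)
{
	int visited = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		visited++;
		if (ret[i])
			break;
	}
	return visited;
}
```

With the input `{0, 1, 0, 0}`, `walk_with_switch()` visits all 4 entries while `walk_until_ret()` stops after 2. In the patched completed-requests function, entries skipped that way would remain only on the stack-local `local` list, which is the loss described above.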
+ John Stultz

Wesley Cheng wrote:
> The list_for_each_entry_safe() macro saves the current item (n) and
> the item after (n+1), so that n can be safely removed without
> corrupting the list. [...]
[...]
> @@ -3190,8 +3197,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  {
>  	struct dwc3_request *req;
>  	struct dwc3_request *tmp;
> +	struct list_head local;
>
> -	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> +restart:
> +	list_replace_init(&dep->started_list, &local);
> +
> +	list_for_each_entry_safe(req, tmp, &local, list) {
>  		int ret;
>
>  		ret = dwc3_gadget_ep_cleanup_completed_request(dep, event,
> @@ -3199,6 +3210,9 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  		if (ret)
>  			break;
>  	}
> +
> +	if (!list_empty(&dep->started_list))
> +		goto restart;

This is not right. We don't clean up the entire started list here.
Sometimes we end early because some TRBs are completed but not all.

BR,
Thinh
From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> + John Stultz
>
> Wesley Cheng wrote:
> > The list_for_each_entry_safe() macro saves the current item (n) and
> > the item after (n+1), so that n can be safely removed without
> > corrupting the list. [...]
[...]
> > @@ -3199,6 +3210,9 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> >  		if (ret)
> >  			break;

I also hit the connection issue. The problem is that dwc3 requests in
the local list are ignored due to the loop break.

> >  	}
> > +
> > +	if (!list_empty(&dep->started_list))
> > +		goto restart;
>
> This is not right. We don't clean up the entire started list here.
> Sometimes we end early because some TRBs are completed but not all.

Yes, I also think it can be replaced by checking the local list and
restoring the unhandled requests directly.

> BR,
> Thinh

Best regards,
Ray
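Ray's suggestion, restoring whatever is left on the local list instead of restarting, could look roughly like the minimal userspace sketch below. It only illustrates the idea and is not a patch from this thread. The list helpers are stand-ins modeled on `<linux/list.h>` (where `list_splice_init()` likewise inserts at the head), and `cleanup_one()` is a hypothetical stand-in for `dwc3_gadget_ep_cleanup_completed_request()` that returns nonzero at the first request whose TRBs are not all completed. With `restore` false, the unprocessed requests are stranded on the stack-local head, which is the loss John and Ray describe; with `restore` true they go back to the front of `started_list`, and there is no `restart`, in line with Thinh's point that this loop does not drain the whole started list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins modeled on <linux/list.h>; not the kernel code. */
struct list_head { struct list_head *next, *prev; };
#define container_of(p, type, member) \
	((type *)((char *)(p) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Move every entry from 'old' onto 'dst' and leave 'old' empty. */
static void list_replace_init(struct list_head *old, struct list_head *dst)
{
	if (list_empty(old)) {
		INIT_LIST_HEAD(dst);
		return;
	}
	*dst = *old;
	dst->next->prev = dst;
	dst->prev->next = dst;
	INIT_LIST_HEAD(old);
}

/* Put the entries of 'list' at the front of 'head' and leave 'list' empty. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (list_empty(list))
		return;
	list->next->prev = head;
	list->prev->next = head->next;
	head->next->prev = list->prev;
	head->next = list->next;
	INIT_LIST_HEAD(list);
}

struct req { bool completed, given_back; struct list_head list; };	/* hypothetical */

static struct list_head started_list;

/* Hypothetical stand-in for dwc3_gadget_ep_cleanup_completed_request():
 * returns nonzero at a request whose TRBs are not all completed yet. */
static int cleanup_one(struct req *r)
{
	assert(!list_empty(&r->list));
	if (!r->completed)
		return 1;
	list_del_init(&r->list);
	r->given_back = true;
	return 0;
}

static void cleanup_completed(bool restore)
{
	struct list_head local, *pos, *n;

	list_replace_init(&started_list, &local);
	for (pos = local.next, n = pos->next; pos != &local;
	     pos = n, n = pos->next)
		if (cleanup_one(container_of(pos, struct req, list)))
			break;
	/* Without this, whatever is left on 'local' is stranded when the
	 * function returns. Splicing at the head would keep those requests
	 * ahead of anything queued while the lock was dropped (not modeled). */
	if (restore)
		list_splice_init(&local, &started_list);
}

/* Returns how many requests are still tracked on started_list. */
static int demo(struct req *reqs, int count, int first_incomplete, bool restore)
{
	struct list_head *pos;
	int tracked = 0;

	INIT_LIST_HEAD(&started_list);
	for (int i = 0; i < count; i++) {
		reqs[i].completed = i < first_incomplete;
		reqs[i].given_back = false;
		list_add_tail(&reqs[i].list, &started_list);
	}
	cleanup_completed(restore);
	for (pos = started_list.next; pos != &started_list; pos = pos->next)
		tracked++;
	return tracked;
}
```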
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index a29a4ca..3ce6ed9 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
 {
 	struct dwc3_request *req;
 	struct dwc3_request *tmp;
+	struct list_head local;
 	struct dwc3 *dwc = dep->dwc;
 
-	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
+restart:
+	list_replace_init(&dep->cancelled_list, &local);
+
+	list_for_each_entry_safe(req, tmp, &local, list) {
 		dwc3_gadget_ep_skip_trbs(dep, req);
 		switch (req->status) {
 		case DWC3_REQUEST_STATUS_DISCONNECTED:
@@ -1946,6 +1950,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
 			break;
 		}
 	}
+
+	if (!list_empty(&dep->cancelled_list))
+		goto restart;
 }
 
 static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
@@ -3190,8 +3197,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 {
 	struct dwc3_request *req;
 	struct dwc3_request *tmp;
+	struct list_head local;
 
-	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+restart:
+	list_replace_init(&dep->started_list, &local);
+
+	list_for_each_entry_safe(req, tmp, &local, list) {
 		int ret;
 
 		ret = dwc3_gadget_ep_cleanup_completed_request(dep, event,
@@ -3199,6 +3210,9 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 		if (ret)
 			break;
 	}
+
+	if (!list_empty(&dep->started_list))
+		goto restart;
 }
 
 static bool dwc3_gadget_ep_should_continue(struct dwc3_ep *dep)
The list_for_each_entry_safe() macro saves the current item (n) and
the item after (n+1), so that n can be safely removed without
corrupting the list. However, when traversing the list and removing
items using gadget giveback, the DWC3 lock is briefly released,
allowing other routines to execute. There is a situation where, while
items are being removed from the cancelled_list using
dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
routine is running in parallel (due to UDC unbind). As the cleanup
routine removes n, and the pullup disable removes n+1, once the
cleanup retakes the DWC3 lock, it references a request that was
already removed/handled. With list debug enabled, this leads to a
panic. Ensure all instances of the macro are replaced where gadget
giveback is used.

Example call stack:

Thread#1:
__dwc3_gadget_ep_set_halt() - CLEAR HALT
  -> dwc3_gadget_ep_cleanup_cancelled_requests()
    ->list_for_each_entry_safe()
    ->dwc3_gadget_giveback(n)
      ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
      ->spin_unlock
      ->Thread#2 executes
      ...
    ->dwc3_gadget_giveback(n+1)
      ->Already removed!

Thread#2:
dwc3_gadget_pullup()
  ->waiting for dwc3 spin_lock
  ...
  ->Thread#1 released lock
  ->dwc3_stop_active_transfers()
    ->dwc3_remove_requests()
      ->fetches n+1 item from cancelled_list (n removed by Thread#1)
      ->dwc3_gadget_giveback()
        ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
        ->spin_unlock

Fix this condition by utilizing list_replace_init(), and traversing
through a local copy of the current elements in the endpoint lists.
This will also set the parent list as empty, so if another thread is
also looping through the list, it will be empty on the next iteration.
Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>

---
Previous patchset:
https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
---
 drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
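The core idea of the patch can be sketched in userspace. This is a minimal model under stated assumptions, not the driver: the list helpers are stand-ins modeled on `<linux/list.h>`, and `giveback()`, `other_thread()` (standing in for the pullup-disable path) and `struct req` are hypothetical. Because `list_replace_init()` moves every entry onto the private `local` head and leaves `cancelled_list` empty, the simulated second thread finds nothing to give back while the lock is dropped, so each request is given back exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins modeled on <linux/list.h>; not the kernel code. */
struct list_head { struct list_head *next, *prev; };
#define container_of(p, type, member) \
	((type *)((char *)(p) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Move every entry from 'old' onto 'dst' and leave 'old' empty. */
static void list_replace_init(struct list_head *old, struct list_head *dst)
{
	if (list_empty(old)) {
		INIT_LIST_HEAD(dst);
		return;
	}
	*dst = *old;
	dst->next->prev = dst;
	dst->prev->next = dst;
	INIT_LIST_HEAD(old);
}

struct req { int given_back; struct list_head list; };	/* hypothetical */

static struct list_head cancelled_list;	/* shared, "protected by the lock" */
static int other_thread_gave_back;

/* Hypothetical giveback: unlink and complete the request, then run
 * 'while_unlocked' to model another thread taking the dropped lock. */
static void giveback(struct req *r, void (*while_unlocked)(void))
{
	assert(!list_empty(&r->list));	/* must still be linked */
	list_del_init(&r->list);
	r->given_back++;
	if (while_unlocked)
		while_unlocked();
}

/* Models Thread#2 (pullup disable): give back whatever the shared list holds. */
static void other_thread(void)
{
	while (!list_empty(&cancelled_list)) {
		giveback(container_of(cancelled_list.next, struct req, list), NULL);
		other_thread_gave_back++;
	}
}

/* Models Thread#1 with the patch applied: walk a private snapshot. */
static void cleanup_cancelled(void)
{
	struct list_head local, *pos, *n;

	list_replace_init(&cancelled_list, &local);
	/* Same iteration as list_for_each_entry_safe(): 'n' is saved first. */
	for (pos = local.next, n = pos->next; pos != &local;
	     pos = n, n = pos->next)
		giveback(container_of(pos, struct req, list), other_thread);
}

/* Returns true if each of the 'count' requests was given back exactly once. */
static bool demo(struct req *reqs, int count)
{
	INIT_LIST_HEAD(&cancelled_list);
	other_thread_gave_back = 0;
	for (int i = 0; i < count; i++) {
		reqs[i].given_back = 0;
		list_add_tail(&reqs[i].list, &cancelled_list);
	}
	cleanup_cancelled();
	for (int i = 0; i < count; i++)
		if (reqs[i].given_back != 1)
			return false;
	return true;
}
```

Calling `demo()` on three requests returns true and leaves `other_thread_gave_back` at 0: the second thread only ever sees the emptied shared list.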