Message ID | 1620091264-418-1-git-send-email-wcheng@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] usb: dwc3: gadget: Avoid canceling current request for queuing error | expand |
Hi, Wesley Cheng wrote: > If an error is received when issuing a start or update transfer > command, the error handler will stop all active requests (including > the current USB request), and call dwc3_gadget_giveback() to notify > function drivers of the requests which have been stopped. Avoid > having to cancel the current request which is trying to be queued, as > the function driver will handle the EP queue error accordingly. > Simply unmap the request as it was done before, and allow previously > started transfers to be cleaned up. > It looks like you're still letting dwc3 stopping and cancelling all the active requests instead letting the function driver doing the dequeue. BTW, what kinds of command and error do you see in your setup and for what type endpoint? I'm thinking of letting the function driver to dequeue the requests instead of letting dwc3 automatically ending/cancelling the queued requests. However, it's a bit tricky to do that if the error is -ETIMEDOUT since we're not sure if the controller had already cached the TRBs. This seems to add more complexity and I don't have a good solution to it. Since you're already cancelling all the active request anyway, what do you think of always letting dwc3_gadget_ep_queue() to go through with success, but report failure through request completion? Hi Felipe, can you also chime in? Thanks, Thinh > Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> > --- > Changes in v2: > - Addressed feedback received by making sure the logic only applies to a req > which is being queued, decrementing the enqueue pointer, and only clearing > the HWO bit. > > drivers/usb/dwc3/gadget.c | 75 +++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 66 insertions(+), 9 deletions(-) > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index dd80e5c..c8ddbe1 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -140,6 +140,29 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state) > } > > /** > + * dwc3_ep_dec_trb - decrement a trb index. > + * @index: Pointer to the TRB index to increment. > + * > + * The index should never point to the link TRB. After decrementing, > + * if index is zero, wrap around to the TRB before the link TRB. > + */ > +static void dwc3_ep_dec_trb(u8 *index) > +{ > + (*index)--; > + if (*index < 0) > + *index = DWC3_TRB_NUM - 1; > +} > + > +/** > + * dwc3_ep_dec_enq - decrement endpoint's enqueue pointer > + * @dep: The endpoint whose enqueue pointer we're decrementing > + */ > +static void dwc3_ep_dec_enq(struct dwc3_ep *dep) > +{ > + dwc3_ep_dec_trb(&dep->trb_enqueue); > +} > + > +/** > * dwc3_ep_inc_trb - increment a trb index. > * @index: Pointer to the TRB index to increment. > * > @@ -1352,7 +1375,26 @@ static int dwc3_prepare_trbs(struct dwc3_ep *dep) > > static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep); > > -static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) > +static void dwc3_gadget_ep_revert_trbs(struct dwc3_ep *dep, struct dwc3_request *req) > +{ > + int i; > + > + if (!req->trb) > + return; > + > + for (i = 0; i < req->num_trbs; i++) { > + struct dwc3_trb *trb; > + > + trb = &dep->trb_pool[dep->trb_enqueue]; > + trb->ctrl &= ~DWC3_TRB_CTRL_HWO; > + dwc3_ep_dec_enq(dep); > + } > + > + req->num_trbs = 0; > +} > + > +static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep, > + struct dwc3_request *queued_req) > { > struct dwc3_gadget_ep_cmd_params params; > struct dwc3_request *req; > @@ -1410,8 +1452,23 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) > > dwc3_stop_active_transfer(dep, true, true); > > - list_for_each_entry_safe(req, tmp, &dep->started_list, list) > - dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED); > + /* > + * In order to ensure the logic is applied to a request being > + * queued by dwc3_gadget_ep_queue(), it needs to explicitly > + * check that req is the same as queued_req (request being > + * queued). If so, then just unmap and decrement the enqueue > + * pointer, as the usb_ep_queue() error handler in the function > + * driver will handle cleaning up the USB request. > + */ > + list_for_each_entry_safe(req, tmp, &dep->started_list, list) { > + if (req == queued_req) { > + dwc3_gadget_ep_revert_trbs(dep, req); > + dwc3_gadget_del_and_unmap_request(dep, req, ret); > + } else { > + dwc3_gadget_move_cancelled_request(req, > + DWC3_REQUEST_STATUS_DEQUEUED); > + } > + } > > /* If ep isn't started, then there's no end transfer pending */ > if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING)) > @@ -1546,7 +1603,7 @@ static int dwc3_gadget_start_isoc_quirk(struct dwc3_ep *dep) > dep->start_cmd_status = 0; > dep->combo_num = 0; > > - return __dwc3_gadget_kick_transfer(dep); > + return __dwc3_gadget_kick_transfer(dep, NULL); > } > > static int __dwc3_gadget_start_isoc(struct dwc3_ep *dep) > @@ -1593,7 +1650,7 @@ static int __dwc3_gadget_start_isoc(struct dwc3_ep *dep) > for (i = 0; i < DWC3_ISOC_MAX_RETRIES; i++) { > dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 1); > > - ret = __dwc3_gadget_kick_transfer(dep); > + ret = __dwc3_gadget_kick_transfer(dep, NULL); > if (ret != -EAGAIN) > break; > } > @@ -1684,7 +1741,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req) > } > } > > - return __dwc3_gadget_kick_transfer(dep); > + return __dwc3_gadget_kick_transfer(dep, req); > } > > static int dwc3_gadget_ep_queue(struct usb_ep *ep, struct usb_request *request, > @@ -1893,7 +1950,7 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol) > > if ((dep->flags & DWC3_EP_DELAY_START) && > !usb_endpoint_xfer_isoc(dep->endpoint.desc)) > - __dwc3_gadget_kick_transfer(dep); > + __dwc3_gadget_kick_transfer(dep, NULL); > > dep->flags &= ~DWC3_EP_DELAY_START; > } > @@ -2992,7 +3049,7 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep, > (list_empty(&dep->pending_list) || status == -EXDEV)) > dwc3_stop_active_transfer(dep, true, true); > else if (dwc3_gadget_ep_should_continue(dep)) > - if (__dwc3_gadget_kick_transfer(dep) == 0) > + if (__dwc3_gadget_kick_transfer(dep, NULL) == 0) > no_started_trb = false; > > out: > @@ -3106,7 +3163,7 @@ static void dwc3_gadget_endpoint_command_complete(struct dwc3_ep *dep, > > if ((dep->flags & DWC3_EP_DELAY_START) && > !usb_endpoint_xfer_isoc(dep->endpoint.desc)) > - __dwc3_gadget_kick_transfer(dep); > + __dwc3_gadget_kick_transfer(dep, NULL); > > dep->flags &= ~DWC3_EP_DELAY_START; > } >
On 5/3/2021 7:20 PM, Thinh Nguyen wrote: > Hi, > > Wesley Cheng wrote: >> If an error is received when issuing a start or update transfer >> command, the error handler will stop all active requests (including >> the current USB request), and call dwc3_gadget_giveback() to notify >> function drivers of the requests which have been stopped. Avoid >> having to cancel the current request which is trying to be queued, as >> the function driver will handle the EP queue error accordingly. >> Simply unmap the request as it was done before, and allow previously >> started transfers to be cleaned up. >> Hi Thinh, > > It looks like you're still letting dwc3 stopping and cancelling all the > active requests instead letting the function driver doing the dequeue. > Yeah, main issue isn't due to the function driver doing dequeue, but having cleanup (ie USB request free) if there is an error during usb_ep_queue(). The function driver in question at the moment is the f_fs driver in AIO mode. When async IO is enabled in the FFS driver, every time it queues a packet, it will allocate a io_data struct beforehand. If the usb_ep_queue() fails it will free this io_data memory. Problem is that, since the DWC3 gadget calls the completion with -ECONNRESET, the FFS driver will also schedule a work item (within io_data struct) to handle the completion. So you end up with a flow like below allocate io_data (ffs) --> usb_ep_queue() --> __dwc3_gadget_kick_transfer() --> dwc3_send_gadget_ep_cmd(EINVAL) --> dwc3_gadget_ep_cleanup_cancelled_requests() --> dwc3_gadget_giveback(ECONNRESET) ffs completion callback queue work item within io_data --> usb_ep_queue returns EINVAL ffs frees io_data ... work scheduled --> NULL pointer/memory fault as io_data is freed > BTW, what kinds of command and error do you see in your setup and for > what type endpoint? I'm thinking of letting the function driver to > dequeue the requests instead of letting dwc3 automatically > ending/cancelling the queued requests. However, it's a bit tricky to do > that if the error is -ETIMEDOUT since we're not sure if the controller > had already cached the TRBs. > Happens on bulk EPs so far, but I think it wouldn't matter as long as its over the FFS interface. (and using async IO transfers) > This seems to add more complexity and I don't have a good solution to > it. Since you're already cancelling all the active request anyway, what > do you think of always letting dwc3_gadget_ep_queue() to go through with > success, but report failure through request completion? > We do have something similar as well downstream (returning success always on dwc3_gadget_ep_queue()) and its been working for us also. Problem is we don't test the ISOC path much, so this is the only type of EP that might come into question... Coming up with a way to address the concerns you brought up was a bit difficult as there were scenarios we needed to consider. next_request() doesn't always have to be the request being queued (even if ep queue triggered it). There was no easy way to determine if kick transfer was due to ep queue, but even if there was, we'd need to remember the previous point as well. Thanks Wesley Cheng > Hi Felipe, can you also chime in? > > Thanks, > Thinh > > >> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> >> --- >> Changes in v2: >> - Addressed feedback received by making sure the logic only applies to a req >> which is being queued, decrementing the enqueue pointer, and only clearing >> the HWO bit. >> >> drivers/usb/dwc3/gadget.c | 75 +++++++++++++++++++++++++++++++++++++++++------ >> 1 file changed, 66 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >> index dd80e5c..c8ddbe1 100644 >> --- a/drivers/usb/dwc3/gadget.c >> +++ b/drivers/usb/dwc3/gadget.c >> @@ -140,6 +140,29 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state) >> } >> >> /** >> + * dwc3_ep_dec_trb - decrement a trb index. >> + * @index: Pointer to the TRB index to increment. >> + * >> + * The index should never point to the link TRB. After decrementing, >> + * if index is zero, wrap around to the TRB before the link TRB. >> + */ >> +static void dwc3_ep_dec_trb(u8 *index) >> +{ >> + (*index)--; >> + if (*index < 0) >> + *index = DWC3_TRB_NUM - 1; >> +} >> + >> +/** >> + * dwc3_ep_dec_enq - decrement endpoint's enqueue pointer >> + * @dep: The endpoint whose enqueue pointer we're decrementing >> + */ >> +static void dwc3_ep_dec_enq(struct dwc3_ep *dep) >> +{ >> + dwc3_ep_dec_trb(&dep->trb_enqueue); >> +} >> + >> +/** >> * dwc3_ep_inc_trb - increment a trb index. >> * @index: Pointer to the TRB index to increment. >> * >> @@ -1352,7 +1375,26 @@ static int dwc3_prepare_trbs(struct dwc3_ep *dep) >> >> static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep); >> >> -static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) >> +static void dwc3_gadget_ep_revert_trbs(struct dwc3_ep *dep, struct dwc3_request *req) >> +{ >> + int i; >> + >> + if (!req->trb) >> + return; >> + >> + for (i = 0; i < req->num_trbs; i++) { >> + struct dwc3_trb *trb; >> + >> + trb = &dep->trb_pool[dep->trb_enqueue]; >> + trb->ctrl &= ~DWC3_TRB_CTRL_HWO; >> + dwc3_ep_dec_enq(dep); >> + } >> + >> + req->num_trbs = 0; >> +} >> + >> +static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep, >> + struct dwc3_request *queued_req) >> { >> struct dwc3_gadget_ep_cmd_params params; >> struct dwc3_request *req; >> @@ -1410,8 +1452,23 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) >> >> dwc3_stop_active_transfer(dep, true, true); >> >> - list_for_each_entry_safe(req, tmp, &dep->started_list, list) >> - dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED); >> + /* >> + * In order to ensure the logic is applied to a request being >> + * queued by dwc3_gadget_ep_queue(), it needs to explicitly >> + * check that req is the same as queued_req (request being >> + * queued). If so, then just unmap and decrement the enqueue >> + * pointer, as the usb_ep_queue() error handler in the function >> + * driver will handle cleaning up the USB request. >> + */ >> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) { >> + if (req == queued_req) { >> + dwc3_gadget_ep_revert_trbs(dep, req); >> + dwc3_gadget_del_and_unmap_request(dep, req, ret); >> + } else { >> + dwc3_gadget_move_cancelled_request(req, >> + DWC3_REQUEST_STATUS_DEQUEUED); >> + } >> + } >> >> /* If ep isn't started, then there's no end transfer pending */ >> if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING)) >> @@ -1546,7 +1603,7 @@ static int dwc3_gadget_start_isoc_quirk(struct dwc3_ep *dep) >> dep->start_cmd_status = 0; >> dep->combo_num = 0; >> >> - return __dwc3_gadget_kick_transfer(dep); >> + return __dwc3_gadget_kick_transfer(dep, NULL); >> } >> >> static int __dwc3_gadget_start_isoc(struct dwc3_ep *dep) >> @@ -1593,7 +1650,7 @@ static int __dwc3_gadget_start_isoc(struct dwc3_ep *dep) >> for (i = 0; i < DWC3_ISOC_MAX_RETRIES; i++) { >> dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 1); >> >> - ret = __dwc3_gadget_kick_transfer(dep); >> + ret = __dwc3_gadget_kick_transfer(dep, NULL); >> if (ret != -EAGAIN) >> break; >> } >> @@ -1684,7 +1741,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req) >> } >> } >> >> - return __dwc3_gadget_kick_transfer(dep); >> + return __dwc3_gadget_kick_transfer(dep, req); >> } >> >> static int dwc3_gadget_ep_queue(struct usb_ep *ep, struct usb_request *request, >> @@ -1893,7 +1950,7 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol) >> >> if ((dep->flags & DWC3_EP_DELAY_START) && >> !usb_endpoint_xfer_isoc(dep->endpoint.desc)) >> - __dwc3_gadget_kick_transfer(dep); >> + __dwc3_gadget_kick_transfer(dep, NULL); >> >> dep->flags &= ~DWC3_EP_DELAY_START; >> } >> @@ -2992,7 +3049,7 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep, >> (list_empty(&dep->pending_list) || status == -EXDEV)) >> dwc3_stop_active_transfer(dep, true, true); >> else if (dwc3_gadget_ep_should_continue(dep)) >> - if (__dwc3_gadget_kick_transfer(dep) == 0) >> + if (__dwc3_gadget_kick_transfer(dep, NULL) == 0) >> no_started_trb = false; >> >> out: >> @@ -3106,7 +3163,7 @@ static void dwc3_gadget_endpoint_command_complete(struct dwc3_ep *dep, >> >> if ((dep->flags & DWC3_EP_DELAY_START) && >> !usb_endpoint_xfer_isoc(dep->endpoint.desc)) >> - __dwc3_gadget_kick_transfer(dep); >> + __dwc3_gadget_kick_transfer(dep, NULL); >> >> dep->flags &= ~DWC3_EP_DELAY_START; >> } >> >
Hi Wesley, Wesley Cheng wrote: > > > On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >> Hi, >> >> Wesley Cheng wrote: >>> If an error is received when issuing a start or update transfer >>> command, the error handler will stop all active requests (including >>> the current USB request), and call dwc3_gadget_giveback() to notify >>> function drivers of the requests which have been stopped. Avoid >>> having to cancel the current request which is trying to be queued, as >>> the function driver will handle the EP queue error accordingly. >>> Simply unmap the request as it was done before, and allow previously >>> started transfers to be cleaned up. >>> > > Hi Thinh, > >> >> It looks like you're still letting dwc3 stopping and cancelling all the >> active requests instead letting the function driver doing the dequeue. >> > > Yeah, main issue isn't due to the function driver doing dequeue, but > having cleanup (ie USB request free) if there is an error during > usb_ep_queue(). > > The function driver in question at the moment is the f_fs driver in AIO > mode. When async IO is enabled in the FFS driver, every time it queues > a packet, it will allocate a io_data struct beforehand. If the > usb_ep_queue() fails it will free this io_data memory. Problem is that, > since the DWC3 gadget calls the completion with -ECONNRESET, the FFS > driver will also schedule a work item (within io_data struct) to handle > the completion. So you end up with a flow like below > > allocate io_data (ffs) > --> usb_ep_queue() > --> __dwc3_gadget_kick_transfer() > --> dwc3_send_gadget_ep_cmd(EINVAL) > --> dwc3_gadget_ep_cleanup_cancelled_requests() > --> dwc3_gadget_giveback(ECONNRESET) > ffs completion callback > queue work item within io_data > --> usb_ep_queue returns EINVAL > ffs frees io_data > ... > > work scheduled > --> NULL pointer/memory fault as io_data is freed sounds like a race issue. > >> BTW, what kinds of command and error do you see in your setup and for >> what type endpoint? I'm thinking of letting the function driver to >> dequeue the requests instead of letting dwc3 automatically >> ending/cancelling the queued requests. However, it's a bit tricky to do >> that if the error is -ETIMEDOUT since we're not sure if the controller >> had already cached the TRBs. >> > > Happens on bulk EPs so far, but I think it wouldn't matter as long as > its over the FFS interface. (and using async IO transfers) Do you know which command and error code? It's strange if UPDATE_TRANSFER command failed. > >> This seems to add more complexity and I don't have a good solution to >> it. Since you're already cancelling all the active request anyway, what >> do you think of always letting dwc3_gadget_ep_queue() to go through with >> success, but report failure through request completion? >> > > We do have something similar as well downstream (returning success > always on dwc3_gadget_ep_queue()) and its been working for us also. > Problem is we don't test the ISOC path much, so this is the only type of > EP that might come into question... > It should be similiar with isoc. I can't think of a potential issue yet. > Coming up with a way to address the concerns you brought up was a bit > difficult as there were scenarios we needed to consider. next_request() > doesn't always have to be the request being queued (even if ep queue > triggered it). There was no easy way to determine if kick transfer was > due to ep queue, but even if there was, we'd need to remember the > previous point as well. > Yeah, there are a few pitfalls. I don't have a good solution to it if we want to return failure immediately and let the function driver handle the dequeue (if it wants to). Thanks, Thinh
On 5/3/2021 8:12 PM, Thinh Nguyen wrote: > Hi Wesley, > > Wesley Cheng wrote: >> >> >> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>> Hi, >>> >>> Wesley Cheng wrote: >>>> If an error is received when issuing a start or update transfer >>>> command, the error handler will stop all active requests (including >>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>> function drivers of the requests which have been stopped. Avoid >>>> having to cancel the current request which is trying to be queued, as >>>> the function driver will handle the EP queue error accordingly. >>>> Simply unmap the request as it was done before, and allow previously >>>> started transfers to be cleaned up. >>>> >> >> Hi Thinh, >> >>> >>> It looks like you're still letting dwc3 stopping and cancelling all the >>> active requests instead letting the function driver doing the dequeue. >>> >> >> Yeah, main issue isn't due to the function driver doing dequeue, but >> having cleanup (ie USB request free) if there is an error during >> usb_ep_queue(). >> >> The function driver in question at the moment is the f_fs driver in AIO >> mode. When async IO is enabled in the FFS driver, every time it queues >> a packet, it will allocate a io_data struct beforehand. If the >> usb_ep_queue() fails it will free this io_data memory. Problem is that, >> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >> driver will also schedule a work item (within io_data struct) to handle >> the completion. So you end up with a flow like below >> >> allocate io_data (ffs) >> --> usb_ep_queue() >> --> __dwc3_gadget_kick_transfer() >> --> dwc3_send_gadget_ep_cmd(EINVAL) >> --> dwc3_gadget_ep_cleanup_cancelled_requests() >> --> dwc3_gadget_giveback(ECONNRESET) >> ffs completion callback >> queue work item within io_data >> --> usb_ep_queue returns EINVAL >> ffs frees io_data >> ... >> >> work scheduled >> --> NULL pointer/memory fault as io_data is freed Hi Thinh, > > sounds like a race issue. > It'll always happen if usb_ep_queue() fails with an error. Sorry for not clarifying, but the "..." represents executing in a different context :). Anything above the "..." is in the same context. >> >>> BTW, what kinds of command and error do you see in your setup and for >>> what type endpoint? I'm thinking of letting the function driver to >>> dequeue the requests instead of letting dwc3 automatically >>> ending/cancelling the queued requests. However, it's a bit tricky to do >>> that if the error is -ETIMEDOUT since we're not sure if the controller >>> had already cached the TRBs. >>> >> >> Happens on bulk EPs so far, but I think it wouldn't matter as long as >> its over the FFS interface. (and using async IO transfers) > > Do you know which command and error code? It's strange if > UPDATE_TRANSFER command failed. > Sorry for missing that part of the question. It is a no xfer resource error on a start transfer command. So far this happens on low system memory test cases, so there may be some sequences that were missed, which led to this particular command error. Thanks Wesley Cheng >> >>> This seems to add more complexity and I don't have a good solution to >>> it. Since you're already cancelling all the active request anyway, what >>> do you think of always letting dwc3_gadget_ep_queue() to go through with >>> success, but report failure through request completion? >>> >> >> We do have something similar as well downstream (returning success >> always on dwc3_gadget_ep_queue()) and its been working for us also. >> Problem is we don't test the ISOC path much, so this is the only type of >> EP that might come into question... >> > > It should be similiar with isoc. I can't think of a potential issue yet. > >> Coming up with a way to address the concerns you brought up was a bit >> difficult as there were scenarios we needed to consider. next_request() >> doesn't always have to be the request being queued (even if ep queue >> triggered it). There was no easy way to determine if kick transfer was >> due to ep queue, but even if there was, we'd need to remember the >> previous point as well. >> > > Yeah, there are a few pitfalls. I don't have a good solution to it if we > want to return failure immediately and let the function driver handle > the dequeue (if it wants to). > > Thanks, > Thinh >
Wesley Cheng wrote: > > > On 5/3/2021 8:12 PM, Thinh Nguyen wrote: >> Hi Wesley, >> >> Wesley Cheng wrote: >>> >>> >>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>> Hi, >>>> >>>> Wesley Cheng wrote: >>>>> If an error is received when issuing a start or update transfer >>>>> command, the error handler will stop all active requests (including >>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>> function drivers of the requests which have been stopped. Avoid >>>>> having to cancel the current request which is trying to be queued, as >>>>> the function driver will handle the EP queue error accordingly. >>>>> Simply unmap the request as it was done before, and allow previously >>>>> started transfers to be cleaned up. >>>>> >>> >>> Hi Thinh, >>> >>>> >>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>> active requests instead letting the function driver doing the dequeue. >>>> >>> >>> Yeah, main issue isn't due to the function driver doing dequeue, but >>> having cleanup (ie USB request free) if there is an error during >>> usb_ep_queue(). >>> >>> The function driver in question at the moment is the f_fs driver in AIO >>> mode. When async IO is enabled in the FFS driver, every time it queues >>> a packet, it will allocate a io_data struct beforehand. If the >>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>> driver will also schedule a work item (within io_data struct) to handle >>> the completion. So you end up with a flow like below >>> >>> allocate io_data (ffs) >>> --> usb_ep_queue() >>> --> __dwc3_gadget_kick_transfer() >>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>> --> dwc3_gadget_giveback(ECONNRESET) >>> ffs completion callback >>> queue work item within io_data >>> --> usb_ep_queue returns EINVAL >>> ffs frees io_data >>> ... >>> >>> work scheduled >>> --> NULL pointer/memory fault as io_data is freed > > Hi Thinh, > >> >> sounds like a race issue. >> > > It'll always happen if usb_ep_queue() fails with an error. Sorry for not > clarifying, but the "..." represents executing in a different context > :). Anything above the "..." is in the same context. >>> >>>> BTW, what kinds of command and error do you see in your setup and for >>>> what type endpoint? I'm thinking of letting the function driver to >>>> dequeue the requests instead of letting dwc3 automatically >>>> ending/cancelling the queued requests. However, it's a bit tricky to do >>>> that if the error is -ETIMEDOUT since we're not sure if the controller >>>> had already cached the TRBs. >>>> >>> >>> Happens on bulk EPs so far, but I think it wouldn't matter as long as >>> its over the FFS interface. (and using async IO transfers) >> >> Do you know which command and error code? It's strange if >> UPDATE_TRANSFER command failed. >> > > Sorry for missing that part of the question. It is a no xfer resource > error on a start transfer command. So far this happens on low system > memory test cases, so there may be some sequences that were missed, > which led to this particular command error. > > Thanks > Wesley Cheng No xfer resource usually means that the driver attempted to send START_TRANSFER without waiting for END_TRANSFER command to complete. This may be a dwc3 driver issue. Did you check this? Thanks, Thinh > >>> >>>> This seems to add more complexity and I don't have a good solution to >>>> it. Since you're already cancelling all the active request anyway, what >>>> do you think of always letting dwc3_gadget_ep_queue() to go through with >>>> success, but report failure through request completion? >>>> >>> >>> We do have something similar as well downstream (returning success >>> always on dwc3_gadget_ep_queue()) and its been working for us also. >>> Problem is we don't test the ISOC path much, so this is the only type of >>> EP that might come into question... >>> >> >> It should be similiar with isoc. I can't think of a potential issue yet. >> >>> Coming up with a way to address the concerns you brought up was a bit >>> difficult as there were scenarios we needed to consider. next_request() >>> doesn't always have to be the request being queued (even if ep queue >>> triggered it). There was no easy way to determine if kick transfer was >>> due to ep queue, but even if there was, we'd need to remember the >>> previous point as well. >>> >> >> Yeah, there are a few pitfalls. I don't have a good solution to it if we >> want to return failure immediately and let the function driver handle >> the dequeue (if it wants to). >> >> Thanks, >> Thinh >> >
On 5/3/2021 10:22 PM, Thinh Nguyen wrote: > Wesley Cheng wrote: >> >> >> On 5/3/2021 8:12 PM, Thinh Nguyen wrote: >>> Hi Wesley, >>> >>> Wesley Cheng wrote: >>>> >>>> >>>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>>> Hi, >>>>> >>>>> Wesley Cheng wrote: >>>>>> If an error is received when issuing a start or update transfer >>>>>> command, the error handler will stop all active requests (including >>>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>>> function drivers of the requests which have been stopped. Avoid >>>>>> having to cancel the current request which is trying to be queued, as >>>>>> the function driver will handle the EP queue error accordingly. >>>>>> Simply unmap the request as it was done before, and allow previously >>>>>> started transfers to be cleaned up. >>>>>> >>>> >>>> Hi Thinh, >>>> >>>>> >>>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>>> active requests instead letting the function driver doing the dequeue. >>>>> >>>> >>>> Yeah, main issue isn't due to the function driver doing dequeue, but >>>> having cleanup (ie USB request free) if there is an error during >>>> usb_ep_queue(). >>>> >>>> The function driver in question at the moment is the f_fs driver in AIO >>>> mode. When async IO is enabled in the FFS driver, every time it queues >>>> a packet, it will allocate a io_data struct beforehand. If the >>>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>>> driver will also schedule a work item (within io_data struct) to handle >>>> the completion. So you end up with a flow like below >>>> >>>> allocate io_data (ffs) >>>> --> usb_ep_queue() >>>> --> __dwc3_gadget_kick_transfer() >>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>> --> dwc3_gadget_giveback(ECONNRESET) >>>> ffs completion callback >>>> queue work item within io_data >>>> --> usb_ep_queue returns EINVAL >>>> ffs frees io_data >>>> ... >>>> >>>> work scheduled >>>> --> NULL pointer/memory fault as io_data is freed >> >> Hi Thinh, >> >>> >>> sounds like a race issue. >>> >> >> It'll always happen if usb_ep_queue() fails with an error. Sorry for not >> clarifying, but the "..." represents executing in a different context >> :). Anything above the "..." is in the same context. >>>> >>>>> BTW, what kinds of command and error do you see in your setup and for >>>>> what type endpoint? I'm thinking of letting the function driver to >>>>> dequeue the requests instead of letting dwc3 automatically >>>>> ending/cancelling the queued requests. However, it's a bit tricky to do >>>>> that if the error is -ETIMEDOUT since we're not sure if the controller >>>>> had already cached the TRBs. >>>>> >>>> >>>> Happens on bulk EPs so far, but I think it wouldn't matter as long as >>>> its over the FFS interface. (and using async IO transfers) >>> >>> Do you know which command and error code? It's strange if >>> UPDATE_TRANSFER command failed. >>> >> >> Sorry for missing that part of the question. It is a no xfer resource >> error on a start transfer command. So far this happens on low system >> memory test cases, so there may be some sequences that were missed, >> which led to this particular command error. >> >> Thanks >> Wesley Cheng Hi Thinh, > > No xfer resource usually means that the driver attempted to send > START_TRANSFER without waiting for END_TRANSFER command to complete. > This may be a dwc3 driver issue. Did you check this? > > Thanks, > Thinh > > Yes, we know the reason why this happens, and its due to one of the downstream changes we had that led to the scenario above. Although, that has been fixed, I still believe the error path is a potential scenario we'd still want to address. I think the returning success always on dwc3_gadget_ep_queue(), and allowing the error in the completion handler/giveback at the function driver level to do the cleanup is a feasible solution. Doesn't change the flow of the DWC3 gadget, and so far all function drivers we've used handle this in the correct manner. Thanks Wesley Cheng >> >>>> >>>>> This seems to add more complexity and I don't have a good solution to >>>>> it. Since you're already cancelling all the active request anyway, what >>>>> do you think of always letting dwc3_gadget_ep_queue() to go through with >>>>> success, but report failure through request completion? >>>>> >>>> >>>> We do have something similar as well downstream (returning success >>>> always on dwc3_gadget_ep_queue()) and its been working for us also. >>>> Problem is we don't test the ISOC path much, so this is the only type of >>>> EP that might come into question... >>>> >>> >>> It should be similiar with isoc. I can't think of a potential issue yet. >>> >>>> Coming up with a way to address the concerns you brought up was a bit >>>> difficult as there were scenarios we needed to consider. next_request() >>>> doesn't always have to be the request being queued (even if ep queue >>>> triggered it). There was no easy way to determine if kick transfer was >>>> due to ep queue, but even if there was, we'd need to remember the >>>> previous point as well. >>>> >>> >>> Yeah, there are a few pitfalls. I don't have a good solution to it if we >>> want to return failure immediately and let the function driver handle >>> the dequeue (if it wants to). >>> >>> Thanks, >>> Thinh >>> >> >
Wesley Cheng wrote: > > > On 5/3/2021 10:22 PM, Thinh Nguyen wrote: >> Wesley Cheng wrote: >>> >>> >>> On 5/3/2021 8:12 PM, Thinh Nguyen wrote: >>>> Hi Wesley, >>>> >>>> Wesley Cheng wrote: >>>>> >>>>> >>>>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>>>> Hi, >>>>>> >>>>>> Wesley Cheng wrote: >>>>>>> If an error is received when issuing a start or update transfer >>>>>>> command, the error handler will stop all active requests (including >>>>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>>>> function drivers of the requests which have been stopped. Avoid >>>>>>> having to cancel the current request which is trying to be queued, as >>>>>>> the function driver will handle the EP queue error accordingly. >>>>>>> Simply unmap the request as it was done before, and allow previously >>>>>>> started transfers to be cleaned up. >>>>>>> >>>>> >>>>> Hi Thinh, >>>>> >>>>>> >>>>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>>>> active requests instead letting the function driver doing the dequeue. >>>>>> >>>>> >>>>> Yeah, main issue isn't due to the function driver doing dequeue, but >>>>> having cleanup (ie USB request free) if there is an error during >>>>> usb_ep_queue(). >>>>> >>>>> The function driver in question at the moment is the f_fs driver in AIO >>>>> mode. When async IO is enabled in the FFS driver, every time it queues >>>>> a packet, it will allocate a io_data struct beforehand. If the >>>>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>>>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>>>> driver will also schedule a work item (within io_data struct) to handle >>>>> the completion. So you end up with a flow like below >>>>> >>>>> allocate io_data (ffs) >>>>> --> usb_ep_queue() >>>>> --> __dwc3_gadget_kick_transfer() >>>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>> --> dwc3_gadget_giveback(ECONNRESET) >>>>> ffs completion callback >>>>> queue work item within io_data >>>>> --> usb_ep_queue returns EINVAL >>>>> ffs frees io_data >>>>> ... >>>>> >>>>> work scheduled >>>>> --> NULL pointer/memory fault as io_data is freed >>> >>> Hi Thinh, >>> >>>> >>>> sounds like a race issue. >>>> >>> >>> It'll always happen if usb_ep_queue() fails with an error. Sorry for not >>> clarifying, but the "..." represents executing in a different context >>> :). Anything above the "..." is in the same context. >>>>> >>>>>> BTW, what kinds of command and error do you see in your setup and for >>>>>> what type endpoint? I'm thinking of letting the function driver to >>>>>> dequeue the requests instead of letting dwc3 automatically >>>>>> ending/cancelling the queued requests. However, it's a bit tricky to do >>>>>> that if the error is -ETIMEDOUT since we're not sure if the controller >>>>>> had already cached the TRBs. >>>>>> >>>>> >>>>> Happens on bulk EPs so far, but I think it wouldn't matter as long as >>>>> its over the FFS interface. (and using async IO transfers) >>>> >>>> Do you know which command and error code? It's strange if >>>> UPDATE_TRANSFER command failed. >>>> >>> >>> Sorry for missing that part of the question. It is a no xfer resource >>> error on a start transfer command. So far this happens on low system >>> memory test cases, so there may be some sequences that were missed, >>> which led to this particular command error. >>> >>> Thanks >>> Wesley Cheng > > Hi Thinh, > >> >> No xfer resource usually means that the driver attempted to send >> START_TRANSFER without waiting for END_TRANSFER command to complete. >> This may be a dwc3 driver issue. Did you check this? >> >> Thanks, >> Thinh >> >> > > Yes, we know the reason why this happens, and its due to one of the > downstream changes we had that led to the scenario above. Although, > that has been fixed, I still believe the error path is a potential > scenario we'd still want to address. > > I think the returning success always on dwc3_gadget_ep_queue(), and > allowing the error in the completion handler/giveback at the function > driver level to do the cleanup is a feasible solution. Doesn't change > the flow of the DWC3 gadget, and so far all function drivers we've used > handle this in the correct manner. > > Thanks > Wesley Cheng Right. I think for now we should do that (return success always except for cases of disconnect or already in-flight etc). This helps keeping it simple and avoid some pitfalls dealing with giving back the request. Currently we return the error status to dwc3_gadget_ep_queue if we failed to send a command that may not even related to the same request being queued. This way, I think it matches with how we handle it in the driver. We always put the request in the pending list (queued) first and possibly start/update the controller with new data. Thanks, Thinh > >>> >>>>> >>>>>> This seems to add more complexity and I don't have a good solution to >>>>>> it. Since you're already cancelling all the active request anyway, what >>>>>> do you think of always letting dwc3_gadget_ep_queue() to go through with >>>>>> success, but report failure through request completion? >>>>>> >>>>> >>>>> We do have something similar as well downstream (returning success >>>>> always on dwc3_gadget_ep_queue()) and its been working for us also. >>>>> Problem is we don't test the ISOC path much, so this is the only type of >>>>> EP that might come into question... >>>>> >>>> >>>> It should be similiar with isoc. I can't think of a potential issue yet. >>>> >>>>> Coming up with a way to address the concerns you brought up was a bit >>>>> difficult as there were scenarios we needed to consider. next_request() >>>>> doesn't always have to be the request being queued (even if ep queue >>>>> triggered it). There was no easy way to determine if kick transfer was >>>>> due to ep queue, but even if there was, we'd need to remember the >>>>> previous point as well. >>>>> >>>> >>>> Yeah, there are a few pitfalls. I don't have a good solution to it if we >>>> want to return failure immediately and let the function driver handle >>>> the dequeue (if it wants to). >>>> >>>> Thanks, >>>> Thinh >>>> >>> >> >
On 5/4/2021 6:50 PM, Thinh Nguyen wrote: > Wesley Cheng wrote: >> >> >> On 5/3/2021 10:22 PM, Thinh Nguyen wrote: >>> Wesley Cheng wrote: >>>> >>>> >>>> On 5/3/2021 8:12 PM, Thinh Nguyen wrote: >>>>> Hi Wesley, >>>>> >>>>> Wesley Cheng wrote: >>>>>> >>>>>> >>>>>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Wesley Cheng wrote: >>>>>>>> If an error is received when issuing a start or update transfer >>>>>>>> command, the error handler will stop all active requests (including >>>>>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>>>>> function drivers of the requests which have been stopped. Avoid >>>>>>>> having to cancel the current request which is trying to be queued, as >>>>>>>> the function driver will handle the EP queue error accordingly. >>>>>>>> Simply unmap the request as it was done before, and allow previously >>>>>>>> started transfers to be cleaned up. >>>>>>>> >>>>>> >>>>>> Hi Thinh, >>>>>> >>>>>>> >>>>>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>>>>> active requests instead letting the function driver doing the dequeue. >>>>>>> >>>>>> >>>>>> Yeah, main issue isn't due to the function driver doing dequeue, but >>>>>> having cleanup (ie USB request free) if there is an error during >>>>>> usb_ep_queue(). >>>>>> >>>>>> The function driver in question at the moment is the f_fs driver in AIO >>>>>> mode. When async IO is enabled in the FFS driver, every time it queues >>>>>> a packet, it will allocate a io_data struct beforehand. If the >>>>>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>>>>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>>>>> driver will also schedule a work item (within io_data struct) to handle >>>>>> the completion. So you end up with a flow like below >>>>>> >>>>>> allocate io_data (ffs) >>>>>> --> usb_ep_queue() >>>>>> --> __dwc3_gadget_kick_transfer() >>>>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>>> --> dwc3_gadget_giveback(ECONNRESET) >>>>>> ffs completion callback >>>>>> queue work item within io_data >>>>>> --> usb_ep_queue returns EINVAL >>>>>> ffs frees io_data >>>>>> ... >>>>>> >>>>>> work scheduled >>>>>> --> NULL pointer/memory fault as io_data is freed >>>> >>>> Hi Thinh, >>>> >>>>> >>>>> sounds like a race issue. >>>>> >>>> >>>> It'll always happen if usb_ep_queue() fails with an error. Sorry for not >>>> clarifying, but the "..." represents executing in a different context >>>> :). Anything above the "..." is in the same context. >>>>>> >>>>>>> BTW, what kinds of command and error do you see in your setup and for >>>>>>> what type endpoint? I'm thinking of letting the function driver to >>>>>>> dequeue the requests instead of letting dwc3 automatically >>>>>>> ending/cancelling the queued requests. However, it's a bit tricky to do >>>>>>> that if the error is -ETIMEDOUT since we're not sure if the controller >>>>>>> had already cached the TRBs. >>>>>>> >>>>>> >>>>>> Happens on bulk EPs so far, but I think it wouldn't matter as long as >>>>>> its over the FFS interface. (and using async IO transfers) >>>>> >>>>> Do you know which command and error code? It's strange if >>>>> UPDATE_TRANSFER command failed. >>>>> >>>> >>>> Sorry for missing that part of the question. It is a no xfer resource >>>> error on a start transfer command. So far this happens on low system >>>> memory test cases, so there may be some sequences that were missed, >>>> which led to this particular command error. >>>> >>>> Thanks >>>> Wesley Cheng >> >> Hi Thinh, >> >>> >>> No xfer resource usually means that the driver attempted to send >>> START_TRANSFER without waiting for END_TRANSFER command to complete. >>> This may be a dwc3 driver issue. Did you check this? >>> >>> Thanks, >>> Thinh >>> >>> >> >> Yes, we know the reason why this happens, and its due to one of the >> downstream changes we had that led to the scenario above. Although, >> that has been fixed, I still believe the error path is a potential >> scenario we'd still want to address. >> >> I think the returning success always on dwc3_gadget_ep_queue(), and >> allowing the error in the completion handler/giveback at the function >> driver level to do the cleanup is a feasible solution. Doesn't change >> the flow of the DWC3 gadget, and so far all function drivers we've used >> handle this in the correct manner. >> >> Thanks >> Wesley Cheng > > Right. I think for now we should do that (return success always except > for cases of disconnect or already in-flight etc). This helps keeping it > simple and avoid some pitfalls dealing with giving back the request. > Currently we return the error status to dwc3_gadget_ep_queue if we > failed to send a command that may not even related to the same request > being queued. > > This way, I think it matches with how we handle it in the driver. We > always put the request in the pending list (queued) first and possibly > start/update the controller with new data. > > Thanks, > Thinh > > Hi Thinh, Agreed, thanks for the input and in depth discussion. Will spin a new revision with the suggestion above. Thanks Wesley Cheng >> >>>> >>>>>> >>>>>>> This seems to add more complexity and I don't have a good solution to >>>>>>> it. Since you're already cancelling all the active request anyway, what >>>>>>> do you think of always letting dwc3_gadget_ep_queue() to go through with >>>>>>> success, but report failure through request completion? >>>>>>> >>>>>> >>>>>> We do have something similar as well downstream (returning success >>>>>> always on dwc3_gadget_ep_queue()) and its been working for us also. >>>>>> Problem is we don't test the ISOC path much, so this is the only type of >>>>>> EP that might come into question... >>>>>> >>>>> >>>>> It should be similiar with isoc. I can't think of a potential issue yet. >>>>> >>>>>> Coming up with a way to address the concerns you brought up was a bit >>>>>> difficult as there were scenarios we needed to consider. next_request() >>>>>> doesn't always have to be the request being queued (even if ep queue >>>>>> triggered it). There was no easy way to determine if kick transfer was >>>>>> due to ep queue, but even if there was, we'd need to remember the >>>>>> previous point as well. >>>>>> >>>>> >>>>> Yeah, there are a few pitfalls. I don't have a good solution to it if we >>>>> want to return failure immediately and let the function driver handle >>>>> the dequeue (if it wants to). >>>>> >>>>> Thanks, >>>>> Thinh >>>>> >>>> >>> >> >
Wesley Cheng <wcheng@codeaurora.org> writes: > If an error is received when issuing a start or update transfer > command, the error handler will stop all active requests (including > the current USB request), and call dwc3_gadget_giveback() to notify > function drivers of the requests which have been stopped. Avoid > having to cancel the current request which is trying to be queued, as > the function driver will handle the EP queue error accordingly. > Simply unmap the request as it was done before, and allow previously > started transfers to be cleaned up. > > Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> > --- > Changes in v2: > - Addressed feedback received by making sure the logic only applies to a req > which is being queued, decrementing the enqueue pointer, and only clearing > the HWO bit. > > drivers/usb/dwc3/gadget.c | 75 +++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 66 insertions(+), 9 deletions(-) > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index dd80e5c..c8ddbe1 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -140,6 +140,29 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state) > } > > /** > + * dwc3_ep_dec_trb - decrement a trb index. > + * @index: Pointer to the TRB index to increment. > + * > + * The index should never point to the link TRB. After decrementing, > + * if index is zero, wrap around to the TRB before the link TRB. > + */ > +static void dwc3_ep_dec_trb(u8 *index) > +{ > + (*index)--; > + if (*index < 0) > + *index = DWC3_TRB_NUM - 1; > +} > + > +/** > + * dwc3_ep_dec_enq - decrement endpoint's enqueue pointer > + * @dep: The endpoint whose enqueue pointer we're decrementing > + */ > +static void dwc3_ep_dec_enq(struct dwc3_ep *dep) > +{ > + dwc3_ep_dec_trb(&dep->trb_enqueue); > +} > + > +/** > * dwc3_ep_inc_trb - increment a trb index. > * @index: Pointer to the TRB index to increment. > * it looks like moving these helpers isn't part of $subject and could be split to a patch of its own. > @@ -1352,7 +1375,26 @@ static int dwc3_prepare_trbs(struct dwc3_ep *dep) > > static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep); > > -static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) > +static void dwc3_gadget_ep_revert_trbs(struct dwc3_ep *dep, struct dwc3_request *req) > +{ > + int i; > + > + if (!req->trb) > + return; > + > + for (i = 0; i < req->num_trbs; i++) { > + struct dwc3_trb *trb; > + > + trb = &dep->trb_pool[dep->trb_enqueue]; wait, enqueue or dequeue? > @@ -1410,8 +1452,23 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) > > dwc3_stop_active_transfer(dep, true, true); > > - list_for_each_entry_safe(req, tmp, &dep->started_list, list) > - dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED); > + /* > + * In order to ensure the logic is applied to a request being > + * queued by dwc3_gadget_ep_queue(), it needs to explicitly > + * check that req is the same as queued_req (request being > + * queued). If so, then just unmap and decrement the enqueue > + * pointer, as the usb_ep_queue() error handler in the function > + * driver will handle cleaning up the USB request. > + */ > + list_for_each_entry_safe(req, tmp, &dep->started_list, list) { > + if (req == queued_req) { > + dwc3_gadget_ep_revert_trbs(dep, req); > + dwc3_gadget_del_and_unmap_request(dep, req, ret); > + } else { > + dwc3_gadget_move_cancelled_request(req, > + DWC3_REQUEST_STATUS_DEQUEUED); we don't line up the arguments like that in dwc3. > @@ -1546,7 +1603,7 @@ static int dwc3_gadget_start_isoc_quirk(struct dwc3_ep *dep) > dep->start_cmd_status = 0; > dep->combo_num = 0; > > - return __dwc3_gadget_kick_transfer(dep); > + return __dwc3_gadget_kick_transfer(dep, NULL); I would rather not have this extra special case for kick transfer, instead you can treat the special case in the only location where it can happen, i.e. ep_queue(). So, instead of patching kick_transfer itself, you can make sure that kick transfer does *NOT* touch the current request and only cancel all the previous, then ep_queue is responsible for treating failed kick transfer for the current request. Either that, or make sure dwc3_prepare_trbs() knows how to handle this special case internally. The current method seems wrong, IMO. It really seems like the problem is elsewhere. Perhaps there's some other part of the driver that's not cleaning up after itself.
Hi, Wesley Cheng <wcheng@codeaurora.org> writes: > On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >> Hi, >> >> Wesley Cheng wrote: >>> If an error is received when issuing a start or update transfer >>> command, the error handler will stop all active requests (including >>> the current USB request), and call dwc3_gadget_giveback() to notify >>> function drivers of the requests which have been stopped. Avoid >>> having to cancel the current request which is trying to be queued, as >>> the function driver will handle the EP queue error accordingly. >>> Simply unmap the request as it was done before, and allow previously >>> started transfers to be cleaned up. >>> > > Hi Thinh, > >> >> It looks like you're still letting dwc3 stopping and cancelling all the >> active requests instead letting the function driver doing the dequeue. >> > > Yeah, main issue isn't due to the function driver doing dequeue, but > having cleanup (ie USB request free) if there is an error during > usb_ep_queue(). > > The function driver in question at the moment is the f_fs driver in AIO > mode. When async IO is enabled in the FFS driver, every time it queues > a packet, it will allocate a io_data struct beforehand. If the > usb_ep_queue() fails it will free this io_data memory. Problem is that, > since the DWC3 gadget calls the completion with -ECONNRESET, the FFS > driver will also schedule a work item (within io_data struct) to handle > the completion. So you end up with a flow like below > > allocate io_data (ffs) > --> usb_ep_queue() > --> __dwc3_gadget_kick_transfer() > --> dwc3_send_gadget_ep_cmd(EINVAL) > --> dwc3_gadget_ep_cleanup_cancelled_requests() > --> dwc3_gadget_giveback(ECONNRESET) > ffs completion callback > queue work item within io_data > --> usb_ep_queue returns EINVAL > ffs frees io_data > ... > > work scheduled > --> NULL pointer/memory fault as io_data is freed I have some vague memory of discussing (something like) this with Alan Stern long ago and the conclusion was that the gadget driver should handle cases such as this. OTOH, we're returning failure during usb_ep_queue() which tells me there's something with dwc3 (perhaps not exclusively, but that's yet to be shown). If I understood the whole thing correctly, we want everything except the current request (the one that failed START or UPDATE transfer) to go through giveback(). This really tells me that we're not handling error case in kick_transfer and/or prepare_trbs() correctly. I also don't want to pass another argument to kick_transfer because it should be unnecessary: the current request should *always* be the last one in the list. Therefore we should rely on something like list_last_entry() followed by list_for_each_entry_safe_reverse() to handle this without a special case. ret = dwc3_send_gadget_ep_cmd(); if (ret < 0) { current = list_last_entry(); unmap(current); for_each_trb_in(current) { clear_HWO(trb); } list_for_entry_safe_reverse() { move_cancelled(); } }
Hi, Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes: >>>>>> allocate io_data (ffs) >>>>>> --> usb_ep_queue() >>>>>> --> __dwc3_gadget_kick_transfer() >>>>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>>> --> dwc3_gadget_giveback(ECONNRESET) >>>>>> ffs completion callback >>>>>> queue work item within io_data >>>>>> --> usb_ep_queue returns EINVAL >>>>>> ffs frees io_data >>>>>> ... >>>>>> >>>>>> work scheduled >>>>>> --> NULL pointer/memory fault as io_data is freed >>>> >>>> Hi Thinh, >>>> >>>>> >>>>> sounds like a race issue. >>>>> >>>> >>>> It'll always happen if usb_ep_queue() fails with an error. Sorry for not >>>> clarifying, but the "..." represents executing in a different context >>>> :). Anything above the "..." is in the same context. >>>>>> >>>>>>> BTW, what kinds of command and error do you see in your setup and for >>>>>>> what type endpoint? I'm thinking of letting the function driver to >>>>>>> dequeue the requests instead of letting dwc3 automatically >>>>>>> ending/cancelling the queued requests. However, it's a bit tricky to do >>>>>>> that if the error is -ETIMEDOUT since we're not sure if the controller >>>>>>> had already cached the TRBs. >>>>>>> >>>>>> >>>>>> Happens on bulk EPs so far, but I think it wouldn't matter as long as >>>>>> its over the FFS interface. (and using async IO transfers) >>>>> >>>>> Do you know which command and error code? It's strange if >>>>> UPDATE_TRANSFER command failed. >>>>> >>>> >>>> Sorry for missing that part of the question. It is a no xfer resource >>>> error on a start transfer command. So far this happens on low system >>>> memory test cases, so there may be some sequences that were missed, >>>> which led to this particular command error. >>>> >>>> Thanks >>>> Wesley Cheng >> >> Hi Thinh, >> >>> >>> No xfer resource usually means that the driver attempted to send >>> START_TRANSFER without waiting for END_TRANSFER command to complete. >>> This may be a dwc3 driver issue. Did you check this? >>> >>> Thanks, >>> Thinh >>> >>> >> >> Yes, we know the reason why this happens, and its due to one of the >> downstream changes we had that led to the scenario above. Although, >> that has been fixed, I still believe the error path is a potential >> scenario we'd still want to address. >> >> I think the returning success always on dwc3_gadget_ep_queue(), and >> allowing the error in the completion handler/giveback at the function >> driver level to do the cleanup is a feasible solution. Doesn't change >> the flow of the DWC3 gadget, and so far all function drivers we've used >> handle this in the correct manner. >> >> Thanks >> Wesley Cheng > > Right. I think for now we should do that (return success always except > for cases of disconnect or already in-flight etc). This helps keeping it no, let's not lie to our users ;-) > simple and avoid some pitfalls dealing with giving back the request. > Currently we return the error status to dwc3_gadget_ep_queue if we > failed to send a command that may not even related to the same request > being queued. I think the fix should be simple, but we're trying to patch it in the wrong way. Can y'all comment on my suggestion on the other subthread?
On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: > > Hi, > > Wesley Cheng <wcheng@codeaurora.org> writes: > > On 5/3/2021 7:20 PM, Thinh Nguyen wrote: > >> Hi, > >> > >> Wesley Cheng wrote: > >>> If an error is received when issuing a start or update transfer > >>> command, the error handler will stop all active requests (including > >>> the current USB request), and call dwc3_gadget_giveback() to notify > >>> function drivers of the requests which have been stopped. Avoid > >>> having to cancel the current request which is trying to be queued, as > >>> the function driver will handle the EP queue error accordingly. > >>> Simply unmap the request as it was done before, and allow previously > >>> started transfers to be cleaned up. > >>> > > > > Hi Thinh, > > > >> > >> It looks like you're still letting dwc3 stopping and cancelling all the > >> active requests instead letting the function driver doing the dequeue. > >> > > > > Yeah, main issue isn't due to the function driver doing dequeue, but > > having cleanup (ie USB request free) if there is an error during > > usb_ep_queue(). > > > > The function driver in question at the moment is the f_fs driver in AIO > > mode. When async IO is enabled in the FFS driver, every time it queues > > a packet, it will allocate a io_data struct beforehand. If the > > usb_ep_queue() fails it will free this io_data memory. Problem is that, > > since the DWC3 gadget calls the completion with -ECONNRESET, the FFS > > driver will also schedule a work item (within io_data struct) to handle > > the completion. So you end up with a flow like below > > > > allocate io_data (ffs) > > --> usb_ep_queue() > > --> __dwc3_gadget_kick_transfer() > > --> dwc3_send_gadget_ep_cmd(EINVAL) > > --> dwc3_gadget_ep_cleanup_cancelled_requests() > > --> dwc3_gadget_giveback(ECONNRESET) > > ffs completion callback > > queue work item within io_data > > --> usb_ep_queue returns EINVAL > > ffs frees io_data > > ... > > > > work scheduled > > --> NULL pointer/memory fault as io_data is freed Am I reading this correctly? It looks like usb_ep_queue() returns a -EINVAL error, but then the completion callback gets invoked with a status of -ECONNRESET. > I have some vague memory of discussing (something like) this with Alan > Stern long ago and the conclusion was that the gadget driver should > handle cases such as this. Indeed, this predates the creation of the Gadget API; the same design principle applies to the host-side API. It's a very simple idea: If an URB or usb_request submission succeeds, it is guaranteed that the completion routine will be called when the transfer is finished, cancelled, aborted, or whatever (note that this may happen before the submission call returns). If an URB or usb_request submission fails, it is guaranteed that the completion routine will not be called. So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the completion handler is called), this is a bug. Alan Stern > OTOH, we're returning failure during > usb_ep_queue() which tells me there's something with dwc3 (perhaps not > exclusively, but that's yet to be shown). > > If I understood the whole thing correctly, we want everything except the > current request (the one that failed START or UPDATE transfer) to go > through giveback(). This really tells me that we're not handling error > case in kick_transfer and/or prepare_trbs() correctly. > > I also don't want to pass another argument to kick_transfer because it > should be unnecessary: the current request should *always* be the last > one in the list. Therefore we should rely on something like > list_last_entry() followed by list_for_each_entry_safe_reverse() to > handle this without a special case. > > ret = dwc3_send_gadget_ep_cmd(); > if (ret < 0) { > current = list_last_entry(); > > unmap(current); > for_each_trb_in(current) { > clear_HWO(trb); > } > > list_for_entry_safe_reverse() { > move_cancelled(); > } > } > > -- > balbi
On 5/5/2021 5:57 AM, Felipe Balbi wrote: > > Hi, > > Wesley Cheng <wcheng@codeaurora.org> writes: >> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>> Hi, >>> >>> Wesley Cheng wrote: >>>> If an error is received when issuing a start or update transfer >>>> command, the error handler will stop all active requests (including >>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>> function drivers of the requests which have been stopped. Avoid >>>> having to cancel the current request which is trying to be queued, as >>>> the function driver will handle the EP queue error accordingly. >>>> Simply unmap the request as it was done before, and allow previously >>>> started transfers to be cleaned up. >>>> >> >> Hi Thinh, >> >>> >>> It looks like you're still letting dwc3 stopping and cancelling all the >>> active requests instead letting the function driver doing the dequeue. >>> >> >> Yeah, main issue isn't due to the function driver doing dequeue, but >> having cleanup (ie USB request free) if there is an error during >> usb_ep_queue(). >> >> The function driver in question at the moment is the f_fs driver in AIO >> mode. When async IO is enabled in the FFS driver, every time it queues >> a packet, it will allocate a io_data struct beforehand. If the >> usb_ep_queue() fails it will free this io_data memory. Problem is that, >> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >> driver will also schedule a work item (within io_data struct) to handle >> the completion. So you end up with a flow like below >> >> allocate io_data (ffs) >> --> usb_ep_queue() >> --> __dwc3_gadget_kick_transfer() >> --> dwc3_send_gadget_ep_cmd(EINVAL) >> --> dwc3_gadget_ep_cleanup_cancelled_requests() >> --> dwc3_gadget_giveback(ECONNRESET) >> ffs completion callback >> queue work item within io_data >> --> usb_ep_queue returns EINVAL >> ffs frees io_data >> ... >> >> work scheduled >> --> NULL pointer/memory fault as io_data is freed > > I have some vague memory of discussing (something like) this with Alan > Stern long ago and the conclusion was that the gadget driver should > handle cases such as this. OTOH, we're returning failure during > usb_ep_queue() which tells me there's something with dwc3 (perhaps not > exclusively, but that's yet to be shown). > Hi Felipe, > If I understood the whole thing correctly, we want everything except the > current request (the one that failed START or UPDATE transfer) to go > through giveback(). This really tells me that we're not handling error > case in kick_transfer and/or prepare_trbs() correctly. > We don't want the request passed in usb_ep_queue() to be calling giveback() IF DONE IN the usb_ep_queue() context only. > I also don't want to pass another argument to kick_transfer because it > should be unnecessary: the current request should *always* be the last > one in the list. Therefore we should rely on something like > list_last_entry() followed by list_for_each_entry_safe_reverse() to > handle this without a special case. > > ret = dwc3_send_gadget_ep_cmd(); > if (ret < 0) { > current = list_last_entry(); > > unmap(current); > for_each_trb_in(current) { > clear_HWO(trb); > } > > list_for_entry_safe_reverse() { > move_cancelled(); > } > } > Nice, thanks for the suggestion and info! Problem we have is that kick transfer is being used elsewhere, for example, during the TRB complete path: static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep, const struct dwc3_event_depevt *event, int status) { ... else if (dwc3_gadget_ep_should_continue(dep)) if (__dwc3_gadget_kick_transfer(dep) == 0) no_started_trb = false; So in these types of calls, we would still want ALL requests to be cancelled w/ giveback() called, so that the completion() callbacks can cleanup/free those requests accordingly. If we went and only unmapped the last entry (and removed it from any list), then no one would clean it up as it is outside of the usb_ep_queue() context, and not within any of the DWC3 lists. Thanks Wesley Cheng
On 5/5/2021 8:19 AM, Alan Stern wrote: > On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: >> >> Hi, >> >> Wesley Cheng <wcheng@codeaurora.org> writes: >>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>> Hi, >>>> >>>> Wesley Cheng wrote: >>>>> If an error is received when issuing a start or update transfer >>>>> command, the error handler will stop all active requests (including >>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>> function drivers of the requests which have been stopped. Avoid >>>>> having to cancel the current request which is trying to be queued, as >>>>> the function driver will handle the EP queue error accordingly. >>>>> Simply unmap the request as it was done before, and allow previously >>>>> started transfers to be cleaned up. >>>>> >>> >>> Hi Thinh, >>> >>>> >>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>> active requests instead letting the function driver doing the dequeue. >>>> >>> >>> Yeah, main issue isn't due to the function driver doing dequeue, but >>> having cleanup (ie USB request free) if there is an error during >>> usb_ep_queue(). >>> >>> The function driver in question at the moment is the f_fs driver in AIO >>> mode. When async IO is enabled in the FFS driver, every time it queues >>> a packet, it will allocate a io_data struct beforehand. If the >>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>> driver will also schedule a work item (within io_data struct) to handle >>> the completion. So you end up with a flow like below >>> >>> allocate io_data (ffs) >>> --> usb_ep_queue() >>> --> __dwc3_gadget_kick_transfer() >>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>> --> dwc3_gadget_giveback(ECONNRESET) >>> ffs completion callback >>> queue work item within io_data >>> --> usb_ep_queue returns EINVAL >>> ffs frees io_data >>> ... >>> >>> work scheduled >>> --> NULL pointer/memory fault as io_data is freed Hi Alan, > > Am I reading this correctly? It looks like usb_ep_queue() returns a > -EINVAL error, but then the completion callback gets invoked with a > status of -ECONNRESET. > Yes, your understanding of the current behavior is correct. In this situation we'd get a completion callback with ECONNRESET, and then also return EINVAL to usb_ep_queue(). >> I have some vague memory of discussing (something like) this with Alan >> Stern long ago and the conclusion was that the gadget driver should >> handle cases such as this. > > Indeed, this predates the creation of the Gadget API; the same design > principle applies to the host-side API. It's a very simple idea: > > If an URB or usb_request submission succeeds, it is guaranteed > that the completion routine will be called when the transfer is > finished, cancelled, aborted, or whatever (note that this may > happen before the submission call returns). > > If an URB or usb_request submission fails, it is guaranteed that > the completion routine will not be called. > > So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the > completion handler is called), this is a bug Thanks for this. So we're trying to only allow one or the other to happen right now. (either completion with an error, or returning error for usb_ep_queue()) I think that would be OK then. Thanks Wesley Cheng > > Alan Stern > >> OTOH, we're returning failure during >> usb_ep_queue() which tells me there's something with dwc3 (perhaps not >> exclusively, but that's yet to be shown). >> >> If I understood the whole thing correctly, we want everything except the >> current request (the one that failed START or UPDATE transfer) to go >> through giveback(). This really tells me that we're not handling error >> case in kick_transfer and/or prepare_trbs() correctly. >> >> I also don't want to pass another argument to kick_transfer because it >> should be unnecessary: the current request should *always* be the last >> one in the list. Therefore we should rely on something like >> list_last_entry() followed by list_for_each_entry_safe_reverse() to >> handle this without a special case. >> >> ret = dwc3_send_gadget_ep_cmd(); >> if (ret < 0) { >> current = list_last_entry(); >> >> unmap(current); >> for_each_trb_in(current) { >> clear_HWO(trb); >> } >> >> list_for_entry_safe_reverse() { >> move_cancelled(); >> } >> } >> >> -- >> balbi > >
Alan Stern wrote: > On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: >> >> Hi, >> >> Wesley Cheng <wcheng@codeaurora.org> writes: >>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>> Hi, >>>> >>>> Wesley Cheng wrote: >>>>> If an error is received when issuing a start or update transfer >>>>> command, the error handler will stop all active requests (including >>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>> function drivers of the requests which have been stopped. Avoid >>>>> having to cancel the current request which is trying to be queued, as >>>>> the function driver will handle the EP queue error accordingly. >>>>> Simply unmap the request as it was done before, and allow previously >>>>> started transfers to be cleaned up. >>>>> >>> >>> Hi Thinh, >>> >>>> >>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>> active requests instead letting the function driver doing the dequeue. >>>> >>> >>> Yeah, main issue isn't due to the function driver doing dequeue, but >>> having cleanup (ie USB request free) if there is an error during >>> usb_ep_queue(). >>> >>> The function driver in question at the moment is the f_fs driver in AIO >>> mode. When async IO is enabled in the FFS driver, every time it queues >>> a packet, it will allocate a io_data struct beforehand. If the >>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>> driver will also schedule a work item (within io_data struct) to handle >>> the completion. So you end up with a flow like below >>> >>> allocate io_data (ffs) >>> --> usb_ep_queue() >>> --> __dwc3_gadget_kick_transfer() >>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>> --> dwc3_gadget_giveback(ECONNRESET) >>> ffs completion callback >>> queue work item within io_data >>> --> usb_ep_queue returns EINVAL >>> ffs frees io_data >>> ... >>> >>> work scheduled >>> --> NULL pointer/memory fault as io_data is freed > > Am I reading this correctly? It looks like usb_ep_queue() returns a > -EINVAL error, but then the completion callback gets invoked with a > status of -ECONNRESET. > >> I have some vague memory of discussing (something like) this with Alan >> Stern long ago and the conclusion was that the gadget driver should >> handle cases such as this. > > Indeed, this predates the creation of the Gadget API; the same design > principle applies to the host-side API. It's a very simple idea: > > If an URB or usb_request submission succeeds, it is guaranteed > that the completion routine will be called when the transfer is > finished, cancelled, aborted, or whatever (note that this may > happen before the submission call returns). > > If an URB or usb_request submission fails, it is guaranteed that > the completion routine will not be called. > > So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the > completion handler is called), this is a bug. > > Alan Stern > Hi Alan, Yes, it's a bug, no question about that. The concern here is how should we fix it. In dwc3, when the usb_ep_queue() is called, it does 3 main things: 1) Put the request in a pending list to be processed 2) Process any started but partially processed request and any outstanding request from the pending list and map them to TRBs 3) Send a command to the controller telling it to cache and process these new TRBs Currently, if step 3) fails, then usb_ep_queue() returns an error status and we stop the controller from processing TRBs and return any request related to those outstanding TRBs. This is a bug. My suggestion here is don't immediately return an error code and let the completion interrupt inform of a transfer failure. The reasons are: a) Step 3) happened when the request is already (arguably) queued. b) If the error from step 3) is due to command timed out, the controller may have partially cached/processed some of these TRBs. We can't simply give back the request immediately without stopping the transfer and fail all the in-flight request. c) It adds unnecessary complexity to the driver and there are a few pitfalls that we have to watch out for when handling giving back the request. d) Except the case where the device is disconnected or that the request is already in-flight, step 1) and 2) are always done after usb_ep_queue(). The controller driver already owns these requests and can be considered "queued". If our definition whether a request is "queued" must be a combination of all those 3 steps, then my suggestion is not enough and we'd have to explore other options. Note that we already handle it this way for isoc this way. We don't send the START_TRANSFER command immediately and consider the requests to be queued in the usb_ep_queue(). So here we're just extending this to other endpoints too. Thanks, Thinh
On Wed, May 05, 2021 at 06:31:31PM +0000, Thinh Nguyen wrote: > Alan Stern wrote: > > On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: > >> > >> Hi, > >> > >> Wesley Cheng <wcheng@codeaurora.org> writes: > >>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: > >>>> Hi, > >>>> > >>>> Wesley Cheng wrote: > >>>>> If an error is received when issuing a start or update transfer > >>>>> command, the error handler will stop all active requests (including > >>>>> the current USB request), and call dwc3_gadget_giveback() to notify > >>>>> function drivers of the requests which have been stopped. Avoid > >>>>> having to cancel the current request which is trying to be queued, as > >>>>> the function driver will handle the EP queue error accordingly. > >>>>> Simply unmap the request as it was done before, and allow previously > >>>>> started transfers to be cleaned up. > >>>>> > >>> > >>> Hi Thinh, > >>> > >>>> > >>>> It looks like you're still letting dwc3 stopping and cancelling all the > >>>> active requests instead letting the function driver doing the dequeue. > >>>> > >>> > >>> Yeah, main issue isn't due to the function driver doing dequeue, but > >>> having cleanup (ie USB request free) if there is an error during > >>> usb_ep_queue(). > >>> > >>> The function driver in question at the moment is the f_fs driver in AIO > >>> mode. When async IO is enabled in the FFS driver, every time it queues > >>> a packet, it will allocate a io_data struct beforehand. If the > >>> usb_ep_queue() fails it will free this io_data memory. Problem is that, > >>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS > >>> driver will also schedule a work item (within io_data struct) to handle > >>> the completion. So you end up with a flow like below > >>> > >>> allocate io_data (ffs) > >>> --> usb_ep_queue() > >>> --> __dwc3_gadget_kick_transfer() > >>> --> dwc3_send_gadget_ep_cmd(EINVAL) > >>> --> dwc3_gadget_ep_cleanup_cancelled_requests() > >>> --> dwc3_gadget_giveback(ECONNRESET) > >>> ffs completion callback > >>> queue work item within io_data > >>> --> usb_ep_queue returns EINVAL > >>> ffs frees io_data > >>> ... > >>> > >>> work scheduled > >>> --> NULL pointer/memory fault as io_data is freed > > > > Am I reading this correctly? It looks like usb_ep_queue() returns a > > -EINVAL error, but then the completion callback gets invoked with a > > status of -ECONNRESET. > > > >> I have some vague memory of discussing (something like) this with Alan > >> Stern long ago and the conclusion was that the gadget driver should > >> handle cases such as this. > > > > Indeed, this predates the creation of the Gadget API; the same design > > principle applies to the host-side API. It's a very simple idea: > > > > If an URB or usb_request submission succeeds, it is guaranteed > > that the completion routine will be called when the transfer is > > finished, cancelled, aborted, or whatever (note that this may > > happen before the submission call returns). > > > > If an URB or usb_request submission fails, it is guaranteed that > > the completion routine will not be called. > > > > So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the > > completion handler is called), this is a bug. > > > > Alan Stern > > > > > Hi Alan, > > Yes, it's a bug, no question about that. The concern here is how should > we fix it. > > In dwc3, when the usb_ep_queue() is called, it does 3 main things: > 1) Put the request in a pending list to be processed > 2) Process any started but partially processed request and any > outstanding request from the pending list and map them to TRBs > 3) Send a command to the controller telling it to cache and process > these new TRBs > > Currently, if step 3) fails, then usb_ep_queue() returns an error status > and we stop the controller from processing TRBs and return any request > related to those outstanding TRBs. This is a bug. > > My suggestion here is don't immediately return an error code and let the > completion interrupt inform of a transfer failure. The reasons are: > > a) Step 3) happened when the request is already (arguably) queued. > b) If the error from step 3) is due to command timed out, the controller > may have partially cached/processed some of these TRBs. We can't simply > give back the request immediately without stopping the transfer and fail > all the in-flight request. > c) It adds unnecessary complexity to the driver and there are a few > pitfalls that we have to watch out for when handling giving back the > request. > d) Except the case where the device is disconnected or that the request > is already in-flight, step 1) and 2) are always done after > usb_ep_queue(). The controller driver already owns these requests and > can be considered "queued". > > If our definition whether a request is "queued" must be a combination of > all those 3 steps, then my suggestion is not enough and we'd have to > explore other options. > > Note that we already handle it this way for isoc this way. We don't send > the START_TRANSFER command immediately and consider the requests to be > queued in the usb_ep_queue(). So here we're just extending this to other > endpoints too. That does sound like the best approach. Thinking of the procedure in terms of ownership, as you wrote above, is entirely appropriate. Alan Stern
Felipe Balbi wrote: > > Hi, > > Wesley Cheng <wcheng@codeaurora.org> writes: >> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>> Hi, >>> >>> Wesley Cheng wrote: >>>> If an error is received when issuing a start or update transfer >>>> command, the error handler will stop all active requests (including >>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>> function drivers of the requests which have been stopped. Avoid >>>> having to cancel the current request which is trying to be queued, as >>>> the function driver will handle the EP queue error accordingly. >>>> Simply unmap the request as it was done before, and allow previously >>>> started transfers to be cleaned up. >>>> >> >> Hi Thinh, >> >>> >>> It looks like you're still letting dwc3 stopping and cancelling all the >>> active requests instead letting the function driver doing the dequeue. >>> >> >> Yeah, main issue isn't due to the function driver doing dequeue, but >> having cleanup (ie USB request free) if there is an error during >> usb_ep_queue(). >> >> The function driver in question at the moment is the f_fs driver in AIO >> mode. When async IO is enabled in the FFS driver, every time it queues >> a packet, it will allocate a io_data struct beforehand. If the >> usb_ep_queue() fails it will free this io_data memory. Problem is that, >> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >> driver will also schedule a work item (within io_data struct) to handle >> the completion. So you end up with a flow like below >> >> allocate io_data (ffs) >> --> usb_ep_queue() >> --> __dwc3_gadget_kick_transfer() >> --> dwc3_send_gadget_ep_cmd(EINVAL) >> --> dwc3_gadget_ep_cleanup_cancelled_requests() >> --> dwc3_gadget_giveback(ECONNRESET) >> ffs completion callback >> queue work item within io_data >> --> usb_ep_queue returns EINVAL >> ffs frees io_data >> ... >> >> work scheduled >> --> NULL pointer/memory fault as io_data is freed > > I have some vague memory of discussing (something like) this with Alan > Stern long ago and the conclusion was that the gadget driver should > handle cases such as this. OTOH, we're returning failure during > usb_ep_queue() which tells me there's something with dwc3 (perhaps not > exclusively, but that's yet to be shown). > > If I understood the whole thing correctly, we want everything except the > current request (the one that failed START or UPDATE transfer) to go > through giveback(). This really tells me that we're not handling error > case in kick_transfer and/or prepare_trbs() correctly. > > I also don't want to pass another argument to kick_transfer because it > should be unnecessary: the current request should *always* be the last > one in the list. Therefore we should rely on something like > list_last_entry() followed by list_for_each_entry_safe_reverse() to > handle this without a special case. > > ret = dwc3_send_gadget_ep_cmd(); > if (ret < 0) { > current = list_last_entry(); > > unmap(current); > for_each_trb_in(current) { > clear_HWO(trb); > } > > list_for_entry_safe_reverse() { > move_cancelled(); > } > } > Hi Felipe, This won't work. The queued request may not have left the pending_list and never started at all (e.g. due to no available TRB). So we can't simply get the last entry of whichever list without checking which request is being queued. See my suggestions and response to Alan's comment. Thanks, Thinh
Hi, Wesley Cheng <wcheng@codeaurora.org> writes: >> If I understood the whole thing correctly, we want everything except the >> current request (the one that failed START or UPDATE transfer) to go >> through giveback(). This really tells me that we're not handling error >> case in kick_transfer and/or prepare_trbs() correctly. >> > > We don't want the request passed in usb_ep_queue() to be calling > giveback() IF DONE IN the usb_ep_queue() context only. right, that's how this should behave. >> I also don't want to pass another argument to kick_transfer because it >> should be unnecessary: the current request should *always* be the last >> one in the list. Therefore we should rely on something like >> list_last_entry() followed by list_for_each_entry_safe_reverse() to >> handle this without a special case. >> >> ret = dwc3_send_gadget_ep_cmd(); >> if (ret < 0) { >> current = list_last_entry(); >> >> unmap(current); >> for_each_trb_in(current) { >> clear_HWO(trb); >> } >> >> list_for_entry_safe_reverse() { >> move_cancelled(); >> } >> } >> > Nice, thanks for the suggestion and info! Problem we have is that kick > transfer is being used elsewhere, for example, during the TRB complete path: > > static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep, > const struct dwc3_event_depevt *event, int status) > { > ... > else if (dwc3_gadget_ep_should_continue(dep)) > if (__dwc3_gadget_kick_transfer(dep) == 0) > no_started_trb = false; > > So in these types of calls, we would still want ALL requests to be > cancelled w/ giveback() called, so that the completion() callbacks can > cleanup/free those requests accordingly. > > If we went and only unmapped the last entry (and removed it from any > list), then no one would clean it up as it is outside of the > usb_ep_queue() context, and not within any of the DWC3 lists. oh, I see what you mean. At the moment we want kick_transfer to behave in two different manners and that's probably where the bug is originating from. It sounds like it's time to split kick_transfer into kick_queued_transfer() and e.g. continue_pending_transfers() The thing is that if we continue to sprinkle special cases all over the place, soon enough it'll be super hard to maintain the driver.
Hi, Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes: > Alan Stern wrote: >> On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: >>> >>> Hi, >>> >>> Wesley Cheng <wcheng@codeaurora.org> writes: >>>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>>> Hi, >>>>> >>>>> Wesley Cheng wrote: >>>>>> If an error is received when issuing a start or update transfer >>>>>> command, the error handler will stop all active requests (including >>>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>>> function drivers of the requests which have been stopped. Avoid >>>>>> having to cancel the current request which is trying to be queued, as >>>>>> the function driver will handle the EP queue error accordingly. >>>>>> Simply unmap the request as it was done before, and allow previously >>>>>> started transfers to be cleaned up. >>>>>> >>>> >>>> Hi Thinh, >>>> >>>>> >>>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>>> active requests instead letting the function driver doing the dequeue. >>>>> >>>> >>>> Yeah, main issue isn't due to the function driver doing dequeue, but >>>> having cleanup (ie USB request free) if there is an error during >>>> usb_ep_queue(). >>>> >>>> The function driver in question at the moment is the f_fs driver in AIO >>>> mode. When async IO is enabled in the FFS driver, every time it queues >>>> a packet, it will allocate a io_data struct beforehand. If the >>>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>>> driver will also schedule a work item (within io_data struct) to handle >>>> the completion. So you end up with a flow like below >>>> >>>> allocate io_data (ffs) >>>> --> usb_ep_queue() >>>> --> __dwc3_gadget_kick_transfer() >>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>> --> dwc3_gadget_giveback(ECONNRESET) >>>> ffs completion callback >>>> queue work item within io_data >>>> --> usb_ep_queue returns EINVAL >>>> ffs frees io_data >>>> ... >>>> >>>> work scheduled >>>> --> NULL pointer/memory fault as io_data is freed >> >> Am I reading this correctly? It looks like usb_ep_queue() returns a >> -EINVAL error, but then the completion callback gets invoked with a >> status of -ECONNRESET. >> >>> I have some vague memory of discussing (something like) this with Alan >>> Stern long ago and the conclusion was that the gadget driver should >>> handle cases such as this. >> >> Indeed, this predates the creation of the Gadget API; the same design >> principle applies to the host-side API. It's a very simple idea: >> >> If an URB or usb_request submission succeeds, it is guaranteed >> that the completion routine will be called when the transfer is >> finished, cancelled, aborted, or whatever (note that this may >> happen before the submission call returns). >> >> If an URB or usb_request submission fails, it is guaranteed that >> the completion routine will not be called. >> >> So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the >> completion handler is called), this is a bug. >> >> Alan Stern >> > > > Hi Alan, > > Yes, it's a bug, no question about that. The concern here is how should > we fix it. > > In dwc3, when the usb_ep_queue() is called, it does 3 main things: > 1) Put the request in a pending list to be processed > 2) Process any started but partially processed request and any > outstanding request from the pending list and map them to TRBs > 3) Send a command to the controller telling it to cache and process > these new TRBs > > Currently, if step 3) fails, then usb_ep_queue() returns an error status > and we stop the controller from processing TRBs and return any request > related to those outstanding TRBs. This is a bug. > > My suggestion here is don't immediately return an error code and let the > completion interrupt inform of a transfer failure. The reasons are: > > a) Step 3) happened when the request is already (arguably) queued. > b) If the error from step 3) is due to command timed out, the controller > may have partially cached/processed some of these TRBs. We can't simply > give back the request immediately without stopping the transfer and fail > all the in-flight request. > c) It adds unnecessary complexity to the driver and there are a few > pitfalls that we have to watch out for when handling giving back the > request. > d) Except the case where the device is disconnected or that the request > is already in-flight, step 1) and 2) are always done after > usb_ep_queue(). The controller driver already owns these requests and > can be considered "queued". Oh, I see. Indeed this sounds like the best minimal fix for the -rc and backporting to stables. We may still want to get back to this at some point and, potentially, split kick_transfer() into two parts that can make assumptions about the context where they can be called. Also, we may want to move the request to the pending list only if the command succeeds. There should be no race condition as kick_transfer is always called with interrupts disabled.
Felipe Balbi wrote: > > Hi, > > Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes: >> Alan Stern wrote: >>> On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: >>>> >>>> Hi, >>>> >>>> Wesley Cheng <wcheng@codeaurora.org> writes: >>>>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>>>> Hi, >>>>>> >>>>>> Wesley Cheng wrote: >>>>>>> If an error is received when issuing a start or update transfer >>>>>>> command, the error handler will stop all active requests (including >>>>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>>>> function drivers of the requests which have been stopped. Avoid >>>>>>> having to cancel the current request which is trying to be queued, as >>>>>>> the function driver will handle the EP queue error accordingly. >>>>>>> Simply unmap the request as it was done before, and allow previously >>>>>>> started transfers to be cleaned up. >>>>>>> >>>>> >>>>> Hi Thinh, >>>>> >>>>>> >>>>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>>>> active requests instead letting the function driver doing the dequeue. >>>>>> >>>>> >>>>> Yeah, main issue isn't due to the function driver doing dequeue, but >>>>> having cleanup (ie USB request free) if there is an error during >>>>> usb_ep_queue(). >>>>> >>>>> The function driver in question at the moment is the f_fs driver in AIO >>>>> mode. When async IO is enabled in the FFS driver, every time it queues >>>>> a packet, it will allocate a io_data struct beforehand. If the >>>>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>>>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>>>> driver will also schedule a work item (within io_data struct) to handle >>>>> the completion. So you end up with a flow like below >>>>> >>>>> allocate io_data (ffs) >>>>> --> usb_ep_queue() >>>>> --> __dwc3_gadget_kick_transfer() >>>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>> --> dwc3_gadget_giveback(ECONNRESET) >>>>> ffs completion callback >>>>> queue work item within io_data >>>>> --> usb_ep_queue returns EINVAL >>>>> ffs frees io_data >>>>> ... >>>>> >>>>> work scheduled >>>>> --> NULL pointer/memory fault as io_data is freed >>> >>> Am I reading this correctly? It looks like usb_ep_queue() returns a >>> -EINVAL error, but then the completion callback gets invoked with a >>> status of -ECONNRESET. >>> >>>> I have some vague memory of discussing (something like) this with Alan >>>> Stern long ago and the conclusion was that the gadget driver should >>>> handle cases such as this. >>> >>> Indeed, this predates the creation of the Gadget API; the same design >>> principle applies to the host-side API. It's a very simple idea: >>> >>> If an URB or usb_request submission succeeds, it is guaranteed >>> that the completion routine will be called when the transfer is >>> finished, cancelled, aborted, or whatever (note that this may >>> happen before the submission call returns). >>> >>> If an URB or usb_request submission fails, it is guaranteed that >>> the completion routine will not be called. >>> >>> So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the >>> completion handler is called), this is a bug. >>> >>> Alan Stern >>> >> >> >> Hi Alan, >> >> Yes, it's a bug, no question about that. The concern here is how should >> we fix it. >> >> In dwc3, when the usb_ep_queue() is called, it does 3 main things: >> 1) Put the request in a pending list to be processed >> 2) Process any started but partially processed request and any >> outstanding request from the pending list and map them to TRBs >> 3) Send a command to the controller telling it to cache and process >> these new TRBs >> >> Currently, if step 3) fails, then usb_ep_queue() returns an error status >> and we stop the controller from processing TRBs and return any request >> related to those outstanding TRBs. This is a bug. >> >> My suggestion here is don't immediately return an error code and let the >> completion interrupt inform of a transfer failure. The reasons are: >> >> a) Step 3) happened when the request is already (arguably) queued. >> b) If the error from step 3) is due to command timed out, the controller >> may have partially cached/processed some of these TRBs. We can't simply >> give back the request immediately without stopping the transfer and fail >> all the in-flight request. >> c) It adds unnecessary complexity to the driver and there are a few >> pitfalls that we have to watch out for when handling giving back the >> request. >> d) Except the case where the device is disconnected or that the request >> is already in-flight, step 1) and 2) are always done after >> usb_ep_queue(). The controller driver already owns these requests and >> can be considered "queued". > > Oh, I see. Indeed this sounds like the best minimal fix for the -rc and > backporting to stables. We may still want to get back to this at some > point and, potentially, split kick_transfer() into two parts that can > make assumptions about the context where they can be called. Yeah, the driver may need some reorganization to keep the logic clear. > > Also, we may want to move the request to the pending list only if the > command succeeds. There should be no race condition as kick_transfer is > always called with interrupts disabled. > Hm... I don't see how this is better. If I understand correctly, that requires that there will be a command issued for every call to usb_ep_queue(). In other word, with this requirement, we're enforcing whether we can give the request to the controller and drop it otherwise. So, if we run out of TRBs, usb_ep_queue() will fail? Thanks, Thinh
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index dd80e5c..c8ddbe1 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -140,6 +140,29 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state) } /** + * dwc3_ep_dec_trb - decrement a trb index. + * @index: Pointer to the TRB index to increment. + * + * The index should never point to the link TRB. After decrementing, + * if index is zero, wrap around to the TRB before the link TRB. + */ +static void dwc3_ep_dec_trb(u8 *index) +{ + (*index)--; + if (*index < 0) + *index = DWC3_TRB_NUM - 1; +} + +/** + * dwc3_ep_dec_enq - decrement endpoint's enqueue pointer + * @dep: The endpoint whose enqueue pointer we're decrementing + */ +static void dwc3_ep_dec_enq(struct dwc3_ep *dep) +{ + dwc3_ep_dec_trb(&dep->trb_enqueue); +} + +/** * dwc3_ep_inc_trb - increment a trb index. * @index: Pointer to the TRB index to increment. * @@ -1352,7 +1375,26 @@ static int dwc3_prepare_trbs(struct dwc3_ep *dep) static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep); -static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) +static void dwc3_gadget_ep_revert_trbs(struct dwc3_ep *dep, struct dwc3_request *req) +{ + int i; + + if (!req->trb) + return; + + for (i = 0; i < req->num_trbs; i++) { + struct dwc3_trb *trb; + + trb = &dep->trb_pool[dep->trb_enqueue]; + trb->ctrl &= ~DWC3_TRB_CTRL_HWO; + dwc3_ep_dec_enq(dep); + } + + req->num_trbs = 0; +} + +static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep, + struct dwc3_request *queued_req) { struct dwc3_gadget_ep_cmd_params params; struct dwc3_request *req; @@ -1410,8 +1452,23 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep) dwc3_stop_active_transfer(dep, true, true); - list_for_each_entry_safe(req, tmp, &dep->started_list, list) - dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED); + /* + * In order to ensure the logic is applied to a request being + * queued by dwc3_gadget_ep_queue(), it needs to explicitly + * check that req is the same as queued_req (request being + * queued). If so, then just unmap and decrement the enqueue + * pointer, as the usb_ep_queue() error handler in the function + * driver will handle cleaning up the USB request. + */ + list_for_each_entry_safe(req, tmp, &dep->started_list, list) { + if (req == queued_req) { + dwc3_gadget_ep_revert_trbs(dep, req); + dwc3_gadget_del_and_unmap_request(dep, req, ret); + } else { + dwc3_gadget_move_cancelled_request(req, + DWC3_REQUEST_STATUS_DEQUEUED); + } + } /* If ep isn't started, then there's no end transfer pending */ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING)) @@ -1546,7 +1603,7 @@ static int dwc3_gadget_start_isoc_quirk(struct dwc3_ep *dep) dep->start_cmd_status = 0; dep->combo_num = 0; - return __dwc3_gadget_kick_transfer(dep); + return __dwc3_gadget_kick_transfer(dep, NULL); } static int __dwc3_gadget_start_isoc(struct dwc3_ep *dep) @@ -1593,7 +1650,7 @@ static int __dwc3_gadget_start_isoc(struct dwc3_ep *dep) for (i = 0; i < DWC3_ISOC_MAX_RETRIES; i++) { dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 1); - ret = __dwc3_gadget_kick_transfer(dep); + ret = __dwc3_gadget_kick_transfer(dep, NULL); if (ret != -EAGAIN) break; } @@ -1684,7 +1741,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req) } } - return __dwc3_gadget_kick_transfer(dep); + return __dwc3_gadget_kick_transfer(dep, req); } static int dwc3_gadget_ep_queue(struct usb_ep *ep, struct usb_request *request, @@ -1893,7 +1950,7 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol) if ((dep->flags & DWC3_EP_DELAY_START) && !usb_endpoint_xfer_isoc(dep->endpoint.desc)) - __dwc3_gadget_kick_transfer(dep); + __dwc3_gadget_kick_transfer(dep, NULL); dep->flags &= ~DWC3_EP_DELAY_START; } @@ -2992,7 +3049,7 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep, (list_empty(&dep->pending_list) || status == -EXDEV)) dwc3_stop_active_transfer(dep, true, true); else if (dwc3_gadget_ep_should_continue(dep)) - if (__dwc3_gadget_kick_transfer(dep) == 0) + if (__dwc3_gadget_kick_transfer(dep, NULL) == 0) no_started_trb = false; out: @@ -3106,7 +3163,7 @@ static void dwc3_gadget_endpoint_command_complete(struct dwc3_ep *dep, if ((dep->flags & DWC3_EP_DELAY_START) && !usb_endpoint_xfer_isoc(dep->endpoint.desc)) - __dwc3_gadget_kick_transfer(dep); + __dwc3_gadget_kick_transfer(dep, NULL); dep->flags &= ~DWC3_EP_DELAY_START; }
If an error is received when issuing a start or update transfer command, the error handler will stop all active requests (including the current USB request), and call dwc3_gadget_giveback() to notify function drivers of the requests which have been stopped. Avoid having to cancel the current request which is trying to be queued, as the function driver will handle the EP queue error accordingly. Simply unmap the request as it was done before, and allow previously started transfers to be cleaned up. Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> --- Changes in v2: - Addressed feedback received by making sure the logic only applies to a req which is being queued, decrementing the enqueue pointer, and only clearing the HWO bit. drivers/usb/dwc3/gadget.c | 75 +++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 66 insertions(+), 9 deletions(-)