diff mbox series

[RFC,1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields

Message ID 20200122222645.38805-2-john.stultz@linaro.org (mailing list archive)
State New, archived
Headers show
Series Avoiding DWC3 transfer stalls/hangs when using adb over f_fs | expand

Commit Message

John Stultz Jan. 22, 2020, 10:26 p.m. UTC
From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>

The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
for IOC/LST bit in the event->status and returns if IOC/LST bit is
set. This logic doesn't work if multiple TRBs are queued per
request and the IOC/LST bit is set on the last TRB of that request.
Consider an example where a queued request has multiple queued TRBs
and IOC/LST bit is set only for the last TRB. In this case, the Core
generates XferComplete/XferInProgress events only for the last TRB
(since IOC/LST are set only for the last TRB). As per the logic in
dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
IOC/LST bit and returns on the first TRB. This makes the remaining
TRBs left unhandled.
To aviod this, changed the code to check for IOC/LST bits in both
event->status & TRB->ctrl. This patch does the same.

At a practical level, this patch resolves USB transfer stalls seen
with adb on dwc3 based HiKey960 after functionfs gadget added
scatter-gather support around v4.20.

Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Yang Fei <fei.yang@intel.com>
Cc: Thinh Nguyen <thinhn@synopsys.com>
Cc: Tejas Joglekar <tejas.joglekar@synopsys.com>
Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Cc: Jack Pham <jackp@codeaurora.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linux USB List <linux-usb@vger.kernel.org>
Cc: stable <stable@vger.kernel.org>
Tested-by: Tejas Joglekar <tejas.joglekar@synopsys.com>
Reviewed-by: Thinh Nguyen <thinhn@synopsys.com>
Signed-off-by: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
[jstultz: forward ported to mainline, added note to commit log]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/usb/dwc3/gadget.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Felipe Balbi Jan. 23, 2020, 7:24 a.m. UTC | #1
Hi,

John Stultz <john.stultz@linaro.org> writes:

> From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
>
> The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> for IOC/LST bit in the event->status and returns if IOC/LST bit is
> set. This logic doesn't work if multiple TRBs are queued per
> request and the IOC/LST bit is set on the last TRB of that request.
> Consider an example where a queued request has multiple queued TRBs
> and IOC/LST bit is set only for the last TRB. In this case, the Core
> generates XferComplete/XferInProgress events only for the last TRB
> (since IOC/LST are set only for the last TRB). As per the logic in
> dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> IOC/LST bit and returns on the first TRB. This makes the remaining
> TRBs left unhandled.
> To aviod this, changed the code to check for IOC/LST bits in both
     avoid

> event->status & TRB->ctrl. This patch does the same.

We don't need to check both. It's very likely that checking the TRB is
enough.

> At a practical level, this patch resolves USB transfer stalls seen
> with adb on dwc3 based HiKey960 after functionfs gadget added
> scatter-gather support around v4.20.

Right, I remember asking for tracepoint data showing this problem
happening. It's the best way to figure out what's really going on.

Before we accept these two patches, could you collect dwc3 tracepoint
data and share here?
Yang, Fei Jan. 23, 2020, 6:54 p.m. UTC | #2
>> The present code in dwc3_gadget_ep_reclaim_completed_trb() will check 
>> for IOC/LST bit in the event->status and returns if IOC/LST bit is 
>> set. This logic doesn't work if multiple TRBs are queued per request 
>> and the IOC/LST bit is set on the last TRB of that request.
>> Consider an example where a queued request has multiple queued TRBs 
>> and IOC/LST bit is set only for the last TRB. In this case, the Core 
>> generates XferComplete/XferInProgress events only for the last TRB 
>> (since IOC/LST are set only for the last TRB). As per the logic in
>> dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for 
>> IOC/LST bit and returns on the first TRB. This makes the remaining 
>> TRBs left unhandled.
>> To aviod this, changed the code to check for IOC/LST bits in both
>     avoid
>
>> event->status & TRB->ctrl. This patch does the same.
>
> We don't need to check both. It's very likely that checking the TRB is enough.
>
>> At a practical level, this patch resolves USB transfer stalls seen 
>> with adb on dwc3 based HiKey960 after functionfs gadget added 
>> scatter-gather support around v4.20.
>
> Right, I remember asking for tracepoint data showing this problem happening. It's the best way to figure out what's really going on.
>
> Before we accept these two patches, could you collect dwc3 tracepoint data and share here?

I should have replied to this one. Sorry for the confusion on the other thread.
I have sent tracepoints long time ago for this problem, but the tracepoints did help much on debugging the issue, so I'm not going to send again.

But the problem is really obvious though. In function dwc3_gadget_ep_reclaim_trb_sg(), the for_each_sg loop doesn't get a chance to iterate through all TRBs in the sg list, because this function only gets called when the last TRB in the list is completed (because of setting IOC), so at this moment event->status has IOC bit set. The consequence is that, when the for_each_sg loop trying to reclaim the first TRB in the sg list, the call dwc3_gadget_ep_reclaim_completed_trb() returns 1 (if (event-status & DEPEVT_STATUS_IOC) return 1;), thus the for loop is terminated before the rest of the TRBs are reclaimed.

static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
                struct dwc3_request *req, const struct dwc3_event_depevt *event,
                int status)
{
        struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];
        struct scatterlist *sg = req->sg;
        struct scatterlist *s;
        unsigned int pending = req->num_pending_sgs;
        unsigned int i;
        int ret = 0;

        for_each_sg(sg, s, pending, i) {
                trb = &dep->trb_pool[dep->trb_dequeue];

                if (trb->ctrl & DWC3_TRB_CTRL_HWO)
                        break;

                req->sg = sg_next(s);
                req->num_pending_sgs--;

                ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
                                trb, event, status, true);
                if (ret)
                        break;
        }

        return ret;
}
Felipe Balbi Jan. 24, 2020, 7:45 a.m. UTC | #3
Hi,

John Stultz <john.stultz@linaro.org> writes:

> On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <balbi@kernel.org> wrote:
>> > From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
>> >
>> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
>> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
>> > set. This logic doesn't work if multiple TRBs are queued per
>> > request and the IOC/LST bit is set on the last TRB of that request.
>> > Consider an example where a queued request has multiple queued TRBs
>> > and IOC/LST bit is set only for the last TRB. In this case, the Core
>> > generates XferComplete/XferInProgress events only for the last TRB
>> > (since IOC/LST are set only for the last TRB). As per the logic in
>> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
>> > IOC/LST bit and returns on the first TRB. This makes the remaining
>> > TRBs left unhandled.
>> > To aviod this, changed the code to check for IOC/LST bits in both
>>      avoid
>>
>> > event->status & TRB->ctrl. This patch does the same.
>>
>> We don't need to check both. It's very likely that checking the TRB is
>> enough.
>
> Sorry, just to clarify, are you suggesting instead of:
> -       if (event->status & DEPEVT_STATUS_IOC)
> +       if ((event->status & DEPEVT_STATUS_IOC) &&
> +           (trb->ctrl & DWC3_TRB_CTRL_IOC))
>
>
> We do something like:
> -       if (event->status & DEPEVT_STATUS_IOC)
> +       if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> +               return 1;
> +
> +       if (trb->ctrl & DWC3_TRB_CTRL_LST)
>                 return 1;
>
> ?

that's correct. In hindsight, I have no idea why I used the
event->status here since all other checks are done against the TRB
only.

>> > At a practical level, this patch resolves USB transfer stalls seen
>> > with adb on dwc3 based HiKey960 after functionfs gadget added
>> > scatter-gather support around v4.20.
>>
>> Right, I remember asking for tracepoint data showing this problem
>> happening. It's the best way to figure out what's really going on.
>>
>> Before we accept these two patches, could you collect dwc3 tracepoint
>> data and share here?
>
> Sure. Attached is trace logs and regdumps for hikey960.

Thanks

> The one gotcha with the logs is that in the working case (with this
> patch applied), I booted with the usb-c cable disconnected (as
> suggested in the dwc3.rst doc), enabled tracing and plugged in the
> device, then ran adb logcat a few times to validate no stalls.
>
> In the failure case (without this patch), I booted with the usb-c
> cable disconnected, enabled tracing and then when I plugged in the
> device, it never was detected by adb (it seems perhaps the problem had
> already struck?).

You never got a Reset Interrupt, so something else is going on. I
suggest putting a sniffer and first making sure the host *does* drive
reset signalling. Second step would be to look at your phy
configuration. Is it going in suspend for any reason? Might want to try
our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.

> So I generated the failure2 log by booting with USB-C plugged in,
> enabling tracing, and running adb logcat on the host to observe the
> stall.

Thank you. Here's a quick summary of what's in failure2:

There is a series of 24-byte transfers on ep1out and that's the one
which shows a problem. We can clearly see that adb is issuing one
transfer at a time, only enqueueing transfer n+1 when transfer n is
completed and given back, so we see a series of similar blocks:

- dwc3_alloc_request
- dwc3_ep_queue
- dwc3_prepare_trb
- dwc3_prepare_trb (for the chained bit)
- dwc3_gadget_ep_cmd (update transfer)
- dwc3_event (transfer in progress)
- dwc3_complete_trb
- dwc3_complete_trb (for the chained bit)
- dwc3_gadget_giveback
- dwc3_free_request

So this works for several iterations. Note, however, that the TRB
addresses don't really make sense. DWC3 allocates a contiguous block of
memory to server as TRB pool, but we see non-consecutive addresses on
these TRBs. I'm assuming there's an IOMMU in your system.

Anyway, the failing point is here:

>          adbd-461   [002] d..1    49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
>          adbd-461   [002] d..2    49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
>          adbd-461   [002] d..2    49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
>          adbd-461   [002] d..2    49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>          adbd-461   [002] d..2    49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>   irq/65-dwc3-498   [000] d..1    53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
>   irq/65-dwc3-498   [000] d..1    53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
>   irq/65-dwc3-498   [000] d..1    53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>   irq/65-dwc3-498   [000] d..1    53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> kworker/u16:0-7     [000] ....    53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
>          adbd-461   [002] d..1    53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
>          adbd-461   [002] d..2    53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
>          adbd-461   [002] d..2    53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
>          adbd-461   [002] d..2    53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>          adbd-461   [002] d..2    53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful

Note that this transfer, after started, took 4 seconds to complete,
while all others completed within a few ms. There's no real reason for
this visible from dwc3 driver itself. What follows, is a transfer that
never completed.

The only thing I can come up with, is that we starve the TRB ring, by
continuously reclaiming a single TRB. We have 255 usable TRBs, so after
a few iterations, we would see a stall due to starved TRB ring.

There is a way to verify this by tracking trb_enqueue and trb_dequeue,
if you're willing to do that, that'll help us prove that this is really
the problem and, since current tracepoints doen't really show that
information, it may be a good idea to add this information to
dwc3_log_trb tracepoint class. Something like below should be enough,
could you re-run the test of failure2 with this patch applied?

drivers/usb/dwc3/trace.h | 9 +++++++--

modified   drivers/usb/dwc3/trace.h
@@ -227,6 +227,8 @@ DECLARE_EVENT_CLASS(dwc3_log_trb,
 		__field(u32, size)
 		__field(u32, ctrl)
 		__field(u32, type)
+		__field(u32, enqueue)
+		__field(u32, dequeue)
 	),
 	TP_fast_assign(
 		__assign_str(name, dep->name);
@@ -236,9 +238,12 @@ DECLARE_EVENT_CLASS(dwc3_log_trb,
 		__entry->size = trb->size;
 		__entry->ctrl = trb->ctrl;
 		__entry->type = usb_endpoint_type(dep->endpoint.desc);
+		__entry->enqueue = dep->trb_enqueue
+		__entry->dequeue = dep->trb_dequeue
 	),
-	TP_printk("%s: trb %p buf %08x%08x size %s%d ctrl %08x (%c%c%c%c:%c%c:%s)",
-		__get_str(name), __entry->trb, __entry->bph, __entry->bpl,
+	TP_printk("%s: trb %p (E%d:D%d) buf %08x%08x size %s%d ctrl %08x (%c%c%c%c:%c%c:%s)",
+		__get_str(name), __entry->trb, __entry->enqueue,
+		__entry->dequeue, __entry->bph, __entry->bpl,
 		({char *s;
 		int pcm = ((__entry->size >> 24) & 3) + 1;
 		switch (__entry->type) {

> Anyway, all three sets of logs are included. Let me know if you need
> me to try anything else.

Thanks for doing this
John Stultz Jan. 24, 2020, 10:10 p.m. UTC | #4
On Thu, Jan 23, 2020 at 11:44 PM Felipe Balbi <balbi@kernel.org> wrote:
>
>
> Hi,
>
> John Stultz <john.stultz@linaro.org> writes:
>
> > On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <balbi@kernel.org> wrote:
> >> > From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
> >> >
> >> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> >> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
> >> > set. This logic doesn't work if multiple TRBs are queued per
> >> > request and the IOC/LST bit is set on the last TRB of that request.
> >> > Consider an example where a queued request has multiple queued TRBs
> >> > and IOC/LST bit is set only for the last TRB. In this case, the Core
> >> > generates XferComplete/XferInProgress events only for the last TRB
> >> > (since IOC/LST are set only for the last TRB). As per the logic in
> >> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> >> > IOC/LST bit and returns on the first TRB. This makes the remaining
> >> > TRBs left unhandled.
> >> > To aviod this, changed the code to check for IOC/LST bits in both
> >>      avoid
> >>
> >> > event->status & TRB->ctrl. This patch does the same.
> >>
> >> We don't need to check both. It's very likely that checking the TRB is
> >> enough.
> >
> > Sorry, just to clarify, are you suggesting instead of:
> > -       if (event->status & DEPEVT_STATUS_IOC)
> > +       if ((event->status & DEPEVT_STATUS_IOC) &&
> > +           (trb->ctrl & DWC3_TRB_CTRL_IOC))
> >
> >
> > We do something like:
> > -       if (event->status & DEPEVT_STATUS_IOC)
> > +       if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> > +               return 1;
> > +
> > +       if (trb->ctrl & DWC3_TRB_CTRL_LST)
> >                 return 1;
> >
> > ?
>
> that's correct. In hindsight, I have no idea why I used the
> event->status here since all other checks are done against the TRB
> only.
>
> >> > At a practical level, this patch resolves USB transfer stalls seen
> >> > with adb on dwc3 based HiKey960 after functionfs gadget added
> >> > scatter-gather support around v4.20.
> >>
> >> Right, I remember asking for tracepoint data showing this problem
> >> happening. It's the best way to figure out what's really going on.
> >>
> >> Before we accept these two patches, could you collect dwc3 tracepoint
> >> data and share here?
> >
> > Sure. Attached is trace logs and regdumps for hikey960.
>
> Thanks
>
> > The one gotcha with the logs is that in the working case (with this
> > patch applied), I booted with the usb-c cable disconnected (as
> > suggested in the dwc3.rst doc), enabled tracing and plugged in the
> > device, then ran adb logcat a few times to validate no stalls.
> >
> > In the failure case (without this patch), I booted with the usb-c
> > cable disconnected, enabled tracing and then when I plugged in the
> > device, it never was detected by adb (it seems perhaps the problem had
> > already struck?).
>
> You never got a Reset Interrupt, so something else is going on. I
> suggest putting a sniffer and first making sure the host *does* drive
> reset signalling. Second step would be to look at your phy
> configuration. Is it going in suspend for any reason? Might want to try
> our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.
>
> > So I generated the failure2 log by booting with USB-C plugged in,
> > enabling tracing, and running adb logcat on the host to observe the
> > stall.
>
> Thank you. Here's a quick summary of what's in failure2:
>
> There is a series of 24-byte transfers on ep1out and that's the one
> which shows a problem. We can clearly see that adb is issuing one
> transfer at a time, only enqueueing transfer n+1 when transfer n is
> completed and given back, so we see a series of similar blocks:
>
> - dwc3_alloc_request
> - dwc3_ep_queue
> - dwc3_prepare_trb
> - dwc3_prepare_trb (for the chained bit)
> - dwc3_gadget_ep_cmd (update transfer)
> - dwc3_event (transfer in progress)
> - dwc3_complete_trb
> - dwc3_complete_trb (for the chained bit)
> - dwc3_gadget_giveback
> - dwc3_free_request
>
> So this works for several iterations. Note, however, that the TRB
> addresses don't really make sense. DWC3 allocates a contiguous block of
> memory to server as TRB pool, but we see non-consecutive addresses on
> these TRBs. I'm assuming there's an IOMMU in your system.
>
> Anyway, the failing point is here:
>
> >          adbd-461   [002] d..1    49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
> >          adbd-461   [002] d..2    49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
> >          adbd-461   [002] d..2    49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
> >          adbd-461   [002] d..2    49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >          adbd-461   [002] d..2    49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> >   irq/65-dwc3-498   [000] d..1    53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> >   irq/65-dwc3-498   [000] d..1    53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
> >   irq/65-dwc3-498   [000] d..1    53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >   irq/65-dwc3-498   [000] d..1    53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> > kworker/u16:0-7     [000] ....    53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> >          adbd-461   [002] d..1    53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
> >          adbd-461   [002] d..2    53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
> >          adbd-461   [002] d..2    53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
> >          adbd-461   [002] d..2    53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >          adbd-461   [002] d..2    53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>
> Note that this transfer, after started, took 4 seconds to complete,
> while all others completed within a few ms. There's no real reason for
> this visible from dwc3 driver itself. What follows, is a transfer that
> never completed.
>
> The only thing I can come up with, is that we starve the TRB ring, by
> continuously reclaiming a single TRB. We have 255 usable TRBs, so after
> a few iterations, we would see a stall due to starved TRB ring.
>
> There is a way to verify this by tracking trb_enqueue and trb_dequeue,
> if you're willing to do that, that'll help us prove that this is really
> the problem and, since current tracepoints doen't really show that
> information, it may be a good idea to add this information to
> dwc3_log_trb tracepoint class. Something like below should be enough,
> could you re-run the test of failure2 with this patch applied?


Ok. Attached is the trace logs using the new tracepoints with and
without the patch. In both cases, I started with the usb-c cable
plugged in, started tracing and ran "adb logcat -d" a few times.

Also, in the -with-fix case, I'm using the patch modified as we
discussed yesterday (which still avoids the issue). If this log
confirms your suspicions I'll go ahead and resubmit the new patch.

thanks
-john
Felipe Balbi Jan. 25, 2020, 11:02 a.m. UTC | #5
Hi,

John Stultz <john.stultz@linaro.org> writes:
>> > On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <balbi@kernel.org> wrote:
>> >> > From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
>> >> >
>> >> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
>> >> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
>> >> > set. This logic doesn't work if multiple TRBs are queued per
>> >> > request and the IOC/LST bit is set on the last TRB of that request.
>> >> > Consider an example where a queued request has multiple queued TRBs
>> >> > and IOC/LST bit is set only for the last TRB. In this case, the Core
>> >> > generates XferComplete/XferInProgress events only for the last TRB
>> >> > (since IOC/LST are set only for the last TRB). As per the logic in
>> >> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
>> >> > IOC/LST bit and returns on the first TRB. This makes the remaining
>> >> > TRBs left unhandled.
>> >> > To aviod this, changed the code to check for IOC/LST bits in both
>> >>      avoid
>> >>
>> >> > event->status & TRB->ctrl. This patch does the same.
>> >>
>> >> We don't need to check both. It's very likely that checking the TRB is
>> >> enough.
>> >
>> > Sorry, just to clarify, are you suggesting instead of:
>> > -       if (event->status & DEPEVT_STATUS_IOC)
>> > +       if ((event->status & DEPEVT_STATUS_IOC) &&
>> > +           (trb->ctrl & DWC3_TRB_CTRL_IOC))
>> >
>> >
>> > We do something like:
>> > -       if (event->status & DEPEVT_STATUS_IOC)
>> > +       if (trb->ctrl & DWC3_TRB_CTRL_IOC)
>> > +               return 1;
>> > +
>> > +       if (trb->ctrl & DWC3_TRB_CTRL_LST)
>> >                 return 1;
>> >
>> > ?
>>
>> that's correct. In hindsight, I have no idea why I used the
>> event->status here since all other checks are done against the TRB
>> only.
>>
>> >> > At a practical level, this patch resolves USB transfer stalls seen
>> >> > with adb on dwc3 based HiKey960 after functionfs gadget added
>> >> > scatter-gather support around v4.20.
>> >>
>> >> Right, I remember asking for tracepoint data showing this problem
>> >> happening. It's the best way to figure out what's really going on.
>> >>
>> >> Before we accept these two patches, could you collect dwc3 tracepoint
>> >> data and share here?
>> >
>> > Sure. Attached is trace logs and regdumps for hikey960.
>>
>> Thanks
>>
>> > The one gotcha with the logs is that in the working case (with this
>> > patch applied), I booted with the usb-c cable disconnected (as
>> > suggested in the dwc3.rst doc), enabled tracing and plugged in the
>> > device, then ran adb logcat a few times to validate no stalls.
>> >
>> > In the failure case (without this patch), I booted with the usb-c
>> > cable disconnected, enabled tracing and then when I plugged in the
>> > device, it never was detected by adb (it seems perhaps the problem had
>> > already struck?).
>>
>> You never got a Reset Interrupt, so something else is going on. I
>> suggest putting a sniffer and first making sure the host *does* drive
>> reset signalling. Second step would be to look at your phy
>> configuration. Is it going in suspend for any reason? Might want to try
>> our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.
>>
>> > So I generated the failure2 log by booting with USB-C plugged in,
>> > enabling tracing, and running adb logcat on the host to observe the
>> > stall.
>>
>> Thank you. Here's a quick summary of what's in failure2:
>>
>> There is a series of 24-byte transfers on ep1out and that's the one
>> which shows a problem. We can clearly see that adb is issuing one
>> transfer at a time, only enqueueing transfer n+1 when transfer n is
>> completed and given back, so we see a series of similar blocks:
>>
>> - dwc3_alloc_request
>> - dwc3_ep_queue
>> - dwc3_prepare_trb
>> - dwc3_prepare_trb (for the chained bit)
>> - dwc3_gadget_ep_cmd (update transfer)
>> - dwc3_event (transfer in progress)
>> - dwc3_complete_trb
>> - dwc3_complete_trb (for the chained bit)
>> - dwc3_gadget_giveback
>> - dwc3_free_request
>>
>> So this works for several iterations. Note, however, that the TRB
>> addresses don't really make sense. DWC3 allocates a contiguous block of
>> memory to server as TRB pool, but we see non-consecutive addresses on
>> these TRBs. I'm assuming there's an IOMMU in your system.
>>
>> Anyway, the failing point is here:
>>
>> >          adbd-461   [002] d..1    49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
>> >          adbd-461   [002] d..2    49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
>> >          adbd-461   [002] d..2    49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
>> >          adbd-461   [002] d..2    49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>> >          adbd-461   [002] d..2    49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>> >   irq/65-dwc3-498   [000] d..1    53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
>> >   irq/65-dwc3-498   [000] d..1    53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
>> >   irq/65-dwc3-498   [000] d..1    53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>> >   irq/65-dwc3-498   [000] d..1    53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
>> > kworker/u16:0-7     [000] ....    53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
>> >          adbd-461   [002] d..1    53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
>> >          adbd-461   [002] d..2    53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
>> >          adbd-461   [002] d..2    53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
>> >          adbd-461   [002] d..2    53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>> >          adbd-461   [002] d..2    53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>>
>> Note that this transfer, after started, took 4 seconds to complete,
>> while all others completed within a few ms. There's no real reason for
>> this visible from dwc3 driver itself. What follows, is a transfer that
>> never completed.
>>
>> The only thing I can come up with, is that we starve the TRB ring, by
>> continuously reclaiming a single TRB. We have 255 usable TRBs, so after
>> a few iterations, we would see a stall due to starved TRB ring.
>>
>> There is a way to verify this by tracking trb_enqueue and trb_dequeue,
>> if you're willing to do that, that'll help us prove that this is really
>> the problem and, since current tracepoints doen't really show that
>> information, it may be a good idea to add this information to
>> dwc3_log_trb tracepoint class. Something like below should be enough,
>> could you re-run the test of failure2 with this patch applied?
>
>
> Ok. Attached is the trace logs using the new tracepoints with and
> without the patch. In both cases, I started with the usb-c cable
> plugged in, started tracing and ran "adb logcat -d" a few times.
>
> Also, in the -with-fix case, I'm using the patch modified as we
> discussed yesterday (which still avoids the issue). If this log
> confirms your suspicions I'll go ahead and resubmit the new patch.

So the problem is caused with ep1in, not ep1out as I originally
though. Here's snippet with the fix:

            adbd-2020  [005] d..2   696.765411: dwc3_ep_queue: ep1in: req 0000000090c1f3b7 length 0/8197 zsI ==> -115
            adbd-2020  [005] d..2   696.765414: dwc3_prepare_trb: ep1in: trb 00000000c0b7b1ee (E97:D96) buf 00000000aac5d000 size 4096 ctrl 00000015 (HlCs:sc:normal)
            adbd-2020  [005] d..2   696.765415: dwc3_prepare_trb: ep1in: trb 00000000cd8ddc31 (E98:D96) buf 00000000adf18000 size 4101 ctrl 00000811 (Hlcs:sC:normal)
            adbd-2020  [005] d..2   696.765419: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
     irq/65-dwc3-2021  [000] d..1   696.765640: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
     irq/65-dwc3-2021  [000] d..1   696.765642: dwc3_complete_trb: ep1in: trb 00000000c0b7b1ee (E98:D97) buf 00000000aac5d000 size 0 ctrl 00000014 (hlCs:sc:normal)
     irq/65-dwc3-2021  [000] d..1   696.765644: dwc3_complete_trb: ep1in: trb 00000000cd8ddc31 (E98:D98) buf 00000000adf18000 size 0 ctrl 00000810 (hlcs:sC:normal)
     irq/65-dwc3-2021  [000] d..1   696.765647: dwc3_gadget_giveback: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0
   kworker/u16:0-7     [003] ....   696.765667: dwc3_free_request: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0

And without the fix:

            adbd-469   [005] d..1    40.118540: dwc3_alloc_request: ep1in: req 000000000dca92a3 length 0/0 zsI ==> 0
            adbd-469   [005] d..2    40.118541: dwc3_ep_queue: ep1in: req 000000000dca92a3 length 0/5424 zsI ==> -115
            adbd-469   [005] d..2    40.118543: dwc3_prepare_trb: ep1in: trb 0000000020352887 (E77:D76) buf 0000000057db5000 size 4096 ctrl 00000015 (HlCs:sc:normal)
            adbd-469   [005] d..2    40.118543: dwc3_prepare_trb: ep1in: trb 00000000227d614e (E78:D76) buf 0000000057db4000 size 1328 ctrl 00000811 (Hlcs:sC:normal)
            adbd-469   [005] d..2    40.118547: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
     irq/65-dwc3-473   [000] d..1    40.118720: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
     irq/65-dwc3-473   [000] d..1    40.118721: dwc3_complete_trb: ep1in: trb 0000000020352887 (E78:D77) buf 0000000057db5000 size 0 ctrl 00000014 (hlCs:sc:normal)
     irq/65-dwc3-473   [000] d..1    40.118730: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful

Note that we completed a single TRB in the failure case. The odd thing
is why this doesn't happen with OUT direction? (/me goes look at the
code).

Okay, here's the answer: With OUT direction, DWC3, itself, is adding an
extra chained TRB because OUT transfers must be aligned to
wMaxPacketSize. Because of that we set needs_extra_trb flag which causes
this flow:

XferInProgress
  dwc3_gadget_ep_cleanup_completed_request
    dwc3_gadget_ep_reclaim_trb_sg
      for_each_sg {
        dwc3_gadget_ep_reclaim_completed_trb
          if (IOC)
            break;
      }
      if (needs_extra_trb)
        dwc3_gadget_ep_reclaim_trb_linear
          dwc3_gadget_ep_reclaim_completed_trb

In summary, OUT directions work solely out of luck :-) If gadget
function enqueues an unaligned request with sglist already in it, it
should fail the same way, since we will append another TRB to something
that already uses more than one TRB.

We should probably add some of this explanation to commit log as well
and, BTW, tracepoints actually had the data to show where the problem
was, arguably printing out enqueue and dequeue points made it easier to
see the issue.

I'm now convinced of what the problem really is, please resend the
modified patch so we can apply and backport it.

cheers
John Stultz Jan. 27, 2020, 6:48 p.m. UTC | #6
On Sat, Jan 25, 2020 at 3:01 AM Felipe Balbi <balbi@kernel.org> wrote:
>
>
> Hi,
>
> John Stultz <john.stultz@linaro.org> writes:
> >> > On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <balbi@kernel.org> wrote:
> >> >> > From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
> >> >> >
> >> >> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
> >> >> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
> >> >> > set. This logic doesn't work if multiple TRBs are queued per
> >> >> > request and the IOC/LST bit is set on the last TRB of that request.
> >> >> > Consider an example where a queued request has multiple queued TRBs
> >> >> > and IOC/LST bit is set only for the last TRB. In this case, the Core
> >> >> > generates XferComplete/XferInProgress events only for the last TRB
> >> >> > (since IOC/LST are set only for the last TRB). As per the logic in
> >> >> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
> >> >> > IOC/LST bit and returns on the first TRB. This makes the remaining
> >> >> > TRBs left unhandled.
> >> >> > To aviod this, changed the code to check for IOC/LST bits in both
> >> >>      avoid
> >> >>
> >> >> > event->status & TRB->ctrl. This patch does the same.
> >> >>
> >> >> We don't need to check both. It's very likely that checking the TRB is
> >> >> enough.
> >> >
> >> > Sorry, just to clarify, are you suggesting instead of:
> >> > -       if (event->status & DEPEVT_STATUS_IOC)
> >> > +       if ((event->status & DEPEVT_STATUS_IOC) &&
> >> > +           (trb->ctrl & DWC3_TRB_CTRL_IOC))
> >> >
> >> >
> >> > We do something like:
> >> > -       if (event->status & DEPEVT_STATUS_IOC)
> >> > +       if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> >> > +               return 1;
> >> > +
> >> > +       if (trb->ctrl & DWC3_TRB_CTRL_LST)
> >> >                 return 1;
> >> >
> >> > ?
> >>
> >> that's correct. In hindsight, I have no idea why I used the
> >> event->status here since all other checks are done against the TRB
> >> only.
> >>
> >> >> > At a practical level, this patch resolves USB transfer stalls seen
> >> >> > with adb on dwc3 based HiKey960 after functionfs gadget added
> >> >> > scatter-gather support around v4.20.
> >> >>
> >> >> Right, I remember asking for tracepoint data showing this problem
> >> >> happening. It's the best way to figure out what's really going on.
> >> >>
> >> >> Before we accept these two patches, could you collect dwc3 tracepoint
> >> >> data and share here?
> >> >
> >> > Sure. Attached is trace logs and regdumps for hikey960.
> >>
> >> Thanks
> >>
> >> > The one gotcha with the logs is that in the working case (with this
> >> > patch applied), I booted with the usb-c cable disconnected (as
> >> > suggested in the dwc3.rst doc), enabled tracing and plugged in the
> >> > device, then ran adb logcat a few times to validate no stalls.
> >> >
> >> > In the failure case (without this patch), I booted with the usb-c
> >> > cable disconnected, enabled tracing and then when I plugged in the
> >> > device, it never was detected by adb (it seems perhaps the problem had
> >> > already struck?).
> >>
> >> You never got a Reset Interrupt, so something else is going on. I
> >> suggest putting a sniffer and first making sure the host *does* drive
> >> reset signalling. Second step would be to look at your phy
> >> configuration. Is it going in suspend for any reason? Might want to try
> >> our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.
> >>
> >> > So I generated the failure2 log by booting with USB-C plugged in,
> >> > enabling tracing, and running adb logcat on the host to observe the
> >> > stall.
> >>
> >> Thank you. Here's a quick summary of what's in failure2:
> >>
> >> There is a series of 24-byte transfers on ep1out and that's the one
> >> which shows a problem. We can clearly see that adb is issuing one
> >> transfer at a time, only enqueueing transfer n+1 when transfer n is
> >> completed and given back, so we see a series of similar blocks:
> >>
> >> - dwc3_alloc_request
> >> - dwc3_ep_queue
> >> - dwc3_prepare_trb
> >> - dwc3_prepare_trb (for the chained bit)
> >> - dwc3_gadget_ep_cmd (update transfer)
> >> - dwc3_event (transfer in progress)
> >> - dwc3_complete_trb
> >> - dwc3_complete_trb (for the chained bit)
> >> - dwc3_gadget_giveback
> >> - dwc3_free_request
> >>
> >> So this works for several iterations. Note, however, that the TRB
> >> addresses don't really make sense. DWC3 allocates a contiguous block of
> >> memory to server as TRB pool, but we see non-consecutive addresses on
> >> these TRBs. I'm assuming there's an IOMMU in your system.
> >>
> >> Anyway, the failing point is here:
> >>
> >> >          adbd-461   [002] d..1    49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
> >> >          adbd-461   [002] d..2    49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
> >> >          adbd-461   [002] d..2    49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
> >> >          adbd-461   [002] d..2    49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >> >          adbd-461   [002] d..2    49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> >> >   irq/65-dwc3-498   [000] d..1    53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> >> >   irq/65-dwc3-498   [000] d..1    53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
> >> >   irq/65-dwc3-498   [000] d..1    53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >> >   irq/65-dwc3-498   [000] d..1    53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> >> > kworker/u16:0-7     [000] ....    53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> >> >          adbd-461   [002] d..1    53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
> >> >          adbd-461   [002] d..2    53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
> >> >          adbd-461   [002] d..2    53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
> >> >          adbd-461   [002] d..2    53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
> >> >          adbd-461   [002] d..2    53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> >>
> >> Note that this transfer, after started, took 4 seconds to complete,
> >> while all others completed within a few ms. There's no real reason for
> >> this visible from dwc3 driver itself. What follows, is a transfer that
> >> never completed.
> >>
> >> The only thing I can come up with, is that we starve the TRB ring, by
> >> continuously reclaiming a single TRB. We have 255 usable TRBs, so after
> >> a few iterations, we would see a stall due to starved TRB ring.
> >>
> >> There is a way to verify this by tracking trb_enqueue and trb_dequeue,
> >> if you're willing to do that, that'll help us prove that this is really
> >> the problem and, since current tracepoints doen't really show that
> >> information, it may be a good idea to add this information to
> >> dwc3_log_trb tracepoint class. Something like below should be enough,
> >> could you re-run the test of failure2 with this patch applied?
> >
> >
> > Ok. Attached is the trace logs using the new tracepoints with and
> > without the patch. In both cases, I started with the usb-c cable
> > plugged in, started tracing and ran "adb logcat -d" a few times.
> >
> > Also, in the -with-fix case, I'm using the patch modified as we
> > discussed yesterday (which still avoids the issue). If this log
> > confirms your suspicions I'll go ahead and resubmit the new patch.
>
> So the problem is caused with ep1in, not ep1out as I originally
> though. Here's snippet with the fix:
>
>             adbd-2020  [005] d..2   696.765411: dwc3_ep_queue: ep1in: req 0000000090c1f3b7 length 0/8197 zsI ==> -115
>             adbd-2020  [005] d..2   696.765414: dwc3_prepare_trb: ep1in: trb 00000000c0b7b1ee (E97:D96) buf 00000000aac5d000 size 4096 ctrl 00000015 (HlCs:sc:normal)
>             adbd-2020  [005] d..2   696.765415: dwc3_prepare_trb: ep1in: trb 00000000cd8ddc31 (E98:D96) buf 00000000adf18000 size 4101 ctrl 00000811 (Hlcs:sC:normal)
>             adbd-2020  [005] d..2   696.765419: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
>      irq/65-dwc3-2021  [000] d..1   696.765640: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
>      irq/65-dwc3-2021  [000] d..1   696.765642: dwc3_complete_trb: ep1in: trb 00000000c0b7b1ee (E98:D97) buf 00000000aac5d000 size 0 ctrl 00000014 (hlCs:sc:normal)
>      irq/65-dwc3-2021  [000] d..1   696.765644: dwc3_complete_trb: ep1in: trb 00000000cd8ddc31 (E98:D98) buf 00000000adf18000 size 0 ctrl 00000810 (hlcs:sC:normal)
>      irq/65-dwc3-2021  [000] d..1   696.765647: dwc3_gadget_giveback: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0
>    kworker/u16:0-7     [003] ....   696.765667: dwc3_free_request: ep1in: req 0000000090c1f3b7 length 8197/8197 zsI ==> 0
>
> And without the fix:
>
>             adbd-469   [005] d..1    40.118540: dwc3_alloc_request: ep1in: req 000000000dca92a3 length 0/0 zsI ==> 0
>             adbd-469   [005] d..2    40.118541: dwc3_ep_queue: ep1in: req 000000000dca92a3 length 0/5424 zsI ==> -115
>             adbd-469   [005] d..2    40.118543: dwc3_prepare_trb: ep1in: trb 0000000020352887 (E77:D76) buf 0000000057db5000 size 4096 ctrl 00000015 (HlCs:sc:normal)
>             adbd-469   [005] d..2    40.118543: dwc3_prepare_trb: ep1in: trb 00000000227d614e (E78:D76) buf 0000000057db4000 size 1328 ctrl 00000811 (Hlcs:sC:normal)
>             adbd-469   [005] d..2    40.118547: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
>      irq/65-dwc3-473   [000] d..1    40.118720: dwc3_event: event (00004086): ep1in: Transfer In Progress [0] (sIm)
>      irq/65-dwc3-473   [000] d..1    40.118721: dwc3_complete_trb: ep1in: trb 0000000020352887 (E78:D77) buf 0000000057db5000 size 0 ctrl 00000014 (hlCs:sc:normal)
>      irq/65-dwc3-473   [000] d..1    40.118730: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
>
> Note that we completed a single TRB in the failure case. The odd thing
> is why this doesn't happen with OUT direction? (/me goes look at the
> code).
>
> Okay, here's the answer: With OUT direction, DWC3, itself, is adding an
> extra chained TRB because OUT transfers must be aligned to
> wMaxPacketSize. Because of that we set needs_extra_trb flag which causes
> this flow:
>
> XferInProgress
>   dwc3_gadget_ep_cleanup_completed_request
>     dwc3_gadget_ep_reclaim_trb_sg
>       for_each_sg {
>         dwc3_gadget_ep_reclaim_completed_trb
>           if (IOC)
>             break;
>       }
>       if (needs_extra_trb)
>         dwc3_gadget_ep_reclaim_trb_linear
>           dwc3_gadget_ep_reclaim_completed_trb
>
> In summary, OUT directions work solely out of luck :-) If gadget
> function enqueues an unaligned request with sglist already in it, it
> should fail the same way, since we will append another TRB to something
> that already uses more than one TRB.
>
> We should probably add some of this explanation to commit log as well
> and, BTW, tracepoints actually had the data to show where the problem
> was, arguably printing out enqueue and dequeue points made it easier to
> see the issue.
>
> I'm now convinced of what the problem really is, please resend the
> modified patch so we can apply and backport it.

Sure thing! Though I'm not as adept at staring at the matrix/tracelogs
as you, so I'll do my best to add a comment to the commit log to the
effect of the above, but it may not be accurate so feel free to reword
it yourself to correct it after I send it out. :)

thanks so much again for taking a look at this!
-john
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 154f3f3e8cff..1edce3bbb55c 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2420,7 +2420,12 @@  static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
 	if (event->status & DEPEVT_STATUS_SHORT && !chain)
 		return 1;
 
-	if (event->status & DEPEVT_STATUS_IOC)
+	if ((event->status & DEPEVT_STATUS_IOC) &&
+	    (trb->ctrl & DWC3_TRB_CTRL_IOC))
+		return 1;
+
+	if ((event->status & DEPEVT_STATUS_LST) &&
+	    (trb->ctrl & DWC3_TRB_CTRL_LST))
 		return 1;
 
 	return 0;