mbox series

[v3,0/5] xHCI: Isochronous error handling fixes and improvements

Message ID 20250226080202.7eb5e142@foxbook (mailing list archive)
Headers show
Series xHCI: Isochronous error handling fixes and improvements | expand

Message

Michał Pecio Feb. 26, 2025, 7:02 a.m. UTC
These patches reduce latency of error reporting in some cases related
to 'error mid TD' and Missed Service events and sometimes fix a failure
to give back such TDs altogether until they are cancelled.

Also included are fixes for potential packet loss or memory corruption
due to obscure races. Whether it causes problems IRL is not known and
the worst case would be hard to reproduce, but exactly for this reason
if the worst case actually happens, it could be hard to debug too.

The first three should be safe. The fourth should also be safe, but it
relies on HC functionality Linux never relied on before, so I placed it
towards the end in case it would need some tweaks. I tested it on all
hardware I have and it worked just fine.

The last one is perhaps the most controversial, though it should be
OK with typical "complete -> resubmit" drivers. It's the only one here
which increases latency in some severe error cases. The intent is to
avoid potentially giving back URBs not yet executed by hardware.

New in v3:
- dropped the cleanup patch
- added Don't skip on Stopped - Length Invalid

New in v3:
- fixed spurious empty list warning on xrun
- clear skip flag if one skipped event was the last

Michal Pecio (5):
  usb: xhci: Don't skip on Stopped - Length Invalid
  usb: xhci: Complete 'error mid TD' transfers when handling Missed
    Service
  usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
  usb: xhci: Expedite skipping missed isoch TDs on modern HCs
  usb: xhci: Skip only one TD on Ring Underrun/Overrun

 drivers/usb/host/xhci-ring.c | 55 +++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 10 deletions(-)

Comments

Mathias Nyman Feb. 26, 2025, 12:41 p.m. UTC | #1
On 26.2.2025 9.02, Michal Pecio wrote:
> These patches reduce latency of error reporting in some cases related
> to 'error mid TD' and Missed Service events and sometimes fix a failure
> to give back such TDs altogether until they are cancelled.
> 
> Also included are fixes for potential packet loss or memory corruption
> due to obscure races. Whether it causes problems IRL is not known and
> the worst case would be hard to reproduce, but exactly for this reason
> if the worst case actually happens, it could be hard to debug too.
> 
> The first three should be safe. The fourth should also be safe, but it
> relies on HC functionality Linux never relied on before, so I placed it
> towards the end in case it would need some tweaks. I tested it on all
> hardware I have and it worked just fine.
> 
> The last one is perhaps the most controversial, though it should be
> OK with typical "complete -> resubmit" drivers. It's the only one here
> which increases latency in some severe error cases. The intent is to
> avoid potentially giving back URBs not yet executed by hardware.
> 
> New in v3:
> - dropped the cleanup patch
> - added Don't skip on Stopped - Length Invalid
> 
> New in v3:
> - fixed spurious empty list warning on xrun
> - clear skip flag if one skipped event was the last
> 
> Michal Pecio (5):
>    usb: xhci: Don't skip on Stopped - Length Invalid
>    usb: xhci: Complete 'error mid TD' transfers when handling Missed
>      Service
>    usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
>    usb: xhci: Expedite skipping missed isoch TDs on modern HCs
>    usb: xhci: Skip only one TD on Ring Underrun/Overrun
> 

Updated my for-usb-next branch to this v3 version

Thanks
Mathias
Michał Pecio Feb. 26, 2025, 10:05 p.m. UTC | #2
On Wed, 26 Feb 2025 14:41:44 +0200, Mathias Nyman wrote:
> Updated my for-usb-next branch to this v3 version


A few remarks regarding "Add helper to find trb from its dma address":

xhci_dma_to_trb():
This function could use xhci_for_each_ring_seg.
The use of in_range() perhaps deserves a comment, because its
correctness is not as obvious as it might seem.

Long story short, my own version:

/*
 * Look up a TRB on the ring by its DMA address, return NULL if not found.
 * Start from deq_seg to optimize for event handling use.
 *
 * Note: false positive is possible if dma < TRB_SEGMENT_SIZE *and*
 * seg->dma > (dma_addr_t) 0 - TRB_SEGMENT_SIZE, but that shouldn't
 * happen if seg->dma is an allocation of size TRB_SEGMENT_SIZE.
 */
static union xhci_trb *xhci_dma_to_trb(struct xhci_ring *ring, dma_addr_t dma)
{
       struct xhci_segment *seg;

       xhci_for_each_ring_seg(ring->deq_seg, seg)
               if (in_range(dma, seg->dma, TRB_SEGMENT_SIZE))
                       return seg->trbs + (dma - seg->dma) / sizeof(seg->trbs[0]);

       return NULL;
}

>+       struct xhci_td *matched_td;

This variable is only used as bool so it could be declared as such.
Other places still use 'td' and assume that it equals 'matched_td'.
And that's OK because there is no need for iterating further after
the matching TD is found.

>+       /* find the transfer trb this events points to */
>+       if (ep_trb_dma)
>+               ep_trb = xhci_dma_to_trb(ep_ring, ep_trb_dma);

This may deserve a dedicated warning. It's a pathology. Either the
event is bogus due to internal corruption in the HC, or it's executing
TRBs from a wrong ring due to damaged/ignored Link TRB or bad Set Deq.
Or we completely screwed up and are looking at a wrong ep_ring here.

>-       if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb_dma)
>+       if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb)
>                return 0;

Good idea. I think I would go further and refuse to handle anything
when (ep_trb_dma && !ep_trb). Nothing is going to match, nothing good
will come from trying as far as I see.

But that's a behavior change, so maybe material for a separate patch.

>+               matched_td = NULL;
> 
>                /* Is this a TRB in the currently executing TD? */
>-               ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
>+               if (trb_in_td(xhci, td, ep_trb_dma, false))
>+                       matched_td = td;

If 'matched_td' is declared as bool, this will become simply:

matched_td = trb_in_td(xhci, td, ep_trb_dma, false);

Regards,
Michal